Discussion about stat consulting, stat packages, applied statistics and data analysis by statistical consultants of Academic Technology Services at UCLA. Please leave your comments or send us email at stattalk at ats.ucla.edu
Wednesday, November 7, 2007
Well, the first West Coast Stata User Group meeting is now history, and by all reports it was a success. There were lots of interesting talks, some about Stata and some about broader statistical issues. In addition to the ATS Stat Consulting group, the meeting was attended by many graduate students, researchers, several Stata employees, and some authors of Stata Press books. Compared to the user group meetings for other stat packages, this meeting may seem small, with only about 60 attendees per day. However, many people found the small size to be a big benefit, because there were many opportunities to chat with everyone there. At the end of the first day (it was a two-day event), many people went to dinner and enjoyed good food and conversation. The second day ended with the popular "Wishes and Grumbles" session, during which users could tell members of the Stata team about their ideas for improvements or additions to Stata.
If you would like to see the slides of this year's presentations, please see http://www.stata.com/meeting/wcsug07/abstracts.html . If you didn't make it to this year's meeting, you may want to think about attending the next one. The meeting was a lot of fun, and there were plenty of interesting things to learn!
-crw
Thursday, October 4, 2007
Thursday, October 4, 2007
A new tool for the toolbelt
Well, the Fall Quarter has just begun so I guess I better get writing again.
We are waiting with great anticipation for the release of the program SuperMix from Scientific Software International. This looks to be a very useful tool for data analysts.
Basically, SuperMix performs mixed-effects models for continuous, binary, ordinal, nominal and count response variables. It can handle two- and three-level models with random intercepts and random slopes. Although SuperMix is a new product, it is really just the combination of four programs written by Donald Hedeker and Robert Gibbons (both of the University of Illinois at Chicago) that were previously available online. SuperMix combines MIXREG, MIXOR, MIXNO, and MIXPREG into one package with a consistent user interface.
The webpage for SuperMix indicates that its planned release is September 2007. Since it is a bit past that time, we are looking for the official release any time now.
pbe
Friday, September 28, 2007
Free research resources?!
This is just a short little note to let people know that we have compiled a page of free, research-related resources for UCLA researchers. This page can be found at http://www.ats.ucla.edu/stat/research_resources.htm . This page is actually a modification of a flier that we put together to let graduate students know about the free, research-related resources that are available to them. The flier (in .pdf form) is linked from the web page. If you know of any other research-related resources that are free and available to all UCLA researchers or to all UCLA graduate students, please email us and let us know. Research in general, and graduate school in particular, are tough enough, and a little free help is always welcome!
-crw
Tuesday, September 18, 2007
What's the meaning of this?
If you have ever wondered "what's the meaning of this?" when looking at your output, help may be on the way. We have added lots of new annotated outputs to our web site. We have also reorganized the listing of these pages to make it easier for you to find what you're looking for. We have even added an Annotated Outputs link to our home page. The analyses annotated on these pages are the same analyses that are discussed on our Data Analysis Examples pages, so you can learn about an analysis on the Data Analysis Examples page and then learn more about each part of the output on the corresponding annotated output page. If you don't see an annotated output page for a particular analysis, please check back periodically. We are continuing to add new annotated outputs (and Data Analysis Examples) pages. We hope that you find these pages useful!
-crw
Friday, August 10, 2007
Where did all the consultants go?
Consulting was closed for most of last week, and some of you may have wondered why. This wasn't our typical end-of-the-quarter "dark week." Instead, three of us (Xiao, Christine and Rose) went to the major North American gathering for our tribe, the Joint Statistical Meetings in Salt Lake City, Utah. Now, Salt Lake City in early August may not sound like paradise to everyone, but hidden away from the heat in a convention center, we enjoyed three days of learning, discussing and networking. We even got to see other statisticians dance, and we were surprised at the number who actually do so quite well.
So what goodies did we bring back to share with all of you and each other? I (Rose) took a full-day course on adjusting for multiple hypothesis tests, which introduced me to a lot of new (to me) techniques for dealing with multiplicity. I still have some homework to do, but this should be really useful when working with clients who are running a large number of significance tests. Christine enjoyed talks on the use of the ACS (the American Community Survey, which is replacing the long form of the Census), methods for handling missing data in survey non-response, and imputation methods in survey research, and she enjoyed finding out about the new features in SUDAAN 9.2 (which should be out any day now).
When asked about the conference, Xiao said, "I really enjoyed the sessions, especially some of the invited sessions, such as the session on causality, with Paul Holland, Judea Pearl and Donald Rubin as the three speakers. The round table lunch with Professor James Robins from the School of Public Health at Harvard University was extremely educational and entertaining. I also enjoyed the opportunities to talk to some really awesome statisticians from SAS and Mplus. I learned a lot from them. I (Xiao) also enjoyed my little afternoon trips, such as visiting the University of Utah by tram, having a glimpse of the gleaming Salt Lake and riding the tram to Sandy, watching the great Wasatch Mountains go by from afar."
So that’s where we were and what we were doing.
r.a.m.
Wednesday, June 27, 2007
Seemingly Unrelated News
The first copies of Stata 10 arrived on Monday while I was out of town. The copies for our lab and my Mac arrived on Wednesday around noon. At 3 pm, a client came into consulting who was analyzing MRI data on several different probes over 300 time points. She had been doing nonlinear modeling separately for each of the probes; however, it is the same exponential model in each case. The client wanted to test whether the parameters estimated by the exponential models are statistically different or not. This turned out to be a perfect job for the new nlsur (nonlinear seemingly unrelated regression) command. Ten minutes of work and it was up and running. It's fun to play with new toys.
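For the curious, here is a rough sketch of the approach with made-up names (probe1 and probe2 for two of the probe measurements, time for the time variable, and a simple two-parameter exponential); the exact model and the parameter-reference syntax will depend on the data and the Stata version:
* fit the same exponential model to two probes as seemingly unrelated equations
nlsur (probe1 = {a1}*exp(-{k1}*time)) (probe2 = {a2}*exp(-{k2}*time))
* one way to test whether the decay parameters differ across probes
test [k1]_cons = [k2]_cons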
pbe
Wednesday, June 13, 2007
The Best of Stata 10
It's dangerous to try to pick out the best new features of a software package before you actually get your hands on it, but I'm feeling in a daring mood. I chose features that I have either been waiting for or that will make my statistical life easier. Since I'm the Mac guy around here, anything that allows me to avoid Windows software will make my life easier.
First feature: Stata 10 will have a mechanism for dealing with strata with singleton PSUs. This will make life much easier because it is a common occurrence among our clients.
Second feature: Stata 10 has a new command for multilevel logit models. This can, of course, already be done in -gllamm-, but it will be interesting to see if it runs faster and is easier to use than -gllamm-. The HLM people will be including this in their SuperMix program that comes out in the fall.
Third feature: Stata 10 has a new exact logistic estimation command. No more having to run LogXact in Windows. I hope.
Bonus feature: Stata 10 finally gets a full discriminant analysis procedure. I know this is not at the top of everybody's wish list, but I like discriminant analysis and find it useful in interpreting some MANOVAs. Further, I will get to retire my -daoneway- ado program.
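Based only on what has been announced, here are some hedged one-line sketches of what the new syntax might look like; the variable names are made up, and the exact options could differ once the manuals arrive:
* singleton PSUs: a new singleunit() option on -svyset-
svyset psu [pweight = wt], strata(stratum) singleunit(centered)
* multilevel logit with a random intercept for school
xtmelogit passed x1 x2 || school:
* exact logistic regression
exlogistic outcome treat age
* linear discriminant analysis
discrim lda x1 x2 x3, group(species)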
So these are my picks; what are yours?
pbe
Monday, June 4, 2007
Stata 10
StataCorp announced today that it will release version 10.0 on June 25th. There was a long list of new features and analyses, many of which have been long awaited by the Stata faithful. I don't really want to get into the new stuff in this post; instead, I want to discuss how ATS Stat Consulting deals with major new software releases.
First off, I am not sure when we will be receiving our copies of the software. The timing of this release is not optimal for our organization because our fiscal year ends on June 30th and the purchasing database shuts down several weeks before that. Furthermore, once the new fiscal year starts you can't order stuff right away because all the finance and business people are involved in preparing end of year fiscal closing reports. So, I'm not sure when we will have our hands on the software.
Basically, ATS Stat Consulting looks at new versions in terms of what web pages need to be revised, new pages that need to be created, and possibly pages that need to be removed.
Let's start with revised pages. When Stata changes how an existing command works, we need to update every page that uses the command. The biggest changes in our short history occurred when Stata did a massive overhaul of the graphics commands in Stata 8. Stata 9 also required numerous revisions, due in large part to the expanded use of prefix commands. On the surface it doesn't look like Stata 10 will require as many revisions, although there are some changes in options and features for some commands.
The tricky part here is that we have to show both the old and the new until most of our users have migrated to Stata 10.
New pages are required for new commands. There are quite a few of these. In particular the new graph editor will require pages showing how it works. We will also have to develop pages and live presentations demonstrating the best features of Stata 10.
It is not at all clear whether many, or any, pages will need to be removed. I will remove the pages that are related to my -daoneway- (discriminant analysis) program since Stata 10 will provide several ways of doing discriminant analysis, but they will be replaced by pages for the new built-in procedures -discrim lda- and -candisc-.
We will be so busy in July and August with all the web stuff that I will hardly have time to work on my Stata 11 wish list.
Update: 6/8/07 -- Looks like we managed to get our purchase order in just under the wire before fiscal closing. Now it's just a matter of waiting for delivery.
pbe
Sunday, June 3, 2007
More Control Groups Gone Wild
We had a client come in to consulting recently who was studying people receiving treatment in mental health clinics. He classified these patients into three groups: Group 1, individuals who had a personal history of depression; Group 2, individuals with a family history of depression; and Group 3, individuals with no history of depression. The last group was the control group. The outcome variable was a binary indicator of whether or not they had experienced a depressive episode since their last visit to the clinic.
The problem was that the control group did not experience any depressive episodes. This, in turn, creates a problem for logistic regression. There was an error message indicating that Group 3 not equal to zero predicts failure perfectly. And, instead of two degrees of freedom for group, there was only one degree of freedom (comparing Group 1 versus Group 2) and no coefficient for Group 3 versus Group 1.
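To make the mechanics concrete, here is a hedged sketch of the situation in Stata, with hypothetical variable names (episode as the 0/1 outcome, group coded 1, 2, 3):
* create indicator variables grp1-grp3 from the three-level group variable
tabulate group, generate(grp)
* logistic regression with Group 1 (personal history) as the reference category
logit episode grp2 grp3
* because episode is 0 for everyone in Group 3, Stata notes that grp3 predicts
* failure perfectly, drops grp3 along with those observations, and reports only
* the single remaining contrast (Group 2 versus Group 1)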
This could be dealt with by changing one response score in Group 3 at random from zero to one. There was a further complexity, however. Each individual in each of the three groups was measured on 12 occasions, i.e., once a month for a year. And during those twelve months, none of the individuals in the control group ever experienced a depressive episode. Since change over time was one of the research questions, it didn't seem right to randomly change responses in the control group to one.
In the end, there just wasn't any useful information available from Group 3. It was clearly one more case of control groups gone wild.
pbe
Monday, May 14, 2007
Real Missing Data
We have a client, a doctoral student, who is close to finishing her dissertation. Over the weekend her house was burglarized and all of the family's computers, external hard drives and USB drives, along with other valuables, were taken. She has lost all of her dissertation data, her original proposal and all her chapter drafts.
She has offered a reward, no questions asked, checked with pawn shops and is looking on eBay. But it is too soon for things to show up. So, there is no happy ending to this story yet. The only bright spot is that she had visited our consulting office last week and there were copies of most of her data on one of our temp drives.
pbe
Tuesday, April 10, 2007
We Feel Like A Million
We have just run the stats on our web server and are pleased to announce that we achieved over one million hits last month. In March, we had 1,003,103 total hits. These million hits are the total number of pages hit at our website after we subtract out all of the hits generated in our lab and by our personal office machines.
This was not our first million-hit month; we also had one in March of 2006. Since then we have been hovering around 900,000 hits per month.
As of the 1st of April, the total cumulative hits since mid-1999 is just over 37 million. If you look at the results by stat package, the most popular pages are on SAS, followed by Stata and then SPSS, with the remaining specialty packages trailing behind.
In the last three months, the SAS pages had 939,953 hits, Stata 793,129 hits, and SPSS 518,087 hits. Among the specialty packages, Mplus led the pack with 66,685 hits in the last three months. The next closest was HLM with 18,473.
One of the more surprising findings was that our limited S-Plus and R pages managed 39,963 hits since January 1st.
All in all, not bad results for a website that consists entirely of geeky stat stuff.
pbe
Wednesday, April 4, 2007
Roadtrip
This past Saturday, all the ATS stat consultants piled into the Statmobile and went on a road trip to the 26th Annual Workshop in Applied Statistics put on by the Southern California Chapter of the American Statistical Association. Fortunately, it was a short drive, since the meeting was held on the UCLA campus in the Bradley International Center.
This year's speaker was Bengt Muthén talking on recent developments in statistical analysis with latent variables. The presentation went into how the idea of latent variables captures a wide variety of statistical concepts, including random effects, sources of variation in hierarchical data, frailties, missing data, finite mixtures, latent classes and clusters.
The presentation began a little after 9 am with cross-sectional models and finished around 4:30 pm somewhere in longitudinal models. The presentation moved along nicely, thanks in part to Professor Muthén's subtle Swedish sense of humor. Although there was no hands-on component, the crowd got into the swing of things during the lively question-and-answer periods. Even with a whole day spent discussing these topics, the material covered was only a fraction of what Professor Muthén usually covers in his five-day workshop.
I'm sure the conference would have run much later but many wanted to get home to see the UCLA-Florida basketball game. Too bad the game didn't go as well as the conference did.
pbe
Tuesday, March 20, 2007
Another Little Gem
Here's another great little free program, G*Power 3. It does many of the common ANOVA power analyses but also includes MANOVA and Hotelling's T-squared. Throw in multiple regression and you have a pretty useful package.
There are versions for both Windows and Macs. This in itself is pretty unusual.
The authors are Franz Faul, Edgar Erdfelder and Axel Buchner. They all appear to be psychologists, but that is no reason to avoid G*Power. I highly recommend it. The website is at Heinrich Heine University Düsseldorf and here is the link.
pbe
Tuesday, March 6, 2007
What The Heck(man) Is Going On?
We had a client come in with a question about a Heckman selection model that was giving her trouble. She had run it several weeks earlier and everything was working fine. She had some missing data among her predictors and decided to do a multiple imputation. After imputing the missing data, she got the following error message:
Dependent variable never censored due to selection.
She couldn't figure out what was wrong until I asked her whether she had also imputed the response (dependent) variable. Instantly, she realized what the problem was. Since she had imputed all of the variables in her dataset, there were no longer any missing values on her response variable, and therefore no way to estimate a selection model.
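For readers who have not run into this before, here is a hedged sketch of why, using made-up variable names:
* wage is observed only for selected cases and is missing for the rest;
* heckman uses those missing values to define who was not selected
heckman wage educ exper, select(married children educ)
* if imputation fills in wage for every observation, nothing is censored,
* and the selection model can no longer be estimated, hence the message above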
pbe
Friday, March 2, 2007
Parallel Universe
I am a Mac user. This creates problems in that there are several stat packages that only run on Windows. For the past couple of years, the solution has been for me to Timbuktu into a campus computer to use Windows-based software. This solution is relatively slow and clunky.
That was then, this is now. I have installed Parallels Desktop on my relatively new MacBook Pro. Then, after a few hours installing Windows and a few more hours installing stat software (I had problems installing SAS), I'm all set to go. I now run Mplus and SAS on my Mac. And actually, they run pretty fast as long as I close down some of the larger programs on the Mac side. Overall, my throughput is much faster than when I was using Timbuktu. So, for now, I am a happy camper working in a parallel universe.
pbe
Saturday, February 24, 2007
The Parent Trap
Here is a question emailed to us by a client:
I have 4 variables from which I want to create a new variable.
v1 = do you live with mom? (0, 1)
v2 = do you live with dad?
v3 = do you live with stepmother?
v4 = do you live with stepfather?
I want to create a variable PCOMP and code
1 if live with mom and dad
2 if live with mom and stepfather
3 if live with stepmother and dad
4 if live with mom only
5 if live with dad only
6 Other
I'd also like to identify who is from a two-parent household.
How do I do this?
There are many ways this could be handled, here is the one I suggested:
pcomp = v1 + 10*v2 + 100*v3 + 1000*v4
recode pcomp 11=1 1001=2 110=3 1=4 10=5 else=6
two_parent = pcomp < 4
Note: This is in pseudo-code and can be adapted to any stat package that supports recode. Also, the client will have to decide how to deal with missing values as this approach will not work with missing values for v1-v4.
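For example, here is a hedged sketch of the same logic in Stata, assuming v1-v4 are coded 0/1 with no missing values:
* combine the four indicators into a single pattern code
generate pcomp = v1 + 10*v2 + 100*v3 + 1000*v4
* 11 = mom & dad, 1001 = mom & stepfather, 110 = stepmother & dad,
* 1 = mom only, 10 = dad only, anything else = other
recode pcomp (11=1) (1001=2) (110=3) (1=4) (10=5) (else=6)
* categories 1-3 are the two-parent households
generate two_parent = (pcomp < 4)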
pbe
Wednesday, February 21, 2007
Power Series
Today we had the second in our series of three power analysis presentations. Jason Cole (UCLA alum and outside consultant) talked in depth about effect size, the proper alpha level and multiplicity issues to an audience of 42. Within the next week, we hope to post the audio portion of Jason's presentation along with a PDF of his PowerPoint slides.
Last week, for Christine's presentation, we had almost sixty people in attendance. This created problems because the Visualization Portal is not meant to hold that many people. For this week's presentation we had to have people register on-line to limit the number attending.
We are looking forward to another big crowd for our third power analysis presentation in two weeks.
pbe
Wednesday, February 14, 2007
A Rose By Any Other Name...
UCLA Academic Technology Services Statistical Consulting is pleased to announce that Rose Medeiros starts work today as our new full-time stat consultant. She comes to us from the University of New Hampshire, where she is completing her PhD in sociology. She just spent the last five days driving cross-country with her mother and her one-eyed cat, Bentley.
Rose has experience using SPSS, Stata, HLM, Mplus and sometimes R. She "enjoys" multilevel modeling, longitudinal data analysis and structural equation modeling.
And, of course, Rose is "famous" for the Stata ado-program -njc- that provides timely quotes from the statistically-minded geographer Nicholas J Cox (note: the -njc- program does not use maximum likelihood estimation).
Please drop by walk-in consulting and say "Hi" to Rose.
pbe
Friday, February 9, 2007
A Little Gem of a Program
A tip from legendary stat consultant Michael Mitchell pointed us to a little gem of a program, Optimal Design.
Optimal Design does power analysis for a wide variety of multilevel and repeated measures designs. The program is a product of the Survey Research Center of the Institute for Social Research at the University of Michigan. The program is fast, easy to use, and, the best part, it's free. You can find out about it at http://sitemaker.umich.edu/group-based/optimal_design_software.
This program comes at an opportune time for us, since ATS Stat Consulting will be giving the first of three presentations on power analysis starting next Wednesday, February 14, 2007, at 10 am. The other two presentations will be Feb 21st and Mar 7th.
pbe
Tuesday, January 30, 2007
It's Always The Semicolon
If you've ever taught SAS, or even used SAS much, you will recall that the most common error is to leave out a semicolon. It's always the semicolon. I wish I had remembered this fact today when I was working with a client who was doing a power analysis.
This was not a difficult question at all. It involved finding the power of a correlation for a given sample size. SAS's Proc Power is generally very straightforward and easy to use. I hadn't used it for correlation before, so I looked the command up in the online SAS documentation. There was a simple example that I modified for my client. The problem was, it didn't work. I tried a bunch of things, but it just wouldn't run.
Finally, I copied the example straight from the documentation, pasted it into the SAS editor and ran it. Except, it didn't run either. Here is what the example looks like:
proc power;
onecorr dist=fisherz
null = 0.15
corr = 0.35
ntotal = 180
power = .
run;
Now remember, this is a straight cut-and-paste from SAS's own documentation. There was one hint, though: SAS kept claiming that Proc Power was still running. But once it encounters a run statement with a semicolon, it should stop running. That meant there was a missing semicolon. Looking at the Proc Power documentation for multiple regression showed that there needs to be a semicolon before the run statement, at the end of the power statement.
The last two lines should look like this:
power = . ;
run;
So, if you're having problems with SAS, just remember: it's always the semicolon. Well, almost always.
pbe
Tuesday, January 23, 2007
Minitab, My Bad.
Back in We Got Mail (Part 2), I stated that we do not have a copy of Minitab. Legendary stat consultant Michael Mitchell sent me an email to let me know that, yes indeed, we do have a copy of Minitab 14 on our server. I was wrong; my bad. So, I decided to sit down and try it out for myself.
It uses pull-down menus and dialog boxes, which remind me a bit of SPSS. The small examples that I tried ran very quickly. I didn't read the documentation (I'm a Mac user), but I explored the menus and tried as many things as I could. I did notice that Minitab has very strong quality control procedures, but that it was lacking in areas like Poisson and negative binomial regression, instrumental variable (2SLS) models, and mixed models with random slopes and intercepts.
I am not a very big fan of point-and-click interfaces for stat programs. I know that Minitab can be run using syntax or commands because it is possible to run it in batch mode. However, when running interactively, it is not readily apparent how to make it respond directly to commands.
So, while Minitab is a very nice program, I don't think that it would meet the needs of many of the researchers on campus. It does look like it would be a good program for students to use in their coursework. It isn't used much at UCLA because faculty teach using the stat software they use in their own research. I know it really isn't fair to judge a program based on such a quick examination, but Minitab doesn't seem to add anything to the "big three" (SAS, SPSS and Stata) that we already use.
pbe
Friday, January 19, 2007
Control Groups Gone Wild
This story goes back to my early days at UCLA, many years ago. I had a student who was studying test anxiety. She proposed dividing students who were preparing to take their final exams in freshman science courses into three groups. Group 1 would receive training in test-taking skills, while Group 2 would meet in small groups and talk about their fears and how they deal with test anxiety. Group 3 was a do-nothing control group.
My clever idea was to create a placebo control group, that is, a group that receives some kind of treatment that is completely unrelated to test anxiety. To this end, I made a recording on cassette tape of various beeps coming out of my microwave oven. The tape was five minutes long and was played to groups of students who were told that the frequencies of the beeps were designed to influence brain wave patterns to reduce anxiety over the forthcoming tests.
I won't keep you in suspense; you can probably guess what the results were. The placebo control group had the greatest reduction in anxiety. In fact, the placebo group was significantly lower than either the test-taking skills group (Group 1) or the talk therapy group (Group 2). Groups 1 and 2 were not significantly different from each other, but both showed significant reductions in test anxiety over the do-nothing control group.
My best guess is that we did too good a job of selling the placebo control to the subjects. All was not lost, however; we have been selling copies of the tape for over 25 years as an anxiety reduction treatment.
pbe
Thursday, January 18, 2007
Plus ça change, plus ça change
Yes, it's true: the more things change, the more things change. Last October our staffing changed from 4 1/2 stat consultants to 3 1/2 when long-time consultant Michael Mitchell left UCLA to do secret government work. Actually, he is doing research and data analysis for the Veterans Administration. At the end of this month we will lose our half-time biostat doctoral student, Brad McEvoy, who will be devoting his full attention to his dissertation and research work with his advisor.
This reduces us to just three stat consultants for a while. A new full-time consultant should be joining the staff soon, possibly by the middle of February. And we have just posted a job announcement for a new half-time person. Hopefully, when all the new people are on board and up to speed, things will return to so-called normality.
pbe
Monday, January 8, 2007
We Got Mail (Part 2)
I thought that I would talk a bit today about which stat packages are used by the ATS Statistical Consulting Group and how we select that software.
The statistical software that we support is determined by which software is used by researchers here on the UCLA campus. Just to be clear, there is no central administrator or committee that decides which statistics software is used on campus. Each researcher, research group or department decides on its own what software to use.
Let's begin with general purpose statistical packages. The big three on our campus are SAS, SPSS and Stata. SAS and SPSS are legacies from the mainframe days. When I started at UCLA the top three were BMD, SAS and SPSS. BMD has dropped by the wayside and Stata has come on strong in the last seven or eight years. In addition to research usage, many departments teach their methodology courses using one or more of these stat packages.
After the big three there is R, which has a smaller but very strong following. We do not get many people coming into consulting asking for help using R. I think this is due to the fact that many of the R users are relatively advanced and do not need a lot of consulting assistance. Another reason is that our group does not have a lot of expertise in using R. Since we don't get many questions, we haven't developed the expertise needed to support R at the level it deserves.
After SAS, SPSS, Stata and R, there are a number of statistical packages that have a small number of users. The numbers are too small for us to invest the time and effort needed to support the software. These packages include JMP, StatView, Statistica and Datadesk. If I had written this a week ago, I would have said that there aren't any Minitab users on campus, but one walked in last Thursday with a question. Fortunately, it was a more general statistical question and not something specific to Minitab. We don't have Minitab, and we don't know how to use it.
As for JMP, I do have a copy and have played with it some. It does some things very nicely; in fact, it does some things easily that are difficult in other packages. My opinion, based primarily on a total lack of experience, is that it might not be the best package with which to manage and analyze large research databases. If JMP grows in popularity on campus and achieves a significant number of users, we would support it along with the other stat packages.
We also occasionally get a MATLAB user coming in with a question. Most of these users seem to be writing their own data analysis programs, so we can only provide limited help. I tend to lump MATLAB, Maple and Mathematica together as programming environments, as opposed to traditional data analysis programs, which is not to say that you can't do data analysis with them.
Then there are the special purpose statistical software packages including Mplus, EQS, LISREL, HLM, MLwiN, SUDAAN, LIMDEP, WinBUGS, LEM and LatentGOLD. We try to know a little bit about each of these, where the strengths and weaknesses lie. Different consultants in our group have differing levels of skills in these programs so clients may have to wait a bit to talk to someone with more specialized knowledge.
I need to mention one program that is invaluable to us: StatTransfer. In our line of work, we could not function without this program. It allows us to move data seamlessly from one stat package to another. I believe that DBMS/Copy functions in a similar manner.
Finally, I need to talk a bit about the cost of statistical software. It's expensive, even at the university discount rate. Some people think that because we are with UCLA, either we get everything for free or we have an unlimited budget for software. Not so. Well, not completely so. In the interest of full disclosure: we do get one courtesy copy of Stata and a number of licenses for SPSS in some years. For the big three and several of the specialized packages, we need licenses for all our consultants and for our lab machines. For other stat software we may have only a single copy (such as JMP), or even no copies, as in the case of Minitab. Every time someone suggests a new statistics program, we need to determine if our budget can accommodate it. There is a lot of interesting statistics software that we cannot justify purchasing.
Well, that's the story on what we use in our Stat Consulting Group. I am working on a new blog entry tentatively titled, "What's the Best Stat Package." Look for it in a week or so.
pbe
Wednesday, January 3, 2007
We Got Mail (Part 1)
Here are two emails that we have received with comments on various statistical packages. The first email is only a few days old, while the second one goes back to last April. Part 1 contains just the emails themselves. In Part 2 we will post our comments. It will take several days to write Part 2 so that we can think of clever things to say.
James Peluso of Nassau Community College writes:
I'm fortunate to have the following packages on my home laptop:
Stata 9
JMP 6
SAS 9 (from my primary job)
Minitab 14
Maple 9
I'm like a kid in a candy store. I bought STATA thanks to your analysis, and I'm very happy with it. So I thank you for that.
I'd like to mention a terrific capability of JMP (I've used JMP for over 10 years). If you call for histograms for all of the categorical variables at the same time, you'll get a pop-up window with all of the histograms (from left to right).
Now if you click on a bar on one of the histograms, the corresponding values in the histogram bars in all of the other variables get highlighted.
JMP calls this whole process "dynamic linking"... I've used this feature countless times. It allows an analyst to quickly SEE relationships between variables.
Additionally, if you then go back to the original dataset, the corresponding records will be highlighted. This will allow the user the option to quickly create a separate dataset, which only has those records.
I think that the graphics in SAS are much better than before, thanks to ODS. But you are correct: STATA graphics are terrific.
I've just started using Minitab in my Applied Statistics course... I really like it.
I wonder if you will be analyzing Minitab in a future update to your great article?
And in response to our old podcasts, Bob Solimeno of International Paper sent in this email:
I just recently got an ipod and found your podcast! I enjoyed listening to the 7 podcasts published, and understand you (collectively) teach statistics with the software discussed. However, as a scientist in the corporate world many of us use Minitab or even MATLAB which gives us much more than a dedicated statistics package typically does.
Would these be possibilities for future podcasts? I would be very keen to hear your reviews of the statistics capabilities of these in contrast to MPlus, SPSS, SAS, and Stata. I realize that as educators you need to focus on a few packages and that teaching the software supplants, to some degree, the statistics curriculum. So I'll understand if my requests asks for coverage that is too broad for your podcasts.