Discussion about stat consulting, stat packages, applied statistics and data analysis by statistical consultants of Academic Technology Services at UCLA. Please leave your comments or send us an email at stattalk at ats.ucla.edu

Tuesday, January 30, 2007

It's Always The Semicolon

If you've ever taught SAS, or even used SAS much, you will recall that the most common error is leaving out a semicolon. It's always the semicolon. I wish I had remembered this fact today when I was working with a client who was doing a power analysis.

This was not a difficult question at all. It involved finding the power of a test of a correlation for a given sample size. SAS's Proc Power is generally very straightforward and easy to use. I hadn't used it for correlations before, so I looked up the syntax in the online SAS documentation. There was a simple example that I modified for my client. The problem was, it didn't work. I tried a bunch of things, but it just wouldn't run.

Finally, I copied the example straight from the documentation, pasted it into the SAS editor and ran it. It didn't run either. Here is what the example looks like:

proc power;
onecorr dist=fisherz
null = 0.15
corr = 0.35
ntotal = 180
power = .
run;


Now remember, this is a straight cut-and-paste from SAS's own documentation. There was one hint, though: SAS kept claiming that Proc Power was still running. Once SAS reaches a run statement ended by a semicolon, the procedure should finish, so something was swallowing the run statement. That pointed to a missing semicolon. SAS statements end only at a semicolon, so without one after power = . SAS reads the word run as just more of the onecorr statement. Looking at the Proc Power example for multiple regression confirmed that a semicolon is needed before the run statement, at the end of the onecorr statement.

The last two lines should look like this:

power = . ;
run;

So, if you're having problems with SAS, just remember, it's always the semicolon. Well, almost always.
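For reference, here is the whole corrected example in one piece, followed by a rough hand check of the power it reports. The hand check is an addition of ours, not part of the SAS documentation: it uses the textbook Fisher z normal approximation, power ≈ Φ(δ − z) + Φ(−δ − z), where δ = (z1 − z0)·√(N − 3), z0 and z1 are the Fisher z transforms of the null and alternative correlations, and z is the upper α/2 normal quantile. It assumes a two-sided test at alpha = 0.05, which are Proc Power's defaults. Proc Power may apply small refinements of its own, so treat the hand check as a sanity check rather than a replacement, and don't expect the two numbers to match to every decimal. The data set name fisherz_power is just an illustrative choice.

```sas
/* Corrected documentation example: the semicolon after          */
/* "power = ." now ends the ONECORR statement, so RUN is seen.    */
proc power;
   onecorr dist=fisherz
      null   = 0.15
      corr   = 0.35
      ntotal = 180
      power  = . ;
run;

/* Rough hand check with the textbook Fisher z approximation.     */
/* Assumes a two-sided test at alpha = 0.05; Proc Power may use   */
/* refinements, so small differences from its output are normal.  */
data fisherz_power;
   r0    = 0.15;   /* null correlation                  */
   r1    = 0.35;   /* alternative correlation           */
   n     = 180;    /* total sample size                 */
   alpha = 0.05;   /* two-sided significance level      */
   z0    = 0.5 * log((1 + r0) / (1 - r0));  /* Fisher z of null value        */
   z1    = 0.5 * log((1 + r1) / (1 - r1));  /* Fisher z of alternative value */
   shift = (z1 - z0) * sqrt(n - 3);         /* standardized effect size      */
   zcrit = probit(1 - alpha / 2);           /* two-sided critical value      */
   /* probability of landing in either rejection region */
   power = probnorm(shift - zcrit) + probnorm(-shift - zcrit);
run;

proc print data=fisherz_power noobs;
   var r0 r1 n alpha power;
run;
```

If you want a one-sided test or a different alpha, change sides= and alpha= on the onecorr statement and make the matching change in the hand check so the two stay comparable.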

pbe

Tuesday, January 23, 2007

Minitab, My Bad.

Back in "We Got Mail (Part 2)," I stated that we do not have a copy of Minitab. Legendary stat consultant Michael Mitchell sent me an email to let me know that, yes indeed, we do have a copy of Minitab 14 on our server. I was wrong; my bad. So, I decided to sit down and try it out for myself.

It uses pull-down menus and dialog boxes, which remind me a bit of SPSS. The small examples that I tried ran very quickly. I didn't read the documentation (I'm a Mac user), but I explored the menus and tried as many things as I could. I did notice that Minitab has very strong quality control procedures, but that it is lacking in areas like Poisson and negative binomial regression, instrumental variable (2SLS) models, and mixed models with random slopes and intercepts.

I am not a very big fan of point-and-click interfaces for stat programs. I know that Minitab can be run using syntax or commands because it is possible to run it in batch mode. However, when you run it interactively, it is not readily apparent how to make it respond directly to commands.

So, while Minitab is a very nice program, I don't think that it would meet the needs of many of the researchers on campus. It does look like it would be a good program for students to use in their coursework. It isn't used much at UCLA because faculty teach using the stat software they use in their own research. I know it really isn't fair to judge a program based on such a quick examination, but Minitab doesn't seem to add anything to the "big" three (SAS, SPSS and Stata) that we already use.

pbe

Friday, January 19, 2007

Control Groups Gone Wild

This story goes back to my early days at UCLA many years ago. I had a student who was studying test anxiety. She proposed dividing students, who were preparing to take their final exams in freshman science courses, into three groups. Group 1 would receive training in test taking skills, while Group 2 would meet in small groups and talk about their fears and how they deal with test anxiety. Group 3 was a do-nothing control group.

My clever idea was to create a placebo control group, that is, a group that receives some kind of treatment that is completely unrelated to test anxiety. To this end, I made a recording on cassette tape of various beeps coming out of my microwave oven. The tape was five minutes long and was played to groups of students who were told that the frequencies of the beeps were designed to influence brain wave patterns to reduce anxiety over the forthcoming tests.

I won't keep you in suspense; you can probably guess what the results were. The placebo control group had the greatest reduction in anxiety. In fact, the placebo group was significantly lower than either the test taking skills group (Group 1) or the talk therapy group (Group 2). Groups 1 and 2 were not significantly different from each other, but both showed a significant reduction in test anxiety relative to the do-nothing control group.

My best guess is that we did too good a job of selling the placebo control to the subjects. All was not lost, however: we have been selling copies of the tape for over 25 years as an anxiety reduction treatment.

pbe

Thursday, January 18, 2007

Plus ça change, plus ça change

Yes, it's true: the more things change, the more things change. Last October our staffing changed from 4 1/2 stat consultants to 3 1/2 when longtime consultant Michael Mitchell left UCLA to do secret government work. Actually, he is doing research and data analysis for the Veterans Administration. At the end of this month we will lose our half-time biostat doctoral student, Brad McEvoy, who will be devoting his full attention to his dissertation and his research work with his advisor.

This reduces us to just three stat consultants for a while. A new full-time consultant should be joining the staff soon, possibly by the middle of February. And we have just posted a job announcement for a new half-time person. Hopefully, when all the new people are on board and up to speed, things will return to so-called normality.

pbe

Monday, January 8, 2007

We Got Mail (Part 2)

I thought that I would talk a bit today about which stat packages are used by the ATS Statistical Consulting Group and how we select that software.

The statistical software that we support is determined by which software is used by researchers here on the UCLA campus. Just to be clear, there is no central administrator or committee that decides which statistics software is used on campus. Each researcher, research group or department decides on its own what software to use.

Let's begin with general purpose statistical packages. The big three on our campus are SAS, SPSS and Stata. SAS and SPSS are legacies from the mainframe days. When I started at UCLA the top three were BMD, SAS and SPSS. BMD has fallen by the wayside, and Stata has come on strong in the last seven or eight years. In addition to research usage, many departments teach their methodology courses using one or more of these stat packages.

After the big three there is R, which has a smaller but very strong following. We do not get many people coming into consulting asking for help using R. I think this is because many R users are relatively advanced and do not need a lot of consulting assistance. Another reason is that our group does not have a lot of expertise in using R. Since we don't get many questions, we haven't developed the expertise needed to support R at the level it deserves.

After SAS, SPSS, Stata and R there are a number of statistical packages that have a small number of users. The numbers are too small for us to invest the time and effort needed to support the software. These packages include JMP, StatView, Statistica and Datadesk. If I had written this a week ago, I would have said that there aren't any Minitab users on campus, but one walked in last Thursday with a question. Fortunately, it was a more general statistical question and not something specific to Minitab. We don't have Minitab and we don't know how to use it.

As for JMP, I do have a copy and have played with it some. It does some things very nicely; in fact, it does some things easily that are difficult in other packages. My opinion, based primarily on a total lack of experience, is that it might not be the best package with which to manage and analyze large research databases. If JMP grows in popularity on campus and achieves a significant number of users, we would support it along with the other stat packages.

We also occasionally get a MATLAB user coming in with a question. Most of these users seem to be writing their own data analysis programs, so we can only provide limited help. I tend to lump MATLAB, Maple and Mathematica together as programming environments, as opposed to traditional data analysis programs, which is not to say that you can't do data analysis with them.

Then there are the special purpose statistical software packages, including Mplus, EQS, LISREL, HLM, MLwiN, SUDAAN, LIMDEP, WinBUGS, LEM and LatentGOLD. We try to know a little bit about each of these, including where their strengths and weaknesses lie. Different consultants in our group have differing levels of skill in these programs, so clients may have to wait a bit to talk to someone with more specialized knowledge.

I need to mention one program that is invaluable to us, StatTransfer. In our line of work we could not function without this program. It allows us to move data seamlessly from one stat package to another. I believe that DBMS/Copy functions in a similar manner.

Finally, I need to talk a bit about the cost of statistical software. It's expensive, even at the university discount rate. Some people think that because we are at UCLA, we either get everything for free or have an unlimited budget for software. Not so. Well, not completely so. In the interest of full disclosure: we do get one courtesy copy of Stata and, in some years, a number of licenses for SPSS. For the big three and several of the specialized packages we need licenses for all our consultants and for our lab machines. For other stat software, such as JMP, we may only have a single copy. Or even no copies, as in the case of Minitab. Every time someone suggests a new statistics program, we need to determine if our budget can accommodate it. There is a lot of interesting statistics software that we cannot justify purchasing.

Well, that's the story on what we use in our Stat Consulting Group. I am working on a new blog entry tentatively titled, "What's the Best Stat Package." Look for it in a week or so.

pbe

Wednesday, January 3, 2007

We Got Mail (Part 1)

Here are two emails that we have received with comments on various statistical packages. The first email is only a few days old, while the second one goes back to last April. Part 1 contains just the emails themselves. In Part 2 we will post our comments. It will take several days to write Part 2 so that we can think of clever things to say.

James Peluso of Nassau Community College writes:

I'm fortunate to have the following packages on my home laptop:

Stata 9
JMP 6
SAS 9 (from my primary job)
Minitab 14
Maple 9

I'm like a kid in a candy store. I bought STATA thanks to your analysis, and I'm very happy with it. So I thank you for that.

I'd like to mention a terrific capability of JMP (I've used JMP for over 10 years). If you call for histograms for all of the categorical variables at the same time, you'll get a pop-up window with all of the histograms (from left to right).

Now if you click on a bar on one of the histograms, the corresponding values in the histogram bars in all of the other variables get highlighted.

JMP calls this whole process "dynamic linking"... I've used this feature countless times. It allows an analyst to quickly SEE relationships between variables.

Additionally, if you then go back to the original dataset, the corresponding records will be highlighted. This will allow the user the option to quickly create a separate dataset, which only has those records.


I think that the graphics in SAS are much better than before, thanks to ODS. But you are correct: STATA graphics are terrific.

I've just started using Minitab in my Applied Statistics course... I really like it.

I wonder if you will be analyzing Minitab in a future update to your great article?


And in response to our old podcasts, Bob Solimeno of International Paper sent in this email:

I just recently got an iPod and found your podcast! I enjoyed listening to the 7 podcasts published, and understand you (collectively) teach statistics with the software discussed. However, as a scientist in the corporate world many of us use Minitab or even MATLAB which gives us much more than a dedicated statistics package typically does.

Would these be possibilities for future podcasts? I would be very keen to hear your reviews of the statistics capabilities of these in contrast to MPlus, SPSS, SAS, and Stata. I realize that as educators you need to focus on a few packages and that teaching the software supplants, to some degree, the statistics curriculum. So I'll understand if my request asks for coverage that is too broad for your podcasts.
