Discussion about stat consulting, stat packages, applied statistics and data analysis by statistical consultants of Academic Technology Services at UCLA. Please leave your comments or send us email at stattalk at ats.ucla.edu

Tuesday, April 10, 2007

We Feel Like A Million

We have just run the stats on our web server and are pleased to announce that we achieved over one million hits last month. In March, we had 1,003,103 total hits. That figure is the total number of pages hit at our website after we subtract out all of the hits generated in our lab and by our personal office machines.

This was not our first million-hit month; we also had one in March of 2006. Since then we have been hovering around 900,000 hits per month.

As of the 1st of April, the total cumulative hits since mid-1999 is just over 37 million. If you look at the results by stat package, the most popular pages are on SAS followed by Stata and then SPSS with the remaining specialty packages trailing behind.

In the last three months the SAS pages had 939,953 hits, Stata 793,129 and SPSS 518,087. Among the specialty packages, Mplus led the pack with 66,685 hits over the same period. The next closest was HLM with 18,473.

One of the more surprising findings was that our limited S-Plus and R pages managed 39,963 hits since January 1st.

All in all, not bad results for a website that consists entirely of geeky stat stuff.

pbe

Wednesday, April 4, 2007

Roadtrip

This past Saturday all the ATS Stat Consultants piled into the Statmobile and went on a roadtrip to the 26th Annual Workshop in Applied Statistics put on by the Southern California Chapter of the American Statistical Association. Fortunately, it was a short drive since the meeting was held on the UCLA campus in the Bradley International Center.

This year's speaker was Bengt Muthén talking on recent developments in statistical analysis with latent variables. The presentation went into how the idea of latent variables captures a wide variety of statistical concepts, including random effects, sources of variation in hierarchical data, frailties, missing data, finite mixtures, latent classes and clusters.

The presentation began a little after 9 am with cross-sectional models and finished around 4:30 pm somewhere in longitudinal models. The presentation moved along nicely thanks in part to Professor Muthén's subtle Swedish sense of humor. Although there was no hands-on component, the crowd got into the swing of things during the lively question and answer periods. Even with a whole day devoted to these topics, the material covered was only a fraction of what Professor Muthén usually covers in his five-day workshop.

I'm sure the conference would have run much later but many wanted to get home to see the UCLA-Florida basketball game. Too bad the game didn't go as well as the conference did.

pbe

Tuesday, March 20, 2007

Another Little Gem

Here's another great little free program, G*Power 3. It does many of the common ANOVA power analyses but also includes MANOVA and Hotelling's T-squared. Throw in multiple regression and you have a pretty useful package.

There are versions for both Windows and Mac. This in itself is pretty unusual.

The authors are Franz Faul, Edgar Erdfelder and Axel Buchner. They all appear to be psychologists, but that is no reason to avoid G*Power. I highly recommend it. The website is at Heinrich Heine University Düsseldorf and here is the link.

pbe

Tuesday, March 6, 2007

What The Heck(man) Is Going On?

We had a client come in with a question about a Heckman selection model that was giving her trouble. She had run it several weeks earlier and everything was working fine. She had some missing data among her predictors and decided to do a multiple imputation. After imputing the missing data she got the following error message:

     Dependent variable never censored due to selection.


She couldn't figure out what was wrong until I asked her if she had also imputed the response (dependent) variable. Instantly, she realized what the problem was. Since she had imputed all the variables in her dataset, there were no longer any missing values on her response variable and therefore no way to estimate a selection model.
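The logic behind the error can be sketched in a few lines of Python (a hypothetical helper, not any stat package's actual implementation): a selection model needs at least some censored (missing) outcomes to identify the selection equation, so a fully imputed response leaves nothing to model.

```python
def check_selection(y):
    """Build the selection indicator (1 = outcome observed) and fail
    if the dependent variable is never censored, mimicking the error."""
    observed = [0 if v is None else 1 for v in y]
    if all(observed):
        raise ValueError("Dependent variable never censored due to selection.")
    return observed

# Before imputation: some outcomes are missing, so selection is identified.
y_before = [2.3, None, 1.7, None, 3.1]
check_selection(y_before)      # returns [1, 0, 1, 0, 1]

# After imputing the response as well: no censored values remain,
# so check_selection(y_after) would raise the error above.
y_after = [2.3, 2.0, 1.7, 2.5, 3.1]
```

The fix, of course, is to impute only the predictors and leave the response variable's missingness intact.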

pbe

Friday, March 2, 2007

Parallel Universe

I am a Mac user. This creates problems because several stat packages run only on Windows. For the past couple of years the solution has been for me to Timbuktu into a campus computer to use Windows-based software. This solution is relatively slow and clunky.

That was then, this is now. I have installed Parallels Desktop on my relatively new MacBook Pro. Then, after a few hours installing Windows and a few more hours installing stat software (I had problems installing SAS), I'm all set to go. I now run Mplus and SAS on my Mac. And actually, they run pretty fast as long as I close down some of the larger programs on the Mac side. Overall, my throughput is much faster than when I was using Timbuktu. So, for now, I am a happy camper working in a parallel universe.

pbe

Saturday, February 24, 2007

The Parent Trap

Here is a question emailed to us by a client:
I have 4 variables from which I want to create a new variable.

v1 = do you live with mom? (0, 1)
v2 = do you live with dad?
v3 = do you live with stepmother?
v4 = do you live with stepfather?

I want to create a variable PCOMP and code
1 if live with mom and dad
2 if live with mom and stepfather
3 if live with stepmother and dad
4 if live with mom only
5 if live with dad only
6 Other

I'd also like to identify who is from a two-parent household.

How do I do this?

There are many ways this could be handled; here is the one I suggested:

pcomp = v1 + 10*v2 + 100*v3 + 1000*v4

recode pcomp 11=1 1001=2 110=3 1=4 10=5 else=6

two_parent = pcomp < 4

Note: This is in pseudo-code and can be adapted to any stat package that supports recode. Also, the client will have to decide how to deal with missing values as this approach will not work with missing values for v1-v4.
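As one concrete adaptation, here is the same logic as a small Python sketch (the function name pcomp and the sample rows are made up for illustration); the place-value trick makes each combination of indicators map to a unique code before the recode step.

```python
def pcomp(v1, v2, v3, v4):
    """Parent-composition category from the four 0/1 indicators:
    mom, dad, stepmother, stepfather (no missing values allowed)."""
    code = v1 + 10 * v2 + 100 * v3 + 1000 * v4   # unique code per combination
    recode = {11: 1, 1001: 2, 110: 3, 1: 4, 10: 5}
    return recode.get(code, 6)                   # everything else -> 6 (Other)

# Hypothetical rows of (v1, v2, v3, v4):
rows = [(1, 1, 0, 0), (1, 0, 0, 1), (0, 1, 1, 0), (1, 0, 0, 0)]
codes = [pcomp(*r) for r in rows]        # -> [1, 2, 3, 4]
two_parent = [c < 4 for c in codes]      # categories 1-3 are two-parent
```

The same place-value construction carries over directly to any stat package with a recode command, as noted above.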

pbe

Wednesday, February 21, 2007

Power Series

Today we had the second in our series of three power analysis presentations. Jason Cole (UCLA alum and outside consultant) talked in depth about effect size, the proper alpha level and multiplicity issues to an audience of 42. Within the next week, we hope to post the audio portion of Jason's presentation along with a PDF of his PowerPoint slides.

Last week, for Christine's presentation, we had almost sixty people in attendance. This created problems because the Visualization Portal is not meant to hold that many people. For this week's presentation we had to have people register on-line to limit the number attending.

We are looking forward to another big crowd for our third power analysis presentation in two weeks.

pbe
