Discussion about stat consulting, stat packages, applied statistics and data analysis by statistical consultants of Academic Technology Services at UCLA. Please leave your comments or send us email at stattalk at ats.ucla.edu

Friday, December 15, 2006

A Special Anova

I had a client who came in with an anova question in Stata. She wanted to run a model with one between-subject factor with six levels and three within-subject factors, each with two levels. The model has seven explicit error terms, each with 30 degrees of freedom, not including the residual error, which also has 30 df. There are so many error terms because each within-subject factor has its own error term, as does each interaction among the within-subject factors. In all, the model used 257 degrees of freedom, along with another 30 df for the residual error.
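The degrees-of-freedom bookkeeping above can be checked with a short sketch. It is only an illustration, not the client's actual setup: it assumes six subjects (blocks) in each of the six groups, which is inferred from the 30 df per error term, and it uses the hypothetical name D for the third within-subject factor.

```python
from itertools import combinations

# Assumptions (not stated in the post): 6 blocks per level of A, so that
# blocks|A has 6 * (6 - 1) = 30 df; "D" is a made-up name for the third
# within-subject factor.
levels_a = 6
blocks_per_group = 6
within = {"B": 2, "C": 2, "D": 2}

blocks_df = levels_a * (blocks_per_group - 1)  # 30

terms = {"A": levels_a - 1, "blocks|A": blocks_df}
for r in range(1, len(within) + 1):
    for combo in combinations(within, r):
        df = 1
        for factor in combo:
            df *= within[factor] - 1
        name = "*".join(combo)
        terms[name] = df                           # within-subject effect
        terms["A*" + name] = (levels_a - 1) * df   # its interaction with A
        terms[name + "*blocks|A"] = blocks_df * df # its error term

# The highest-order error term is what is left over as residual error.
residual_term = "B*C*D*blocks|A"
residual_df = terms[residual_term]
model_df = sum(df for name, df in terms.items() if name != residual_term)
explicit_error_terms = [
    name for name in terms if name.endswith("blocks|A") and name != residual_term
]

n_obs = levels_a * blocks_per_group * 2 ** len(within)  # 288 observations
total_df = n_obs - 1

print(len(explicit_error_terms), model_df, residual_df, total_df)
```

Under these assumptions the counts agree with the figures in the post: seven explicit error terms, 257 model df, and 30 residual df, which together make up the 287 total df.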

We started with regular Intercooled Stata and immediately got a "too many variables or values (matsize too small error)." I set the matsize to 800, which is the maximum for Intercooled Stata, and ran it again. This time it just said, "too many variables or values." Nothing about the matsize. I read in the manual that the limit for anova was eight variables in a single term. This model came close to that limit but didn't exceed it. Nothing we tried could get it to run.

I gave up on Stata for the moment and flipped the data into SAS using Stat/Transfer. Using proc glm, it ran perfectly the first time (this doesn't happen for me very often). The client was not familiar with SAS and really wanted output in Stata. I thought maybe it would run using Stata/SE. The SE in Stata/SE stands for special edition (I think) and allows a matsize up to 11,000. I got on one of the computers that had SE, set the memory to 100m and matsize to 1200 (a value I thought would be way too big), and ran it again. It worked fine, producing all the F-tests, the conservative p-values, the covariance matrix, everything.

So, what was going on? Why wouldn't it run in Intercooled Stata? A little bit of investigation revealed the answer. When I manually code a design matrix for anova, I use as many columns as there are degrees of freedom. Stata, however, uses an overparameterized design matrix: it uses as many columns as there are parameters. Consider one of the within-subjects effects, B*C, and its error term, B*C*blocks nested in A (written in Stata as B*C*blocks|A). I would give one df for B*C and 30 df for B*C*blocks|A. Stata, with its overparameterized model, allocates 4 columns for B*C and 196 for the error term. In total, the design needed a matsize of 1184 in order to run. So I really wasn't that far off with my wild guess of 1200.
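For a plain crossed term, the two ways of counting columns can be sketched as follows. This is a rough illustration with made-up function names, not Stata's internal code, and it does not try to reproduce the 196 columns for the nested error term or the 1184 total, which depend on how Stata lays out the nested terms.

```python
from math import prod

def df_columns(levels):
    """Columns when coding one column per degree of freedom."""
    return prod(k - 1 for k in levels)

def overparameterized_columns(levels):
    """Columns when coding one indicator per cell of the term."""
    return prod(levels)

bc = [2, 2]  # B and C each have two levels
a = [6]      # the between-subject factor A has six levels

print("B*C:", df_columns(bc), "vs", overparameterized_columns(bc))
print("A:  ", df_columns(a), "vs", overparameterized_columns(a))
```

For B*C this gives one column under df coding and four under the overparameterized coding, matching the figures above. The gap widens quickly once a term involves factors with many levels.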

This situation shows how quickly the matsize can grow for these mixed-effects models.

pbe
