I was checking out Data.gov and found the following data set via My Brother’s Keeper. I downloaded the Excel file and found the following graphs:

Striking, isn’t it?

…are the worst. OK, maybe not **the** worst but they’ve gotta be in the top 10 list of Things Most People Would Rather Not Do. That being said, I have learned something from every interview: a statistical procedure or software package that I was not familiar with (e.g., EpiInfo).

The biggest lesson so far? There are things that a statistician should be able to recite in his/her sleep. I was asked a very straightforward question: what are the assumptions for general linear models? It’s like the First Rule of Fight Club, and I stumbled through it. I chalk it up to nervousness and lack of sleep.

Anyway, the assumptions for general linear models (using OLS estimation) are:

Also, here’s an interesting take on the normality assumption of the response variable (Y):

The assumptions of normality and homogeneity of variance for linear models are not about Y, the dependent variable… The distributional assumptions for linear regression and ANOVA are for the distribution of Y|X, that’s Y given X. You have to take out the effects of all the Xs before you look at the distribution of Y… I’ve seen too many researchers drive themselves crazy trying to transform skewed Y distributions before they’ve even run the model…
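To convince myself of the Y versus Y|X point, here is a minimal simulation sketch. Everything in it is my own assumption for illustration: the exponential predictor, the true line y = 2 + 3x, the normal errors, and the hypothetical helper names `skewness` and `ols_fit`. It uses only the Python standard library and compares the skewness of the raw Y with the skewness of the residuals after fitting a one-predictor OLS line.

```python
import random
import statistics


def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def ols_fit(x, y):
    """One-predictor OLS fit; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Assumed data-generating process (illustration only):
# a right-skewed predictor with normally distributed errors.
rng = random.Random(42)
n = 5000
x = [rng.expovariate(1.0) for _ in range(n)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = ols_fit(x, y)

# Residuals are the part of Y left over once the effect of X is removed.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

skew_y = skewness(y)
skew_resid = skewness(residuals)

print(f"fitted line: y = {intercept:.2f} + {slope:.2f}x")
print(f"skewness of raw Y:     {skew_y:.2f}")
print(f"skewness of residuals: {skew_resid:.2f}")
```

In this setup the raw Y should come out clearly right-skewed (it inherits the skew of X), while the residuals should look roughly symmetric. That is the situation the quote describes: a skewed Y on its own is not evidence that the normality assumption is violated.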

I remember doing several transformations of Y for a take-home final. A few hours and several cups of coffee later, I had it: 1/(y^3). Was it all for naught? Oh well. Builds character.

Before my next interview, I’m going to go through my notes, write out the ultimate (or penultimate) statistical review sheet, and post it on my GitHub repository. Don’t make the same mistake I made! Be able to recite and explain the basics.