Sample Size…what's the deal?
The Paradigm
You have deadlines. It's part of your life as the resident concept-test guru. You have a panel of 1000 consumers from your client base and you estimate a response rate of 50% when using your online survey software tool. That gives you somewhere around 500 respondents… Is this enough to make a sound product-launch decision? You call up your statistician friend and ask "How do I compute a sample size?" and he or she will invariably respond "it depends" with a gleeful look in the eye. Statisticians (I admit, I am one of them) love that "I know something that you don't" feeling.
After you describe your problem, your friend retreats to the statistician cave. A few minutes later, he or she returns with some astronomical figure, saying "At most, you will need (your current sample size * some large positive number)". You gawk in befuddled amazement and hang up the phone in lingering disgust.
Welcome to the never-ending battle between the financially feasible and the statistically significant.
The Problem
First, a few definitions:
- Sampling Error: the degree to which your responses don't carry the true opinions of the population of interest.
- Population: the group you're trying to learn about. In the concept-test case, this is the consumer group that you are trying to sell to. Remember that this is not the group that takes your survey, but the group upon which you would like to draw inferences.
Your statistician friend says "at most" because he or she knows no more about the sampling error you'll pick up in your respondent panel than you do. However, he or she does know that by making a few reasonable assumptions, a sample size can easily be computed.
The Basic Solution
First, one has to realize that sample sizes are computed from several values chosen before the concept test is ever sent out. These include your confidence level, the margin of error you want to allow for, and a variance (or standard deviation). For a very large population size (like any group of U.S. consumers larger than, say, a few thousand people) we will assume that the variance is equal to p(1-p), so the standard deviation is its square root. 'p' is a proportion, so p(1-p) will max out at 0.25 when p=0.5, which makes p=0.5 the most conservative choice. (If you want to see any further discussion of this concept, take a look in a basic statistics book.) We can look at several values for the margin of error and the population size. Note that these computations also rely on the assumption that the sample proportion is approximately normally distributed.
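If you would rather see the p=0.5 claim than take it on faith, here is a quick sketch in Python (my choice of language for illustration; the grid of proportions is an arbitrary assumption) that evaluates p(1-p) across a range of values.

```python
# Evaluate the binary-response variance p(1-p) on a grid of proportions.
# The grid itself is an arbitrary choice made for this illustration.
proportions = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
variances = {p: p * (1 - p) for p in proportions}

# The proportion with the largest variance is the "worst case" to plan for.
worst_case_p = max(variances, key=variances.get)

for p, v in variances.items():
    print(f"p = {p:.1f}  ->  p(1-p) = {v:.2f}")

print("largest variance at p =", worst_case_p)
```

The variance peaks at p = 0.5 and falls off symmetrically on either side, so planning for 0.5 should never leave you with too small a sample.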
So, because WordPress lacks the capacity to handle equations, here is the breakdown in pseudo-math.
n0 = ((Z-score)^2 * (standard deviation)^2) / ((margin of error)^2)
n = n0/(1 + n0/N) when n0 > 0.05*N (otherwise n = n0)
N: population size (estimate if you don't know)
n: sample size corrected for finite population size
n0: uncorrected sample size
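For anyone who would rather run the numbers than read pseudo-math, here is a minimal Python sketch of the computation. It is an illustration under stated assumptions, not a definitive implementation: the function name `sample_size` and its parameter names are mine, the Z-score defaults to 1.96 (the usual value for 95% confidence), the Z-score is squared along with the standard deviation, and the result is rounded up to a whole respondent.

```python
import math


def sample_size(margin_of_error, z_score=1.96, p=0.5, population=None):
    """Sample size for estimating a proportion (hypothetical helper).

    margin_of_error is a fraction (0.05 means +/- 5%). Leave population
    as None to treat the population as effectively infinite.
    """
    variance = p * (1 - p)  # (standard deviation)^2 for a proportion
    n0 = (z_score ** 2) * variance / margin_of_error ** 2
    n = n0
    # Apply the finite population correction only when the uncorrected
    # sample would exceed 5% of the population.
    if population is not None and n0 > 0.05 * population:
        n = n0 / (1 + n0 / population)
    # Round up to a whole respondent. The tiny tolerance is an assumption
    # of this sketch: it keeps floating-point noise from bumping an exact
    # integer up by one.
    return math.ceil(n - 1e-9)


print(sample_size(0.05))                   # very large population: 385
print(sample_size(0.05, population=1000))  # population of 1000: 278
```

Note that `population` here is the group you want to draw inferences about, not the panel you send the survey to.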
Sample Size Estimates: for noted sampling error at 95% confidence and p = 0.5
| Population Size | ± 1% | ± 2% | ± 3% | ± 4% | ± 5% | ± 6% | ± 7% | ± 8% | ± 9% | ± 10% |
| 100 | 99 | 97 | 92 | 86 | 80 | 73 | 67 | 61 | 55 | 49 |
| 200 | 196 | 185 | 169 | 151 | 132 | 115 | 99 | 86 | 75 | 65 |
| 500 | 476 | 414 | 341 | 273 | 218 | 174 | 141 | 116 | 96 | 81 |
| 1000 | 906 | 706 | 517 | 376 | 278 | 211 | 164 | 131 | 106 | 88 |
| 2000 | 1656 | 1092 | 696 | 462 | 323 | 236 | 179 | 140 | 112 | 97 |
| 5000 | 3289 | 1623 | 880 | 536 | 357 | 254 | 196 | 151 | 119 | 97 |
| 10000 | 4900 | 1937 | 965 | 567 | 385 | 267 | 196 | 151 | 119 | 97 |
| 20000 | 6489 | 2144 | 1014 | 601 | 385 | 267 | 196 | 151 | 119 | 97 |
| 50000 | 8057 | 2401 | 1068 | 601 | 385 | 267 | 196 | 151 | 119 | 97 |
| 100000 | 8763 | 2401 | 1068 | 601 | 385 | 267 | 196 | 151 | 119 | 97 |
| Big | 9604 | 2401 | 1068 | 601 | 385 | 267 | 196 | 151 | 119 | 97 |
The Advanced Solution
For more information about sample size, consult a good sampling text. Try Sampling: Design and Analysis by Sharon L. Lohr (1999). (Coincidentally, I used the same book to make sure my equations were accurate…count this as due acknowledgment.)