In Google Analytics, I am able to get a list of all the terms users search for on the site. For a large site over the course of several weeks, this could be upwards of 10,000 terms. I want to create a report that categorizes the types of terms that users searched for, but going through 10,000 terms and categorizing them by hand is not feasible in a reasonable timeframe. So my instinct was to sample and report on that sample.
I want to make sure I am using the right formula to generate a margin of error for the sample and that I am properly reporting it.
What I want to do is pull a random sample of the terms used, put those terms into a spreadsheet, and code each one by hand into a category (for example products, personnel, jobs). In the end, each category will account for some percentage of the sampled terms.
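To make the workflow concrete, here is a minimal Python sketch of what I have in mind. It assumes the exported list has one entry per search (so repeated terms appear repeatedly), and the names `draw_sample` and `category_shares` are just placeholders I made up for illustration. The hand-coding step still happens in the spreadsheet between the two functions.

```python
import random
from collections import Counter

def draw_sample(terms, sample_size, seed=0):
    """Draw a simple random sample without replacement.

    `terms` is assumed to hold one entry per search; the fixed seed
    only makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(terms, sample_size)

def category_shares(coded_categories):
    """Turn a list of hand-assigned category labels into proportions."""
    counts = Counter(coded_categories)
    total = len(coded_categories)
    return {category: count / total for category, count in counts.items()}

# Made-up labels standing in for the hand-coded spreadsheet column.
shares = category_shares(["products", "products", "jobs", "personnel"])
print(shares)
```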
For a 95% confidence level, I was going to use:
Margin of error = (1.96 * 0.5) / sqrt((population_total_count - 1) * sample_search_total_count / (population_total_count - sample_search_total_count))
population_total_count would be the total number of searches in the population (the full list) and sample_search_total_count would be the number of searches in the random sample I pull.
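Written out as code, the formula above would look roughly like this. It is only a sketch of my proposed calculation: the function name `margin_of_error` is my own, and the population and sample sizes in the example (10,000 and 1,000) are illustrative, not real counts.

```python
import math

def margin_of_error(population_total_count, sample_search_total_count, z=1.96):
    """Margin of error from the formula above, using the worst-case
    proportion of 0.5 and the finite-population term."""
    N = population_total_count
    n = sample_search_total_count
    if n <= 0 or n > N:
        raise ValueError("sample size must be between 1 and the population size")
    if n == N:
        # Sampling the whole population leaves no sampling error.
        return 0.0
    return (z * 0.5) / math.sqrt((N - 1) * n / (N - n))

# Illustrative sizes only: 1,000 searches sampled out of 10,000.
moe = margin_of_error(10_000, 1_000)
print(f"{moe:.4f}")  # about 0.0294, i.e. roughly 3%
```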
If 25% of my sample was "products", and I had a margin of error of 3%, I would report that as "We expect 25% of searches were for products, plus or minus 3%, at a 95% confidence level." I would use the same "plus or minus 3% at a 95% confidence level" for any of the other categories in the same sample.
Am I using the right formula and discussing this correctly? Am I correct in using the same +/- Margin of Error for each of the categories?