In survey research, statistics are applied to random samples. These statistics express the degree to which a researcher can be confident that the study sample is reasonably valid and reliable.
A confidence interval is the margin of error around a survey result: the range within which the answer would be expected to fall if the researcher could put the same question to every member of the target population. For example, if the researcher used a confidence interval of 4 and 60% of the participants in the survey sample answered, "Would recommend to friends," they could be reasonably confident that between 56% and 64% of the members of the entire target population would also say "Would recommend to friends" when asked the same question. The confidence interval, in this case, is +/- 4 percentage points.
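As a rough illustration of where a figure like +/- 4 can come from, the sketch below uses the standard normal-approximation formula for the margin of error of a proportion. The sample size of 576 and the z value of 1.96 (for a 95% confidence level) are assumptions chosen for the example, not values from the survey described above.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    p: share of the sample giving the answer (0..1)
    n: sample size
    z: critical value; 1.96 corresponds to a 95% confidence level
    """
    return z * math.sqrt(p * (1 - p) / n)

p = 0.60   # 60% answered "Would recommend to friends"
n = 576    # hypothetical sample size, chosen for illustration
moe = margin_of_error(p, n)
low, high = p - moe, p + moe
print(f"{p:.0%} +/- {moe:.1%} -> {low:.1%} to {high:.1%}")
```

Under these assumptions the interval comes out at roughly 56% to 64%, matching the example in the text.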
A confidence level is an expression of how confident a researcher can be in the data obtained from a sample. Confidence levels are expressed as a percentage and indicate how often the true percentage for the target population would lie within the confidence interval if the survey were repeated with new samples. The most commonly used confidence level is 95%. A related concept is statistical significance.
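One way to see what a 95% confidence level means is to simulate many surveys of a population whose true answer is known and count how often the computed interval contains it. The sketch below is a minimal simulation under assumed values (a true rate of 60%, samples of 576, 2,000 repeated surveys, a fixed random seed); none of these numbers come from the text.

```python
import math
import random

def coverage(true_p=0.60, n=576, trials=2000, z=1.96, seed=0):
    """Share of simulated surveys whose interval contains the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one simulated sample of n yes/no answers.
        yes = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes / n
        moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - moe <= true_p <= p_hat + moe:
            hits += 1
    return hits / trials

print(f"share of intervals containing the true value: {coverage():.1%}")
```

The printed share should land close to 95%, though the exact figure depends on the seed and the number of trials.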
A researcher's confidence that their sample is truly representative of the target population, together with an awareness of the limitations of the study design and implementation, rests largely on three important variables: sample size, frequency of response, and population size. Researchers have long agreed that these variables must be carefully considered during the research planning phase.
Survey Sample Size
Generally speaking, larger samples deliver data that more closely reflect the target population. A wide confidence interval indicates less confidence in the data because there is a greater margin for error; it is like hedging your bets. Although there is a relationship between the confidence interval and sample size, it is not a linear relationship. A researcher cannot cut a confidence interval in half by doubling the sample size. Because the interval shrinks with the square root of the sample size, halving it takes roughly four times as many respondents.
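The non-linear relationship can be checked with a few lines of arithmetic. This sketch assumes the usual normal-approximation formula, a 50/50 response split, and a starting sample of 500; the sample sizes are illustrative only.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

base = margin_of_error(0.5, 500)
for n in (500, 1000, 2000):
    moe = margin_of_error(0.5, n)
    # ratio shows how the margin compares with the n=500 baseline
    print(f"n={n:5d}  margin={moe:.2%}  ratio to n=500: {moe / base:.2f}")
```

Doubling the sample to 1,000 leaves the margin at about 71% of its original width; only at 2,000, four times the original sample, does it fall to half.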
The Frequency of Response
The accuracy with which sample data reflect the target population also depends on the percentage of respondents who gave a particular answer or responded in a specific way. The larger the share of respondents who gave a particular answer, say "Very happy," the surer the researcher can be of that response. Variability is greatest in the middle of the range. That is, if 50% of the sample gave a particular answer, the percentage in the target population is more likely to differ from that 50% figure than it would be if the split were lopsided, so the confidence interval is at its widest there.
Be Aware of Outliers
It is good to remember that outliers (data on the far ends, or tails, of the normal curve) tend to occur at about the same rate in the population as they do in a sample; there is less variability here because there is a lower frequency. For this reason, it is easier to be confident about the frequency of extreme answers.
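The effect of the response percentage on the margin of error can be sketched as follows. The sample size of 400 is a hypothetical value, and the normal approximation used here becomes rough for very extreme percentages, so the last row should be read as indicative only.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

n = 400  # hypothetical sample size
for p in (0.50, 0.75, 0.90, 0.99):
    # p * (1 - p) peaks at p = 0.5, so the margin is widest there
    print(f"{p:.0%} gave the answer -> margin +/- {margin_of_error(p, n):.1%}")
```

The margin is widest at a 50/50 split and narrows steadily as the split becomes more lopsided, which is why worst-case planning usually assumes 50%.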
Population size is not an important factor in sample size unless a researcher is working with a population that is very small and known to them (e.g., small enough so that all the members of the population can be identified by the researcher).
Creative Research Systems points out that:
The mathematics of probability proves the size of the population is irrelevant unless the size of the sample exceeds a few percent of the total population you are examining. This means that a sample of 500 people is equally useful in examining the opinions of a state of 15,000,000 as it would be a city of 100,000.
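The claim in the quotation can be checked with the finite population correction, a standard adjustment that shrinks the margin of error when the sample is a sizeable share of the population. The sketch below assumes a 50/50 split and a 95% confidence level; the population of 2,000 is added here only as a contrast case and does not come from the quotation.

```python
import math

def margin_with_fpc(p, n, population, z=1.96):
    """Margin of error with the finite population correction applied."""
    base = z * math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return base * fpc

n = 500
for population in (15_000_000, 100_000, 2_000):
    moe = margin_with_fpc(0.5, n, population)
    print(f"population={population:>10,}  sample={n}  margin +/- {moe:.2%}")
```

For the state and the city, the margins are nearly identical; the correction only becomes noticeable when the 500 respondents are a large fraction of the population, as in the 2,000-person case.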
Generating a representative sample can be a costly and time-consuming process. Researchers always face a trade-off between the confidence level they would like to obtain—or the degree of accuracy they need to achieve—and the confidence level they can afford.
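The trade-off can be made concrete by turning the margin-of-error formula around to estimate how many respondents a given level of precision requires. This is a minimal sketch assuming a large population, the worst-case 50/50 split, and the normal approximation; the function name `required_sample_size` is introduced here for illustration.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, level=0.95, p=0.5):
    """Smallest n whose normal-approximation margin of error is <= margin."""
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

for margin in (0.04, 0.03, 0.02):
    n95 = required_sample_size(margin, level=0.95)
    n99 = required_sample_size(margin, level=0.99)
    print(f"+/- {margin:.0%}: {n95} respondents at 95%, {n99} at 99%")
```

Tightening the margin from 4 points to 2 points roughly quadruples the required sample (from 601 to 2,401 at the 95% level under these assumptions), which is where the cost pressure comes from.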
Sample Size in Qualitative Survey Research
Qualitative research is exploratory or descriptive in nature and does not focus on numbers or measurements. But concerns about sampling error in qualitative survey research are still valid. As a general rule, if a sample is representative of the target universe, the themes or patterns that emerge from the research will reflect the larger population that is of interest to the researcher. If the sample is both representative and covers a large percentage of the target population, then confidence in the accuracy of data derived from that sample will tend to be high.
Determining Sample Size in Survey Research
Different rules apply to quantitative research and qualitative research when it comes to determining sample size. Generally speaking, to be confident in the data generated by qualitative survey research, a researcher needs to have a clear idea of how the data will be used. The data may form the basis for a descriptive narrative (as in a case study or some ethnographic research) or it may serve in an exploratory fashion to identify relevant variables that might later be tested for correlations in a quantitative study.
Sample Size in Quantitative Survey Research
Quantitative research often involves comparisons between market segments or subgroups of a target market. Because quantitative research is numbers-driven, determining a comfortable sample size can be fairly easy. For each important group or segment in a study, a researcher would hope to survey 100 participants. This number is a recommendation and not an absolute. A market researcher will consider a number of relevant variables to determine the size of a sample in survey research.
When conducting survey market research, the goal is to infer from the sample what is likely to be true of the target universe. A sample provides data that can be observed or known. From this observed or known data, a researcher can estimate the degree to which an unknown value or parameter can be found in a target population.
Quantitative survey research is based on the notion of a normal, symmetrical curve that represents, in the mind of the researcher, the target universe: the population whose parameters the researcher must estimate rather than actually know. A representative sample allows a researcher to calculate, from the sample data, an estimated range of values that is likely to include the unknown value or parameter of interest. This estimated range of values represents an area on the normal curve and is generally expressed as a decimal or a percentage.
The Normal Curve and Probability
A normal, symmetrical curve is a visual expression of probability. Let's look at a simple illustration: An activity at a science center, a display known as a Galton box, lets a large number of balls fall between two acrylic sheets, one at a time. Every ball falls through the same opening at the top of the display and then drops between any of the vertical, parallel dividers that separate the stacks of balls once they come to rest. After several hours, the balls have formed the shape of a normal curve.
The curve changes a little bit as each newly introduced ball hits the mass of balls that arrived first. But overall, the symmetrical curve is evident and it occurred naturally, independent of any action by the science center observers or staff. The curved shape that the balls form reflects the probability that most of the balls will fall into the center and stay there. Fewer balls will make it into the far ends of the curve, but some inevitably will.
Each run of the display is similar to drawing a sample. Each time the display is emptied out and the balls are once again allowed to fall into the Galton box, the configuration of the stacks of balls will be only a little bit different. But over time, the shape of the curve will not change much and the pattern will hold true.
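The behavior of the display can be imitated with a short simulation. This sketch assumes an idealized board in which each ball passes 12 rows of pegs and bounces left or right with equal probability at each one; the number of rows, the number of balls, and the random seeds are illustrative choices, not details of the science center exhibit.

```python
import random
from collections import Counter

def galton_board(balls=2000, rows=12, seed=1):
    """Drop balls through rows of pegs and count how many land in each slot."""
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(balls):
        # The final slot is the number of rightward bounces out of `rows`.
        position = sum(rng.random() < 0.5 for _ in range(rows))
        bins[position] += 1
    return [bins[i] for i in range(rows + 1)]

counts = galton_board()
for slot, count in enumerate(counts):
    # One '#' per 10 balls gives a rough text histogram.
    print(f"{slot:2d} {'#' * (count // 10)}")
```

Running the function again with a different `seed` corresponds to emptying the display and starting over: the individual stack heights shift a little, but the tall center and thin tails remain.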