Avoid These Sample Bias Errors in Social Media Research
How to Promote Quality in Social Media Samples
Social media research, as it is currently conducted, is subject to non-participation bias. Several types of non-participation bias exist, and each can affect the reliability of research findings, often in ways that are hidden or unknown. Research has shown that participants who are difficult to reach, requiring multiple contact attempts, differ in significant ways from other respondents, including in age, gender, marital status, socioeconomic status, health status, and number of children.
The response rate is the extent to which the data at the close of a study include all the members of a sample. While this concept is clear in a structured survey or set of interviews, it is more ambiguous in social media research. It is, however, no less important in social media research than in other types of research. The response rate is calculated as the number of participants who complete surveys (or agree to be interviewed) divided by the total number of people in the original sampling effort. That total must include people who were not successfully contacted and people who refused to participate.
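A minimal sketch of the calculation just described. The counts are hypothetical, chosen only to show how much the rate is overstated if refusals and people never contacted are left out of the denominator.

```python
def response_rate(completed: int, refused: int, not_contacted: int) -> float:
    """Completed responses divided by everyone in the original sampling effort."""
    total_sampled = completed + refused + not_contacted
    if total_sampled == 0:
        raise ValueError("the sampling effort must include at least one person")
    return completed / total_sampled

# Hypothetical sampling effort of 1000 people.
completed, refused, not_contacted = 240, 160, 600

rate = response_rate(completed, refused, not_contacted)
# Incorrect version: ignores the 600 people who were never reached.
contacted_only = completed / (completed + refused)

print(f"response rate: {rate:.0%}")                  # response rate: 24%
print(f"contacted-only rate: {contacted_only:.0%}")  # contacted-only rate: 60%
```

Dropping the unreached members from the denominator makes the same data look more than twice as complete as it really is.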
The Generalization Issue
Regardless of how data is collected, the importance of a high response rate cannot be stressed enough. When the response rate of a sample is low, it is not realistically possible to generalize the results to a larger population. Sample bias increases as the response rate drops. In mail-based surveys, when return rates fall to 20 or 30 percent of the sample, the group of participants bears little resemblance to the overall sampled population. The same motivation that leads people to return a mail-in survey or agree to a telephone survey also drives people who engage in social media networks: a particular interest in the subject matter (or product or service, as the case may be).
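The mechanism described above can be sketched with a small expected-value model. Everything in it is an assumption for illustration: a hypothetical population split into "interested" and "uninterested" members, made-up opinion shares for each group, and response rates where interested members keep responding while uninterested members drop out.

```python
def respondent_estimate(share_interested, p_interested, p_uninterested,
                        rate_interested, rate_uninterested):
    """Return (overall response rate, expected estimate among respondents)."""
    weight_interested = share_interested * rate_interested
    weight_uninterested = (1 - share_interested) * rate_uninterested
    response_rate = weight_interested + weight_uninterested
    estimate = (weight_interested * p_interested
                + weight_uninterested * p_uninterested) / response_rate
    return response_rate, estimate

# Hypothetical population: 30% interested; 80% of interested and 40% of
# uninterested members hold the opinion being measured.
SHARE, P_INT, P_UNINT = 0.3, 0.8, 0.4
true_share = SHARE * P_INT + (1 - SHARE) * P_UNINT

# Interested members keep responding at 60%; uninterested members drop out.
for rate_uninterested in (0.6, 0.3, 0.1, 0.05):
    rate, est = respondent_estimate(SHARE, P_INT, P_UNINT, 0.6, rate_uninterested)
    print(f"response rate {rate:.0%}: estimate {est:.1%} vs. true {true_share:.1%}")
```

Under these assumed numbers the estimate matches the true 52% when both groups respond equally, but rises to about 69% once the overall response rate falls to 25%. In this model the bias comes from the gap between the groups' response rates, and a low overall rate is where such gaps have the most room to distort the result.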
Smaller samples have larger sampling error than larger samples. Sample data provide an estimate of the attributes of the larger population, and each sample drawn from a sampling frame provides a separate estimate. In theory, each sample could produce a different pattern of responses for each question asked. With enough samples drawn from the sampling frame, however, the estimates would cluster around the true pattern of the larger population.
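A small simulation can make the repeated-sampling idea concrete. It is a sketch under stated assumptions: each respondent's answer is treated as an independent yes/no draw from a very large population, and the true share of 52% is hypothetical. The code draws many samples of two sizes and reports where the estimates center and how widely they spread.

```python
import random
import statistics

def sample_estimates(true_share, sample_size, num_samples, seed=0):
    """Draw repeated samples and return the estimated share from each one."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(num_samples):
        # Each respondent independently answers "yes" with probability true_share.
        hits = sum(rng.random() < true_share for _ in range(sample_size))
        estimates.append(hits / sample_size)
    return estimates

TRUE_SHARE = 0.52  # hypothetical population value

for n in (100, 1000):
    est = sample_estimates(TRUE_SHARE, n, 2000)
    print(f"n={n}: mean estimate {statistics.mean(est):.3f}, "
          f"spread (SD) {statistics.stdev(est):.3f}")
```

Both sets of estimates center on the true value, but the spread for samples of 100 is roughly three times the spread for samples of 1000, which is the larger sampling error of smaller samples.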
Margin of Error
Sampling error describes the precision of an estimate from any one of the samples taken from the larger population. It is expressed as a margin of error associated with a confidence level, which is a statistical measure. In a presidential preference poll, for example, the report may show that the incumbent is favored by 64% of the voters, with a margin of error of plus-or-minus 3 points at a 95% confidence level. In other words, if the poll were repeated with 100 different samples of voters, in about 95 of them the estimate would fall within 3 points of the true level of support among all voters. That is, the result is read as 64% plus or minus 3 points, a range of 61% to 67%.
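The poll figures above can be reproduced with the standard normal-approximation formula for a proportion from a simple random sample, MOE = z * sqrt(p(1 - p) / n). The poll's sample size is not stated, so n = 1000 is an assumption; it happens to give the reported margin of about 3 points.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / n)

p_hat = 0.64  # share favoring the incumbent
n = 1000      # assumed sample size (not given in the example)

moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print(f"{p_hat:.0%} +/- {moe * 100:.1f} points -> {low:.1%} to {high:.1%}")
# 64% +/- 3.0 points -> 61.0% to 67.0%
```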
Decisions About Sample Size
The margin of error associated with sampling goes down as sample size goes up, but with diminishing returns: because it shrinks with the square root of the sample size, quadrupling a sample only halves the margin of error. Once a sample reaches 1000 to 2000 respondents, the margin of error is small enough that larger samples are rarely a cost-effective choice. When subgroups of the larger population are to be analyzed, larger samples may be justified, because the margin of error for each subgroup depends on the number of people in it. For example, given a sample of 1000 members of a social media network with a margin of error between 1 and 3 percentage points at a 95% confidence level, analysis of a subgroup of that sample (say, about 100 stay-at-home moms) would have a larger margin of error of about 4 to 10 points.
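The subgroup example can be checked with the same normal-approximation formula. The assumption here is that the proportions being measured range from about 5% to 50%; 50% is the worst case, and the 5% lower end is a hypothetical choice. The second loop shows the diminishing returns of larger samples.

```python
from math import sqrt
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96

def margin_of_error(p, n, z=Z_95):
    """Normal-approximation margin of error for a sample proportion."""
    return z * sqrt(p * (1 - p) / n)

def moe_range_points(n, p_values=(0.05, 0.5)):
    """Smallest and largest margin of error, in percentage points, over p_values."""
    moes = [100 * margin_of_error(p, n) for p in p_values]
    return min(moes), max(moes)

for label, n in (("full sample", 1000), ("subgroup", 100)):
    low, high = moe_range_points(n)
    print(f"{label} (n={n}): about {low:.1f} to {high:.1f} points")

# Diminishing returns: each fourfold increase in n only halves the margin of error.
for n in (250, 1000, 4000):
    print(f"n={n}: {100 * margin_of_error(0.5, n):.1f} points")
```

With these assumptions the full sample comes out at roughly 1.4 to 3.1 points and the 100-person subgroup at roughly 4.3 to 9.8 points, in line with the ranges in the example.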
Gauging Sample Sufficiency
Samples are typically evaluated according to the selection procedures used rather than their final size or composition. This matters because, in most situations, it is impossible to measure accurately how representative a sample is of the larger population. Statistical selection procedures are used because they permit convenient and reliable estimates. Establishing a reasonable confidence level and margin of error at the outset lets researchers focus on factors such as the response rate and the adequacy of the sampling frame.
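Setting the confidence level and margin of error first turns sample planning into arithmetic. The sketch below inverts the normal-approximation formula, n = z^2 p(1 - p) / E^2 with the conservative p = 0.5, and then divides by an expected response rate to estimate how many people must be contacted. The 25% response rate is a hypothetical planning figure. Contacting more people only raises the number of completes; it does nothing to correct bias if non-respondents differ from respondents.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p=0.5):
    """Completed responses needed for a target margin of error (proportion scale)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

def contacts_needed(completes, expected_response_rate):
    """People to contact so that the expected number of completes is reached."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    return ceil(completes / expected_response_rate)

n = required_sample_size(0.03)       # +/- 3 points at 95% confidence
contacts = contacts_needed(n, 0.25)  # hypothetical 25% expected response rate
print(n, contacts)                   # 1068 4272
```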