This is the first of two articles detailing a research experiment that Gallup undertook in 2023 to identify and address potential data quality issues in nonprobability opt-in web panels.
The first article focuses on comparing the quality of the data obtained from several opt-in panel providers. In the second article, we will describe the results of several approaches to dealing with careless responders, regardless of vendor.
WASHINGTON, D.C. -- As internet use has become ubiquitous in the U.S., sampling from online panels recruited via opt-in nonprobability data-collection procedures has become an increasingly common methodology for survey researchers because such panels offer an economical way to collect data quickly.
However, the quality of data from these panels can be compromised by careless responding, in which respondents fail to read questions or pay sufficient attention to them and provide random, dishonest or inattentive answers.
Little is known about how much data quality can vary from one opt-in panel provider to another. Each provider has its own methodology for recruiting and retaining panel respondents, and even when asked, most share only limited information about how they maintain data quality or treat their procedures as proprietary.
Pulling Back the Curtain on the Quality of Opt-In Data
To fill the knowledge gap about how much opt-in data quality varies, Gallup asked each of six opt-in panel providers to field an approximately nine-minute survey in English to 1,000 U.S. adults and requested that the sample adhere to general population demographic quotas (i.e., age, gender, region, education and ethnicity). The survey included 24 questions that could be benchmarked against reputable data sources that use gold-standard probability-based sampling methods. The questions spanned a wide range of topics, and participants were required to answer all questions in the survey. This yielded a total of 6,178 respondents across the six panels, which Gallup labeled “A” through “F” to ensure provider anonymity.
We did not ask the panels to apply any specific data-cleaning procedures, and we did not conduct any cleaning ourselves. This allowed us to assess data quality based on the raw data exactly as each provider delivered it.
Key Findings
To compare data quality across panels, we assessed three main factors: whether the unweighted data matched our requested demographic quotas; how closely the results of the benchmark items aligned with their established values (i.e., previously validated and widely accepted data points used as standards for comparison); and how prevalent careless responding was within each sample.
Some Opt-In Panels Are Better at Hitting Demographic Targets
First, we assessed how closely each panel’s sample aligned with our requested demographic quota groups. These results indicate how much weighting or other calibration the raw panel data would need to achieve our desired representation of the general population.
We found, using unweighted results, that some panels provided samples that were much closer to our requested quotas than others. For example, we requested that 51% of each panel’s sample identify as female, but only three of the six panels came close to that quota target. This pattern held across the other demographic variables: the same three panels consistently met most quota targets, while the three panels that missed the gender target also failed to meet most of the other quotas.
Once weights were applied, there was little difference between vendors[1] in reaching our quota demographics. Yet the unweighted quota results can serve as a quick proxy for data quality, because the three panels that missed the mark were also flagged more frequently on the other measures of poor data quality in this experiment.
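For readers who want to run a similar unweighted quota check on their own panel data, the comparison amounts to taking each demographic category’s share of the raw sample and its absolute deviation from the requested quota. The sketch below is a minimal illustration in Python; the DataFrame, column name and quota targets are hypothetical stand-ins, not the study’s data.

```python
import pandas as pd

# Hypothetical respondent-level data from one panel (not the study's actual data).
respondents = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Female", "Male", "Female"],
})

# Requested quota targets, expressed as percentages of the sample.
quota_targets = {"Female": 51.0, "Male": 49.0}

# Unweighted share of each category in the raw sample.
observed = respondents["gender"].value_counts(normalize=True).mul(100)

# Absolute percentage-point deviation from each quota target.
for category, target in quota_targets.items():
    share = observed.get(category, 0.0)
    print(f"{category}: observed {share:.1f}%, target {target:.1f}%, "
          f"off by {abs(share - target):.1f} points")
```

The same comparison extends to any quota variable by swapping in the relevant column and targets.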
Opt-In Panels vs. Established Benchmarks From Probability-Based Samples
Second, we evaluated data quality by comparing each panel’s weighted responses to 24 items with established benchmarks for those items from probability-based samples for the U.S. general population (see the PDF link at the end of this article for more information on these items). For example, we included an item from the U.S. Census Bureau's Current Population Survey that asks whether participants spent any time volunteering in the past 12 months, and an item from the U.S. Centers for Disease Control and Prevention’s National Health and Nutrition Examination Survey that asks whether participants have smoked at least 100 cigarettes in their entire life. If a panel’s results aligned with these benchmarks, that would indicate the opt-in sample is representative of the general population and that respondents are providing accurate, reliable data.
In terms of the average absolute percentage-point difference between each panel’s observed results and these benchmarks (i.e., taking the difference between each item’s result and its benchmark value, treating all differences as positive, and then averaging those differences across all 24 items), we found notable variation in data quality. Specifically, panels C and E provided data that were furthest from most benchmarks and, therefore, less trustworthy, while panel F provided data that were more closely aligned with most benchmarks and, therefore, more trustworthy. Similar patterns for panels C, E and F were observed across all three data-quality factors.
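The averaging step described above reduces to a short calculation: for each benchmarked item, take the absolute difference between the panel’s weighted estimate and the benchmark value, then average those differences. A minimal sketch, using made-up numbers for three illustrative items rather than the study’s actual estimates:

```python
# Hypothetical weighted panel estimates and benchmark values (percentages);
# these numbers are illustrative only, not results from the study.
items = {
    "volunteered_past_12_months": {"panel": 38.0, "benchmark": 30.0},
    "smoked_100_cigarettes":      {"panel": 44.0, "benchmark": 39.0},
    "has_health_insurance":       {"panel": 88.0, "benchmark": 92.0},
}

# Average absolute percentage-point difference across the benchmarked items.
abs_diffs = [abs(v["panel"] - v["benchmark"]) for v in items.values()]
mean_abs_diff = sum(abs_diffs) / len(abs_diffs)
print(f"Average absolute difference: {mean_abs_diff:.1f} percentage points")
```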
Detecting Careless Responders in Each Opt-In Panel
Third, we examined data quality by comparing the frequency of careless responding in each panel. This is important because high levels of careless responses can potentially undermine the reliability of the data, leading to biased results or incorrect conclusions. (Note that we will look into ways of accounting for these careless responders in Part 2 of this two-part article series.)
To flag careless-responding behaviors, we employed 20 unique detection methods assessed on a pass/fail basis and categorized them into two groups (see the PDF link at the end of this article for more information on these detection methods). First, we added items to the survey that were purposely designed to catch careless responding, such as an attention check (“To show that you are paying attention to these instructions, please only select …”) and a postsurvey self-assessment (“How honest were you when answering the questions in this survey?”).
The second group consisted of statistical approaches, such as detecting speeding (finishing the survey so quickly that the person is unlikely to have read or understood most instructions and questions) and straightlining (selecting the same answer repeatedly on a series of questions to which varying responses would be expected).
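To make the statistical checks concrete, the sketch below flags speeding and straightlining in the way they are commonly operationalized: completion time below some fraction of the median, and identical answers across a grid of items. The data, column names and cutoffs are hypothetical assumptions for illustration, not the definitions used in this study.

```python
import pandas as pd

# Hypothetical respondent-level data: completion time in seconds and answers
# to a five-item grid where varied responses would be expected.
df = pd.DataFrame({
    "duration_sec": [540, 150, 610, 480, 95],
    "grid_q1": [2, 4, 1, 3, 5],
    "grid_q2": [3, 4, 2, 3, 5],
    "grid_q3": [2, 4, 4, 1, 5],
    "grid_q4": [5, 4, 3, 2, 5],
    "grid_q5": [1, 4, 2, 4, 5],
})

# Speeding flag: finishing in less than half the median duration.
# (The 0.5 cutoff is a common rule of thumb, not the study's definition.)
speed_cutoff = 0.5 * df["duration_sec"].median()
df["flag_speeding"] = df["duration_sec"] < speed_cutoff

# Straightlining flag: giving the identical answer to every item in the grid.
grid_cols = ["grid_q1", "grid_q2", "grid_q3", "grid_q4", "grid_q5"]
df["flag_straightlining"] = df[grid_cols].nunique(axis=1) == 1

print(df[["flag_speeding", "flag_straightlining"]])
```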
To estimate the overall prevalence of careless responding, we assigned a “flag” to respondents each time they failed a detection method and found that the prevalence of careless responding (i.e., the average number of flags assigned per respondent) differed significantly by panel. Once again, panels C and E showed the most problematic results.
This pattern remained consistent when each of the 20 flags was analyzed individually, indicating that flagging rates vary more across panels than within them. In other words, while the prevalence of careless responding will depend on how one defines a “careless responder” (e.g., failing one or more flags, failing 10 or more flags, or specifically failing speeding and straightlining tests), our findings suggest that the specific flags used matter less than which panel provider is chosen, because the same panels (C and E) consistently produced poorer-quality data, regardless of how careless responding was detected.
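Counting flags and comparing their prevalence across panels can likewise be expressed in a few lines. The sketch below, again with hypothetical data, panels and column names rather than the study’s 20 actual checks, totals the flags each respondent failed, averages them by panel, and shows how different “careless responder” cutoffs change who gets classified.

```python
import pandas as pd

# Hypothetical pass/fail results (True = flagged) for three illustrative checks.
df = pd.DataFrame({
    "panel": ["A", "A", "C", "C", "E"],
    "flag_attention_check": [False, False, True, True, False],
    "flag_speeding":        [False, True, True, False, True],
    "flag_straightlining":  [False, False, True, True, True],
})

flag_cols = [c for c in df.columns if c.startswith("flag_")]
df["n_flags"] = df[flag_cols].sum(axis=1)

# Prevalence of careless responding: average number of flags per respondent, by panel.
print(df.groupby("panel")["n_flags"].mean())

# Classification depends on the chosen cutoff, e.g. one or more flags vs. two or more.
df["careless_1plus"] = df["n_flags"] >= 1
df["careless_2plus"] = df["n_flags"] >= 2
print(df[["panel", "n_flags", "careless_1plus", "careless_2plus"]])
```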
Conclusions
Given the increasing scrutiny of opt-in panels’ data quality, this study provides a more nuanced look into data-quality differences across panels. In the absence of a gold standard for defining poor versus acceptable data quality across all surveys, the most effective way to evaluate such quality is through a comprehensive approach that examines multiple indicators.
In terms of matching unweighted demographic quotas, approximating established weighted benchmarks and passing a wide variety of careless-responding detection methods, we consistently found that not all opt-in panels are created equal. Beyond these empirical indices of data quality, we also note that operational factors such as ease of communication, clarity of methodology and transparency of pricing differ significantly across panel vendors.
Rather than discount the expanding and often beneficial role that opt-in panels play in survey research (when implemented responsibly), we suggest that researchers carefully conduct their own due diligence -- not only when deciding whether to use an opt-in sample provider, but also when selecting which opt-in provider to use.
Download the supplemental materials (PDF), including the benchmark items and detection methods used in this experiment.
[1] For example, % Female (Quota 51.0%) weighted results: Panel A = 51.0%, Panel B = 50.8%, Panel C = 51.2%, Panel D = 50.7%, Panel E = 51.3%, Panel F = 49.8%.
To stay up to date with the latest Gallup News insights and updates, follow us on X.