This work examined how researchers determine whether data can be trusted and whether it is fit for use in scientific inquiry. It began as an exploration of tooling and indicators. It evolved into a study of how confidence is constructed under uncertainty.
Across multiple studies with early- and mid-career researchers, data quality did not appear as a fixed property of a dataset. It emerged as an assessment made over time, shaped by context, reputation, collaboration, and accountability. Researchers were not looking for certainty; they were looking for orientation.
Participants described evaluating data through layered signals. Data sources mattered. Institutional reputation mattered. Diversity of represented subjects, volume of records, and documented quality assurance processes mattered. These factors were never considered in isolation. They were weighed together, often informally, as researchers tried to determine whether a dataset could responsibly support their questions.
What surfaced repeatedly was that trust was cumulative and fragile. Researchers sought ways to validate assumptions early, before committing to lengthy administrative processes or restructuring their workflows. Tools that allowed previewing populations, understanding data completeness, and assessing relevance prior to registration were consistently valued because they reduced risk.
Fitness-for-use, as a concept, revealed its limits in this context. Many participants were unfamiliar with the term. When introduced, they interpreted it pragmatically. Could this data be used now? Would it require extensive manipulation? Would it hold up under scrutiny from collaborators and reviewers? Fitness-for-use was less about data readiness and more about researcher responsibility.
Researchers compensated for uncertainty through social and cognitive labor. They consulted colleagues. They relied on citations. They favored known repositories over unfamiliar ones, even when unfamiliar options promised richer data. Collaboration often felt safer than tooling, because it preserved shared accountability for interpretation.
Attempts to summarize data quality into simplified indicators risked flattening this reality. Researchers distrusted systems that collapsed complexity too aggressively. When systems exposed raw data without guidance, researchers' cognitive load increased. Researchers needed to understand how quality judgments were made and how those judgments aligned with their own reasoning.
The work revealed a tension at the heart of data quality design. Institutions seek standardization. Researchers operate through situated judgment. Quality cannot be reduced to a score without erasing the interpretive work that makes science credible.
The contribution of this work was not a universal measure of data quality. It was the finding that data quality is maintained through judgment, collaboration, and transparency, not through compliance alone.
Design implications followed from this understanding. Tools needed to support exploration before commitment. Indicators needed to be customizable rather than prescriptive. Points of contact mattered as much as dashboards. Shareability was not a feature of convenience, but a way to sustain collective sensemaking across teams.
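To make these implications concrete, the sketch below shows one way an indicator could stay customizable and inspectable rather than prescriptive. It is a minimal illustration in Python, not a description of any tool built in the program, and every name in it is hypothetical: researchers supply their own weights over the layered signals described earlier, and the assessment returns a trace of how the score was computed, so the judgment remains visible instead of collapsing into a bare number.

```python
from dataclasses import dataclass, field

@dataclass
class QualitySignal:
    """One layered signal a researcher might weigh (source, completeness, QA docs)."""
    name: str
    value: float  # normalized 0..1, as supplied by the repository or preview tool
    note: str = ""  # provenance: a plain-language note on how this value was derived

@dataclass
class IndicatorProfile:
    """A researcher-defined weighting over signals: customizable, not prescriptive."""
    weights: dict[str, float] = field(default_factory=dict)

    def assess(self, signals: list[QualitySignal]) -> tuple[float, list[str]]:
        """Return a weighted score plus a trace explaining how it was computed."""
        total, weight_sum, trace = 0.0, 0.0, []
        for s in signals:
            w = self.weights.get(s.name, 0.0)
            if w == 0.0:
                continue  # signals the researcher chose not to weigh are skipped
            total += w * s.value
            weight_sum += w
            trace.append(f"{s.name}: value={s.value:.2f}, weight={w:.2f} ({s.note})")
        score = total / weight_sum if weight_sum else 0.0
        return score, trace

# Hypothetical usage: a researcher who weights documented QA over raw volume.
profile = IndicatorProfile(weights={"completeness": 0.5, "qa_documented": 0.4, "volume": 0.1})
signals = [
    QualitySignal("completeness", 0.82, "share of non-null fields in preview"),
    QualitySignal("qa_documented", 1.0, "repository publishes its QA process"),
    QualitySignal("volume", 0.3, "record count relative to cohort needs"),
]
score, trace = profile.assess(signals)
print(f"score={score:.2f}")
for line in trace:
    print(" -", line)
```

The design choice worth noting is the returned trace: exposing the weighted contributions alongside the score is one way to avoid the over-aggressive collapsing of complexity that participants distrusted, while still giving teams something shareable.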
This insight extends beyond medical research. In any system where decisions depend on data, trust is not established by metrics in isolation. It is established when people can situate information within their own reasoning, understand its limits, and defend its use to others.
Designing for data quality, in this context, meant designing for how people think, understand, and decide.
Context
This work was conducted within the All of Us Research Program at the NIH (Wondros side). It included a Rapid Iterative Study examining data quality perceptions and fitness-for-use, alongside qualitative interviews, concept exploration, and synthesis intended to inform future tooling and researcher support.