Category Archives: Data quality

Use of Data from Electronic Health Records to Customize Medical Treatments

In a recent segment on NPR’s Morning Edition, commentators discuss the potential of using electronic health records to customize medical treatments.

Dr. Harlan Krumholz, a professor of medicine at Yale University, says comparing data in electronic health records with genomic information holds great promise for customizing individual treatments, but he warns that data collected in the medical record are not of research quality. While researchers are making a positive start with initiatives such as the Precision Medicine Initiative (re-branded as the All of Us research program), medicine still has a long way to go to fully realize the potential of these data.

Dr. Krumholz will present “What’s Next: People-Powered Knowledge Generation from Digital Health Data” at an upcoming NIH Collaboratory Grand Rounds on January 13 from 1:00 to 2:00 p.m. ET. Join the meeting here.

The full article and audio can be found on NPR Shots, an online channel for health stories from the NPR Science Desk.

New NIH Collaboratory resource for the transparent reporting of PCTs


The NIH Collaboratory has developed a tool to assist authors in the complete and transparent reporting of their pragmatic clinical trials (PCTs). In the PCT Reporting Template, users will find descriptions of reporting elements based on CONSORT guidance as well as on expertise from the NIH Collaboratory Demonstration Projects and Core working groups.

Particularly relevant to PCTs are recommendations on how to report the use of data from electronic health records. Other elements of importance to PCTs include reporting wider stakeholder engagement, monitoring for unanticipated changes in study arms, and specific approaches to human subjects protection. The template contains numerous links to online material in the Living Textbook, CONSORT, and the Pragmatic–Explanatory Continuum Indicator Summary tool known as PRECIS-2.

This resource is intended to assist authors in developing primary journal publications. It will be updated over time as new best practices emerge for the transparent reporting of PCTs.

Download the PCT Reporting Template.

Please note: this document opens as an Adobe PDF. If you do not have software that can open a PDF, click here to download a free version of Adobe Acrobat Reader.


This work was supported by a cooperative agreement (U54 AT007748) from the NIH Common Fund for the NIH Health Care Systems Research Collaboratory. The views presented in this document are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.


Originally published on September 1, 2016.


  • Questions or comments can be submitted via email. Please add “Living Textbook” to the Subject line of the email.

FDA releases draft guidance for using electronic health records in clinical research

The FDA has released a Draft Guidance for Industry to facilitate the use of data from electronic health records (EHRs) in clinical investigations. The draft guidance provides recommendations on how to use EHRs as a source of data for research, ensure data quality and integrity, and satisfy the FDA’s inspection, recordkeeping, and record retention requirements. An additional goal of the draft guidance is to promote interoperability, or the ability to exchange and use information between EHR systems that capture information during patient care visits and electronic data capture (EDC) systems that support clinical investigations. Sponsors of clinical research must also consider whether there are any reasonably foreseeable risks involved in using the EHR for research (such as an increased risk of data breaches) that should be disclosed in the informed consent document.

Read the full draft guidance here.

New Living Textbook Chapter on Acquiring and Using Electronic Health Record Data for Research

Meredith Nahm Zozus and colleagues from the NIH Collaboratory’s Phenotypes, Data Standards, and Data Quality Core have published a new Living Textbook chapter about key considerations for secondary use of electronic health record (EHR) data for clinical research.

In contrast to traditional randomized controlled clinical trials where data are prospectively collected, many pragmatic clinical trials use data that were primarily collected for clinical purposes and are secondarily used for research. The chapter describes the steps a prospective researcher will take to acquire and use EHR data:

  • Gain permission to use the data. When a prospective researcher wishes to use data, a data use agreement (DUA) is usually required that describes the purpose of the research and the proposed use of the data. This section also describes use of de-identified data and limited data sets.
  • Understand fundamental differences in context. Data collected in routine care settings reflect the standard procedures of the healthcare facility where an individual receives care and are not collected in a standardized, structured manner.
  • Assess the availability of health record data. Few assumptions can be made about what is available from an organization’s healthcare records; up-front, detailed discussions about data element collection over time at each facility are required.
  • Understand the available data. A secondary data user must understand both the data meaning and the data quality; both can vary greatly across organizations and affect a study’s ability to support research conclusions.
  • Identify populations and outcomes of interest. Because healthcare facilities are obligated to provide only the minimum necessary data to answer a research question, investigators must identify the needed patients and data elements with specificity and sensitivity to answer the research question given the available data.
  • Consider record linkage. Studies using data from multiple records and sources will require matching data to ensure they refer to the correct patient.
  • Manage the data. The investigator is responsible for receiving, managing, and processing data and must demonstrate that the data are reproducible and support research conclusions.
  • Archive and share the data after the study. Data may be archived and shared to ensure reproducibility, enable auditing for quality assurance and regulatory compliance, or to answer other questions about the research.
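As a rough illustration of what identifying patients "with specificity and sensitivity" can mean in practice, the sketch below compares the patients flagged by an EHR query with a reference standard such as manual chart review. The function name, the example flags, and the choice of chart review as the reference are assumptions made for illustration and do not come from the chapter.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(
    predicted: Iterable[bool], reference: Iterable[bool]
) -> Tuple[float, float]:
    """Compare an EHR-based case definition with a reference standard.

    predicted: True where the EHR query flags the patient as a case.
    reference: True where the reference standard (for example, chart
               review) confirms the patient is a case.
    Returns (sensitivity, specificity).
    """
    tp = fp = tn = fn = 0
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            tp += 1  # flagged and truly a case
        elif pred and not ref:
            fp += 1  # flagged but not a case
        elif not pred and ref:
            fn += 1  # missed case
        else:
            tn += 1  # correctly not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference standard needs both cases and non-cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: eight patients, EHR query result vs. chart review.
ehr_flag = [True, True, True, False, False, False, True, False]
chart_review = [True, True, False, False, False, True, True, False]

sens, spec = sensitivity_specificity(ehr_flag, chart_review)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Low sensitivity would suggest the query misses eligible patients; low specificity would suggest it pulls in patients who do not belong in the study population.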

ClinicalTrials.gov Analysis Dataset Available from CTTI

As part of a project that examined the degree to which sponsors of clinical research are complying with federal requirements for the reporting of clinical trial results, the Clinical Trials Transformation Initiative (CTTI) and the authors of the study are making the primary dataset used in the analysis available to the public. The full analysis dataset, study variables, and data definitions are available as Excel worksheets from the CTTI website and on the Living Textbook’s Tools for Research page.


Collaboratory Phenotypes, Data Standards, and Data Quality Core Releases Data Quality Assessment White Paper


The NIH Collaboratory’s Phenotypes, Data Standards, and Data Quality Core has released a new white paper on data quality assessment in the setting of pragmatic research. The white paper, titled Assessing Data Quality for Healthcare Systems Data Used in Clinical Research (V1.0), provides guidance, based on the best available evidence and practice, for assessing data quality in pragmatic clinical trials (PCTs) conducted through the Collaboratory. Topics covered include an overview of data quality issues in clinical research settings, data quality assessment dimensions (completeness, accuracy, and consistency), and a series of recommendations for assessing data quality. The appendices include a set of data quality definitions and review criteria, as well as a data quality assessment plan inventory.
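To make two of the named dimensions concrete, here is a minimal sketch of how completeness and consistency might be computed over a small extract. The working definitions used here (share of non-missing values, share of records passing a plausibility rule), the field names, and the records are simplifying assumptions for illustration; they are not the white paper's definitions. Accuracy is left out because it requires comparison against a trusted reference source.

```python
from datetime import date
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


def completeness(records: List[Record], field: str) -> float:
    """Fraction of records with a non-missing value for `field`."""
    if not records:
        raise ValueError("no records")
    present = sum(1 for r in records if r.get(field) not in (None, ""))
    return present / len(records)


def consistency(
    records: List[Record], rule: Callable[[Record], Optional[bool]]
) -> float:
    """Fraction of evaluable records that satisfy `rule`.

    The rule returns None when it cannot be evaluated for a record
    (for example, a required field is missing); those records are skipped.
    """
    evaluated = [ok for ok in (rule(r) for r in records) if ok is not None]
    if not evaluated:
        raise ValueError("rule could not be evaluated on any record")
    return sum(evaluated) / len(evaluated)


def discharge_not_before_admission(r: Record) -> Optional[bool]:
    """Plausibility rule: discharge date must not precede admission date."""
    admit, discharge = r.get("admit_date"), r.get("discharge_date")
    if admit is None or discharge is None:
        return None
    return discharge >= admit


# Hypothetical extract with made-up patients and field names.
records = [
    {"patient_id": "A", "admit_date": date(2016, 1, 5), "discharge_date": date(2016, 1, 9)},
    {"patient_id": "B", "admit_date": date(2016, 2, 1), "discharge_date": None},
    {"patient_id": "C", "admit_date": date(2016, 3, 10), "discharge_date": date(2016, 3, 8)},
    {"patient_id": "D", "admit_date": date(2016, 4, 2), "discharge_date": date(2016, 4, 2)},
]

discharge_completeness = completeness(records, "discharge_date")
date_consistency = consistency(records, discharge_not_before_admission)
print(f"completeness={discharge_completeness:.2f}, consistency={date_consistency:.2f}")
```

Reporting the two measures separately keeps missing data from being mistaken for implausible data, which usually call for different follow-up with the contributing site.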

The full text of the document can be accessed through the “Tools for Research” tab on the Living Textbook or can be downloaded directly here (PDF).