Standardizing Phenotypes for the Table 1 Project
What is the Table 1 Project?
In a research publication, the baseline characteristics for a study population are conventionally reported in Table 1. The goal of the Table 1 Project is to identify important person characteristics and clinical features, along with explicit definitions and representations, for the reporting of baseline characteristics of research populations in interventional and observational studies. Interpreting a research result without an understanding of the population enrolled in the study is treacherous at best. Validated, reproducible, reliable, and generalizable fundamental patient characteristics could support:
- The submission of datasets from NIH-funded studies for archival and secondary use
- The submission of results from NIH-funded studies for archival, retrieval, and comparison purposes
- The standardized reporting of results from NIH-funded studies to ClinicalTrials.gov
- Better practices for describing research populations in publications submitted to medical journals
- The conduct of both multisite pragmatic clinical trials and observational studies
How do computable phenotype definitions relate to the Table 1 Project?
The characteristics routinely reported in Table 1 of clinical trial publications include demographic characteristics (e.g., age, sex, race, ethnicity), vital signs, physical exam findings, medical history and related elements (personal or family history of conditions), medications, and behavioral or lifestyle characteristics (e.g., level of exercise, dietary intake).
Some of these variables have small sets of permissible values or value sets (e.g., sex, race, ethnicity, marital status), whereas other variables have entire coding systems or terminologies with hundreds or thousands of values comprising the value set. For example, providers use the Current Procedural Terminology (CPT) coding system for procedural variables and the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) coding system for diagnosis variables. While the use of these standard coding systems in these contexts is virtually universal in the United States, there is ample evidence to show that providers and organizations use them differently [1,2].
Proponents of the Table 1 Project argue that the most efficient and appropriate use of these coding systems (including disease-specific code subsets) for the reporting of baseline characteristics should be defined. Phenotype definitions typically name sets of specific codes that should be included and excluded to identify a given condition within administrative (i.e., billing) or electronic health record (EHR) data. Therefore, standardized data elements and phenotype definitions are one way to clearly articulate and report methods for extracting data and determining case status for many conditions and comorbidities commonly included in Table 1.
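To make the idea of a code-based phenotype definition concrete, here is a minimal sketch in Python. It is an illustration only, not a Collaboratory specification: the class name `PhenotypeDefinition`, its fields, and the prefix-matching rule are assumptions, and the codes shown are made-up placeholders rather than real ICD-9-CM or CPT codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeDefinition:
    """Hypothetical container for one code-based phenotype definition."""
    name: str
    coding_system: str
    include_prefixes: frozenset
    exclude_prefixes: frozenset = frozenset()

    def matches_code(self, code: str) -> bool:
        # A code qualifies if it starts with an included prefix
        # and does not start with any excluded prefix.
        included = any(code.startswith(p) for p in self.include_prefixes)
        excluded = any(code.startswith(p) for p in self.exclude_prefixes)
        return included and not excluded

    def is_case(self, patient_codes) -> bool:
        # Assumption: a single qualifying code is enough for case status.
        return any(self.matches_code(code) for code in patient_codes)


# Placeholder codes, invented for illustration only.
example = PhenotypeDefinition(
    name="example condition",
    coding_system="placeholder coding system",
    include_prefixes=frozenset({"ZZ1"}),
    exclude_prefixes=frozenset({"ZZ1.9"}),
)

print(example.is_case(["YY2.0", "ZZ1.0"]))  # qualifying code present
print(example.is_case(["ZZ1.9"]))           # only an excluded code present
```

Real definitions often add requirements this sketch omits, such as a minimum number of codes, the care setting, or a time window.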
How is the Collaboratory approaching the Table 1 Project?
The Collaboratory recognizes that it does not include a complete or representative set of researchers. However, within the Collaboratory, there are several different clinical research teams conducting successful pragmatic trials that involve data collection from multiple healthcare systems and locations. This provides a unique and valuable perspective both on a range of target conditions for pragmatic clinical research and on the practical challenges related to EHR data and health systems. Working within the Collaboratory Phenotypes, Data Standards, and Data Quality Core, researchers have begun to identify important characteristics and conditions (through intuition, not systematic review) and suggest standard approaches to defining, extracting, and reporting these data using EHR queries.
The Collaboratory is refining a standardized approach to identifying and evaluating phenotype definitions by working through important conditions one at a time, including those explored in Collaboratory-funded demonstration pragmatic trials. The process begins by examining existing research and phenotype definitions for conditions relevant to clinical research (i.e., as inclusion criteria, comorbidities, risk factors, or outcomes). For each condition, the Collaboratory assesses existing definitions and supporting data to select the best available definitions using a number of criteria. It then develops specifications for including parameters, documentation describing contexts for definition use, and implementation guidance. These materials are dynamic and will be continually updated.
Some Table 1 components, such as sex, race, or even vital signs or certain laboratory tests, might be standardized (or consistently collected or reported) across all or most research studies. However, most Table 1 content is condition or disease specific and can only be standardized within certain groups. The Collaboratory has begun to describe standardization as part of the Table 1 Project but recognizes that true standardization is a long and complex process that needs more thought and time to achieve.
A discussion of these challenges is available in the Grand Rounds presentation.
What is the state of standardization for EHR data to support pragmatic clinical trials, and how can the Collaboratory advance this?
Some Table 1 variables are already relatively standard (e.g., sex) or are being standardized as part of Meaningful Use criteria (e.g., the Office of Management and Budget standards for race/ethnicity). The majority of important data for disease-specific and clinical research, however, come from broad data elements (e.g., diagnoses, laboratory results, medications) where each variable has thousands of potential values. Phenotype definitions for individual conditions can support standardized query and retrieval of patient records from different health systems by specifying values for each variable that should be included in the definition of a characteristic or condition of interest.
Currently, phenotype definitions are not standardized by any regulatory body, nor are they explicitly defined or even reported. For example, most published Table 1 listings do not specify the query approach or algorithm used for reported variables. The first step in standardization is encouraging researchers to be specific about their definitions and to fully describe the data source, query approach, and logic used to define the conditions they report.
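One way to picture what "being specific" could mean in practice is a small structured record attached to each reported Table 1 variable. The sketch below is an assumption for illustration: the `ReportedVariable` fields mirror the data source, query approach, and logic named above, but the record layout and the example wording are hypothetical and not a Collaboratory reporting format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ReportedVariable:
    """Hypothetical provenance record for one Table 1 variable."""
    variable: str
    data_source: str
    query_approach: str
    logic: str


def missing_details(record):
    # List the fields left blank, in declaration order.
    return [name for name, value in asdict(record).items() if not value.strip()]


# Illustrative wording only; not taken from any published definition.
complete = ReportedVariable(
    variable="Congestive heart failure",
    data_source="EHR problem list and encounter diagnoses",
    query_approach="Diagnosis codes recorded in the 12 months before enrollment",
    logic="At least 1 inpatient or 2 outpatient qualifying codes",
)
incomplete = ReportedVariable(
    variable="Chronic kidney disease",
    data_source="EHR laboratory results",
    query_approach="",
    logic="",
)

print(json.dumps(asdict(complete), indent=2))
print(missing_details(incomplete))
```

A record like this could accompany a manuscript as supplementary material, and the blank-field check flags variables whose derivation is not yet fully described.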
The Collaboratory’s initial approach to standard phenotype definitions includes 5 steps:
- Develop guidance on identifying and assessing phenotype definitions (Living Textbook chapter)
- Develop recommendations, with supporting justification and implementation guidance, for phenotype definitions for a number of important conditions (ongoing, see Tools for EHR-Based Phenotyping)
- Endorse the need to specify phenotype definition logic in reports
- Develop a strategy to promote EHR and research data standards, including the identification of stakeholders and possible incentives
- Disseminate the phenotyping experiences of Collaboratory projects to illustrate the variety of landscapes within which attempts to leverage EHR data for research take place
It is hoped that an understanding of applications for standard phenotype definitions, based on the initial experience from the Collaboratory projects, will inform strategies (including guidance for physicians/organizations) to minimize variation in the use of EHR data. Demonstration of the use and value of standard phenotype (i.e., query) definitions will likely support this endeavor in the future, bolstered by implementation projects, such as the National Patient-Centered Clinical Research Network (PCORnet) and Clinical and Translational Science Award projects.
Eventually, these efforts may result in a CONSORT-like statement or in encouragement of the International Committee of Medical Journal Editors to require 1) specification of all data used in trials and, eventually, 2) use of certain recommended or standard phenotype definitions. Similarly, the Collaboratory can encourage the NIH to require the same of grants and the FDA to motivate industry compliance. These and other potential approaches toward achieving standardized data collection in pragmatic trials are active discussion topics in the Collaboratory Phenotypes, Data Standards, and Data Quality Core.
Preliminary Thoughts on Table 1 Elements
Challenges and Approach
The definition of a single set of research variables that are important to collect across diseases has proven elusive. Many groups, including the Clinical Data Interchange Standards Consortium and the National Cancer Institute, have tried to establish such a set. The U.S. Food and Drug Administration (FDA) has not identified a comprehensive and required set, although its current data standards strategy does involve a disease-by-disease approach to condition-specific data elements.
It is not clear who has the ultimate authority to mandate a set of data elements for Table 1; regardless, we present a set of elements as a starting point. We recognize that it is not realistic for research sponsors to dictate to health systems how to collect data, but we also see great value in having a vision of an ideal process for the reuse of clinical data for research. This vision can support the development of a clear and reasonable strategy for standardization of data collection and reporting.
How can we determine what (if any) elements should be required to report in Table 1?
There are several ways to approach the identification of important (or even mandatory) data elements for a standard Table 1. The approach could be data driven: for example, examining research studies reported in medical journals, ClinicalTrials.gov, or the database of Genotypes and Phenotypes (dbGaP), or research data provided to an NIH sponsor or FDA regulator, to determine which variables are reported most often. This approach could further elucidate the variation in reported elements (e.g., how much they vary in value lists/value sets). Though these analyses could be interesting and potentially fruitful, they are beyond the time and resources available to the NIH Collaboratory team involved in the Table 1 Project. An alternative approach is to convene a group of clinical and research experts who ideally represent all diseases or medical specialties. That, too, is beyond our time and resource constraints. Instead, we relied on a small group of experienced researchers representing various disease areas (e.g., cardiology, mental health) to generate a list and sample structure for a standard Table 1. This is a starting point for vetting and refining a shared vision of the Table 1 Project, and a practical, though perhaps limited, start to prioritizing elements for recommendations on standard queries or phenotyping.
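Although the data-driven analysis was not carried out for the Table 1 Project, a minimal sketch shows what it might involve. The function name `summarize_reporting`, the input layout, and the four example studies are all hypothetical. The sketch counts how often each variable is reported and how many distinct value sets are used for it.

```python
from collections import defaultdict


def summarize_reporting(studies):
    """studies maps a study ID to {variable name: list of reported categories}."""
    value_sets = defaultdict(list)
    for reported in studies.values():
        for variable, categories in reported.items():
            # frozenset ignores category order, so ["female", "male"] and
            # ["male", "female"] count as the same value set.
            value_sets[variable].append(frozenset(categories))

    total = len(studies)
    return {
        variable: {
            "share_of_studies": len(sets) / total,
            "distinct_value_sets": len(set(sets)),
        }
        for variable, sets in value_sets.items()
    }


# Invented example input, not drawn from real publications.
studies = {
    "study_a": {"sex": ["female", "male"], "smoking": ["current", "former", "never"]},
    "study_b": {"sex": ["male", "female"], "smoking": ["yes", "no"]},
    "study_c": {"sex": ["female", "male"]},
    "study_d": {"sex": ["female", "male"], "smoking": ["current", "former", "never"]},
}

for variable, stats in summarize_reporting(studies).items():
    print(variable, stats)
```

In this invented input, sex is reported by every study with a single value set, whereas smoking status is reported by three of four studies using two different value sets. A real analysis would first need to reconcile differing variable names and category labels.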
What characteristics should be included in Table 1?
We envision Table 1 with a pan-disease piece (including information such as demographics) and then a number of condition-specific sets, as follows:
- Insurance status (to infer access to care)
- Study-specific relevant comorbidities
- Study-specific laboratory tests
- Study-specific nonmedication interventions
- Cohort identification variables (baseline)
- Chronic kidney disease
- Chronic obstructive pulmonary disease
- Congestive heart failure
- Connective tissue disease
- Coronary artery disease
- Peptic ulcer disease
- Peripheral vascular disease
- Tumor, leukemia, lymphoma
Other options for identifying comorbidities:
- Charlson Comorbidity Index
- Top 10 comorbidities by frequency
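The "top 10 comorbidities by frequency" option can be illustrated with a short sketch that ranks conditions in a cohort and formats them as the count and percentage typically shown in Table 1. The function name `top_comorbidities` and the four-patient cohort are hypothetical. The sketch assumes case status for each condition has already been determined, for instance by a phenotype definition.

```python
from collections import Counter


def top_comorbidities(cohort, n=10):
    """cohort maps a patient ID to the list of conditions that patient has."""
    total = len(cohort)
    counts = Counter()
    for conditions in cohort.values():
        counts.update(set(conditions))  # count each patient once per condition

    # Rank by descending count, breaking ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(name, count, 100 * count / total) for name, count in ranked[:n]]


# Invented four-patient cohort for illustration.
cohort = {
    "p1": ["Congestive heart failure", "Chronic kidney disease"],
    "p2": ["Congestive heart failure"],
    "p3": ["Chronic kidney disease", "Congestive heart failure", "Peptic ulcer disease"],
    "p4": [],
}

for name, count, percent in top_comorbidities(cohort, n=10):
    print(f"{name}: {count} ({percent:.1f}%)")
```

Percentages use the whole cohort as the denominator, including patients with no recorded comorbidities.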
What are the biggest challenges to the vision of a standardized Table 1?
We recognize that there are multiple approaches to defining important baseline characteristics for reporting with research results. Vetting of a standard Table 1 across different medical specialties and experts will be a challenge, as will coordination of hosting and versioning of Table 1 elements across disease areas, complicated by the fact that some variables are collected for many conditions. To reach a consensus and make broad adoption possible, the proposed Table 1 standards need to be comprehensive, relevant, meaningful, and achievable. Recommendations will also need to address variance in the reporting of characteristics based on the time of observation or measurement versus the beginning of a trial. Because pragmatic clinical trials involve EHRs as source data, this effort will involve more stakeholders than a traditional clinical trial. Engaging the breadth of potential users and stakeholders will help ensure future endorsement and adoption of any standardized Table 1. These are important considerations for the realization of efficiently collected, consistent, and meaningful data for pragmatic trials, which is a primary aim of the Collaboratory.
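The timing issue noted above, where a characteristic may be measured well before a trial begins, can be made explicit with a baseline rule. The following sketch assumes one possible rule: take the most recent measurement on or before enrollment within a lookback window, and report how old it is. The function name `baseline_value`, the 365-day default, and the example values are assumptions, not a Collaboratory recommendation.

```python
from datetime import date, timedelta


def baseline_value(measurements, enrollment, lookback_days=365):
    """measurements is a list of (measurement date, value) pairs."""
    window_start = enrollment - timedelta(days=lookback_days)
    eligible = [
        (measured_on, value)
        for measured_on, value in measurements
        if window_start <= measured_on <= enrollment
    ]
    if not eligible:
        return None  # no usable baseline measurement in the window

    measured_on, value = max(eligible, key=lambda pair: pair[0])
    return {
        "value": value,
        "measured_on": measured_on,
        "days_before_enrollment": (enrollment - measured_on).days,
    }


# Invented measurements for illustration.
enrollment = date(2020, 6, 1)
measurements = [
    (date(2019, 1, 15), 142),  # older than the lookback window
    (date(2020, 3, 1), 138),   # most recent value before enrollment
    (date(2020, 7, 1), 130),   # recorded after enrollment, so not baseline
]

print(baseline_value(measurements, enrollment))
```

Reporting the gap between measurement and enrollment alongside the value lets readers judge how current each baseline characteristic was.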