Data Management
There are four D’s related to research and data: Data Collection, Data Management, Data Analysis, and Dissemination of Results. Institutions employ professionals across many fields to help ensure the security, integrity, and provenance of clinical and translational research data.
This introductory online module will help learners identify the goals of research data management and summarize approaches for answering a research question. The learner will be able to define data collection methodology, describe and compare database design best practices, and discuss tips and tricks for collecting data for research purposes.
Resources
- Health Insurance Portability and Accountability Act of 1996 (HIPAA)
- "Safe Harbor" de-identification
- FDA 21 CFR Part 11: Electronic Records and Signatures
- Medical Dictionary for Regulatory Activities (MedDRA)
- Common Terminology Criteria for Adverse Events (CTCAE)
- Clinical Data Interchange Standards Consortium (CDISC): Clinical Data Acquisition Standards Harmonization (CDASH)
- The Society for Clinical Data Management
- SOFPROMED - What is an Electronic Data Capture (EDC) System in Clinical Trials?
- USDA National Agricultural Library - Data Management Glossary
- Harvard - Data Management Terminology
- Cornell - Glossary of Data Management Terms
- Regulatory Focus - FDA Explains How to Craft a Data Management Plan
- International Science Council Committee on Data (CODATA) - Research Data Management Terminology
- The Global Health Network - Data Management Steps
- Krishnankutty B, Bellary S, Kumar NB, Moodahadu LS. Data management in clinical research: An overview. Indian J Pharmacol. 2012 Mar;44(2):168-72. doi: 10.4103/0253-7613.93842. PMID: 22529469; PMCID: PMC3326906.
- Oracle InForm
- Medidata RAVE
- Merge Healthcare eClinical OS (eCOS)
- Qualtrics (or other survey software)
- REDCap (the tool used heavily at Duke University for data collection)
Data Integrity: Making sure that all data related to a research study are complete, reliable, consistent, accurate, and processed correctly. Integrity also means that the data are relevant to the purpose for which they were collected.
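As an illustration of the completeness and accuracy parts of this definition, the sketch below checks a single record for missing required fields and out-of-range values. The field names, the ranges, and the `check_record` helper are hypothetical and chosen only for this example; a real study would take them from its own data dictionary.

```python
def check_record(record, required_fields, valid_ranges):
    """Return a list of integrity problems found in one record."""
    problems = []
    # Completeness: every required field must be present and non-empty.
    for field in required_fields:
        if record.get(field) in (None, ""):
            problems.append(f"missing: {field}")
    # Accuracy: numeric fields must fall inside their plausible range.
    for field, (low, high) in valid_ranges.items():
        value = record.get(field)
        if value in (None, ""):
            continue  # absence is reported by the completeness check, if required
        if not isinstance(value, (int, float)) or not (low <= value <= high):
            problems.append(f"out of range: {field}={value}")
    return problems


# Hypothetical record and rules, for illustration only.
record = {"participant_id": "P001", "age": 34, "systolic_bp": 400}
required = ["participant_id", "age", "visit_date"]
ranges = {"age": (0, 120), "systolic_bp": (50, 300)}
issues = check_record(record, required, ranges)
```

Here `issues` lists the missing `visit_date` and the implausible `systolic_bp` value, which could then be queried with the study site rather than silently corrected.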
Data Provenance: This means that the findings from the data are reproducible, both by the same research team and by other research teams. The source data and relevant documentation should be managed in such a way that they can be used to reproduce the same results.
Data Security: This involves data storage, access, and sharing. Data must be stored so that only the people who need access to the data have it. This means data must be protected from destructive forces and from the unwanted actions of unauthorized users. Sensitive information that is private or can be used to identify a person must be protected.
Protected Health Information: When one or more HIPAA identifiers are used in conjunction with information about a person's physical or mental health or condition, health care, or payment for that health care, the data become protected health information (PHI).
De-identification: In many cases, before research data are shared, the dataset must be fully de-identified so that the people linked to the data cannot be identified from the dataset. Under the HIPAA "Safe Harbor" method, de-identification means removing all 18 HIPAA identifiers.
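A minimal sketch of the idea, assuming a record stored as a dictionary: fields that hold identifiers are dropped before the record is shared. The field names and the `deidentify` helper are hypothetical, and the set below covers only a few of the 18 HIPAA identifier categories, so this is not a complete Safe Harbor implementation.

```python
# Hypothetical field names that hold identifiers in this example dataset.
# A real project would map every one of the 18 HIPAA identifier
# categories onto its own fields.
IDENTIFIER_FIELDS = {"name", "email", "phone", "mrn", "ssn", "street_address"}


def deidentify(record):
    """Return a copy of the record without the listed identifier fields."""
    return {key: value for key, value in record.items()
            if key not in IDENTIFIER_FIELDS}


# Hypothetical record, for illustration only.
record = {"name": "Jane Doe", "mrn": "12345", "age": 42, "systolic_bp": 118}
clean = deidentify(record)
```

The original record is left unchanged, so the identified source data can stay in secure storage while only `clean` is shared.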
Data Lifecycle: The steps in the data lifecycle are Plan, Collect/Create, Process, Analyze, Share/Disseminate, Preserve, and Reuse.
HIPAA Identifiers: There are 18 HIPAA direct and indirect identifiers. Identifiers are the information or data that can be a) used to identify, contact, or locate a single individual, or b) used, in combination with other sources, to identify a single individual.
Interview
Interview with Ceci Chamorro
Manager of Information Systems
Duke Office of Clinical Research