
Missing and Faulty Data

Federal agencies field surveys that can support rich secondary analyses, leading to deeper scientific understanding and better-informed policymaking. Yet data quality often suffers because respondents are reluctant to answer surveys or report values with errors. Statistical agencies frequently handle nonresponse with ad hoc approaches that often ignore multivariate relationships among variables. As a result, these approaches tend to work poorly in complex datasets with many variables, especially when the data are released to the public. Similarly, the editing of faulty data is frequently based on heuristics rather than principled theory. The TCRN will improve methodology and practice for handling missing and faulty data by

  1. developing frameworks for simultaneous imputation of missing data and editing of faulty data by integrating paradigms from statistics and operations research; and
  2. developing nonparametric Bayesian methodology for multiple imputation of missing values in high-dimensional data with longitudinal and multilevel structure.

These methods will be used to improve imputations in the restricted-access files for the Annual Survey of Manufactures (ASM) and the Survey of Income and Program Participation (SIPP), which serve large research communities in business, economics, public policy, sociology, and other fields.

TCRN no longer active

The NSF award that supported the TCRN ended on September 30, 2018. This site is maintained for archival purposes.