Papers

The TCRN methodological developments are summarized in the papers posted on this page.  Please contact Jerry Reiter for copies of papers under peer review, which are listed as “submitted.”

Papers about Confidentiality Protection Methods

  1. D. Manrique-Vallier and J. P. Reiter (2012), “Estimating identification disclosure risk using mixed membership models,” Journal of the American Statistical Association, 107, 1385 – 1394.
  2. J. P. Reiter and S. K. Kinney (2012), “Inferentially valid partially synthetic data: Generating from posterior predictive distributions not necessary,” Journal of Official Statistics, 28, 583 – 590.
  3. L. Cox (2014), “Enabling statistical analysis of suppressed tabular data,” in Privacy in Statistical Databases, edited by J. Domingo-Ferrer, Lecture Notes in Computer Science 8744. Heidelberg: Springer, 1 – 10.
  4. J. Hu, J. P. Reiter, and Q. Wang (2014), “Disclosure risk evaluation for fully synthetic data,” in Privacy in Statistical Databases, edited by J. Domingo-Ferrer, Lecture Notes in Computer Science 8744. Heidelberg: Springer, 185 – 199.
  5. A. F. Karr (2014), “Why data availability is such a hard problem,” Statistical Journal of the International Association for Official Statistics,  30, 101 – 107.
  6. A. F. Karr and J. P. Reiter (2014),  “Using statistics to protect privacy,” in Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by J. Lane, V. Stodden, S. Bender, and H. Nissenbaum, Cambridge University Press, 276 – 295.
  7. S. K. Kinney, J. P. Reiter, and J. Miranda (2014), “SynLBD 2.0: Improving the Synthetic Longitudinal Business Database,” Statistical Journal of the International Association for Official Statistics, 30, 129 – 135.
  8. D. Manrique-Vallier and J. P. Reiter (2014), “Bayesian estimation of discrete multivariate latent structure models with structural zeros,” Journal of Computational and Graphical Statistics, 23, 1061 – 1079.
  9. T. Paiva, A. Chakraborty, J. P. Reiter, and A. E. Gelfand (2014), “Imputation of confidential data sets with spatial locations using disease mapping models,” Statistics in Medicine, 33, 1928 – 1945.
  10. J. P. Reiter (2014), “A case for public access to redacted social science data,” FierceBigData, Sept. 3, 2014.
  11. J. P. Reiter, Q. Wang, and B. Zhang (2014), “Bayesian estimation of disclosure risks in multiply imputed, synthetic data,” Journal of Privacy and Confidentiality, 6:1, Article 2.
  12. H. J. Kim, A. F. Karr, and J. P. Reiter (2015), “Statistical disclosure limitation in the presence of edit rules,” Journal of Official Statistics, 31, 121 – 138.
  13. H. Quick, S. H. Holan, C. K. Wikle, and J. P. Reiter (2015), “Bayesian marked point process modeling for generating fully synthetic public use data with point-referenced geography,” Spatial Statistics, 14, 439 – 451.
  14. Y. Chen, A. Machanavajjhala, J. P. Reiter, and A. F. Barrientos (2016), “Differentially private regression diagnostics,” Proceedings – IEEE International Conference on Data Mining, ICDM, 81 – 90.
  15. A. F. Karr (2016), “Data sharing and access,” Annual Review of Statistics and Its Application, 3, 113 – 132.
  16. D. McClure and J. P. Reiter, (2016), “Assessing disclosure risks for synthetic data with arbitrary intruder knowledge,” Statistical Journal of the International Association for Official Statistics, 32, 109 – 126.
  17. L. Vilhuber, J. M. Abowd, and J. P. Reiter (2016), “Synthetic establishment microdata around the world,” Statistical Journal of the International Association for Official Statistics, 32, 65 – 68.
  18. L. Wei and J. P. Reiter (2016), “Releasing synthetic magnitude microdata constrained to fixed marginal totals,” Statistical Journal of the International Association for Official Statistics, 32, 95 – 108.
  19. A. F. Karr (2017), “The role of statistical disclosure limitation in total survey error,” in Total Survey Error in Practice, eds. P Biemer, E. de Leeuw, S. Eckman, B. Edwards, F. Kreuter, L. Lyberg, C. Tucker, B. West, New York: John Wiley & Sons, 71 – 94.
  20. G. Amitai and J. P. Reiter (2018), “Differentially private posterior summaries for linear regression coefficients,” Journal of Privacy and Confidentiality, 8(1), Article 2.
  21. A. F. Barrientos, A. Bolton, T. Balmat, J. P. Reiter, J. M. de Figueiredo, A. Machanavajjhala, Y. Chen, C. Kneifel, and M. DeLong (2018), “Providing access to confidential research data through synthesis and verification: An application to data on employees of the U. S. federal government,” Annals of Applied Statistics, 12, 1124 – 1156.  Earlier version published as NBER working paper 23534.
  22. Y. Chen, A. F. Barrientos, A. Machanavajjhala, and J. P. Reiter (2018), “Is my model any good: Differentially private regression diagnostics,” Knowledge and Information Systems, 54, 33 – 64.
  23. J. Hu, J. P. Reiter, and Q. Wang (2018), “Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data,” Bayesian Analysis, 13, 183 – 200.
  24. H. J. Kim, J. P. Reiter, and A. F. Karr (2018), “Simultaneous edit-imputation and disclosure limitation for business establishment data,” Journal of Applied Statistics, 45, 63 – 82.
  25. H. Yu and J. P. Reiter (2018), “Differentially private verification of regression predictions from synthetic data,” Transactions on Data Privacy, 11, 279 – 297.
  26. A. F. Barrientos, J. P. Reiter, A. Machanavajjhala, and Y. Chen (2019), “Differentially private significance tests for regression coefficients,” Journal of Computational and Graphical Statistics, 28, 440 – 453.
  27. J. P. Reiter (2019), “Differential privacy and federal data releases,” Annual Review of Statistics and Its Application, 6, 85 – 101.
  28. M. Pistner Nixon, A. F. Barrientos, J. P. Reiter, and A. Slavkovic (2022), “A latent class modeling approach for generating synthetic data and making posterior inferences from differentially private counts,” Journal of Privacy and Confidentiality.

Papers about Missing Data Methods

  1. J. Hu, R. Mitra, and J. P. Reiter (2013), “Are independent parameter draws necessary for multiple imputation?” The American Statistician, 67, 143 – 149.
  2. Y. Si and J. P. Reiter (2013), “Nonparametric Bayesian multiple imputation for incomplete categorical variables in large-scale assessment surveys,” Journal of Educational and Behavioral Statistics, 38, 499 – 521.
  3. H. J. Kim, J. P. Reiter, Q. Wang, L. H. Cox, and A. F. Karr (2014), “Multiple imputation of missing or faulty values under linear constraints,” Journal of Business and Economic Statistics, 32, 375 – 386.
  4. F. Li, M. Baccini, F. Mealli, C. E. Frangakis, E. Z. Zell, and D. B. Rubin (2014) “Multiple imputation by ordered monotone blocks with application to the Anthrax Vaccine Research Program,” Journal of Computational and Graphical Statistics, 23, 877 – 892.
  5. D. Manrique-Vallier and J. P. Reiter (2014), “Bayesian multiple imputation for large-scale categorical data with structural zeros,” Survey Methodology, 40, 125 – 134.
  6. H. J. Kim, L. H. Cox, A. F. Karr, J. P. Reiter, Q. Wang (2015), “Simultaneous editing and imputation for continuous data,” Journal of the American Statistical Association, 110, 987 – 999.
  7. T. S. Schifeling, C. Cheng, J. P. Reiter, and D. S. Hillygus (2015), “Accounting for nonignorable unit nonresponse and attrition in panel studies with refreshment samples,” Journal of Survey Statistics and Methodology, 3, 265 – 295.
  8. Y. Si, J. P. Reiter, and D. S. Hillygus (2015), “Semi-parametric selection models for potentially non-ignorable attrition in panel studies with refreshment samples,” Political Analysis, 23, 92 – 112.
  9. J. S. Murray and J. P. Reiter (2016), “Multiple imputation of missing categorical and continuous outcomes via Bayesian mixture models with local dependence,” Journal of the American Statistical Association, 111, 1466 – 1479.
  10. Y. Si, J. P. Reiter, and D. S. Hillygus (2016),  “Bayesian latent pattern mixture models for handling attrition in panel studies with refreshment samples,” Annals of Applied Statistics, 10, 118 – 143.
  11. O. Akande, F. Li, and J. P. Reiter, (2017), “An empirical comparison of multiple imputation methods for categorical data,” The American Statistician, 71, 162 – 170.
  12. M. De Yoreo, J. P. Reiter, and D. S. Hillygus (2017), “Nonparametric Bayesian models with focused clustering for mixed ordinal and nominal data,” Bayesian Analysis, 12, 679 – 703.  Read me file. Computer code. Simulation code.
  13. T. Paiva and J. P. Reiter (2017), “Stop or continue data collection: A nonignorable missing data approach for continuous variables,” Journal of Official Statistics, 33, 579 – 599.
  14. M. Sadinle and J. P. Reiter (2017), “Itemwise conditionally independent nonresponse modeling for incomplete multivariate data,” Biometrika, 104, 207 – 220.
  15. D. Manrique-Vallier and J. P. Reiter (2018), “Bayesian simultaneous edit and imputation for multivariate categorical data,” Journal of the American Statistical Association, 112, 1708 – 1719.
  16. M. Sadinle and J. P. Reiter (2018), “Sequential identification of nonignorable missing data,” Statistica Sinica, 28, 1741 – 1759.
  17. L. Wei and J. P. Reiter (2018), “Improving on Bayesian mixture models for multiple imputation of missing data by focused clustering,” REVSTAT 16, 213 – 230.
  18. T. K. White, J. P. Reiter, and A. Petrin (2018), “Imputation in U.S. manufacturing data and its implications for productivity dispersion,” Review of Economics and Statistics, 100, 502 – 509.  Also published as NBER Working Paper #22569.
  19. O. Akande, J. P. Reiter, and A. F. Barrientos (2019), “Multiple imputation of missing values in household data with structural zeros,” Survey Methodology, 45, 271 – 294.
  20. O. Akande, A. F. Barrientos, and J. P. Reiter (2019), “Simultaneous edit and imputation for household data with structural zeros,” Journal of Survey Statistics and Methodology, 7, 498 – 519.
  21. M. Sadinle and J. P. Reiter (2019), “Sequentially additive nonignorable missing data modeling using auxiliary marginal information,” Biometrika, 106, 889 – 911.
  22. A. Lott and J. P. Reiter (2020), “Wilson confidence intervals for multiple imputation,” The American Statistician, 74, 109 – 115.
  23. O. Akande, G. Madson, D. S. Hillygus, and J. P. Reiter (2021), “Leveraging auxiliary information on marginal distributions in nonignorable models for item and unit nonresponse,” Journal of the Royal Statistical Society, Series A, 184, 643 – 662.
  24. G. Kamat and J. P. Reiter (2021), “Leveraging random assignment to impute missing covariates in causal studies,” Journal of Statistical Computation and Simulation, 91, 1275 – 1305.

Papers about Combining Information Across Sources

  1. M. M. Carrig, D. Manrique-Vallier, K. Ranby, J. P. Reiter, and R. Hoyle (2015),  “A nonparametric, multiple imputation-based method for the retrospective integration of data sets,” Multivariate Behavioral Research, 50, 383 – 397.
  2. J. Siddique, J. P. Reiter, A. Brincks, R. Gibbons, C. Crespi, and C. H. Brown (2015), “Multiple imputation for harmonizing non-commensurate measures in individual participant data meta-analysis,” Statistics in Medicine, 34, 3399 – 3414.
  3. R. C. Steorts (2015), “Entity resolution with empirically motivated priors,” Bayesian Analysis,  10, 849 – 875.
  4. B. K. Fosdick, M. De Yoreo, and J. P. Reiter (2016), “Categorical data fusion using auxiliary information,” Annals of Applied Statistics, 10, 1907 – 1929.  Read me file. Computer code. Simulated data.
  5. N. Dalzell, G. Boyd, and J. P. Reiter (2017), “Creating linked datasets for SME energy-assessment evidence-building: results from the U. S. Industrial Assessment Center Program,” Energy Policy, 111, 95 – 101.  See Duke blog entry on paper.
  6. M. Sadinle (2017), “Bayesian estimation of bipartite matchings for record linkage,” Journal of the American Statistical Association, 112, 600 – 612.
  7. T. S. Schifeling and J. P. Reiter (2016), “Incorporating marginal prior information into latent class models,” Bayesian Analysis, 11, 499 – 518.
  8. R. C. Steorts, M. Barnes, and W. Neiswanger (2017), “Performance bounds for graphical record linkage,” Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 1417 – 1425.
  9. N. Dalzell and J. P. Reiter (2018), “Regression modeling and file matching using possibly erroneous matching variables,” Journal of Computational and Graphical Statistics, 27, 728 – 738.
  10. J. Heck-Wortman and J. P. Reiter (2018), “Simultaneous record linkage and causal inference,” Statistics in Medicine, 37, 3533 – 3546.
  11. M. Sadinle (2018), “Bayesian propagation of record linkage uncertainty into population size estimation of human rights violations,” Annals of Applied Statistics, 12, 1013 – 1038.
  12. T. Schifeling, J. P. Reiter, and M. De Yoreo (2019), “Data fusion for correcting measurement errors,” Journal of Survey Statistics and Methodology, 7, 175 – 200.
  13. J. Tang, J. P. Reiter, and R. C. Steorts (2020), “Bayesian modeling for simultaneous regression and record linkage,” in Privacy in Statistical Databases, edited by J. Domingo-Ferrer and K. Muralidhar, Lecture Notes in Computer Science 12276, Cham, Switzerland: Springer, 209 – 223.
  14. J. P. Reiter, (2021) “Assessing uncertainty when using linked administrative data,” Administrative Records for Survey Methodology, edited by A. Y. Chen, M. D. Larsen, G. Durrant, J. P. Reiter, Hoboken: John Wiley & Sons, 139 – 153.

Papers about Novel Modeling Strategies for Complex Data

  1. A. Banerjee, J. Murray, and D. B. Dunson (2013), “Bayesian learning of joint distributions of objects,” Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS) 2013, Scottsdale, AZ, USA.
  2. D. Manrique-Vallier (2014), “Longitudinal mixed membership trajectory models for disability survey data,” Annals of Applied Statistics, 8, 2268 – 2291.
  3. H. Kim and S. N. MacEachern (2015), “The generalized multiset sampler,” Journal of Computational and Graphical Statistics, 24, 1134 – 1154.
  4. P. R. Hahn, J. S. Murray, and I. Manolopoulou (2016), “A Bayesian partial identification approach to inferring the prevalence of accounting misconduct,” Journal of the American Statistical Association, 111, 14 – 26.
  5. T. Ransom (2016), “The effect of business cycle fluctuation on migration decisions.”  Available at SSRN: https://ssrn.com/abstract=2741117 or http://dx.doi.org/10.2139/ssrn.2741117 .
  6. M. De Yoreo and A. Kottas (2017), “A Bayesian nonparametric Markovian model for nonstationary time series,” Statistics and Computing, 27, 1525 – 1538.
  7. M. De Yoreo and A. Kottas, (2018), “Bayesian nonparametric modeling for multivariate ordinal regression,” Journal of Computational and Graphical Statistics, 27, 71 – 84.
  8. M. De Yoreo and A. Kottas, (2018), “Modeling for dynamic ordinal regression relationships: An application to estimating maturity of rockfish in California,” Journal of the American Statistical Association, 113, 68 – 80.
  9. L. Gutierrez, A. F. Barrientos, J. Gonzalez, D. Taylor-Rodriguez, (2018) “A Bayesian nonparametric multiple testing procedure for comparing several treatments against a control,” Bayesian Analysis, 14, 649 – 675.
  10. M. Sadinle, J. Lei, and L. Wasserman (2019), “Least ambiguous set-valued classifiers with bounded error levels,” Journal of the American Statistical Association, 114, 223 – 234.
  11. D. H. Weinberg, J. M. Abowd, R. F. Belli, N. Cressie, D. Folch, S. H. Holan, M. C. Levenstein, K. M. Olson, J. P. Reiter, M. D. Shapiro, J. Smyth, L. Soh, B. D. Spencer, S. E. Spielman, L. Vilhuber, and C. K. Wikle (2019), “Effects of a government-academic partnership: Has the NSF-Census Bureau research network helped improve the U.S. statistical system?” Journal of Survey Statistics and Methodology, 7, 589 – 619.
  12. A. F. Barrientos and V. Pena (2020), “Bayesian bootstraps for massive data,” Bayesian Analysis, 15, 363 – 388.
  13. M. De Yoreo and J. P. Reiter, (2020), “Bayesian mixture modeling for multivariate conditional distributions,” Journal of Statistical Theory and Practice, 14, Article 45.

Dissertations and Theses

  1. J. S. Murray.  “Some Recent Advances in Non- and Semi-parametric Bayesian Modeling with Copulas, Mixtures, and Latent Variables,” 2013 (PhD in statistical science).
  2. T. Paiva. “Multiple Imputation Methods for Nonignorable Nonresponse, Adaptive Survey Design, and Dissemination of Synthetic Geographies,” 2014 (PhD in statistical science).
  3. T. Ransom. “Dynamic Models of Human Capital Accumulation,” 2015 (PhD in economics).
  4. J. Hu. “Dirichlet Process Mixture Models for Nested Categorical Data,” 2015 (PhD in statistical science).
  5. O. Akande. “A Comparison of Multiple Imputation Methods for Categorical Data,” 2015 (MSEM — Master’s in statistics and economic modeling).
  6. S. Oh. “Multiple Imputation of Missing Values in Time Series Data,” 2015 (MSEM — Master’s in statistics and economic modeling).
  7. D. McClure. “Relaxations of Differential Privacy and Risk Utility Evaluations of Synthetic Data and Fidelity Measures,” 2015 (PhD in statistical science).
  8. T. Schifeling. “Combining Information from Multiple Sources in Bayesian Modeling,” 2016 (PhD in statistical science).
  9. L. Wei.  “Methods for Imputing Missing Values and Synthesizing Confidential Values for Continuous and Magnitude Data,” 2016 (PhD in statistical science).
  10. N. Dalzell.  “Bayesian Approaches to File Linking with Faulty Data,” 2017 (PhD in statistical science).
  11. B. Li. “A Privacy Preserving Algorithm to Release Sparse High-dimensional Histograms,” 2017 (MSS — Master’s in statistical science).
  12. S. Yu. “Differentially Private Verification of Predictions from Synthetic Data,” 2017 (MSEM — Master’s in statistics and economic modeling).
  13. G. Amitai, “Bayesian Inference Via Partitioning Under Differential Privacy,” 2018 (MSS — Master’s in statistical science).
  14. J. Heck-Wortman, “Record Linkage Methods with Applications to Causal Inference and Election Voting Data,” 2018 (PhD in statistical science).
  15. O. Akande, “Bayesian Models for Imputing Missing Data and Editing Erroneous Data in Surveys,” 2019 (PhD in statistical science).
  16. K. Burris, “Advances in Survey Methodology and Sports Science,” 2019 (PhD in statistical science).
  17. G. Kamat, “Multiple Imputation of Missing Covariates in Randomized Controlled Trials,” 2019 (MSS — Master’s in statistical science).

TCRN no longer active

The NSF award that supported the TCRN ended on September 30, 2018.  This site is maintained for archival purposes.