Re[DP]: Realistic Data Mining Under Differential Privacy


The collection and analysis of personal data have revolutionized information systems and fueled the US and global economies. But privacy concerns regarding the use of such data loom large. Differential privacy has emerged as a gold standard for mathematically characterizing the privacy risks of algorithms that use personal data. Yet adoption of differentially private algorithms in industry or government agencies has been startlingly rare. This failure of adoption stems largely from a mismatch between the idealized problem settings considered to date by privacy researchers and the complex real-world workflows needed for mining personal data. This project will expand the practical usefulness of privacy algorithms, encourage their use through technology transfer to the US Census and medical researchers at Duke University, and ultimately ensure privacy protection alongside increased data sharing and transmission of knowledge.

This project aims to systematically study the complete workflow involved in mining personal data and to solve key problems that have diminished usability and prevented widespread deployment of differential privacy. Research activities include developing (i) private algorithms for data preprocessing (cleaning, imputation, and other transformations), (ii) algorithms to support parallel and iterative model selection, and (iii) semantically meaningful guidelines for setting privacy policies and utility benchmarks. Results will guide the design and implementation of a novel web-based framework (DPcomp) for testing and evaluating the deployment of privacy algorithms. Broader impacts of this project include technology transfer to the US Census and medical researchers at Duke University and the incorporation of privacy themes into new undergraduate courses. DPcomp will stimulate interaction between data owners and privacy researchers and help unearth new research questions.