Publications

Duke Database Group Publications

2022


  1. Xiao Hu, Yuxi Liu, Haibo Xiu, Pankaj Agarwal, Debmalya Panigrahi, Sudeepa Roy, and Jun Yang: Selectivity Functions of Range Queries are Learnable, SIGMOD 2022.
  2. Amir Gilad*, Zhengjie Miao*, Sudeepa Roy, and Jun Yang: Understanding Queries by Conditional Instances, SIGMOD 2022.
  3. Sainyam Galhotra*, Amir Gilad*, Sudeepa Roy, and Babak Salimi: HypeR: Hypothetical Reasoning With What-If and How-To Queries Using a Probabilistic Causal Approach, SIGMOD 2022.
  4. Xiao Hu, Stavros Sintos, Junyang Gao, Pankaj K. Agarwal, and Jun Yang: Computing Complex Temporal Join Queries Efficiently, SIGMOD 2022.
  5. Wei Dong, Juanru Fang, Ke Yi, Yuchao Tao, Ashwin Machanavajjhala: R2T: Instance-optimal Truncation for Differentially Private Query Evaluation with Foreign Keys, SIGMOD 2022
  6. Chenghong Wang, Johes Bater, Kartik Nayak, Ashwin Machanavajjhala: IncShrink: Architecting Efficient Outsourced Databases using Incremental MPC and Differential Privacy, SIGMOD 2022
  7. Shaleen Deep, Xiao Hu and Paraschos Koutris: Ranked Enumeration of Join Queries with Projections, VLDB 2022.

2021


  1. Xiao Hu: Cover or Pack: New Upper and Lower Bounds for Massively Parallel Joins, PODS 2021
  2. Xiao Hu, Paraschos Koutris and Spyros Blanas: Algorithms for a Topology-aware Massively Parallel Computation Model, PODS 2021
  3. Shaleen Deep, Xiao Hu and Paraschos Koutris: Enumeration Algorithms for Conjunctive Queries with Projection, ICDT 2021
  4. Amir Gilad, Shweta Patwa, Ashwin Machanavajjhala: Synthesizing Linked Data Under Cardinality and Integrity Constraints, SIGMOD 2021
  5. Daniel Deutch, Ariel Frankenthal, Amir Gilad, Yuval Moskovitch: On Optimizing the Trade-off between Privacy and Utility in Data Provenance, SIGMOD 2021
  6. Daniel Deutch, Ariel Frankenthal, Amir Gilad, Yuval Moskovitch: PITA: Privacy Through Provenance Abstraction, ICDE 2021
  7. Chenghong Wang, Johes Bater, Kartik Nayak, Ashwin Machanavajjhala: DP-Sync: Hiding Update Patterns in Secure Outsourced Databases with Differential Privacy, SIGMOD 2021
  8. Chenjie Li, Zhengjie Miao, Qitian Zeng, Boris Glavic, Sudeepa Roy: Putting Things into Context: Rich Explanations for Query Answers using Join Graphs, SIGMOD 2021
  9. Zhengjie Miao, Yuliang Li, Xiaolan Wang: Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond, SIGMOD 2021
  10. Junyang Gao, Yifan Xu, Pankaj Agarwal, and Jun Yang: Efficiently answering durability prediction queries, SIGMOD 2021
  11. Ester Livshits, Rina Kochirgan, Segev Tsur, Ihab Ilyas, Benny Kimelfeld, and Sudeepa Roy: Properties of Inconsistency Measures for Databases, SIGMOD 2021
  12. Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, Jun Yang: Dynamic Enumeration of Similarity Joins, ICALP 2021
  13. Junyang Gao, Stavros Sintos, Pankaj K. Agarwal, Jun Yang: Durable Top-K Instant-Stamped Temporal Records with User-Specified Scoring Functions, ICDE 2021
  14. Siyuan Xia, Beizhen Chang, Karl Knopf, Yihan He, Yuchao Tao, Xi He: DPGraph: A Benchmark Platform for Differentially Private Graph Analysis, SIGMOD 2021

2020


  1. Yuchao Tao, Xi He, Ashwin Machanavajjhala, Sudeepa Roy: Computing Local Sensitivities of Counting Queries with Joins, SIGMOD 2020
  2. Amrita Roy Chowdhury, Chenghong Wang, Xi He, Ashwin Machanavajjhala, and Somesh Jha: CryptE: Crypto-Assisted Differential Privacy on Untrusted Servers, SIGMOD 2020
  3. Babak Salimi, Harsh Parikh, Moe Kayali, Lise Getoor, Sudeepa Roy, and Dan Suciu: Causal Relational Learning, SIGMOD 2020
  4. Amir Gilad, Daniel Deutch, and Sudeepa Roy: On Multiple Semantics for Declarative Database Repairs, SIGMOD 2020
  5. Mayuresh Kunjir and Shivnath Babu: Black or White: How to Develop an AutoTuner for Memory-based Analytics, SIGMOD 2020
  6. Shaleen Deep, Xiao Hu, and Paraschos Koutris: Fast Join Project Query Evaluation using Matrix Multiplication, SIGMOD 2020
  7. Pankaj Agarwal, Stavros Sintos and Alex Steiger: Efficient Indexes for Diverse Top-k Range Queries, PODS 2020
  8. Xiao Hu and Ke Yi: Parallel Algorithms for Sparse Matrix Multiplication and Join-Aggregate Queries, PODS 2020
  9. M. Usaid Awan, Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, and Alexander Volfovsky: Almost-Matching-Exactly for Treatment Effect Estimation under Network Interference, AISTATS 2020
  10. Amir Gilad, Yihao Hu, Daniel Deutch, Sudeepa Roy: MuSe: Multiple Deletion Semantics for Data Repair, PVLDB, Vol 13, demonstration track, 2020
  11. Zhengjie Miao, Tiangang Chen, Alexander Bendeck, Kevin Day, Sudeepa Roy, and Jun Yang: I-Rex: An Interactive Relational Query Explainer for SQL, PVLDB, Vol 13, demonstration track, 2020
  12. Zhengjie Miao, Yuliang Li, Xiaolan Wang, and Wang-Chiew Tan: Snippext: Semi-supervised Opinion Mining with Augmented Data, WWW ’20: Proceedings of The Web Conference 2020, April 2020, Pages 617–628
  13. Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, and Alexander Volfovsky: Adaptive Hyper-box Matching for Interpretable Individualized Treatment Effect Estimation, UAI 2020
  14. David Pujol, Ryan McKenna, Satya Kuppam, Michael Hay, Ashwin Machanavajjhala, and Gerome Miklau. Fair decision making using privacy-protected data. In FAT*’20: Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, January 27-30, 2020, pages 189–199. ACM, 2020.
  15. Xiao Hu and Ke Yi: Massively Parallel Join Algorithms, SIGMOD Record, September 2020 (Vol. 49, No. 3).
  16. Xiao Hu, Shouzhuo Sun, Shweta Patwa, Debmalya Panigrahi and Sudeepa Roy: Aggregated Deletion Propagation for Counting Conjunctive Query Answers, PVLDB, Vol 14, 2020.

2019


  1. Ios Kotsogiannis, Yuchao Tao, Xi He, Maryam Fanaeepour, Ashwin Machanavajjhala, Michael Hay, and Gerome Miklau. PrivateSQL: a differentially private SQL query engine. In VLDB 2019
  2. Stavros Sintos, Pankaj Agarwal, and Jun Yang. Selecting data to clean for fact checking: minimizing uncertainty vs. maximizing surprise. Proceedings of the VLDB Endowment, 12(13):2408-2421, 2019.
  3. Junyang Gao, Xian Li, Yifan Ethan Xu, Bunyamin Sisman, Xin Luna Dong, and Jun Yang: Efficient Knowledge Graph Accuracy Evaluation. In VLDB 2019
  4. Naeemul Hassan, Chengkai Li, Jun Yang, and Cong Yu, ed. Special Issue on Combating Digital Misinformation and Disinformation, ACM Journal of Data and Information Quality, July 2019. 11(3).
  5. Ios Kotsogiannis, Yuchao Tao, Ashwin Machanavajjhala, Gerome Miklau, Michael Hay: Architecting a Differentially Private SQL Engine. In CIDR 2019
  6. Zhengjie Miao, Sudeepa Roy, Jun Yang: Explaining Wrong Queries Using Small Examples. In SIGMOD 2019
  7. Zhengjie Miao*, Qitian Zeng*, Boris Glavic, and Sudeepa Roy: Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances. In SIGMOD 2019
  8. Chang Ge, Xi He, Ihab Ilyas, Ashwin Machanavajjhala: APEx: Accuracy-Aware Privacy Engine for Data Exploration. In SIGMOD 2019
  9. Brett Walenz, Stavros Sintos, Sudeepa Roy, and Jun Yang: Learning to Sample: Counting with Complex Queries, PVLDB Vol 13, 2019.
  10. M. Usaid Awan, Yameng Liu, Marco Morucci, Sudeepa Roy, Cynthia Rudin, and Alexander Volfovsky: Almost Matching Exactly With Instrumental Variables, UAI 2019.
  11. Zhengjie Miao, Qitian Zeng, Chenjie Li, Boris Glavic, Oliver Kennedy, Sudeepa Roy: CAPE: Explaining Outliers by Counterbalancing, PVLDB, Vol 12, demonstration track, 2019.
  12. Zhengjie Miao, Andrew Lee, Sudeepa Roy: LensXPlain: Visualizing and Explaining Contributing Subsets for Aggregate Query Answers, PVLDB, Vol 12, demonstration track, 2019.
  13. Awa Dieng, Yameng Liu, Sudeepa Roy, Cynthia Rudin, and Alexander Volfovsky: Almost-Exact Matching with Replacement for Causal Inference, AISTATS 2019.
  14. Zhengjie Miao, Sudeepa Roy, Jun Yang: RATest: Explaining Wrong Queries Using Small Examples, SIGMOD demonstration track, 2019.
  15. Prajakta Kalmegh, Shivnath Babu, Sudeepa Roy: iQCAR: inter-Query Contention Analyzer for Data Analytics Frameworks, SIGMOD 2019.
  16. Yikai Wu, David Pujol, Ios Kotsogiannis, and Ashwin Machanavajjhala. Answering summation queries for numerical attributes under differential privacy. CoRR, abs/1908.10268, 2019
  17. Matthias Boehm, Arun Kumar, and Jun Yang. Data management in machine learning systems. Morgan & Claypool Publishers, February 2019.

Publication lists for 2018 and earlier may be incomplete. Please see our personal webpages for up-to-date information.

 

2018


  1. Johes Bater, Xi He, William Ehrich, Ashwin Machanavajjhala, Jennie Rogers: ShrinkWrap: Efficient SQL Query Processing in Differentially Private Data Federations. PVLDB 12(3): 307-320 (2018)
  2. Yuhao Wen, Xiaodan Zhu, Sudeepa Roy, Jun Yang: Interactive Summarization and Exploration of Top Aggregate Query Answers. PVLDB, 11 (13): 2196-2208, 2018.
  3. Jun Yang, Pankaj K. Agarwal, Sudeepa Roy, Brett Walenz, You Wu, Cong Yu, and Chengkai Li. Query perturbation analysis: an adventure of database researchers in fact-checking. IEEE Data Engineering Bulletin, 41(3):28-42, 2018. Invited contribution.
  4. Junyang Gao, Pankaj K. Agarwal, Jun Yang. Durable Top-k Queries on Temporal Data. PVLDB, 11 (13): 2223-2235, 2018.

2017


  1. Brett Walenz, Sudeepa Roy, Jun Yang. Optimizing Iceberg Queries with Complex Joins, In SIGMOD 2017.
  2. Ios Kotsogiannis, Ashwin Machanavajjhala, Gerome Miklau (UMass) and Michael Hay (Colgate). Pythia: Data Dependent Differentially Private Algorithm Selection, In SIGMOD 2017
  3. Mayuresh Kunjir, Brandon Fain, Kamesh Munagala, and Shivnath Babu. ROBUS: Fair Cache Allocation for Data-parallel Workloads. In SIGMOD 2017.
  4. Samuel Haney, Ashwin Machanavajjhala, John Abowd (US Census Bureau), Matthew Graham (US Census Bureau), Mark Kutzbach (US Census Bureau) and Lars Vilhuber (Cornell). Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics. In SIGMOD 2017
  5. Ios Kotsogiannis, Elena Zheleva (Univ. Maryland, College Park), and Ashwin Machanavajjhala. Directed Edge Recommendation Systems, WSDM 2017
  6. Yan Chen, Andres Barrientos, Ashwin Machanavajjhala, Jerome Reiter, Is My Model Any Good: Differentially Private Regression Diagnostics, In KAIS 2017
  7. Xi He, Ashwin Machanavajjhala, Cheryl Flynn, Divesh Srivastava, Composing Differential Privacy and Secure Computation: A case study on scaling private record linkage, In ACM CCS 2017
  8. Yan Chen, Ashwin Machanavajjhala, Michael Hay, Gerome Miklau, PeGaSus: Data-Adaptive Differentially Private Stream Processing, In ACM CCS 2017

2016


  1. “Differentially private regression diagnostics”,
    Yan Chen, Ashwin Machanavajjhala, Jerome Reiter (Stats, Duke) and Andres Barrientos (Stats, Duke), ICDM 2016
  2. Botong Huang, Nicholas W. D. Jarrett, Shivnath Babu, Sayan Mukherjee, and Jun Yang. “Cümülön: matrix-based data analytics in the cloud with spot instances.” Proceedings of the VLDB Endowment, 9(3):156-167, 2015.
  3. Sudeepa Roy, Laurel Orr, Dan Suciu: Explaining Query Answers with Explanation-Ready Databases. PVLDB 9(4): 348-359, 2015
  4. Zilong Tan, Shivnath Babu: Tempo: Robust and Self-Tuning Resource Management in Multi-tenant Parallel Databases. PVLDB 9(10): 720-731, 2016.
  5. Brett Walenz and Jun Yang. “Perturbation analysis of database queries.” Proceedings of the VLDB Endowment, 9(14), 2016.

2015


  1. Machanavajjhala, Ashwin, and Daniel Kifer. “Designing statistical privacy for your data.” Communications of the ACM 58, no. 3 (2015): 58-67.
  2. He, Xi, Graham Cormode, Ashwin Machanavajjhala, Cecilia M. Procopiuc, and Divesh Srivastava. “DPT: differentially private trajectory synthesis using hierarchical reference systems.” Proceedings of the VLDB Endowment 8, no. 11 (2015): 1154-1165.
  3. Chen, Yan, and Ashwin Machanavajjhala. “On the Privacy Properties of Variants on the Sparse Vector Technique.” arXiv preprint arXiv:1508.07306 (2015).
  4. Kunjir, Mayuresh, Brandon Fain, Kamesh Munagala, and Shivnath Babu. “ROBUS: Fair Cache Allocation for Multi-tenant Data-parallel Workloads.” arXiv preprint arXiv:1504.06736 (2015).
  5. You Wu, Boulos Harb, Jun Yang, and Cong Yu. 2015. Efficient evaluation of object-centric exploration queries for visualization. Proc. VLDB Endow. 8, 12 (August 2015), 1752-1763. DOI=http://dx.doi.org/10.14778/2824032.2824072

2014


  1. Huang, Botong, Nicholas WD Jarrett, Shivnath Babu, Sayan Mukherjee, and Jun Yang. “Cumulon: Cloud-Based Statistical Analysis from Users Perspective.” IEEE Data Eng. Bull. 37, no. 3 (2014): 77-89.
  2. Wu, You, Pankaj K. Agarwal, Chengkai Li, Jun Yang, and Cong Yu. “Toward computational fact-checking.” Proceedings of the VLDB Endowment 7, no. 7 (2014): 589-600.
  3. Sultana, Ayesha, Norfaeza Hassan, Chengkai Li, Jun Yang, and Cong Yu. “Incremental discovery of prominent situational facts.” In Data Engineering (ICDE), 2014 IEEE 30th International Conference on, pp. 112-123. IEEE, 2014.
  4. Kunjir, Mayuresh, Prajakta Kalmegh, and Shivnath Babu. “Thoth: Towards managing a multi-system cluster.” Proceedings of the VLDB Endowment 7, no. 13 (2014): 1689-1692.
  5. Lim, Harold, and Sarath Babu. “Execution and optimization of continuous windowed aggregation queries.” In Data Engineering Workshops (ICDEW), 2014 IEEE 30th International Conference on, pp. 303-309. IEEE, 2014.
  6. Kum, Hye-Chung, Ashok Krishnamurthy, Ashwin Machanavajjhala, Michael K. Reiter, and Stanley Ahalt. “Privacy preserving interactive record linkage (PPIRL).” Journal of the American Medical Informatics Association 21, no. 2 (2014): 212-220.
  7. Kifer, Daniel, and Ashwin Machanavajjhala. “Pufferfish: A framework for mathematical privacy definitions.” ACM Transactions on Database Systems (TODS) 39, no. 1 (2014): 3.
  8. Raval, Nisarg, Landon Cox, Animesh Srivastava, Ashwin Machanavajjhala, and Kiron Lebeck. “Markit: privacy markers for protecting visual secrets.” In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pp. 1289-1295. ACM, 2014.
  9. He, Xi, Ashwin Machanavajjhala, and Bolin Ding. “Blowfish privacy: Tuning privacy-utility trade-offs using policies.” In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pp. 1447-1458. ACM, 2014.
  10. Haney, Samuel, Ashwin Machanavajjhala, and Bolin Ding. “Answering Query Workloads with Optimal Error under Blowfish Privacy.” arXiv preprint arXiv:1404.3722 (2014).
  11. Stoddard, Ben, Yan Chen, and Ashwin Machanavajjhala. “Differentially Private Algorithms for Empirical Machine Learning.” arXiv preprint arXiv:1411.5428 (2014).

2013


  1. Agarwal, Pankaj K., Lars Arge, Sathish Govindarajan, Jun Yang, and Ke Yi. “Efficient external memory structures for range-aggregate queries.” Computational Geometry 46, no. 3 (2013): 358-370.
  2. Thonangi, Risi, and Jun Yang. “Permuting data on random-access block storage.” Proceedings of the VLDB Endowment 6, no. 9 (2013): 721-732.
  3. Huang, Botong, Shivnath Babu, and Jun Yang. “Cumulon: Optimizing statistical data analysis in the cloud.” In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 1-12. ACM, 2013.
  4. Herodotou, Herodotos, and Shivnath Babu. “A What-if Engine for Cost-based MapReduce Optimization.” IEEE Data Eng. Bull. 36, no. 1 (2013): 5-14.
  5. Babu, Shivnath, and Herodotos Herodotou. “Massively Parallel Databases and MapReduce Systems.” Foundations and Trends in Databases 5, no. 1 (2013): 1-104.
  6. Lim, Harold, Yuzhang Han, and Shivnath Babu. “How to Fit when No One Size Fits.” In CIDR, vol. 4, p. 35. 2013.
  7. Borisov, Nedyalko, and Shivnath Babu. “Rapid experimentation for testing and tuning a production database deployment.” In Proceedings of the 16th International Conference on Extending Database Technology, pp. 125-136. ACM, 2013.
  8. Aboulnaga, Ashraf, and Shivnath Babu. “Workload management for big data analytics.” In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 929-932. ACM, 2013.
  9. Lim, Harold, and Shivnath Babu. “Execution and optimization of continuous queries with cyclops.” In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pp. 1069-1072. ACM, 2013.
  10. Rekatsinas, Theodoros, Amol Deshpande, and Ashwin Machanavajjhala. “SPARSI: partitioning sensitive data amongst multiple adversaries.” Proceedings of the VLDB Endowment 6, no. 13 (2013): 1594-1605.
  11. Chen, Jianjun, Ashwin Machanavajjhala, and George Varghese. “Scalable Social Coordination with Group Constraints using Enmeshed Queries.” In CIDR. 2013.
  12. Ryu, Eunsu, Yao Rong, Jie Li, and Ashwin Machanavajjhala. “curso: protect yourself from curse of attribute inference: a social network privacy-analyzer.” In Proceedings of the ACM SIGMOD Workshop on Databases and Social Networks, pp. 13-18. ACM, 2013.
  13. Rastogi, Vibhor, Ashwin Machanavajjhala, Laukik Chitnis, and Akash Das Sarma. “Finding connected components in map-reduce in logarithmic rounds.” In Data Engineering (ICDE), 2013 IEEE 29th International Conference on, pp. 50-61. IEEE, 2013.
  14. Getoor, Lise, and Ashwin Machanavajjhala. “Entity resolution for big data.” In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1527-1527. ACM, 2013.
  15. Rekatsinas, Theodoros, Amol Deshpande, and Ashwin Machanavajjhala. “On Sharing Private Data with Multiple Non-Colluding Adversaries.” arXiv preprint arXiv:1302.6556 (2013).