Discrete Distribution Estimation with Local Differential Privacy: A Comparative Analysis

June 3rd, 2021

Date and Time: 13th May 2021, 3pm-4pm AEST

Recording: https://webcast.csiro.au/#/videos/3983e557-1d55-417d-9463-47cf6c6128f1


Speaker: Dr Ba Dung Le

Ba Dung Le is a Postdoctoral Research Fellow in Cyber Security at Charles Sturt University, NSW. He received his Ph.D. in Computer Science from the University of Adelaide. His research interests include Machine Learning algorithms, particularly Data Clustering, and applications of Machine Learning to Cybersecurity, such as Cyber Threat Detection and Data Security and Privacy. He currently focuses on developing efficient privacy-preserving techniques for the statistical aggregation of user data.


Local differential privacy is a promising privacy-preserving model for statistical aggregation of user data that prevents user privacy leakage from the data aggregator. This paper focuses on the problem of estimating the distribution of discrete user values under local differential privacy. We review the existing discrete distribution estimation algorithms and present a comparative analysis of their accuracy on benchmark datasets. Our evaluation benchmarks include real-world and synthetic datasets of categorical individual values, with the number of individuals ranging from hundreds to millions and domain sizes of up to a few hundred values. The experimental results show that the Basic RAPPOR algorithm generally performs best on the benchmark datasets in the high privacy regime, while the k-RR algorithm often gives the best estimate in the low privacy regime. In the medium privacy regime, the k-RR, k-subset, and HR algorithms perform comparably to one another and generally better than the Basic RAPPOR and CMS algorithms.
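To make the estimation problem concrete, here is a minimal sketch of k-RR (k-ary randomized response), one of the algorithms compared in the talk. It follows the standard textbook definition: each user reports their true value with probability e^ε / (e^ε + k - 1) and otherwise a uniformly chosen different value, and the aggregator debiases the observed frequencies. The function names `krr_perturb` and `krr_estimate`, the toy distribution, and the parameter choices are illustrative assumptions, not code or settings from the paper.

```python
import math
import random
from collections import Counter


def krr_perturb(value, k, epsilon, rng):
    """Client side: randomize one value from {0, ..., k-1} with k-RR."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return value
    # Otherwise pick uniformly among the k-1 values different from `value`.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def krr_estimate(reports, k, epsilon):
    """Aggregator side: unbiased frequency estimates from k-RR reports."""
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)    # probability of reporting the true value
    q = 1.0 / (e + k - 1)  # probability of reporting one specific other value
    counts = Counter(reports)
    # E[observed_freq(v)] = q + (p - q) * true_freq(v), so invert that map.
    return [(counts.get(v, 0) / n - q) / (p - q) for v in range(k)]


if __name__ == "__main__":
    rng = random.Random(42)
    k, epsilon = 4, 1.0
    # Hypothetical skewed toy distribution over 4 categories.
    true_values = rng.choices(range(k), weights=[0.5, 0.3, 0.15, 0.05], k=100_000)
    reports = [krr_perturb(v, k, epsilon, rng) for v in true_values]
    print(krr_estimate(reports, k, epsilon))
```

Individual estimates can fall slightly below 0 or above 1 because the estimator is unbiased rather than constrained, so a practical pipeline would typically clip or project them onto the probability simplex. Smaller ε (higher privacy) shrinks the gap p - q, which inflates the variance of the estimates; this is the kind of trade-off the comparative analysis measures across privacy regimes.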