Privacy-Enhanced Federated Learning
The Challenge
Today’s mobile devices boast increasingly powerful hardware, enabling data collection and processing at an unprecedented scale. Concurrently, emerging artificial intelligence (AI) techniques have delivered groundbreaking results in computer vision and data analytics. As a result, there is substantial interest in leveraging the wealth of data on these distributed devices to train and refine powerful machine learning (ML) models. Recent years have seen remarkable progress in deploying distributed ML in real-life applications, such as mobile keyboards (Google Gboard), digital health platforms (OWKIN), and the Internet of Vehicles (IoV) in intelligent transportation.
Yet data privacy has become paramount, as exemplified by events such as the Facebook data privacy crisis and the Didi car-pooling privacy breaches, the latter linked to two murder cases. In particular, centralized, easily searchable data repositories escalate the threat of private information leakage, encompassing sensitive data such as health records, travel details, and financial information. Moreover, the burgeoning array of open-data applications has heightened the need for robust privacy safeguards. Even in pure research, using real-life datasets can inadvertently leak information.
In this light, ML methods that share only learning outcomes or parameter updates, rather than users’ sensitive data itself, have gained immense appeal. A noteworthy recent example is federated learning (FL), which trains a global model on a server from users’ locally trained models without direct access to the users’ local data. However, information theory tells us that a curious server can still extract private information from those locally trained models: distributed ML inherently requires some level of information exchange, and that information is necessarily derived from users’ local data. For instance, a malicious entity could intercept and analyze the shared information to reconstruct individual data points from a user’s local data (a reconstruction attack) or to unveil specific attributes of that data (an inference attack). Consequently, mechanisms that mitigate these privacy risks in distributed ML have become a focal point of attention within the AI/ML community.
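To make the workflow concrete, below is a minimal sketch of one FedAvg-style training round: clients train locally and upload only parameter updates, which the server averages. All function and variable names here are illustrative assumptions for exposition, not code from our projects.

```python
import numpy as np

def local_update(global_weights, data, labels, lr=0.1, epochs=1):
    """One client's local training (plain least-squares SGD for illustration)."""
    w = global_weights.copy()
    for _ in range(epochs):
        residual = data @ w - labels            # linear-model prediction error
        grad = data.T @ residual / len(labels)  # gradient of mean squared error
        w -= lr * grad
    return w - global_weights                   # only this update leaves the device

def federated_averaging_round(global_weights, clients):
    """Server step: average client updates without ever seeing raw data."""
    updates = [local_update(global_weights, X, y) for X, y in clients]
    return global_weights + np.mean(updates, axis=0)

# Toy run: three clients, each holding a private local dataset.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(5)
for _ in range(10):
    w = federated_averaging_round(w, clients)
```

Note that the uploaded updates are deterministic functions of the local data, which is precisely why a curious server can attempt the reconstruction and inference attacks described above.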
The Research
Our research group has been proactively studying and developing innovative algorithms that mitigate the privacy risks of information sharing in distributed machine learning (ML) while preserving the quality of the trained AI models. This is a dynamic, ongoing research direction with much uncharted territory awaiting exploration. Of particular significance is differential privacy, which stands out for providing a robust and verifiable privacy assurance: it guarantees individuals in a dataset plausible deniability against any potential privacy incursion. Our work has contributed substantially to private AI by advancing a theoretical framework that characterizes the delicate balance between AI capability and human privacy.
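For reference, the standard (ε, δ) formulation makes this assurance precise: a randomized mechanism M is (ε, δ)-differentially private if, for all neighboring datasets D and D′ differing in a single record, and for every set S of possible outputs,

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ.

Smaller values of ε and δ mean the mechanism’s output reveals less about any single individual, yielding stronger deniability.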
In greater detail, our research activities revolve around the following pivotal themes:
- Understanding Privacy Dynamics: We delve into the intricate landscape of privacy dynamics by scrutinizing a wide range of privacy attack mechanisms, including reconstruction and inference attacks. In tandem, we explore privacy defense mechanisms such as perturbation. Our investigations span various distributed ML algorithms, including but not limited to federated learning and split learning, as well as the diverse realm of differential privacy mechanisms, considering noise injection at different stages: at users, servers, and third-party intermediaries (a client-side perturbation sketch follows this list).
- Exploring the Privacy-Utility Trade-off: We investigate the fundamental trade-off between privacy and utility in distributed ML, evaluating its ramifications across AI architecture alternatives such as synchronous and asynchronous approaches, serverless paradigms leveraging blockchain technology, and neural-network-based ML. This exploration extends to tuning ML parameters, further enriching our understanding of the interplay between privacy and utility; the sketch after this list illustrates the basic tension, namely that tighter privacy budgets demand larger noise.
- Optimizing Differential Privacy: We optimize differential-privacy-based distributed ML across diverse deployment and operational scenarios, including Internet of Things (IoT) devices, mobile platforms, and edge computing systems. Developing algorithms that identify the optimal deployment configuration for privacy-preserving distributed ML is a formidable challenge that we actively pursue.
- Practical Deployment: Our mission extends beyond theoretical innovation; we aim to bridge the gap between research and practice. We are committed to deploying our privacy-preserving distributed ML algorithms on embedded devices that execute intelligent tasks at the network edge, a paradigm relevant to many business domains, including digital health, intelligent agriculture, transportation systems, smart energy, and advanced manufacturing.
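As promised above, here is a minimal sketch of client-side noise injection in the spirit of DP-FedAvg-style algorithms: each client clips its model update to bound sensitivity and adds calibrated Gaussian noise before upload. The clipping norm, budget values, and function names are illustrative assumptions, not our deployed algorithms.

```python
import numpy as np

def gaussian_sigma(epsilon, delta, sensitivity):
    """Textbook Gaussian-mechanism calibration (valid for epsilon < 1):
    sigma = sqrt(2 ln(1.25/delta)) * sensitivity / epsilon."""
    return np.sqrt(2 * np.log(1.25 / delta)) * sensitivity / epsilon

def privatize_update(update, clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=None):
    """Client-side defense: clip the update, then add noise scaled to the clip norm."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))  # bound L2 sensitivity
    sigma = gaussian_sigma(epsilon, delta, clip_norm)
    return clipped + rng.normal(scale=sigma, size=update.shape)

# The privacy-utility trade-off in miniature: tighter budgets force larger noise.
for eps in (0.9, 0.5, 0.1):
    print(f"epsilon={eps}: sigma={gaussian_sigma(eps, 1e-5, 1.0):.2f}")
# epsilon=0.9: sigma=5.38; epsilon=0.5: sigma=9.69; epsilon=0.1: sigma=48.45
```

The same mechanics can instead be applied at the server or by a third-party intermediary; that changes who must be trusted, not the calibration.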
In sum, our research in private AI not only pushes the boundaries of knowledge but also holds the potential to catalyze transformative advances in many real-world applications, ultimately ushering in a new era of privacy-aware, AI-driven innovation.
Related Publications
- Wei, K., Li, J., Ding, M., Ma, C., Jeon, Y. S., & Poor, H. V. (2023). Covert model poisoning against federated learning: Algorithm design and optimization. IEEE Transactions on Dependable and Secure Computing.
- Wei, K., Li, J., Ma, C., Ding, M., Chen, W., Wu, J., … & Poor, H. V. (2023). Personalized Federated Learning with Differential Privacy and Convergence Guarantee. IEEE Transactions on Information Forensics and Security.
- Nguyen, D. C., Pham, Q. V., Pathirana, P. N., Ding, M., Seneviratne, A., Lin, Z., … & Hwang, W. J. (2022). Federated learning for smart healthcare: A survey. ACM Computing Surveys (CSUR), 55(3), 1-37.
- Ma, C., Li, J., Shi, L., Ding, M., Wang, T., Han, Z., & Poor, H. V. (2022). When federated learning meets blockchain: A new distributed learning paradigm. IEEE Computational Intelligence Magazine, 17(3), 26-33.
- Leng, J., Lin, Z., Ding, M., Wang, P., Smith, D., & Vucetic, B. (2022). Client scheduling in wireless federated learning based on channel and learning qualities. IEEE Wireless Communications Letters, 11(4), 732-735.
- Li, J., Shao, Y., Wei, K., Ding, M., Ma, C., Shi, L., … & Poor, H. V. (2021). Blockchain assisted decentralized federated learning (BLADE-FL): Performance analysis and resource allocation. IEEE Transactions on Parallel and Distributed Systems, 33(10), 2401-2415.
- Ma, C., Li, J., Ding, M., Wei, K., Chen, W., & Poor, H. V. (2021). Federated learning with unreliable clients: Performance analysis and mechanism design. IEEE Internet of Things Journal, 8(24), 17308-17319.
- Wei, K., Li, J., Ding, M., Ma, C., Su, H., Zhang, B., & Poor, H. V. (2021). User-level privacy-preserving federated learning: Analysis and performance optimization. IEEE Transactions on Mobile Computing, 21(9), 3388-3401.
- Nguyen, D. C., Ding, M., Pathirana, P. N., Seneviratne, A., Li, J., & Poor, H. V. (2021). Federated learning for internet of things: A comprehensive survey. IEEE Communications Surveys & Tutorials, 23(3), 1622-1658.
- Nguyen, D. C., Ding, M., Pham, Q. V., Pathirana, P. N., Le, L. B., Seneviratne, A., … & Poor, H. V. (2021). Federated learning meets blockchain in edge computing: Opportunities and challenges. IEEE Internet of Things Journal, 8(16), 12806-12825.
- Liu, B., Ding, M., Shaham, S., Rahayu, W., Farokhi, F., & Lin, Z. (2021). When machine learning meets privacy: A survey and outlook. ACM Computing Surveys (CSUR), 54(2), 1-36.
- Ma, C., Li, J., Ding, M., Yang, H. H., Shu, F., Quek, T. Q., & Poor, H. V. (2020). On safeguarding privacy and security in the framework of federated learning. IEEE Network, 34(4), 242-248.
- Nguyen, D. C., Cheng, P., Ding, M., Lopez-Perez, D., Pathirana, P. N., Li, J., … & Poor, H. V. (2020). Enabling AI in future wireless networks: A data life cycle perspective. IEEE Communications Surveys & Tutorials, 23(1), 553-595.
- Wei, K., Li, J., Ding, M., Ma, C., Yang, H. H., Farokhi, F., … & Poor, H. V. (2020). Federated learning with differential privacy: Algorithms and performance analysis. IEEE Transactions on Information Forensics and Security, 15, 3454-3469. [IEEE Signal Processing Society Best Paper Award in 2022; Google citation count: 970 as of 13/09/2023]