Privacy and Machine Unlearning

The Challenge

As computational resources and data availability have grown, Machine Learning (ML) models have been widely adopted in sectors such as healthcare and finance for predictive analytics and informed decision-making. However, static models that do not adapt after their initial training can become outdated as the data landscape evolves, prompting a shift towards more adaptable ML models. In parallel, data privacy legislation around the world can require ML models to be retrained when their training data changes, and similar needs arise from resilience, bias mitigation, and uncertainty concerns.

“Machine unlearning” offers a solution: it enables ML models to efficiently forget or exclude specific data without a considerable loss in performance, providing an alternative to retraining from scratch. This emerging area of AI and ML concentrates on creating models that can remove specific knowledge or data, addressing pressing concerns about data privacy, model robustness, and system updates. Techniques range from full retraining, which serves as the baseline, to more targeted methods that selectively alter the model’s knowledge base.
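
As one illustration of how a model can forget data without full retraining, the sketch below partitions the training set into shards, trains one small model per shard, and retrains only the affected shard when a sample must be removed. This follows the general idea of shard-based unlearning from the literature and is not a method described on this page. The names `train_centroids` and `ShardedEnsemble`, the toy nearest-centroid model, and the data are all hypothetical choices made for brevity.

```python
from collections import Counter


def train_centroids(samples):
    """Fit a toy nearest-centroid model: mean feature value per label."""
    sums, counts = Counter(), Counter()
    for x, label in samples:
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


def predict_one(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


class ShardedEnsemble:
    """Hypothetical shard-based ensemble; one model per data shard."""

    def __init__(self, samples, num_shards):
        # Round-robin assignment of samples to shards (each shard is a copy).
        self.shards = [samples[i::num_shards] for i in range(num_shards)]
        self.models = [train_centroids(shard) for shard in self.shards]
        self.retrain_count = 0

    def predict(self, x):
        # Majority vote over the per-shard models.
        votes = Counter(predict_one(model, x) for model in self.models)
        return votes.most_common(1)[0][0]

    def unlearn(self, sample):
        # Retrain only the shard that contained the sample.
        # Assumption: the shard still holds data afterwards; empty shards
        # are not handled in this sketch.
        for i, shard in enumerate(self.shards):
            if sample in shard:
                shard.remove(sample)
                self.models[i] = train_centroids(shard)
                self.retrain_count += 1
                return i
        raise ValueError("sample not found in any shard")


# Toy data: (feature, label). The sample (5.0, "a") is the one to forget.
data = [
    (0.1, "a"), (0.2, "a"), (0.3, "a"),
    (0.9, "b"), (0.8, "b"), (0.7, "b"),
    (0.0, "a"), (5.0, "a"), (1.0, "b"),
]
ensemble = ShardedEnsemble(data, num_shards=3)
shard_id = ensemble.unlearn((5.0, "a"))
print("retrained shard:", shard_id, "retrain count:", ensemble.retrain_count)
```

Because each shard model depends only on its own shard, the retrained shard matches what training from scratch without the sample would produce for that shard, while the other shard models are left untouched. The trade-off, under these assumptions, is that each model sees less data than a single model trained on everything.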

Machine unlearning is pivotal in shaping responsible and responsive AI, and is commonly divided into “exact” and “approximate” unlearning. Exact unlearning ensures total removal of the data’s influence from a model, as if the data had never been used in training; approximate unlearning is more efficient but less precise. Each approach presents its own challenges, particularly in verification and in susceptibility to adversarial attacks. As these techniques become established in contemporary AI systems, research into their verification becomes critical to ensuring the ethical application of AI in ever-shifting data contexts.
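
To make the exact/approximate distinction concrete, the following sketch compares the two on a one-parameter, L2-regularized logistic regression. Retraining on the retained data stands in for exact unlearning, and a single Newton step from the original weight stands in for approximate unlearning. The Newton-step update is an assumption chosen for illustration (in the spirit of influence-style removal methods), not a technique prescribed by this page; the data, `LAM`, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_and_hess(w, data, lam):
    """Gradient and Hessian of sum(log(1 + exp(-y*x*w))) + lam/2 * w**2."""
    g, h = lam * w, lam
    for x, y in data:  # labels y are -1 or +1
        p = sigmoid(y * x * w)  # probability given to the correct label
        g += -y * x * (1.0 - p)
        h += x * x * p * (1.0 - p)
    return g, h


def newton_step(w, data, lam):
    g, h = grad_and_hess(w, data, lam)
    return w - g / h


def train(data, lam, steps=50):
    """Train from scratch with Newton iterations starting at w = 0."""
    w = 0.0
    for _ in range(steps):
        w = newton_step(w, data, lam)
    return w


LAM = 1.0
full_data = [(1.0, 1), (0.8, 1), (-0.9, -1), (-0.7, -1), (0.5, 1), (0.6, -1)]
forget = (0.6, -1)  # the sample whose influence should be removed
retained = [s for s in full_data if s != forget]

w_full = train(full_data, LAM)                 # original model
w_retrain = train(retained, LAM)               # exact: retrain without the sample
w_approx = newton_step(w_full, retained, LAM)  # approximate: one cheap update

# A simple verification signal: distance to the retrained model.
print("original    gap:", abs(w_full - w_retrain))
print("approximate gap:", abs(w_approx - w_retrain))
```

The approximate update costs one pass over the retained data instead of a full training run, and it lands closer to the retrained model than the original weight does, but it does not coincide with it. Measuring that residual gap is one simple form of the verification problem mentioned above; in practice a retrained reference model is usually unavailable, which is part of what makes verification difficult.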

The Research

With the growing significance of machine unlearning, we aim to conduct comprehensive research on the current landscape of this field, as illustrated in the following figure. Our goal is to offer a systematic perspective on ongoing advancements in this area and to highlight its critical role in the constantly evolving machine learning ecosystem.
[Figure: unlearning_diagram-1.png]
Details of the research themes include:
  • Federated Machine Unlearning: This theme builds upon Federated Learning, which decentralizes machine learning by training on multiple devices or servers without centralizing the data. In traditional settings, models are trained on data pooled at a central location. Federated Learning instead trains models at the data source, be it individual devices or local servers. Federated Machine Unlearning extends this paradigm by incorporating mechanisms to “unlearn” or forget specific data points or patterns across the federated nodes. The advantage is twofold: it respects user privacy because individual data is never moved or exposed, and it provides a distributed mechanism to roll back or exclude particular data influences, which is especially important if a data source is compromised or found to be unreliable.
  • Fairness Issues in Machine Unlearning: As machine learning models aim to make fair, unbiased decisions, unlearning adds another layer of complexity to the fairness discourse. When data is removed or modified during unlearning, there is a risk of inadvertently introducing or exacerbating biases in the model. If the “unlearned” data pertains to a particular group, demographic, or scenario, its removal could skew the model’s understanding and predictions. The fairness issue is not only about the data being removed but also about the data that remains and how it influences the model’s outputs. This theme emphasizes the need for rigorous checks and balances when implementing machine unlearning, so that the pursuit of data privacy and adaptability does not come at the cost of fairness and equity in AI systems.
  • Machine Unlearning for Large Language Models: Large language models, such as GPT and BERT, are revolutionizing natural language processing through their vast knowledge bases and fine-tuning abilities. However, their size and complexity present unique challenges for machine unlearning. Removing or adjusting specific information without retraining the model entirely is a daunting task, given the intricate interdependencies within their neural architectures. The need for unlearning in language models can arise for various reasons: to delete outdated information, correct erroneous beliefs, or ensure the model does not perpetuate harmful biases. This research theme explores methodologies and techniques tailored to these very large models, aiming to make them more adaptive and responsible without compromising their capabilities.
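
For the federated theme, a minimal sketch of excluding one participant’s influence is given below. It assumes a single round of weighted federated averaging in which the server keeps each client’s submitted update, so a client can be dropped by re-aggregating without it. The `FederatedServer` class and its methods are hypothetical and do not correspond to any particular framework.

```python
class FederatedServer:
    """Toy server for ONE round of weighted federated averaging.

    Assumption: the server stores every client's update (weights and
    sample count). Raw client data is never sent to the server.
    """

    def __init__(self):
        self.updates = {}  # client_id -> (weights, num_samples)

    def submit(self, client_id, weights, num_samples):
        self.updates[client_id] = (list(weights), num_samples)

    def aggregate(self):
        # Sample-weighted average of the stored client weights.
        # Assumption: at least one client update is present.
        total = sum(n for _, n in self.updates.values())
        dim = len(next(iter(self.updates.values()))[0])
        return [
            sum(w[j] * n for w, n in self.updates.values()) / total
            for j in range(dim)
        ]

    def unlearn_client(self, client_id):
        # Drop the client's stored update and re-aggregate the rest.
        del self.updates[client_id]
        return self.aggregate()


server = FederatedServer()
server.submit("A", [1.0, 2.0], num_samples=1)
server.submit("B", [3.0, 4.0], num_samples=1)
server.submit("C", [100.0, -100.0], num_samples=2)  # unreliable source

global_before = server.aggregate()
global_after = server.unlearn_client("C")
print("before:", global_before)
print("after: ", global_after)
```

This only removes a client cleanly because there is a single round. In multi-round training, later updates from every client depend on earlier global models that already contain the removed client’s influence, so simple re-aggregation is no longer sufficient, and storing per-client updates carries its own storage and privacy costs.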

Related Publications

  1. Youyang Qu, Xin Yuan, Ming Ding, Wei Ni, Thierry Rakotoarivelo, and David Smith. “Learn to Unlearn: A Survey on Machine Unlearning.” (arXiv version available, to be online soon).