Project 9

August 5th, 2023

Controversy-Aware Machine (Un)learning for an AI-based Science CoPilot

Project location:

Eveleigh (NSW) and/or Clayton (VIC)

Desirable skills:

• A degree in computer science, information technology, or a related area.
• Proficiency in programming languages such as Python and experience in developing and implementing AI models and algorithms.
• Proficiency in machine learning techniques and algorithms.
• Experience in working with large datasets and data pre-processing.

Supervisory project team:

Ming Ding, Youyang Qu, Thierry Rakotoarivelo, James Hoang, Zhenchang Xing and Sherry Xu

Contact person:

Principal Research Scientist, Data61

Project description:

Unprecedented leaps in data volume and AI/ML advances in the last few years have brought closer the reality of an AI laboratory assistant for scientists, also known as a Science CoPilot, much like the J.A.R.V.I.S. assistant in famous blockbuster movies. One much-needed feature of such a Science CoPilot will be the ability to sift through the rapidly growing pool of scientific knowledge and associated materials, and to distil, summarise and suggest pertinent facts, insights, theories, results, etc. that are relevant to a scientist’s research topic. The sheer magnitude of this information landscape, the multidisciplinary nature of modern science, and the dynamics of the Scientific Process make such a Knowledge Synthesis feature highly challenging for an AI-based Science CoPilot. On that last point, we identified three specific challenges resulting from Science’s nature to constantly evolve, iterate, accumulate, and update itself.

Privacy Considerations

Most research domains demand stringent ethical adherence, especially when potential privacy violations or personal identification concerns arise. The Science CoPilot must uphold these ethical guidelines, safeguarding the integrity of research outcomes while utilizing sensitive data. Machine unlearning techniques can be employed to remove sensitive data samples from datasets, or to remove their influence from pre-trained AI models.


Adaptable Research Paradigms

As scientific inquiries are highly dynamic, consensus and controversies constantly shift and arise, both at the forefront of research and on older topics, e.g. novel tools may lead to new experiments which invalidate or strengthen past scientific consensus. Thus, new contradictory findings can introduce ambiguity to an AI-driven CoPilot. Machine unlearning techniques can help recalibrate its knowledge base in response to scientific consensus shifts and paradigm-challenging findings, ensuring that the Science CoPilot stays accurate and up to date with contemporary research.

Avoidance of Model Overfitting

Continual learning from an expanding dataset presents the risk of model overfitting, wherein the AI CoPilot’s performance degrades as it becomes hyper-specialized to outdated or marginal data (i.e. pieces of information in the large corpus of scientific knowledge). Machine unlearning can safeguard against this risk by enabling the AI to unlearn or downplay information that is less relevant or useful with regard to the current scientific consensus.


The goal of this PhD project is to study the three challenges above, and to explore, develop, and evaluate Machine Unlearning approaches to address them. Through this research, we seek to contribute to the emergence of AI-based Science CoPilots, which would enhance research accuracy, reproducibility, and adherence to ethical guidelines, while remaining at the forefront of scientific consensus.

Research Activities
– Year 1: Foundation, Literature Review, Technical Exploration (data, parameters, trade-offs), Use-case/Scenario Design, Problem Formulation, and Publications.
– Year 2: Solution Design, Algorithm and Prototype Design, Testing and Evaluation, Integration with other D61 (or third party, as relevant) Prototypes for AI-Based CoPilot, and Publications.
– Year 3: Deployment in a Real-World Trial (potentially as part of a larger integrated offering), Refinement and Iteration, Thesis Manuscript and Defence Preparation.

Expected Outcomes

• Research publications in target venues.
• A prototype and demonstration of the proposed approach, either as a standalone feature or as part of an integrated AI-based CoPilot with other D61 projects and PhD students.
• Collaboration with D61 and CSIRO groups, specifically cross-disciplinary teams from which use-cases of scientific inquiries would be sourced.