This activity aims to transform biological production and breeding systems by developing, extending and leveraging MLAI approaches to advance analytical frontiers in areas with direct applications to animal and plant breeding. It is a vertical activity, spanning from domain-specific knowledge of biology and genomic structure through to the development of MLAI methods that capture uncertainty in high-dimensional, low-sample-size data. Its outputs will be driven by use cases in crops, aquaculture, and livestock.
The following analytical challenges were identified as being of high interest for this activity:
- (integration) Data integration: exploitation of multi-layer data, uncertainty, bias, automated annotation
- (structure) Algorithm development: exploitation of known structure (pedigree, LD) in input data, use of domain knowledge in algorithm development
- (biology) Biological inference: translating ML outputs into biological understanding (e.g. GxG and GxE interactions, pathways) that can inform interventions in biological systems
- (genomes) ML with graph analytics for “omic” applications (pangenomes, transcriptomic graphs)
We focus on three major use cases:
- Crops – The main outcome of this use case is to improve CSIRO’s research expertise in plant breeding. The data include 100s to 1000s of genotypes, with adaptive and developmental traits as well as yield and productivity traits. The family structure/pedigree is known, and predictive tools need to include ‘omic’ data layers (SNPs, transcriptomes, proteomes). A key question is the environmental response, based on controlled and field conditions, over multiple sites and years.
- Aquaculture – Breeding data for aquaculture cover 1000s of genotypes and biometric traits, including disease. The family structure/pedigree is known, and researchers are interested in understanding the effect of SNPs and microbiomes on yield.
- Livestock – CSIRO has a large database of 20,000 animals, including data on growth, carcass traits, fertility, and behaviour, with known family structure/pedigree. The data integration challenge involves 40 million SNPs, transcriptomes, and the metabolome.