Bioprediction

Transforming biological production systems.

This activity aimed to use MLAI to transform biological production and breeding systems by developing, extending and leveraging MLAI approaches to advance analytical frontiers in areas with direct applications to animal and plant breeding. It was a vertical activity, spanning everything from domain-specific knowledge of biology and genomic structure to the development of MLAI methods that capture uncertainty from high-dimensional, low-sample-size data. Its outputs are driven by use cases in crops, aquaculture, and livestock.

Challenges

The following analytical challenges were identified as being of high interest for this activity:

(integration) Data integration: exploitation of multi-layer data, uncertainty, bias, automated annotation
(structure) Algorithm development: exploitation of known structure (pedigree, linkage disequilibrium (LD)) in input data, utilisation of domain knowledge in algorithm development
(biology) Biological inference: translating ML outputs into biological understanding, such as interactions (GxG, GxE) and pathways, that can be used to design interventions in biological systems
(genomes) ML with graph analytics for “omic” applications (pangenomes, transcriptomic graphs)
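
As an illustration of the "structure" challenge above, one common way to expose relatedness between individuals to a downstream model is a genomic relationship matrix computed from SNP genotypes. The sketch below is a minimal, assumed example and not the method used in this activity. The function name, the 0/1/2 genotype coding and the toy data are all hypothetical, and the formula is the widely used VanRaden-style estimator.

```python
import numpy as np


def genomic_relationship_matrix(genotypes):
    """VanRaden-style relationship matrix (illustrative sketch only).

    genotypes: (individuals x SNPs) array coded as 0/1/2 copies of one allele.
    This coding is an assumption made for the example.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0             # estimated allele frequency per SNP
    Z = M - 2.0 * p                      # centre each SNP column at its mean
    denom = 2.0 * np.sum(p * (1.0 - p))  # scales G towards a pedigree-like range
    if denom == 0:
        raise ValueError("all SNPs are monomorphic; relationships are undefined")
    return Z @ Z.T / denom


# Hypothetical toy data: 3 individuals x 4 SNPs.
# Individuals 0 and 1 share a genotype; individual 2 is the opposite homozygote.
toy = np.array([
    [0, 1, 2, 1],
    [0, 1, 2, 1],
    [2, 1, 0, 1],
])
G = genomic_relationship_matrix(toy)
print(np.round(G, 3))
```

In this toy case the two identical individuals are as related to each other as each is to itself, while the third individual has a negative relationship with both. A matrix like this could then serve as a kernel or covariance structure in a prediction model, which is one way known family structure can be built into an algorithm rather than left for the model to learn.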

Use Cases

We focused on three major use cases:

Crops – The main intended outcome of this use case was to improve CSIRO’s research expertise in plant breeding. The data included 100s to 1000s of genotypes, along with adaptive and developmental traits and yield and productivity traits. The family structure / pedigree information was known, and predictive tools needed to include ‘omic’ data layers such as SNPs, transcriptomes and proteomes. A key question was the environmental response, based on controlled and field conditions, over multiple sites and years.
Aquaculture – Breeding for aquaculture has data with 1000s of genotypes and biometric traits, including diseases. The family structure / pedigree was known, and researchers were interested in understanding the effect of SNPs as well as microbiomes on yield.
Livestock – CSIRO has a large database of 20,000 animals, including data on growth, carcass traits, fertility, and behaviour, with known family structure / pedigree. The data integration challenge involved 40 million SNPs, transcriptomes, and the metabolome.