Bioinformatics for Environomics

This project is now complete and you can read the abstract of the final report here.

Many of our research projects depend on high-throughput DNA sequencing of either environmental samples or the specimens in our biological collections. Our bioinformatics team is developing new tools and methods to solve the unique data analysis challenges created by high-throughput sequencing.

Researchers in the laboratory working with samples. Credit: CSIRO, Parkville.

High-throughput sequencing of biological collections involves working with degraded or fragmented DNA from specimens that are up to 150 years old. Bioinformatics challenges include assembling and annotating the resulting DNA sequence fragments, as well as dealing with the complexities posed by pseudogenes and nuclear-integrated mitochondrial DNA segments. Ultimately, we aim to provide reference genomic DNA sequences for many of the 16 million biological specimens in the National Research Collections Australia.

We are also exploring many exciting research opportunities in high-throughput sequencing of environmental DNA (eDNA), including estimating species abundance, assembling genomes, sparse matrix analysis, visualisation, and linking phenotype, space, time and abundance.

Project lead: Dr Annette McGrath

Banner image credit: https://www.flickr.com/photos/libertasacademica/6936924345/ under a Creative Commons Attribution 2.0 licence. Full terms at https://creativecommons.org/licenses/by/2.0/