Our data generation

With approximately 150,000 named species in scope, building the NBDL is a monumental task that requires us to process many thousands of individual specimens each year.

This has been made possible by protocols developed at CSIRO’s National Research Collections Australia that miniaturise and automate high-throughput, shallow genome sequencing (also known as ‘genome skimming’). With the help of cutting-edge liquid-handling equipment and robotics, we can prepare our DNA libraries for sequencing quickly and efficiently while minimising costs. This approach works at scale for almost any organism or specimen age, and for most preservation methods. What began as an innovative project in the Australian National Insect Collection has now become a transformative part of everyday operations here at CSIRO.

To process these data, we have developed a reproducible bioinformatic workflow that converts raw DNA sequencing data from thousands of specimens into high-quality, project-ready genomic resources. Our workflow uses a combination of careful manual curation and automated processes to validate and standardise sample metadata, assemble DNA sequences, identify and scaffold genomic targets, and annotate genes. We use phylogenetic analyses to corroborate evidence of sequence identity and annotation quality among specimens and against published and verified references. 
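
As a rough illustration of the metadata validation and standardisation step described above, the following Python sketch tidies a single sample record and reports any problems. It is not our production code: the field names (`specimen_id`, `scientific_name`, `collection`), the function names and the specific checks are all assumptions made for this example.

```python
# Hypothetical required fields; a real workflow would define its own schema.
REQUIRED_FIELDS = ("specimen_id", "scientific_name", "collection")


def standardise_record(raw: dict) -> dict:
    """Lower-case and trim field names, and collapse whitespace in values."""
    return {
        str(key).strip().lower(): " ".join(str(value).split())
        for key, value in raw.items()
    }


def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)
    ]
    name = record.get("scientific_name", "")
    # Assumed check: a species-level name has at least a genus and an epithet.
    if name and len(name.split(" ")) < 2:
        problems.append("scientific_name is not at least a binomial")
    return problems
```

Records that come back with an empty problem list could continue to automated processing, while the rest could be set aside for manual curation.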

We then work with taxonomic experts from our partner collections to conduct a thorough review of the taxonomic, specimen and DNA sequence data. The outcomes of this multi-step review give us a high level of confidence in the quality of our data releases. Throughout our workflow and in our data outputs, we strive to adhere to applicable standards and best practices for nomenclature, file formats and data stewardship.

You can read more about the benefits of generating data from collection specimens, including reference data integrity and the extended specimen concept, as well as the hosting of additional data in the NBDL, in our FAQs.