Sequence data and annotations

The NBDL aims to deliver essential reference data to support taxonomic identification, particularly of DNA fragments derived from environmental samples. Central to this aim is the provision of high-quality reference sequences against which unknown sequences can be identified using molecular techniques. To generate sequence data efficiently across the vast diversity of Australian species, we have developed a compact, high-throughput genome skimming platform that can capture DNA sequences from organisms across the tree of life, including microbes, fungi, plants, insects, mammals and birds.

For animals, we focus on sequencing the mitochondrial genome and nuclear ribosomal markers. For plants and macroalgae, we focus on the plastid (chloroplast) genome together with nuclear ribosomal markers.

Our bioinformatic workflow provides a reproducible way to generate high-quality genomic datasets from short-read sequencing data. It begins by validating and standardising sample metadata, then assembles reads into contigs using reference sequences as a guide. The resulting assemblies are annotated and assessed for completeness and reliability. The pipeline also constructs gene-based and whole-genome phylogenies, which place each sequence in its evolutionary context and support our review process. As part of this review, and before any sequence is released, taxonomic experts at the collection that holds the specimen check that the sequence is consistent with the specimen's known identity.
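To illustrate what a completeness assessment can involve for an animal mitochondrial genome, the following is a minimal sketch, not the NBDL pipeline's actual implementation. It assumes that annotation yields a list of gene names and scores an assembly by the fraction of an expected gene set that was recovered. The expected set (the 13 protein-coding genes and 2 rRNA genes found in most animal mitogenomes), the alias table, and the names `assess_completeness` and `CompletenessReport` are all hypothetical choices made for this example. Some lineages lack genes such as ATP8, so a real check would need taxon-specific expectations.

```python
from dataclasses import dataclass

# Hypothetical expected set: the 13 protein-coding genes and 2 rRNA genes
# present in most animal mitochondrial genomes (tRNAs omitted for brevity).
EXPECTED_MITO_GENES = frozenset({
    "ATP6", "ATP8", "COX1", "COX2", "COX3", "CYTB",
    "ND1", "ND2", "ND3", "ND4", "ND4L", "ND5", "ND6",
    "RRNS", "RRNL",
})

# Hypothetical alias table mapping common alternative names to one canonical form.
ALIASES = {
    "COI": "COX1", "COII": "COX2", "COIII": "COX3", "COB": "CYTB",
    "12S": "RRNS", "16S": "RRNL",
}


@dataclass(frozen=True)
class CompletenessReport:
    found: frozenset
    missing: frozenset
    fraction: float


def normalise(name: str) -> str:
    """Upper-case a gene name and resolve it through the alias table."""
    key = name.strip().upper()
    return ALIASES.get(key, key)


def assess_completeness(annotated_genes, expected=EXPECTED_MITO_GENES) -> CompletenessReport:
    """Report which expected genes were annotated and the fraction recovered.

    Duplicate or aliased names (e.g. "COX1" and "COI") count once, and
    names outside the expected set are ignored.
    """
    names = {normalise(g) for g in annotated_genes}
    found = frozenset(names & expected)
    missing = frozenset(expected - names)
    return CompletenessReport(found, missing, len(found) / len(expected))


if __name__ == "__main__":
    report = assess_completeness(["cox1", "COII", "cytb", "16S"])
    print(sorted(report.found), f"{report.fraction:.2f}")
```

Normalising names before comparison matters because annotation tools and databases use different conventions for the same gene (for example COI versus COX1), and an unnormalised comparison would under-report completeness.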