FAQs

What is the NBDL?

The National Biodiversity DNA Library (NBDL) is an initiative to generate the comprehensive reference sequences needed to identify Australia’s biodiversity from DNA. It is a collaboration between CSIRO, Australia’s research collections and a range of philanthropic and government agencies. The NBDL will generate data from expertly identified specimens held in Australia’s museums and herbaria. This will bring new capability to biodiversity research and enable the uptake of new technologies that describe and detect changes in biodiversity from the DNA that organisms leave behind in the environment. The NBDL will launch as a digital platform, with its first data available in 2025.

What is its purpose?

Its purpose is to transform environmental monitoring by enabling faster, more accurate and more scalable ways to identify species. More specifically, it aims to enable Australian species to be identified with high accuracy from their unique DNA sequences, whether through environmental DNA (eDNA), DNA barcoding or other DNA-based diagnostic techniques.

Who is it for?

The NBDL is open access, which means anyone can use it. The most immediate users are likely to be environmental scientists, but its impact will be felt across the agricultural, health and taxonomic sectors. We also anticipate many users from education.

Can it be used commercially?

Commercial entities may use the NBDL to generate results, but NBDL data cannot be sold commercially. Users must give full attribution to the NBDL and comply with the relevant data reuse licences.

How do I use it?

Using the NBDL platform, you will be able to browse, view and download available sequence data, and view information about the specimens from which the sequences were derived. If you have obtained one or more DNA sequences from a specimen or environmental sample of interest, you will be able to query them against the taxonomic datasets available in the NBDL, either online with our inbuilt query tool, which uses the BLAST algorithm, or offline by downloading the relevant sequences from the NBDL and using an analysis platform of your choice.
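For the offline route, the basic idea of comparing a query against downloaded reference sequences can be illustrated with a short Python sketch. This is not BLAST and not the NBDL query tool; it is a deliberately crude stand-in. It assumes the downloaded sequences are in FASTA format (an assumption, not a documented NBDL export format) and ranks each reference by the fraction of the query's k-mers (overlapping substrings of length k) that it contains. The function names and the default k = 8 are hypothetical choices for illustration only.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Crude illustration only: NOT BLAST and NOT the NBDL query tool.
# Assumes reference sequences were downloaded in FASTA format.


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {header: sequence}."""
    records: Dict[str, str] = {}
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Save the previous record before starting a new one.
            if header is not None:
                records[header] = "".join(chunks).upper()
            header = line[1:].strip()
            chunks = []
        else:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_references(query: str, refs: Dict[str, str], k: int = 8) -> List[Tuple[str, float]]:
    """Score each reference by the fraction of query k-mers it contains, best first."""
    q = kmers(query.upper(), k)
    if not q:
        raise ValueError("query is shorter than k")
    scores = [(name, len(q & kmers(seq, k)) / len(q)) for name, seq in refs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

To try it on a downloaded file, you could open it and pass the handle to `read_fasta` (for example `with open("nbdl_download.fasta") as fh: refs = read_fasta(fh)`, where the file name is hypothetical), then call `rank_references` with your query sequence. For real identifications, an established tool such as BLAST handles mismatches, gaps and statistical significance, which this sketch does not.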

What kinds of DNA sequences does the NBDL contain? Why doesn’t the NBDL include just COI?

The NBDL targets DNA sequences suited to the taxonomic identification of unknown material, with a focus on overcoming the significant reference data challenges of interpreting the very small fragments of DNA recovered from the environment. For animals, this means the NBDL is generating full (or mostly complete) mitochondrial genome sequences and nuclear ribosomal markers; for plants and macroalgae, it is generating full (or mostly complete) plastid (chloroplast) genome sequences and nuclear ribosomal markers. We won’t host data solely from the traditional COI barcoding region unless it is part of more comprehensive data. Our goal is to host data that can match many primer sets, rather than being limited to a single marker.
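Why does a full genome reference "match many primer sets"? Whatever primer pair an eDNA study used, the corresponding region can be located in a sufficiently complete reference, whereas a COI-only reference can serve only COI primers. The minimal Python sketch below illustrates this with exact, degenerate-aware primer matching (a simplified form of in silico PCR). It is an assumption-laden toy, not an NBDL tool: it searches one strand only, allows no mismatches, and the primers and sequences in any example are made up. The function names are hypothetical.

```python
import re
from typing import Optional

# IUPAC nucleotide codes expanded to regular-expression character classes,
# so degenerate primer positions (e.g. R = A or G) can match either base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> str:
    """Turn a primer into a regex pattern, expanding degenerate bases."""
    return "".join(IUPAC[base] for base in primer.upper())


def in_silico_amplicon(template: str, fwd: str, rev: str, max_len: int = 2000) -> Optional[str]:
    """Return the first region spanning fwd primer to reverse-primer site, or None.

    Simplifications: exact matches only, the given strand only, and
    overlapping forward-primer sites are not considered.
    """
    template = template.upper()
    # The reverse primer binds the opposite strand, so on this strand we
    # look for its reverse complement downstream of the forward site.
    rev_site = re.compile(primer_regex(revcomp(rev)))
    for fwd_hit in re.finditer(primer_regex(fwd), template):
        rev_hit = rev_site.search(template, fwd_hit.end())
        if rev_hit and rev_hit.end() - fwd_hit.start() <= max_len:
            return template[fwd_hit.start():rev_hit.end()]
    return None
```

Run against a complete mitochondrial or plastid genome, a function like this would return a region for each primer pair whose binding sites are present; run against a short COI-only reference, it returns `None` for any primer pair targeting another marker. Dedicated in silico PCR tools also handle both strands and primer mismatches.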

What is so special about NBDL sequences?

The NBDL offers authoritative sequences generated from specimens that have been identified by taxonomic experts and are held in Australian public research collections (e.g. Australian museums and herbaria), where they remain available for further verification if needed. Sequences will be served alongside basic specimen metadata, and many will be linked to an image of the specimen from which they were generated.

How many examples of each species are included in the NBDL?

We have sought to include three examples of each species where possible.

GenBank already has lots of sequences. What’s different about the NBDL?

GenBank makes available sequences submitted by researchers throughout the world, for any genomic target and any purpose. It is a remarkable resource, but it does not quality-control taxon identities or require any evidence supporting how an identification was made. It was also not built for the express purpose of enabling identifications, and so has many taxonomic gaps. As a result, identifications made through GenBank often cannot be made with high confidence or accuracy. The NBDL is being built specifically to overcome the challenges that practitioners routinely face when using public repositories to identify taxa in Australian contexts: taxonomic accuracy, data integrity, taxonomic completeness and marker coverage.

What taxonomic groups will be represented in the NBDL?

Our current taxonomic targets include marine vertebrates, marine invertebrates, macroalgae and seagrasses, focal terrestrial taxa, and priority plant pests. All of our campaigns are externally funded and target specific use cases. For example, the marine vertebrate campaign aims to include all described species in Australia, whereas the marine invertebrate, macroalgae and seagrass campaign aims to allow identification at higher taxonomic levels, with complete species representation for some focal groups. As we continue to build the library, we will also include some data opportunistically from researchers.

Does it include microbes?

No, the NBDL does not include microbes (e.g. bacteria and archaea). NBDL reference sequences are generated from physical specimens to provide an authoritative reference, and for many microbes it is difficult or impossible to retain a specimen.

Are you trying to discover new species?

No, the NBDL’s purpose is to enable existing species to be identified. It is an effort to make the wealth of knowledge created by taxonomic experts accessible to a wider audience via the medium of DNA. However, by supporting accurate and rapid identification of species that are already named, these data will assist the taxonomic community to describe new Australian species.

When will the NBDL go live?

We expect to go live with our first data in 2025. These first data offerings will give the community an opportunity to provide feedback on the NBDL web portal and to help guide further development of this infrastructure to support our end users.

How can I contribute to ensure a certain group of organisms are included?

The NBDL works through campaigns, each of which supports specific use cases and defines particular sequencing targets. We work with industry, government and the philanthropic sector to resource these campaigns. If there are priority groups of organisms you are interested in supporting, please contact us.

Can I submit my own data?

Given the stringent quality requirements for data inclusion, the NBDL will generate most of its data de novo. If you have whole mitochondrial or plastid genome data generated from specimens catalogued in an Australian museum or herbarium, they may be suitable for inclusion. Please contact us to discuss including these data in the NBDL.

Does the NBDL carry out its own eDNA studies?

The NBDL itself does not carry out eDNA studies, or store eDNA data or samples. Sometimes researchers associated with us may carry out or collaborate in this type of work, but it is not our goal.

Will the NBDL store eDNA data?

No, but eDNA sequence data can be published in the Sequence Read Archive (SRA), and DNA-derived species occurrence records can be published in the Global Biodiversity Information Facility (GBIF). There are also ongoing efforts to make eDNA data FAIR (Findable, Accessible, Interoperable, Reusable) by developing best-practice guidelines for formatting and publishing eDNA data.

How does this initiative relate to others supported by Bioplatforms Australia?

The NBDL works closely with Bioplatforms Australia (BPA) and complements many of its projects. Wherever possible, we work with BPA initiatives so that the data each of us generates can be made available for the same specimens.

How is the NBDL different from other CSIRO initiatives such as ARGA, AGI and the ALA?

The NBDL is generating new sequence data from specimens held in Australian museums and herbaria. The Australian Reference Genome Atlas (ARGA) aggregates sequence data already available for Australian taxa from existing sources across the internet. The Applied Genomic Initiative (AGI) is generating reference genomes for particular applied outcomes. The Atlas of Living Australia (ALA) aggregates existing biodiversity occurrence data. Unlike the NBDL, ARGA and AGI will include sequences from specimens that have not been expertly identified or are not held in a public research collection. All these resources tackle different parts of the puzzle and work together to support national biodiversity priorities.

How does being involved in the NBDL initiative benefit its museum and herbarium partners?

The NBDL enhances the work of collections staff by making the wealth of knowledge created by taxonomic experts accessible to a wider audience via the medium of DNA. Linking sequence data to a vouchered specimen is part of the extended specimen concept, in which an interconnected network of data bridges disciplines and databases. Collections can access the newly generated data for research purposes, and making these data available through the NBDL may reduce the workload of responding to many data and tissue requests. By partnering with national environmental monitoring initiatives such as the NBDL, collections can also showcase the vital role they play in understanding our natural world.

Are identifications based on DNA always accurate?

This is a good question, and it has been thoroughly investigated (see the references below). The short answer is that DNA-based identifications can be very powerful, and they become more accurate as the reference library improves. However, some issues limit the accuracy of sequence-based identifications. These include technical problems such as marker suitability, primer mismatches and contamination, as well as biological and taxonomic complications such as hybridisation between species, cryptic speciation and unresolved taxonomy. Improved and more complete DNA reference libraries, like the NBDL, can help overcome some of these problems and make sequence-based identifications more accurate.

Cristescu 2014, Trends in Ecology and Evolution: https://doi.org/10.1016/j.tree.2014.08.001

Hering et al. 2018, Water Research: https://doi.org/10.1016/j.watres.2018.03.003

Ruppert et al. 2019, Global Ecology and Conservation: https://doi.org/10.1016/j.gecco.2019.e00547

Thomsen and Willerslev 2015, Biological Conservation: https://doi.org/10.1016/j.biocon.2014.11.019

Are you just using the data to create a mega-publication?

No, we are creating a digital DNA resource to enable and support environmental monitoring in Australia. If the collections that contributed specimens want to work collaboratively on the data, we will facilitate connections where we are able.

Why do I need to register to use the NBDL? Why will I have to login?

Although the NBDL is free to use, it relies on ongoing funding to stay available and up to date. We need to demonstrate that it is widely used, and an important way to do that is to report how many users we have. Understanding how people use the NBDL also helps us improve it. Finally, registering gives you a customised interface where your previous settings and analyses are retained.

What problems does the NBDL address?

The broader objective is to enable faster, more accurate, and more scalable ways to identify species and to map their distributions. This is important because this information is needed to underpin evidence-based decision making for environmental management. This may include detecting pest or introduced species, and documenting community composition and environmental change.

Who is building the NBDL?

CSIRO, Australia’s national science agency, is leading the NBDL project, but the work is only possible with many partners, especially the collections community.

How long will it take?

There are about 150,000 named macroscopic species in Australia. We think it may take around ten years to complete the majority of the task, but this depends particularly on the availability of appropriate specimens.

The higher taxonomy isn’t what I’m used to. Why?

Because the NBDL combines data from a wide range of collections, we needed a single standardised taxonomy to bring it all together. After consulting researchers and collections, we chose to use the Catalogue of Life.

Will the NBDL include species or specimens from outside Australia?

Occasionally, we need to host data from specimens originating outside Australia’s lands and waters. This is usually to include potential biosecurity threats that need proactive monitoring. Sometimes it is because no specimens are available from Australia, even though we know the species occurs here.