FAQs
The National Biodiversity DNA Library is an initiative to generate the comprehensive reference sequences needed to identify Australia’s biodiversity from DNA. The initiative is a collaboration between CSIRO, Australia’s research collections and a range of philanthropic and government agencies. The NBDL will generate data from expertly identified specimens held in Australia’s museums and herbaria, bringing new capability to biodiversity research, and enabling the uptake of new technologies to describe and detect changes in biodiversity from DNA that organisms leave behind in the environment. The NBDL will launch as a digital platform with its first available data in 2025.
To transform environmental monitoring by enabling faster, more accurate, and more scalable ways to identify species. More specifically, the aim is to enable Australian species to be identified with high accuracy from their unique DNA sequences, including via environmental DNA (eDNA), DNA barcoding and other DNA-based diagnostic techniques.
The NBDL is open access. That means it can be used by anybody. The most immediate users are likely to be environmental scientists, but its impact will be felt across the agricultural, health and taxonomic sectors. We also anticipate many users from the education sector.
Commercial entities may use the NBDL to generate results, but NBDL data cannot be sold commercially. Any use must fully attribute the NBDL and comply with the applicable data reuse licenses.
Using the NBDL platform, you will be able to browse, view and download available sequence data, and view information about the specimens from which the sequences were derived. If you have obtained one or more DNA sequences from a specimen or environmental sample of interest, you will be able to query them against the available taxonomic datasets in the NBDL, either with our built-in query tool, which uses the BLAST algorithm, or offline by downloading relevant sequences from the NBDL and using an analysis platform of your choice.
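As a rough illustration of the offline route, the sketch below ranks downloaded reference sequences against a query by the fraction of shared k-mers. It is a deliberately naive stand-in for a real aligner such as BLAST, not the NBDL's own method. The FASTA content, the identifiers `ref_A` and `ref_B`, and the function names are all hypothetical.

```python
# Naive k-mer comparison of a query against downloaded reference sequences.
# This is NOT BLAST; it only illustrates the idea of an offline lookup.
# All sequences and identifiers below are made up for the example.

REFERENCE_FASTA = """>ref_A hypothetical species A
ACGTTGCATGCCGATAGCTAGGCTAACGTTAGC
CGATCGATTACGGCTAAGCTTAGGCAT
>ref_B hypothetical species B
TTGACCGTAAGGCTTACCGATGGCATTACGGAT
GGCTAACCGTTAGGCATCGATTAGC
"""


def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence}, joining wrapped lines."""
    parts = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            current = line[1:].split()[0]
            parts[current] = []
        elif current is not None:
            parts[current].append(line.upper())
    return {name: "".join(chunks) for name, chunks in parts.items()}


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_references(query, references, k=8):
    """Rank references by the fraction of query k-mers they contain."""
    query_kmers = kmers(query.upper(), k)
    if not query_kmers:
        return []
    scores = [
        (len(query_kmers & kmers(seq, k)) / len(query_kmers), name)
        for name, seq in references.items()
    ]
    return sorted(scores, reverse=True)


references = parse_fasta(REFERENCE_FASTA)
query = "GCCGATAGCTAGGCTAACGTTAGCCGATCG"  # hypothetical query fragment
for score, name in rank_references(query, references):
    print(f"{name}\t{score:.2f}")
```

For real identifications, an alignment-based tool with proper scoring and significance statistics is the better choice; the sketch only shows the shape of the workflow (load references, compare, rank).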
The NBDL is targeting DNA sequences suited to taxonomic identification of unknown material, with a focus on overcoming the significant reference data challenges of interpreting very small fragments of DNA recovered from the environment. For animals, this means the NBDL is generating full (or mostly complete) mitochondrial genome sequences and nuclear ribosomal markers. For plants and macroalgae, it is generating full (or mostly complete) plastid (chloroplast) genome sequences and nuclear ribosomal markers. We won’t be hosting data solely from the traditional COI barcoding region unless it’s part of more comprehensive data. Our goal is to host data that can match many primer sets, and we don’t want to be limited to this marker alone.
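To illustrate why longer reference sequences can serve many primer sets, here is a minimal in silico amplification sketch. It assumes exact primer matches only (real primers often tolerate mismatches and contain degenerate bases), and every sequence, primer and function name in it is hypothetical.

```python
# Minimal in silico amplification: find where a primer pair binds on a
# reference and return the region between the binding sites.
# Exact matching only; sequences and primers are invented for illustration.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def extract_amplicon(reference, forward_primer, reverse_primer):
    """Return the sequence between the primer sites, or None if either
    primer has no exact match on the reference."""
    ref = reference.upper()
    fwd = forward_primer.upper()
    start = ref.find(fwd)
    if start == -1:
        return None
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    site = reverse_complement(reverse_primer)
    end = ref.find(site, start + len(fwd))
    if end == -1:
        return None
    return ref[start + len(fwd):end]


# A longer (hypothetical) reference and a shorter fragment of it.
full_reference = "TTACGGATCCGATTGCAAGGCTTAACCGGTTAGCATCGGATTACC"
short_fragment = full_reference[:35]

primer_sets = {
    "set_1": ("ACGGATCC", "TGCTAACC"),
    "set_2": ("GATTGCAA", "AATCCGAT"),
}

for name, (fwd, rev) in primer_sets.items():
    print(name,
          "full:", extract_amplicon(full_reference, fwd, rev),
          "short:", extract_amplicon(short_fragment, fwd, rev))
```

In this toy case the longer reference yields a product for both primer sets, while the shorter fragment supports only the first, which is the limitation a single-marker reference runs into.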
The NBDL offers authoritative sequences that have been generated from specimens identified by taxonomic experts and are available in Australian public research collections (e.g. Australian museums and herbaria) for further verification if needed. Sequences will be served alongside basic specimen metadata and many will be linked to an image of the specimen they were generated from.
We have sought to include three examples of each species where possible.
GenBank makes available sequences submitted by researchers throughout the world, for any genomic target and any use. It is a remarkable resource, but it does not offer quality control of taxon identities or require any supporting evidence for the identification. It is also not built for the express purpose of enabling identifications and hence has many taxonomic gaps. As a result, identifications made through GenBank often cannot be made with high confidence or accuracy. The NBDL is being built specifically to overcome the challenges that practitioners routinely face when using publicly available repositories to identify taxa in Australian contexts, with respect to taxonomic accuracy, data integrity, taxonomic completeness and marker coverage.
Our current taxonomic targets include marine vertebrates, marine invertebrates, macroalgae, seagrasses, focal terrestrial taxa and priority plant pests. All of our campaigns are externally funded and target specific use cases. For example, the marine vertebrate campaign aims to include all described species in Australia, whereas the marine invertebrate, macroalgae and seagrass campaign aims to allow taxonomic identification at higher taxonomic levels, with total species representation for some focal groups. We will also have some opportunistic inclusions from researchers as we continue to build the library.
No, the NBDL does not include microbes (e.g. bacteria, archaea). NBDL reference sequences are generated from a physical specimen to provide an authoritative reference, and for many microbes it is difficult or impossible to keep a specimen.
No, the NBDL’s purpose is to enable existing species to be identified. It is an effort to make the wealth of knowledge created by taxonomic experts accessible to a wider audience via the medium of DNA. However, by supporting accurate and rapid identification of species that are already named, these data will assist the taxonomic community in describing new Australian species.
We expect to go live with our first data in 2025. These first data offerings will provide the community with an opportunity to provide feedback on the NBDL web portal and further development of this infrastructure to support our end users.
The NBDL works via campaigns that support specific use cases and define particular targets for sequencing. We work with industry, government and the philanthropic sector to resource campaigns. If there are priority groups of organisms that you have an interest in supporting, please contact us.
Given the stringent quality requirements for data inclusion, the NBDL will generate most data de novo. If you have whole mitochondrial or plastid genome data generated from specimens that are catalogued in an Australian museum or herbarium, they may be suitable for inclusion. Please contact us to discuss the inclusion of these data in the NBDL.
The NBDL itself does not carry out eDNA studies, or store eDNA data or samples. Sometimes researchers associated with us may carry out or collaborate in this type of work, but it is not our goal.
No, but eDNA sequence data and DNA-derived species occurrence records can be published to the Sequence Read Archive and GBIF, respectively. There are also ongoing efforts to make eDNA data FAIR (Findable, Accessible, Interoperable, Reusable) by developing best practice guidelines for formatting and publishing eDNA data.
The NBDL works closely with Bioplatforms Australia and complements many of their projects. Wherever possible, we work with BPA initiatives to make the generated data available for the same specimens.
The NBDL is generating new sequence data from specimens held in Australian museums and herbaria. The Australian Reference Genome Atlas (ARGA) is aggregating any sequence data available for Australian taxa from existing sources across the internet. The Applied Genomics Initiative (AGI) is generating reference genomes for particular applied outcomes. The Atlas of Living Australia (ALA) aggregates existing biodiversity occurrence data. Unlike the NBDL, ARGA and AGI will include sequences from specimens that haven’t been expertly identified or are not linked to a specimen in a public research collection. All these resources work together on different parts of the puzzle to support national biodiversity priorities.
The NBDL enhances the work of collections staff by making the wealth of knowledge created by taxonomic experts accessible to a wider audience via the medium of DNA. Having sequence data linked to a vouchered specimen forms part of the extended specimen concept, where an interconnected network of data can bridge disciplines and databases. The collections can access the newly generated data for research purposes, and the availability of these data through the NBDL may reduce the workload that data and tissue requests place on collections. Through partnering with national environmental monitoring initiatives such as the NBDL, collections can showcase the vital role they play in understanding our natural world.
This is a good question and has been thoroughly investigated (see the references below). The short answer is that DNA-based identifications can be very powerful and become more accurate with a higher-quality reference library. However, some issues limit the accuracy of sequence-based identifications. These include technical problems such as marker suitability, primer mismatches and contamination, and evolutionary processes such as species hybridisation, cryptic speciation and unresolved taxonomy. Improved and more complete DNA reference libraries, like the NBDL, can help overcome some of these problems and make for more accurate sequence-based identifications.
Cristescu 2014, Trends in Ecology and Evolution: https://doi.org/10.1016/j.tree.2014.08.001
Hering et al. 2018, Water Research: https://doi.org/10.1016/j.watres.2018.03.003
Ruppert et al. 2019, Global Ecology and Conservation: https://doi.org/10.1016/j.gecco.2019.e00547
Thomsen and Willerslev 2015, Biological Conservation: https://doi.org/10.1016/j.biocon.2014.11.019
No, we are creating a digital DNA resource to enable and support environmental monitoring in Australia. If collections that contributed the specimens want to work collaboratively on the data, we will facilitate connections where we are able.
Although it’s free to use, the NBDL relies on ongoing funding to keep it available and up to date. We need to demonstrate that it is widely used, and an important way to do that is to report how many users we have. We can also improve the NBDL by understanding how people use it. Finally, registration means that you can receive a customised interface where your previous settings and analyses are retained.
The broader objective is to enable faster, more accurate, and more scalable ways to identify species and to map their distributions. This is important because this information is needed to underpin evidence-based decision-making for environmental management. This may include detecting pest or introduced species, and documenting community composition and environmental change.
CSIRO, Australia’s national science agency, is leading the NBDL project, but the work is only possible with many partners, especially the collections community.
There are about 150,000 named macroscopic species in Australia. We think it may take around ten years to complete the majority of the task, but this depends particularly on the availability of appropriate specimens.
Because the NBDL is combining data from a wide range of collections, we needed to use a single standardised taxonomy to be able to combine it all. After consulting among researchers and collections, we chose to use the Catalogue of Life.
Occasionally, there is a need to host data from specimens originating outside Australian land and waters. This is usually to include possible biosecurity threats that need proactive monitoring. Sometimes it’s because there are no specimens available from Australia, even if we know the species occurs here.