Download data

Once you have made your species or specimen selections, clicking on one of the download buttons will open a dialog box where you can choose your preferred sequence data targets. Currently, you can choose between whole mitochondrial genomes or individual mitochondrial gene regions:

Image of download data initial screen

If you select whole genome data, you will be taken to another dialog box to choose one of the available file formats:

Image of download data for mitochondrial genome screen

You will then be asked to select a reason for downloading the data and the geographic region relevant to your data use:

Image of screen to enter reason for download

Your data selections will be downloaded to your default downloads folder per your browser settings.

Alternatively, you can choose to download an individual gene region. Selecting this option will take you to a dialog box where you can select one of the gene regions available for your selected specimens:

Image of download gene region screen

After selecting a gene region, followed by a reason for download and relevant geographic region, you can continue to download your selections. By default, every download of DNA sequence data includes a download of associated specimen metadata.  

Data download files

If you download genomes, your download will be a zip archive named in the format NBDL-{unique_download_ID}.zip.  The unique_download_ID is a randomly generated hexadecimal string. The archive will contain the following types of files:

  • Specimen metadata: This is a CSV-formatted file containing all the specimen metadata available in the NBDL for the downloaded records. The file name is in the format metadata-{unique_download_ID}.csv. The first row of the file is a header containing all the metadata field names used in the NBDL, many of which are displayed on the specimen page. The second row is a header containing the equivalent Darwin Core terms where applicable. Each subsequent row contains the metadata for one specimen. For more information on the NBDL specimen metadata fields, go to Specimen metadata. The specimen metadata file is automatically included with all sequence downloads.
  • Genome sequence file: For each specimen selected, you will receive a separate FASTA-formatted file containing the full genome sequence available. This file is available when you select to download Genome Assembly, FASTA or Annotate genome assembly, FASTA and GFF in the Download dialog box. The filename is in the format: {NBDL_Unique_ID}_{organism}_{record_version}-{genome_type}.fa where:
    • NBDL_Unique_ID is the unique ID used in the NBDL, a 14-character identifier prefixed by NBDL-, e.g. NBDL-HK3R5BQVHXXBVT.
    • organism refers to the scientific name of the species
    • record_version refers to the release version for this sequence
    • genome_type refers to the type of genome sequenced, e.g. mt for mitochondrial genome.
  • Annotation file: For each specimen selected, you will receive a separate GFF-formatted file containing the gene annotations for the corresponding genomic sequence in the download. This file is only available when you select to download Annotate genome assembly, FASTA and GFF in the Download dialog box. The filename is in the format: {NBDL_Unique_ID}_{organism}_{record_version}-{genome_type}.gff. 
  • Gene regions file: For each specimen selected, you will receive a separate FASTA-formatted file containing all of the gene regions available, each as a separate sequence. This file is only available when you select Annotated gene regions, FASTA in the Download dialog box. The filename is in the format: {NBDL_Unique_ID}_{organism}_{record_version}-{genomic_region}_genes.fa. If you want a single gene region, select the gene regions option in the first Download screen.
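
The filename formats above can be split back into their parts when sorting many downloaded files. The following is a minimal Python sketch, not an official NBDL tool. It assumes that the ID is NBDL- followed by 14 uppercase letters or digits, and that record_version contains no underscore or hyphen; the documentation does not specify either detail. The function name, the sample filename and the version string v1 are hypothetical.

```python
import re

# Assumed pattern (not confirmed by the NBDL documentation):
#   - the ID is "NBDL-" followed by 14 uppercase letters or digits
#   - record_version contains no "_" or "-"
#   - organism may contain underscores or spaces
FILENAME_RE = re.compile(
    r"^(?P<nbdl_id>NBDL-[A-Z0-9]{14})_"
    r"(?P<organism>.+)_"
    r"(?P<record_version>[^_-]+)-"
    r"(?P<suffix>.+)\."
    r"(?P<extension>fa|gff)$"
)


def parse_nbdl_filename(name):
    """Return the filename components as a dict, or None if the name does not match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


# Hypothetical filename built from the example ID in the text.
parts = parse_nbdl_filename("NBDL-HK3R5BQVHXXBVT_Example_species_v1-mt.fa")
print(parts)
```

The suffix group holds whatever follows the hyphen, so it captures the genome type (such as mt) for genome and annotation files. Names that do not fit the assumed pattern return None rather than raising an error.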

If you choose to download individual gene regions, your download will be a zip archive named in the format NBDL-{unique_download_ID}.zip. The archive will contain the following types of files:

  • Specimen metadata: This is a CSV-formatted file containing all the specimen metadata available in the NBDL for the downloaded records (as above).
  • Gene region file: This is a single FASTA-formatted file containing the selected gene region. If multiple specimens are selected, the sequences for all individuals are included in this one file. The filename is in the format: {NBDL_Unique_ID}_{organism}_{record_version}-{gene_region}.fa where gene_region refers to the gene or region name, e.g. CYTB.
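
For scripted workflows, either kind of archive can be read without unpacking it by hand. The sketch below uses Python's standard zipfile and csv modules and handles the two header rows of the metadata file separately. It assumes the metadata file is UTF-8 encoded and is the only .csv file in the archive. The function name, the column names, the Darwin Core mappings and the demo archive contents are invented for illustration and are not taken from a real NBDL download.

```python
import csv
import io
import zipfile


def read_nbdl_archive(zip_source):
    """Return (dwc_map, specimens, sequence_files) for one download archive.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = archive.namelist()
        # Assumption: exactly one metadata CSV per archive, UTF-8 encoded.
        metadata_name = next(n for n in names if n.endswith(".csv"))
        text = archive.read(metadata_name).decode("utf-8")

    reader = csv.reader(io.StringIO(text, newline=""))
    nbdl_fields = next(reader)  # row 1: NBDL metadata field names
    dwc_terms = next(reader)    # row 2: Darwin Core terms, blank where none applies
    dwc_map = {f: t for f, t in zip(nbdl_fields, dwc_terms) if t}
    specimens = [dict(zip(nbdl_fields, row)) for row in reader if row]
    sequence_files = [n for n in names if n.endswith((".fa", ".gff"))]
    return dwc_map, specimens, sequence_files


# Build a tiny in-memory archive with made-up contents to demonstrate the call.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr(
        "metadata-0a1b2c.csv",
        "nbdl_id,organism,note\n"
        "materialSampleID,scientificName,\n"
        "NBDL-HK3R5BQVHXXBVT,Example species,\n",
    )
    demo.writestr("NBDL-HK3R5BQVHXXBVT_Example_species_v1-CYTB.fa", ">example\nACGT\n")
buffer.seek(0)

dwc_map, specimens, sequence_files = read_nbdl_archive(buffer)
print(dwc_map)
print(specimens)
print(sequence_files)
```

Keeping the Darwin Core row as a separate mapping, rather than treating it as a data row, prevents it from being counted as a specimen and lets you rename columns to Darwin Core terms later if needed.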