Using Collection Identifiers

Overview | Finding an identifier in the user interface | Finding an identifier using the web services API

Overview

This page covers how different types of collection identifier work.  Depending on your needs you might want the latest version of a collection, or you might want a specific version.  Understanding how different identifiers resolve to different versions will help you find the correct metadata and files.

For example, the software collection at https://data.csiro.au/collection/csiro:11028 has a number of versions. At the time of writing, the latest is version 22:

CSIRO; Bolger, Matt; Cleary, Paul; Hetherton, Lachlan; Rucinski, Chris; Thomas, David; Watkins, Damien; Zhang, Zikai; Sankaranarayanan, Nirupama; Subramanian, Rajesh; Nguyen, Dang Quan; Oakes, Nerolie; Xie, Ping (2019): Workspace: Scientific Workflow Platform. v22.CSIRO. Software Collection. https://doi.org/10.25919/3ezk-ha96

If you need to obtain a specific version via the web services API, you need to know how to find the correct identifier.

The first section on finding an identifier in the user interface has examples of using the identifiers to retrieve metadata or a list of files.

The second section on finding an identifier using the web services API does not have examples, but the examples from the first section should cover these identifiers.

Finding an identifier in the user interface

There are two places where you will find a collection identifier in the UI:

  • The “Cite as” attribution statement
  • The URL

“Cite as” attribution statement

These identifiers can be in different formats.

DOI

Screenshot of the text box titled "Cite as", auto filled with citation details, highlighting the part of the URL that reads "10.25919/3ezk-ha96"

This is indicated by the domain of the URL, “doi.org”.  The value you want for this version is “10.25919/3ezk-ha96”.

A DOI will resolve to a specific version of the data, even if metadata is updated.

To retrieve collection metadata

Construct a URL as follows:

https://data.csiro.au/dap/ws/v2/collections/{identifier}

e.g.

https://data.csiro.au/dap/ws/v2/collections/10.25919/3ezk-ha96

Use either content negotiation or a suffix (e.g. “.json”) to specify JSON or XML format.
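As a minimal sketch of both options, the following builds a metadata URL with a format suffix and, separately, a request that relies on content negotiation. It assumes Python's standard library and assumes the service honours the conventional "application/json" and "application/xml" media types. Those media type names and the helper names metadata_url and metadata_request are illustrative and not taken from the API documentation. The ".json" suffix is the one used elsewhere on this page. No request is actually sent.

```python
from urllib.request import Request

BASE = "https://data.csiro.au/dap/ws/v2/collections"


def metadata_url(identifier, suffix=None):
    """Build the metadata URL, optionally with a format suffix such as 'json'."""
    url = f"{BASE}/{identifier}"
    return f"{url}.{suffix}" if suffix else url


def metadata_request(identifier, media_type="application/json"):
    """Build (but do not send) a request that uses content negotiation.

    The media type is an assumption; check the API documentation for the
    values the service actually accepts.
    """
    return Request(metadata_url(identifier), headers={"Accept": media_type})


doi = "10.25919/3ezk-ha96"
print(metadata_url(doi, "json"))      # suffix approach
req = metadata_request(doi)           # content negotiation approach
print(req.full_url, req.get_header("Accept"))
```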

To retrieve files

The web services API only allows a DOI to be used to retrieve collection metadata, i.e. with URLs in the following format:

https://data.csiro.au/dap/ws/v2/collections/{identifier}

First retrieve the metadata, e.g. https://data.csiro.au/dap/ws/v2/collections/10.25919/3ezk-ha96.json

In the response, use the “data” key/value pair to find the URL for the /ws/v2/collections/{identifier}/data endpoint, e.g.:

  "data": "https://data.csiro.au/dap/ws/v2/collections/53321/data"

The identifier returned here is not the DOI, but a version-specific “Data Collection ID”.
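The step above can be sketched as follows, assuming the metadata response has already been parsed into a Python dictionary and that “data” is a top-level key in it. The helper name data_endpoint is hypothetical, and the sample dictionary contains only the one key shown above.

```python
from urllib.parse import urlparse


def data_endpoint(metadata):
    """Return (data_url, data_collection_id) from a parsed metadata response."""
    data_url = metadata["data"]
    # The path is expected to look like
    # /dap/ws/v2/collections/{Data Collection ID}/data
    parts = urlparse(data_url).path.rstrip("/").split("/")
    return data_url, int(parts[-2])


sample = {"data": "https://data.csiro.au/dap/ws/v2/collections/53321/data"}
url, collection_id = data_endpoint(sample)
print(url, collection_id)
```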

Handle (ANDS PID)

Some DAP collections will have a Handle (also referred to as “ANDS PID”) instead of a DOI, e.g. https://data.csiro.au/collection/csiro:49262

Screenshot of the text box titled "Cite as", auto filled with citation details, highlighting the part of the URL that reads "102.100.100/390056?"

This is indicated by the domain of the URL, “handle.net”.  At the time of writing Handles cannot be used with the web services API.

Fedora PID

DAP collections that are not publicly accessible will have a DAP URL in the attribution statement using the format:

https://data.csiro.au/collection/{Fedora PID}

All DAP collections have Fedora PIDs.  A public example is “csiro:11028” for https://data.csiro.au/collection/csiro:11028.  Fedora PIDs use the format:

csiro:{identifier}

where “{identifier}” may be numeric or text.

Fedora PIDs will resolve to the most recent version of a collection.

Some uses of a Fedora PID append a version, e.g. “csiro:11028v20”.  The lower case “v” and any digits that follow are not part of the Fedora PID.  The web services API will accept Fedora PIDs with the version appended, and this will resolve to a specific version rather than the latest.
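To make the distinction concrete, here is a small sketch that separates the Fedora PID from an appended version. The function name split_fedora_pid and the regular expression are illustrative assumptions based only on the format described above. They are not part of the API.

```python
import re

# "csiro:" + identifier, optionally followed by a lower case "v" and digits.
_PID_PATTERN = re.compile(r"(csiro:.+?)(?:v(\d+))?")


def split_fedora_pid(value):
    """Split e.g. 'csiro:11028v20' into ('csiro:11028', 20).

    The version is None when nothing is appended, which means the
    identifier resolves to the latest version.
    """
    match = _PID_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a Fedora PID: {value!r}")
    pid, version = match.groups()
    return pid, (int(version) if version else None)


print(split_fedora_pid("csiro:11028v20"))
print(split_fedora_pid("csiro:11028"))
```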

To retrieve collection metadata

Construct a URL as follows:

https://data.csiro.au/dap/ws/v2/collections/{Fedora PID}

e.g.

https://data.csiro.au/dap/ws/v2/collections/csiro:11028

Use either content negotiation or suffixes to specify JSON or XML format:

To retrieve files

Either retrieve the collection metadata to find the “data” key/value pair (see DOI section above), or construct a URL with the following format:

https://data.csiro.au/dap/ws/v2/collections/{Fedora PID}/data

e.g. https://data.csiro.au/dap/ws/v2/collections/csiro:11028/data

Since the latest version will be retrieved, the file URLs returned by the /data endpoint are not persistent and will stop working if the DAP collection is updated.

URL

Typically if you navigate to a DAP collection landing page there will be a Fedora PID in the URL, e.g.

URL "https://data.csiro.au/collection/csiro:11028?q=workspace&_st=keyword&_str=13&si=1" with the following part highlighted "csiro:11028"

The value you want here is “csiro:11028”.

Depending on how you navigated to the URL, a version number might have been appended. For example, if you followed the DOI to an older version of Workspace (https://doi.org/10.25919/5f559340b2fab) you would see the URL:

URL "https://data.csiro.au/collection/csiro:11028v20" with the following part highlighted "csiro:11028"

Appending a lower case “v” and a valid version number (which can be padded with up to two leading zeros) will resolve to a specific version of a collection.  The Fedora PID without an appended version resolves to the latest version of the collection.
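A minimal sketch of building such an identifier is shown below. The helper name versioned_pid and its zeros parameter are hypothetical. The only rules it encodes are the ones stated above: a lower case “v”, then the version number, with at most two leading zeros.

```python
BASE = "https://data.csiro.au/dap/ws/v2/collections"


def versioned_pid(fedora_pid, version, zeros=0):
    """Append 'v' and the version number, padded with 0 to 2 leading zeros."""
    if zeros not in (0, 1, 2):
        raise ValueError("at most two leading zeros are allowed")
    return f"{fedora_pid}v{'0' * zeros}{version}"


pid = versioned_pid("csiro:11028", 20)
print(pid)
print(f"{BASE}/{pid}")  # metadata URL for that specific version
```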

To retrieve collection metadata

Construct a URL as follows:

https://data.csiro.au/dap/ws/v2/collections/{Fedora PID}

e.g.

https://data.csiro.au/dap/ws/v2/collections/csiro:11028

Use either content negotiation or a suffix (e.g. “.json”) to specify JSON or XML format.

To retrieve files

Either retrieve the collection metadata to find the “data” key/value pair (see DOI section above), or construct a URL with the following format:

https://data.csiro.au/dap/ws/v2/collections/{Fedora PID}/data

e.g. https://data.csiro.au/dap/ws/v2/collections/csiro:11028/data

Since the latest version will be retrieved, the file URLs returned by the /data endpoint are not persistent and will stop working if the DAP collection is updated.

Finding an identifier using the web services API

There are several places you might find an identifier in the web services API:

  • From query results
  • From collection metadata
  • From a collection’s “/versions” endpoint

From query results

e.g. a query for “Workspace”.

https://data.csiro.au/dap/ws/v2/collections?q=Workspace

In a JSON response, the first item at the time of writing contains several identifiers:

Screenshot of a JSON response. The values in the following lines are highlighted:
"identifier": "csiro:11028"
"dataCollectionId": 53321
"self": "https://data.csiro.au/dap/ws/v2/collections/53321"
"andsPid": "102.100.100/16136"
"doi": "10.25919/3ezk-ha96"

Fedora PID

Resolves to latest version.

      "id": {
        "identifierType": "Fedora PID",
        "identifier": "csiro:11028"
      },

Data Collection ID

Resolves to a specific version.

      "dataCollectionId": 53321,
      "self": "https://data.csiro.au/dap/ws/v2/collections/53321",

Handle (ANDS PID)

Resolves to the latest version.  Cannot be used directly with the /data endpoint for a collection. Instead, retrieve the collection metadata and use the “data” key/value pair.

      "andsPid": "102.100.100/16136",

DOI

Resolves to a specific version of data (metadata updates allowed).  Cannot be used directly with the /data endpoint for a collection. Instead, retrieve the collection metadata and use the “data” key/value pair.

      "doi": "10.25919/3ezk-ha96",
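The four identifiers above can be collected from a single result item as in the sketch below. It assumes the item has already been parsed into a Python dictionary with the keys shown in the fragments above. The function name describe_identifiers is hypothetical, and the sample item contains only those keys rather than a complete API response.

```python
def describe_identifiers(item):
    """Return {identifier type: (value, what it resolves to)} for a result item."""
    found = {}
    fedora = item.get("id", {})
    if fedora.get("identifierType") == "Fedora PID":
        found["Fedora PID"] = (fedora["identifier"], "latest version")
    if "dataCollectionId" in item:
        found["Data Collection ID"] = (item["dataCollectionId"], "specific version")
    if "andsPid" in item:
        found["Handle (ANDS PID)"] = (item["andsPid"], "latest version")
    if "doi" in item:
        found["DOI"] = (item["doi"], "specific version")
    return found


# Sample item assembled from the fragments shown in this section.
item = {
    "id": {"identifierType": "Fedora PID", "identifier": "csiro:11028"},
    "dataCollectionId": 53321,
    "self": "https://data.csiro.au/dap/ws/v2/collections/53321",
    "andsPid": "102.100.100/16136",
    "doi": "10.25919/3ezk-ha96",
}
for kind, (value, resolves_to) in describe_identifiers(item).items():
    print(f"{kind}: {value} -> {resolves_to}")
```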

From collection metadata

When resolving a collection using any identifier with a URL in the following format:

https://data.csiro.au/dap/ws/v2/collections/{identifier}

several identifiers will be included in the response, e.g. for Workspace https://data.csiro.au/dap/ws/v2/collections/csiro:11028.json

Screenshot of a JSON response. The values in the following lines are highlighted:
"identifier": "csiro:11028"
"dataCollectionId": 53321
"self": "https://data.csiro.au/dap/ws/v2/collections/53321"
"andsPid": "102.100.100/16136"
"doi": "10.25919/3ezk-ha96"

These are the same identifiers described above in “From query results”.

From the /versions endpoint

A list of a collection’s accessible versions is available using a Fedora PID (latest version), a Fedora PID with a version appended (specific version), or a Data Collection ID (specific version):

https://data.csiro.au/dap/ws/v2/collections/{identifier}/versions

e.g. for Workspace https://data.csiro.au/dap/ws/v2/collections/csiro:11028/versions.json

Screenshot of a JSON response. The numbers in the following lines are highlighted:
"self": "https://data.csiro.au/dap/ws/v2/collections/53321"
"https://data.csiro.au/dap/ws/v2/collections/49342"
"https://data.csiro.au/dap/ws/v2/collections/46264"

The “Data Collection ID” values need to be extracted from the version URLs.  They are version-specific identifiers.

To get a list of files, either:

  • Retrieve the collection’s metadata using the URL and then use the “data” key/value pair.
  • Append “/data” to the URL and call the endpoint directly.
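Both the ID extraction and the second option can be sketched as below, assuming each version URL ends with the Data Collection ID as in the examples above. The function names data_collection_id and files_url are hypothetical.

```python
def data_collection_id(version_url):
    """Extract the trailing Data Collection ID from a version URL."""
    return int(version_url.rstrip("/").rsplit("/", 1)[-1])


def files_url(version_url):
    """Append '/data' to a version URL to address its /data endpoint."""
    return version_url.rstrip("/") + "/data"


version_urls = [
    "https://data.csiro.au/dap/ws/v2/collections/53321",
    "https://data.csiro.au/dap/ws/v2/collections/49342",
    "https://data.csiro.au/dap/ws/v2/collections/46264",
]
for url in version_urls:
    print(data_collection_id(url), files_url(url))
```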