Linked Data

If you’ve ever had to combine two datasets, you’ve probably faced the problem of making sense of the fields, values and even terms used. Now imagine scaling this up from datasets on your own computer to data shared across your colleagues, team and organisation, then to state-wide or national data, or even international datasets. Imagine the challenge of making sense of all of those datasets.

Source: Image courtesy of WIRED magazine.

At its essence, the issue is about reaching agreement on how to interpret and/or encode the data. This is much easier for an individual (assuming you are ok with agreeing with yourself). As more people and organisations become involved, the challenge grows more complex, often involving data silos or variations on them.

It is in this context that the idea of “Linked Data” was proposed by the inventor of the World Wide Web, Tim Berners-Lee.

What is Linked Data?

Linked Data is a set of practices that allow structured data to be published so that it can be interlinked on the web and support cross-dataset queries [1]. It works much like the Web does today, but for data instead of documents. Linked Data is a slice of the overall Semantic Web stack and relies on web standards such as HTTP, URIs and RDF to link pieces of information together.

Hypertext Transfer Protocol (HTTP) is the foundation of communication for the World Wide Web. It provides the protocol that applications and web servers use to pass messages to each other; for example, web browsers send HTTP requests to web servers and receive content back.

Uniform Resource Identifiers (URIs), including Uniform Resource Locators (URLs), give resources an identity (a name) and help applications locate them on the web. URIs and hyperlinks together create web links.

The Resource Description Framework (RDF), a W3C standard, is used as a worldwide lingua franca to express information and relationships. RDF expresses statements as triples in subject-predicate-object form, and URIs provide unambiguous identifiers for the subjects, predicates and objects.

An example of an RDF graph.

e.g. Bob (subject) is a type of (predicate) Person (object)


rdf:type    <> .

A collection of RDF statements or triples is called a graph. Graphs are often stored and queried in RDF triple stores.
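To make the triple model concrete, here is a minimal Python sketch (standard library only) of a tiny in-memory graph. Each statement is a (subject, predicate, object) tuple of URI strings, and a query is a pattern in which `None` acts as a wildcard, loosely like a variable in a SPARQL triple pattern. The `http://example.org/people/` URIs are hypothetical; `rdf:type`, `foaf:Person` and `foaf:knows` are the real RDF and FOAF terms. A real system would use an RDF library or a triple store rather than this toy class.

```python
# Toy RDF-style graph: a set of (subject, predicate, object) triples.
# Assumption: the example.org URIs below are hypothetical placeholders.
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

EX = "http://example.org/people/"  # hypothetical namespace


class Graph:
    """Stores triples and matches simple triple patterns."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield every triple matching the pattern; None matches anything."""
        for triple in self.triples:
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


g = Graph()
g.add(EX + "bob", RDF_TYPE, FOAF_PERSON)     # Bob is a type of Person
g.add(EX + "alice", RDF_TYPE, FOAF_PERSON)   # Alice is a type of Person
g.add(EX + "bob", FOAF_KNOWS, EX + "alice")  # Bob knows Alice

# "Who is a Person?": subjects of every (?, rdf:type, foaf:Person) triple
people = sorted(s for s, _, _ in g.match(p=RDF_TYPE, o=FOAF_PERSON))
print(people)
# prints ['http://example.org/people/alice', 'http://example.org/people/bob']
```

Because every term is a URI, a second graph that mentions `http://example.org/people/bob` is talking about the same Bob, which is what lets graphs be merged simply by taking the union of their triples.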

RDF provides a general data model that can be serialised into different formats such as Turtle, RDF/XML and JSON-LD. Serialisation in different formats allows the data to be used in a wider range of tools. Conversely, formats like JSON and CSV now have RDF profiles [3,4] and can be transformed into RDF, so that they can be interlinked and integrated with other graphs for better data discovery and use. Using RDF and Linked Data principles, a knowledge graph can be constructed that is web-enabled and web-scalable.
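The sketch below illustrates both directions in a simplified way, using only the Python standard library: rows of a small CSV table are mapped to triples, and the same triples are then written out as N-Triples and as a simple JSON-LD document. The CSV content, the `http://example.org/water/` namespace and the `MonitoringSite` and `name` terms are all hypothetical, and the mapping rules are a hand-written assumption rather than the CSV on the Web standard mapping.

```python
import csv
import io
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical base URI and vocabulary for this example dataset.
EX = "http://example.org/water/"
SITE_CLASS = EX + "MonitoringSite"
SITE_NAME = EX + "name"

CSV_DATA = "site_id,name\ns1,Site A\ns2,Site B\n"  # hypothetical rows


def nt_term(term: str, literal: bool = False) -> str:
    """Render a URI as <...> or a plain string literal as "..." (N-Triples)."""
    if not literal:
        return f"<{term}>"
    escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_triples(text: str):
    """Map each CSV row to a typed subject URI plus a name literal."""
    triples = []  # (subject, predicate, object, object_is_literal)
    for row in csv.DictReader(io.StringIO(text)):
        subject = EX + "site/" + row["site_id"]
        triples.append((subject, RDF_TYPE, SITE_CLASS, False))
        triples.append((subject, SITE_NAME, row["name"], True))
    return triples


triples = csv_to_triples(CSV_DATA)

# Serialisation 1: N-Triples, one statement per line.
ntriples = "\n".join(
    f"{nt_term(s)} {nt_term(p)} {nt_term(o, lit)} ." for s, p, o, lit in triples
)
print(ntriples)

# Serialisation 2: the same statements as JSON-LD, one node per subject.
nodes = {}
for s, p, o, lit in triples:
    node = nodes.setdefault(s, {"@id": s})
    if p == RDF_TYPE:
        node["@type"] = o  # rdf:type maps to the JSON-LD @type keyword
    else:
        node[p] = o        # full IRI as key; a string value is a plain literal
jsonld = {"@graph": list(nodes.values())}
print(json.dumps(jsonld, indent=2))
```

Both outputs carry exactly the same four statements; which one to publish depends on the tools consuming the data.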

Tim Berners-Lee outlined four principles for Linked Data:

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs so that they can discover more things.
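Principles 2 and 3 are commonly put into practice with HTTP content negotiation: a client looks up an HTTP URI and uses the `Accept` header to ask for an RDF serialisation rather than an HTML page. The Python sketch below (standard library only) just prepares such a request without sending it. The example URI and the helper name `build_lookup_request` are hypothetical, and whether a particular server honours these media types depends on how it is configured.

```python
from urllib.parse import urlparse
from urllib.request import Request

# Media types for common RDF serialisations, most preferred first.
RDF_ACCEPT = "text/turtle, application/ld+json;q=0.9, application/rdf+xml;q=0.8"


def build_lookup_request(uri: str) -> Request:
    """Prepare an HTTP GET that asks the server for RDF rather than HTML.

    Principle 2: only http(s) URIs can be looked up this way.
    Principle 3: the Accept header asks for a standard RDF serialisation.
    """
    if urlparse(uri).scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP(S) URI, cannot be looked up: {uri}")
    return Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


req = build_lookup_request("http://example.org/people/bob")  # hypothetical URI
print(req.get_method(), req.full_url, req.get_header("Accept"))
# To actually dereference it, pass req to urllib.request.urlopen() and parse
# the body according to the Content-Type header the server sends back.
```

Principle 4 then follows naturally: the RDF returned for one URI contains further URIs, each of which can be looked up in the same way.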

How can Linked Data help?

Implementing Linked Data principles can help solve the challenge of connecting up datasets at any scale, whether it is within a single data ecosystem in an organisation or across multiple organisations and institutions.

Linked Data vocabularies

Controlled vocabularies are a key element of many classification systems and are typically published by specific organisations, domains or communities of practice. Examples include chemical entities, biomedical terminology, environmental science topics or subject headings, and geological classifications. The emergence of Semantic Web and Linked Data technologies has provided powerful tools for formalising definitions, vocabularies and ontologies in forms that also support reasoning and inferencing. The Simple Knowledge Organization System (SKOS) allows easy construction of multilingual vocabularies and provides a standard way to represent thesauri, classifications, taxonomies and controlled vocabularies using RDF.
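As a rough illustration of what a SKOS vocabulary looks like as data, the Python sketch below stores a three-concept hierarchy as triples, with language-tagged `skos:prefLabel` literals and `skos:broader` links. The SKOS and RDF namespace URIs are the real ones; the rock concepts and the `http://example.org/vocab/rock/` namespace are hypothetical. The helper functions are plain Python, not part of any SKOS library. Walking up the hierarchy is done by hand here, although SKOS also defines `skos:broaderTransitive` for exactly this purpose.

```python
from typing import List, Optional, Tuple, Union

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/vocab/rock/"  # hypothetical vocabulary namespace

# Objects are either URIs (str) or language-tagged literals (text, language).
Obj = Union[str, Tuple[str, str]]
triples: List[Tuple[str, str, Obj]] = [
    (EX + "rock", RDF_TYPE, SKOS + "Concept"),
    (EX + "rock", SKOS + "prefLabel", ("rock", "en")),
    (EX + "rock", SKOS + "prefLabel", ("roche", "fr")),
    (EX + "igneous", RDF_TYPE, SKOS + "Concept"),
    (EX + "igneous", SKOS + "prefLabel", ("igneous rock", "en")),
    (EX + "igneous", SKOS + "broader", EX + "rock"),
    (EX + "basalt", RDF_TYPE, SKOS + "Concept"),
    (EX + "basalt", SKOS + "prefLabel", ("basalt", "en")),
    (EX + "basalt", SKOS + "prefLabel", ("basalte", "fr")),
    (EX + "basalt", SKOS + "broader", EX + "igneous"),
]


def pref_label(concept: str, lang: str) -> Optional[str]:
    """Return the concept's preferred label in the given language, if any."""
    for s, p, o in triples:
        if s == concept and p == SKOS + "prefLabel" and isinstance(o, tuple) and o[1] == lang:
            return o[0]
    return None


def broader_chain(concept: str) -> List[str]:
    """Follow skos:broader links upwards (first parent only; stops on cycles)."""
    chain: List[str] = []
    current = concept
    while True:
        parents = [o for s, p, o in triples if s == current and p == SKOS + "broader"]
        if not parents or not isinstance(parents[0], str) or parents[0] in chain:
            return chain
        current = parents[0]
        chain.append(current)


print([pref_label(c, "en") for c in broader_chain(EX + "basalt")])
# prints ['igneous rock', 'rock']
print(pref_label(EX + "basalt", "fr"))  # prints basalte
```

Because labels are tagged by language while the concept URI stays the same, an English and a French dataset that both cite `http://example.org/vocab/rock/basalt` can be linked without translating their text.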
The OzNome team has used SKOS and other Linked Data/Semantic Web technologies to solve data interoperability and integration challenges for a range of applications: water, energy, geology and capability mapping. Contact us for more information about these solutions and how they could help your government agency or organisation develop and use Linked Data vocabularies.


Linked Data approaches and provenance standards and ontologies can help build the foundations to capture, record and analyse provenance. This is useful in applications and domains that need to represent, exchange and integrate provenance information generated in different systems and under different contexts. For example, the OzNome team is currently embedding provenance in scientific workflows for the Biodiversity Baseline Assessments project, so that indicators, their inputs and workflow runs are reproducible and traceable.
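One widely used provenance standard is the W3C PROV Ontology (PROV-O), where an entity `prov:wasGeneratedBy` an activity and an activity `prov:used` other entities. The Python sketch below records a hypothetical two-step workflow with those two properties and traces every input an indicator ultimately depends on. The resource names and the `http://example.org/bba/` namespace are invented for illustration and are not taken from the project described above.

```python
from typing import List

PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/bba/"  # hypothetical namespace for workflow resources

# Hypothetical provenance records for two workflow runs.
triples = [
    (EX + "indicator/habitat-extent", PROV + "wasGeneratedBy", EX + "run/42"),
    (EX + "run/42", PROV + "used", EX + "data/vegetation-map"),
    (EX + "run/42", PROV + "used", EX + "data/survey-points"),
    (EX + "data/vegetation-map", PROV + "wasGeneratedBy", EX + "run/7"),
    (EX + "run/7", PROV + "used", EX + "data/satellite-imagery"),
]


def lineage(entity: str) -> List[str]:
    """Return all entities this entity was directly or indirectly derived from.

    Follows prov:wasGeneratedBy (entity -> activity) and then
    prov:used (activity -> input entity), repeating for each input.
    """
    found: List[str] = []
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        activities = [o for s, p, o in triples
                      if s == current and p == PROV + "wasGeneratedBy"]
        for activity in activities:
            for s, p, o in triples:
                if s == activity and p == PROV + "used" and o not in found:
                    found.append(o)
                    frontier.append(o)
    return found


for item in sorted(lineage(EX + "indicator/habitat-extent")):
    print(item)
```

Since the records are ordinary triples with shared URIs, provenance captured by different systems can be merged into one graph and traced in the same way.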


Spatial Linked Data

The OzNome team are developing Spatial Linked Data tools, methods and resources. These will enable linkages between spatial features and the systems that use them. To do so, the team is building systems to capture the identity of features and feature collections (using persistent web identifiers, i.e. URIs), systems to persist references to spatial feature assets (GeoJSON, WFS services, shapefiles, etc.), APIs for accessing these spatial features, and application connectors to consume these resources (e.g. plugins for QGIS/ArcGIS, and Python and R libraries). Preliminary work on Spatial Linked Data is being implemented through the CSIRO Knowledge Network platform; check it out for examples.
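A simple way to see how a persistent URI ties a spatial feature to other data is to use the URI as the `id` of a GeoJSON Feature (GeoJSON allows an optional `id` member). The Python sketch below does that and then resolves a reference from a second dataset through an in-memory registry. The URIs, the helper `to_linked_geojson`, the `memberOf` property name, the coordinates and the observation value are all hypothetical; a real deployment would resolve URIs through a web service rather than a dictionary.

```python
import json

# Hypothetical persistent URIs for a feature and its feature collection.
FEATURE_URI = "http://example.org/feature/catchment/410"
COLLECTION_URI = "http://example.org/feature-collection/catchments"


def to_linked_geojson(uri: str, collection: str, geometry: dict,
                      properties: dict) -> dict:
    """Wrap a geometry as a GeoJSON Feature whose "id" is its persistent URI."""
    return {
        "type": "Feature",
        "id": uri,
        "geometry": geometry,
        # "memberOf" is a hypothetical property linking to the collection URI.
        "properties": {**properties, "memberOf": collection},
    }


feature = to_linked_geojson(
    FEATURE_URI,
    COLLECTION_URI,
    {"type": "Point", "coordinates": [149.13, -35.28]},  # GeoJSON order: lon, lat
    {"name": "Example catchment outlet"},
)
print(json.dumps(feature, indent=2))

# A second dataset refers to the feature by URI only; the link is resolved
# by looking the URI up (here in a dictionary standing in for a service).
registry = {feature["id"]: feature}
observation = {"site": FEATURE_URI, "value": 12.5}  # hypothetical measurement
print(registry[observation["site"]]["properties"]["name"])
# prints Example catchment outlet
```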


More on OzNome and Linked Data?

The OzNome team are working nationally and internationally with clients and partners on multiple Linked Data initiatives. Contact us to find out more about Linked Data and how it could help meet your project or business needs.
In the meantime, check out this video for more about Linked Data.