Data Staging Platform
Efficient and scalable delivery of high-quality, rich, and reliable data
The volume of data collected from both local and remote sensing sources is growing tremendously, fuelled by the rapid deployment of networked sensors and Internet-of-Things (IoT) technologies. This trend involves widely heterogeneous data sources (e.g. field humidity sensors, satellite imaging devices, on-board drone sensors) and communication systems (e.g. wireless/cellular networks, web portals). These sources are often distributed across sizeable areas, involve many third parties, span multi-tiered device capabilities, and are intrinsically prone to hardware/software or environmentally-induced failures.
Given these characteristics, transforming these vast data into meaningful information faces the following challenges:
- Trust – Can we guarantee the origin and integrity of the data? Can we account for all operations performed on the data before they reached us? Who performed them, and how did they impact precision and accuracy?
- Reliability and Scalability – Will the data be there when needed? Could alternate data sources be used when the primary ones fail? How can we scale the data-to-information processes to a large number and variety of data sources and systems? Which computations (e.g. filtering, analytics) should run at the edge (i.e. on an embedded node), at the core (e.g. on a cloud server), or in between (e.g. on a gateway/concentrator)?
- Uniformity and Richness – Can we expect temperature data from different sensors to share the same format and unit, and to have similar precision and accuracy? Could new, richer data be automatically computed from operations on primary data, thus creating a new virtual sensor (e.g. averaging the pollutant concentrations from scattered points into an estimate for the whole region)?
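The virtual-sensor idea above can be made concrete with a minimal sketch. All names, values, and the unit-conversion rule below are illustrative assumptions, not part of the platform's design: scattered readings of the same pollutant are normalised to a common unit and then combined into a single regional estimate.

```python
from statistics import mean

# Hypothetical readings from scattered monitoring points; note the
# non-uniform units, one of the challenges listed above.
readings = [
    {"pm25": 12.0, "unit": "ug/m3"},
    {"pm25": 0.015, "unit": "mg/m3"},   # same pollutant, different unit
    {"pm25": 9.5, "unit": "ug/m3"},
]

def to_ug_per_m3(reading):
    """Normalise a reading to a common unit before combining."""
    if reading["unit"] == "mg/m3":
        return reading["pm25"] * 1000.0
    return reading["pm25"]

# The "virtual sensor": a regional estimate derived from primary sensors.
regional_pm25 = mean(to_ug_per_m3(r) for r in readings)
print(round(regional_pm25, 2))  # prints 12.17 (ug/m3)
```

In the platform, such an operation would run as an in-line transformation on the incoming streams rather than on a static list.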
This project will develop a “Staging service for Locally and Remotely sensed Data (SLRD)”, a horizontal service which will transform raw data into “product-ready” data suitable for other services or applications within Digiscape. These services will then apply models and analyses to the data to turn it into meaningful information. The SLRD service will include novel research outcomes within a sound software architecture, and will:
- increase confidence in the data by providing three modules, i.e. Data Provenance (DP), Data Quality (DQ), and Data Security (DS), which will automate traceability, quality validation, authentication, encryption, and privacy preservation of the data.
- reduce the complexity of obtaining formatted, coherent, or richly combined data by providing a Data Transformation (DT) module, which will support in-line operations on data streams as they are collected, to combine them, reformat them, or apply analytics to them.
- improve the availability of the data by implementing a distributed Management Layer (ML), which will plan and deploy the functions of the above modules across the distributed entities that compose the data collection system (e.g. deploying an analytic DT function on an embedded node at the edge). The ML will consider factors such as cost, battery usage, processing/storage resources, and interface APIs, and will dynamically react to their variations to maximise data availability.
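One way a management layer's placement decision could look is sketched below. The candidate nodes, their attributes, and the scoring weights are all illustrative assumptions for this sketch, not the SLRD design: each candidate host for a DT function is scored on spare capacity, battery, and link cost, and the highest-scoring one is selected.

```python
# Candidate hosts for a DT function, from edge to core (values are assumed).
candidates = [
    {"name": "edge-node", "cpu_free": 0.2, "battery": 0.4, "link_cost": 0.1},
    {"name": "gateway",   "cpu_free": 0.6, "battery": 1.0, "link_cost": 0.3},
    {"name": "cloud",     "cpu_free": 0.9, "battery": 1.0, "link_cost": 0.8},
]

def placement_score(node):
    """Higher is better: favour spare capacity and battery, penalise link cost.
    The weights here are arbitrary for illustration."""
    return 0.4 * node["cpu_free"] + 0.4 * node["battery"] - 0.2 * node["link_cost"]

best = max(candidates, key=placement_score)
print(best["name"])  # prints "cloud" for the values above
```

Re-evaluating this ranking whenever battery, cost, or resource readings change is what gives the dynamic reaction to variations described above.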
Our Data Staging Platform will…
- ensure that the knowledge and information derived from the staged data are trusted and traceable
- guarantee data integrity/security, and the privacy of its sources
- provide more resilient information despite environmental effects (e.g. storms) and system limitations (e.g. transient access failure, battery life)
- allow third-party services and applications easier access to richer data and better efficiency by re-using common building blocks across application domains (i.e. “develop once, use many times”)
Contact: Thierry [dot] Rakotoarivelo [at] data61 [dot] csiro [dot] au