
Sensor Data Models and Management

The SensorCloud data platform is a scalable, cloud-based data store and web API for streaming and historical sensor data. Its shared information model draws on deep domain knowledge in environmental modelling and can readily integrate heterogeneous data. SensorCloud also provides analytics integration to streamline the analysis of streaming data.


Introduction
The SensorCloud version 2 platform is a loosely coupled suite of components which come together to create a platform for ingesting, storing, analysing and delivering sensor data. SensorCloud is designed to manage time series data recorded by field instruments across many applications including:

  • Weather stations
  • Soil monitoring
  • Animal tracking and monitoring
  • Water quality

The platform provides a developer-friendly REST Application Programming Interface (API) for connecting any type of user interface, such as web sites and mobile apps. Analytical tools can be connected using the same API.
Data Stream Types
The core data storage concept in SensorCloud is a Data Stream. Currently the SensorCloud platform manages four types of data streams:

  • Scalar (numerical)
  • Geolocation (longitude, latitude, altitude)
  • Vector (array)
  • Images (image/jpeg or image/png)
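
The four stream types above can be pictured as a small type check applied to each incoming sample. The following is a minimal sketch only: the names `StreamType` and `is_valid`, and the representation of an image as a dict with `mime` and `data` keys, are assumptions made for illustration and are not part of the SensorCloud API.

```python
from enum import Enum
from typing import Any


class StreamType(Enum):
    """The four data stream types listed above (names are hypothetical)."""
    SCALAR = "scalar"
    GEOLOCATION = "geolocation"
    VECTOR = "vector"
    IMAGE = "image"


def _is_number(x: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def is_valid(stream_type: StreamType, value: Any) -> bool:
    """Return True if value has the shape assumed for stream_type."""
    if stream_type is StreamType.SCALAR:
        return _is_number(value)
    if stream_type is StreamType.GEOLOCATION:
        # Assumed layout: (longitude, latitude, altitude)
        return (
            isinstance(value, tuple)
            and len(value) == 3
            and all(_is_number(v) for v in value)
            and -180 <= value[0] <= 180
            and -90 <= value[1] <= 90
        )
    if stream_type is StreamType.VECTOR:
        return isinstance(value, list) and all(_is_number(v) for v in value)
    if stream_type is StreamType.IMAGE:
        # Assumed layout: {"mime": "image/jpeg" | "image/png", "data": bytes}
        return (
            isinstance(value, dict)
            and value.get("mime") in {"image/jpeg", "image/png"}
            and isinstance(value.get("data"), bytes)
        )
    return False
```

For example, `is_valid(StreamType.GEOLOCATION, (147.3, -42.9, 12.0))` accepts a longitude, latitude, altitude triple, while a latitude outside the range -90 to 90 is rejected.
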
Web-based display of oyster heart signal
Geolocation data stream from a soil mapping sensor

Architecture
The SensorCloud platform can be deployed in any cloud, as it has no dependencies on cloud-specific technologies. The architecture provides an event-driven messaging layer for handling incoming sensor data streams; this is typically provided by the open source RabbitMQ. Sensor data is transformed as it arrives, and stream processors can also be applied on arrival. Data is then pushed into the data storage system and also streamed to any real-time clients. The REST API interfaces directly with the storage layer to request historical time series. The platform architecture allows deployments to scale from a single embedded PC up to many VMs in a cloud platform.
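
The flow described above (transform on arrival, apply stream processors, store, then push to real-time clients) can be sketched in a single process. This is an illustration under stated assumptions, not the platform's implementation: the `Pipeline` class, the `stream,time,value` message format and the in-memory dict standing in for the storage layer are all hypothetical, and a real deployment would use a message broker such as RabbitMQ rather than direct callbacks.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Sample = Dict[str, object]  # assumed shape: {"stream": str, "time": str, "value": float}


class Pipeline:
    """In-memory stand-in for the messaging, processing and storage layers."""

    def __init__(self, transform: Callable[[str], Sample]):
        self.transform = transform
        self.processors: List[Callable[[Sample], Optional[Sample]]] = []
        self.storage: Dict[str, List[Sample]] = defaultdict(list)
        self.subscribers: Dict[str, List[Callable[[Sample], None]]] = defaultdict(list)

    def subscribe(self, stream_id: str, callback: Callable[[Sample], None]) -> None:
        self.subscribers[stream_id].append(callback)

    def on_message(self, raw: str) -> None:
        # 1. Transform the raw message into a sample.
        sample = self.transform(raw)
        # 2. Apply stream processors; each may emit a derived sample.
        outputs = [sample]
        for processor in self.processors:
            derived = processor(sample)
            if derived is not None:
                outputs.append(derived)
        # 3. Store every sample, then 4. push it to real-time subscribers.
        for out in outputs:
            stream_id = str(out["stream"])
            self.storage[stream_id].append(out)
            for callback in self.subscribers.get(stream_id, []):
                callback(out)


def parse_csv_line(raw: str) -> Sample:
    """Hypothetical transform for messages of the form 'stream,time,value'."""
    stream, time, value = raw.strip().split(",")
    return {"stream": stream, "time": time, "value": float(value)}


def to_fahrenheit(sample: Sample) -> Optional[Sample]:
    """Hypothetical processor deriving a Fahrenheit stream from 'temp_c'."""
    if sample["stream"] != "temp_c":
        return None
    return {"stream": "temp_f", "time": sample["time"],
            "value": float(sample["value"]) * 9 / 5 + 32}
```

A caller would register processors and subscribers, then feed raw messages to `on_message`; both the original and the derived sample end up in `storage`, and subscribers to the derived stream are notified as it is produced.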

Conceptual architecture of the core SensorCloud platform

Meta-Data
SensorCloud provides a detailed meta-data model targeted at field-deployed sensors. The model is built on the key concepts of Platforms, Sensors and Deployments. In addition, important details about each Data Stream are required. Controlled vocabularies are used for the unit of measure and observed property fields, and these properties can be linked to an externally managed linked data registry.
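
One way to picture the relationship between Platforms, Sensors, Deployments and Data Streams is as a set of linked records, with the controlled vocabularies enforced when a Data Stream is created. The sketch below is an assumption-laden illustration: the field names and the vocabulary entries are invented for the example and do not come from the SensorCloud schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical controlled vocabularies; a real registry would be external.
UNITS = {"degree_celsius", "percent", "millimetre"}
OBSERVED_PROPERTIES = {"air_temperature", "soil_moisture", "rainfall"}


@dataclass(frozen=True)
class Platform:
    id: str
    name: str


@dataclass(frozen=True)
class Sensor:
    id: str
    model: str


@dataclass(frozen=True)
class Deployment:
    """A sensor mounted on a platform at a place, for a period of time."""
    platform: Platform
    sensor: Sensor
    longitude: float
    latitude: float
    start: date
    end: Optional[date] = None  # None means the deployment is still active


@dataclass(frozen=True)
class DataStream:
    id: str
    deployment: Deployment
    observed_property: str
    unit: str

    def __post_init__(self) -> None:
        # Reject values that are not in the controlled vocabularies.
        if self.observed_property not in OBSERVED_PROPERTIES:
            raise ValueError(f"unknown observed property: {self.observed_property}")
        if self.unit not in UNITS:
            raise ValueError(f"unknown unit of measure: {self.unit}")
```

Validating at construction time means a free-text unit such as "degC" is rejected before any samples are stored against the stream.
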
Data Ingestion Framework (SMG)
The Sensor Messaging Gateway (SMG) provides a message-oriented framework for managing the ingestion of many heterogeneous data streams. SMG uses a modular architecture so that extensions can be added for new data sources. Data source modules can be specific to a particular sensor, or generic, such as a CSV file download from FTP.

Sensor Messaging Gateway

SMG also provides a simple framework for building event driven sensor data stream processors. The local cache allows a processor to access temporal windows. Common use cases for these processors include derived data streams, temporal aggregation and quality control.
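
A temporal aggregation processor of the kind mentioned above might look like the following sketch. It keeps a local cache of recent samples and emits the mean over a sliding time window each time a sample arrives. The class name `WindowMean` and its interface are hypothetical, and the sketch assumes samples arrive in time order.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Tuple


class WindowMean:
    """Sliding-window mean over the most recent `window` of samples."""

    def __init__(self, window: timedelta):
        self.window = window
        # Local cache of (time, value) pairs, oldest first.
        self.cache: Deque[Tuple[datetime, float]] = deque()

    def on_sample(self, time: datetime, value: float) -> float:
        """Add a sample and return the mean of all samples in the window."""
        self.cache.append((time, value))
        cutoff = time - self.window
        # Evict samples that have fallen out of the window.
        while self.cache and self.cache[0][0] <= cutoff:
            self.cache.popleft()
        return sum(v for _, v in self.cache) / len(self.cache)
```

The same structure could serve a quality control processor, for instance by comparing each new value against the window mean and flagging large deviations.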

Data Storage (MongoDB)

All Data Stream samples are stored in the MongoDB document store. MongoDB provides automated sharding, replication, geospatial queries, compressed storage and a built-in aggregation framework.

Web APIs
A REST API is provided as the primary means of accessing data. The REST API follows the Hypertext Application Language (HAL) specification. For large queries, data is streamed to the client to avoid the memory limitations of the application server. The default data encoding is JSON; observation data queries can be returned as JSON or CSV, or as GeoJSON for geolocation data streams.
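
To illustrate the encodings mentioned above, the sketch below shows one way a server could stream CSV row by row instead of building the whole response in memory, and how geolocation samples map onto a GeoJSON FeatureCollection. The function names and the sample layouts are assumptions for the example and do not describe the actual SensorCloud response format; the only fixed detail is the GeoJSON coordinate order of longitude, latitude, altitude.

```python
import csv
import io
import itertools
from typing import Dict, Iterable, Iterator, Tuple


def stream_csv(samples: Iterable[Tuple[str, float]]) -> Iterator[str]:
    """Yield CSV text one row at a time, starting with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in itertools.chain([("time", "value")], samples):
        writer.writerow(row)
        yield buf.getvalue()
        # Reset the buffer so memory use stays constant per row.
        buf.seek(0)
        buf.truncate(0)


def to_geojson(samples: Iterable[Tuple[str, Tuple[float, float, float]]]) -> Dict:
    """Convert (time, (longitude, latitude, altitude)) samples to GeoJSON."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat, alt]},
            "properties": {"time": time},
        }
        for time, (lon, lat, alt) in samples
    ]
    return {"type": "FeatureCollection", "features": features}
```

Because `stream_csv` is a generator, a web framework can send each chunk as soon as it is produced, which is what keeps large queries within the application server's memory limits.
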
The REST API provides a pull-based interface for synchronous requests for historical or recent data. For near real-time data streams, however, a push-based publish-subscribe mechanism is used. Clients can connect to the message queue directly and consume any data stream.

Performance and Scalability
A recent deployment hosts over 600 million data samples in a single database cluster with two shards, each with 4 vCPUs and 12 GB of RAM.

REST API total times for requesting JSON-formatted sensor observations from a single data stream.

The times in Figure 5 are calculated from API responses, including the time to serialise and transfer the data to the HTTP client.
All components of the SensorCloud platform scale horizontally. Data streams can be distributed across a message queue cluster, data ingestion can be distributed, and the database's auto-sharding makes it simple to add additional shards.

Authentication and Authorisation
The SensorCloud API provides a role-based authorisation model that segregates data between clients and supports granular permissions such as data embargoes. In addition, logical groups of data can be shared between client organisations. By using an API manager, many different authentication schemes can be integrated.
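
As a rough illustration of how role-based access and data embargoes could combine, the sketch below checks whether a role may read a given sample. Everything here is hypothetical: the `Role` and `StreamPolicy` records, the idea that an embargo hides samples newer than a fixed period, and the `bypass_embargo` flag are assumptions for the example, not a description of the SensorCloud permission model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet


@dataclass(frozen=True)
class Role:
    name: str
    groups: FrozenSet[str]        # logical data groups this role may read
    bypass_embargo: bool = False  # e.g. for the organisation that owns the data


@dataclass(frozen=True)
class StreamPolicy:
    group: str                         # logical group the stream belongs to
    embargo: timedelta = timedelta(0)  # samples newer than this are withheld


def can_read(role: Role, policy: StreamPolicy,
             sample_time: datetime, now: datetime) -> bool:
    """Return True if the role may read a sample recorded at sample_time."""
    if policy.group not in role.groups:
        return False  # data is segregated between clients by group
    if role.bypass_embargo:
        return True
    # Under embargo, only samples older than the embargo period are visible.
    return sample_time <= now - policy.embargo
```

Sharing a logical group of data with another organisation then amounts to adding that group to one of the organisation's roles.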

 
