
Sensor Data Models and Management

The Senaps data platform is a scalable, cloud-based data store and web API for streaming and historical sensor data. It draws on deep domain knowledge in environmental modelling to provide a shared information model that can readily integrate heterogeneous data. Senaps also provides analytics integration to streamline the analysis of sensor data and its integration into existing models.

Introduction
The Senaps platform is a loosely coupled suite of components which come together to create a platform for ingesting, storing, analysing and delivering sensor data. Senaps is designed to manage time series data recorded by field instruments across many applications including:

  • Weather stations
  • Soil monitoring
  • Animal tracking and monitoring
  • Water quality

The platform provides a developer-friendly REST Application Programming Interface (API) for connecting any type of user interface, such as websites and mobile apps. Analytical tools can be connected through the same API.
Data Stream Types
The core data storage concept in Senaps is a Data Stream. Currently the Senaps platform manages four types of data streams:

  • Scalar (numerical)
  • Geolocation (longitude, latitude, altitude)
  • Vector (array)
  • Images (image/jpeg or image/png)
Web-based display of oyster heart signal
Geolocation data stream from a soil mapping sensor
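The four data stream types listed above can be sketched as plain Python types. This is a minimal illustration only: the class names, field names and the `stream_type` helper are assumptions made for this example and are not taken from the Senaps API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Sequence, Union


@dataclass(frozen=True)
class Geolocation:
    longitude: float
    latitude: float
    altitude: float = 0.0


@dataclass(frozen=True)
class Image:
    media_type: str  # "image/jpeg" or "image/png"
    data: bytes


# Hypothetical union of the four value kinds a data stream can carry.
Value = Union[float, Geolocation, Sequence[float], Image]


@dataclass(frozen=True)
class Sample:
    timestamp: datetime
    value: Value


def stream_type(sample: Sample) -> str:
    """Classify a sample into one of the four stream types."""
    v = sample.value
    if isinstance(v, bool):
        raise TypeError("booleans are not treated as scalar values here")
    if isinstance(v, (int, float)):
        return "scalar"
    if isinstance(v, Geolocation):
        return "geolocation"
    if isinstance(v, Image):
        if v.media_type not in ("image/jpeg", "image/png"):
            raise ValueError(f"unsupported media type: {v.media_type}")
        return "image"
    if isinstance(v, (list, tuple)):
        return "vector"
    raise TypeError(f"unsupported value: {v!r}")


if __name__ == "__main__":
    now = datetime(2020, 1, 1, tzinfo=timezone.utc)
    print(stream_type(Sample(now, 21.5)))
    print(stream_type(Sample(now, Geolocation(147.3, -42.9, 12.0))))
```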

Architecture
The Senaps platform can be deployed in any cloud, as it has no dependencies on cloud-specific technologies. The architecture provides an event-driven messaging layer for handling incoming sensor data streams; this is typically implemented with the open source RabbitMQ message broker. Sensor data is transformed as it arrives, and stream processors can also be applied on arrival. Data is then pushed into the data storage system and streamed to any real-time clients. The REST API interfaces directly with the storage layer to serve requests for historical time series. The architecture allows deployments to scale from a single embedded PC up to many VMs in a cloud platform.

Conceptual architecture of the core Senaps platform
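The flow described in this section (transform on arrival, apply stream processors, store, then stream to real-time clients) can be sketched in memory as follows. This is an illustrative sketch, not the Senaps implementation: the `Pipeline` class, the message layout and the `to_kelvin` processor are all hypothetical, and a Python list stands in for both the message broker and the storage system.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


class Pipeline:
    """In-memory stand-in for the messaging layer (hypothetical)."""

    def __init__(
        self,
        transform: Callable[[str], Message],
        processors: List[Callable[[Message], List[Message]]],
    ) -> None:
        self.transform = transform
        self.processors = processors
        self.storage: List[Message] = []  # stands in for the data store
        self.subscribers: List[Callable[[Message], None]] = []

    def subscribe(self, callback: Callable[[Message], None]) -> None:
        self.subscribers.append(callback)

    def on_message(self, raw: str) -> None:
        # 1. Transform the raw payload as it arrives.
        message = self.transform(raw)
        # 2. Apply stream processors, which may emit derived messages.
        outputs = [message]
        for processor in self.processors:
            outputs.extend(processor(message))
        # 3. Store each message and push it to real-time clients.
        for out in outputs:
            self.storage.append(out)
            for callback in self.subscribers:
                callback(out)


def parse_line(raw: str) -> Message:
    stream, timestamp, value = raw.split(",")
    return {"stream": stream, "timestamp": timestamp, "value": float(value)}


def to_kelvin(message: Message) -> List[Message]:
    # Derived stream: Celsius readings re-published in Kelvin.
    if message["stream"] == "air_temp_c":
        derived = dict(message, stream="air_temp_k",
                       value=message["value"] + 273.15)
        return [derived]
    return []


if __name__ == "__main__":
    pipeline = Pipeline(parse_line, [to_kelvin])
    pipeline.subscribe(print)
    pipeline.on_message("air_temp_c,2020-01-01T00:00:00Z,20.0")
```

In a real deployment the broker would sit between these stages, so each stage could run and scale as a separate process.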

Meta-Data
Senaps provides a detailed meta-data model targeted at field-deployed sensors. The model is built around the key concepts of Platforms, Sensors and Deployments. In addition, important details about each Data Stream are required. Controlled vocabularies are used for the unit of measure and observed property fields, and these properties can be linked to an externally managed linked data registry.
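One way to picture the meta-data concepts above is as a small set of linked records, with the controlled vocabularies enforced when a data stream is described. The sketch below is an assumption-laden illustration: the field names, the vocabulary contents and the validation rule are invented for this example and do not reflect the actual Senaps schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical controlled vocabularies; a real system would likely load
# these from an externally managed registry.
UNITS = {"degree_celsius", "millimetre", "percent"}
OBSERVED_PROPERTIES = {"air_temperature", "rainfall", "soil_moisture"}


@dataclass
class Platform:
    id: str
    name: str


@dataclass
class Sensor:
    id: str
    model: str


@dataclass
class Deployment:
    platform: Platform
    sensor: Sensor
    start: date
    end: Optional[date] = None  # None while the deployment is active


@dataclass
class DataStreamMetadata:
    id: str
    deployment: Deployment
    observed_property: str
    unit_of_measure: str

    def __post_init__(self) -> None:
        # Reject terms that are not in the controlled vocabularies.
        if self.observed_property not in OBSERVED_PROPERTIES:
            raise ValueError(f"unknown observed property: {self.observed_property}")
        if self.unit_of_measure not in UNITS:
            raise ValueError(f"unknown unit of measure: {self.unit_of_measure}")


if __name__ == "__main__":
    deployment = Deployment(
        Platform("station-1", "Example weather station"),
        Sensor("temp-1", "Example thermometer"),
        start=date(2020, 1, 1),
    )
    print(DataStreamMetadata("station-1.air_temp", deployment,
                             "air_temperature", "degree_celsius"))
```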
Data Ingestion Framework (SMG)
The Sensor Messaging Gateway (SMG) provides a message-oriented framework for managing the ingestion of many heterogeneous data streams. SMG uses a modular architecture so that extensions can be added for new data sources. Data source modules can be specific to a particular sensor, or generic, such as a CSV file download from FTP.

Sensor Messaging Gateway
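The modular design described above can be illustrated with a simple registry in which each data source module is a function that turns a raw payload into observations. This is a sketch under stated assumptions: the `register` decorator, the `ingest` function and the CSV column names are hypothetical and are not part of SMG.

```python
import csv
import io
from typing import Any, Callable, Dict, Iterator, List

Observation = Dict[str, Any]
SourceModule = Callable[[str], Iterator[Observation]]

# Hypothetical registry mapping a source type to its parser module.
MODULES: Dict[str, SourceModule] = {}


def register(name: str) -> Callable[[SourceModule], SourceModule]:
    """Decorator that adds a data source module to the registry."""
    def decorator(fn: SourceModule) -> SourceModule:
        MODULES[name] = fn
        return fn
    return decorator


@register("csv")
def parse_csv(payload: str) -> Iterator[Observation]:
    # Generic module: expects 'stream', 'timestamp' and 'value' columns.
    for row in csv.DictReader(io.StringIO(payload)):
        yield {
            "stream": row["stream"],
            "timestamp": row["timestamp"],
            "value": float(row["value"]),
        }


def ingest(source_type: str, payload: str) -> List[Observation]:
    """Dispatch a payload to the module registered for its source type."""
    try:
        module = MODULES[source_type]
    except KeyError:
        raise ValueError(f"no module registered for {source_type!r}") from None
    return list(module(payload))


if __name__ == "__main__":
    text = "stream,timestamp,value\nrain,2020-01-01T00:00:00Z,1.5\n"
    print(ingest("csv", text))
```

Supporting a new sensor then means registering one more function, without touching the dispatch code.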

SMG also provides a simple framework for building event-driven sensor data stream processors. A local cache allows a processor to access temporal windows of recent data. Common use cases for these processors include derived data streams, temporal aggregation and quality control.
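As an illustration of temporal aggregation over a cached window, the sketch below keeps recent samples in a local cache and emits a moving average each time a new sample arrives. The `WindowMean` class is hypothetical, not an SMG interface, and it assumes samples arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Tuple


class WindowMean:
    """Event-driven processor: mean of samples within a time window."""

    def __init__(self, window: timedelta) -> None:
        self.window = window
        # Local cache of (timestamp, value) pairs, oldest first.
        self.cache: Deque[Tuple[datetime, float]] = deque()

    def on_sample(self, timestamp: datetime, value: float) -> float:
        self.cache.append((timestamp, value))
        # Evict samples that have fallen out of the window.
        cutoff = timestamp - self.window
        while self.cache and self.cache[0][0] <= cutoff:
            self.cache.popleft()
        return sum(v for _, v in self.cache) / len(self.cache)


if __name__ == "__main__":
    processor = WindowMean(timedelta(minutes=10))
    t0 = datetime(2020, 1, 1)
    for minutes, value in [(0, 1.0), (5, 3.0), (12, 5.0)]:
        print(processor.on_sample(t0 + timedelta(minutes=minutes), value))
```

The same pattern extends to quality control, for example by flagging a sample that deviates too far from the window mean.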

Data Storage (MongoDB)

All Data Stream samples are stored in the MongoDB document store. MongoDB provides automated sharding, replication, geospatial queries, compressed storage and a built-in aggregation framework.
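To make the document store and geospatial query ideas concrete, the sketch below builds one sample document and one query filter as plain Python dictionaries. The document layout and field names (`stream_id`, `t`, `location`) are assumptions for illustration and not the Senaps schema; only the `$gte`, `$lt` and `$geoWithin` operators and the GeoJSON geometry shapes are standard MongoDB query syntax. Nothing here connects to a database.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def geolocation_sample(stream_id: str, timestamp: datetime,
                       lon: float, lat: float, alt: float) -> Dict[str, Any]:
    """Hypothetical document layout for one geolocation sample."""
    return {
        "stream_id": stream_id,
        "t": timestamp,
        # GeoJSON points list longitude first, then latitude.
        "location": {"type": "Point", "coordinates": [lon, lat]},
        "altitude": alt,
    }


def within_box_query(stream_id: str, start: datetime, end: datetime,
                     west: float, south: float,
                     east: float, north: float) -> Dict[str, Any]:
    """Filter for samples in a time range and inside a bounding box."""
    # A GeoJSON polygon ring must end on the point it starts from.
    ring = [[west, south], [east, south], [east, north],
            [west, north], [west, south]]
    return {
        "stream_id": stream_id,
        "t": {"$gte": start, "$lt": end},
        "location": {
            "$geoWithin": {
                "$geometry": {"type": "Polygon", "coordinates": [ring]}
            }
        },
    }


if __name__ == "__main__":
    start = datetime(2020, 1, 1, tzinfo=timezone.utc)
    end = datetime(2020, 1, 2, tzinfo=timezone.utc)
    print(geolocation_sample("soil-mapper", start, 147.3, -42.9, 12.0))
    print(within_box_query("soil-mapper", start, end, 147.0, -43.0, 148.0, -42.0))
```

With a driver such as PyMongo, a filter of this shape would be passed to `collection.find`.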

Web APIs
A REST API is provided as the primary means of accessing data. The REST API follows the Hypertext Application Language (HAL) specification. For large queries, data is streamed to the client to avoid the memory limitations of the application server. The default data encoding is JSON, and observation data queries can be returned as JSON, CSV or, for geolocation data streams, GeoJSON.
The REST API provides a pull-based interface for synchronous requests for historical or recent data. For near real-time data streams, however, a push-based publish-subscribe mechanism is used. Clients can connect to the message queue directly and consume any data stream.
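The idea of streaming a large result instead of building it in memory can be sketched with a generator that encodes one CSV row at a time, alongside a GeoJSON encoder for geolocation samples. This is an illustrative sketch, not the Senaps server code: the function names and the column layout are assumptions.

```python
import csv
import io
from typing import Any, Dict, Iterable, Iterator, Tuple


def csv_chunks(rows: Iterable[Tuple[str, float]]) -> Iterator[str]:
    """Yield CSV text one row at a time so memory use stays constant."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["timestamp", "value"])
    yield buffer.getvalue()
    for timestamp, value in rows:
        # Reuse the buffer: clear it before encoding the next row.
        buffer.seek(0)
        buffer.truncate(0)
        writer.writerow([timestamp, value])
        yield buffer.getvalue()


def to_geojson(rows: Iterable[Tuple[str, float, float, float]]) -> Dict[str, Any]:
    """Encode (timestamp, lon, lat, alt) samples as a FeatureCollection."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat, alt]},
            "properties": {"timestamp": timestamp},
        }
        for timestamp, lon, lat, alt in rows
    ]
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    for chunk in csv_chunks([("2020-01-01T00:00:00Z", 1.5)]):
        print(repr(chunk))
    print(to_geojson([("2020-01-01T00:00:00Z", 147.3, -42.9, 12.0)]))
```

A web framework would typically hand a generator like `csv_chunks` to a chunked HTTP response, so rows can be sent as they are read from storage.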

Performance and Scalability
A recent deployment hosts over 600 million data samples in a single database cluster with two shards, each with 4 vCPUs and 12 GB of RAM.

REST API total times for requesting JSON-formatted sensor observations from a single data stream.

The times in Figure 5 are calculated from API responses, including the time to serialise and transfer the data to the HTTP client.
All components of the Senaps platform scale horizontally. Data streams can be distributed across a message queue cluster, data ingestion can be distributed across multiple nodes, and database auto-sharding makes it simple to add shards.

Authentication and Authorisation
The Senaps API provides a role-based authorisation model that allows data to be segregated between clients and supports granular permissions such as data embargoes. In addition, logical groups of data can be shared between client organisations. By using an API manager, many different authentication schemes can be integrated.
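A role-based check with a data embargo could look something like the sketch below. It is only an illustration of the concept: the `Role` and `StreamPolicy` classes, the `bypass_embargo` flag and the `can_read` rule are assumptions for this example, not the Senaps authorisation model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Set


@dataclass
class Role:
    name: str
    # Logical groups of data this role may read (hypothetical field).
    readable_groups: Set[str] = field(default_factory=set)
    # Whether this role may read data that is still under embargo.
    bypass_embargo: bool = False


@dataclass
class StreamPolicy:
    group: str
    embargo_until: Optional[datetime] = None  # None means no embargo


def can_read(role: Role, policy: StreamPolicy, now: datetime) -> bool:
    """Return True if the role may read the stream at the given time."""
    if policy.group not in role.readable_groups:
        return False  # data is segregated by group
    if policy.embargo_until is not None and now < policy.embargo_until:
        return role.bypass_embargo
    return True


if __name__ == "__main__":
    owner = Role("owner", {"farm-a"}, bypass_embargo=True)
    partner = Role("partner", {"farm-a"})
    policy = StreamPolicy("farm-a", embargo_until=datetime(2020, 6, 1))
    print(can_read(owner, policy, datetime(2020, 1, 1)))
    print(can_read(partner, policy, datetime(2020, 1, 1)))
```

Sharing a group of data with another organisation then amounts to adding that group to one of the organisation's roles.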

 
