Publications Wednesday: Machine learning approaches to improve and predict water quality data

April 1st, 2020

The harsh reality of Agriculture 4.0 is that sensors have downtime. So how do we make sense of sensor data that has gaps? One answer is to fill in each gap by imputing the most likely values for the period of missing data.

Digiscape early-career researcher Yi-fan Zhang has been solving this problem, working with water quality sensor data provided by our colleagues in the Great Barrier Reef hinterland (1622.farm). This is a hostile environment for sensors, so missing data are common, and an imputation method for these data can’t rely on other data streams being available as useful predictors. A gap-filling method that works here is likely to be transferable to many other agricultural, land and water management settings.

Dr Zhang has developed a machine learning system called SSIM, the “sequence-to-sequence imputation model”, for recovering missing data in sensor networks. The SSIM combines long short-term memory (LSTM) network subsystems with an attention mechanism, and it uses the non-missing information both before and after a gap to fill it. A new variable-length sliding window algorithm generates a large number of training samples, so that the SSIM can be trained with small data sets.
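To give a feel for how a variable-length sliding window can multiply the training samples available from a short record, here is a minimal Python sketch. It is not the algorithm from the paper: the function name `variable_length_windows`, its parameters, the choice to centre an artificial gap in each window, and the use of 0.0 as a placeholder for masked values are all assumptions made for illustration.

```python
from typing import List, Tuple

# One training sample: (inputs with the gap blanked, gap mask, true gap values)
Sample = Tuple[List[float], List[bool], List[float]]


def variable_length_windows(
    series: List[float],
    min_len: int,
    max_len: int,
    gap_len: int,
    stride: int = 1,
) -> List[Sample]:
    """Cut a complete series into windows of several lengths.

    Hypothetical illustration only. Each window gets an artificial gap of
    `gap_len` points placed in its middle, so a model can learn to predict
    the hidden values from the observations on both sides.
    """
    samples: List[Sample] = []
    for win_len in range(min_len, max_len + 1):
        # Require at least one observed point before and after the gap.
        if win_len < gap_len + 2:
            continue
        for start in range(0, len(series) - win_len + 1, stride):
            window = series[start:start + win_len]
            gap_start = (win_len - gap_len) // 2
            gap_end = gap_start + gap_len
            # True marks a position that is hidden from the model.
            mask = [gap_start <= i < gap_end for i in range(win_len)]
            # Assumed placeholder: masked positions are set to 0.0.
            inputs = [0.0 if hidden else value
                      for value, hidden in zip(window, mask)]
            targets = window[gap_start:gap_end]
            samples.append((inputs, mask, targets))
    return samples


# Ten readings from a hypothetical sensor with no missing values.
readings = [float(i) for i in range(10)]
fixed = variable_length_windows(readings, min_len=6, max_len=6, gap_len=2)
varied = variable_length_windows(readings, min_len=4, max_len=6, gap_len=2)
print(len(fixed), len(varied))
```

With these ten readings, a single fixed window length of 6 yields 5 samples, while allowing lengths 4 to 6 yields 18, which is the kind of gain that makes training on small data sets more practical.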

The SSIM has been published in the IEEE Internet of Things Journal.