PhD Project: Statistical analysis of big time series data on electricity usage
In the state of Victoria, Australia, it is state government policy that all households have a smart meter fitted (Victoria State Government, 2015). As a result, 97% of the approximately 2.1 million households in Victoria now have a smart meter installed. These meters record energy usage at half-hourly intervals, producing over 35 billion half-hourly observations per year across all households in the state. Smart meters make it possible to model and understand residential and business energy usage patterns between months, between days and within days, something that quarterly energy usage information alone cannot support. This is an emerging area of research (Taieb et al., 2015, 2016), but there are considerable challenges, as the computational cost of processing this extra data can be immense. This project will focus on tackling this computational challenge and on linking the large smart meter energy usage dataset with smaller datasets, such as demographic data, building size and materials, and behavioural and customer billing information. The research has two main aims:
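The figure of over 35 billion observations per year follows directly from the numbers above. A quick back-of-envelope check in Python (the language choice is ours, not part of the project), assuming one reading per meter every half hour for 365 days:

```python
# Back-of-envelope check of the annual smart meter data volume.
households = 2_100_000        # approximate number of Victorian households
smart_meter_share = 0.97      # share of households with a smart meter
readings_per_day = 24 * 2     # one reading every half hour
days_per_year = 365           # ignoring leap years

metered = households * smart_meter_share
readings_per_year = metered * readings_per_day * days_per_year

print(f"{metered:,.0f} metered households")
print(f"{readings_per_year / 1e9:.1f} billion half-hourly readings per year")
```

Roughly 2.04 million metered households each produce 17,520 readings a year, which gives about 35.7 billion readings.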
- Develop a scalable method to analyse a large number of time series; this is the initial focus of the project. Dimension reduction methods such as functional data analysis (Ramsay and Silverman, 2005), probabilistic methods for approximate matrix decompositions (Halko, Martinsson and Tropp, 2011), state-space approaches that do not require the use of large matrices (Jones, 1993), and sparse matrix approaches (e.g. Furrer, Genton and Nychka, 2006) will be explored. The methodology will rely on parallel processing using multiple multi-core computers and platforms such as Hadoop, Spark or Tessera. The method will be able to link to other datasets containing explanatory variables, such as demographic data, building size and materials, and behavioural and customer billing information, for inferential purposes.
- Extend the scalable method to work in real time or near real time. Decision makers increasingly want to act as new data arrive or shortly afterwards. Real-time analysis will help identify changes that may require a management response and will support forecasting of household and business energy usage behaviour. New statistical machine learning approaches (Hastie, Tibshirani and Friedman, 2009) will be developed, or existing ones adapted, to monitor the time series for changes. Two approaches could be adopted: update model parameters as each new observation is collected, or update them only when a shift in parameter values is detected. Evaluating which approach is computationally feasible will be an important component of this work.
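Of the dimension reduction methods listed in the first aim, the probabilistic matrix decomposition of Halko, Martinsson and Tropp (2011) is perhaps the easiest to illustrate. The sketch below is a minimal Python/NumPy version of their randomized range finder with power iterations, applied to a hypothetical households-by-half-hours matrix built from three shared load profiles plus noise. The function name `randomized_svd`, its default parameters and the synthetic data are our own illustrative assumptions, not the project's method.

```python
import numpy as np

def randomized_svd(X, rank, oversample=10, n_iter=2, seed=0):
    """Approximate rank-`rank` SVD of X using a randomized range finder
    (after Halko, Martinsson and Tropp, 2011). Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    k = min(rank + oversample, min(X.shape))
    # Stage A: orthonormal basis Q whose span approximates the range of X.
    omega = rng.standard_normal((X.shape[1], k))
    Q, _ = np.linalg.qr(X @ omega)
    for _ in range(n_iter):
        # Power iterations sharpen Q when singular values decay slowly;
        # re-orthonormalise at each step for numerical stability.
        Q, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Q)
    # Stage B: exact SVD of the small k x n_cols matrix B = Q^T X.
    B = Q.T @ X
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank, :]

# Hypothetical data: one week of half-hourly readings for 2,000 households,
# each a positive mix of 3 shared load profiles plus measurement noise.
rng = np.random.default_rng(42)
n_households, n_periods = 2000, 7 * 48
profiles = rng.standard_normal((3, n_periods))
weights = rng.gamma(2.0, 1.0, size=(n_households, 3))
X = weights @ profiles + 0.1 * rng.standard_normal((n_households, n_periods))

U, s, Vt = randomized_svd(X, rank=3)
X_hat = (U * s) @ Vt
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The expensive steps are products of X with thin matrices, which can be computed block by block over row partitions of X; this is one reason such methods may suit the parallel platforms mentioned above.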
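The two updating strategies in the second aim can be contrasted on a toy problem. The sketch below, again in Python with NumPy, tracks only the mean level of a single hypothetical household's half-hourly series whose noise standard deviation is assumed known. The first function updates an exponentially weighted average at every observation; the second keeps the level fixed and monitors it with a two-sided CUSUM (a standard change detection rule, swapped in here purely for illustration), refitting only when a shift is signalled. The function names, the smoothing weight `alpha`, the CUSUM constants `k` and `h`, and the 48-observation refit window are all illustrative assumptions.

```python
import numpy as np

def always_update(y, alpha=0.05):
    """Approach 1: update the level estimate with every new observation
    (exponentially weighted moving average)."""
    mu = y[0]
    n_updates = 0
    for obs in y[1:]:
        mu = (1 - alpha) * mu + alpha * obs
        n_updates += 1
    return mu, n_updates

def update_on_change(y, sigma, k=0.5, h=8.0, warmup=48):
    """Approach 2: hold the level fixed and monitor standardised residuals
    with a two-sided CUSUM; re-estimate only when a shift is signalled."""
    mu = float(np.mean(y[:warmup]))
    s_hi = s_lo = 0.0
    n_updates, alarms = 0, []
    t = warmup
    while t < len(y):
        z = (y[t] - mu) / sigma
        s_hi = max(0.0, s_hi + z - k)   # accumulates evidence of an upward shift
        s_lo = max(0.0, s_lo - z - k)   # accumulates evidence of a downward shift
        if s_hi > h or s_lo > h:
            alarms.append(t)
            # Refit from the next `warmup` observations (in real time this
            # means waiting for them to arrive), then resume monitoring.
            mu = float(np.mean(y[t:t + warmup]))
            n_updates += 1
            s_hi = s_lo = 0.0
            t += warmup
        else:
            t += 1
    return mu, n_updates, alarms

# Hypothetical series: usage level shifts from 1.0 to 2.0 at half-hour 500.
rng = np.random.default_rng(1)
sigma = 0.2
y = np.concatenate([1.0 + sigma * rng.standard_normal(500),
                    2.0 + sigma * rng.standard_normal(500)])

mu1, n1 = always_update(y)
mu2, n2, alarms = update_on_change(y, sigma)
print(f"always update: {n1} updates, final level {mu1:.2f}")
print(f"update on change: {n2} updates at {alarms}, final level {mu2:.2f}")
```

Both strategies touch every observation, but the second performs a refit only a handful of times rather than at every step. For a realistic model fitted across millions of households, where each refit is costly, that difference in refit count is what would drive computational feasibility.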
Applications can be made via the link below.
Please attach supporting documentation, including a covering letter outlining why you would like to undertake the PhD project and a current CV listing two referees. Please note that you may submit more than one application if you wish to be considered for more than one PhD project.