

RiskLab Publications

If you would like more information about any of the publications listed here, please contact the RiskLab team.

Australian Household Superannuation, Balances at the Date of Retirement

Yunxiao Wang, Colin O’Hare, Alec G. Stephenson, Bonsoo Koo, Peter Toscas, Zili Zhu, Andrew Reeson, Aaron Minney

Australia’s superannuation system primarily consists of defined contribution type funds and has experienced 26 years of compulsory contributions. As this system matures, more people are reaching retirement and beginning the process of drawing down on those funds. A key question, then, is how successful the superannuation system has been to date in helping Australians to save adequate funds for a sustainable retirement income. Using a new, unique and large dataset provided by the Australian Government’s Department of Human Services (DHS), we are able, for the first time, to provide a detailed account of superannuation savings at both the individual and the household level, giving a more comprehensive account of total wealth. We also compare the size of superannuation assets against a household’s non-superannuation assets to produce a fuller picture of wealth in retirement, and look to quantify the gender superannuation gap at the individual level. Finally, we look at the frequency and size of lump sum withdrawals from superannuation at retirement. Given the significant coverage of Australia’s superannuation system and the large personal savings it holds, the findings presented provide a useful case study for countries with an increasing amount of defined contribution type assets.

Keywords: Superannuation; Superannuation Balance; Household assets; Retirement savings; Lump sum; Income stream; DHS data

Incorporating Primary Home Value in Total Testable Assets for Australia’s Age Pension

Yunxiao Wang, Alec G. Stephenson, Colin O’Hare, Bonsoo Koo, Peter Toscas, Zili Zhu, Andrew Reeson, Aaron Minney

The growing cost of the Age Pension in Australia is of significant concern, especially when combined with the effects of an ageing population and increasing longevity. It is thus vital to make the Age Pension more sustainable and affordable in order to safeguard the retirement income of the Australian retirees who are most vulnerable. The current Australian superannuation system has yet to mature, with around 70% of retirees relying on the means-tested Age Pension. A higher proportion own their own home, and with increasing house values, it is of interest to look at the effects of including the primary home value in the asset test. The primary home often accounts for a large portion of a household’s assets. Using data from the Australian Department of Human Services, we can analyse assets held by Australian retirees at the date of retirement, at both an individual and a household level. In this paper we look at including 5%, 10% and 20% of the primary home value in the Age Pension asset test, providing an empirical analysis of the impact that such a policy change would have on 100,250 coupled and 182,212 single home-owning households in Australia. Changing the asset test to include the primary home requires research into how the primary home value is included and the effects it has on the Age Pension.

Keywords: Age Pension; Asset test; Retirement savings; Superannuation; Primary home

Superannuation Drawdown Behaviour for Australian Households

Alec G. Stephenson, Peter Toscas, Zili Zhu, Andrew Reeson, Aaron Minney, Yunxiao Wang, Bonsoo Koo, Colin O’Hare

We examine superannuation accumulation and drawdown behaviour using transactional records from a database provided by the Australian Government Department of Human Services to assess eligibility for welfare payments and concession cards. Superannuation remains a key policy issue for the Australian government, with legislative reform designed to deal with budgetary pressures while increasing superannuation balances for an ageing population, allowing people to profit from superannuation for wellbeing in retirement. We seek to inform debate on potential regulatory change, and find that many retirees are under-withdrawing their superannuation.

Keywords: Assets; Longevity; Retirement; Savings; Superannuation

Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization.

Shanika L Wickramasuriya, George Athanasopoulos, Rob J Hyndman (2018)

Large collections of time series often have aggregation constraints due to product or geographical groupings. The forecasts for the most disaggregated series are usually required to add up exactly to the forecasts of the aggregated series, a constraint we refer to as “coherence”. Forecast reconciliation is the process of adjusting forecasts to make them coherent. The reconciliation algorithm proposed by Hyndman et al. (2011) is based on a generalized least squares estimator that requires an estimate of the covariance matrix of the coherency errors (i.e., the errors that arise due to incoherence). We show that this matrix is impossible to estimate in practice due to identifiability conditions. We propose a new forecast reconciliation approach that incorporates the information from a full covariance matrix of forecast errors in obtaining a set of coherent forecasts. Our approach minimizes the mean squared error of the coherent forecasts across the entire collection of time series under the assumption of unbiasedness. The minimization problem has a closed form solution. We make this solution scalable by providing a computationally efficient representation. We evaluate the performance of the proposed method compared to alternative methods using a series of simulation designs which take into account various features of the collected time series. This is followed by an empirical application using Australian domestic tourism data. The results indicate that the proposed method works well with artificial and real data.
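To make the closed-form reconciliation concrete, here is a minimal sketch for the smallest possible hierarchy, a total with two bottom-level series. It assumes the solution takes the usual form y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat and, as a simplification not taken from the abstract, restricts W to a diagonal matrix of forecast error variances rather than a full covariance matrix. The function name and the example numbers are hypothetical.

```python
def reconcile_two_level(base, variances):
    """Reconcile base forecasts [total, a, b] so that total == a + b.

    Applies y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat with
    S = [[1, 1], [1, 0], [0, 1]] and W = diag(variances).
    """
    y0, y1, y2 = base
    p0, p1, p2 = (1.0 / v for v in variances)  # diagonal of W^-1

    # S' W^-1 S is the symmetric 2x2 matrix [[p0 + p1, p0], [p0, p0 + p2]].
    a, b, d = p0 + p1, p0, p0 + p2
    det = a * d - b * b

    # Right-hand side S' W^-1 y_hat.
    r1 = p0 * y0 + p1 * y1
    r2 = p0 * y0 + p2 * y2

    # Solve the 2x2 system for the reconciled bottom-level forecasts.
    bottom_a = (d * r1 - b * r2) / det
    bottom_b = (a * r2 - b * r1) / det

    # Multiplying by S sums the bottom level to obtain the coherent total.
    return [bottom_a + bottom_b, bottom_a, bottom_b]


base = [100.0, 40.0, 50.0]  # incoherent: 40 + 50 != 100
reconciled = reconcile_two_level(base, variances=[1.0, 1.0, 1.0])
print(reconciled)
```

With equal variances this reduces to ordinary least squares reconciliation; giving a series a larger variance means its base forecast is adjusted more.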


Exploring the sources of uncertainty: why does bagging for time series forecasting work?

Fotios Petropoulos, Rob J Hyndman, Christoph Bergmeir (2018)

In a recent study, Bergmeir, Hyndman and Benítez (2016) successfully employed a bootstrap aggregation (bagging) technique for improving the performance of exponential smoothing. Each series is Box-Cox transformed, and decomposed by Seasonal and Trend decomposition using Loess (STL); then bootstrapping is applied on the remainder series before the trend and seasonality are added back, and the transformation reversed to create bootstrapped versions of the series. Subsequently, they apply automatic exponential smoothing on the original series and the bootstrapped versions of the series, with the final forecast being the equal-weight combination across all forecasts. In this study we attempt to address the question: why does bagging for time series forecasting work? We assume three sources of uncertainty (model uncertainty, data uncertainty, and parameter uncertainty) and we separately explore the benefits of bagging for time series forecasting for each one of them. Our analysis considers 4,004 time series (from the M- and M3-competitions) and two families of models. The results show that the benefits of bagging predominantly originate from the model uncertainty: the fact that different models might be selected as optimal for the bootstrapped series. As such, a suitable weighted combination of the most suitable models should be preferred to selecting a single model.
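The bagging procedure summarised above can be sketched in a heavily simplified form. The sketch makes several substitutions that are not in the abstract: it skips the Box-Cox transformation, uses a centred moving average in place of STL, resamples the remainder with a moving block bootstrap, and uses simple exponential smoothing with a fixed smoothing parameter instead of automatic model selection. All function names and parameter values are hypothetical.

```python
import random


def moving_average(series, window):
    """Centred moving average with shrinking windows at the edges (trend proxy)."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        trend.append(sum(chunk) / len(chunk))
    return trend


def block_bootstrap(values, block_size, rng):
    """Resample values in contiguous blocks to keep short-range dependence."""
    out = []
    while len(out) < len(values):
        start = rng.randrange(len(values) - block_size + 1)
        out.extend(values[start:start + block_size])
    return out[:len(values)]


def ses_forecast(series, alpha):
    """One-step-ahead simple exponential smoothing forecast."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def bagged_forecast(series, n_boot=30, window=5, block_size=4, alpha=0.3, seed=0):
    rng = random.Random(seed)
    trend = moving_average(series, window)
    remainder = [y - t for y, t in zip(series, trend)]
    forecasts = [ses_forecast(series, alpha)]  # forecast from the original series
    for _ in range(n_boot):
        resampled = block_bootstrap(remainder, block_size, rng)
        boot = [t + r for t, r in zip(trend, resampled)]  # add the trend back
        forecasts.append(ses_forecast(boot, alpha))
    return sum(forecasts) / len(forecasts)  # equal-weight combination


# Made-up series: linear trend plus an alternating disturbance.
series = [10 + 0.5 * i + (1 if i % 2 else -1) for i in range(40)]
print(f"bagged one-step forecast: {bagged_forecast(series):.2f}")
```

Because this sketch fits the same model to every bootstrapped series, it captures only the data and parameter sides of the argument; the model uncertainty effect highlighted in the abstract would require selecting a model separately for each bootstrapped series.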


Visualizing big energy data.

Rob J Hyndman, Xueqin Lin, Pierre Pinson (2018)

Visualization is a crucial component of data analysis. It is always a good idea to plot the data before fitting any models, making any predictions, or drawing any conclusions. As sensors of the electric grid are collecting large volumes of data from various sources, power industry professionals are facing the challenge of visualizing such data in a timely fashion. In this article, we demonstrate several data visualization solutions for big energy data through three case studies involving smart meter data, phasor measurement unit (PMU) data, and probabilistic forecasts, respectively.


A note on the validity of cross-validation for evaluating autoregressive time series prediction.

Christoph Bergmeir, Rob J Hyndman, Bonsoo Koo (2018)

One of the most widely used standard procedures for model evaluation in classification and regression is K-fold cross-validation (CV). However, when it comes to time series forecasting, because of the inherent serial correlation and potential non-stationarity of the data, its application is not straightforward and often omitted by practitioners in favour of an out-of-sample (OOS) evaluation. In this paper, we show that in the case of a purely autoregressive model, the use of standard K-fold CV is possible as long as the models considered have uncorrelated errors. Such a setup occurs, for example, when the models nest a more appropriate model. This is very common when Machine Learning methods are used for prediction, where CV in particular is suitable to control for overfitting the data. We present theoretical insights supporting our arguments. Furthermore, we present a simulation study and a real-world example where we show empirically that K-fold CV performs favourably compared to both OOS evaluation and other time-series-specific techniques such as non-dependent cross-validation.
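The setup described in this abstract can be illustrated with a small sketch: the series is embedded into rows of lagged inputs and targets, and standard K-fold CV with random fold assignment is then applied to those rows. The AR(1) model without intercept, the simulated data and all function names are assumptions made for illustration, not details taken from the paper.

```python
import random


def embed(series, lag=1):
    """Turn a series into (lagged inputs, target) rows for an AR(lag) model."""
    return [(series[t - lag:t], series[t]) for t in range(lag, len(series))]


def fit_ar1(rows):
    """Least-squares AR(1) coefficient without intercept."""
    num = sum(x[0] * y for x, y in rows)
    den = sum(x[0] * x[0] for x, _ in rows)
    return num / den


def kfold_cv_mse(series, k=5, seed=0):
    """Estimate one-step-ahead MSE by K-fold CV over the embedded rows."""
    rows = embed(series, lag=1)
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # random fold assignment, as in standard K-fold CV
    folds = [order[i::k] for i in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [rows[i] for i in order if i not in held]
        phi = fit_ar1(train)
        errors.extend((rows[i][1] - phi * rows[i][0][0]) ** 2 for i in held_out)
    return sum(errors) / len(errors)


# Simulated AR(1) series with coefficient 0.6 and unit-variance noise (illustrative only).
rng = random.Random(1)
series = [0.0]
for _ in range(300):
    series.append(0.6 * series[-1] + rng.gauss(0, 1))

print(f"5-fold CV estimate of one-step MSE: {kfold_cv_mse(series):.3f}")
```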


Coherent Probabilistic Forecasts for Hierarchical Time Series.

Souhaib Ben Taieb, James W Taylor, Rob J Hyndman (2017)

Many applications require forecasts for a hierarchy comprising a set of time series along with aggregates of subsets of these series. Although forecasts can be produced independently for each series in the hierarchy, typically this does not lead to coherent forecasts, that is, forecasts that add up appropriately across the hierarchy. State-of-the-art hierarchical forecasting methods usually reconcile these independently generated forecasts to satisfy the aggregation constraints. A fundamental limitation of prior research is that it has looked only at the problem of forecasting the mean of each time series. We consider the situation where probabilistic forecasts are needed for each series in the hierarchy. We define forecast coherency in this setting, and propose an algorithm to compute predictive distributions for each series in the hierarchy. Our algorithm has the advantage of synthesizing information from different levels in the hierarchy through a sparse forecast combination and a probabilistic hierarchical aggregation. We evaluate the accuracy of our forecasting algorithm on both simulated data and large-scale electricity smart meter data. The results show consistent performance gains compared to state-of-the-art methods.


Forecasting with temporal hierarchies.

George Athanasopoulos, Rob J Hyndman, Nikolaos Kourentzes, Fotios Petropoulos (2017)

This paper introduces the concept of Temporal Hierarchies for time series forecasting. A temporal hierarchy can be constructed for any time series by means of non-overlapping temporal aggregation. Predictions constructed at all aggregation levels are combined with the proposed framework to result in temporally reconciled, accurate and robust forecasts. The implied combination mitigates modelling uncertainty, while the reconciled nature of the forecasts results in a unified prediction that supports aligned decisions at different planning horizons: from short-term operational up to long-term strategic planning. The proposed methodology is independent of forecasting models. It can embed high level managerial forecasts that incorporate complex and unstructured information with lower level statistical forecasts. Our results show that forecasting with temporal hierarchies increases accuracy over conventional forecasting, particularly under increased modelling uncertainty. We discuss organisational implications of the temporally reconciled forecasts using a case study of Accident & Emergency departments.
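The construction step, non-overlapping temporal aggregation, can be sketched as follows. The sketch assumes that the aggregation levels are the factors of the number of periods per year and that an incomplete block at the start of the series is dropped; both choices, along with the function names and data, are illustrative assumptions. Forecasting and reconciling the levels is not shown.

```python
def temporal_aggregate(series, k):
    """Sum non-overlapping blocks of length k, dropping an incomplete leading block."""
    offset = len(series) % k
    trimmed = series[offset:]
    return [sum(trimmed[i:i + k]) for i in range(0, len(trimmed), k)]


def temporal_hierarchy(series, periods_per_year=12):
    """Return one aggregated series for each k that divides periods_per_year."""
    levels = {}
    for k in range(1, periods_per_year + 1):
        if periods_per_year % k == 0:
            levels[k] = temporal_aggregate(series, k)
    return levels


monthly = list(range(1, 25))  # two years of made-up monthly values
hierarchy = temporal_hierarchy(monthly)
for k, values in hierarchy.items():
    print(f"aggregation level k={k}: {values}")
```

For monthly data this yields monthly, bi-monthly, quarterly, four-monthly, half-yearly and annual series, each of which sums to the same overall total.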


Grouped functional time series forecasting: an application to age-specific mortality rates.

Han Lin Shang, Rob J Hyndman (2017)

Age-specific mortality rates are often disaggregated by different attributes, such as sex, state and ethnicity. Forecasting age-specific mortality rates at the national and sub-national levels plays an important role in developing social policy. However, independent forecasts of age-specific mortality rates at the sub-national levels may not add up to the forecasts at the national level. To address this issue, we consider the problem of reconciling age-specific mortality rate forecasts from the viewpoint of grouped univariate time series forecasting methods (Hyndman et al., 2011), and extend these methods to functional time series forecasting, where age is considered as a continuum. The grouped functional time series methods are used to produce point forecasts of mortality rates that are aggregated appropriately across different disaggregation factors. For evaluating forecast uncertainty, we propose a bootstrap method for reconciling interval forecasts. Using the regional age-specific mortality rates in Japan, obtained from the Japanese Mortality Database, we investigate the one- to ten-step-ahead point and interval forecast accuracies between the independent and grouped functional time series forecasting methods. The proposed methods are shown to be useful for reconciling forecasts of age-specific mortality rates at the national and sub-national levels, and they also enjoy improved forecast accuracy averaged over different disaggregation factors.


A note on upper bounds for forecast-value-added relative to naïve forecasts.

Paul Goodwin, Fotios Petropoulos, Rob J Hyndman (2017)

In forecast value added analysis, the accuracy of relatively sophisticated forecasting methods is compared to that of naïve 1 forecasts to see whether the extra costs and effort of implementing them are justified. In this note, we derive a ratio that indicates the upper bound of a forecasting method’s accuracy relative to naïve 1 forecasts when the mean squared error is used to measure one-period-ahead accuracy. The ratio is applicable when a series is stationary or when its first differences are stationary. Formulae for the ratio are presented for several exemplar time series processes.
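The comparison underlying forecast value added analysis can be sketched as an empirical ratio: the mean squared error of a method's one-period-ahead forecasts divided by that of naïve 1 forecasts, which simply repeat the previous observation. The sketch does not reproduce the upper-bound ratio derived in the note; the function names and numbers are hypothetical.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_mse_to_naive(series, forecasts):
    """MSE of one-step forecasts divided by the MSE of naive 1 forecasts.

    forecasts[t] is the forecast of series[t] made at time t - 1; the naive 1
    forecast of series[t] is series[t - 1]. Index 0 is skipped because no
    naive forecast exists for the first observation.
    """
    actual = series[1:]
    naive = series[:-1]
    return mse(actual, forecasts[1:]) / mse(actual, naive)


# Made-up observations and forecasts from some more sophisticated method.
series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
forecasts = [10.0, 11.0, 11.5, 11.5, 12.0, 12.5]

ratio = relative_mse_to_naive(series, forecasts)
print(f"MSE relative to naive 1: {ratio:.3f}")
```

A ratio below 1 indicates that the method is more accurate than the naïve 1 benchmark on this measure; a ratio above 1 indicates that it is less accurate.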

Dynamic Algorithm Selection for Pareto Optimal Set Approximation.

Ingrida Steponavičė, Rob J Hyndman, Kate Smith-Miles, Laura Villanova (2017)

This paper presents a meta-algorithm for approximating the Pareto optimal set of costly black-box multiobjective optimization problems given a limited number of objective function evaluations. The key idea is to switch among different algorithms during the optimization search based on the predicted performance of each algorithm at the time. Algorithm performance is modeled using a machine learning technique based on the available information. The predicted best algorithm is then selected to run for a limited number of evaluations. The proposed approach is tested on several benchmark problems and the results are compared against those obtained using any one of the candidate algorithms alone.


Visualising forecasting algorithm performance using time series instance spaces.

Yanfei Kang, Rob J Hyndman, Kate Smith-Miles (2017)

It is common practice to evaluate the strength of forecasting methods using collections of well-studied time series datasets, such as the M3 data. But how diverse are these time series, how challenging, and do they enable us to study the unique strengths and weaknesses of different forecasting methods? In this paper we propose a visualisation method for a collection of time series that enables a time series to be represented as a point in a 2-dimensional instance space. The effectiveness of different forecasting methods can be visualised easily across this space, and the diversity of the time series in an existing collection can be assessed. Noting that the M3 dataset is not as diverse as we would ideally like, this paper also proposes a method for generating new time series with controllable characteristics to fill in and spread out the instance space, making generalisations of forecasting method performance as robust as possible.


Anomaly detection in streaming nonstationary temporal data.

Priyanga Dilini Talagala, Rob J Hyndman, Kate Smith-Miles, Sevvandi Kandanaarachchi and Mario A Muñoz (2018)

This article proposes a framework that provides early detection of anomalous series within a large collection of non-stationary streaming time series data. We define an anomaly as an observation that is very unlikely given the recent distribution of a given system. The proposed framework first forecasts a boundary for the system’s typical behavior using extreme value theory. Then a sliding window is used to test for anomalous series within a newly arrived collection of series. The model uses time series features as inputs, and a density-based comparison to detect any significant changes in the distribution of the features. Using various synthetic and real world datasets, we demonstrate the wide applicability and usefulness of our proposed framework. We show that the proposed algorithm can work well in the presence of noisy non-stationary data within multiple classes of time series. This framework is implemented in the open source R package oddstream. R code and data are available in the supplementary materials.


Calendar-based graphics for visualizing people’s daily schedules.

Earo Wang, Dianne Cook, Rob J Hyndman (2017)

This paper describes a frame_calendar function that organizes and displays temporal data, collected at sub-daily resolution, in a calendar layout. Calendars are broadly used in society to display temporal information and events. The frame_calendar function uses linear algebra on the date variable to create the layout. It utilizes the grammar of graphics to create the plots inside each cell, and thus synchronizes neatly with ggplot2 graphics. The motivating application is studying pedestrian behavior in Melbourne, Australia, based on counts which are captured at hourly intervals by sensors scattered around the city. Faceting by the usual features, such as day and month, was insufficient to examine the behavior. Making displays in a monthly calendar format helps to understand pedestrian patterns relative to events such as work days, weekends, holidays, and special events. The layout algorithm has several format options and variations. It is implemented in the R package sugrrants.
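The idea of deriving a calendar layout from arithmetic on the date can be illustrated with a short sketch that maps each date to a (row, column) cell in its month's grid. This is not the frame_calendar implementation from sugrrants; the function name, the Monday-first convention and the zero-based indexing are assumptions chosen for illustration.

```python
from datetime import date


def calendar_cell(day):
    """Return (row, column) of a date within its month's calendar grid.

    Columns run Monday (0) to Sunday (6); row 0 is the week containing the 1st.
    """
    first_weekday = day.replace(day=1).weekday()
    row = (day.day - 1 + first_weekday) // 7
    col = day.weekday()
    return row, col


for d in (date(2017, 1, 1), date(2017, 1, 2), date(2017, 1, 31)):
    print(d.isoformat(), calendar_cell(d))
```

Sub-daily observations for a given date could then be plotted inside the cell by offsetting their coordinates by the cell's row and column.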


Hierarchical Probabilistic Forecasting of Electricity Demand with Smart Meter Data.

Souhaib Ben Taieb, James W Taylor, Rob J Hyndman (2017)

Electricity smart meters record consumption, on a near real-time basis, at the level of individual commercial and residential properties. From this, a hierarchy can be constructed consisting of time series of demand at the smart meter level, and at various levels of aggregation, such as substations, cities and regions. Forecasts are needed at each level to support the efficient and reliable management of consumption. A limitation of previous research in this area is that it considered only deterministic prediction. To enable improved decision-making, we introduce an algorithm for producing a probability density forecast for each series within a large-scale hierarchy. The resulting forecasts are coherent in the sense that the forecast distribution of each aggregate series is equal to the convolution of the forecast distributions of the corresponding disaggregate series. Our algorithm has the advantage of synthesizing information from different levels in the hierarchy through forecast combination. Distributional assumptions are not required, and dependencies between forecast distributions are imposed through the use of empirical copulas. Scalability to large hierarchies is enabled by decomposing the problem into multiple lower-dimension sub-problems. Results for UK electricity smart meter data show performance gains for our method when compared to benchmarks.


The Australian Macro Database: An online resource for macroeconomic research in Australia.

Timur Behlul, Anastasios Panagiotelis, George Athanasopoulos, Rob J Hyndman, Farshid Vahid (2017)

A website that encourages and facilitates the use of quantitative, publicly available Australian macroeconomic data is introduced. The Australian Macro Database hosted at provides a user-friendly front end for searching among over 40000 economic variables, sourced from the Australian Bureau of Statistics and the Reserve Bank of Australia. The search box, tags and categories used to facilitate data retrieval are described in detail. Known issues with the website and future plans are discussed in the conclusion.


Macroeconomic forecasting for Australia using a large number of predictors.

Bin Jiang, George Athanasopoulos, Rob J Hyndman, Anastasios Panagiotelis, Farshid Vahid (2017)

A popular approach to forecasting macroeconomic variables is to utilize a large number of predictors. Several regularization and shrinkage methods can be used to exploit such high-dimensional datasets, and have been shown to improve forecast accuracy for the US economy. To assess whether similar results hold for economies with different characteristics, an Australian dataset containing observations on 151 aggregate and disaggregate economic series is introduced. An extensive empirical study is carried out investigating forecasts at different horizons, using a variety of methods and with information sets containing different numbers of predictors. The results share both differences and similarities with the conclusions from the literature on forecasting US macroeconomic variables. The major difference is that forecasts based on dynamic factor models perform relatively poorly compared to forecasts based on other methods, which is the opposite of the conclusion made by Stock and Watson (2012) for the US. On the other hand, a conclusion that can be made for both the Australian and US data is that there is little to no improvement in forecast accuracy when the number of predictors is expanded beyond 20-40 variables.