Final Report

This document summarises outcomes from the Digital Coast Community of Practice (January – June 2017), including the vision of the Digital Coast (DC) platform as established through a series of meetings, workshops and follow-up communications. It also sets out a number of “no regret” activities, each of which delivers a valuable product in its own right while also contributing to the DC platform. The report further recommends establishing a DC committee to take over from the terminated DC CoP and to plan and oversee development of the DC platform. One of the functions of the DC committee would be to seek funding opportunities. The committee would also establish a recurrent forum where both IT experts and researchers/managers with a limited background in IT can discuss and share recent advances in this area relevant to O&A and collaborators.

 

Launch Camp and DC workshops

The goal of the DC CoP was to review the state of the art and flesh out the vision of an e-infrastructure for real-time modelling and observational products. To this end, a two-day meeting funded by the O&A Capability Development Fund (CDF) was held in April 2017. This CDF meeting was preceded by the Launch Camp workshop in March 2017 (attended by Farhan Rizwi, Sharon Tickell, Uwe Rosebrock, Mark Baird, Mike Herzfeld, Gary Carroll, Dan Wild, Nugzar Margvelashvili). The Launch Camp was funded separately from the CDF meeting and was shared with several other teams, each focused on its own project. One of the goals of the Launch Camp was to road-test ideas relating to the DC prior to the CDF meeting.

 

The Launch Camp was a good opportunity to test and validate business and marketing ideas at an early stage of the DC (for example, members of our group were encouraged to identify and approach potential users of the DC to seek further insights into the product). The meeting also highlighted communication barriers between experts from different domains: even within a relatively small team of modellers and software engineers, after two days of deliberations we were unable to converge on a shared vision of the DC platform.

 

A two-day DC meeting (following the Launch Camp and facilitated by Farhan Rizwi, Mark Baird, Dan Wild, and Nugzar Margvelashvili) delivered a comprehensive overview of current and recent developments in eResearch infrastructure. Talks by the invited speakers (available online at https://research.csiro.au/digitalcoast/ ) laid down a knowledge base solid enough to support subsequent discussions. Because of time constraints and the high complexity of the problem at hand, a shared vision of the DC was not reached at this meeting.

 

Although a shared vision of the DC was not established, the Launch Camp and DC workshops produced sufficient grounds to build on and refine earlier ideas of the DC (originally driven predominantly by the needs of the O&A Coastal Modelling group). The rest of this document gives a concise summary of the current vision of the DC platform, which has evolved through the series of workshops and offline communications into a more mature product that acknowledges established practices and earlier achievements.

 

Background

The idea of building an infrastructure around data repositories to manage and utilise those data is not new. The need for such an infrastructure has been acknowledged in Australia through projects driven by individual institutions and partnerships, as well as through coordinated efforts at the government level. For example, since 2006 there has been investment, unprecedented in scale and intent, in national eResearch infrastructure under the National Collaborative Research Infrastructure Strategy (NCRIS). Projects funded through NCRIS and partners delivered a range of strategically significant, enabling products and services, including computing infrastructure (e.g. the Nectar cloud, NCI, Pawsey HPC), data management systems (IMOS/AODN, Data Cube), and a suite of Virtual Laboratories tailored to the needs of particular science disciplines (e.g. the Atlas of Living Australia, the Marine Virtual Laboratory, AuScope). Advances in Big Data technologies have been instrumental to these developments. Another characteristic of eResearch in recent years is an increasing focus on online communities, reflected in the proliferation of Virtual Laboratories underpinning these communities (https://nectar.org.au/ ). For further details on these developments, including an international perspective, the reader is encouraged to visit https://research.csiro.au/digitalcoast/cop-meeting/ .

 

Despite these achievements, progress in establishing a national-scale e-Infrastructure has not been uniform, and a number of directions have recently been identified as requiring further attention. The “2016 National Research Infrastructure Roadmap” (an expert group review to advise the Australian government on priority research infrastructure for the coming decade) highlights, in particular, the need to refine and consolidate established but fragmented resources (https://docs.education.gov.au/node/43736 ). The Roadmap also indicates the need for “Enhancing and integrating observational research infrastructure with predictive modelling to strengthen environmental management, risk assessments, primary production, and resource development whilst sustaining biodiversity. Predicting impacts on environmental systems is the necessary first step in the management of our continent, atmosphere, and surrounding oceans, in order to adapt to climate change to ensure domestic and global sustainable growth.”

 

The DC is an attempt to learn from previous projects and deliver state-of-the-art infrastructure for real-time modelling and observational products. Applications of the DC platform are expected to range from real-time environmental products integrated into the Virtual Globe to a network of online educational/communication centres, each built around a specific data repository.

 

DC structure and implementation steps

The DC is envisaged as a distributed resource comprising a network of web-platforms (dubbed “knowledge-stacks”), each offering data-management and community-support services and operated by custodians of the knowledge-stack. A new instance of the knowledge-stack is launched via an online Registrar when a new custodian registers. The knowledge-stack comprises client-side and server-side servicing infrastructure readily available to users upon registration. These services are grouped into distinct layers (the Data Management Layer, the Processing Layer, and the Community Layer), with each layer representing a particular stage on the way from raw data towards the knowledge extracted from those data (a minimal sketch of such a layered descriptor is given after the list below).

  • The Data Management Layer includes a data repository combined with a number of critical low-level services (e.g. data sharing, visualisation, low-level processing).
  • The Processing Layer offers more advanced data-processing tools such as an online scripting capacity (e.g. Jupyter, Zeppelin, or Shiny RStudio), Machine Learning, and Emulation services for complex models.
  • The Community Layer offers an infrastructure for online collaboration, communication and education.
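
To make the layer grouping concrete, the sketch below describes a knowledge-stack instance as a simple declarative structure in Python. The layer names follow the list above; the KnowledgeStack and Service classes, their fields, and the example service names are illustrative assumptions rather than an agreed specification.

```python
# Hypothetical, minimal descriptor of a knowledge-stack instance.
# Class names, fields, and example services are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Service:
    name: str          # e.g. "data-repository", "jupyter-scripting"
    layer: str         # "data", "processing", or "community"
    endpoint: str = "" # filled in when the service is deployed

@dataclass
class KnowledgeStack:
    custodian: str
    services: List[Service] = field(default_factory=list)

    def layer(self, name: str) -> List[Service]:
        """Return all services registered under a given layer."""
        return [s for s in self.services if s.layer == name]

# Example: a stack with one service per layer, mirroring the list above.
stack = KnowledgeStack(
    custodian="coastal-modelling-group",
    services=[
        Service("data-repository", "data"),
        Service("jupyter-scripting", "processing"),
        Service("community-forum", "community"),
    ],
)
print([s.name for s in stack.layer("processing")])  # ['jupyter-scripting']
```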

 

The development of the DC platform can be carried out in three stages. The first stage would focus on establishing an isolated knowledge-stack and an online Registrar. The second stage would integrate individual knowledge-stack instances into a networked structure. The final stage would generalise the DC by incorporating both structured and unstructured data beyond environmental modelling.

 

DC vs others

Unlike many established Virtual Laboratories (VLs), which are locked in to a particular science domain, the Digital Coast is intended as an enabling technology that facilitates the development of new Virtual Laboratories. Instead of starting from scratch, a custodian of a VL would begin by installing a backbone knowledge-stack comprising a critical set of readily accessible services (somewhat analogous to installing a WordPress CMS). With this basic infrastructure in place, the knowledge-stack would be customised further into a fully-fledged VL by its custodians (e.g. by invoking external services or through a plugin mechanism, as sketched below).
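
As an illustration of the WordPress-style customisation described above, the following sketch shows one possible plugin mechanism for extending a backbone knowledge-stack. The register_plugin decorator, the registry, and the example plugin are hypothetical; they indicate only how a custodian might add a domain-specific service on top of the installed backbone.

```python
# Hypothetical plugin mechanism for extending a backbone knowledge-stack.
# The registry, decorator, and plugin names are illustrative assumptions.
from typing import Callable, Dict

PLUGINS: Dict[str, Callable] = {}

def register_plugin(name: str):
    """Decorator a custodian would use to add a service to their stack."""
    def decorator(func: Callable) -> Callable:
        PLUGINS[name] = func
        return func
    return decorator

@register_plugin("wave-forecast-viewer")
def wave_forecast_viewer(config: dict) -> str:
    # In a real stack this would wire up a visualisation service;
    # here it simply reports what would be deployed.
    return f"Deploying wave-forecast viewer for region {config.get('region', 'unknown')}"

# A custodian customising their stack after installing the backbone:
print(PLUGINS["wave-forecast-viewer"]({"region": "SE Tasmania"}))
```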

Unlike the HubZero project (another VL-enabling initiative in the US, https://hubzero.org/ ), the DC is intended to be a networked platform that would integrate individual models into the Virtual Globe.

Unlike many data-management systems locked in to a particular hardware platform, the DC is intended as a distributed and, hence, platform-independent system (i.e. new instances of knowledge-stacks can be installed on different platforms). Another characteristic of the DC is that it extends beyond basic data-management services towards higher-order processing (e.g. Machine Learning and Emulation) and education/communication services.

 

“No regret” developments

This section identifies a number of “no regret” activities that can produce valuable products by themselves but also contribute to the DC platform.

  • Online Registrar. O&A has expertise in setting up data-management infrastructure for real-time (operational) modelling applications. Making such infrastructure installable automatically via an online Registrar would benefit modelling groups in O&A and other divisions.
  • Machine Learning and Emulation services. Machine Learning has been used successfully in the past to build emulators (surrogates) of complex ocean models. Such emulators run orders of magnitude faster than the original models and can be used to assimilate data and run scenarios. Emulators may also contribute to integrating knowledge-stacks into the Virtual Globe: they have a relatively simple structure and are easier to formalise, thus abstracting away the complexity and heterogeneity of the underlying models (a minimal emulator workflow is sketched after this list). Beyond emulation, Machine Learning can be used as a general-purpose processing tool to investigate models and observations.
  • The Communication/Education layer is a critical part of the knowledge-stack. Making this infrastructure readily available to the custodians of knowledge-stacks would facilitate the development of online communities and the uptake of knowledge.
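
The sketch below illustrates the emulation workflow in its simplest form: a small number of runs of an expensive model are used to train a fast statistical surrogate, which can then be queried cheaply for scenario runs or data assimilation. The use of scikit-learn and the toy one-parameter “model” are assumptions made for illustration; the report does not prescribe a particular library, model, or method.

```python
# Minimal sketch of building an emulator (surrogate) of a complex model
# from a small set of model runs. scikit-learn and the toy "model" below
# are illustrative assumptions, not a prescribed implementation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_model(x: np.ndarray) -> np.ndarray:
    """Stand-in for a slow ocean model: maps a forcing parameter to a response."""
    return np.sin(3.0 * x) + 0.5 * x

# A handful of training runs of the full model (the expensive part).
X_train = np.linspace(0.0, 2.0, 15).reshape(-1, 1)
y_train = expensive_model(X_train).ravel()

# Fit a Gaussian-process emulator to the input/output pairs.
emulator = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6)
emulator.fit(X_train, y_train)

# The emulator can now be queried far faster than the original model,
# e.g. for scenario exploration or data assimilation.
X_new = np.linspace(0.0, 2.0, 200).reshape(-1, 1)
y_pred, y_std = emulator.predict(X_new, return_std=True)
print(f"max emulation error: {np.max(np.abs(y_pred - expensive_model(X_new).ravel())):.3f}")
```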

 

Delivering the products and services associated with these activities would result in an infrastructure for automatically installing an isolated knowledge-stack via an online Registrar – the first step towards the DC platform. A possible registration workflow is sketched below.
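
The sketch below indicates, under stated assumptions, what such a Registrar workflow could look like: registering a new custodian records their details and returns a provisioning plan for a default, isolated knowledge-stack. The function name, registry, and default service list are hypothetical and shown only to convey the intended automation.

```python
# Hypothetical sketch of the online Registrar workflow: registering a new
# custodian triggers automatic provisioning of a default knowledge-stack.
# Function names and the default service list are illustrative assumptions.
from datetime import datetime

REGISTRY = {}  # custodian name -> provisioning record

DEFAULT_SERVICES = ["data-repository", "visualisation", "jupyter-scripting", "community-forum"]

def register_custodian(name: str, contact: str) -> dict:
    """Record a new custodian and return the provisioning plan for their stack."""
    if name in REGISTRY:
        raise ValueError(f"custodian '{name}' is already registered")
    record = {
        "custodian": name,
        "contact": contact,
        "services": list(DEFAULT_SERVICES),   # the isolated knowledge-stack
        "registered": datetime.now().isoformat(),
    }
    REGISTRY[name] = record
    return record

# A new custodian registers and receives an isolated knowledge-stack.
plan = register_custodian("estuary-observatory", "custodian@example.org")
print(plan["services"])
```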

 

Since funding for the DC may not be available on a continuous basis, it is recommended that a DC committee be established. One of the functions of this committee would be to seek funding opportunities. The committee would also meet on a regular basis to plan and oversee project developments, and would facilitate the sharing of knowledge about recent advances in IT relevant to O&A and partners.

  • The DC committee to take over from the terminated DC CoP and to plan and oversee development of the DC in the future (particularly during periods of scarce funding).