Big Data Knowledge Discovery
Big Data Knowledge Discovery is an interdisciplinary research initiative that combines new data science and machine learning techniques with the natural sciences. The project aims to develop new pathways for scientists to collect, interact with, and draw meaning from the rapidly expanding quantities of data available to them. The initiative, sponsored by the Science and Industry Endowment Fund (SIEF), brings Data61’s Machine Learning researchers together with discipline leaders from the data-intensive fields of Geosciences, Life Sciences and Physical Sciences. Founded by statute in 1926, SIEF was rejuvenated in 2009 through a gift of $150 million from the proceeds of CSIRO’s wireless Local Area Network (WLAN) licensing program.
Data Management – Managing the flow of data is fundamental to open, reproducible science. This includes understanding not only the data’s origin, but also how it has been transformed to produce a particular result. The initiative is developing open source tools to make this kind of data management an integral part of the scientific workflow.
Core Machine Learning – Incorporating machine learning techniques into natural sciences will help scientists use larger quantities of data, and also infer more from their existing data. The resulting software tools will enable the integration of machine learning techniques into existing workflows, allowing new relationships to be discovered while optimising the design of future experiments.
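As an illustration of the Data Management theme above, the idea of recording both where data came from and how it was transformed can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the initiative's actual tooling: the `Tracked` class, the `fingerprint` helper, the two example transformations and the file name `hypothetical_sensor.csv` are all hypothetical names introduced here for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(data):
    """Content hash of any JSON-serialisable value (hypothetical helper)."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class Tracked:
    """A value bundled with its origin and the steps that produced it."""
    value: object
    origin: str
    steps: list = field(default_factory=list)

    def apply(self, func, **params):
        """Run func on the value and append a record of the transformation."""
        result = func(self.value, **params)
        record = {
            "step": func.__name__,
            "params": params,
            "input_sha256": fingerprint(self.value),
            "output_sha256": fingerprint(result),
        }
        # Return a new object so earlier states keep their own history
        return Tracked(result, self.origin, self.steps + [record])


def drop_missing(rows):
    """Example transformation: discard missing readings."""
    return [r for r in rows if r is not None]


def scale(rows, factor):
    """Example transformation: rescale every reading."""
    return [r * factor for r in rows]


# Assumed toy dataset; the origin string is a placeholder, not a real file
raw = Tracked([1.0, None, 3.0], origin="hypothetical_sensor.csv")
result = raw.apply(drop_missing).apply(scale, factor=2.0)

for record in result.steps:
    print(record["step"], record["params"], record["output_sha256"][:12])
```

Because each record stores the hash of its input and output, the chain of steps can be checked end to end: the output hash of one step should equal the input hash of the next.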
Natural Science Applications
Plate tectonics – Rapidly increasing volumes of diverse geoscientific data offer new opportunities, along with the challenge of accessing these datasets efficiently before developing new statistical insights into Earth processes and mineral systems.
Forest Ecology – To investigate the processes underpinning forest ecosystem diversity, it is necessary to learn models of the contributing factors, including environmental conditions, plant traits and competition.
Nonlinear laser physics – Efficiently exploring the complex outputs from nonlinear laser systems requires the combination of traditional numerical simulation and machine learning. Active sampling techniques allow the experiment to spend more time in regions where the numerical modelling is less accurate.
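The active sampling idea in the laser physics application can be illustrated with a toy one-dimensional example. This is a rough sketch under stated assumptions, not the project's method: `simulate` and `measure` are hypothetical stand-ins for a numerical model and an experiment, and the scoring rule (a kernel-weighted average of observed model error, multiplied by the distance to the nearest existing sample) is a simple heuristic chosen for brevity rather than a Gaussian process or other principled acquisition function.

```python
import math


def simulate(x):
    """Hypothetical numerical model of the system response."""
    return math.sin(x)


def measure(x):
    """Hypothetical experiment: matches the model except for a feature near x = 2."""
    return math.sin(x) + 0.5 * math.exp(-((x - 2.0) ** 2) / 0.05)


def score(x, observed, length_scale=0.3, explore=0.02):
    """Rank a candidate: high where nearby model error is large and samples are sparse."""
    # Kernel-weighted average of the model error seen at nearby samples
    weights = [math.exp(-((x - xi) ** 2) / (2 * length_scale ** 2)) for xi, _ in observed]
    total = sum(weights)
    est_error = 0.0
    if total > 1e-12:
        est_error = sum(w * err for w, (_, err) in zip(weights, observed)) / total
    # Distance to the closest sample; zero at points already measured
    nearest = min(abs(x - xi) for xi, _ in observed)
    return (est_error + explore) * nearest


def active_sample(candidates, initial, n_iter):
    """Greedily pick the next measurement where the score is highest."""
    observed = [(x, abs(measure(x) - simulate(x))) for x in initial]
    for _ in range(n_iter):
        best = max(candidates, key=lambda x: score(x, observed))
        observed.append((best, abs(measure(best) - simulate(best))))
    return observed


# Candidate settings on a regular grid over [0, 4], seeded with five coarse samples
candidates = [i * 0.02 for i in range(201)]
initial = [candidates[i] for i in (0, 50, 100, 150, 200)]
observed = active_sample(candidates, initial, n_iter=15)

near_feature = sum(1 for x, _ in observed if abs(x - 2.0) < 0.45)
far_from_feature = sum(1 for x, _ in observed if abs(x - 0.5) < 0.55)
print("samples near the model error:", near_feature)
print("samples in a well-modelled region:", far_from_feature)
```

The small `explore` term keeps the procedure from ignoring well-modelled regions entirely; raising it trades concentration on known model error for broader coverage.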
People: Alistair Reid, Daniel Steinberg, Simon O’Callaghan, Stephen Hardy