Machine learning with Big Data (63 Billion tiny data points to be precise!)

September 25th, 2019

PhD candidate Wil Gardner, who is undertaking his PhD at La Trobe University’s Centre for Materials and Surface Science in partnership with scientists at the RAMP Centre, has just had his first paper accepted in Analytical Chemistry. Wil is developing new methods of analysing extremely complex data sets using Kohonen self-organizing maps. In this work, his technique processed over 63 billion data points to characterise antibiotic-loaded nanoparticles that could one day be used to treat cancer.

Visualizing ToF-SIMS Hyperspectral Imaging Data Using Color-Tagged Toroidal Self-Organizing Maps

Abstract: Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is a powerful surface characterization technique capable of producing high spatial resolution hyperspectral images, in which each pixel comprises an entire mass spectrum. Such images can provide insight into the chemical composition across a surface. However, issues arise due to the size and complexity of the data produced. Data are particularly complicated for biological samples, primarily due to overlapping spectra produced by similar components. The traditional approach of selecting individual ion peaks as representative of particular components is insufficient for such complex data sets. Multivariate analysis (MVA) can help to overcome this significant hurdle. We demonstrate that Kohonen self-organizing maps (SOMs) with a toroidal topology can be used to analyze a ToF-SIMS hyperspectral imaging data set and identify spectral similarities between pixels. We present a method for color-tagging the toroidal SOM output, which reduces the entire data set to a single RGB image in which similar pixels—based on their associated mass spectra—are assigned a similar color. This method was exemplified using a ToF-SIMS image of dried large multilamellar vesicles (LMVs), loaded with the antibiotic cefditoren pivoxil (CP). We successfully identified CP-loaded and empty LMVs without the need for any prior knowledge of the sample, despite their highly similar spectra. We also identified which specific ion peaks were most important in differentiating the two LMV populations. This approach is entirely unsupervised and requires minimal experimenter input. It was developed with the aim of providing a user-friendly yet sophisticated workflow for understanding complex biological samples using ToF-SIMS images.
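For readers curious how a toroidal SOM with colour tagging might look in practice, here is a minimal Python sketch of the idea described in the abstract. It is not the authors' code. The function names, the toy three-channel "spectra", the grid size, the learning schedule and the colour mapping are all assumptions made for illustration. In particular, the colour mapping places each SOM node on a doughnut-shaped surface inside the RGB cube so that colours wrap smoothly across the map edges; the abstract does not specify the exact scheme used in the paper.

```python
import math
import random


def toroidal_dist2(a, b, rows, cols):
    """Squared grid distance between nodes a and b when both axes wrap around."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    dr = min(dr, rows - dr)  # take the short way round the torus
    dc = min(dc, cols - dc)
    return dr * dr + dc * dc


def best_matching_unit(weights, x):
    """Return the (row, col) node whose weight vector is closest to spectrum x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_toroidal_som(data, rows, cols, epochs=10, lr0=0.5, seed=0):
    """Train a small SOM online; learning rate and neighbourhood shrink linearly."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the pixels
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood measured with the wrap-around distance
                h = math.exp(-toroidal_dist2(node, bmu, rows, cols) / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def node_colour(node, rows, cols, big_r=2.0, small_r=1.0):
    """Assumed colour tag: embed the torus in 3D and rescale it into the RGB cube."""
    u = 2.0 * math.pi * node[0] / rows
    v = 2.0 * math.pi * node[1] / cols
    x = (big_r + small_r * math.cos(v)) * math.cos(u)
    y = (big_r + small_r * math.cos(v)) * math.sin(u)
    z = small_r * math.sin(v)
    span = 2.0 * (big_r + small_r)
    return (0.5 + x / span, 0.5 + y / span, 0.5 + z / (2.0 * small_r))


# Toy stand-in for a hyperspectral image: 40 "pixels", two spectral populations.
rng = random.Random(1)


def noisy(centre):
    return [c + rng.uniform(-0.05, 0.05) for c in centre]


pixels = ([noisy([0.9, 0.1, 0.1]) for _ in range(20)]
          + [noisy([0.1, 0.9, 0.1]) for _ in range(20)])
ROWS, COLS = 4, 4
weights = train_toroidal_som(pixels, ROWS, COLS)

# Each pixel takes the colour of its best matching unit, giving one RGB value per pixel.
colours = [node_colour(best_matching_unit(weights, p), ROWS, COLS) for p in pixels]
print(colours[0], colours[-1])
```

Because the map has no edges, every node has the same number of neighbours, and pixels with similar spectra tend to land on nearby nodes and so receive similar colours. A real ToF-SIMS data set would have thousands of mass channels per pixel and would call for an array library rather than plain Python lists.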