Don’t put it in the bin!
Scientists in a collaboration between RAMP, Data61 and La Trobe University are making sense of big data sets using machine learning. PhD candidate Mr Robert Madiona recently published an article in ‘Applied Surface Science’ in which machine learning is used to gain new insights into how information is encoded within ToF-SIMS data sets as instrument resolution is increased. Ultimately, the techniques being developed by these scientists will help researchers conducting automated and robotic research that generates large data sets.
Information content of ToF-SIMS data: Effect of spectral binning
Robert M.T. Madiona (a), David L.J. Alexander (b), David A. Winkler (c, d, e, f), Benjamin W. Muir (f), Paul J. Pigram (a)
Abstract
Surface analysis methods such as Time of Flight Secondary Ion Mass Spectrometry (ToF-SIMS) have become essential for probing surfaces and interfaces that are critical determinants of the properties of a diverse range of materials. These methods generate copious amounts of information but this is rarely analyzed by modern artificial intelligence and machine learning methods. Here we calculate the information content of ToF-SIMS spectra, and how this changes with variation in the size of the bins into which the spectra can be assigned. We find that Shannon entropy of the spectra of 10 diverse polymers correlates well with molecular information content of the monomers from which the polymers derive. Surprisingly, we find that most of the information in ToF-SIMS spectra resides in resolutions (bin sizes) of 0.02–1 m/z. At very small bin sizes the information content of the spectra is close to that expected for mass spectral information uniformly distributed across bins. Conversely, for large bin sizes we find that the information content of the spectra is close to that expected for mass spectral information randomly distributed across bins.
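The idea of measuring how information content changes with bin size can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy peak list, and the choice to normalize summed bin intensities into a probability distribution before taking the Shannon entropy are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width):
    """Sum the intensities of (m/z, intensity) pairs into bins of width bin_width."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        # Integer bin index; peaks whose m/z falls in the same bin are merged.
        binned[math.floor(mz / bin_width)] += intensity
    return dict(binned)


def shannon_entropy(binned):
    """Shannon entropy (in bits) of the normalized bin intensities.

    Assumption: each bin's share of the total intensity is treated as a probability.
    """
    total = sum(binned.values())
    if total <= 0:
        return 0.0
    entropy = 0.0
    for value in binned.values():
        if value > 0:
            p = value / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical toy spectrum, not data from the paper.
peaks = [(27.02, 40.0), (27.06, 10.0), (41.04, 30.0), (55.05, 20.0)]

for width in (0.01, 0.1, 1.0, 100.0):
    h = shannon_entropy(bin_spectrum(peaks, width))
    print(f"bin width {width:>6} m/z: entropy = {h:.3f} bits")
```

In this toy case, widening the bins merges neighbouring peaks, so the entropy can only stay the same or fall, and it reaches zero once every peak lands in a single bin. Real spectra contain many more channels, so the shape of the entropy curve across bin sizes is the quantity of interest.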
- a: Centre for Materials and Surface Science, Department of Chemistry and Physics, School of Molecular Sciences, La Trobe University, Melbourne, VIC 3086, Australia
- b: CSIRO Data61, Clayton, VIC 3168, Australia
- c: La Trobe Institute for Molecular Sciences, School of Molecular Sciences, La Trobe University, Melbourne, VIC 3086, Australia
- d: Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Australia
- e: School of Pharmacy, University of Nottingham, Nottingham NG7 2RD, UK
- f: CSIRO Manufacturing, Clayton, VIC 3168, Australia