Materials Informatics and Data-driven Discovery
Of course, generating and organizing results is only part of the problem. Analyzing high-throughput (HT) computational data involves encoding structural features and applying data analytics and machine learning to extract information, identify correlation patterns, and rapidly detect “high-performing” candidates.
In this project we are exploring the use of complex network analysis tools, self-organizing map (SOM) representations of data sets, and deep neural networks to describe complicated mixtures and distributions of nanostructures. We are studying the general applicability of methods used in other fields, such as clustering and Archetypal Analysis (AA), to reduce nanoparticle ensembles to the structures that really matter, and we are finding inventive ways for machine learning to improve how we research (as well as what we research).
For example, machine learning (ML) techniques can identify when higher-level quantum mechanical (QM) methods are required to calculate molecular properties and when computationally cheaper methods will suffice. We are investigating different structural fingerprints, including atom fragments, topology and the Coulomb matrix, to calibrate ML models for intelligent screening strategies that help researchers avoid unnecessary QM simulations when less expensive methods can provide comparable accuracy in a fraction of the time.
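To make one of these fingerprints concrete, here is a minimal sketch of the Coulomb matrix in its commonly published form: diagonal entries are 0.5 Z_i^2.4 and off-diagonal entries are Z_i Z_j / |R_i - R_j|, with nuclear charges Z and positions R in atomic units. The function names, the row-norm sorting step and the H2 geometry are illustrative assumptions, not the project's own code or data.

```python
import math


def coulomb_matrix(charges, positions):
    """Build the Coulomb matrix for one molecule.

    charges   -- nuclear charges Z_i, one per atom
    positions -- (x, y, z) coordinates in bohr, one tuple per atom
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z_i^2.4
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: nuclear repulsion between atoms i and j
                distance = math.sqrt(
                    sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                )
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


def sort_by_row_norm(matrix):
    """Reorder rows and columns by descending row norm.

    This is one common way to make the fingerprint independent of the
    order in which atoms are listed (an assumed choice for this sketch).
    """
    n = len(matrix)
    norms = [math.sqrt(sum(v * v for v in row)) for row in matrix]
    order = sorted(range(n), key=lambda i: -norms[i])
    return [[matrix[i][j] for j in order] for i in order]


# Hypothetical example: H2 with the two nuclei 1.4 bohr apart.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
```

The sorted matrix (or its upper triangle, flattened) could then serve as the input vector for an ML model that predicts a property or flags molecules needing a higher-level method.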
For more information, contact the Project Leader, Dr Baichuan Sun.