“One of the novelties of our work is that we built a special neural network to understand symmetry and we use that as a feature extractor to make it much better at understanding images,” says Agar, a lead author of the paper where the work is described: “Symmetry-Aware Recursive Image Similarity Exploration for Materials Microscopy,” published today in Nature Computational Materials Science. In addition to Agar, authors include, from Lehigh University: Tri N. M. Nguyen, Yichen Guo, Shuyu Qin and Kylie S. Frew and, from Stanford University: Ruijuan Xu. Nguyen, a lead author, was an undergraduate at Lehigh University and is now pursuing a Ph.D. at Stanford.
The team was able to arrive at projections by employing Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction technique. This approach, says Agar, allows researchers to learn “...in a fuzzy way, the topology and the higher-level structure of the data and compress it down into 2D.”
“If you train a neural network, the result is a vector, or a set of numbers that is a compact descriptor of the features. Those features help classify things so that some similarity is learned,” says Agar. “What’s produced is still rather large in space, though, because you might have 512 or more different features. So, then you want to compress it into a space that a human can comprehend such as 2D, or 3D―or, maybe, 4D.”
By doing this, Agar and his team were able to take the 25,000-plus images and group very similar classes of material together.
“Similar types of structures in material are semantically close together and also certain trends can be observed particularly if you apply some metadata filters,” says Agar. “If you start filtering by who did the deposition, who made the material, what were they trying to do, what is the material system...you can really start to refine and get more and more similarity. That similarity can then be linked to other parameters like properties.”
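To make the idea concrete, here is a minimal sketch of similarity search over feature vectors with an optional metadata filter. It is not the team's method: it uses neither the symmetry-aware network nor UMAP, and it substitutes plain cosine similarity on tiny, made-up three-number vectors for the 512-or-more features Agar describes. The record layout, the field names (`grower`, `material_system`), the image IDs and the helper `most_similar` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, records, top_k=3, **metadata_filters):
    """Rank records by similarity to `query`, keeping only those whose
    metadata matches every keyword filter (hypothetical helper)."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in metadata_filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query, r["features"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data: invented vectors and metadata, for illustration only.
records = [
    {"id": "img_001", "features": [0.9, 0.1, 0.0],
     "metadata": {"grower": "A", "material_system": "system_X"}},
    {"id": "img_002", "features": [0.8, 0.2, 0.1],
     "metadata": {"grower": "B", "material_system": "system_X"}},
    {"id": "img_003", "features": [0.0, 0.1, 0.9],
     "metadata": {"grower": "A", "material_system": "system_Y"}},
    {"id": "img_004", "features": [0.1, 0.9, 0.1],
     "metadata": {"grower": "A", "material_system": "system_X"}},
]

query = [1.0, 0.0, 0.0]
unfiltered = [r["id"] for r in most_similar(query, records, top_k=2)]
filtered = [r["id"] for r in most_similar(query, records, top_k=2, grower="A")]
print(unfiltered)
print(filtered)
```

Filtering before ranking mirrors the refinement Agar describes: narrowing the pool by who made the material changes which images count as the nearest matches.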
This work demonstrates how improved data storage and management could rapidly accelerate materials discoveries. According to Agar, of particular value are images and data generated by failed experiments.
“No one publishes failed results and that’s a big loss because then a few years later someone repeats the same line of experiments,” says Agar. “So, you waste really good resources on an experiment that likely won’t work.”
Rather than being lost, data that has already been collected could be used to reveal trends that have not been seen before and speed discovery exponentially, says Agar.
This study is the first “use case” of a new data-storage enterprise housed at Oak Ridge National Laboratory called DataFed. DataFed, according to its website, is “...a federated, big-data storage, collaboration, and full-life-cycle management system for computational science and/or data analytics within distributed high-performance computing (HPC) and/or cloud-computing environments.”
“My team at Lehigh has been part of the design and development of DataFed in terms of making it relevant for scientific use cases,” says Agar. “Lehigh is the first live implementation of this fully-scalable system. It’s a federated database so anyone can pop up their own server and be tied to the central facility.”
Agar is the machine learning expert on Lehigh University’s Presidential Nano/Human Interface Initiative team. The interdisciplinary initiative, which integrates the social sciences and engineering, seeks to transform the ways that humans interact with instruments of scientific discovery to accelerate innovation.
“One of the key goals of Lehigh’s Nano/Human Interface Initiative is to put relevant information at the fingertips of experimentalists to provide actionable information that allows more informed decision-making and accelerates scientific discovery,” says Agar. “Humans have limited capacity for memory and recollection. DataFed is a modern-day Memex; it provides a memory of scientific information that can easily be found and recalled.”
“DataFed provides an especially powerful and invaluable tool for researchers engaged in interdisciplinary team science, allowing researchers who are collaborating on team projects located in different/remote locations to access each other’s raw data. This is one of the key components of our Lehigh Presidential Nano/Human Interface (NHI) Initiative for accelerating scientific discovery,” says Martin P. Harmer, Alcoa Foundation Professor in Lehigh’s Department of Materials Science and Engineering and Director of the Nano/Human Interface Initiative.
The work described was supported by the Lehigh University Nano/Human Interface Presidential Initiative and a National Science Foundation grant under TRIPODS + X.