Bridging Computer Science and Molecular Science: Caitlin Whitter’s Journey as a MolSSI Software Fellow

Caitlin Whitter didn’t take the traditional path into molecular science. As a Ph.D. student in computer science at Purdue University, she always knew she wanted to work on interdisciplinary research. Rather than sticking purely to computing, she was drawn to solving complex problems in the natural sciences. That curiosity led her to computational molecular science, where she now applies machine learning to better understand molecular and atomic properties.

Diving Into Computational Science
Whitter’s research focuses on building interpretable machine learning models to predict molecular and atomic properties accurately and efficiently using quantum-mechanical datasets. These predictions have significant implications for fields like drug discovery, materials science, and chemistry, where understanding molecular interactions is crucial. By developing interpretable machine learning pipelines, she aims to make these computational tools more effective and accessible to scientists.

Identifying molecular compounds with novel properties is a vital step in pharmaceutical drug discovery and materials design. However, chemical space is vast, often estimated at roughly 10^60 possible molecules, which creates a strong need for predictive models that are both accurate and efficient. Traditional quantum chemical methods, such as Density Functional Theory (DFT), can take days per molecule to achieve high accuracy. Machine learning models, on the other hand, can dramatically accelerate this process, predicting properties in mere seconds. Yet the lack of transparency in typical black-box models can hinder scientific trust and usability.

To address this, Whitter designs machine learning pipelines with interpretability in mind, ensuring clearer insights into the underlying decision-making process.
During her Ph.D., she has worked on multiple projects at the intersection of machine learning and computational chemistry. These include:

  1. Physics-informed graph convolutional networks for fast and accurate prediction of atomic multipole moments.
  2. Subset selection algorithms for choosing representative molecular science datasets to improve neural network performance and gain insights into dataset distributions—a focus of her MolSSI software project.

The MolSSI Fellowship Experience
Becoming a MolSSI Software Fellow has been a transformative experience for Whitter. The fellowship provided funding that allowed her to fully dedicate herself to her MolSSI project for a year, diving deeper into her research without financial concerns. One of the highlights has been working closely with her MolSSI Software Scientist mentor, Dr. Benjamin Pritchard. Their bi-weekly meetings have provided invaluable insights, particularly in working with molecular science datasets. The MolSSI summer bootcamp was another major advantage—not only was it a great learning opportunity, but it also allowed her to connect with other Fellows and explore research from different perspectives.

Developing Machine Learning for Molecular Science
In her MolSSI software project, Whitter focuses on developing subset selection algorithms that improve the training efficiency of neural networks while maintaining low error. These algorithms also provide deeper insights into the distribution of molecular datasets. One of the benchmark datasets she works with is QM9, a quantum-mechanical dataset containing roughly 134,000 small organic molecules with various energetic and electronic properties. This dataset is widely used in computational chemistry and is available through multiple repositories, including MolSSI’s QCArchive.
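
The article does not say which subset selection algorithms Whitter uses, so the following is only a generic illustration of the idea. It sketches farthest-point sampling, one common way to pick a small, spread-out subset of a dataset. The function name farthest_point_subset and the two-dimensional toy descriptors are assumptions made for this example and are not taken from her project.

```python
import math


def farthest_point_subset(points, k, start=0):
    """Greedily pick k indices; each new pick is the point farthest
    from everything chosen so far (farthest-point sampling)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # min_dist[i] holds the distance from point i to its nearest chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        # Refresh nearest-chosen distances now that nxt is in the subset.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


# Hypothetical 2-D descriptors: two tight clusters plus one outlier.
descriptors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
               (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
subset = farthest_point_subset(descriptors, k=3)
print(subset)
```

On this toy input the subset contains one point from each cluster plus the outlier, which is the kind of coverage a representative training subset is meant to provide. A real molecular dataset would need chemically meaningful descriptors in place of the toy coordinates.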

To analyze these molecular datasets, she trains graph neural networks (GNNs), a type of neural network specifically designed for graph-structured data. Because molecules are often represented in tabular form, she first converts them into graph representations, where atoms are nodes and bonds are edges. Additional molecular features, such as atomic number and bond type, further refine these graph structures.
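
As a rough illustration of that conversion, the sketch below builds a graph for a hand-written molecule using only the Python standard library. The MolGraph class, the build_graph helper, and the hard-coded ethanol atoms and bonds are hypothetical names and data chosen for this example; they are not Whitter's code, and a real pipeline would typically read atoms and bonds from a cheminformatics toolkit rather than from literals.

```python
from dataclasses import dataclass, field

# Atomic numbers for the elements that appear in QM9 (H, C, N, O, F).
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9}


@dataclass
class MolGraph:
    node_features: list                                # atomic number per atom
    edges: list = field(default_factory=list)          # directed (src, dst) pairs
    edge_features: list = field(default_factory=list)  # bond order per directed edge


def build_graph(symbols, bonds):
    """Turn element symbols and (i, j, bond_order) triples into a graph."""
    graph = MolGraph(node_features=[ATOMIC_NUMBER[s] for s in symbols])
    for i, j, order in bonds:
        # Store each chemical bond as two directed edges so that
        # information can flow both ways during message passing.
        for src, dst in ((i, j), (j, i)):
            graph.edges.append((src, dst))
            graph.edge_features.append(order)
    return graph


# Ethanol, heavy atoms only: C-C-O with two single bonds.
ethanol = build_graph(["C", "C", "O"], [(0, 1, 1.0), (1, 2, 1.0)])
print(ethanol.node_features)
print(ethanol.edges)
```

Storing each bond in both directions is a design choice in this sketch: it keeps the graph undirected in effect while using a simple directed edge list.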

For implementation, Whitter primarily uses Python and leverages software packages like PyTorch and Deep Graph Library (DGL) for model architecture and training. Additionally, RDKit provides access to molecular features beyond what is included in standard molecular science datasets.
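
Graph neural network libraries such as DGL are organized around message passing, in which each atom updates its features using those of its bonded neighbors. The sketch below shows a single, simplified round of that idea in plain Python, with no learned weights. It is an assumption-laden toy, not the architecture Whitter trains: the function message_passing_step and the three-atom example graph are invented here for illustration.

```python
def message_passing_step(features, edges):
    """One round of mean aggregation: each node averages its own
    feature vector with the vectors of its incoming neighbors."""
    n = len(features)
    neighbors = {i: [i] for i in range(n)}  # start with a self-loop
    for src, dst in edges:
        neighbors[dst].append(src)
    updated = []
    for i in range(n):
        group = [features[j] for j in neighbors[i]]
        dim = len(features[i])
        updated.append(
            [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
        )
    return updated


# Hypothetical three-atom chain (C-C-O) with atomic number as the only feature.
features = [[6.0], [6.0], [8.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
out = message_passing_step(features, edges)
print(out)
```

After one step, the middle carbon's feature already reflects the neighboring oxygen. A trained model would apply learned transformations at each round instead of a plain average.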

Looking Ahead
As she works toward completing her Ph.D., Whitter remains excited about the future of machine learning in computational science. Whether in academia, industry, or national labs, she hopes to contribute to cutting-edge advancements in the field. Beyond research, she enjoys singing in a musical ensemble at Purdue, reading, exploring new places, and spending time with friends and family. Looking back, she is proud of the collaborations she has been part of and the opportunities she has had to share her research with a broader audience. Being selected as a MolSSI Software Fellow has been one of the most rewarding experiences of her Ph.D. journey, and she highly encourages others interested in computational molecular science to apply.

To connect with Caitlin Whitter or learn more about her research, visit her LinkedIn profile.