SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials

NU author(s): Josh Horton

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

© 2022, The Author(s). Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high-quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready-to-use potential functions for molecular simulations.


Publication metadata

Author(s): Eastman P, Behara PK, Dotson DL, Galvelis R, Herr JE, Horton JT, Mao Y, Chodera JD, Pritchard BP, Wang Y, De Fabritiis G, Markland TE

Publication type: Article

Publication status: Published

Journal: Scientific Data

Year: 2023

Volume: 10

Issue: 1

Online publication date: 04/01/2023

Acceptance date: 01/12/2022

Date deposited: 17/01/2023

ISSN (electronic): 2052-4463

Publisher: Nature Research

URL: https://doi.org/10.1038/s41597-022-01882-6

DOI: 10.1038/s41597-022-01882-6

PubMed id: 36599873

Notes: Dataset at https://doi.org/10.5281/zenodo.7338495



Funding

Funder reference    Funder name
CHE-2136142
R01GM140090
R01GM132386
