Mass spectrometry and informatics: distribution of molecules in the PubChem database and general requirements for mass accuracy in surface analysis

Anal Chem. 2011 May 1;83(9):3239-43. doi: 10.1021/ac200067s. Epub 2011 Apr 1.

Abstract

Mass spectrometry is a powerful tool for the analysis and identification of substances across a broad range of technologies, from proteomics and metabolomics to the surface analysis methods used in nanotechnology. A major challenge has been the development of automated methods to identify substances from their mass spectra. Public chemical databases have grown by more than 2 orders of magnitude over the past few years and have become a powerful resource for informatics approaches to identification. We analyze how the substances in the popular PubChem database are distributed in mass when resolved at the mass accuracies of typical mass spectrometers. We also characterize the average molecule in terms of its mass excess from nominal mass and the modal mass. In agreement with other studies, we show that identifying unknowns requires a mass accuracy of around 1 ppm, together with additional filtering using isotope patterns. This information is an essential part of a framework being developed for experimental library-free interpretation of complex molecule spectra in secondary ion mass spectrometry.
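The two quantities the abstract relies on, mass excess (exact monoisotopic mass minus nominal integer mass) and mass accuracy in ppm, can be illustrated with a minimal Python sketch. This is not the authors' PubChem analysis. It assumes monoisotopic masses for C, H, N and O only, a simple formula parser, and hypothetical helper names (`parse_formula`, `monoisotopic_mass`, `mass_excess`, `candidates_within`). Instead of PubChem data it uses the textbook isobars CO, N2 and C2H4, which all have nominal mass 28. The ppm error is taken as (measured − theoretical) / theoretical × 10^6.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, from standard atomic
# mass tables. Only C, H, N and O are covered in this illustrative sketch.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
# Nominal (integer) masses of the same isotopes.
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16}


def parse_formula(formula):
    """Parse a simple formula such as 'C6H12O6' into {element: count}."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def monoisotopic_mass(formula):
    """Exact mass of the all-most-abundant-isotope species (u)."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def nominal_mass(formula):
    """Integer mass obtained by summing nominal isotope masses."""
    return sum(NOMINAL[el] * n for el, n in parse_formula(formula).items())


def mass_excess(formula):
    """Mass excess: exact monoisotopic mass minus nominal mass (u)."""
    return monoisotopic_mass(formula) - nominal_mass(formula)


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def candidates_within(measured, formulas, tol_ppm):
    """Keep formulas whose monoisotopic mass is within +/- tol_ppm of measured."""
    return [f for f in formulas
            if abs(ppm_error(measured, monoisotopic_mass(f))) <= tol_ppm]


if __name__ == "__main__":
    isobars = ["CO", "N2", "C2H4"]  # all share nominal mass 28
    for f in isobars:
        print(f, nominal_mass(f), round(monoisotopic_mass(f), 5), round(mass_excess(f), 5))
    # Pretend the instrument measured N2 with no error, then vary the tolerance.
    measured = monoisotopic_mass("N2")
    for tol in (1000, 500, 1):
        print(tol, "ppm:", candidates_within(measured, isobars, tol))
```

Here N2 lies about 401 ppm above CO and about 897 ppm below C2H4, so a 1000 ppm window admits all three formulas, 500 ppm leaves CO and N2, and 1 ppm leaves only N2. At the higher masses typical of database molecules, many more formulas crowd each nominal mass, which is the setting in which the abstract pairs ~1 ppm accuracy with isotope-pattern filtering.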

Publication types

  • Letter