The Use of Exact Mass Measurements at High Resolution to Probe Structural Diversity in Complex Natural Product Extracts
Deborah L. Zink, Merck Research Laboratories, Rahway, NJ
Scott Campbell, Sierra Analytics Inc., Modesto, CA
[This publication was originally presented as a poster at ASMS 2006.]
Abstract
Micro-organisms often produce families of compounds. In our search for new antibiotics it is not unusual to find families of known antibiotics during the screening process. It is important to quickly dereplicate known compounds and to associate activity with specific components. We are using the specificity of exact mass measurements obtained from a Thermo Finnigan LTQ-FT and the data extraction/processing tools in Apex to identify/dereplicate families of compounds in a semi-high-throughput manner.
Objective
- To identify natural products of known activity at low levels (ng) in crude extracts in a semi-high-throughput manner.
- The literature can be searched to find compounds from a given organism class with the appropriate biological activity.
- If we have previously seen the compounds, we can use our previously described low-resolution LC-MS dereplication tools to identify the knowns (reference 1). Can we increase our sensitivity by moving to the LTQ-FT with Apex detection?
- Can literature compounds be identified by exact mass measurements with enough certainty to drop the sample?
Approach
A Thermo Finnigan LTQ-FT is used to obtain exact-mass LC-MS data. Apex software is used to extract the exact mass data and compare it to a library of known compounds with a specific activity from a given organism class. The hits are analyzed and a report is generated for each sample set.
Instrument Methods
- LC conditions: Column: Zorbax SB-C8, 2.1 x 30 mm; Temp: 40°C; Flow rate: 300 μL/min; Solvents: A = 10% acetonitrile, 90% water with 1.3 mM trifluoroacetic acid and ammonium formate; B = 90% acetonitrile, 10% water with 1.3 mM trifluoroacetic acid and ammonium formate; Gradient: 10% B to 100% B in 6 min, hold 2 min, initialize 2 min. Either 100 vials or two 96-well plates can be analyzed.
- MS Instrument: Thermo Finnigan LTQ-FT with the standard Ion Max API source (without the sweep cone) and ESI probe. Three scan events were used. The ion trap was scanned from 150-2000 first in negative ion mode and then in positive ion mode. The FT was scanned from 200-2000 in the positive ion mode only. In all cases the SID was set to 18 volts to try to reduce multiple ion clusters.
- Data analysis: Apex software with the following standard conditions: peak width/resolution set to 100,000 at mass 400; MS search tolerance set between 0.003 Da and 0.008 Da.
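To put these settings in perspective, the short sketch below converts the stated resolution into a peak width and the stated absolute tolerances into parts per million. It assumes the usual definition of resolving power as m/Δm at full width at half maximum; the function names are illustrative and are not part of Apex.

```python
def fwhm_at(mz: float, resolution: float) -> float:
    """Peak width (FWHM, in Da) implied by a resolving power R = m / delta_m."""
    return mz / resolution


def da_to_ppm(tol_da: float, mz: float) -> float:
    """Express an absolute tolerance in Da as a relative tolerance in ppm."""
    return tol_da / mz * 1e6


# Peak width implied by a resolution of 100,000 at mass 400
print(f"FWHM at m/z 400: {fwhm_at(400.0, 100_000):.4f} Da")

# The same absolute window is tighter, in relative terms, at higher mass
for mz in (400.0, 1200.0, 2000.0):
    low = da_to_ppm(0.003, mz)
    high = da_to_ppm(0.008, mz)
    print(f"m/z {mz:6.0f}: 0.003 Da = {low:.2f} ppm, 0.008 Da = {high:.2f} ppm")
```

A fixed 0.003 Da window is 7.5 ppm at m/z 400 but only 1.5 ppm at m/z 2000, which is one way to see why a fixed absolute tolerance is more selective for larger compounds.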
Description of Apex Software
- Apex uses a proprietary algorithm to determine the baseline and compute the peak centroids of continuum mass spectral data. The primary objective of the algorithm is to accurately determine peak centroid m/z values to facilitate use in accurate mass applications.
- Apex targets compounds in acquired mass spectral data sets by computing theoretical isotope clusters of all library compounds and exhaustively comparing each cluster to each spectrum in the data set, taking into account elevated mass accuracy.
- Apex is designed for high-throughput applications, with imported sample lists and tabular results files in XML format.
- Apex provides a viewer for batch analysis of hits.
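The exhaustive library-versus-spectrum comparison described above can be illustrated with a much simplified sketch. It is not the Apex algorithm, which is proprietary and compares full theoretical isotope clusters; this version checks only the monoisotopic m/z of each target against every centroid in every spectrum. The `Hit` type, the `search` function, the placeholder intensities, and the unrelated peak at m/z 900.5 are all assumptions made for illustration; the target and found m/z values are taken from the Sulfomycin I rows of the chemist result file.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    adduct: str
    calc_mz: float
    found_mz: float
    scan: int


def search(targets, spectra, tol_da=0.003):
    """Compare every library target with every centroid of every spectrum."""
    hits = []
    for scan, peaks in enumerate(spectra):
        for name, adduct, calc_mz in targets:
            for mz, _intensity in peaks:
                if abs(mz - calc_mz) <= tol_da:
                    hits.append(Hit(name, adduct, calc_mz, mz, scan))
    return hits


# Calculated m/z values from the poster's result table
targets = [
    ("Sulfomycin I", "[M+H]+", 1245.32614),
    ("Sulfomycin I", "[M+NH4]+", 1262.35269),
]

# One centroided spectrum as (m/z, intensity); intensities are placeholders
spectra = [[(900.5, 1.0), (1245.32576, 1.0), (1262.35281, 1.0)]]

for hit in search(targets, spectra):
    print(hit)
```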
Libraries
- Our libraries consist of molecular formulas and compound names. Each library is based on a targeted assay area and organism class.
- Depending on the library we calculate and search for M+H, M+Na, M+NH4 and/or the multiply charged species of the same adducts.
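As a rough illustration of how adduct m/z values follow from a library formula, the sketch below computes the monoisotopic mass and the M+H, M+Na, and M+NH4 ions, with an optional charge for multiply charged species. The atomic masses are standard monoisotopic values rounded to seven decimals, only the elements needed here are included, and the function names are hypothetical rather than part of Apex.

```python
import re

# Monoisotopic masses of the most abundant isotopes (Da)
MONO = {
    "C": 12.0,
    "H": 1.0078250,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720712,
    "Na": 22.9897693,
}
ELECTRON = 0.00054858

# Cation attached per charge for each adduct type
ADDUCTS = {"[M+H]+": "H", "[M+Na]+": "Na", "[M+NH4]+": "NH4"}


def mono_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C37H67NO13'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def adduct_mz(formula: str, adduct: str, charge: int = 1) -> float:
    """m/z of M plus `charge` copies of the adduct cation."""
    cation = mono_mass(ADDUCTS[adduct]) - ELECTRON
    return (mono_mass(formula) + charge * cation) / charge


for adduct in ADDUCTS:
    print(f"Erythromycin A {adduct}: {adduct_mz('C37H67NO13', adduct):.5f}")
```

For Erythromycin A (C37H67NO13) this reproduces the calculated M+H and M+Na values listed in the result tables to within rounding.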
Screenshots from Apex
Libraries in Apex

Summary Result Table from Apex

Detail list of results is saved to Excel

Apex viewer allows one to confirm assignment of desired hits

Reports from Apex (Excel)
Chemist result file in which several families of compounds were identified
| Dataset Name | Name | Formula | MW | Adduct | m/z | Found | RT | MICC (Area) |
|---|---|---|---|---|---|---|---|---|
| Sample A | "Sulfomycin III 65-Deoxy 21-demethoxy" | C52H48N16O14S2 | 1184.29773 | [M + Na] + | 1207.28696 | 1207.28347 | 4.24 | 72279 |
| Sample A | "Sulfomycin III 65-Deoxy 21-demethoxy" | C52H48N16O14S2 | 1184.29773 | [M + H] + | 1185.30501 | 1185.30707 | 4.24 | 294857 |
| Sample A | "Sulfomycin III 65-Deoxy 21-demethoxy" | C52H48N16O14S2 | 1184.29773 | [M + NH4] + | 1202.33156 | 1202.33179 | 4.26 | 362086 |
| Sample A | 21-Demethoxysulfomycin I. Pre- | C53H50N16O15S2 | 1214.3083 | [M + Na] + | 1237.29752 | 1237.29711 | 4.06 | 334417 |
| Sample A | 21-Demethoxysulfomycin I. Pre- | C53H50N16O15S2 | 1214.3083 | [M + NH4] + | 1232.34213 | 1232.34402 | 4.06 | 1868063 |
| Sample A | Sulfomycin I | C54H52N16O16S2 | 1244.31886 | [M + H] + | 1245.32614 | 1245.32576 | 4.09 | 2851985 |
| Sample A | Sulfomycin I | C54H52N16O16S2 | 1244.31886 | [M + NH4] + | 1262.35269 | 1262.35281 | 4.09 | 6992755 |
| Sample A | Sulfomycin I | C54H52N16O16S2 | 1244.31886 | [M + Na] + | 1267.30809 | 1267.31064 | 4.09 | 2209156 |
| Sample A | Sulfomycin II | C54H52N16O15S2 | 1228.32395 | [M + NH4] + | 1246.35778 | 1246.35812 | 4.49 | 2621796 |
| Sample A | Sulfomycin II | C54H52N16O15S2 | 1228.32395 | [M + H] + | 1229.33123 | 1229.32906 | 4.49 | 697648 |
| Sample A | Sulfomycin II | C54H52N16O15S2 | 1228.32395 | [M + Na] + | 1251.31317 | 1251.31364 | 4.49 | 659698 |
| Sample B | Thiostrepton A | C72H85N19O18S5 | 1663.49236 | [M + Na] + | 1686.48158 | 1686.48323 | 4.89 | 13273809 |
| Sample B | Thiostrepton A | C72H85N19O18S5 | 1663.49236 | [M + H] + | 1664.49963 | 1664.49854 | 4.89 | 42861015 |
| Sample B | Thiopeptin Ba | C71H84N18O18S6 | 1668.45353 | [M + NH4] + | 1686.48736 | 1686.48336 | 4.86 | 12599233 |
| Sample B | Thiostrepton B | C66H79N17O16S5 | 1525.44943 | [M + H] + | 1526.45671 | 1526.45598 | 4.83 | 1967091 |
| Sample B | Thiostrepton B | C66H79N17O16S5 | 1525.44943 | [M + Na] + | 1548.43865 | 1548.43745 | 4.83 | 253435 |
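The agreement between the calculated (m/z) and Found columns can be summarized as a mass error in ppm. The sketch below does this for a few rows copied from the table; the `ppm_error` helper is an illustrative name, not an Apex function.

```python
def ppm_error(found: float, calc: float) -> float:
    """Signed mass error of a measured m/z relative to the calculated m/z."""
    return (found - calc) / calc * 1e6


# (name, adduct, calculated m/z, found m/z) copied from the table
rows = [
    ("Sulfomycin I", "[M+H]+", 1245.32614, 1245.32576),
    ("Sulfomycin I", "[M+NH4]+", 1262.35269, 1262.35281),
    ("Thiostrepton A", "[M+Na]+", 1686.48158, 1686.48323),
    ("Thiopeptin Ba", "[M+NH4]+", 1686.48736, 1686.48336),
]

for name, adduct, calc, found in rows:
    print(f"{name:15s} {adduct:9s} {ppm_error(found, calc):+.2f} ppm")
```

Note that the calculated values for Thiostrepton A [M+Na]+ and Thiopeptin Ba [M+NH4]+ differ by only 0.00578 Da, so with a tolerance toward the upper end of the 0.003 to 0.008 Da range a single ion near m/z 1686.483 can satisfy both library entries.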
Familial compounds lead to increased certainty of identifications
| Entry | Name | Formula | MW | Adduct | m/z | Found | RT | MICC (Area) | Note |
|---|---|---|---|---|---|---|---|---|---|
| Sample A | Erythromycin F | C37H67NO14 | 749.45616 | [M + Na] + | 772.44538 | 772.44541 | 2.86 | 866,658 | isobaric |
| Sample A | Erythromycin F | C37H67NO14 | 749.45616 | [M + H] + | 750.46343 | 750.46397 | 2.86 | 16,950,344 | isobaric |
| Sample A | Erythromycin C | C36H65NO13 | 719.44559 | [M + Na] + | 742.43481 | 742.43501 | 3.43 | 2,764,325 | |
| Sample A | Erythromycin C | C36H65NO13 | 719.44559 | [M + H] + | 720.45287 | 720.45332 | 3.43 | 42,784,517 | |
| Sample A | Erythromycin E | C37H65NO14 | 747.44051 | [M + H] + | 748.44778 | 748.44745 | 3.49 | 23,624,085 | |
| Sample A | Erythromycin A | C37H67NO13 | 733.46124 | [M + Na] + | 756.45046 | 756.45129 | 3.60 | 22,219,063 | isobaric |
| Sample A | Erythromycin A | C37H67NO13 | 733.46124 | [M + H] + | 734.46852 | 734.46836 | 3.60 | 483,896,070 | isobaric |
| Sample A | 6,7-Anhydroerythromycin C | C36H63NO12 | 701.43503 | [M + Na] + | 724.42425 | 724.42454 | 3.83 | 7,613,955 | |
| Sample A | 6,7-Anhydroerythromycin C | C36H63NO12 | 701.43503 | [M + H] + | 702.44230 | 702.44277 | 3.83 | 16,560,242 | |
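One simple way to express the added certainty from multiple adducts is to count, for each compound, how many distinct adducts were found at a consistent retention time. The sketch below does this for the erythromycin hits in the table above. The `adduct_support` function and the 0.05 min co-elution tolerance are assumptions chosen for illustration; this is not a feature of Apex.

```python
from collections import defaultdict

# (name, adduct, RT in min) copied from the table above
hits = [
    ("Erythromycin F", "[M+Na]+", 2.86),
    ("Erythromycin F", "[M+H]+", 2.86),
    ("Erythromycin C", "[M+Na]+", 3.43),
    ("Erythromycin C", "[M+H]+", 3.43),
    ("Erythromycin E", "[M+H]+", 3.49),
    ("Erythromycin A", "[M+Na]+", 3.60),
    ("Erythromycin A", "[M+H]+", 3.60),
    ("6,7-Anhydroerythromycin C", "[M+Na]+", 3.83),
    ("6,7-Anhydroerythromycin C", "[M+H]+", 3.83),
]


def adduct_support(hits, rt_tol=0.05):
    """Number of distinct co-eluting adducts observed per compound."""
    by_name = defaultdict(list)
    for name, adduct, rt in hits:
        by_name[name].append((adduct, rt))

    support = {}
    for name, entries in by_name.items():
        rts = [rt for _, rt in entries]
        coeluting = max(rts) - min(rts) <= rt_tol
        # Adducts that do not co-elute are not counted as mutual support
        support[name] = len({a for a, _ in entries}) if coeluting else 1
    return support


for name, count in adduct_support(hits).items():
    print(f"{name}: {count} adduct(s)")
```

In this table only Erythromycin E rests on a single adduct; its assignment is supported instead by the other family members found in the same sample.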
Results of previous table shown in Xcalibur

Note: Frequently the compounds of interest do not give any UV signal and are not the most intense peaks in the TIC or base peak trace.
Results
- The entire process works well for compounds of molecular weight greater than 1000 or multiply charged compounds.
- The process allows identification of a family of compounds at different concentrations in a sample.
- The sensitivity of the LTQ-FT and the specificity of the Apex search allow detection at levels typically two orders of magnitude lower than with our lower-resolution methods.
Issues
- A single data point is often not enough to identify a component, especially if there are no minor components in the sample.
- RT of compounds seen previously can be manually used to confirm assignment.
- If the intensity of the UV signal is strong enough one can go back and check that in Xcalibur.
- Exact mass and isotope abundance are the only parameters used to assign an ion. There is no cluster analysis, so an ion can be misassigned. This occurs frequently with low-mass ions.
Incorrectly assigned ion
Observed m/z 393.2036
- Possible Apex match with calculated 393.20539, M+H of Desmethyl-Lincomycin, BUT
- Cluster analysis suggests 393.2036 = M+Na, since we see an ion at m/z 371.22172 (M+H)
Results
- The found M equals 370.2144 and does not correspond to the library hit of Desmethyl-Lincomycin.

Sierra Analytics is currently working on adding cluster analysis to the software.
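The manual cluster check in the example above can be sketched as a search for pairs of ions whose spacing matches the difference between two adduct masses. This is only an illustration of the idea under simple assumptions, not the cluster analysis planned for Apex; the `adduct_pairs` function and the 0.003 Da tolerance are hypothetical choices.

```python
from itertools import combinations

# Mass added to the neutral molecule M by each singly charged adduct (Da)
ADDUCT_SHIFT = {
    "[M+H]+": 1.00727645,
    "[M+NH4]+": 18.03382552,
    "[M+Na]+": 22.98922072,
}


def adduct_pairs(mz_values, tol_da=0.003):
    """Find ion pairs whose spacing matches a difference between two adducts.

    Returns tuples (low m/z, low adduct, high m/z, high adduct, neutral M).
    """
    ordered_adducts = sorted(ADDUCT_SHIFT.items(), key=lambda kv: kv[1])
    found = []
    for lo, hi in combinations(sorted(mz_values), 2):
        for (a_lo, s_lo), (a_hi, s_hi) in combinations(ordered_adducts, 2):
            if abs((hi - lo) - (s_hi - s_lo)) <= tol_da:
                found.append((lo, a_lo, hi, a_hi, lo - s_lo))
    return found


# The two ions from the misassignment example
for lo, a_lo, hi, a_hi, neutral in adduct_pairs([393.2036, 371.22172]):
    print(f"{lo} = {a_lo}, {hi} = {a_hi}, M = {neutral:.4f}")

# The library value still falls within tolerance of the observed ion,
# which is why the exact mass alone produced the wrong assignment
print(abs(393.2036 - 393.20539) <= 0.003)
```

The spacing of the two ions matches the M+Na and M+H difference, giving M close to 370.2144, even though the ion at m/z 393.2036 by itself lies within tolerance of the Desmethyl-Lincomycin M+H value.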
Summary
- Literature compounds of a known activity can be identified by exact mass with Apex software. When multiple adducts and/or familial compounds are observed within the sample, identification is relatively certain.
- The addition of retention time limits (when known) in the Apex analysis would add more certainty to hits.
- The addition of automated cluster analysis in Apex would reduce analysis times by suggesting better hits especially at masses less than 1000.
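A retention time limit of the kind proposed in the summary could be as simple as the filter sketched below. The window width, the `within_rt_window` helper, and the second candidate hit at 5.10 min are made up for illustration; the 3.60 min value is the Erythromycin A retention time from the erythromycin table in this poster, treated here as a previously observed RT.

```python
def within_rt_window(hit_rt: float, expected_rt: float, window_min: float = 0.2) -> bool:
    """True if a hit elutes within +/- window_min of the expected RT."""
    return abs(hit_rt - expected_rt) <= window_min


# Previously observed RT (min), assumed to be stored with the library entry
library_rt = {"Erythromycin A": 3.60}

# Candidate exact-mass hits as (name, RT); the 5.10 min hit is hypothetical
candidates = [("Erythromycin A", 3.60), ("Erythromycin A", 5.10)]

kept = [
    (name, rt)
    for name, rt in candidates
    if name not in library_rt or within_rt_window(rt, library_rt[name])
]
print(kept)
```

Compounds without a known retention time pass through unfiltered, matching the "when known" qualifier in the summary.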
References
1. Deborah Zink, Claude Dufresne, Jerrold Liesch, Jesus Martin. Automated LC-MS Analysis of Natural Products: Extraction of UV, MS and Retention Time Data for Component Identification and Characterization. 50th ASMS Conference on Mass Spectrometry and Allied Topics, 2002.