The Use of Exact Mass Measurements at High Resolution to Probe Structural Diversity in Complex Natural Product Extracts

Deborah L. Zink, Merck Research Laboratories Rahway, NJ
Scott Campbell, Sierra Analytics Inc. Modesto, CA

[This publication was originally presented as a poster at ASMS 2006.]

Abstract

Micro-organisms often produce families of related compounds. In our search for new antibiotics, it is not unusual to find families of known antibiotics during the screening process. It is important to dereplicate known compounds quickly and to associate activity with specific components. We are using the specificity of exact mass measurements obtained on a Thermo Finnigan LTQ-FT, together with the data extraction and processing tools in Apex, to identify and dereplicate families of compounds in a semi-high-throughput manner.

Objective

  • To identify natural products of known activity at low levels (ng) in crude extracts in a semi-high-throughput manner.
  • Literature can be searched to find compounds from a given organism class with the appropriate biological activity.
    • If we have seen the compounds before, we can identify the knowns with our previously described low-resolution LC-MS dereplication tools (reference 1). Can we increase our sensitivity by moving to the LTQ-FT with Apex detection?
    • Can literature compounds be identified by exact mass measurements with enough certainty to drop the sample?

Approach

A Thermo Finnigan LTQ-FT is used to acquire LC-MS data with exact mass measurement. Apex software extracts the exact mass data and compares it to a library of known compounds with a specific activity from a given organism class. The hits are reviewed and a report is generated for each sample set.

Instrument Methods

  • LC conditions: Column: Zorbax SB-C8, 2.1 x 30 mm; Temp: 40°C; Flow rate: 300 μL/min; Solvents: A = 10% acetonitrile/90% water with 1.3 mM trifluoroacetic acid and ammonium formate, B = 90% acetonitrile/10% water with 1.3 mM trifluoroacetic acid and ammonium formate; Gradient: 10% B to 100% B in 6 min, hold 2 min, return to initial conditions for 2 min. Either 100 vials or two 96-well plates can be analyzed.
  • MS instrument: Thermo Finnigan LTQ-FT with the standard Ion Max API source (without the sweep cone) and ESI probe. Three scan events were used: the ion trap was scanned from m/z 150-2000, first in negative ion mode and then in positive ion mode, and the FT was scanned from m/z 200-2000 in positive ion mode only. In all cases the source-induced dissociation (SID) voltage was set to 18 V to help reduce multiple ion clusters.
  • Data analysis: Apex software with the following standard conditions: peak width/resolution set to 100,000 at m/z 400; MS search tolerance set between 0.003 Da and 0.008 Da.
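The two data-analysis settings are quoted in different units. As a minimal sketch (the helper names are hypothetical, and the 1/(m/z) scaling of resolving power is an assumption typical of FT-ICR instruments, not something stated for Apex), the tolerance can be converted to ppm and the resolution setting to an approximate peak width:

```python
# Sketch only: unit conversions for the data-analysis settings above.
# Assumption: FT-ICR resolving power scales as 1/(m/z) for a fixed transient,
# so 100,000 at m/z 400 drops to about 40,000 at m/z 1000.


def da_to_ppm(tol_da: float, mz: float) -> float:
    """Express an absolute tolerance (Da) as a relative tolerance (ppm) at m/z."""
    return tol_da / mz * 1e6


def fwhm_da(mz: float, resolution: float = 100_000, ref_mz: float = 400.0) -> float:
    """Approximate peak width (FWHM, Da) at m/z, given resolution quoted at ref_mz."""
    r_at_mz = resolution * ref_mz / mz  # resolving power falls off as 1/(m/z)
    return mz / r_at_mz                 # R = m / FWHM


for mz in (400.0, 1000.0, 1700.0):
    print(
        f"m/z {mz:6.0f}: 0.003 Da = {da_to_ppm(0.003, mz):5.2f} ppm, "
        f"0.008 Da = {da_to_ppm(0.008, mz):5.2f} ppm, "
        f"FWHM ~ {fwhm_da(mz):.4f} Da"
    )
```

Because the tolerance is fixed in Da, it corresponds to 3 to 8 ppm at m/z 1000 but to 7.5 to 20 ppm at m/z 400.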

Description of Apex Software

  • Apex uses a proprietary algorithm to determine the baseline and compute the peak centroids of continuum mass spectral data. The primary objective of the algorithm is to accurately determine peak centroid m/z values to facilitate use in accurate mass applications.
  • Apex targets compounds in acquired mass spectral data sets by computing the theoretical isotope clusters of all library compounds and exhaustively comparing each cluster to each spectrum in the data set, taking the high mass accuracy of the data into account.
  • Apex is designed for high-throughput applications, with imported sample lists and tabular result files in XML format.
  • Apex provides a viewer for batch analysis of hits.
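Apex's isotope-cluster computation is proprietary, so the sketch below only illustrates the general idea under stated assumptions: standard natural isotope abundances, only C, H, N, O and S handled, peaks binned by nominal offset (isotopic fine structure is merged into one abundance-weighted mass per bin), and hypothetical function names `isotope_cluster` and `match_cluster`.

```python
# Sketch only, NOT the proprietary Apex algorithm: build a theoretical isotope
# cluster by convolving element isotope distributions one atom at a time,
# binning by nominal isotope offset (M, M+1, M+2, ...).
import re
from collections import Counter

# (nominal offset, exact mass, natural abundance); elements limited to CHNOS here
ISOTOPES = {
    "C": [(0, 12.0, 0.9893), (1, 13.00335484, 0.0107)],
    "H": [(0, 1.00782503, 0.999885), (1, 2.01410178, 0.000115)],
    "N": [(0, 14.00307401, 0.99636), (1, 15.00010890, 0.00364)],
    "O": [(0, 15.99491462, 0.99757), (1, 16.99913170, 0.00038), (2, 17.99915961, 0.00205)],
    "S": [(0, 31.97207117, 0.9499), (1, 32.97145876, 0.0075),
          (2, 33.96786690, 0.0425), (4, 35.96708076, 0.0001)],
}


def parse_formula(formula: str) -> Counter:
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def isotope_cluster(formula: str, max_offset: int = 4):
    """Return [(neutral mass, abundance relative to the tallest peak), ...]."""
    dist = {0: (1.0, 0.0)}  # offset -> (abundance, abundance-weighted mean mass)
    for element, count in parse_formula(formula).items():
        for _ in range(count):
            new = {}
            for off, (ab, mass) in dist.items():
                for iso_off, iso_mass, iso_ab in ISOTOPES[element]:
                    k = off + iso_off
                    if k > max_offset:
                        continue  # truncate the high-mass tail
                    p = ab * iso_ab
                    old_p, old_m = new.get(k, (0.0, 0.0))
                    # running abundance-weighted mean mass for this bin
                    new[k] = (old_p + p, (old_p * old_m + p * (mass + iso_mass)) / (old_p + p))
            dist = new
    top = max(ab for ab, _ in dist.values())
    return [(dist[k][1], dist[k][0] / top) for k in sorted(dist)]


def match_cluster(cluster, centroids, ion_shift=1.00727646, tol_da=0.005):
    """Pair each theoretical peak (shifted to [M+H]+ by default) with the closest centroid."""
    matches = []
    for mass, rel in cluster:
        target = mass + ion_shift
        near = [mz for mz, _ in centroids if abs(mz - target) <= tol_da]
        if near:
            matches.append((target, min(near, key=lambda mz: abs(mz - target)), rel))
    return matches


cluster = isotope_cluster("C54H52N16O16S2")  # Sulfomycin I
for mass, rel in cluster:
    print(f"{mass:.5f}  {rel:.3f}")
print(match_cluster(cluster, [(1245.32576, 2851985.0)]))
```

A real implementation would also score the relative abundances of the matched peaks; this sketch only checks positions.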

Libraries

  • Our libraries consist of molecular formulas and compound names. Each library is based on a targeted assay area and an organism class.
  • Depending on the library we calculate and search for M+H, M+Na, M+NH4 and/or the multiply charged species of the same adducts.
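Each library m/z follows from the monoisotopic mass of the formula plus the mass of the charge carrier, divided by the charge. A minimal sketch, assuming standard monoisotopic atomic masses and an electron mass of 0.00054858 u (the helper names `mono_mass` and `adduct_mz` are hypothetical):

```python
# Sketch: neutral monoisotopic mass from a molecular formula and m/z values
# for library adducts. Ion masses are atom masses minus one electron.
import re
from collections import Counter

MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462,
        "S": 31.97207117, "Na": 22.98976928}
ELECTRON = 0.00054858

ADDUCT_ION_MASS = {
    "H": MONO["H"] - ELECTRON,                    # H+
    "Na": MONO["Na"] - ELECTRON,                  # Na+
    "NH4": MONO["N"] + 4 * MONO["H"] - ELECTRON,  # NH4+
}


def mono_mass(formula: str) -> float:
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return sum(MONO[el] * n for el, n in counts.items())


def adduct_mz(mass: float, adduct: str, z: int = 1) -> float:
    """m/z of [M + zA]z+, e.g. z=2 with 'H' gives [M+2H]2+."""
    return (mass + z * ADDUCT_ION_MASS[adduct]) / z


m = mono_mass("C54H52N16O16S2")  # Sulfomycin I
for adduct in ("H", "NH4", "Na"):
    print(f"[M+{adduct}]+  {adduct_mz(m, adduct):.5f}")
print(f"[M+2H]2+  {adduct_mz(m, 'H', z=2):.5f}")
```

For Sulfomycin I this reproduces the [M+H]+, [M+NH4]+ and [M+Na]+ values in the result table to within about 0.00001.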

Screenshots from Apex

Libraries in Apex


Summary Result Table from Apex


Detailed list of results is saved to Excel


Apex viewer allows one to confirm assignment of desired hits


Reports from Apex (Excel)

Chemist result file in which several families of compounds were identified

Dataset  | Compound                             | Formula        | Mono. mass | Adduct   | Calc. m/z  | Found m/z  | RT (min) | MICC (area)
---------|--------------------------------------|----------------|------------|----------|------------|------------|----------|------------
Sample A | Sulfomycin III 65-Deoxy 21-demethoxy | C52H48N16O14S2 | 1184.29773 | [M+Na]+  | 1207.28696 | 1207.28347 | 4.24     | 72279
Sample A | Sulfomycin III 65-Deoxy 21-demethoxy | C52H48N16O14S2 | 1184.29773 | [M+H]+   | 1185.30501 | 1185.30707 | 4.24     | 294857
Sample A | Sulfomycin III 65-Deoxy 21-demethoxy | C52H48N16O14S2 | 1184.29773 | [M+NH4]+ | 1202.33156 | 1202.33179 | 4.26     | 362086
Sample A | 21-Demethoxysulfomycin I. Pre-       | C53H50N16O15S2 | 1214.3083  | [M+Na]+  | 1237.29752 | 1237.29711 | 4.06     | 334417
Sample A | 21-Demethoxysulfomycin I. Pre-       | C53H50N16O15S2 | 1214.3083  | [M+NH4]+ | 1232.34213 | 1232.34402 | 4.06     | 1868063
Sample A | Sulfomycin I                         | C54H52N16O16S2 | 1244.31886 | [M+H]+   | 1245.32614 | 1245.32576 | 4.09     | 2851985
Sample A | Sulfomycin I                         | C54H52N16O16S2 | 1244.31886 | [M+NH4]+ | 1262.35269 | 1262.35281 | 4.09     | 6992755
Sample A | Sulfomycin I                         | C54H52N16O16S2 | 1244.31886 | [M+Na]+  | 1267.30809 | 1267.31064 | 4.09     | 2209156
Sample A | Sulfomycin II                        | C54H52N16O15S2 | 1228.32395 | [M+NH4]+ | 1246.35778 | 1246.35812 | 4.49     | 2621796
Sample A | Sulfomycin II                        | C54H52N16O15S2 | 1228.32395 | [M+H]+   | 1229.33123 | 1229.32906 | 4.49     | 697648
Sample A | Sulfomycin II                        | C54H52N16O15S2 | 1228.32395 | [M+Na]+  | 1251.31317 | 1251.31364 | 4.49     | 659698
Sample B | Thiostrepton A                       | C72H85N19O18S5 | 1663.49236 | [M+Na]+  | 1686.48158 | 1686.48323 | 4.89     | 13273809
Sample B | Thiostrepton A                       | C72H85N19O18S5 | 1663.49236 | [M+H]+   | 1664.49963 | 1664.49854 | 4.89     | 42861015
Sample B | Thiopeptin Ba                        | C71H84N18O18S6 | 1668.45353 | [M+NH4]+ | 1686.48736 | 1686.48336 | 4.86     | 12599233
Sample B | Thiostrepton B                       | C66H79N17O16S5 | 1525.44943 | [M+H]+   | 1526.45671 | 1526.45598 | 4.83     | 1967091
Sample B | Thiostrepton B                       | C66H79N17O16S5 | 1525.44943 | [M+Na]+  | 1548.43865 | 1548.43745 | 4.83     | 253435
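To make the agreement in this table concrete, the mass error of a row can be expressed in Da and in ppm. The sketch below copies four rows from the table and checks them against both ends of the 0.003-0.008 Da search tolerance; the helper name `mass_error_ppm` is hypothetical.

```python
# Worked example: mass error of selected table rows, in Da and ppm.
ROWS = [  # (compound, adduct, calculated m/z, found m/z), copied from the table
    ("Sulfomycin I", "[M+H]+", 1245.32614, 1245.32576),
    ("Sulfomycin I", "[M+Na]+", 1267.30809, 1267.31064),
    ("Thiostrepton A", "[M+H]+", 1664.49963, 1664.49854),
    ("Thiopeptin Ba", "[M+NH4]+", 1686.48736, 1686.48336),
]


def mass_error_ppm(calc_mz: float, found_mz: float) -> float:
    """Relative error of the found m/z versus the calculated m/z, in ppm."""
    return (found_mz - calc_mz) / calc_mz * 1e6


for name, adduct, calc, found in ROWS:
    err_da = found - calc
    print(f"{name:15s} {adduct:9s} {err_da:+.5f} Da "
          f"{mass_error_ppm(calc, found):+.2f} ppm  "
          f"within 0.003 Da: {abs(err_da) <= 0.003}, "
          f"within 0.008 Da: {abs(err_da) <= 0.008}")
```

By this arithmetic all four rows are within 0.008 Da (at most about 2.4 ppm), while the Thiopeptin Ba [M+NH4]+ row, 0.004 Da off, would fall outside a 0.003 Da tolerance.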

Familial compounds lead to increased certainty of identifications

Dataset  | Compound                  | Formula     | Mono. mass | Adduct  | Calc. m/z | Found m/z | RT (min) | MICC (area) | Note
---------|---------------------------|-------------|------------|---------|-----------|-----------|----------|-------------|---------
Sample A | Erythromycin F            | C37H67NO14  | 749.45616  | [M+Na]+ | 772.44538 | 772.44541 | 2.86     | 866,658     | isobaric
Sample A | Erythromycin F            | C37H67NO14  | 749.45616  | [M+H]+  | 750.46343 | 750.46397 | 2.86     | 16,950,344  | isobaric
Sample A | Erythromycin C            | C36H65NO13  | 719.44559  | [M+Na]+ | 742.43481 | 742.43501 | 3.43     | 2,764,325   |
Sample A | Erythromycin C            | C36H65NO13  | 719.44559  | [M+H]+  | 720.45287 | 720.45332 | 3.43     | 42,784,517  |
Sample A | Erythromycin E            | C37H65NO14  | 747.44051  | [M+H]+  | 748.44778 | 748.44745 | 3.49     | 23,624,085  |
Sample A | Erythromycin A            | C37H67NO13  | 733.46124  | [M+Na]+ | 756.45046 | 756.45129 | 3.60     | 22,219,063  | isobaric
Sample A | Erythromycin A            | C37H67NO13  | 733.46124  | [M+H]+  | 734.46852 | 734.46836 | 3.60     | 483,896,070 | isobaric
Sample A | 6,7-Anhydroerythromycin C | C36H63NO12  | 701.43503  | [M+Na]+ | 724.42425 | 724.42454 | 3.83     | 7,613,955   |
Sample A | 6,7-Anhydroerythromycin C | C36H63NO12  | 701.43503  | [M+H]+  | 702.44230 | 702.44277 | 3.83     | 16,560,242  |
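The observation that multiple adducts and familial compounds raise confidence can be expressed as a simple count per compound. This is a sketch, not an Apex feature; the hits are copied from the erythromycin table, and the 0.05 min RT window and the use of the median RT as the reference are assumptions.

```python
# Sketch: count distinct co-eluting adducts supporting each library hit.
from collections import defaultdict
from statistics import median

HITS = [  # (compound, adduct, RT in min), copied from the table
    ("Erythromycin F", "[M+Na]+", 2.86), ("Erythromycin F", "[M+H]+", 2.86),
    ("Erythromycin C", "[M+Na]+", 3.43), ("Erythromycin C", "[M+H]+", 3.43),
    ("Erythromycin E", "[M+H]+", 3.49),
    ("Erythromycin A", "[M+Na]+", 3.60), ("Erythromycin A", "[M+H]+", 3.60),
    ("6,7-Anhydroerythromycin C", "[M+Na]+", 3.83),
    ("6,7-Anhydroerythromycin C", "[M+H]+", 3.83),
]


def supporting_adducts(hits, rt_window=0.05):
    """Distinct adducts per compound whose RT lies within rt_window of the median RT."""
    by_compound = defaultdict(list)
    for name, adduct, rt in hits:
        by_compound[name].append((adduct, rt))
    support = {}
    for name, ions in by_compound.items():
        centre = median(rt for _, rt in ions)
        support[name] = len({a for a, rt in ions if abs(rt - centre) <= rt_window})
    return support


support = supporting_adducts(HITS)
for name, n in support.items():
    label = "multiple adducts" if n > 1 else "single ion only"
    print(f"{name:27s} {n}  {label}")
print(f"{len(support)} family members found in this sample")
```

Here Erythromycin E rests on a single ion, but it is still supported indirectly by the four other family members in the same sample.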

Results of the previous table shown in Xcalibur


Note: Frequently the compounds of interest give no UV signal and are not the most intense peaks in the TIC or base peak trace.

Results

  • The entire process works well for compounds with molecular weight greater than 1000 and for multiply charged compounds.
  • The process allows identification of a family of compounds at different concentrations in a sample.
  • The sensitivity of the LTQ-FT and the specificity of the Apex search allow detection at levels typically two orders of magnitude lower than with our lower-resolution methods.

Issues

  • A single data point is often not enough to identify a component, especially if no minor related components are present in the sample.
    • For compounds seen previously, the retention time can be used manually to confirm the assignment.
    • If the UV signal is strong enough, one can go back and check it in Xcalibur.
  • Exact mass and isotope abundance are the only parameters used to assign an ion. There is no cluster analysis (grouping of the ions that belong to the same component), so an ion can be misassigned. This occurs frequently with low-mass ions.
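The manual RT confirmation described above could be automated along these lines. This is a hypothetical sketch: `REFERENCE_RT` stands in for an in-house table of retention times for previously seen compounds (the values here are simply taken from the erythromycin table, not from real reference standards), and the 0.2 min window is an assumed value.

```python
# Hypothetical helper: confirm a library hit against a reference retention time.
# Placeholder reference values taken from the erythromycin table above.
REFERENCE_RT = {"Erythromycin A": 3.60, "Erythromycin C": 3.43}


def check_rt(name: str, observed_rt: float, reference_rt=REFERENCE_RT,
             window: float = 0.2) -> str:
    """Return 'confirmed', 'conflict', or 'no reference' for one hit."""
    ref = reference_rt.get(name)
    if ref is None:
        return "no reference"  # compound not seen before: RT cannot help
    return "confirmed" if abs(observed_rt - ref) <= window else "conflict"


print(check_rt("Erythromycin A", 3.58))  # confirmed
print(check_rt("Erythromycin A", 2.86))  # conflict
print(check_rt("Erythromycin E", 3.49))  # no reference
```

A "conflict" does not prove a misassignment (the column or gradient may have changed), but it flags the hit for review.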

Incorrectly assigned ion

Observed m/z 393.2036

  • Possible Apex match: calculated m/z 393.20539, the M+H of Desmethyl-Lincomycin, BUT
  • Cluster analysis suggests that m/z 393.2036 is the M+Na, since an ion is also seen at m/z 371.22172 (M+H)

Results

  • The neutral mass M is therefore 370.2144, which does not correspond to the library hit Desmethyl-Lincomycin (M = 392.1981)
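The reasoning in this example is a simple form of cluster analysis: two co-eluting ions separated by the mass difference between Na+ and H+ (about 21.98194) are treated as the [M+Na]+ and [M+H]+ of the same component. A minimal sketch, assuming a 0.003 Da pairing tolerance; the function name `na_h_pairs` is hypothetical and this is not the cluster analysis being added to Apex.

```python
# Sketch of a simple adduct-pair check: look for co-eluting ions separated by
# the [M+Na]+ - [M+H]+ spacing and derive the neutral mass M.
PROTON = 1.00727646
NA_ION = 22.98922070          # Na atom minus one electron
DELTA_NA_H = NA_ION - PROTON  # about 21.98194


def na_h_pairs(mzs, tol_da=0.003):
    """Return (M+H, M+Na, neutral M) for every pair matching the Na/H spacing."""
    pairs = []
    for mz_h in mzs:
        for mz_na in mzs:
            if abs((mz_na - mz_h) - DELTA_NA_H) <= tol_da:
                pairs.append((mz_h, mz_na, mz_h - PROTON))
    return pairs


observed = [371.22172, 393.2036]  # ions from the example above
for mz_h, mz_na, m in na_h_pairs(observed):
    print(f"M+H {mz_h:.5f}, M+Na {mz_na:.4f} -> M = {m:.4f}")

library_m = 393.20539 - PROTON  # neutral M implied by the Desmethyl-Lincomycin match
print(f"Library hit implies M = {library_m:.4f}")
```

Here 393.2036 - 371.22172 = 21.98188, which matches the Na/H spacing to within 0.0001, giving M of about 370.2144 rather than the 392.1981 implied by the Desmethyl-Lincomycin hit.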

Figure 6

Sierra Analytics is currently working on adding cluster analysis to the software.

Summary

  • Literature compounds of a known activity can be identified by exact mass with Apex software. When multiple adducts and/or familial compounds are observed within a sample, identification is relatively certain.
  • Adding retention time limits (when known) to the Apex analysis would add certainty to the hits.
  • Adding automated cluster analysis to Apex would reduce analysis times by suggesting better hits, especially at masses below 1000.

References

1. Zink, D.; Dufresne, C.; Liesch, J.; Martin, J. Automated LC-MS Analysis of Natural Products: Extraction of UV, MS and Retention Time Data for Component Identification and Characterization. 50th ASMS Conference on Mass Spectrometry and Allied Topics, 2002.