Glypeps

ACCURATE MASS & PEPTIDE MODIFICATION

W.D. Lehmann, A. Bohne, C.-W. von der Lieth

German Cancer Research Center
Im Neuenheimer Feld 280
69120 Heidelberg, Germany

  


Paper
The information encrypted in accurate peptide masses-improved protein identification and
assistance in glycopeptide identification and characterization.

J Mass Spectrom. 2000 Nov;35(11):1335-41.

ABSTRACT:
This page helps to unravel the information encrypted in the deltamass value of accurate peptide masses. Deltamass is the post-decimal-point part of the accurate mass.

Physico-Biological Effect:
Deltamass values of peptide masses fall into limited segments of the continuous mass scale ('peptide mass zones'), interrupted by 'peptide-forbidden mass zones', as outlined by Mann (1) and Sundqvist et al. (2). This effect is a consequence of the limited variation in the elemental composition of peptides. Glycopeptides, in particular those with high carbohydrate and low peptide content, have an elemental composition that is markedly different from that of peptides. As a consequence, glycopeptide deltamass values are located on the low-mass side of the deltamass distribution. They are frequently found in the zone between the (deltamass -2 sigma) value of peptides and the deltamass value of pure carbohydrates.

Functions:
Two levels of support are offered. Level 1 delivers a display of the input deltamass value relative to the distribution of peptide deltamass values extracted from the nrdb database. If the deltamass value is indicative of a glycopeptide, level 2 can be used to generate a list of proposed N-glycan structures when the sequence of the stem protein is known.

INTRODUCTION
Peptide Deltamass Distribution
Recent advances in delayed-extraction MALDI-TOF, FT-ICR combined with ESI or MALDI, and ESI-TOF instrumentation give access to peptide molecular mass data with an accuracy in the range of 1-20 ppm. At this level of accuracy, the peptide mass itself provides analytical information.
Mann (1) and Sundqvist et al. (2) have pointed out that peptide masses cover only limited 'allowed' regions on the mass scale, interrupted by 'forbidden' zones. With increasing mass, the 'allowed' regions grow at the cost of the 'forbidden' zones up to about 4000 Da, where the 'forbidden' zones vanish.
Mann (1) has calculated the average peptide masses and their 95% confidence limits as a function of peptide molecular weight, both for a random distribution of the 20 proteinogenic amino acids and for a random distribution of all possible peptide sequences. In autumn 1998 we extracted the distribution of tryptic peptide masses from the then-current version of the nrdb protein database. Values were extracted in steps of 100 Da between mass 500 Da and 3000 Da. The data were fitted with a Gaussian distribution, and the average mass and standard deviation were obtained from this fit. The functions obtained from this fit are given in Figure 1 for the mass range from 500 Da to 4000 Da.

Figure 1: True versus Statistical Peptide Deltamass Distribution

Figure 1 demonstrates that the deltamass value increases linearly with increasing peptide mass, with a slope of about 1 Da per 2000 Da of nominal mass. In parallel to the increase of the deltamass value, the width of the deltamass distribution increases. At roughly 4000 Da, the ± 4 sigma width, which encompasses more than 99.9 % of all peptide masses, approaches 1 Da, so that the forbidden peptide mass zone vanishes for higher mass values.
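The relation between a measured mass and the mean peptide deltamass can be sketched in a few lines of Python. This is only an illustration: it uses the approximate slope of 1 Da per 2000 Da quoted above rather than the fitted functions of Figure 1, and the function names are assumptions made for this sketch, not part of the Glypeps program.

```python
import math

# Approximate slope read from the text: about 1 Da of deltamass per 2000 Da.
# The fitted functions behind Figure 1 are not reproduced here.
PEPTIDE_SLOPE = 1.0 / 2000.0


def deltamass(accurate_mass):
    """Post-decimal-point part of an accurate mass."""
    return accurate_mass - math.floor(accurate_mass)


def expected_peptide_deltamass(accurate_mass):
    """Rough mean peptide deltamass at this mass, wrapped into [0, 1)."""
    return (accurate_mass * PEPTIDE_SLOPE) % 1.0


def deviation_from_mean(accurate_mass):
    """Signed distance of the measured deltamass from the rough peptide
    mean, wrapped into [-0.5, 0.5). Negative values lie on the low-mass
    side of the distribution."""
    diff = deltamass(accurate_mass) - expected_peptide_deltamass(accurate_mass)
    return (diff + 0.5) % 1.0 - 0.5


if __name__ == "__main__":
    for mass in (1000.50, 1000.30):
        print(f"{mass:.2f} Da: deviation {deviation_from_mean(mass):+.3f} Da")
```

A mass of 1000.50 Da sits close to the approximate mean, while 1000.30 Da lies about 0.2 Da on the low-mass side. Judging whether such a deviation is significant requires the fitted standard deviation at that mass, which this sketch does not model.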

True versus Statistical Peptide Deltamass Distribution
The published data for average deltamass values and the deltamass distribution width (1,2) are mainly based on a statistical distribution of both peptide sequences and amino acid abundances. The distributions given on this page are based on the set of existing tryptic peptides as extracted from the nrdb database. Mean deltamass values in our true distributions are somewhat higher than those derived from the statistical approach. This mass shift is caused by the nonstatistical distribution of amino acids. Nonpolar amino acids such as L, I, V and A, with strongly positive deltamass values, exhibit a disproportionately high natural abundance, whereas the sulfur-containing amino acids C and M, with low deltamass values, have a very low natural abundance. Both phenomena shift the true average of the peptide deltamass value towards higher mass compared to its statistical value.

Glycopeptide Deltamass
As indicated in Figure 1, deltamass values of pure oligohexoses are located on the low-mass side of peptide deltamass values. Deltamass values of glycopeptides with short peptide sequences, such as those obtained by digestion with pronase, are located between the -2 sigma line of peptide deltamasses and the deltamass values of pure carbohydrates. This page helps to identify possible glycopeptides, and it provides a preselected set of isobaric glycan structures that are compatible with the measured glycopeptide mass when the sequence of the stem protein is known.

Accurate Mass & Peptide Modification offers two levels of support:

LEVEL 1: Mass Display only:
Function: You can display the measured mass of a putative peptide relative to the random and the true distribution of peptide molecular masses. You obtain information about the position of the measured mass relative to the 'allowed' random and true zones of masses of an unmodified peptide. A peptide mass positioned near the left edge of these zones is a candidate for a level 2 glycopeptide search when the sequence of the stem protein is known.
Example: "nonpolar" sequence
[DLLRN LLQVD LTK]

Peptides containing many polar amino acids and/or methionine or cysteine show deltamass values lower than the average:
Example: "polar" sequence
[GPGDT SNFDD YEEEE IR]
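
The deltamass values of the two example sequences can be checked with a short calculation. The following Python sketch sums standard monoisotopic residue masses plus one water; it computes the mass of the neutral peptide, and the helper names are assumptions made for this illustration rather than part of the Glypeps program.

```python
import math

# Standard monoisotopic residue masses (Da) of the 20 proteinogenic amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O (Da)


def peptide_mass(sequence):
    """Monoisotopic mass of the neutral, unmodified peptide."""
    sequence = sequence.replace(" ", "")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def deltamass(accurate_mass):
    """Post-decimal-point part of an accurate mass."""
    return accurate_mass - math.floor(accurate_mass)


if __name__ == "__main__":
    for label, seq in (("nonpolar", "DLLRN LLQVD LTK"),
                       ("polar", "GPGDT SNFDD YEEEE IR")):
        mass = peptide_mass(seq)
        print(f"{label}: {mass:.4f} Da, deltamass {deltamass(mass):.3f}")
```

Under these assumptions the nonpolar sequence comes out near 1539.898 Da (deltamass about 0.90) and the polar sequence near 1971.797 Da (deltamass about 0.80). Compared with a rough mean of mass/2000, the first lies above and the second below the average, in line with the text.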

Peptides with attached N-glycan show deltamass values significantly lower than the average:
Example: sequence with attached N-glycan
Covalent modifications impose a shift on the peptide deltamass value. For example, lipidation shifts deltamass values to higher masses because of the high hydrogen content, whereas glycosylation shifts them towards lower values because of the high oxygen content. This low-mass shift in glycopeptides is more pronounced for molecules with a high carbohydrate and a low peptide content. Such molecules are obtained, for example, by pronase digestion (3). A peptide mass positioned near the left edge of a deltamass distribution is a candidate for a level 2 search (Mass Display plus N-Glycopeptide Search), in particular when the digest is performed in a manner that generates small peptide fragments.

Near their left-hand border, the deltamass distributions show a line representing the deltamass value of a pure carbohydrate signal with the empirical formula Cn(H2O)n. Carbohydrate deltamass values for masses between existing oligosaccharide molecules have been interpolated. Glycopeptide masses are positioned on the right-hand side of this line. The deltamass distance of a glycopeptide from this line tends to increase with increasing peptide content.
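
Why the carbohydrate line lies on the low-mass side can be seen from the formula Cn(H2O)n alone. The Python sketch below uses standard monoisotopic atomic masses and compares the resulting deltamass gain per Da with the approximate peptide slope of 1 Da per 2000 Da given in the text. The function names are assumptions made for this illustration, and the page's interpolation between existing oligosaccharides is not reproduced.

```python
import math

# Standard monoisotopic atomic masses (Da).
C_MASS, H_MASS, O_MASS = 12.0, 1.00782503, 15.99491462
CH2O_MASS = C_MASS + 2 * H_MASS + O_MASS  # one C(H2O) unit

# Deltamass gained per Da of nominal mass.
CARBOHYDRATE_SLOPE = (CH2O_MASS - 30.0) / 30.0
PEPTIDE_SLOPE = 1.0 / 2000.0  # approximate value quoted in the text


def carbohydrate_mass(n):
    """Monoisotopic mass of a pure carbohydrate Cn(H2O)n."""
    return n * CH2O_MASS


def carbohydrate_deltamass(n):
    """Post-decimal-point part of the Cn(H2O)n mass."""
    mass = carbohydrate_mass(n)
    return mass - math.floor(mass)


if __name__ == "__main__":
    print(f"carbohydrate slope: {CARBOHYDRATE_SLOPE * 1000:.3f} Da per 1000 Da")
    print(f"peptide slope:      {PEPTIDE_SLOPE * 1000:.3f} Da per 1000 Da")
    for n in (6, 12, 30):
        print(f"n = {n:2d}: {carbohydrate_mass(n):9.4f} Da, "
              f"deltamass {carbohydrate_deltamass(n):.4f}")
```

The carbohydrate slope is roughly 0.35 Da per 1000 Da, against about 0.5 Da per 1000 Da for peptides, so at any given mass a pure carbohydrate sits below the peptide mean, and the gap widens with mass.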

Tool
Expasy's Peptide Mass calculation tool

Input / Work
Molecular weight (amu):    Tolerance (amu):

LEVEL 2: Mass Display plus N-Glycopeptide Search:
Function: The program requires three inputs: an accurate mass value with a mass tolerance, a protein sequence, and the amino acid residue step width around a possible N-glycosylation site. The program identifies all possible N-glycosylation sites in the input protein sequence, calculates the masses of the peptide fragments around each possible N-glycosylation site using the input step width, subtracts these masses from the input mass, and searches a database of accurate N-glycan masses with the given mass tolerance. The N-glycan database was constructed by extracting all N-glycan structures from CarbBank (5), generating their molecular formulae and accurate masses using SWEET (6), and storing this information together with the original data contained in CarbBank. The result is a hit list of N-glycan structures that match the query.
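
The search procedure described above can be sketched as follows. This is a minimal Python illustration, not the Glypeps implementation, and it rests on several assumptions that the text does not confirm: sites are detected with the common N-X-S/T sequon rule (X not P); the step width is read as the maximum number of residues added on either side of the asparagine; one water is assumed to be lost when the glycan is attached; and a one-entry dictionary stands in for the CarbBank-derived database. All names are hypothetical.

```python
import re

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
HEX, HEXNAC = 162.05282, 203.07937  # monosaccharide residue masses (Da)


def peptide_mass(sequence):
    """Monoisotopic mass of the neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def glycosylation_sites(protein):
    """0-based positions of N in N-X-S/T sequons (assumed rule, X != P)."""
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", protein)]


def fragments_around(protein, site, step):
    """Sub-peptides containing the site, extended by 0..step residues
    on each side (one possible reading of 'step width')."""
    fragments = set()
    for left in range(step + 1):
        for right in range(step + 1):
            start = max(0, site - left)
            end = min(len(protein), site + right + 1)
            fragments.add(protein[start:end])
    return sorted(fragments)


def glycan_hits(mass, tolerance, protein, step, glycan_db):
    """Return (site number, peptide fragment, glycan name) for every match."""
    hits = []
    for site in glycosylation_sites(protein):
        for fragment in fragments_around(protein, site, step):
            # Assumption: attachment releases one water, so add it back
            # to obtain the mass of the free glycan.
            glycan_mass = mass - peptide_mass(fragment) + WATER
            for name, db_mass in glycan_db.items():
                if abs(glycan_mass - db_mass) <= tolerance:
                    hits.append((site + 1, fragment, name))
    return hits


if __name__ == "__main__":
    # Hypothetical one-entry database; the real one is built from CarbBank.
    glycan_db = {"Man5GlcNAc2": 5 * HEX + 2 * HEXNAC + WATER}
    protein = "MEEKYNLTSVLMAM"  # short stretch of the ovalbumin example
    query = peptide_mass("YNLT") + glycan_db["Man5GlcNAc2"] - WATER
    print(glycan_hits(query, 0.01, protein, 2, glycan_db))
```

In this constructed example the query mass is built from the fragment YNLT plus the single database glycan, so the search returns exactly that combination. A real database contains many isobaric structures per mass, which is why the hit list still has to be resolved by other methods.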

Example: Chicken ovalbumin
AA Sequence:
 
      SIGAASMEF CFDVFKELKV HHANENIFYC PIAIMSALAM VYLGAKDSTR TQINKVVRFD
     KLPGFGDSIE AQCGTSVNVH SSLRDILNQI TKPNDVYSFS LASRLYAEER YPILPEYLQC
     VKELYRGGLE PINFQTAADQ ARELINSWVE SQTNGIIRNV LQPSSVDSQT AMVLVNAIVF
     KGLWEKAFKD EDTQAMPFRV TEQESKPVQM MYQIGLFRVA SMASEKMKIL ELPFASGTMS
     MLVLLPDEVS GLEQLESIIN FEKLTEWTSS NVMEERKIKV YLPRMKMEEK YNLTSVLMAM
     GITDVFSSSA NLSGISSAES LKISQAVHAA HAEINEAGRE VVGSAEAGVD AASVSEEFRA
     DHPFLFCIKH IATNAVLFFG RCVSP
Warning: Mac computers have problems with this example.
Digestion of a glycoprotein with pronase is particularly useful, since this enzyme generates glycopeptides with short peptide chains. The glycan hit list is generated by mass match alone, so the true structure has to be assigned from it by other methods (MS/MS, enzymatic degradation). Good luck!

Tool
Expasy's Peptide Mass calculation tool

Protein Databases
SwissProt PDB OWL ENTREZ SRS Merops

Input / Work
Molecular weight (amu):    Tolerance (amu):    Step size:
AA sequence (max 4000 chars):


Questions concerning:
Mass spectrometry: wolf.lehmann@dkfz-heidelberg.de
Implementation: a.bohne at dkfz-heidelberg.de
Carbohydrates: w.vonderlieth@dkfz-heidelberg.de

References
1. Mann, M. Possible Peptide Masses. Proceedings of the 43rd Conference on Mass Spectrometry and Allied Topics, Atlanta, GA, 21-25 May 1995, p. 639
2. Zubarev, R. A.; Hakansson, P.; Sundqvist, B. Accuracy requirements for peptide characterization by monoisotopic molecular mass measurements. Anal. Chem. 68:4060-4063; 1996.
3. Juhasz, P.; Martin, S. A. The utility of nonspecific proteases in the characterization of glycoproteins by high-resolution time-of-flight mass spectrometry. Int. J. Mass Spectrom. Ion Proc. 169/170:217-230; 1997.
4. Amino acid abundance data of the nrdb protein database were kindly provided by Peter Mortensen, Protana, Odense, DK.
5. CarbBank: http://128.192.9.29/carbbank/CarbBank.htm
6. Bohne, A.; Lang, E.; von der Lieth, C.-W. W3-SWEET: Carbohydrate modeling by internet. J. Mol. Model. 4:33-43; 1998. The SWEET program: http://www.dkfz-heidelberg.de/spec/sweet2/


Generated by Andreas Bohne

Base: http://www.dkfz-heidelberg.de/spec/glypeps/
