Protein Sequence Database Search
- Peptide Mass Fingerprint or Peptide Mapping
In this method, the observed masses of peptides obtained from a digest of an unknown protein are compared with the predicted masses of peptides from the theoretical digestion of proteins in a database. If enough experimental peptide masses (usually from MALDI MS data) match the theoretical peptide masses of a database protein, the protein can be identified. This method should be used when the sample contains a purified protein and the protein comes from a species that is well represented in a sequence database. Although peptide mass fingerprint data continue to be accepted in the literature, the acceptance requirements have become more stringent. To confirm the sequences of peptides identified from peptide mass fingerprint data, or to apply MS/MS ion search methods for independent protein identification, the APL also offers MS/MS fragmentation analysis of the most intense peaks detected in MALDI MS spectra.
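The comparison described above can be illustrated with a minimal sketch. It assumes a tryptic digest (cleavage after K or R, but not before P), no missed cleavages, no modifications, and singly protonated monoisotopic masses. The function names, the 0.2 Da tolerance, and the toy sequences are hypothetical and are not part of any APL workflow or search engine.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the sum of residue masses
PROTON = 1.00728   # charge carrier for a singly protonated ion


def tryptic_digest(seq):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, seq, tol=0.2):
    """Count observed masses within tol (Da) of any theoretical peptide mass."""
    theoretical = [mh_mass(p) for p in tryptic_digest(seq)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def best_match(observed, database, tol=0.2):
    """Return the database entry name with the most matching peptide masses."""
    return max(database, key=lambda name: count_matches(observed, database[name], tol))


# Toy example: two made-up sequences and a peak list derived from the first.
database = {"protA": "MKTAYIAKQR", "protB": "GSHMLEDPVR"}
observed = [mh_mass(p) for p in tryptic_digest(database["protA"])]
print(best_match(observed, database))
```

Real search engines score matches statistically instead of simply counting them, and they account for missed cleavages, modifications, and mass calibration; the sketch only shows the core mass comparison.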
- MS/MS Ion Search
This method is much more powerful than the peptide mass fingerprint because it uses peptide amino acid sequence information from MS/MS spectra in addition to peptide mass. Experimental masses of peptide fragment ions are compared with the predicted masses of fragments from the theoretical fragmentation of peptides in a database. If enough experimental fragment ions match the theoretical fragment ions, the peptide can be identified. Matching several peptides to a single protein provides a high level of confidence that the identification is correct. This method is recommended for projects involving identification of proteins in complex mixtures and/or protein posttranslational modifications.
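As a rough illustration of fragment ion matching, the sketch below computes singly charged b and y ions for a candidate peptide and counts how many observed fragment masses they explain. It assumes only b and y ions, monoisotopic masses, and no modifications; the function names, the 0.5 Da tolerance, and the example peptides are hypothetical. The two candidates have the same composition and therefore the same peptide mass, so only the fragment ions tell them apart.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged masses for each backbone cleavage."""
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)            # N-terminal fragment
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)   # C-terminal fragment
    return b_ions, y_ions


def count_fragment_matches(observed, peptide, tol=0.5):
    """Count observed fragment masses within tol (Da) of any b or y ion."""
    b_ions, y_ions = fragment_ions(peptide)
    theoretical = b_ions + y_ions
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


# Toy spectrum: all b and y ions of TAYIAK, scored against two isobaric candidates.
b, y = fragment_ions("TAYIAK")
spectrum = b + y
for candidate in ("TAYIAK", "TAIYAK"):
    print(candidate, count_fragment_matches(spectrum, candidate))
```

The candidate with the correct residue order explains more fragment ions than the rearranged one, which is the extra discriminating power that a peptide mass alone cannot provide.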
What protein sequence database should be used for searches?
APL clients can request a search against any of the main publicly available protein sequence databases, or against a custom database when the protein sequences are unique or were constructed by the researcher. For large-scale experiments, when the results of additional statistical analyses must be reported in publications, additional searches may be run against a database in which the sequences have been reversed or randomized.
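A reversed or randomized database can be sketched as follows. The example also shows one common way such decoy searches are used, estimating a false discovery rate as the ratio of decoy hits to target hits above a score threshold. That estimate is an assumption about the statistical analysis meant here; the function names, scores, and threshold are hypothetical.

```python
import random


def reversed_decoy(seq):
    """Decoy sequence made by reversing the target sequence."""
    return seq[::-1]


def shuffled_decoy(seq, seed=0):
    """Decoy sequence made by randomly permuting the residues (fixed seed)."""
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def estimate_fdr(target_scores, decoy_scores, threshold):
    """Simple estimate: decoy hits / target hits at or above the threshold."""
    targets = sum(s >= threshold for s in target_scores)
    decoys = sum(s >= threshold for s in decoy_scores)
    return decoys / targets if targets else 0.0


# Build a decoy database from a toy target database (made-up sequences).
target_db = {"protA": "MKTAYIAKQR", "protB": "GSHMLEDPVR"}
decoy_db = {"REV_" + name: reversed_decoy(seq) for name, seq in target_db.items()}

# Hypothetical match scores from separate target and decoy searches.
fdr = estimate_fdr([50, 40, 30, 20], [25, 10], threshold=20)
print(decoy_db, fdr)
```

Reversal keeps the amino acid composition and length of each protein while destroying real peptide sequences, so hits against the decoy entries approximate the rate of random matches in the target search.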
Protein Databases
- Entrez Protein database of the National Center for Biotechnology Information (NCBI) - large database with much internal redundancy
- Universal Protein Resource (UniProt) - for protein sequences and functional information on proteins, with accurate, consistent, and rich annotation
- International Protein Index (IPI) - contains deposited protein and translated cDNA sequences, and information on predicted genes on the basis of genomic and expressed sequence-tag data