Protein Databases

    Proteins, often called the essential building blocks of the human body, are a vital part of all living organisms. Their structures and functions are studied largely to understand physiology, in particular the metabolic pathways of cells. The large-scale study of proteins is known as proteomics, a term derived from the word proteome, which refers to the entire set of proteins expressed by a cell, tissue, or organism. Marc Wilkins coined the term proteome in 1994 by analogy with the term genome.

    The composition and nature of the proteins synthesized by the human body vary over time, and a single protein can undergo noticeable modifications under conditions of stress. Proteomics is therefore a complex and challenging subject to study.

    Experimental study of protein structures relies heavily on protein databases. These databases hold structural information in the form of three-dimensional coordinates, angles, and unit cell dimensions. Besides providing useful experimental datasets, this information supports techniques such as X-ray crystallography and structure-based drug design.

    Among protein databases, the PDB deserves special mention. The PDB, or Protein Data Bank, was founded in 1971. It was, and still is, the central archive for experimentally determined protein structures. The PDB is managed by an international consortium known as the Worldwide Protein Data Bank (wwPDB).
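
    To make the idea of stored coordinates and unit cell dimensions concrete, here is a minimal Python sketch that reads them from PDB-format text using the fixed column positions of the CRYST1 and ATOM records. The sample lines and their numeric values are invented for illustration and do not come from a real PDB entry; the names Atom and parse_pdb are likewise hypothetical.

```python
from typing import NamedTuple

# Illustrative PDB-format lines; the values are made up, not a real entry.
SAMPLE_PDB = [
    "CRYST1   52.000   58.600   61.900  90.00  90.00  90.00 P 21 21 21    8",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
    "ATOM      3  C   MET A   1      26.913  26.639   3.531  1.00  9.62           C",
    "END",
]


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    residue: str   # three-letter residue name
    chain: str     # chain identifier
    x: float       # orthogonal coordinates in angstroms
    y: float
    z: float


def parse_pdb(lines):
    """Return (unit_cell, atoms) from PDB-format lines.

    PDB records are fixed-width, so fields are taken by column slice
    rather than by splitting on whitespace.
    """
    cell = None
    atoms = []
    for line in lines:
        record = line[0:6].strip()
        if record == "CRYST1":
            cell = {
                "a": float(line[6:15]),
                "b": float(line[15:24]),
                "c": float(line[24:33]),
                "alpha": float(line[33:40]),
                "beta": float(line[40:47]),
                "gamma": float(line[47:54]),
            }
        elif record in ("ATOM", "HETATM"):
            atoms.append(
                Atom(
                    name=line[12:16].strip(),
                    residue=line[17:20].strip(),
                    chain=line[21],
                    x=float(line[30:38]),
                    y=float(line[38:46]),
                    z=float(line[46:54]),
                )
            )
    return cell, atoms


cell, atoms = parse_pdb(SAMPLE_PDB)
print(cell)
for atom in atoms:
    print(atom)
```

    For real work, an established parser is usually a better choice than hand-written slicing; the sketch only shows where the coordinates and cell dimensions sit in each record.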

    Traditional Methods of Detecting and Quantifying Proteins:

    • ELISA – The enzyme-linked immunosorbent assay is a long-established technique used to detect specific proteins in a sample and measure their concentration.
    • MSIA – The mass spectrometric immunoassay, first introduced by Randall Nelson, combines mass spectrometry with the traditional immunoassay and is often described as a gold standard for quantitative proteomics.
    • SISCAPA – Stable Isotope Standards and Capture by Anti-Peptide Antibodies is a widely used method for quantifying proteins today.

    Examples of popular protein databases (structure):

    • OCA – A browser and database for information on protein structure and function.
    • ModBase – A database of three-dimensional protein structure models calculated by comparative modeling.
    • ProtCID – The Protein Common Interface Database stores data on interfaces shared across experimentally determined structures of homologous proteins.

    A Library of Protein Family Cores

    We have taken structural alignments of protein families and computed average core structures for each family. The core structures can be divided into residues with low spatial variation and those with high spatial variation. Amino acids with low spatial variance occupy essentially the same relative position in all family members. This library is useful for building models, threading, and exploratory analysis. It is also a useful mechanism for summarizing variability in NMR structures.
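
    The idea of an average core with low-variance and high-variance positions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's actual procedure: it assumes the family members are already superposed in a common frame, defines spatial variance as the mean squared distance of a residue from its average position, and uses an arbitrary threshold. The function name core_summary, the toy coordinates, and the threshold value are all hypothetical.

```python
from statistics import mean


def core_summary(structures, threshold):
    """Summarize aligned structures position by position.

    structures: list of family members, each a list of (x, y, z) tuples,
                one per aligned residue position, already superposed.
    threshold:  assumed cutoff on spatial variance (squared angstroms).
    Returns (average_positions, variances, low_indices, high_indices).
    """
    n_positions = len(structures[0])
    average, variance = [], []
    for i in range(n_positions):
        points = [s[i] for s in structures]
        # Average position of this residue across the family.
        centre = tuple(mean(p[k] for p in points) for k in range(3))
        # Spatial variance: mean squared distance from the average position.
        var = mean(
            sum((p[k] - centre[k]) ** 2 for k in range(3)) for p in points
        )
        average.append(centre)
        variance.append(var)
    low = [i for i, v in enumerate(variance) if v <= threshold]
    high = [i for i, v in enumerate(variance) if v > threshold]
    return average, variance, low, high


# Toy family: three members, three aligned positions (invented coordinates).
family = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.9, 0.0, 0.0), (6.0, 0.0, 0.0)],
]

average, variance, low, high = core_summary(family, threshold=1.0)
print("low-variance positions:", low)
print("high-variance positions:", high)
```

    In this toy family the first two positions stay nearly fixed across members while the third drifts, so only the third is classed as high variance.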


    Amino Acids Sequence Database (PRF/SEQDB)

    This database consists of amino acid sequences of peptides and proteins, including sequences predicted from genes. The literature in which each sequence was reported can also be searched. Because PRF/SEQDB is built from all amino acid sequences of peptides and proteins reported in the literature, it also contains sequences that are not included in EMBL, GenBank, or SwissProt.



    Analysis of protein and protein-DNA interactions.


    Cytokine Family Database

    The Cytokine Family Database (dbCFC) is a collection of EST (Expressed Sequence Tag) records of cytokines deposited in the NCBI GenBank. It provides information about the assignment of EST records to cytokine family members, along with related data contained in other databases.



    MHCPEP is a database comprising over 13000 peptide sequences known to bind MHC molecules. Entries were compiled from published reports as well as from direct submissions of experimental data. Each entry contains the peptide sequence, its MHC specificity and, when available, experimental method, observed activity, binding affinity, source protein, anchor positions, and publication references.



    OWL is a non-redundant composite of four publicly available primary sources: SWISS-PROT, PIR (1-3), GenBank (translation) and NRL-3D. SWISS-PROT is the highest priority source; all others are compared against it to eliminate identical and trivially different sequences. The strict redundancy criteria render OWL relatively “small” and hence efficient in similarity searches.
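
    The priority-based merge described for OWL can be illustrated with a short Python sketch. Several details are assumptions made for the example rather than OWL's documented rules: the ranking of the sources after SWISS-PROT simply follows the order in which they are listed, and "trivially different" is taken to mean equal length with at most one mismatched residue. The identifiers and sequences are invented, and the function names are hypothetical.

```python
# Assumed ranking: only SWISS-PROT's top priority is stated in the text;
# the order of the remaining sources is a guess based on the listing.
SOURCE_PRIORITY = ["SWISS-PROT", "PIR", "GenBank", "NRL-3D"]


def trivially_different(a, b, max_mismatches=1):
    """Assumed redundancy rule: same length and at most one mismatch.

    Identical sequences (zero mismatches) also count as redundant.
    """
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) <= max_mismatches


def build_composite(entries):
    """Merge (source, identifier, sequence) entries into a non-redundant list.

    Entries are visited from highest to lowest priority source, and an
    entry is kept only if it is not redundant with anything already kept.
    """
    rank = {name: i for i, name in enumerate(SOURCE_PRIORITY)}
    ordered = sorted(entries, key=lambda e: rank[e[0]])  # stable sort
    kept = []
    for source, ident, seq in ordered:
        if any(trivially_different(seq, k[2]) for k in kept):
            continue
        kept.append((source, ident, seq))
    return kept


# Invented entries for illustration only.
entries = [
    ("GenBank", "gb1", "MKTAYIAKQR"),     # identical to sp1
    ("SWISS-PROT", "sp1", "MKTAYIAKQR"),
    ("PIR", "pir1", "MKTAYIAKQK"),        # one residue differs from sp1
    ("NRL-3D", "n1", "GSHMLEDPVA"),       # unrelated sequence
]

composite = build_composite(entries)
for source, ident, seq in composite:
    print(source, ident, seq)
```

    The pairwise comparison here is quadratic in the number of entries, which is fine for a demonstration but would need indexing or hashing at the scale of a real sequence database.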


    PDB – The Protein Data Bank

    An international repository for the processing and distribution of 3-D macromolecular structure data primarily determined experimentally by X-ray crystallography and NMR.


    PIR – Protein Information Resource

    The Protein Information Resource (PIR), in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), maintains the PIR-International Protein Sequence Database, a comprehensive, annotated, and non-redundant set of protein sequence databases in which entries are classified into family groups and alignments of each group are available.



    The PIR-NREF is a Non-redundant REFerence protein database designed to provide a timely and comprehensive collection of all protein sequence data, keeping pace with the genome sequencing projects and containing source attribution and minimal redundancy.



    The Protein Mutant Database (PMD) covers natural as well as artificial mutants, including random and site-directed ones, for all proteins except members of the globin and immunoglobulin families. The PMD is based on literature, not on proteins. That is, each entry in the database corresponds to one article which may describe one, several or a number of protein mutants.