Major Sequence Databases



One thing that plays a crucial role in the study of life science is the biological database. Wondering what it is? A biological database is much like a library stocked with information on life science and related technology, drawn from reports of scientific experiments and organized for computational analysis. Put simply, databases in biology are largely records of past experiments.

These records are rich in research-relevant information covering subjects such as genomics, metabolomics, proteomics, microarray data and phylogenetics. Biological databases store information on the structure of cells and chromosomes, the main functions of genes, the clinical effects of mutations and complex biological sequences.

The biological database is a broad concept. Hence, it is subdivided into two wings: the sequence database and the structure database. While a sequence database contains information on nucleic acid and protein sequences, a structure database holds information on protein structures.

Details about Sequence Database

The sequence database is of huge importance in the field of bioinformatics. In this type of database, information on protein sequences, nucleic acid sequences and other polymer sequences is encoded in digital form and stored on a computer. A major example of a protein sequence database is UniProt. This database contains entries derived from genome sequencing projects and highlights information on the biological function of proteins. UniProt is run by a consortium comprising the EBI (European Bioinformatics Institute), PIR (Protein Information Resource) and SIB (Swiss Institute of Bioinformatics).
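To make the idea of sequences "encoded in digital form" more concrete, here is a minimal Python sketch. It assumes the widely used FASTA text layout (a header line starting with ">" followed by sequence lines), which this article does not itself describe. The `SequenceRecord` class, the `parse_fasta` function, the identifiers and the sequences are all made up for illustration and do not come from any real database.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SequenceRecord:
    """One stored sequence: an identifier, a free-text description, the residues."""
    identifier: str   # hypothetical ID, not a real accession number
    description: str
    sequence: str


def _build(header: str, chunks: List[str]) -> SequenceRecord:
    # The first word of the header is treated as the identifier,
    # the rest of the line as the description.
    identifier, _, description = header.partition(" ")
    return SequenceRecord(identifier, description, "".join(chunks))


def parse_fasta(text: str) -> List[SequenceRecord]:
    """Split FASTA-style text into records (a sketch, not a full parser)."""
    records: List[SequenceRecord] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append(_build(header, chunks))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_build(header, chunks))
    return records


# Made-up example data: one protein fragment and one DNA fragment.
EXAMPLE = """>demo_protein_1 made-up protein fragment
MKTAYIAKQR
QISFVKSHFS
>demo_dna_1 made-up DNA fragment
ATGCGTACGT
"""

records = parse_fasta(EXAMPLE)
for rec in records:
    print(rec.identifier, len(rec.sequence), rec.description)
```

Real databases attach far more annotation to each entry than this sketch does; the point is only that a sequence record boils down to an identifier plus a string of letters that software can search and compare.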

A few other Protein Sequence Databases are:

  • TIGR – This database is enriched with curated information on DNA and protein sequences, protein families, taxonomic data for microbes, humans and plants, cellular structure and gene expression.
  • DOMO – This database holds well-researched, screened information on homologous protein domain families.
  • PROSITE – This is a database of protein families, focusing primarily on biological sites, patterns and profiles of domains.

The EMBL Nucleotide Sequence Database is also worth a mention. It comprises DNA and RNA sequences submitted directly by individual researchers. This database also holds records from genome sequencing groups.

All major sequence databases in biology are operated using advanced software and are updated on a frequent basis.

Here’s the list of major sequence databases.


DDBJ (DNA Data Bank of Japan) is the sole DNA data bank in Japan that is officially certified to collect DNA sequences from researchers and to issue internationally recognized accession numbers to data submitters. It collects data mainly from Japanese researchers, but also accepts data from, and issues accession numbers to, researchers in any other country.


The EMBL Nucleotide Sequence Database constitutes Europe’s primary nucleotide sequence resource. The main sources of its DNA and RNA sequences are direct submissions from individual researchers, genome sequencing projects and patent applications.


The Ensembl database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files.


GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. There are approximately 22,617,000,000 bases in 18,197,000 sequence records as of August 2002.
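As a quick sense of scale, the two GenBank figures above can be turned into an average record length. The short Python sketch below uses only the numbers quoted in this article (August 2002); the variable names are illustrative.

```python
# Figures quoted above for GenBank as of August 2002.
total_bases = 22_617_000_000
total_records = 18_197_000

# Average number of bases per sequence record.
average_bases_per_record = total_bases / total_records

print(f"{average_bases_per_record:.0f} bases per record on average")
```

That works out to roughly 1,243 bases per record. It is only an average: individual records range from short fragments to very long sequences.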