Prosite “is a database of protein families and domains. It is based on the observation that, while there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.”
Blocks “are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. Block Searcher, Get Blocks and Block Maker are aids to detection and verification of protein sequence homology. They compare a protein or DNA sequence to a database of protein blocks (current version), retrieve blocks, and create new blocks, respectively.”
Prints “is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family; its diagnostic power is refined by iterative scanning of a SWISS-PROT/TrEMBL composite. Usually the motifs do not overlap, but are separated along a sequence, though they may be contiguous in 3D-space. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single motifs, full diagnostic potency deriving from the mutual context provided by motif neighbours.”
Pfam “is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families.”
BLAST
The original paper describing BLAST, which “provides a method for rapid searching of nucleotide and protein databases. Since the BLAST algorithm detects local as well as global alignments, regions of similarity embedded in otherwise unrelated proteins can be detected.” BLAST can be used to search many different databases.
ToxoDB: An example of a specialized sequence database. The site shows how to search it with Perl regular expressions, and it uses numeric codes to represent classes of amino acids. For example, 6 matches any hydrophobic amino acid.
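The numeric-code idea can be sketched in a few lines. This is a minimal illustration in Python rather than Perl, and not ToxoDB's own implementation: only the code 6 comes from the description above, while the residue set assigned to it and the names CLASS_CODES, expand_codes and find_motif are assumptions made for the example.

```python
import re

# Only the code "6" (hydrophobic) comes from the text above.
# The residue set chosen for it here is an assumption, not ToxoDB's table.
CLASS_CODES = {
    "6": "[AVILMFWYC]",
}

def expand_codes(pattern):
    """Replace each numeric class code with a regex character class.

    Simplification: every listed digit is treated as a class code, so
    patterns using digits inside quantifiers such as {6} are not supported.
    """
    return "".join(CLASS_CODES.get(ch, ch) for ch in pattern)

def find_motif(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(expand_codes(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]

# "K6E" means: lysine, any hydrophobic residue, glutamate.
hits = find_motif("K6E", "MKLEAKVEKDE")
print(hits)
```

Expanding the codes before compiling keeps the search itself an ordinary regular-expression match, so the same pattern string could be reused with any regex engine.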
Prosite database of significant motifs and patterns
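Prosite patterns are written in their own compact syntax, which maps fairly directly onto regular expressions. The sketch below shows one way to do that translation in Python, assuming the commonly documented conventions (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, "(n)" or "(n,m)" for repetition, "<" and ">" for sequence ends). The function name prosite_to_regex is hypothetical, and rarer forms such as ">" inside square brackets are not handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a Prosite-style pattern into a Python regex string.

    A simplified sketch: covers x, [..], {..}, (n), (n,m), < and >.
    """
    pattern = pattern.rstrip(".")            # patterns often end with a period
    parts = []
    for element in pattern.split("-"):
        out = element
        # Excluded residues: {P} becomes [^P]. Done before repetition,
        # so the braces introduced below are not rewritten.
        out = out.replace("{", "[^").replace("}", "]")
        # Repetition: x(2,4) becomes x{2,4}.
        out = out.replace("(", "{").replace(")", "}")
        # Any residue: x becomes "." (residue letters are uppercase).
        out = out.replace("x", ".")
        # Anchors for the N- and C-terminal ends of the sequence.
        out = out.replace("<", "^").replace(">", "$")
        parts.append(out)
    return "".join(parts)

# The N-glycosylation site pattern: N, not P, S or T, not P.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
match = re.search(regex, "AANGSAA")
print(regex, match.start() if match else None)
```

Note that re.search and re.finditer report non-overlapping matches only, so overlapping occurrences of a motif would need a lookahead-based search instead.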