
Domains And Motifs Section

Introduction

The Domains & Motifs annotation section contains a combination of computational domain/motif prediction and domain/motif database references. The Pfam and SMART hidden Markov model (HMM) databases are used for domain classification, and the PRINTS database is used for motif classification. A graphical domain map is presented, showing the regions of the sequence that match the domains and motifs. InterPro is used to group the matches in the domain map, and matches that are verified by UniProt references are displayed in bold.

When applicable, a table below the domain map summarizes InterPro matches that lack computational evidence. These matches come from InterPro references in UniProt records. They may lack computational evidence because the InterPro record consists only of databases that are not part of our computational scheme (e.g. PROSITE or ProDom), or because our reference sequence is a splice variant that does not contain the referenced domain.

The hmmpfam tool from the HMMER package is used to align the Molecule Page protein sequence to the HMMs from Pfam and SMART. In addition, the Pfam and SMART references in all related UniProt records are noted. We use an E-value cutoff of 0.1, although this cutoff is relaxed for a few special cases.
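
The cutoff step described above can be sketched as follows. This is only an illustration, not the pipeline's actual code: the DomainHit record, its field names, the example identifiers, and the choice of "less than or equal to" at the boundary are all assumptions, and the special-case relaxations are not modeled.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 0.1  # cutoff stated in the text; special cases are not modeled


@dataclass
class DomainHit:
    """Hypothetical record for one parsed hmmpfam hit."""
    model_id: str   # placeholder identifier, not a real Pfam/SMART accession
    start: int      # first matching residue in the protein sequence
    end: int        # last matching residue in the protein sequence
    score: float
    e_value: float


def filter_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits whose E-value is at or below the cutoff (assumed inclusive)."""
    return [h for h in hits if h.e_value <= cutoff]


hits = [
    DomainHit("EXAMPLE_A", 10, 120, 85.2, 1e-20),
    DomainHit("EXAMPLE_B", 200, 260, 3.1, 0.5),
]
kept = filter_hits(hits)
```

Here only the first hit would be kept, since the E-value of the second (0.5) exceeds the cutoff.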

The FingerPRINTScan tool provided with PRINTS is used to align the Molecule Page protein sequence to profiles derived from the PRINTS database. In addition, the PRINTS references in all related UniProt records are noted. An E-value cutoff of 0.1 is used for this tool. Any records reported as significant matches, as well as any records that match all the motifs in the fingerprint, are reported.
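
The reporting rule in the last sentence can be read as a simple "either/or" test. The sketch below assumes that reading; the function and parameter names are hypothetical and do not come from FingerPRINTScan itself.

```python
def should_report(is_significant, motifs_matched, motifs_in_fingerprint):
    """Report a PRINTS record if it is flagged as significant,
    or if every motif in the fingerprint matched the sequence."""
    all_motifs_matched = motifs_matched == motifs_in_fingerprint
    return is_significant or all_motifs_matched


# A record with all 5 of 5 motifs matched is reported even if not flagged significant.
full_match = should_report(False, 5, 5)
# A partial, non-significant match is not reported.
partial_match = should_report(False, 3, 5)
```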

Result Fields and Links

Tool Output

This hyperlink leads to the raw text output generated by the listed computational scheme: either Pfam and SMART with hmmpfam, or PRINTS with fpscan (FingerPRINTScan). The output is provided to give users a rough visualization of the alignment that was generated.

NCBI Conserved Domain Summary

This hyperlink is to RPS-BLAST results at NCBI for the sequence representing the Molecule Page protein. This gives the user a visualization of a computational domain prediction.

Database ID

This is the unique identifier of the domain/motif record that is matched, with the short one-word description of the record. The identifier is hyperlinked to the database record at the Pfam, SMART, or PRINTS web site.

Description

The title of the record in its source database (these are generally more informative than the one-word titles).

Referenced by Molecule Page Protein

UniProt references are used to classify the family, domain, or motif match. A value of "Yes" means that a reference exists for a UniProt sequence matching the Molecule Page sequence. A value of "By known variant" means that the reference was found for a sequence that differs from the Molecule Page sequence but belongs to the same gene group (i.e. the same Entrez Gene group).
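
The two values described above can be pictured as a lookup against two sets of accessions. The sketch below is an assumption about how such a classification might be organized; the function name, the accession sets, and the None result for unreferenced records are all hypothetical.

```python
def classify_reference(ref_accessions, page_accessions, gene_group_accessions):
    """Return the 'Referenced by Molecule Page Protein' value, or None.

    ref_accessions: UniProt accessions that reference the domain/motif record
    page_accessions: accessions whose sequence matches the Molecule Page sequence
    gene_group_accessions: other accessions in the same Entrez Gene group
    """
    refs = set(ref_accessions)
    if refs & set(page_accessions):
        return "Yes"
    if refs & set(gene_group_accessions):
        return "By known variant"
    return None  # assumption: no value is shown when no reference is found


direct = classify_reference({"ACC_1"}, {"ACC_1"}, {"ACC_2"})
variant = classify_reference({"ACC_2"}, {"ACC_1"}, {"ACC_2"})
```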

Num Matches

For Pfam and SMART, this is the number of times the model domain matches the Molecule Page protein sequence. For PRINTS, this is the number of motifs in the model signature that match the Molecule Page protein sequence. This value can be zero when there is a database reference that cannot be validated with the computational procedure.

Score (Pfam and SMART) or PF Score (PRINTS)

This is a numerical measure of how well the Molecule Page protein sequence aligns to the Pfam or SMART model (Score) or to the PRINTS fingerprint (PF Score). A larger score means a better alignment. Each individual match has a score, and Pfam and SMART records also have a total score, which is roughly the sum of the individual match scores.
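
As a rough illustration of the relationship between individual and total scores, the sketch below simply adds the per-match scores. The total reported by the tool is only approximately this sum, so the function name and the example scores are illustrative assumptions.

```python
def approximate_total_score(match_scores):
    """Approximate a record's total score as the sum of its per-match scores."""
    return sum(match_scores)


# Two hypothetical matches of the same domain model within one sequence.
total = approximate_total_score([42.5, 37.0])
```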

E value

A statistical measure of the alignment: roughly, the number of matches with an equal or better score that would be expected to occur by chance. A smaller E-value means a better alignment.

P value and Pp value (PRINTS)

Another statistical measure of the alignment, unique to PRINTS. The Pp value is the product of all the P values from each motif match.
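
The Pp calculation described above is a plain product. A minimal sketch, with a hypothetical function name and made-up P values:

```python
import math


def pp_value(p_values):
    """Multiply the P values of all motif matches to get the Pp value."""
    if not p_values:
        raise ValueError("at least one motif P value is required")
    return math.prod(p_values)


# Three hypothetical motif matches with P values 1e-3, 2e-4 and 5e-2.
pp = pp_value([1e-3, 2e-4, 5e-2])
```

Because each P value is at most 1, the product can only shrink as more motifs match, so a fingerprint with many matched motifs tends to have a very small Pp value.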

Match Num (Pfam and SMART)

The order of the Pfam or SMART domain match in the Molecule Page protein sequence (e.g. a Match Num of 2 is the second match of the domain within the sequence).
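
One way to picture the numbering is to sort the matches of a single domain by their position in the sequence and count from 1. The sketch below assumes the order is determined by the start coordinate; the function name and coordinates are hypothetical.

```python
def assign_match_numbers(coords):
    """Map each (start, end) match of one domain to its Match Num,
    numbering from 1 in order of position along the sequence."""
    ordered = sorted(coords)  # tuples sort by start, then end
    return {span: number for number, span in enumerate(ordered, start=1)}


# Two hypothetical matches of the same domain, given out of order.
numbers = assign_match_numbers([(150, 210), (20, 80)])
```

With these inputs, the match at residues 20 to 80 would receive Match Num 1 and the match at 150 to 210 would receive Match Num 2.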

Motif Num (PRINTS)

This is the number of the specific motif in the PRINTS signature.

Sequence Coords

The region of the Molecule Page protein sequence that matches the domain or the motif.

Motif Length (PRINTS)

The length of the motif that matches the Molecule Page protein sequence.

Motif Sequence (PRINTS)

The amino acid sequence of the motif that matches the Molecule Page protein sequence.