 |
A ProfileGrid of 300 bacterial RecA homologs is shown below. The multiple sequence alignment (MSA) updates an earlier, smaller alignment (Roca & Cox 1997). The ProfileGrid is broken into four tiers of approximately 100 residues each. At the top of each tier is the template sequence, which in this case is the E. coli RecA protein (352 residues). The template sequence determines the position numbering of the ProfileGrid ruler, which also appears at the top of each tier.
The 21 rows under the template sequence give the frequency of each of the 20 amino acids and of the gap character at the corresponding column position in the MSA. To facilitate inspection at this size, the frequency values are not shown in each ProfileGrid cell. Instead, each cell is colored according to the bin into which its value falls: <10% (white), 10% to <25% (gray), 25% to <50% (yellow), 50% to <70% (orange), 70% to <90% (green), and >=90% (red). This is the default Frequency Colors Legend, but the number, size, and color of the bins are user-defined.
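The frequency and coloring scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not JProfileGrid's own code: the names `ROWS`, `BINS`, `column_frequencies`, and `color_for` are hypothetical, the alignment is assumed to be a list of equal-length strings with `-` as the gap character, and only the bin boundaries and colors are taken from the text.

```python
from collections import Counter

# Row order as described in the text: the 20 amino acid codes sorted
# alphabetically, with the gap character as the last (21st) row.
ROWS = list("ACDEFGHIKLMNPQRSTVWY") + ["-"]

# Default legend from the text, stored as (lower bound, color), highest first.
BINS = [(0.90, "red"), (0.70, "green"), (0.50, "orange"),
        (0.25, "yellow"), (0.10, "gray"), (0.00, "white")]


def column_frequencies(alignment):
    """Return one dict per alignment column mapping each row character to its frequency.

    Assumption: characters outside ROWS (for example X) get no row of their own
    but still count toward the number of sequences in the denominator.
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    grid = []
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in alignment)
        grid.append({ch: counts.get(ch, 0) / n_seqs for ch in ROWS})
    return grid


def color_for(freq):
    """Map a frequency in [0, 1] to the color of the first bin whose lower bound it meets."""
    for lower, color in BINS:
        if freq >= lower:
            return color
    return "white"


# Toy four-sequence alignment, for illustration only.
toy = ["ACD-", "ACE-", "AGD-", "TCDK"]
grid = column_frequencies(toy)
colors = [{ch: color_for(f) for ch, f in column.items()} for column in grid]
```

Checking the bins from the highest lower bound downward keeps them mutually exclusive without having to store an upper bound for each one.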
ProfileGrids have other useful features. First, in this example the 20 amino acid code characters are sorted alphabetically and the gap character is the last row. The rows can instead be sorted by other residue properties (such as volume) to allow a search for structural patterns. Second, the locations of conserved regions (called similarity boxes) are determined by similarity plot calculations. Here, the parameters used were a window size of 9 and the BLOSUM62 scoring matrix; a threshold value of 80% similarity marks the box endpoints. Third, the JProfileGrid graphical user interface allows one to identify the sequences in each ProfileGrid cell. Finally, JProfileGrid exports a spreadsheet file for final formatting to produce figures. For more details, see the JProfileGrid documentation.
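The similarity box idea can be illustrated with a short sketch. The text does not say how JProfileGrid derives per-column similarity from BLOSUM62, so the code below assumes the per-column scores are already available as fractions between 0 and 1, and it shows only a plausible windowing and thresholding step. The function names and the centered, end-truncated window are assumptions; the window size of 9 and the 80% threshold are taken from the text.

```python
def windowed_average(scores, window=9):
    """Centered moving average; the window is truncated at both ends of the alignment."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def similarity_boxes(scores, window=9, threshold=0.80):
    """Return (start, end) pairs, 1-based and inclusive, for each run of
    positions whose smoothed similarity is at or above the threshold."""
    smoothed = windowed_average(scores, window)
    boxes = []
    start = None
    for pos, value in enumerate(smoothed, start=1):
        if value >= threshold and start is None:
            start = pos                      # a box opens here
        elif value < threshold and start is not None:
            boxes.append((start, pos - 1))   # the box closed at the previous position
            start = None
    if start is not None:                    # a box that runs to the end of the alignment
        boxes.append((start, len(smoothed)))
    return boxes


# Hypothetical scores: 20 fully conserved columns followed by 20 unconserved ones.
example_scores = [1.0] * 20 + [0.0] * 20
example_boxes = similarity_boxes(example_scores)
```

Because of the smoothing, a box endpoint can fall a few positions before the last fully conserved column, which is the usual trade-off of a windowed similarity plot.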
In summary, a ProfileGrid concisely depicts all of the character information from an MSA. Other representations such as consensus sequences, Sequence Logos, and similarity plots only summarize alignment content.
|