Step 2: Navigating the NMPDR protein page

Now we want to explore the NMPDR environment for one target protein, so let's all get on the same page. Open the Listeria organism summary page in a new window or tab, using the Listeria link in the left navigation bar. From that page, type "folK" in the text box and click search. The results returned are limited to strains of Listeria monocytogenes. Click on the NMPDR button in the row corresponding to strain 1/2a F6854.

Genomic context

Context graphic

The NMPDR protein page lists the organism name, with a link to the NCBI Taxonomy ID, and the current functional assignment for the selected protein. Convenient buttons allow immediate display of the protein sequence, the DNA sequence, and the DNA sequence including 500 nucleotides of flanking sequence on either side. Sequences are provided in FASTA format for easy copy and paste into other programs. Each page shows a graphical display of the genomic context of the protein-encoding gene of interest (the "focus peg"), highlighted in green and centered in a 16-kbp region. Neighboring genes whose proximity to the focus peg is conserved in at least four other species are drawn in blue, while neighboring genes whose relative proximity is not conserved to that extent are shown in red. The functional coupling scores and identities of the blue genes will be shown in a table after clicking the button to show functional coupling. Clicking on any gene arrow will open the context page for that feature.
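To make the "500 nucleotides of flanking sequence" option concrete, here is a minimal sketch of how such a region could be cut from a contig and written in FASTA format. It is not NMPDR's own code: the function names, the 1-based forward-strand coordinates, and the 60-character line width are all assumptions made for illustration.

```python
def flanked_region(contig, start, stop, flank=500):
    """Return the gene plus up to `flank` nucleotides on either side.

    Assumes `start` and `stop` are 1-based, inclusive, forward-strand
    coordinates with start <= stop (a simplification for this sketch).
    """
    left = max(start - 1 - flank, 0)         # clamp at the left end of the contig
    right = min(stop + flank, len(contig))   # clamp at the right end of the contig
    return contig[left:right]


def to_fasta(identifier, sequence, width=60):
    """Format a sequence as a FASTA record: a '>' header line, then wrapped sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + identifier + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up contig: a 300-nt "gene" of C's between runs of A's and G's.
    contig = "A" * 1000 + "C" * 300 + "G" * 1000
    print(to_fasta("hypothetical_peg_with_flanks", flanked_region(contig, 1001, 1300)))
```

A gene close to the end of a contig simply gets a shorter flank on that side, which is why the coordinates are clamped rather than padded.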

Compare regions

The displayed region may be compared with corresponding regions in other genomes using "Compare Regions." The size of the compared region and the number of compared genomes are adjustable.  Navigation buttons allow you to scroll upstream and downstream, effectively providing a means for "chromosome walking." Each set of homologous genes is displayed in a common color and assigned a common number.  These are ordered by number of displayed genomes in which similar genes occur.  Non-homologous proteins are shown in gray.  Mousing over the arrows will display information for that gene, and clicking "show" on any arrow will open that protein page with the compare regions display refocused on that gene. The Commentary button will open a new window that displays an information table for each set of homologous genes on the graphic.

Find Orthologs

To view a table of homologous sequences in other genomes, click on the button "Bidirectional Best Hits." The function returns a table of homologous proteins in other organisms, precomputed using BLASTP and ranked by similarity. These are presumed to be orthologs because they are reciprocal best hits between two proteins in two genomes. Homologous proteins within one genome, or paralogs, cannot be identified by this method and are not listed in this table. Selected sequences from this list may be aligned with ClustalW by clicking the "align" button. A new window will open that displays the alignment, with the phylogenetic relationships of the aligned sequences shown in a neighbor-joining tree.
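The reciprocal-best-hit idea can be sketched in a few lines. This is an illustration of the general technique, not the precomputation NMPDR actually runs: the input layout (a dictionary of similarity scores keyed by query and subject), the function names, and the protein identifiers are assumptions.

```python
def best_hits(scores):
    """Map each query protein to its single highest-scoring subject.

    `scores` is {(query_id, subject_id): similarity_score}, for example
    bit scores from a BLASTP run of one genome against another.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores between proteins of genome A and genome B.
    a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90, ("a3", "b1"): 80}
    b_vs_a = {("b1", "a1"): 198, ("b2", "a1"): 140, ("b2", "a2"): 95}
    print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In the sample data, a2's best hit is b2, but b2's best hit is a1, so that pair is not reciprocal and is left out. Because each protein keeps only one best hit per genome, a second copy within the same genome (a paralog) never appears in the result.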

Context table

Next, a table of information describes the context of this "peg," or protein-encoding gene, on the sequenced "contig," or contiguous stretch of sequenced DNA. The peg that is the focus of our interest is highlighted in green. Rows in the table above and below the highlighted peg describe pegs that appear up- and downstream from the peg of interest. Columns in the NMPDR context table list feature id numbers and nucleotide coordinates for the start and stop of each gene, along with the length of each gene, its strand (plus or minus), and the size of the gap or overlap between neighbors. The functional role of each gene as annotated is listed next, followed by aliases: names or numbers assigned to the gene by other sources, with links to external resources such as UniProt, GenBank, and KEGG.
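The length, strand, and gap columns can all be derived from the start and stop coordinates. The sketch below shows one way to do that; it is not the site's implementation. It assumes that a minus-strand gene is recorded with a start coordinate larger than its stop coordinate, and the class, function, and feature names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peg:
    feature_id: str
    start: int  # coordinate of the first base of the gene
    stop: int   # coordinate of the last base; stop < start is taken to mean minus strand


def context_rows(pegs):
    """Build one row per peg, ordered along the contig, with derived columns."""
    rows = []
    for peg in pegs:
        left, right = sorted((peg.start, peg.stop))
        rows.append({
            "id": peg.feature_id,
            "left": left,
            "right": right,
            "length": right - left + 1,
            "strand": "+" if peg.start <= peg.stop else "-",
        })
    rows.sort(key=lambda row: row["left"])

    previous_right = None
    for row in rows:
        # Positive gap: bases between neighbors. Negative gap: the genes overlap.
        row["gap"] = None if previous_right is None else row["left"] - previous_right - 1
        previous_right = row["right"]
    return rows


if __name__ == "__main__":
    # Made-up coordinates for three neighboring pegs.
    for row in context_rows([Peg("peg.1", 100, 399), Peg("peg.2", 450, 749), Peg("peg.3", 1000, 740)]):
        print(row)
```

The gap is measured against the immediately preceding row only, which is enough for simple neighbors but would need refinement for a gene nested entirely inside another.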

We will use tools linked in the table to explore functional clustering in a moment, but first please continue to scroll down the page.

Functional context

If the functional role assigned to the focus peg is included in a functional subsystem, that annotation will appear in a table, linked to a page that will display the functional roles in that subsystem as well as a populated subsystem spreadsheet. The "Protein Families" function will show a table listing the protein families of which the focus peg is a member. These are not restricted to NMPDR- and SEED-curated families, but include families in other databases such as TIGRfam, Pfam, and KEGG.

Annotation details

Details of the functional annotation of the focus peg are presented on the lower half of the page. Gene annotation is an ongoing process, so viewing annotation details can greatly aid in discerning functional roles. You may view the history of the annotation or choose "View All Related Annotations" to display a list of the annotation details for proteins in other organisms that are homologous to the focus peg, providing a test for the consistency of annotations. "Assignments for Essentially Identical Proteins" shows a table of the functional assignments for the focus peg in the NMPDR/SEED and other databases, such as UniProt, KEGG, and SwissProt. The assigned functions should be the same in all cases because the amino acid sequences are the same, with any variation limited to the location of the start site. This provides another consistency check.

If certain features such as evidence codes that support the functional assignment are included in the annotation of the focus protein, entries will appear in the attributes table under the headers "key" and "value." We are in the process of adding attributes such as microarray data and structural classification as "key-value pairs" to our annotations.

Sequence analysis

The sequence of the focus protein may be analyzed with a number of external tools, linked toward the bottom of the page. There is no need to cut and paste; the sequence is automatically submitted to the chosen tool.

Comparative Analysis Tools

Please scroll up to the context table. There are three columns in the table that are linked to the results of precomputed comparative analyses.

Functional Clusters—CL

The easiest way to determine whether your gene of interest is clustered with other genes is to look at the graphic of the gene's context above the table on the protein page. Blue genes are located within 8 kb up- or downstream of the green gene of focus in at least four other species. Conserved clustering implies related functions. The context of the focus peg may be preserved in other organisms in clusters made up of more or fewer other genes. Whether or not the gene of interest appears to be functionally clustered in its genome, homologs of this gene may occur in clusters in other genomes. These are found by clicking on the CL link in the column headed "find best clusters."

Notice that the Streptococci and Vibrio genomes contain larger clusters. The other three Listeria strains are listed as having 10-gene clusters that include our gene, folK. The strain of Listeria that we started with, 1/2a F6854, also likely has a 10-gene cluster, though only 8 are pictured within the 16 kb shown on the protein page. Strain 1/2a F6854 is not listed on this page because it is showing only homolog-containing clusters in other genomes.

Explore the homologs in clusters to locate the corresponding gene in Bacillus subtilis subsp. subtilis str. 168 or in Escherichia coli K12. Try opening the homologous protein page from another organism (in a new window is best) and compare the context of the homologs. These clusters probably contain different members, so how do you know if the proximity is meaningful? Proximity is most likely to have a functional basis when the clustering is observed with high frequency and across a wide variety of organisms. This has been computed as a functional clustering score, fc-sc.

Homologous pairs—fc-sc

Return to the NMPDR protein page for folK from L.mo. 1/2a F6854. In the column labeled "fc-sc" are found linked numbers in the rows for neighboring proteins. The numbers are computed scores that take into account the number of genomes in which the two genes are neighbors, and the phylogenetic distance between the genomes. The functions of the two linked genes are more likely to be related when they are clustered in a large number of organisms distributed over a wide phylogenetic space, which is represented as a high functional clustering score. The score is approximately equal to the number of different species (not strains) in which the two genes are co-localized. Click on the link for the highest score. This will return a table listing pairs of homologs with links to their NMPDR protein pages.
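As a rough illustration of the statement that the score approximates the number of different species (not strains) in which the two genes are co-localized, the sketch below counts distinct species in a list of genomes. It deliberately ignores the phylogenetic-distance component of the real score, so it is only an approximation of the idea and not the fc-sc computation itself. The function name and the placeholder genome names are assumptions.

```python
def approximate_clustering_score(colocalized_genomes):
    """Count distinct species among genomes where the two genes are neighbors.

    `colocalized_genomes` is a list of (species, strain) tuples. Several
    strains of one species contribute only once to the count.
    """
    return len({species for species, _strain in colocalized_genomes})


if __name__ == "__main__":
    # Placeholder genomes: two strains of species A, one each of B and C.
    genomes = [
        ("species A", "strain 1"),
        ("species A", "strain 2"),
        ("species B", "strain 1"),
        ("species C", "strain 1"),
    ]
    print(approximate_clustering_score(genomes))
```

Collapsing strains into species keeps a heavily sequenced species from inflating the score simply because many of its strains are in the database.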

Chromosomal clusters—Pins

Graphic display

Return to the NMPDR protein page for HPPK (the enzyme encoded by folK) from L.mo. 1/2a F6854 and click on the Pins button. A separate window will open with a graphic display of different versions of the clusters as they occur in other genomes. These genomes are arranged on the page phylogenetically. The red genes, labeled 1, are aligned in the center of the page. These are homologs of the target gene, folK in this case. All of the genes within about 8 kb of this central "pin" are shown. Genes that share the same numerical label are homologs and are depicted in the same color. The identity of the gene and genome are revealed upon mouse-over. Another way to see the identities of the pictured genes is to click on the "commentary" button toward the bottom of the page. Before you do that, let's examine the graphic display of pinned regions.

This example is particularly interesting because there seem to be two very small genes labeled "1" in the genomes displayed at the top of the page. One of the red arrows is offset below the line because the DNA sequences of the two red genes overlap. Scroll down the page slowly until you reach the Streptococcus pneumoniae (Str.pne) clusters. Compare the pneumo clusters with those from pyogenes. Gene fusions and rearrangements are observed within the Strep species. The context of the Strep genes is different from that of the Listeria genes, while the Listeria context resembles that of Staph. The context of the Listerial and Staphylococcal HPPK region is similar to the region in Bacillus species, except for an insertion of three genes (11, 12, 13) involved in biosynthesis of both folate and tryptophan. Can you find gene fusions in any more organisms?

Tabular display

Click on the Commentary button toward the bottom of the page. A new window will open that displays an information table for each set of homologous genes on the graphic. The first table lists our target enzyme (HPPK, EC 2.7.6.3) as one occurrence of gene 1 in each organism, and dihydroneopterin aldolase (DHNA, EC 4.1.2.25) as the second occurrence of gene 1. The two different enzyme activities are shaded different colors in the functions column, and the fusion of the two activities is shaded in a third color. UniProt id numbers are links to a description of the gene, with further links to other resources, which may include metabolic maps at KEGG.

Scroll down the page until you reach the first orange entry, which is the functional fusion in strain R6 of Strep pneumoniae. Click on the UniProt link and explore the information table. You should find that the fused functions, EC 2.7.6.3 and EC 4.1.2.25, catalyze consecutive steps in the folate biosynthetic pathway, with the start of the protein responsible for step 4 and the end of the protein responsible for step 5.

Redraw with selected genomes

Scrolling further down the commentary page will reveal another table, entitled "Keep Just Checked," that lists the organisms in alphabetical order, decodes the abbreviations of the organisms, and lists which of the genes in the display are contained in each organism. Each organism in the display corresponds to a row in this table, and each has a check box. You may choose a subset of organisms in this list and redraw the pins display to show just the checked organisms by clicking the button "Picked Maps Only" at the very bottom of the page.