SigGenes
From NMPDR Wiki
Use the signature genes tool to find genes that are common to one set of genomes, or that differentiate one set of genomes from another.
Like most searches in the NMPDR, this is a search for genes. Specifically, it is a search for genes in a single genome, known as the reference genome. The reference genome is compared to two sets of genomes: the inclusion genomes and the exclusion genomes. For each gene in the reference genome, the search tries to determine whether it is common in the inclusion set and uncommon in the exclusion set, as shown in the diagram. If the reference gene passes the test, it is displayed in the search results.
If the exclusion set is empty, the search returns genes that are common to the inclusion set (on the grounds that every gene is uncommon in an empty set). In both cases, the reference genome is considered part of the inclusion set.
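The test described above can be sketched in Python as follows. This is an illustration, not NMPDR's implementation: the `hits` mapping (reference gene ID to the set of genome IDs in which that gene has a hit), the function name `signature_genes`, and the rule that a gene is "uncommon" in the exclusion set when at most 1 - commonality of those genomes contain a hit are all assumptions. It models only the simple percentage calculation, using the Commonality ratio (default 0.80), and not the statistical algorithm.

```python
from fractions import Fraction
from typing import Dict, Iterable, List, Set


def signature_genes(
    reference: str,
    ref_genes: Iterable[str],
    inclusion: Iterable[str],
    exclusion: Iterable[str],
    hits: Dict[str, Set[str]],
    commonality: float = 0.8,
) -> List[str]:
    """Hypothetical sketch of the signature-genes test (simple percentage mode).

    hits maps each reference gene ID to the genome IDs in which it has a hit
    (an assumed input format, not an NMPDR data structure).
    """
    # Exact rational arithmetic avoids float rounding at the threshold.
    c = Fraction(commonality).limit_denominator(1000)
    inc = set(inclusion) | {reference}  # reference counts as an inclusion genome
    exc = set(exclusion) - inc          # the two sets never overlap
    selected: List[str] = []
    for gene in ref_genes:
        # A reference gene trivially "hits" its own genome.
        found = hits.get(gene, set()) | {reference}
        if Fraction(len(found & inc), len(inc)) < c:
            continue  # not common in the inclusion set
        # ASSUMPTION: "uncommon" means hits in at most (1 - commonality) of
        # the exclusion genomes. An empty exclusion set rejects nothing.
        if exc and Fraction(len(found & exc), len(exc)) > 1 - c:
            continue
        selected.append(gene)
    return selected
```

For example, with a reference genome plus four inclusion genomes, a gene that hits four of those five genomes (80%) still counts as common at the default threshold, but it is rejected if it also hits one of two exclusion genomes (50%).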
To begin the search, select the reference genome from the drop-down box at the top of the form. The inclusion and exclusion sets are specified using standard genome controls. When you select a genome for one set, it will automatically be deselected from the other set.
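The automatic deselection keeps the inclusion and exclusion sets disjoint. A minimal sketch of that behavior, using a hypothetical `GenomeSelection` class that is not part of the NMPDR interface:

```python
from typing import Set


class GenomeSelection:
    """Hypothetical model of the inclusion and exclusion genome controls."""

    def __init__(self) -> None:
        self.inclusion: Set[str] = set()
        self.exclusion: Set[str] = set()

    def include(self, genome: str) -> None:
        # Selecting a genome for one set deselects it from the other.
        self.exclusion.discard(genome)
        self.inclusion.add(genome)

    def exclude(self, genome: str) -> None:
        self.inclusion.discard(genome)
        self.exclusion.add(genome)
```

Because every selection goes through `include` or `exclude`, the two sets can never share a genome.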
You have the option of filtering the genes in the reference genome before they are examined, using the options below.
| Field | Control | Description |
| --- | --- | --- |
| Search Words | Keyword Box | Enter one or more keywords to limit the set of genes examined in the reference genome. |
| Subsystem | Subsystem Filter | Select a subsystem to limit the set of genes from the reference genome to those in that subsystem. |
| Options | Gene Display Options | Specify how the results are to be displayed or sorted. |
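The keyword and subsystem filters narrow the reference genes before any comparison runs. The sketch below assumes, for illustration only, that every keyword must appear (case-insensitively) in the gene's functional description and that each gene carries a set of subsystem names; the `Gene` record and `prefilter` function are hypothetical, and NMPDR's actual keyword matching may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Set


@dataclass
class Gene:
    """Hypothetical gene record: ID, functional description, subsystems."""
    gene_id: str
    function: str
    subsystems: Set[str] = field(default_factory=set)


def prefilter(
    genes: Sequence[Gene],
    keywords: Optional[Sequence[str]] = None,
    subsystem: Optional[str] = None,
) -> List[Gene]:
    """Keep genes matching all keywords (ASSUMED AND semantics) and the subsystem."""
    words = [w.lower() for w in (keywords or [])]
    kept: List[Gene] = []
    for gene in genes:
        text = gene.function.lower()
        if words and not all(w in text for w in words):
            continue  # at least one keyword is missing
        if subsystem is not None and subsystem not in gene.subsystems:
            continue  # gene is not in the chosen subsystem
        kept.append(gene)
    return kept
```

With no keywords and no subsystem, every reference gene is passed on to the comparison step unchanged.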
You can also modify the algorithm used to determine whether a gene is common or uncommon.
| Option | Description |
| --- | --- |
| Commonality | When searching for genes in common, this is the minimum fraction of genomes in the set in which a gene must have a hit. The default value of 0.80 represents 80%: in a set of 10 genomes, a gene is considered common if it has hits in at least 8 genomes from the set. This value is ignored if there is an exclusion set and Use Statistical Algorithm is checked. |
| Use Statistical Algorithm | If you have an exclusion set (that is, if you are looking for genes that differentiate between two sets) and this box is checked, a second-order statistical computation is used to determine whether a gene differentiates. If this box is not checked, a simple percentage calculation is used. |
| Use Similarities | By default, bidirectional best hits are used to determine whether a gene in the reference genome has a hit in a given inclusion or exclusion genome. If this box is checked, similarities are used instead. Because the similarity set is larger, the search will be slower. |
| Show Matching Genes | If this box is checked, a list of the genes matching the reference gene is shown. This is useful if you want to know why a particular reference gene is considered common. |
| Cutoff | The similarity score used as a cutoff when computing bidirectional best hits or similarities. A lower number means that fewer hits will be found when processing a reference gene. |
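Bidirectional best hits pair a gene a in one genome with a gene b in another when b is a's best-scoring hit and a is b's best-scoring hit. The Python sketch below shows that idea combined with a score cutoff. It assumes the cutoff behaves like a BLAST e-value (lower is stricter, so a hit passes when its score is at or below the cutoff), which agrees with the statement that a lower number finds fewer hits; the `Hit` tuple layout and the function names are hypothetical, not NMPDR code.

```python
from typing import Dict, Iterable, Set, Tuple

# One hit: (query gene, subject gene, score). ASSUMPTION: lower scores are
# better, as with BLAST e-values, so a hit passes when score <= cutoff.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit], cutoff: float) -> Dict[str, str]:
    """Return each query gene's best-scoring subject among hits passing the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if score > cutoff:
            continue  # fails the cutoff
        if query not in best or score < best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit], cutoff: float
) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, cutoff)
    reverse = best_hits(b_vs_a, cutoff)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}
```

Because lowering `cutoff` only discards the weakest-scoring hits, a smaller value can only shrink the set of bidirectional best hits in this sketch.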

