Gene Set Enrichment Analysis (GSEA) - PRO Feature

Blast2GO includes GSEA, a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states. GSEA considers experiments with genome-wide expression profiles from samples belonging to two classes, labeled 1 or 2. Genes are ranked into a list L based on the correlation between their expression and the class distinction, using any suitable metric. Given an a priori defined set of genes S (e.g., genes encoding products in a metabolic pathway, located in the same cytogenetic band, or sharing the same GO category), the goal of GSEA is to determine whether the members of S are randomly distributed throughout L or primarily found at the top or bottom.
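
To make the ranking idea concrete, the following is a minimal Python sketch of the weighted running-sum enrichment score described in the GSEA publication (Subramanian et al., 2005). It is an illustration only and is not taken from Blast2GO; the function name enrichment_score, the toy gene IDs, and the default weighting exponent p=1.0 are assumptions of this example.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked   -- list of (gene_id, score) pairs, sorted by score, descending
    gene_set -- set of gene IDs (the set S)
    p        -- weighting exponent; p=0 gives the unweighted statistic
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    weights = [abs(score) ** p for _, score in ranked]
    hit_norm = sum(w for w, h in zip(weights, hits) if h)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a zero score")

    running = 0.0
    best = 0.0
    for w, h in zip(weights, hits):
        # Step up at a member of S, step down at any other gene.
        running += w / hit_norm if h else -1.0 / n_miss
        # Keep the largest deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list L and gene sets, for illustration only.
ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0),
          ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
top_set = {"g1", "g2"}      # members cluster at the top of L
bottom_set = {"g5", "g6"}   # members cluster at the bottom of L
print(enrichment_score(ranked, top_set))
print(enrichment_score(ranked, bottom_set))
```

A set whose members cluster at the top of L yields a positive score, and a set whose members cluster at the bottom yields a negative one.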

For further details please refer to the GSEA publication: Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., and Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43):15545–15550.

For this analysis, the sequences involved (though not necessarily only those), together with their annotations, must be loaded in the application. The annotations can either be the result of a Blast2GO annotation or be imported from an annotation file (.annot); see the Gene Ontology Annotation section of this manual.

This functionality can be found under Analysis → Gene Set Enrichment Analysis (GSEA). A dialog appears (see image below). The ranked list of genes can be selected by loading a text file or an ID-Value-List (.b2g) file that contains the sequence IDs and a statistical value for each one. A detailed description of each parameter is available by clicking the help icon next to the parameter.

Figure 1: GSEA Dialog
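
As an illustration of preparing such a ranked list outside Blast2GO, the Python sketch below writes one sequence ID and one statistical value per line. The exact layout that Blast2GO expects is not specified on this page, so the tab separator, the absence of a header line, the descending sort order, and the names format_ranked_list and write_ranked_list are all assumptions of this example.

```python
def format_ranked_list(values):
    """Return the file text for a dict mapping sequence ID -> statistic.

    Assumed layout (not confirmed by this manual): one "ID<TAB>value"
    pair per line, sorted by value in descending order, no header.
    """
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return "".join(f"{seq_id}\t{stat}\n" for seq_id, stat in ordered)


def write_ranked_list(path, values):
    """Write the ranked list to a plain text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_ranked_list(values))


# Hypothetical sequence IDs and statistics, for illustration only.
example = {"seqA": -1.5, "seqB": 2.0, "seqC": 0.25}
print(format_ranked_list(example), end="")
```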

Click on the Run button to start the analysis. It may take a while depending on the number of permutations selected.


Once the analysis has completed, the results table is shown in a new tab (see image below), where the adjusted p-value of each annotation meeting the given threshold is shown. The main columns are:

ES: Reflects the degree to which a gene set is overrepresented at the top or bottom of a ranked list of genes.
NES: By normalizing the enrichment score, GSEA accounts for differences in gene set size and in correlations between gene sets and the expression dataset.
FDR: The estimated probability that a gene set with a given NES represents a false positive finding.
Nominal p-value: Estimates the statistical significance of the enrichment score for a single gene set.
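
The relationship between ES, NES, and the nominal p-value can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Blast2GO's implementation: the null distribution is built from random gene sets of the same size (as in a pre-ranked analysis), all scores are assumed to be nonzero, the FDR across gene sets is not computed, and the names running_sum_es and permutation_stats are hypothetical.

```python
import random
from statistics import mean


def running_sum_es(scores, hits, p=1.0):
    """Weighted running-sum ES; scores are sorted in descending order."""
    n_miss = len(hits) - sum(hits)
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / hit_norm if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_stats(ranked, gene_set, n_perm=1000, seed=0):
    """Return (ES, NES, nominal p-value) for one gene set.

    Assumes the set contains some, but not all, of the ranked genes.
    """
    rng = random.Random(seed)
    genes = [g for g, _ in ranked]
    scores = [s for _, s in ranked]
    hits = [g in gene_set for g in genes]
    observed = running_sum_es(scores, hits)

    # Null distribution: ES of random gene sets of the same size.
    k = sum(hits)
    null = []
    for _ in range(n_perm):
        chosen = set(rng.sample(range(len(genes)), k))
        null.append(running_sum_es(scores, [i in chosen for i in range(len(genes))]))

    # Use only the null scores that share the sign of the observed ES.
    same_sign = [e for e in null if (e >= 0) == (observed >= 0)]
    if not same_sign:
        return observed, None, None
    p_value = sum(abs(e) >= abs(observed) for e in same_sign) / len(same_sign)
    nes = observed / mean(abs(e) for e in same_sign)
    return observed, nes, p_value
```

Because every permutation recomputes the enrichment score, the running time grows with the number of permutations selected.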

For further details please refer to the GSEA User Guide.

Figure 2: GSEA result table

Using the context menu of the rows tagged with the Details tag, it is possible to get more details about the GO term, including the enrichment statistics, and also to create an ID-List with the core enrichment sequences for each GO term.

Sidebar Options

The sidebar lists all actions that can be performed on this enrichment result, including options for the visual display of the results:

1. Make Enriched Graph: use this option to generate a representation on the GO DAG (see image below). Nodes are color-highlighted proportionally to their significance value. The user can choose which type of calculated p-value to use for highlighting and the threshold for filtering out nodes.

Figure 3: Enriched Graph

2. NES vs Significance Chart: this option generates a plot of p-values versus normalized enrichment scores, which provides a quick, visual way to grasp the number of enriched gene sets that are significant (see image below).

Figure 4: NES vs Significance Chart

3. ES Histogram Chart: this option generates a histogram of enrichment scores across gene sets, which provides a quick, visual way to grasp the number of enriched gene sets (see image below).

Figure 5: ES Histogram Chart

4. Reduce to Most Specific: use this option to remove more general GO terms from the results and keep only the most specific terms (those at the lowest level of the GO DAG).
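
The idea behind reducing a result to its most specific terms can be sketched in Python as shown below. This is one plausible reading of the option (drop every enriched term that is an ancestor of another enriched term); the rule actually applied by Blast2GO is not detailed here, and the term labels T1 to T4, the parents mapping, and the function names are hypothetical.

```python
def ancestors(term, parents):
    """Collect all ancestors of a term in a DAG given child -> parents."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def most_specific(enriched, parents):
    """Keep only enriched terms that are not ancestors of another enriched term."""
    general = set()
    for term in enriched:
        general |= ancestors(term, parents)
    return {term for term in enriched if term not in general}


# Toy DAG with placeholder labels: T3 -> T2 -> T1 and T4 -> T1.
parents = {"T3": ["T2"], "T2": ["T1"], "T4": ["T1"]}
enriched = {"T1", "T2", "T3", "T4"}
print(sorted(most_specific(enriched, parents)))
```

In this toy example T1 and T2 are dropped because more specific descendants of both are present in the result.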

Additionally, as with many other results in Blast2GO, it is possible to display the enrichment results in two different ways: the Treemap representation, which compares the most enriched GO terms by their size, and the WordCloud representation, which summarises the relevant GO terms visually.