Quantitative Analysis

As a Data Mining tool, Blast2GO provides various ways for the joint analysis of groups of annotated sequences.

Descriptive analysis. Combined Graph Function

Blast2GO generates combined graphs in which the annotations of a group of sequences are visualized together. This can be used to study the joint biological meaning of a set of sequences. Combined graphs are a good alternative to enrichment analysis when there is no reference set to consider or when the number of sequences involved is low. This function is available under the arrow next to the "Graph" icon. Figure 2 shows the Combined Graph Drawing Configuration Dialog, where the following parameters are available:

  • Graph Title
  • GO Categories

For each Gene Ontology category, a graph will be displayed. From a graph node, Blast2GO can show a tooltip (Figure 4), create a subgraph starting from that specific GO, or create an Id list of the sequences that have been annotated with that particular GO (Figure 5). The generated Id list can then be used within Blast2GO in the select by sequences feature (see Select Sequences Section).



Image b2ggraphs_2

Figure 1: Combined graph visualization

Image comgraphgui_2

Figure 2: The Combined Graph Drawing Configuration Dialog, where a graph title can be provided and the GO categories to display can be chosen


Image combinedgraph

Figure 3: Molecular Function Combined Graph


Image graphtooltip

Figure 4: Graph Node Tooltip


Image nodemanipulation

Figure 5: Extract Node Information

Graph Side Panel

The generated combined graph is interactive and its parameters can be modified from the side panel.

  • View. This section controls the graph visualization within its area.
    • Zoom
    • Collapse All: The nodes will collapse and only the root will be visualized.
    • Expand All: The nodes will expand to the original graph visualization.
    • Re-Layout: The whole graph will be re-scaled to adjust to the visualization area.
  • Search. Searches for GO IDs, terms or descriptions in the Combined Graph.
  • Node Info. This parameter controls the information shown at a node. Possible values are:
    • GO ID: If checked the GO ID will be included in the node.
    • GO Name: The GO Names are shown in the node.
    • GO Description: When checked the GO Description will be included in the node.
    • Nodescore: The node score will be shown in the node.
    • Sequence Names: The names of the sequences annotated at each GO are included in the node. At most 15 names are displayed.
    • Sequences: The number of sequences annotated with that particular GO will be displayed in the node.
  • Layout.
    • Edge Labels: When checked the labels on the edges will be shown.
    • Expand/Collapse Icon: If checked, the icons that represent expand/collapse on the node are displayed.
    • Only "is a" Relations: If checked, only the "is a" relations between nodes are displayed.
    • Color
      • Ontology: All nodes will be colored according to the ontology category, Biological Process - green; Molecular Function - blue; Cellular Component - yellow.
      • White: The nodes will turn white.
      • By Nodescore: A Score is computed at each node according to the formula:

        $$\mathrm{score} = \sum_{\mathrm{GOs}} \mathrm{seq} \cdot \alpha^{\mathrm{dist}}$$

        where seq is the number of different sequences annotated at a child GO term, dist is the distance to the node of the child GO term and alpha is the Score alpha parameter. Coloring by Score will highlight areas of high annotation density.

      • By Sequence Count: Node color intensity will be proportional to the number of contributing sequences at the node.
  • Options.
    • Sequence Filter: The minimum number of sequences that must be assigned to a GO node for it to be displayed. This filter controls the number of nodes present in the graph. It is recommended to start the analysis with a high value that, given the total number of sequences, is not expected to overload the graph, for example 10% of the total number of sequences. Then adjust the value until a satisfactory graph is obtained.
    • Nodescore Filter: Only nodes with a Score value higher than the filter will be shown. Use this parameter to thin out the GO-DAG by removing nodes with little information.
    • Score alpha: The value of the parameter alpha in the Score formula.
    • Restore Defaults: All filters will be set to the default values.
  • Charts. (see next section)
  • Save as. The information present in a Combined Graph can be saved as an image (.png) or in table format. This will generate a .txt file where all information related to each node of the plotted Graph is provided in different columns.
  • Overview. Provides a radar-like view of the graph, which allows adjusting the visible window.
  • Open With. Open the graph information as TreeMap or WordCloud (see following sections).
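
The By Nodescore colouring can be illustrated with a small sketch. This is not Blast2GO code. It assumes that the Score of a node sums, over the node itself and all of its descendants, the number of sequences annotated at each term multiplied by alpha raised to the number of edges between the two terms, and that the shortest path is used when several paths exist. The toy graph, the function name and the alpha value are hypothetical.

```python
from collections import deque


def node_scores(children, seq_counts, alpha=0.6):
    """Return {term: score} for every term in a GO-like DAG.

    children:   dict mapping each term to a list of its child terms
    seq_counts: dict mapping each term to the number of sequences
                annotated directly at that term
    alpha:      weight applied once per edge of distance (assumed value)
    """
    scores = {}
    for term in children:
        # Breadth-first search gives the shortest edge distance
        # from `term` to itself (0) and to each descendant.
        dist = {term: 0}
        queue = deque([term])
        while queue:
            current = queue.popleft()
            for child in children.get(current, ()):
                if child not in dist:
                    dist[child] = dist[current] + 1
                    queue.append(child)
        scores[term] = sum(
            seq_counts.get(t, 0) * alpha ** d for t, d in dist.items()
        )
    return scores


# Hypothetical toy DAG: term C has two parents, A and B.
children = {"root": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
seq_counts = {"root": 0, "A": 2, "B": 1, "C": 4}
scores = node_scores(children, seq_counts, alpha=0.5)
```

With a small alpha, annotations far below a node contribute little to its Score, so high scores mark terms that are close to many annotated sequences.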


Image graphsidepanel

Figure 6: Combined Graph Side Panel
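
The effect of the Sequence Filter and the Nodescore Filter can be sketched as follows. The example only illustrates the filtering logic described for the side panel; the node table, the term names and the function names are hypothetical and do not correspond to a Blast2GO API.

```python
def starting_sequence_filter(total_sequences, fraction=0.10):
    """Suggested starting threshold: a fraction of all sequences, at least 1."""
    return max(1, round(total_sequences * fraction))


def filter_nodes(nodes, min_sequences, min_score=None):
    """Keep nodes with at least `min_sequences` sequences and, if a score
    filter is given, a score strictly higher than `min_score`."""
    visible = {}
    for term, info in nodes.items():
        if info["sequences"] < min_sequences:
            continue
        if min_score is not None and info["score"] <= min_score:
            continue
        visible[term] = info
    return visible


# Hypothetical node table for a data set of 200 sequences.
nodes = {
    "term_a": {"sequences": 120, "score": 35.0},
    "term_b": {"sequences": 20, "score": 4.0},
    "term_c": {"sequences": 5, "score": 9.5},
}
threshold = starting_sequence_filter(200)
by_sequences = filter_nodes(nodes, min_sequences=threshold)
by_both = filter_nodes(nodes, min_sequences=threshold, min_score=4.0)
```

Lowering `min_sequences` step by step from the starting value mirrors the recommended way of tuning the graph.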

Charts

Analysis of GO Term associations in a set of sequences can also be done with pie and bar charts. For this analysis, a Combined Graph must have been generated first. Once the graph is visible in the GO Graph panel, several icons are available to open the different types of charts.

Four possibilities are available:

  1. Sequence distribution by GO level (Pie-Chart): This pie chart represents the number of sequences for each Gene Ontology term for a given level. See Figure 8.
  2. Sequences per GO terms (Multilevel Pie): This function generates a pie chart with the lowest node per branch of the DAG that fulfils the filter condition, i.e. it finds all the lowest nodes with the given number of sequences or Score value and plots them jointly in a pie representation. See Figure 9.
  3. Top 50 GO terms (Bar-Chart): A bar chart representing the GO terms according to the number of annotated sequences. See Figure 10.
  4. Sequence distribution by GO level (Bar-Chart): This bar chart represents the number of sequences for each Gene Ontology term for a given level. See Figure 11.

When any of these functions is called, a table of node counts is generated and displayed in the statistics tab.
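
The node counts behind these charts can be approximated with a short sketch. It assumes that each sequence comes with the set of GO terms to be counted for it and that the level of each term is known. The sequence names, term names and functions are hypothetical.

```python
from collections import Counter


def sequences_per_term(annotations):
    """Count how many different sequences are annotated with each term."""
    counts = Counter()
    for terms in annotations.values():
        counts.update(set(terms))  # each sequence counts once per term
    return counts


def top_terms(counts, n=50):
    """Terms with the most sequences, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


def terms_at_level(counts, levels, level):
    """Sequence counts restricted to the terms at one GO level."""
    return {term: c for term, c in counts.items() if levels.get(term) == level}


# Hypothetical annotations: sequence name -> GO terms.
annotations = {
    "seq1": ["t1", "t2"],
    "seq2": ["t1"],
    "seq3": ["t1", "t3", "t3"],
}
levels = {"t1": 1, "t2": 2, "t3": 2}

counts = sequences_per_term(annotations)
top_two = top_terms(counts, n=2)
level_two = terms_at_level(counts, levels, level=2)
```

`top_terms` corresponds to the data of the Top 50 bar chart, and `terms_at_level` to the data of the per-level pie and bar charts.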




Image graphcharts

Figure 7: Combined Graph Pie and Bar-Charts

Image graph_level_pie_chart

Figure 8: Sequence distribution by GO level: Pie Chart


Image graph_multi_level_pie_chart

Figure 9: Sequence Distribution/GO as Multilevel-Pie (#score or #seq cutoff)



Image graph_top50_bar_chart

Figure 10: Top 50 GO terms



Image graph_level_bar_chart

Figure 11: Sequence distribution by GO level: Bar Chart


WordCloud

A WordCloud is a visual representation of a list of labels. The importance of each word, here a GO term, is represented by its font size, which depends on either the sequence count or the NodeScore of the GO term. The list of words can be limited to a specific Gene Ontology category (BP, CC or MF). The coloring is random. Several options are available to change the graphical appearance, such as the number of words, the orientation and shape of the cloud, and the color scheme.


Figure 12: Convert Graph to Word Cloud
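
How a weight such as the sequence count could be turned into a font size is sketched below. Linear scaling between a minimum and a maximum size is an assumption made for illustration; the scaling actually used by Blast2GO is not documented here. The term names and values are hypothetical.

```python
def font_sizes(weights, min_size=10.0, max_size=50.0):
    """Map each word's weight linearly onto [min_size, max_size]."""
    lowest = min(weights.values())
    highest = max(weights.values())
    if highest == lowest:
        # All words are equally important: give them the same size.
        return {word: max_size for word in weights}
    scale = (max_size - min_size) / (highest - lowest)
    return {
        word: min_size + (value - lowest) * scale
        for word, value in weights.items()
    }


# Hypothetical sequence counts per GO term.
sizes = font_sizes({"term_a": 1, "term_b": 5, "term_c": 9})
```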

TreeMap

The TreeMap viewer visualizes graphs (hierarchical, tree-structured data in general) as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. The size of each rectangle represents the number of sequences associated with a given GO term or the GO term's NodeScore.



Image treemap

Figure 13: A TreeMap representing a Gene Ontology Graph.
The size of the rectangles represents the number of sequences or the NodeScore of each GO term. 
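
The nesting of rectangles can be sketched with the simple "slice and dice" layout, which splits a rectangle among the children in proportion to their size and alternates the split direction at each depth. This is a generic treemap technique chosen for illustration and is not necessarily the layout Blast2GO uses. The tree and its sizes are hypothetical.

```python
def slice_and_dice(node, x, y, width, height, depth=0, rects=None):
    """Return {name: (x, y, width, height)} for a tree of sized nodes."""
    if rects is None:
        rects = {}
    rects[node["name"]] = (x, y, width, height)
    children = node.get("children", [])
    total = sum(child["size"] for child in children)
    offset = 0.0  # fraction of the parent already used by earlier children
    for child in children:
        if total <= 0:
            break
        share = child["size"] / total
        if depth % 2 == 0:
            # Even depth: place children side by side along the x axis.
            slice_and_dice(child, x + offset * width, y,
                           share * width, height, depth + 1, rects)
        else:
            # Odd depth: stack children along the y axis.
            slice_and_dice(child, x, y + offset * height,
                           width, share * height, depth + 1, rects)
        offset += share
    return rects


# Hypothetical tree; "size" stands for a sequence count or NodeScore.
tree = {
    "name": "root",
    "children": [
        {"name": "A", "size": 6, "children": [
            {"name": "A1", "size": 3},
            {"name": "A2", "size": 3},
        ]},
        {"name": "B", "size": 2},
    ],
}
rects = slice_and_dice(tree, 0, 0, 8, 6)
```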


Coloured GO Graphs from a text file

A GO graph can be generated from a text (.txt) file which contains a list of GOs and the desired colour for each of them. It is also possible to label groups of GOs with the same name. Figure 15 shows an example that was created from the following text file:

GO:0000003    6    Group A
GO:0040007    8    Group B
GO:0050896    1    Group B

The text file has to follow a simple structure to be processed correctly. Each line may contain 2 or 3 columns. The first column has to contain a GO, the second a number (0.0 to ) and the optional third column a text that will be written into the octagon of the corresponding GO. The columns must be separated by a tab character.
In the example above, Group B has two GO IDs with different values. These GO IDs can also be differentiated by colouring them according to their values. To colour the octagons according to the value, select the gradient colour on the next page of the colour graph configuration window (see Figure 16).
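
Before loading such a file, its structure can be checked with a small script. The following sketch parses the tab-separated format described above and groups the GO IDs by their optional label. The function names are hypothetical and the script is independent of Blast2GO.

```python
def parse_go_colour_file(text):
    """Parse lines of 'GO<TAB>number[<TAB>label]' into (go, value, label)."""
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore empty lines
        columns = line.split("\t")
        if len(columns) not in (2, 3):
            raise ValueError(
                f"line {line_number}: expected 2 or 3 tab-separated columns"
            )
        go_id = columns[0].strip()
        value = float(columns[1])
        if value < 0.0:
            raise ValueError(f"line {line_number}: value must not be negative")
        label = columns[2].strip() if len(columns) == 3 else None
        entries.append((go_id, value, label))
    return entries


def group_by_label(entries):
    """Collect (go, value) pairs under their label."""
    groups = {}
    for go_id, value, label in entries:
        groups.setdefault(label, []).append((go_id, value))
    return groups


example = (
    "GO:0000003\t6\tGroup A\n"
    "GO:0040007\t8\tGroup B\n"
    "GO:0050896\t1\tGroup B\n"
)
entries = parse_go_colour_file(example)
groups = group_by_label(entries)
```

In the grouped result, Group B holds two GO IDs with different values, which is the case where colouring by value helps to tell them apart.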

Image colourconfigwindow

Figure 14: Colour Configuration Window

Image colorgraph_1

Figure 15: Coloured GO Graph by Group


Image colorgraph_3

Figure 16: Coloured GO Graph by Group value

Image gradient

Figure 17: Select Colour to differentiate values within the same group.


Make GO Graph

The "Make GO Graph" function allows visualizing any set of GO terms/IDs.


Image makesinglegraph

Figure 18: Make GO Graph

Image makeGOgraph

Figure 19: Make GO ID Graph