Time Course Expression Analysis - PRO Feature
Introduction
This tool performs time-course expression analysis of count data from RNA-seq experiments. Based on the maSigPro program, it detects genomic features (e.g. genes) with significant temporal expression changes and significant differences between experimental groups. maSigPro, which belongs to the Bioconductor project, implements a two-step regression strategy to find genes with significant expression profile differences in time-course RNA-seq experiments.
Please cite maSigPro as:
Conesa A, Nueda MJ (2018). "maSigPro: Significant Gene Expression Profile Differences in Time Course Gene Expression Data." R package version 1.52.0, http://bioinfo.cipf.es/.
General Workflow
The workflow to be followed to perform a time course expression analysis is described in Figure 2.
Load Data
Go to File → Load → Load Count Table and select your .txt file containing the count table in tab-delimited format (Figure 3). It is also possible to create a Count Table within Blast2GO through the "Create Count Table" functionality (see Quantify Expression section).
Figure 3: Count Table File
The Count Table can be saved as a 'CountTable' object (File → Save).
Notes:
- This application only accepts raw counts without any type of normalization.
- Replicates for each experimental condition are required.
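The two notes above can be checked before loading a file. The following is a minimal sketch, not part of the tool: it assumes a tab-delimited table whose first column holds feature IDs and whose header row holds sample names, and the function name `load_count_table` is hypothetical.

```python
import csv
import io


def load_count_table(handle):
    """Read a tab-delimited count table and reject values that are not raw counts.

    Assumed layout (hypothetical): header row with sample names,
    then one row per feature with the feature ID in the first column.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # first column holds the feature IDs
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        feature, values = row[0], row[1:]
        if len(values) != len(samples):
            raise ValueError(f"{feature}: expected {len(samples)} values, got {len(values)}")
        parsed = []
        for value in values:
            # Raw counts are non-negative integers; normalized values such as "12.5" fail here.
            if not (value.isascii() and value.isdigit()):
                raise ValueError(f"{feature}: '{value}' is not a raw count")
            parsed.append(int(value))
        counts[feature] = parsed
    return samples, counts


example = "gene\tT0_r1\tT0_r2\tT1_r1\tT1_r2\ngeneA\t10\t12\t30\t28\ngeneB\t0\t1\t0\t2\n"
samples, counts = load_count_table(io.StringIO(example))
```

With a real file, the same function could be called on `open("counts.txt", newline="")`; the file name is only an illustration.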
Run Analysis
Go to rna-seq → Run Differential Expression Analysis and choose the "Time Course Expression Analysis" option. Here you can specify the following parameters, which are divided into three sections: Preprocessing Data (Figure 4), Experimental Design (Figure 5) and Analysis Options (Figure 6).
Preprocessing Data Page
- Filter low count genes:
- CPM Filter: Establish a filter to exclude genes with low counts across libraries, as those genes may interfere with the subsequent statistical approximations. Filtering is performed on a count-per-million (CPM) basis to account for differences in library size between samples (e.g. a CPM of 1 corresponds to a count of 6 in a sample with 6 million reads).
- Samples reaching CPM Filter: Set a minimum number of samples in which the gene's CPM is above the filter level (is expressed). If this value is set to e.g. five, at least 5 of the samples have to be above the given CPM. The number of samples of the smallest group is usually taken (e.g. in an experiment that has two replicates for each condition (or group), a gene should be expressed in at least two samples). Set value to 0 if no filter is desired.
- Normalization procedure:
- Normalization Method: Normalization is an important step to make the samples comparable and to remove possible biases (such as sequencing depth bias) in count data. You can select the normalization method to be used:
- TMM: Weighted trimmed mean of M-values. In this method, weights are obtained from the delta method on binomial data (this method is recommended).
- RPKM: Reads Per Kilobase per Million mapped reads. This method corrects for gene length and the number of sequencing reads (gene length is required).
- Upper-quartile: The 75% quantile of the counts in each library is used to calculate the scale factors for normalization.
- None: No normalization procedure is applied.
- Feature Length File: For RPKM normalization load a tab-delimited file (or ID-Value object) with two columns containing the name and length of each gene or genomic feature.
Figure 4: Preprocessing Data Page
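The CPM filter and the RPKM correction described on this page come down to simple arithmetic. The sketch below is only illustrative and is not the tool's implementation: the function names are hypothetical, library sizes are passed in explicitly, and "above the filter level" is read as a strict comparison.

```python
def cpm(counts, library_sizes):
    """Counts per million for one gene across samples."""
    return [c * 1_000_000 / n for c, n in zip(counts, library_sizes)]


def passes_cpm_filter(counts, library_sizes, cpm_cutoff, min_samples):
    """Keep a gene if its CPM is above the cutoff in at least min_samples samples.

    min_samples = 0 disables the filter, because every gene then passes.
    """
    n_expressed = sum(1 for value in cpm(counts, library_sizes) if value > cpm_cutoff)
    return n_expressed >= min_samples


def rpkm(counts, library_sizes, length_bp):
    """Reads per kilobase per million mapped reads for one gene of length length_bp."""
    return [c * 1_000_000_000 / (n * length_bp) for c, n in zip(counts, library_sizes)]


# Two replicates per condition, so a gene should be expressed in at least two samples.
library_sizes = [6_000_000, 6_000_000, 3_000_000, 3_000_000]
keep_a = passes_cpm_filter([12, 18, 1, 0], library_sizes, cpm_cutoff=1, min_samples=2)
keep_b = passes_cpm_filter([6, 3, 1, 0], library_sizes, cpm_cutoff=1, min_samples=2)
```

Here the first gene has a CPM above 1 in two samples and is kept, while the second gene never exceeds a CPM of 1 and is filtered out.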
Experimental Design Page
Experimental design file: Select the tab-delimited .txt file containing the experimental descriptors associated with each sample. As shown in Figure 7, rows correspond to samples and columns to experimental descriptors. One column must contain the time point of each sample, and another column should assign each sample to an experimental group. Make sure that the names in the first column of the experimental design table are exactly the same as the sample names in the count table header. If the experimental design file has fewer samples than the count table, only the samples contained in this file will be analyzed.
Figure 7: Experimental Design File
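The name-matching rule for the experimental design file can be verified ahead of time. The following sketch assumes a hypothetical design file with the columns Sample, Time and Group; the column names, sample names and function names are illustrative and not prescribed by the tool.

```python
import csv
import io


def read_design(handle):
    """Read a tab-delimited design file: first column = sample name, other columns = descriptors."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    factors = header[1:]
    design = {}
    for row in reader:
        if row:
            design[row[0]] = dict(zip(factors, row[1:]))
    return design


def samples_to_analyze(count_header, design):
    """Return the count table samples that also appear in the design file.

    Raises an error if a design sample has no exactly matching column in the count table.
    """
    missing = [s for s in design if s not in count_header]
    if missing:
        raise ValueError(f"Design samples not found in count table: {missing}")
    # Count table samples absent from the design file are simply left out.
    return [s for s in count_header if s in design]


design_text = (
    "Sample\tTime\tGroup\n"
    "C_T0_r1\t0\tControl\n"
    "C_T0_r2\t0\tControl\n"
    "T_T6_r1\t6\tTreated\n"
    "T_T6_r2\t6\tTreated\n"
)
design = read_design(io.StringIO(design_text))
count_header = ["C_T0_r1", "C_T0_r2", "T_T6_r1", "T_T6_r2", "Extra_r1"]
selected = samples_to_analyze(count_header, design)
```

In this example the count table has one more sample than the design file, so only the four described samples would be analyzed.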
Figure 5: Experimental Design Page
Analysis Options
- Design Type: Choose the design type to adjust the analysis.
- Single Series Time Course: Detects genes that show significant expression changes over time. You only have to select the time factor of your experimental design in "Targets".
- Multiple Series Time Course: Finds genes with significant temporal expression changes and significant differences between experimental groups. You have to establish the time and experimental factors, and select the control condition of your experimental design in "Targets".
- Statistical Settings:
- Significance Level (Alpha): The level of FDR control used for variable selection in the stepwise regression.
- R-squared Cutoff: Cutoff value for the R-squared of the regression model.
- Visualization of Results:
- Number of Clusters: Establish a number of clusters to group genes by similar expression profiles.
- Clustering Method: Choose a clustering method for data partitioning.
- Hierarchical Clustering: Performs a hierarchical cluster analysis using a set of dissimilarities for the features being clustered.
- K-Means Clustering: Divides the points into K clusters such that the sum of squared distances from each point to the center of its assigned cluster is minimized.
- Model-Based Clustering: Fits Gaussian mixture models by EM, initialized by hierarchical clustering, and selects the optimal model according to BIC. This method computes an optimal number of clusters. Keep in mind that this method requires more time.
Figure 6: Analysis Options
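The quantity that K-Means Clustering minimizes can be made concrete with a small calculation. This sketch only evaluates the within-cluster sum of squares for a given assignment of expression profiles to clusters; it does not perform the clustering itself, and the function name is hypothetical.

```python
from collections import defaultdict


def within_cluster_ss(profiles, labels):
    """Sum of squared distances from each profile to the centroid of its cluster."""
    groups = defaultdict(list)
    for profile, label in zip(profiles, labels):
        groups[label].append(profile)
    total = 0.0
    for members in groups.values():
        # The centroid is the per-coordinate mean of the cluster members.
        centroid = [sum(column) / len(members) for column in zip(*members)]
        for profile in members:
            total += sum((x - c) ** 2 for x, c in zip(profile, centroid))
    return total


profiles = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = within_cluster_ss(profiles, [0, 0, 1, 1])  # similar profiles grouped together
poor = within_cluster_ss(profiles, [0, 1, 0, 1])  # dissimilar profiles grouped together
```

A K-means run would prefer the first assignment, since it gives the smaller sum of squares.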
Results
Once the input counts have been processed and analyzed via the "Time Course Expression Analysis" tool, a new tab is opened containing the statistical results obtained by the stepwise regression (Figure 8):
- P-value of the regression ANOVA.
- R-squared of the model.
- P-value of the regression coefficients of the selected variables.
- Tags: Indicate the list/s of significant genes in which the feature appears (R-squared ≥ R-squared Cutoff).
- Red tags: Lists of significant genes for each experimental group (only available in "Multiple Series Time Course").
- Blue tags: List of significant genes for each variable of the regression model.
Only the genes that have passed the established Significance Level are shown in the new tab. For further details please refer to the maSigPro User's Guide.
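How the Significance Level and the R-squared Cutoff interact can be illustrated with a simplified sketch. It assumes a Benjamini-Hochberg adjustment as the FDR procedure, which is an assumption rather than something stated here, and it collapses the two regression steps into a single p-value per gene. The function names and the layout of `results` are hypothetical; the maSigPro User's Guide describes the actual procedure.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(results, alpha, rsq_cutoff):
    """Return (genes shown in the table, genes tagged as significant)."""
    names = list(results)
    adjusted = benjamini_hochberg([results[g]["p_value"] for g in names])
    shown = [g for g, q in zip(names, adjusted) if q <= alpha]
    tagged = [g for g in shown if results[g]["r_squared"] >= rsq_cutoff]
    return shown, tagged


results = {
    "geneA": {"p_value": 0.001, "r_squared": 0.9},
    "geneB": {"p_value": 0.002, "r_squared": 0.4},
    "geneC": {"p_value": 0.6, "r_squared": 0.1},
}
shown, tagged = select_genes(results, alpha=0.05, rsq_cutoff=0.7)
```

In this example geneA and geneB pass the significance level and would be listed, but only geneA reaches the R-squared cutoff and would receive a tag.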
Figure 8: Table Viewer
Results can be saved as a TC Results object. Note that it is not possible to perform the analysis on this object. For this purpose, you have to open the Count Table object. If you want to see both the count table and the results, go to the File Manager and open the two .b2g files together.
A result page shows a summary of the time course expression analysis results, including the clusters of features with similar expression profiles (Figure 9). Go to Side Panel → Result Summary to view the result summary and to export it as a PDF.
Figure 9: Result Summary
Charts and Statistics
Different statistics charts can be generated for a global visualization of the results. These charts can be found under the Side Panel of the TimeCourse Results viewer.
- MDS Plot: Generates a two-dimensional scatterplot in which the distances represent the typical log2 fold changes between samples. You can select an experimental factor by which you want to color the MDS graphic.
- Venn Diagram: Diagram showing all possible logical relations between a finite collection of different feature sets (Figure 10(a)). You can choose between two types of Venn Diagram ("Pairwise" or "Triple"), and select the sets of significant genes to display.
- Expression Profile by Gene: Graph of the expression profile over time for a particular gene (Figure 10(b)). To display it, right-click on the chosen gene and select the "Show Expression Profile" option.
- Experiment-wide Expression Profiles: Plot showing the expression levels across samples for each cluster of genes (Figure 10(c)).
- Summary Expression Profiles: Plot showing the median expression level of each cluster of genes across time (Figure 10(d)).
Figure 10(a): Venn Diagram
Figure 10(b): Expression Profile by Gene
Figure 10(c): Experiment-wide Expression Profiles
Figure 10(d): Summary Expression Profiles
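The values behind a Summary Expression Profiles plot can be approximated as follows. This is a sketch under the assumption that the median is taken over all genes and replicates of a cluster at each time point; the tool's exact aggregation is not specified here, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def summary_profiles(expression, clusters, time_points):
    """Median expression per cluster at each time point.

    expression:  gene -> list of expression values, one per sample
    clusters:    gene -> cluster identifier
    time_points: time point of each sample, in the same order as the values
    """
    times = sorted(set(time_points))
    by_cluster = defaultdict(lambda: defaultdict(list))
    for gene, values in expression.items():
        for value, t in zip(values, time_points):
            by_cluster[clusters[gene]][t].append(value)
    # One list per cluster, ordered by increasing time point.
    return {c: [median(by_time[t]) for t in times] for c, by_time in by_cluster.items()}


time_points = [0, 0, 1, 1]  # two replicates at each of two time points
expression = {
    "geneA": [1, 3, 10, 12],
    "geneB": [2, 4, 11, 13],
    "geneC": [5, 5, 1, 1],
}
clusters = {"geneA": 1, "geneB": 1, "geneC": 2}
profiles = summary_profiles(expression, clusters, time_points)
```

Cluster 1 groups two genes that increase over time, while cluster 2 holds a gene that decreases, which is the kind of contrast the summary plot is meant to show.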