Blast2GO uses the Basic Local Alignment Search Tool (BLAST) to find sequences similar to your query set. Please refer to http://www.ncbi.nlm.nih.gov/BLAST for details on the BLAST function. Figures 2, 3 and 4 show the BLAST Configuration Dialog Window that controls the BLAST step.
BLAST in Blast2GO can be performed in the following ways:
- CloudBlast. This is a cloud-based Blast2GO PRO Community Resource for massive sequence alignment tasks. It allows you to execute standard NCBI Blast+ searches directly from within Blast2GO PRO in a dedicated computing cloud. CloudBlast is a high-performance, secure and cost-optimized solution for your analysis. It is a BLAST service totally independent from the NCBI servers, intended to provide fast and reliable sequence alignments. Please see the Run BLAST using CloudBLAST section for more information.
- QBlast@NCBI. NCBI offers a public service that allows searching molecular sequence databases with the BLAST algorithm. The main advantages of this service are its versatility and that no database maintenance is required. Selecting this option in Blast2GO therefore requires no additional installation.
- AWS Blast. The NCBI provides, via Amazon Web Services (AWS), a preconfigured Amazon Machine Image (AMI) which contains the latest BLAST+ release. The AMI can be accessed through Blast2GO: the AMI's URL has to be provided to Blast2GO, and the BLAST searches will then run in the Amazon Cloud.
- Local BLAST against own database. It is possible to use the BLAST+ executables to query your own local database. See https://www.blast2go.com/make-own-database-and-blast and the Make Blast Database section for how to prepare your own FASTA database and blast against it locally.
QBlast at NCBI is the only feature available for Blast2GO Basic users.
The next figure shows the menu used to select between NCBI BLAST, local BLAST, CloudBlast, AWS Blast or blasting against your own database.
Figure 1: Select between NCBI, Local or CloudBlast
Run BLAST at the NCBI
Blast Configuration Page
- Your e-mail address in case you are using the NCBI BLAST web service.
- BLAST program: The algorithm you want to use:
- blastp - Compares an amino acid query sequence against a protein sequence database.
- blastn (-task blastn) - Compares a nucleotide query sequence against a nucleotide sequence database.
- blastx - Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. Used to find potential translation products of an unknown nucleotide sequence.
- tblastn - Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
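- blastn (-task megablast) - Compares a nucleotide query sequence against a nucleotide sequence database; optimized for highly similar sequences (e.g. within the same species).
- blastn (-task dc-megablast) - Discontiguous megablast. Compares a nucleotide query sequence against a nucleotide sequence database; suited to more divergent sequences (e.g. cross-species comparisons).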
- BLAST DB: The name of the database to search in (e.g. nr, swissprot, pdb). For a list of possible DBs at NCBI, see http://data.biobam.com/ncbi_blast_dbs_protein.pdf
- Taxonomy Filter: Restricts the search so that BLAST results are reported only for the selected taxonomy.
- BLAST expect value: The statistical significance threshold for reporting matches against database sequences. If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Increasing the threshold shows less stringent matches.
- Number of BLAST hits: The number of alignments you want to retrieve (0-100).
- BLAST Description Annotator: The BDA finds the best possible description for a new sequence based on a given BLAST result.
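For readers who also run BLAST+ from the command line, the fields above map roughly onto standard BLAST+ options. The following is a minimal sketch, not the call Blast2GO makes internally: the helper name `build_blast_command` and the file names `queries.fasta` and `blast_results.xml` are assumptions made for illustration, while `-query`, `-db`, `-evalue`, `-max_target_seqs`, `-task`, `-outfmt` and `-out` are regular BLAST+ options.

```python
import shlex


def build_blast_command(program, query, db, evalue, max_hits, task=None,
                        out="blast_results.xml"):
    """Assemble a BLAST+ argument list from dialog-style settings.

    Hypothetical helper for illustration; it only builds the command and
    does not run it.
    """
    if not 1 <= max_hits <= 100:
        # The dialog accepts 0-100, but BLAST+ itself needs at least 1.
        raise ValueError("max_hits must be between 1 and 100")
    cmd = [
        program,
        "-query", query,                    # FASTA file with the query set
        "-db", db,                          # e.g. nr, swissprot, pdb
        "-evalue", str(evalue),             # BLAST expect value threshold
        "-max_target_seqs", str(max_hits),  # number of BLAST hits to keep
        "-outfmt", "5",                     # 5 = BLAST XML output
        "-out", out,
    ]
    if task is not None:
        # Only meaningful for programs with tasks, e.g. blastn + megablast.
        cmd += ["-task", task]
    return cmd


cmd = build_blast_command("blastx", "queries.fasta", "swissprot",
                          evalue=1e-3, max_hits=20)
print(shlex.join(cmd))

mega = build_blast_command("blastn", "queries.fasta", "nt",
                           evalue=1e-3, max_hits=20, task="megablast")
print(shlex.join(mega))
```

The resulting list can be passed to `subprocess.run` on a machine where BLAST+ and the database are installed.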
Figure 2: Blast Configuration Page
Figure 3: Advanced Page
Figure 4: Save Results Page
Advanced Page (PRO Feature)
- Blast Parameters:
- Word size: One of the important parameters governing the sensitivity of BLAST searches is the length of the initial words. The word size is adjustable in blastn and can be reduced from the default value to increase sensitivity. This word size can also be increased to increase the search speed and limit the number of database hits.
- Low complexity filter: The BLAST programs employ the SEG algorithm to filter low complexity regions from proteins before executing a database search. Default is ON.
- Filter Options:
- HSP length cutoff: A cutoff value for the minimal length of the first HSP of a BLAST hit, used to exclude hits with only small local alignments from the BLAST result. The given length corresponds to amino acids or nucleotides, depending on the type of BLAST performed.
- HSP-Hit Coverage
- Filter by description: Filters out BLAST hits by their description.
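The three filter options can be pictured as simple predicates applied to each hit. The sketch below is an illustration under stated assumptions, not Blast2GO's implementation: the `Hit` record, its field names, the example hits and the coverage formula (first HSP length as a percentage of the hit length) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    hit_id: str
    description: str
    hit_length: int        # length of the database sequence
    first_hsp_length: int  # alignment length of the first HSP


def hsp_hit_coverage(hit):
    """Assumed definition: first HSP length as a percentage of hit length."""
    return 100.0 * hit.first_hsp_length / hit.hit_length


def filter_hits(hits, min_hsp_length, min_coverage=0.0, exclude_terms=()):
    """Keep hits that pass the length, coverage and description filters."""
    kept = []
    for hit in hits:
        if hit.first_hsp_length < min_hsp_length:
            continue  # HSP length cutoff: alignment too short
        if hsp_hit_coverage(hit) < min_coverage:
            continue  # HSP-Hit coverage below the threshold
        description = hit.description.lower()
        if any(term.lower() in description for term in exclude_terms):
            continue  # filtered out by description
        kept.append(hit)
    return kept


# Hypothetical hits for illustration only.
hits = [
    Hit("hit_A", "kinase domain protein", hit_length=400, first_hsp_length=300),
    Hit("hit_B", "hypothetical protein", hit_length=500, first_hsp_length=250),
    Hit("hit_C", "transporter", hit_length=600, first_hsp_length=20),
]
kept = filter_hits(hits, min_hsp_length=33, min_coverage=40.0,
                   exclude_terms=("hypothetical",))
print([hit.hit_id for hit in kept])
```

In this example `hit_B` is dropped by the description filter and `hit_C` by the length cutoff, so only `hit_A` remains.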
Save Results Page
The results of the BLAST queries can also be directly saved to a file in different formats by selecting the corresponding check boxes at the BLAST Save Results Page. If the chosen file already exists, upcoming results will be appended. Choose a format type to additionally save your BLAST results.
- XML2: This is a new BLAST results format provided by NCBI that can also be loaded into Blast2GO.
- XML: It is recommended to save your BLAST results as XML as this format is supported by the Blast2GO Load BLAST Results function.
- TXT: It saves the blast results of each sequence in text file format.
- HTML: For each sequence a file in HTML format will be saved.
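A saved XML file can also be inspected outside Blast2GO. The sketch below reads the classic BLAST XML layout with Python's standard library and reports, for each hit, the e-value, alignment length and positive matches of its longest HSP. The embedded sample document and the `parse_hits` helper are hypothetical, and computing similarity as positives divided by alignment length is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the classic BLAST XML layout (illustrative only).
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>example|HIT1</Hit_id>
          <Hit_def>example protein</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-50</Hsp_evalue>
              <Hsp_positive>90</Hsp_positive>
              <Hsp_align-len>100</Hsp_align-len>
            </Hsp>
            <Hsp>
              <Hsp_evalue>1e-05</Hsp_evalue>
              <Hsp_positive>20</Hsp_positive>
              <Hsp_align-len>40</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def parse_hits(xml_text):
    """Return one summary dict per hit, based on the hit's longest HSP."""
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            longest = max(hit.iter("Hsp"),
                          key=lambda h: int(h.findtext("Hsp_align-len")),
                          default=None)
            if longest is None:
                continue  # hit without HSPs
            align_len = int(longest.findtext("Hsp_align-len"))
            positives = int(longest.findtext("Hsp_positive"))
            rows.append({
                "query": query,
                "hit_id": hit.findtext("Hit_id"),
                "definition": hit.findtext("Hit_def"),
                "evalue": float(longest.findtext("Hsp_evalue")),
                "align_len": align_len,
                "positives": positives,
                # Assumed similarity: positives as a percentage of the alignment.
                "similarity": 100.0 * positives / align_len,
                "num_hsps": len(list(hit.iter("Hsp"))),
            })
    return rows


rows = parse_hits(SAMPLE_XML)
for row in rows:
    print(row)
```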
Run BLAST using CloudBLAST (PRO Feature)
CloudBlast offers a highly optimized, self-sustained HPC solution to address a very specific need of the Blast2GO PRO community.
CloudBlast is a BLAST service totally independent from the NCBI servers to provide fast and reliable sequence alignments. It consists of a high performance computing cluster dedicated exclusively to Blast searches.
All Blast2GO PRO subscriptions include "ComputationUnits" to make use of this resource, which allows you to perform BLAST searches for tens of thousands of sequences within a few days against a large collection of protein databases. Each sequence alignment performed in the system consumes a certain amount of computation time, depending on the sequence length, the BLAST algorithm (blastx, blastp) and the parameters used. The smaller the database you blast against, the more sequences you can analyse with 6,000,000 ComputationUnits (see the Help Menu section to learn how to monitor the ComputationUnits). For example, if you blast against the vertebrate NR subset you would be able to blast approx. one million (1,000,000) sequences. If you decide to blast against the NR database, the largest protein database available, you should be able to blast approx. 80,000 sequences (with an average length of 800 nt per sequence).
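The figures quoted above imply a rough average cost per sequence, which can help when planning a run. The short calculation below is only back-of-the-envelope arithmetic on those approximate numbers; the function names are hypothetical, and the real cost per sequence varies with sequence length, algorithm and parameters.

```python
BUDGET = 6_000_000  # ComputationUnits quoted in the text

# Approximate number of sequences that the budget covers, per database.
APPROX_SEQUENCES = {
    "vertebrate NR subset": 1_000_000,
    "full NR": 80_000,
}


def units_per_sequence(budget, n_sequences):
    """Average ComputationUnits consumed per sequence (rough estimate)."""
    return budget / n_sequences


def sequences_for_budget(budget, avg_units):
    """How many sequences a budget covers at a given average cost."""
    return int(budget // avg_units)


for database, n_sequences in APPROX_SEQUENCES.items():
    cost = units_per_sequence(BUDGET, n_sequences)
    print(f"{database}: about {cost:g} units per sequence")
```

On these figures a sequence blasted against the full NR database costs on average about 75 units, against about 6 units for the vertebrate NR subset.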
Figure 5: CloudBlast Configuration Page
Run BLAST Locally
With Local BLAST you can blast your sequences against your own database. Blast2GO can create a BLAST database from a FASTA file with the option "Make Blast Database" (see the Make Blast Database section). Download and format your database and choose the corresponding folder (see Figure 6). Databases have to be formatted for NCBI Blast+.
The main parameters in the Local BLAST Configuration page are very similar to the ones for NCBI BLAST and CloudBlast. The main difference lies in choosing the database, as Blast2GO expects a .pal file or a .p* file. On the Advanced Page, under "Run Parameters", it is possible to select the number of threads to be used. This field does not need to be set, as Blast2GO detects the number of threads available on the computer. The Advanced Page section provides a detailed description of each parameter. As in CloudBlast, the BLAST results will be saved in XML file format.
Figure 6: Local Blast Configuration Page
Show BLAST Results
As the BLAST search progresses, sequences with successful BLAST results change their color on the Main Sequence Table from white to orange and the BLAST result related columns will be filled. In case no results could be retrieved for a given sequence, this row will turn dark-red.
Right-clicking on a sequence displays the Single Sequence Menu, from which the BLAST results of each sequence can be viewed individually. Show BLAST Results (Figure 7) will generate a tab in the Results containing information on the results of the similarity search of the selected sequence. For each of the obtained hits, the following information is given:
- Hit id and definition
- Gene name assigned to the hit by its accession
- e-value of the alignment
- Alignment length of the longest hsp
- Positive matches of the longest hsp
- Hsp similarity of hit
- Number of hsps
- Mapped GO-Terms with their evidence codes
- UniProt codes of the hit sequences
Figure 7: Show BLAST Results
Figure 8: Individual BLAST Result Table View
Figure 9: Individual BLAST Result in Alignment View
Different BLAST statistics charts (Figures 11, 12 and 13) can be generated for a global visualization of the results. These charts provide a general view of the similarity of the query set with the selected databases and can be used to choose cut-off levels for the e-value, similarity and annotation threshold parameters at the annotation step.
Additionally, a BLAST hit species distribution chart is available. To generate the BLAST Statistics charts, click the arrow next to the "Chart" icon and select the statistics to be displayed (see Figure 10).
Figure 10: Blast Statistics
- E-Value Distribution: This chart plots the distribution of E-values for all selected BLAST hits. It is useful for evaluating the success of the alignment for a given sequence database and helps to adjust the E-Value cutoff in the annotation step.
- Similarity Distribution: This chart displays the distribution of all calculated sequence similarities (percentages), shows the overall performance of the alignments and helps to adjust the annotation score in the annotation step.
- Species Distribution: This chart gives a listing of the different species to which most sequences were aligned during the BLAST step.
- Top-Hit Species Distribution: Bar chart showing the species distribution of all Top-Blast hits.
- Hit Distribution: This chart shows a distribution of the number of hits for the blasted sequences in a data-set.
- Hsp Distribution: This bar chart shows the distribution of hsps per hit.
- Hsp/Seq Distribution: This chart shows a distribution of percentages which represents the coverage between the hsps and their corresponding sequences.
- Hsp/Hit Distribution: Same as above but for hits instead of sequences.
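As a rough illustration of what two of these charts summarize, the sketch below tallies a top-hit species distribution and a binned similarity distribution from a small in-memory result set. The data structure, the field names (`species`, `positives`, `align_len`), the example values and the similarity formula are assumptions made for this sketch and do not describe Blast2GO's internal data model.

```python
from collections import Counter

# Hypothetical results: query id -> hits ordered from best to worst.
results = {
    "seq_1": [
        {"species": "Homo sapiens", "positives": 95, "align_len": 100},
        {"species": "Mus musculus", "positives": 80, "align_len": 100},
    ],
    "seq_2": [
        {"species": "Homo sapiens", "positives": 72, "align_len": 90},
    ],
    "seq_3": [],  # no BLAST hits
}


def top_hit_species(results):
    """Count the species of the best hit of every sequence that has hits."""
    return Counter(hits[0]["species"] for hits in results.values() if hits)


def similarity_histogram(results, bin_width=10):
    """Count hits per similarity bin; keys are the lower bin edges in percent."""
    counts = Counter()
    for hits in results.values():
        for hit in hits:
            # Assumed similarity: positives as a percentage of the alignment.
            similarity = 100.0 * hit["positives"] / hit["align_len"]
            lower = int(similarity // bin_width) * bin_width
            # Put a similarity of exactly 100% into the last bin.
            counts[min(lower, 100 - bin_width)] += 1
    return dict(sorted(counts.items()))


species_counts = top_hit_species(results)
histogram = similarity_histogram(results)
print(species_counts.most_common())
print(histogram)
```

The resulting counts could then be passed to any plotting library to draw bar charts similar to the ones described above.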
Figure 11: Similarity Distribution
Figure 12: Species Distribution
Figure 13: E-Value Distribution
Load BLAST results
If a BLAST result is already available in XML format, it can be loaded directly into Blast2GO by using Load > Load Blast Results in the File menu. Here you can choose to import the BLAST results as an XML file or in the new XML2/JSON formats. These new formats can be loaded as a Zip file.
In the Load Blast Results dialog, either a whole directory containing a collection of BLAST XML files or a single XML file can be selected (Figure 14). The BLAST results will be added to your current Blast2GO session.
Blast2GO PRO also allows the input of TimeLogic DeCypher Blast results.
Figure 14: Load / Import Blast Results
Make Blast Database
This option creates a BLAST database from the sequences of any Blast2GO project or from a FASTA file (Figure 15). It can be found under the arrow next to the blast icon.
- Current project: Blast2GO will use the loaded sequences to create the Blast database. Note: If the resulting database will be used for further GO mapping, a proper ID and description line with "GO mappable" information is needed.
- FASTA file: This option allows you to choose your own FASTA file. The FASTA file has to be correctly formatted for NCBI Blast+.
- Output Folder: Select the directory where to save the created Blast database.
- Blast Database Name: Provide a name for the Blast database
- Taxonomy Options:
- Taxonomy ID: Enter the NCBI species ID.
- Mapping file: If the sequences come from different species, it is possible to provide a text file listing the sequence names and their species IDs, so that each species ID is mapped to the corresponding sequence in the FASTA file. For example:
TR|A0A022PMT6|ERYGU 4155
TR|A0A022PMU0|ERYGU 4155
TR|A0A059BJ72|EUCGR 71139
TR|A0A059BJ72|EUCGR 71139
TR|A0A061FDU3|THECC 3641
TR|A0A067DJ79|CITSI 2711
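A mapping file of this shape can be generated with a few lines of script. The sketch below assumes that each sequence ID ends with a species mnemonic after the last "|" (as in the example entries) and uses the mnemonic-to-ID pairs from those entries. The helper names and file names are hypothetical. The second function shows how a comparable database could be built with the BLAST+ `makeblastdb` tool, whose `-taxid_map` option requires `-parse_seqids`; it is not necessarily what Blast2GO runs internally.

```python
# Species mnemonic -> NCBI species ID, taken from the example entries.
SPECIES_TAXIDS = {"ERYGU": 4155, "EUCGR": 71139, "THECC": 3641, "CITSI": 2711}


def mapping_lines(seq_ids, species_taxids):
    """Build 'sequence_id species_id' lines, one per sequence."""
    lines = []
    for seq_id in seq_ids:
        # Assumption: the mnemonic is the part after the last '|'.
        mnemonic = seq_id.rsplit("|", 1)[-1]
        lines.append(f"{seq_id} {species_taxids[mnemonic]}")
    return lines


def makeblastdb_command(fasta, db_name, map_file, dbtype="prot"):
    """Assemble a makeblastdb call that attaches taxonomy IDs (not executed)."""
    return [
        "makeblastdb",
        "-in", fasta,
        "-dbtype", dbtype,       # "prot" or "nucl"
        "-out", db_name,
        "-parse_seqids",         # needed for -taxid_map
        "-taxid_map", map_file,
    ]


lines = mapping_lines(["TR|A0A022PMT6|ERYGU", "TR|A0A061FDU3|THECC"],
                      SPECIES_TAXIDS)
print("\n".join(lines))
print(makeblastdb_command("proteins.fasta", "my_db", "taxid_map.txt"))
```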
Figure 15: Make Blast Database
Other BLAST Functions
- Remove Blast Results: This option will remove the BLAST results from the selected sequences.
- Run Blast-Descriptor-Annotator (BDA): This will run the BDA algorithm. For further details, please see the Blast Configuration Page section.
- Recover original Best-Blast-Hit Description: When this option is executed the sequence description column on the Main Sequence Table will contain the top blast hit description and not the one from the BDA.