1. General
    1. What is agriGO?
    2. What is GO?
    3. What are the major updates in agriGO v2.0?
    4. What are the unique features of agriGO compared with other GO web servers/databases?
    5. How do users evaluate agriGO?
  2. Analysis tools
    1. What is SEA analysis?
    2. Which statistical method should I choose in the SEA tool?
    3. What is PAGE analysis?
    4. What is BLAST4ID?
    5. How do I use the comparison tool?
    6. How do you use the custom DAG drawer?
    7. How do you use custom Scatter Plots?
    8. What does PVD mean?
    9. How can I choose the appropriate reference from so many candidate choices?
    10. Which tool should I choose?
    11. Why does agriGO recommend multiple-test adjustment of the p-value?
    12. What does the Flash bar chart mean and how do I use it?
    13. Why do graphical/chart images not display on my PC?
    14. We found that the enriched functional categories changed when we used different values for Minimum number of mapping entries (Advanced options), for example, 1 or 5. Based on what is described online, the list from 1 should include the list from 5, correct? However, some categories in the list from 5 are not included in the list from 1. Although the P values are the same in both cases, the corrected P values are different. Do you have any comments on that? What should we do?
    15. I used 20000 probesets at 4 different time points in the PAGE analysis; however, no enriched GO terms were detected. Why?
  3. Datatype, update and GO annotation in agriGO
    1. How many datatypes are supported by agriGO?
    2. How does agriGO obtain its data?
    3. How often does agriGO update?
    4. Can I check results from the old version using the new agriGO?
    5. How can I ask agriGO to add a new species/datatype?
    6. Could you please explain how you define GO terms for each probe?
    7. Given any GO term, is its relative abundance (the ratio of contained probes to total probes) the same or similar in the Affymetrix array and the soybean genome?
  4. Miscellaneous questions
    1. What is your lab's focus?
What is agriGO?
agriGO is designed to help experimental biologists by automating the identification of enriched Gene Ontology (GO) terms in a list of microarray probe sets or gene identifiers (with or without expression information); it is also a GO-related database. agriGO focuses specifically on agricultural species.
 
What is GO?
"The Gene Ontology (GO) project provides a controlled vocabulary to describe gene and gene product attributes in any organism. The GO project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The GO collaborators are developing three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. There are three separate aspects to this effort: first, we write and maintain the ontologies themselves; second, we make cross-links between the ontologies and the genes and gene products in the collaborating databases, and third, we develop tools that facilitate the creation, maintenance and use of ontologies." Definition from http://www.geneontology.org/
 
What are the major updates in agriGO v2.0?
The updated version, agriGO v2.0, offers a large number of species and datatypes, which have been classified into several groups. Singular Enrichment Analysis (SEA), the central tool, has been improved by the addition of a batch analysis that handles multiple input datasets simultaneously. Custom tools, including custom SEA and SEACOMPARE inherited from agriGO and new tools such as the DAG tree and Scatter Plot drawers, are grouped together and highlighted on the navigation bar.
 
What are the unique features of agriGO compared with other GO web servers/databases?
agriGO provides strong support for agricultural species. It is not limited to SEA: gene set enrichment analysis (GSEA), implemented with the PAGE method, is also available. In addition, the BLAST4ID tool supports ID transfer and annotation, and search and download functions are provided. agriGO produces rich outputs, such as graphical results, bar charts and hierarchical trees, which together give a comprehensive picture of the biological meaning of the user's input data.
 
How do users evaluate agriGO?
Comments from Faculty of 1000 Biology:
This paper describes a new bioinformatic resource that will be of great use to any plant scientist carrying out genomic studies.
agriGO provides an intuitive and relatively user-easy platform for carrying out Gene Ontology (GO) analyses of genomic data from over 30 plant species. The tools provided include Singular Enrichment Analysis (SEA), which analyses a simple gene list for GO enrichment, and Parametric Analysis of Gene Set Enrichment (PAGE), which takes expression levels into account when analyzing GO enrichment. The platform provides publication quality outputs.
 
What is SEA analysis?
SEA stands for Singular Enrichment Analysis, a traditional but widely used method. SEA is designed to identify enriched Gene Ontology (GO) terms in a list of microarray probe sets or gene identifiers. Finding enriched GO terms corresponds to finding enriched biological facts, and the enrichment level of a term is judged by comparing the query list with the background population from which the query list is derived.
 
Which statistical method should I choose in the SEA tool?
When the input list is compared with a precomputed background, or is a subset of the reference list, choose the hypergeometric test or Fisher's exact test; use the latter only when your query list is quite small. When the input list has little or no overlap with the reference list, the chi-square test is more appropriate.
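As a rough illustration of the hypergeometric case, the sketch below computes the probability of seeing at least k annotated genes in a query list drawn from a background. The function name and parameter names are hypothetical and chosen only for this example; this is not agriGO's own implementation.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value, P(X >= k).

    k: genes in the query list annotated with the GO term
    n: size of the query list
    K: genes in the background annotated with the GO term
    N: size of the background
    (All names are illustrative assumptions, not agriGO parameters.)
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of every outcome at least as extreme as k.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy example: background of 10 genes, 4 annotated; query of 3 genes, 2 annotated.
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120
p = hypergeom_enrichment_p(2, 3, 4, 10)
```

A small p-value suggests the term is over-represented in the query list relative to the background.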
 
What is PAGE analysis?
PAGE is Parametric Analysis of Gene Set Enrichment [Kim et al. 2005, BMC Bioinformatics]. The PAGE method is based on the Central Limit Theorem in statistics, which makes it simple and efficient. Unlike SEA, it takes expression levels into account and can handle long lists of genes/probesets. PAGE computes a Z-score for each term and applies a two-tailed test; the p-value is calculated using R.
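The Z-score idea can be sketched as follows, assuming the form Z = (Sm - mu) * sqrt(m) / sigma, where mu and sigma are the mean and standard deviation of all expression values, Sm is the mean of the values in the gene set and m is the gene set size. agriGO computes the p-value in R; the Python version below, the function name, and the use of the population standard deviation are assumptions made only for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def page_z_score(all_values, set_values):
    """Return (Z, two-tailed p) for one gene set.

    all_values: expression values (e.g. log fold changes) of all genes
    set_values: expression values of the genes annotated with one GO term
    """
    mu = mean(all_values)
    sigma = pstdev(all_values)  # assumption: population standard deviation
    m = len(set_values)
    z = (mean(set_values) - mu) * sqrt(m) / sigma
    # Two-tailed p-value from the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


z, p = page_z_score([-2, -1, 0, 1, 2], [1, 2])
```

A positive Z-score indicates that the gene set is, on average, expressed above the overall mean, and a negative one that it is below.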
 
What is BLAST4ID?
BLAST4ID is not an analysis tool but a companion tool used mainly for two purposes: 1. converting IDs that agriGO does not support into supported ones; 2. using a BLAST search to annotate your sequences with GO terms.
 
How do I use the comparison tool?
A comparison tool for SEA results is provided as an optional tool. Users can upload a list of session IDs to run the comparison. For PAGE analysis, the comparison function is already integrated into the standard output.
 
How do you use the custom DAG drawer?
To draw a custom DAG tree, users need to input two columns of data. The first column contains the GO terms of interest and the second contains the p-values calculated by any statistical algorithm. The cut-off options filter out GO terms whose p-values are above a given value.
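For illustration only, the sketch below parses such two-column input and applies a p-value cut-off. It assumes whitespace- or tab-separated columns and that terms with p-values at or below the cut-off are kept; the function name and the example GO identifiers are hypothetical and do not describe agriGO's internal parser.

```python
def parse_go_pvalues(text, cutoff=0.05):
    """Parse 'GO term <whitespace> p-value' lines and keep terms with p <= cutoff."""
    terms = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        go_id, p_value = fields[0], float(fields[1])
        if p_value <= cutoff:
            terms[go_id] = p_value
    return terms


example_input = "GO:0006950\t0.001\nGO:0008150\t0.2\n\nGO:0009628 0.04"
kept = parse_go_pvalues(example_input, cutoff=0.05)
```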
 
How do you use custom Scatter Plots?
The input format and the cutoff meaning are the same as those for the custom DAG tree. The similarity option is used to filter GO terms whose relationships with other GO terms are always above the given value.
 
What does PVD mean?
The p-value distribution (PVD) plot compares the p-values of the significant GO terms for the current input with those obtained from random gene lists. The shape of the plot can help users recognize GO terms that appear significant because of background bias or the integrity level of the data.
 
How can I choose the appropriate reference from so many candidate choices?
The datatypes were ranked manually based on the breadth of GO terms, number of annotated genes, sources and versions. The most recommended datatype is selected by default, but users are free to choose others.
 
Which tool should I choose?
It depends on the data you have. If you only have a list of identifiers, or are only interested in the identifiers themselves, choose SEA. If you want to take expression data into account and compare several datasets, try PAGE. BLAST4ID is only a companion tool; use it when you need it.
 
Why does agriGO recommend multiple-test adjustment of the p-value?
In statistics, the multiple comparisons (or "multiple testing") problem occurs when one considers a set, or family, of statistical inferences simultaneously. A p-value is used to control the type I error rate in a single statistical test. Errors in inference, including confidence intervals that fail to include their corresponding population parameters, or hypothesis tests that incorrectly reject the null hypothesis, are more likely to occur when one considers the family as a whole. Several statistical techniques have been developed to prevent this from happening, allowing significance levels for single and multiple comparisons to be directly compared. These techniques generally require a stronger level of evidence to be observed in order for an individual comparison to be deemed "significant", so as to compensate for the number of inferences being made.
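A small numerical sketch shows why this matters. Assuming the tests are independent (a simplification, since GO terms overlap), the chance of at least one false positive among m tests at level alpha is 1 - (1 - alpha)^m. The function names below are illustrative only, and the Bonferroni threshold is shown as one simple, strict correction rather than as agriGO's default.

```python
def family_wise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / m


# With a single test the error rate is alpha; with 100 GO terms tested
# at alpha = 0.05 it is already above 0.99.
single = family_wise_error_rate(0.05, 1)
hundred = family_wise_error_rate(0.05, 100)
```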
 
What does the Flash bar chart mean and how do I use it?
In SEA results, each bar in the chart represents a percentage of genes. The input bar shows the number of genes mapped to a given GO term as a percentage of all genes in the input list; the background/reference bar is computed in the same way. In PAGE results, the bar may represent the Z-score or the mean value. To use the bar chart, simply compare the heights of the bars. In practice, custom selection and adjustment of the bar chart will help you produce a better-looking figure.
 
Why do graphical/chart images not display on my PC?
The bar chart requires Flash Player to display correctly. Please also check whether your browser is blocking Flash, since some ad-blocking software applies such a setting. You may also need different tools to view graphical results in different formats, for example Adobe Reader or an SVG viewer. Contact me if you have installed the relevant tool but still cannot see the results.
 
We found that the enriched functional categories changed when we used different values for Minimum number of mapping entries (Advanced options), for example, 1 or 5. Based on what is described online, the list from 1 should include the list from 5, correct? However, some categories in the list from 5 are not included in the list from 1. Although the P values are the same in both cases, the corrected P values are different. Do you have any comments on that? What should we do?
The option "Minimum number of mapping entries" means that a term is used in further analysis only if at least N genes map to it; in other words, only then is the term available. In the statistical calculation step, each term is computed independently, which is why the terms have the same P value in both analyses. However, since you used different values for "Minimum number of mapping entries", the total number of terms differs between the two runs, which means the number of statistical tests differs (as mentioned, one test per term). The number of tests is the key factor in multiple-test adjustment. The default statistical method should be fine if you use the precomputed background provided by agriGO. The multiple-test adjustment method is for you to choose, but here is a tip: these methods differ in stringency; for example, the default BY method is a strict one. If you would like more terms, use a looser method; otherwise try a stricter one.
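The effect of the number of tests can be seen in a minimal sketch of the Benjamini-Yekutieli (BY) adjustment. The function below is written only for this illustration, following the commonly described step-up procedure; it is not agriGO's code.

```python
def by_adjust(pvalues):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    n = len(pvalues)
    if n == 0:
        return []
    # Harmonic-sum correction factor that makes BY stricter than BH.
    c = sum(1.0 / i for i in range(1, n + 1))
    # Walk from the largest p-value to the smallest, keeping a running minimum.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * n * c / rank)
        adjusted[idx] = running_min
    return adjusted


# The same raw p-value (0.01) gets a larger adjusted value when more terms are tested.
one_term = by_adjust([0.01])
two_terms = by_adjust([0.01, 0.5])
```

Because lowering "Minimum number of mapping entries" admits more terms, each raw P value is adjusted against a larger number of tests, so a term that passes the cutoff with a setting of 5 may fail it with a setting of 1.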
 
I used 20000 probesets at 4 different time points in the PAGE analysis; however, no enriched GO terms were detected. Why?
The reason may be that too many GO terms were detected and subjected to multiple-test adjustment, so that the adjusted p-values were higher than the cutoff. You can use no adjustment, set a higher cutoff, use GO slim, or set a higher 'Minimum number of mapping entries'.
 
How many datatypes are supported by agriGO?
We currently support 394 species, including 865 datatypes. Please check the data statistics page for detailed information. We will continue adding more species and datatypes.
 
How does agriGO obtain its data?
Raw GO annotation data are generated by agriGO using BLAST, Pfam and InterProScan, or obtained from the B2G-FAR center or from Gene Ontology. Arabidopsis genome data is from TAIR. Rice TIGR genome data is from Rice Genome Annotation Project. Rice KOME data is from KOME database. Rice Gramene data is from Gramene center. Populus genome data is collected from JGI. Soybean and Sorghum genome data is compiled from Phytozome. Grape genome data is compiled from Genoscope. Medicago genome data is from Medicago truncatula sequencing resources. Maize genome data is from MaizeSequence.org. Castor bean genome data is from Castor Bean Genome Database. Brachypodium distachyon genome data is from Ensembl. Bovine genome data is from Bovine Genome Database. Silkworm genome data is from SilkDB. M. grisea genome data is from Magnaporthe grisea Database. Affymetrix CSV files and array sequences are from NetAffx.
 
How often does agriGO update?
Normally we update our database every 3 months, but we will also update agriGO whenever an important new data source becomes available. Improvements and updates to the agriGO tools are made on an irregular basis.
 
Can I check results from the old version using the new agriGO?
If you select the same species and datatype in agriGO and agriGO v2.0, you will get the same results. We have kept the background files from previous versions for existing users.
 
How can I ask agriGO to add a new species/datatype?
Several species, not limited to agricultural organisms, have been added to agriGO at users' request. Users can contact the agriGO administrator by email (ttian@cau.edu.cn) to discuss the details, and we will finish the addition within 24 hours.
 
Could you please explain how you define GO terms for each probe?
Both the Affymetrix microarray and soybean genome backgrounds are annotated with GO using InterProScan+BLAST (from B2G-FAR). Your submitted probes are annotated with GO using the precomputed probe=>GO dataset, no matter which background you choose. The difference between your analyses comes mainly from the different GO distributions in the two backgrounds, which may be caused by the number of entities, the annotation process, or the translation of probeset sequences to protein sequences, since GO annotation is normally performed on protein sequences. As there are more entities in the soybean genome background, this may be the reason for the larger number of enriched terms.
 
Given any GO term, is its relative abundance (the ratio of contained probes to total probes) the same or similar in the Affymetrix array and the soybean genome?
No. It should be, but it may not be; in other words, I cannot guarantee it. Here are some reasons: 1. The total number of entities differs between the two datasets. The microarray has fewer, and I do not know whether there is bias in the array design process. 2. The array was designed from ESTs or flcDNA, whereas the genome, which is used to predict genes and proteins, appeared later. 3. Probeset sequences in the array must be translated to protein sequences for GO annotation, and because they come from ESTs, this may introduce differences. In short, I cannot say that the abundance is the same or similar. But if you run a microarray experiment and find an interesting batch of genes, using the genome as the background is reasonable. After all, the purpose of the array is to monitor changes in genes, and it does that, doesn't it?
 
What is your lab's focus?

Zhen Su's Lab: We employ bioinformatics, comparative genomics, functional genomics, and systems biology approaches to study some important issues in plant biology. Other databases and web services provided by our lab include:
agriGO v1.2
EasyGO (Gene Ontology enrichment analysis tool)
PlantGSEA (Plant GeneSet Enrichment Analysis Toolkit)