User manual, GOTaxExplorer, Version 1.0
The "genes" and the "completed_genomes" databases are constructed and updated
with a set of Java(TM) programs. All of these programs read a
configuration file that specifies the database details and the data files. The
possible parameters are:
- url
The database uniform resource locator (url).
- user
The username for the database.
- password
The password for the database access.
- driver
The database driver class. This class has to be
on the classpath; default:
com.mysql.jdbc.Driver.
- bp-graph
The file with the WilmaScope graph for biological
process. The absolute path is required; default: ./bp.xwg.
- mf-graph
The file with the WilmaScope graph for molecular
function. The absolute path is required; default: ./mf.xwg.
- cc-graph
The file with the WilmaScope graph for cellular component.
The absolute path is required; default: ./cc.xwg.
- tax-graph
The file with the WilmaScope graph for the taxonomy.
The absolute path is required; default: ./tax.xwg.
- bp-tree
The file with the tree representation for biological
process. The absolute path is required; default: ./bp.root.
- mf-tree
The file with the tree representation for molecular
function. The absolute path is required; default: ./mf.root.
- cc-tree
The file with the tree representation for cellular
component. The absolute path is required; default: ./cc.root.
- tax-tree
The file with the tree representation for the taxonomy.
The absolute path is required; default: ./tax.root.
- completed
The absolute path to the file with the list of
completely sequenced genomes.
- term
The absolute path to the file with the GO term definitions.
The file should be formatted according to the syntax in the term.txt file
from the monthly GO snapshot.
- term2term
The absolute path to the file with the GO edges. The
file should be formatted according to the syntax in the term2term.txt file
from the monthly GO snapshot.
- graph-path
The absolute path to the file with the contents of
the graph_path table. The file should be formatted according to the syntax
in the graph_path.txt file from the monthly GO snapshot.
- pfam
The absolute path to the file with the Pfam entries. The
file needs to be in Pfam format.
- pfam2go
The absolute path to the file with the mappings from
Pfam to GO. The file should be formatted according to the syntax in the
pfam2go file available from the GO website.
- pfam-domains
The absolute path to the file with details on Pfam
hits for Swiss-Prot proteins. The file should be formatted according to the
syntax in the swisspfam file from the Pfam website.
- smart
The absolute path to the InterPro XML file.
- smart2go
The absolute path to the file with the mappings from
SMART to GO. The file should be formatted according to the syntax in the
smart2go file available from the GO website.
- taxa
The absolute path to the file with the node and edge
definitions for the taxonomic tree. The file should be formatted according to
the syntax in the nodes.dmp file from the NCBI Taxonomy.
- taxa-names
The absolute path to the file with the names of
the nodes in the taxonomic tree. The file should be formatted according to
the syntax in the names.dmp file from the NCBI Taxonomy.
- db
The name of the gene product database that is to be added. The
name of this database serves as the key for the absolute path to the
data file.
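As an illustration of the input expected by the taxa option, the following Java sketch reads one row in the nodes.dmp layout. It relies on the NCBI Taxonomy dump convention that columns are separated by a tab, a vertical bar and a tab, and that each row ends with a tab and a vertical bar. The class and method names are hypothetical and are not part of the GOTaxExplorer programs; only the first three columns (node id, parent id, rank) are read, and the sample row is purely illustrative.

```java
public class NodesDmpSketch {

    /** One node of the taxonomic tree: its id, its parent's id and its rank. */
    static final class TaxNode {
        final int taxId;
        final int parentTaxId;
        final String rank;

        TaxNode(int taxId, int parentTaxId, String rank) {
            this.taxId = taxId;
            this.parentTaxId = parentTaxId;
            this.rank = rank;
        }
    }

    /** Parses the first three columns of one nodes.dmp row. */
    static TaxNode parseLine(String line) {
        // Each row ends with "\t|"; drop that terminator before splitting.
        String row = line.endsWith("\t|") ? line.substring(0, line.length() - 2) : line;
        // Columns are separated by "\t|\t".
        String[] fields = row.split("\t\\|\t");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 columns: " + line);
        }
        return new TaxNode(
                Integer.parseInt(fields[0].trim()),
                Integer.parseInt(fields[1].trim()),
                fields[2].trim());
    }

    public static void main(String[] args) {
        // Illustrative row; a real file contains further columns after the rank.
        TaxNode n = parseLine("9606\t|\t9605\t|\tspecies\t|\tHS\t|\t5\t|");
        System.out.println(n.taxId + " -> " + n.parentTaxId + " (" + n.rank + ")");
    }
}
```

The parent id of each row is what defines the edges of the taxonomic tree; the names for the node ids come from the separate names.dmp file.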
All of these options can also be used as command line parameters that override
the configuration file. The same exceptions as for GOTaxExplorer apply here. The
syntax of the configuration file is the same as for GOTaxExplorer files (see
Section ).
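The precedence described above (built-in default, then configuration file, then command line) can be sketched in Java as follows. This is only an illustration under stated assumptions: it assumes that the configuration file consists of key=value lines that java.util.Properties can read, and that command line options are written as --key=value. Neither syntax is confirmed in this section, and the class name, method names, database URL and user names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class ConfigSketch {

    /** A few of the built-in defaults from the parameter list. */
    static Properties defaults() {
        Properties d = new Properties();
        d.setProperty("driver", "com.mysql.jdbc.Driver");
        d.setProperty("bp-graph", "./bp.xwg");
        d.setProperty("tax-tree", "./tax.root");
        return d;
    }

    /** Reads key=value lines (assumed syntax) on top of the defaults. */
    static Properties load(Reader configFile) throws IOException {
        Properties p = new Properties(defaults());
        p.load(configFile);
        return p;
    }

    /** Applies --key=value arguments (assumed syntax); these override the file. */
    static void applyCommandLine(Properties p, String[] args) {
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("--") && eq > 2) {
                p.setProperty(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the contents of a configuration file.
        String file = "url=jdbc:mysql://localhost/genes\nuser=reader\n";
        Properties p = load(new StringReader(file));
        applyCommandLine(p, new String[] {"--user=admin"});
        System.out.println(p.getProperty("user"));   // value given on the command line
        System.out.println(p.getProperty("url"));    // value from the configuration file
        System.out.println(p.getProperty("driver")); // built-in default
    }
}
```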
There is a separate update program for every source database to allow for an
incremental update after the sources have been updated. The following programs
are available:
- BuildGraphs
Builds the WilmaScope graphs for the
three ontologies and the taxonomy.
- BuildTrees
Builds the tree representations for the
three ontologies and the taxonomy.
- CalculateGOSimilarities
Calculates the GO term
probabilities and the three similarity measures for all possible GO term pairs.
Similarities that are zero are not written to the database.
- CreateCompletedGenomesDB
Uses a file with the NCBI
taxonomy accession numbers of completely sequenced species to
generate the "completed_genomes" database.
- ParseGo
Parses the GO data files and fills the GO
tables in the "genes" database.
- ParsePfam2Go
Parses the "pfam2go" file and fills the
PFAM2GO table in the "genes" database.
- ParsePfam
Parses the Pfam data file and fills the
Pfam table in the "genes" database.
- ParsePfamDomains
Parses the "swisspfam" file and fills the
PFAM_HIT2GENE_REGION table in the "genes" database.
- ParseSmart2Go
Parses the "smart2go" file and fills the
SMART2GO table in the "genes" database.
- ParseTaxonomy
Parses the Taxonomy data files and fills the
Taxonomy tables in the "genes" database.
- ParseUniProt
Parses the UniProt data files and fills the
GENE tables in the "genes" database.
- UpdateCompletedGenomesDB
Adds data from new species
to the "completed_genomes" database.
- UpdateGo
Updates the GO tables in the "genes" database.
- UpdatePfam2Go
Updates the PFAM2GO table in the "genes"
database.
- UpdatePfam
Updates the Pfam table in the "genes" database.
- UpdatePfamDomains
Updates the PFAM_HIT2GENE_REGION table
in the "genes" database.
- UpdateSmart2Go
Updates the SMART2GO table in the
"genes" database.
- UpdateTaxonomy
Updates the Taxonomy tables in the "genes"
database.
- UpdateUniProt
Updates the GENE tables in the "genes"
database.
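The kind of calculation performed by CalculateGOSimilarities can be sketched as follows. This is an assumption-laden illustration, not the program's actual code: the probability of a term is taken to be the number of annotations to the term or its descendants divided by the number of annotations to the root, and Resnik's measure (the negative logarithm of the probability of the most informative common ancestor) is used purely as a well-known stand-in, since this section does not name the measures. The term names, counts and all identifiers are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SimilaritySketch {

    /** p(t) = cumulative annotation count of t / cumulative count of the root. */
    static Map<String, Double> probabilities(Map<String, Integer> cumulativeCounts, String root) {
        double total = cumulativeCounts.get(root);
        Map<String, Double> p = new HashMap<>();
        for (Map.Entry<String, Integer> e : cumulativeCounts.entrySet()) {
            p.put(e.getKey(), e.getValue() / total);
        }
        return p;
    }

    /** Stand-in measure: -ln p of the least probable common ancestor. */
    static double resnik(Set<String> commonAncestors, Map<String, Double> probability) {
        double best = 0.0;
        for (String ancestor : commonAncestors) {
            best = Math.max(best, -Math.log(probability.get(ancestor)));
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("root", 100);
        counts.put("binding", 40);
        Map<String, Double> p = probabilities(counts, "root");

        // Two term pairs, each described by the set of their common ancestors.
        double sharedBinding = resnik(Set.of("root", "binding"), p);
        double sharedRootOnly = resnik(Set.of("root"), p);

        // Only non-zero similarities would be stored.
        for (double sim : new double[] {sharedBinding, sharedRootOnly}) {
            if (sim != 0.0) {
                System.out.println("store " + sim);
            } else {
                System.out.println("skip zero similarity");
            }
        }
    }
}
```

Under this stand-in measure, pairs whose only common ancestor is the root have a similarity of zero, so skipping zero values keeps such pairs out of the database.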