2. Starting GenomeComp
Chapter Contents
2.1 Running GenomeComp
2.2 Configuration Window
2.3 Main Window
2.3.1 Overview
2.3.1.1 Color representation
2.3.1.2 Fixed length and color
2.3.1.3 Miscellaneous
2.3.1.3.1 Figure scale
2.3.1.3.2 Length filter
2.3.1.3.3 Island color
2.3.1.3.4 ORFs color
2.3.2 The File Menu
2.3.2.1 Reset
2.3.2.2 Start
2.3.2.3 Save setting
2.3.2.4 Load DEFAULT
2.3.2.5 Save
2.3.2.6 Exit
2.3.3 The Options Menu
2.3.3.1 Configure
2.3.3.2 Compare type
2.3.3.2.1 Self-compare
2.3.3.2.2 Two sequences
2.3.3.2.3 Three sequences
2.3.3.3 Input format
2.3.3.3.1 Fasta format sequence
2.3.3.3.2 Genbank format sequence
2.3.3.3.3 EMBL format sequence
2.3.3.3.4 BLAST output file
2.3.3.4 Anchor style
2.3.3.4.1 Left
2.3.3.4.2 Center
2.3.3.4.3 Right
2.3.3.4.4 Custom
2.3.4 The Search Menu
2.3.4.1 Find ORF
2.3.4.2 Find Next
2.3.4.3 Go to Position
2.3.5 The List Menu
2.3.5.1 Length setting
2.3.5.1.1 All length
2.3.5.1.2 Above 50 bp
2.3.5.1.3 Above 1 Kbp
2.3.5.1.4 Above 5 Kbp
2.3.5.1.5 Custom
2.3.5.2 Specific regions
2.3.6 The Hide Menu
2.3.6.1 Nothing
2.3.6.2 Control panel (left)
2.3.6.3 Display panel (right)
2.3.7 The View Menu
2.3.7.1 Title
2.3.7.2 ORFs
2.3.7.3 ORFs text
2.3.7.4 Ruler
2.3.7.5 Ruler text
2.3.7.6 Island bar
2.3.8 The Help Menu
2.3.8.1 Topic
2.3.8.2 About
2.4 Some Pop-up Windows
2.4.1 Confirm sequence name and length, Select program and parameters Window
2.4.2 The Detail Information Window
2.4.3 Specific regions list Window
2.4.4 The Save Window
2.1 Running GenomeComp
On Unix/Linux systems, run GenomeComp by executing the main program, "GenomeComp1.3_XXX", with its path on the command line:
> /your_install_path/GenomeComp1.3/GenomeComp1.3_XXX
If the main program does not work, you can run the source code of the program instead:
> perl /your_install_path/GenomeComp1.3/GenomeComp1.3
If it still does not work, reread the previous part of this manual carefully and make sure that every system requirement is met. Users of the binaries should make sure they have downloaded the package that suits their operating system.
On Windows systems with Perl and the Tk module installed correctly, GenomeComp can be started by double-clicking the "GenomeComp1.3.pl" icon or any shortcut to it.
If all goes well, you will be presented with the GenomeComp Configuration Window and Main Window.
2.2 Configuration Window
The configuration window is presented first when GenomeComp starts. It lets you locate some external programs on your system and define the working directory and project name for the comparison that follows.
GenomeComp needs to run the external program BLAST (Altschul et al., 1997) to perform the comparison if the Input format is Fasta, GenBank or EMBL. Recent versions of the stand-alone BLAST programs (binaries for different systems) are available on the anonymous FTP server of NCBI; all three programs, 'blastall', 'megablast' and 'formatdb', are in this package. If an entry is blank, GenomeComp could not automatically find that program in your system path, and you will have to locate it manually by clicking the "Browse..." button.
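The automatic lookup described above can be approximated outside GenomeComp to check in advance which entries are likely to be blank. The following is a minimal sketch, not GenomeComp's own code: the function names `locate_programs` and `missing_programs` are hypothetical, and it assumes the lookup is an ordinary search of the system path.

```python
import shutil

# Program names are taken from the manual; the lookup strategy is an assumption.
BLAST_PROGRAMS = ("blastall", "megablast", "formatdb")


def locate_programs(names=BLAST_PROGRAMS, search_path=None):
    """Map each program name to its full path, or None if it is not found.

    search_path=None means "use the PATH environment variable".
    """
    return {name: shutil.which(name, path=search_path) for name in names}


def missing_programs(located):
    """Return the names that were not found and would need "Browse..."."""
    return [name for name, path in located.items() if path is None]


if __name__ == "__main__":
    found = locate_programs()
    for name, path in found.items():
        print(f"{name}: {path or '(blank, locate manually)'}")
```

Any program reported as missing here would probably have to be located by hand in the configuration window.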
The "Working environment" options in the configuration window are important because they define the name and location of GenomeComp's temporary files, as well as the default name and path for saving the result files.
Click the "OK" button when you have finished the configuration. The settings will then be saved and the configuration window will be hidden automatically.
2.3 Main Window
2.3.1 Overview
At the top of the main window is the menu bar for GenomeComp (described later on this page). The main window can be divided into two main parts, the control panel and the display panel. The former is the left part of the main window and contains some custom configuration options; the latter is the right part. The display panel has two components: the top canvas shows the dynamic comparison result, and the bottom message window displays brief information about the items selected in the canvas.
At the top of the control panel are three input entries for locating the input files. The number of active entries changes automatically according to the Compare type and Input format selected in the Options menu.
In the middle of the control panel are three groups of figure options in a sunken frame. These settings determine how the dynamic comparison result is shown in the top canvas of the display panel.
- Color representation: Lets users assign different colors to different length ranges of comparison matches. Both the length ranges and the colors can be user-defined. Edit the values in the entries to customize the length ranges, or click the "Set..." buttons to select another color.
- Fixed length and color: Lets users define specific colors for matches of certain fixed lengths (not a range). This can be useful when users are especially interested in matches of particular lengths.
- Miscellaneous: Contains some unclassified but important options.
  - Figure scale: A scale bar that can be adjusted by dragging with the mouse to set a value from 5 to 500. Use a small scale for a detailed graphical comparison result and a large scale for a global view. If the scale is larger than 20, the names of ORFs are not displayed. The comparison figure in the canvas can be zoomed easily with this scale.
  - Length filter: A grooved frame for setting the range of match lengths to be displayed in the canvas. The first entry is the lower limit and the second is the upper limit (an empty value means infinite). After the comparison figure has been displayed in the canvas, users can change the limits; the new setting takes effect only after clicking the 'Apply' button.
  - Island color: Click the color region to set the color that represents the unmatched regions in the sequences. The change takes effect immediately in the canvas.
  - ORFs color: Click the color region to set the color that represents the ORFs in the sequences (only when the Input format is GenBank or EMBL). The change takes effect immediately in the canvas.
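The interplay between the length filter, the color ranges and the fixed-length colors can be illustrated with a short sketch. It is only an illustration under stated assumptions: the ranges, colors and function names below are hypothetical, the range bounds are assumed to be inclusive at the lower end, and fixed lengths are assumed to take precedence over ranges, which the manual does not specify.

```python
# Hypothetical length ranges and colors; the real values are set in the control panel.
COLOR_RANGES = [
    (0, 100, "blue"),
    (100, 1000, "green"),
    (1000, None, "red"),  # None: no upper limit
]
FIXED_COLORS = {500: "orange"}  # exact match length -> color


def passes_filter(length, lower=None, upper=None):
    """Length filter: an empty (None) limit is treated as unbounded."""
    if lower is not None and length < lower:
        return False
    if upper is not None and length > upper:
        return False
    return True


def color_for(length, ranges=COLOR_RANGES, fixed=FIXED_COLORS):
    """Pick a display color for a match of the given length.

    Fixed lengths are checked first (an assumption), then the ranges.
    Returns None if no range covers the length.
    """
    if length in fixed:
        return fixed[length]
    for low, high, color in ranges:
        if length >= low and (high is None or length < high):
            return color
    return None


if __name__ == "__main__":
    for match_length in (50, 500, 2500):
        if passes_filter(match_length, lower=60, upper=None):
            print(match_length, color_for(match_length))
```

With a lower limit of 60 and no upper limit, the 50 bp match is filtered out and the other two are drawn in their assigned colors.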
At the bottom of the control panel are three function buttons. They are equivalents of commands in the File menu and are placed here for convenience.
2.3.2 The File Menu
- Reset: Clears all user modifications in the GenomeComp main window and resets all options to their default values. This command helps users erase erroneous settings, but it should be used carefully because the operation is irreversible. Its equivalent is the button of the same name in the main window, and its keyboard shortcut is "Ctrl+R".
- Start: Starts a comparison project and displays the result graphically and dynamically in the canvas at the top right of the main window. Its equivalent is the button of the same name in the main window, and its keyboard shortcut is "Ctrl+T".
- Save setting: Saves the current settings into the '.ibprc' file in the user's home directory, so that they are loaded automatically whenever GenomeComp runs.
- Load DEFAULT: Replaces all current settings with the program's preset options (the options in effect when the user first ran the program). This can be helpful when some settings have unexpectedly become invalid.
- Save: Saves the graphical comparison result in the canvas to a local file in PostScript format. For more details about this command, see the section The Save Window. Its keyboard shortcut is "Ctrl+S".
- Exit: Exits the GenomeComp program. Its equivalent is the "Quit" button in the main window, and its keyboard shortcut is "Ctrl+X".
2.3.3 The Options Menu
- Configure: Reopens the configuration window to view or edit the earlier settings. Its keyboard shortcut is "Ctrl+G".
- Compare type: A cascade menu. Users should select one of the three choices according to their project.
  - Self-compare: The user provides a single sequence as input, and GenomeComp performs a self-comparison of it. This command is helpful for discovering structural features, such as repeat sequences, in the given sequence.
  - Two sequences: The default choice. Two sequences are given as input and compared with each other.
  - Three sequences: An extended function for multi-genome comparison. Users input three sequences, and GenomeComp automatically compares the reference sequence with each of the other two, then displays all the results in the canvas synchronously. See the section Three Sequences Comparison for more details.
- Input format:
  - Fasta format sequence: Inputs in this format do not contain any annotation information for the sequences, so comparison results based on them are less informative.
  - Genbank format sequence: The default format for input files, because it is now widely used.
  - EMBL format sequence: Almost as good as the default format.
  - BLAST output file: A different kind of input. It is recommended only for users who cannot run BLAST locally and must use an existing BLAST output file directly, or for users who do not need the annotation information.
- Anchor style:
  - Left: The common choice for input sequences of similar length. The two or three input sequences are aligned at their start (i.e. the left side of the canvas).
  - Center: The default value. The sequences are aligned at their centers, so the center of every sequence lies on a vertical line in the center of the canvas.
  - Right: Should be used only for specific purposes. The ends of all input sequences are aligned on a vertical line at the right end of the canvas.
  - Custom: Lets users manually specify a display position adjustment for each input sequence, so that any regions in the sequences can be anchored at the same position for better visualization. Selecting this option pops up a dialog window for entering the adjustment values.
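The four anchor styles amount to different horizontal offsets for each sequence. The sketch below shows one way such offsets could be computed; it is an illustration only, the function name `anchor_offsets` is hypothetical, and offsets are expressed in sequence units relative to the longest sequence rather than in canvas pixels.

```python
def anchor_offsets(lengths, style="center", custom=None):
    """Return the start offset of each sequence, in sequence units.

    lengths: list of sequence lengths.
    style:   "left", "center", "right" or "custom".
    custom:  per-sequence adjustments, used only when style is "custom".
    """
    longest = max(lengths)
    if style == "left":
        # All sequences start at the left edge.
        return [0 for _ in lengths]
    if style == "center":
        # Midpoints of all sequences share one vertical line.
        return [(longest - n) / 2 for n in lengths]
    if style == "right":
        # Ends of all sequences share one vertical line.
        return [longest - n for n in lengths]
    if style == "custom":
        if custom is None or len(custom) != len(lengths):
            raise ValueError("custom needs one adjustment per sequence")
        return list(custom)
    raise ValueError(f"unknown anchor style: {style}")


if __name__ == "__main__":
    for style in ("left", "center", "right"):
        print(style, anchor_offsets([1000, 600], style))
```

For two sequences of 1000 and 600 units, the shorter one is shifted by 0, 200 or 400 units for the left, center and right styles respectively.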
2.3.4 The Search Menu
- Find ORF: Searches for genes by name in the comparison result displayed in the canvas. Its keyboard shortcut is "Ctrl+F".
- Find Next: A convenient command for locating the next ORF with the same name. Its equivalent is the "Next" button in the "Find ORF" pop-up window, and its keyboard shortcut is the function key "F3".
- Go to Position: Moves the scroll bar of the canvas quickly to a given sequence position. This is very useful when the sequences are very long. Its keyboard shortcut is "Ctrl+P".
2.3.5 The List Menu
- Length setting:
  - All length: Makes GenomeComp report specific regions of every length in the specific regions list window.
  - Above 50 bp: Makes GenomeComp report only specific regions longer than 50 base pairs (default value).
  - Above 1 Kbp: Makes GenomeComp report only specific regions longer than 1 kilobase pair.
  - Above 5 Kbp: Makes GenomeComp report only specific regions longer than 5 kilobase pairs.
  - Custom: Pops up a dialog window in which users can set a custom threshold for the specific regions list.
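The effect of the length threshold can be sketched as follows. The manual does not state how specific regions are computed, so this example assumes, purely for illustration, that a specific region is a stretch of a sequence not covered by any match; both function names are hypothetical, and coordinates are taken to be 1-based and inclusive.

```python
def unmatched_regions(seq_length, matches):
    """Return (start, end) regions of a sequence not covered by any match.

    matches: iterable of (start, end) pairs, 1-based and inclusive.
    Treating uncovered stretches as "specific regions" is an assumption.
    """
    regions = []
    position = 1  # first position not yet known to be covered
    for start, end in sorted(matches):
        if start > position:
            regions.append((position, start - 1))
        position = max(position, end + 1)
    if position <= seq_length:
        regions.append((position, seq_length))
    return regions


def filter_by_length(regions, threshold=50):
    """Keep regions strictly longer than threshold bp; None keeps all lengths."""
    if threshold is None:
        return list(regions)
    return [(s, e) for s, e in regions if e - s + 1 > threshold]


if __name__ == "__main__":
    regions = unmatched_regions(1000, [(101, 300), (250, 400), (421, 900)])
    print(filter_by_length(regions, threshold=50))
```

Here `threshold=None` plays the role of "All length", 50 corresponds to "Above 50 bp", and any other number to a "Custom" threshold.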
2.3.6 The Hide Menu
This is a cascade menu for controlling the display mode of the main window.
- Nothing: The default mode when GenomeComp starts up; both the control panel and the display panel are shown. Its keyboard shortcut is "Ctrl+N".
- Control panel (left): Hides the control panel on the left of the main window, so that only the display panel is shown. This gives the canvas in the display panel the largest possible viewing area. Its keyboard shortcut is "Ctrl+L".
- Display panel (right): Hides the display panel on the right of the main window, so that only the control panel is shown. Its keyboard shortcut is "Ctrl+D".
2.3.7 The View Menu
This is a cascade menu for choosing the items displayed in the canvas (all of them are displayed by default). Unmarking an item removes the corresponding elements from the canvas.
- Title: The sequence names at the left start of the canvas. When the figure is zoomed out, the title text may be covered by other elements, so removing the titles can be useful if you do not want to redraw the figure.
- ORFs: The arrows that represent the ORFs in the sequences (available only with GenBank or EMBL format inputs). Unmarking this item automatically unmarks the 'ORFs text' item below.
- ORFs text: The names displayed above or below the ORF arrows. Marking this item automatically marks the 'ORFs' item above.
- Ruler: The black vertical lines that mark the sequence length. Unmarking this item automatically unmarks the 'Ruler text' item below.
- Ruler text: The numbers above or below the ruler. Marking this item automatically marks the 'Ruler' item above.
- Island bar: The colored bar representing the matches in each sequence.
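The dependencies between the View menu items can be modeled as a small state update. This is a sketch rather than GenomeComp's implementation: the names `PARENT_OF`, `new_view_state` and `set_item` are hypothetical, and it assumes that 'Ruler text' depends on 'Ruler' in the same way that 'ORFs text' depends on 'ORFs'.

```python
# Each text item depends on its parent item (assumed symmetric for ORFs and Ruler).
PARENT_OF = {"ORFs text": "ORFs", "Ruler text": "Ruler"}
ITEMS = ("Title", "ORFs", "ORFs text", "Ruler", "Ruler text", "Island bar")


def new_view_state():
    """All items are displayed by default."""
    return {item: True for item in ITEMS}


def set_item(state, item, marked):
    """Mark or unmark one item and return an updated copy of the state."""
    if item not in state:
        raise KeyError(item)
    updated = dict(state)
    updated[item] = marked
    if marked and item in PARENT_OF:
        # Marking a text item also marks the item it labels.
        updated[PARENT_OF[item]] = True
    if not marked:
        # Unmarking an item also unmarks the text that labels it.
        for child, parent in PARENT_OF.items():
            if parent == item:
                updated[child] = False
    return updated


if __name__ == "__main__":
    state = set_item(new_view_state(), "ORFs", False)
    print(state["ORFs"], state["ORFs text"])
```

Returning a copy instead of modifying the state in place keeps each menu action easy to test in isolation.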
2.4 Some Pop-up Windows
While using GenomeComp, users will encounter pop-up dialog windows for entering values or confirming actions. Warning or error message windows also appear if GenomeComp runs into a problem; in that case, please check your operations according to the messages.
2.4.1 Confirm sequence name and length, Select program and parameters Window
This window pops up when users click the "Start" button in the main window or execute the "Start" command in the File menu with input sequences in Fasta, GenBank or EMBL format.
In the left frame of the window, GenomeComp obtains the name and length of each input sequence automatically. The sequence name is only a label that GenomeComp uses and displays in the canvas to distinguish the sequences from one another, so you can change it as you wish. The sequence length, however, is very important for the graphical display of the result; if you disagree with the length counted by GenomeComp, make sure the value you enter is correct.
In the top right frame, users should select one of the external programs to carry out the sequence comparison. The default is "megablast", which is suitable for genome-wide sequences (for example, several megabase pairs long). Make sure you have specified the location of the chosen program in the configuration window.
In the bottom right frame, users can manually set parameters for running the external program chosen above. Some common values have been preassigned; please do not change them unless you are familiar with these parameters.
After finishing the settings, click the "OK" button to continue with the comparison, or click "Cancel" to return to the main window without running any comparison. The "Reset" button clears all user modifications in this window and resets all options to their default values.
2.4.2 The Detail Information Window
When users view the comparison result in the canvas and move the mouse over a significant part, the comparison or sequence annotation information is briefly reported in the message window below the canvas. The pairwise sequence comparison results, or detailed information about ORFs, can be opened in a pop-up window by clicking.
2.4.3 Specific regions list Window
Invoking the Specific regions command from the List menu pops up this list window, which lists the specific regions of both sequences separately in the list box. Note that this window is not available for self-comparison. For a three-sequence comparison, the program shows the two comparison results in the same window.
The number of reported specific regions in each input sequence and the current threshold setting are displayed above the list box. Users can quickly locate a listed region by double-clicking it in the list box.
2.4.4 The Save Window
After finishing a sequence comparison, users may want to save the graphical comparison result to a local file for further analysis or other purposes. The Save command in the File menu serves this purpose: the Save window pops up and lets users specify the file name and location, and save either the whole figure in the canvas or only a certain part of it.
The saved file is in PostScript format, which can be viewed using the free Ghostscript program from Aladdin Inc. Note that the figure in this file is a static graph, without the dynamic display available in the canvas of the GenomeComp main window. It can nevertheless be very useful when you want to present your comparison results in publications.