The purpose of this document is to introduce you to using syntom to examine genomic assemblies using the Consed software package.
To obtain an account on syntom, please contact Andreas Matern (alm13@cornell.edu).
You can log onto syntom from any computer which has an internet connection. I'll provide examples for logging on using a Windows NT machine. From the Start menu, select Run.

Enter telnet syntom.cit.cornell.edu in the Open field and hit OK. If all goes well, the following screen should appear:

Enter your login name at the prompt and hit enter, and then enter your password and hit enter.
The window will now tell you where your last login was from and then display a friendly message :-)
You are now logged into syntom.
Command               | Function
ls                    | lists all the files in the directory
cd                    | puts you in your home directory
cd /path/to/directory | changes your directory
exit                  | logs out
cp file1 file2        | copies file1 to file2
mv file1 file2        | moves file1 to file2
cd ..                 | changes one directory up the directory tree
pwd                   | prints the working directory
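A short practice session with the commands in the table might look like the sketch below. It works in a throwaway directory created with mktemp, so the directory and the file names (file1, file2, file3) are made up for illustration and are not files that exist on syntom.

```shell
# Practice run of the basic commands in a scratch directory.
# All file names here are illustrative assumptions.
scratch=$(mktemp -d)        # create a throwaway directory
cd "$scratch"               # change into it
pwd                         # print the working directory
echo "hello" > file1        # make a small file to play with
cp file1 file2              # copy file1 to file2 (both now exist)
mv file2 file3              # move (rename) file2 to file3
ls                          # lists file1 and file3
cd ..                       # go one directory up the tree
```

Note that cp leaves the original in place, while mv does not: after the session above, file2 no longer exists.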
These are just some basic commands. There are many more -- I have some Linux books on my shelf in Theresa's office, feel free to check them out but please don't steal them....
Consed uses a graphical interface which requires an X-windows emulator. We'll use Exceed (which is installed on most of the machines in G-04). To start Exceed on your machine, go to the Start menu -> Programs -> Exceed -> Exceed. A Hummingbird picture should open and then disappear. You now need to tell syntom the address of your machine so it can draw the windows on the appropriate machine. The easiest way to figure out what your address is, is to log out of your current telnet session to syntom (type exit) and then log back in (Run -> telnet syntom.cit.cornell.edu). After the Password: field there should be a line which says:
Last login: <date> <time> from genomics8.cit.cornell.edu   (<- this will be the address)
That's the address (host name) of the machine you are using.
To tell syntom where to display the windows, simply type:
export DISPLAY=genomics8.cit.cornell.edu:0
Don't forget to append the colon zero (:0) !
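If you tend to forget the colon zero, a small helper can build the value for you. This is only a sketch: display_for_host is a hypothetical function name, not something provided by syntom or Exceed, and the host name is the example one from above.

```shell
# Build the DISPLAY value for a given client machine name.
# display_for_host is a hypothetical helper written for this example.
display_for_host() {
    # $1 = host name or address of the machine running Exceed
    printf '%s:0\n' "$1"
}

# Substitute the name shown on your own "Last login" line.
DISPLAY=$(display_for_host genomics8.cit.cornell.edu)
export DISPLAY
echo "$DISPLAY"
```

The setting only lasts for the current login session, so it has to be repeated each time you log in.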
The BAC19 consed information is here: /home/amatern/sequences/bac19new/
To get to it, type cd /home/amatern/sequences/bac19new
To see what files are there, type ls
You'll see three directories:
chromat_dir | the chromatograms
phd_dir     | the phd files generated by phred
edit_dir    | this is the directory you'll be working in
To get to edit_dir, simply type cd edit_dir
Typing ls gives you a list of all the files that are there:

To start consed, simply type consed_linux
If all goes well, there should be a consed window on your screen!
Here is the documentation from consed, I've added a little bit of information, but for the most part, this is the documentation that comes with the program. There are a couple of features of consed that are not yet implemented on syntom. When you find something that doesn't work, e-mail me and I'll get it working....
CONTENTS:
WHAT IS NEW IN CONSED 9.0
QUICK TOUR OF CONSED
ADVANCED PHRAP/CONSED USAGE
INSTALLING CONSED
NOTE TO SGI USERS
FOR PROGRAMMERS AND FELLOW TRAVELLERS ONLY
MONITORS AND MICE FOR CONSED
PRIMER PICKING PARAMETERS
AUTOFINISH PARAMETERS
NEW ACE FILE FORMAT
WHAT THE COLORS MEAN
------------------------------------------------------------------------
WHAT IS NEW IN CONSED 9.0
This section is mainly intended for advanced consed users. Novice users should consult the Quick Tour (below).
---------------------------------------------------------------------
Note to Linux users: The 'scroll all traces' bug is fixed.
Note to Solaris 2.7 users: Consed now works on this version of Solaris.
---------------------------------------------------------------------
Autofinish Improvements
* Reverses (universal primer reverse reads) are now suggested in order to close gaps and improve low quality regions in addition to flanking gaps.
* Autofinish now evaluates itself--after you do the reads it suggests, you can run it and it will tell you how well the reads solved the problems they were supposed to solve.
* Oligos are tagged (when you use -doExperiments).
* doNotFinish tags can be used to tell autofinish to not try to finish particular regions.
* There are many more flags, allowing you a great amount of control over autofinish. For example, if you wanted the first round of autofinish to not choose any custom oligo experiments, fine. If you wanted autofinish to only close gaps and not improve the error rate within contigs, fine.
* The autofinish output is very detailed and verbose. Thus in addition there are 3 summary lists of experiments to do (one file for forward universal primer experiments, one file for reverse universal primer experiments, and one file for custom oligo walks). These summary files are easily imported into Excel. You can use the last one to email order oligos.
---------------------------------------------------------------------
Consed:
* Consed already had the ability to tear a contig into 2 and join 2 contigs into one. Now it also has the ability to move a single read to a different location within an assembly. Now you have much better control in fixing a misassembly.
* You used to be able to compare a contig to one other contig. Now you can compare a contig to many other contigs.
* For sites with LONG, LONG read names: you can now customize how much space consed saves for displaying read names. You can also customize the initial size of the important windows.
* In the Traces Window, you can move left and right with the arrow keys.
* In the Aligned Reads Window, you can instantly move to the beginning and/or end of a read. Similarly, you can move instantly to the beginning and/or end of the consensus.
* The ABI base calls can be hidden, if you like, thus allowing you to see more traces at once.
* All documentation windows can be searched and printed out.
* In the past you could see all tags of a particular type for a particular contig. Now there is also a function to see all tags of a particular type in any contig.
* You can now write all contigs to a file in FASTA format with a single click.
* You can navigate to multiple locations while staying in the Aligned Reads Window--you don't have to switch windows with each location.
* For the primers that consed picks, consed will show you the alignment of the closest false match. This will help you in deciding if you want to raise consed.primersMaxMatchElsewhereScore
* The template picking part of primer picking has been further improved:
  -------------------------------------------------  (template)
              --->  (primer)
              <----distance to end of template---->
  This 'distance to end of template' gives the longest read you could possibly make with this primer and this template. If this distance is too short, you can now reject the template. The consed resource to set is:
  consed.primersWhenChoosingATemplateMinPotentialReadLength: 500
* If you want to pick templates yourself, you can turn off consed's template picking. This is particularly useful if you haven't bothered to customize determineReadTypes.perl
* If you are using an old version of phred or if you haven't installed it correctly (with all kinds of bad effects), consed will warn you.
* Previously, consed reported the error rate for a contig. But some contigs have long tails of low quality bases and you would like to know the error rate for the contig without that long tail. Now you can do that: You can get the error rate for a specified region.
* Programmers can now append RT tags to the ace file. (See FOR PROGRAMMERS AND FELLOW TRAVELLERS in README.txt)
* Programmers can pop up a trace by a command from a different program.
----------------------------------------------------------------------------
Release 9.0
Consed is a program for viewing and editing assemblies assembled with the phrap assembly program.
If you are already an advanced consed user, you should read through this and do any of the exercises on features that you are unfamiliar with. I frequently run across people who are doing something in consed a hard way month after month, and request a new feature to make things easier, when that new feature is already in consed.
If you have never used consed before, to follow this Quick Tour will take you less than 2 hours. However, it will save you approximately 2 days in agony. If you have 2 extra days to spare, and prefer to waste them in agony, then do not do this Quick Tour and instead immediately skip down to 'INSTALLING CONSED' below.
When you do the quick tour, I encourage you to be free about changing the data set. If you really mess things up (such as changing all a read's bases to N's), no problem--just delete the data set and start again with a fresh copy.
The software is already downloaded and your syntom profile should be
correctly set so that you don't need to do the following - ALM
1) After downloading the distribution with netscape (see www.phrap.org and click on 'consed'), copy the distribution to a unix computer (if it is not already on one), and then unpack the files by typing the appropriate line below (which one depends on what you named the file downloaded by netscape):
zcat consed_solaris.tar.Z | tar -xvf -
zcat consed_alpha.tar.Z | tar -xvf -
zcat consed_hp.tar.Z | tar -xvf -
zcat consed_sgi.tar.Z | tar -xvf -
zcat consed_linux.tar.Z | tar -xvf -
Note: You must untar on a UNIX computer--not on an NT computer.
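If you are not sure which of the lines above applies to your machine, the output of uname -s is a reasonable hint. The sketch below is an assumption-laden convenience, not part of the consed distribution: consed_tarball_for is a hypothetical helper, the file names assume you kept the names listed above, and the mapping from uname strings to platforms is the usual one (for example OSF1 for Digital UNIX on Alpha) and may not cover your system.

```shell
# Map the output of `uname -s` to the matching distribution file name.
# consed_tarball_for is a hypothetical helper written for this example.
consed_tarball_for() {
    case "$1" in
        Linux)        echo consed_linux.tar.Z ;;
        SunOS)        echo consed_solaris.tar.Z ;;
        IRIX|IRIX64)  echo consed_sgi.tar.Z ;;
        HP-UX)        echo consed_hp.tar.Z ;;
        OSF1)         echo consed_alpha.tar.Z ;;
        *)            echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Print the unpack command for this machine without running it.
if tarball=$(consed_tarball_for "$(uname -s)"); then
    echo "zcat $tarball | tar -xvf -"
fi
```

The sketch only prints the command, so you can check it against the file you actually downloaded before running it.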
2) The only unix commands you must learn are the following 3:
pwd -- this tells you where you are
ls -- this tells you what files are there (Same as DIR in DOS)
cd -- this moves you (Same as CD in DOS)
That's it--use them a lot!
USING CONSED GRAPHICALLY
3) Type the following:
cd /home/amatern/sequences/standard/edit_dir
4) Start consed by typing the appropriate command below:
consed_linux
Two windows will appear. One of these will have the list of .ace files and say 'select assembly file to open' and 'standard.fasta.screen.ace.1'. Double click on that name. The first window goes away.
You will now see a list of one contig and a list of reads. This is the 'Main Consed Window'.
Double click on 'Contig1'.
The 'Aligned Reads Window' will appear.
Try scrolling back and forth. Try scrolling by dragging the thumb of the scrollbar. Also try scrolling by clicking on the 4 << < > >> buttons for scrolling by small amounts. For scrolling by tiny amounts, click on the arrows at either end of the scrollbar. For scrolling by huge amounts, use the middle mouse button and just click on some location on the scrollbar. For scrolling to the beginning or end of the contig, use the <<< or >>> buttons.
(Question: why can't you just move the scrollbar to the extreme left in order to go to the beginning of the contig? Answer: in typical assemblies, there are reads that protrude beyond the beginning of the contig and reads that protrude beyond the end of the contig. Moving the scrollbar to the extreme left will scroll the contig to the beginning of the leftmost read--typically far to the left of the beginning of the contig. Thus you should get in the habit of using the <<< and >>> buttons.)
Notice the colors. The bases that are in red are the ones that disagree with the consensus.
Notice the different shades of grey background (around the bases). They have the following meanings, but first, you need to understand the meaning of the quality values:
A quality value of 10 means 1 error in ten to the 1.0 power
A quality value of 20 means 1 error in ten to the 2.0 power
A quality value of 30 means 1 error in ten to the 3.0 power
A quality value of 40 means 1 error in ten to the 4.0 power
and for quality values in between:
A quality value of 25 means 1 error in ten to the 2.5 power
Get the idea?
(These have actually been empirically verified--if you are interested in the gory details, read the phred papers:
Ewing B, Hillier L, Wendl M, Green P: Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Research 8, 175-185 (1998).
Ewing B, Green P: Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Research 8, 186-194 (1998).
In that same copy of the journal is a paper about consed, as well.)
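The rule behind the quality values above is that a quality of Q corresponds to an error probability of 10^(-Q/10). As a rough sketch for checking values in between, the following uses awk to do the arithmetic; phred_to_error is a hypothetical helper name used only for this example and is not part of phred or consed.

```shell
# Convert a phred quality value to an error probability: 10^(-Q/10).
# phred_to_error is a hypothetical helper name used only in this sketch.
phred_to_error() {
    awk -v q="$1" 'BEGIN { printf "%.6g\n", 10 ^ (-q / 10) }'
}

# Print the error probability for the quality values discussed above.
for q in 10 20 25 30 40; do
    printf 'quality %s -> error probability %s\n' "$q" "$(phred_to_error "$q")"
done
```

For example, quality 20 gives 0.01, that is, 1 error in 100 bases.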
Also notice the upper and lowercase. This is just a cruder indication of the quality of the bases.
5) To see the quality value of a particular base, point at it and click with the left mouse button.
These quality values are shown in grey scales:
Quality 0 through 4 is given by dark grey
Quality 5 through 9 is given by a shade lighter
Quality 10 through 14 is given by a shade still lighter
...
Quality of 40 through 97 is given by white (the brightest shade)
A quality value of 99 is reserved for bases that have been edited and the user is absolutely sure of the base ('high quality edited').
A quality value of 98 is reserved for bases that have been edited and the user is not sure of the base ('low quality edit').
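The grey scale scheme above can be summarized as a small classification function. This is a sketch, not consed's own code: quality_bin is a hypothetical name, bin 0 stands for the darkest grey and bin 8 for white, and the bins between 15 and 39 are assumed to continue in steps of 5, as the '...' in the list suggests.

```shell
# Classify a quality value into the display bins described above.
# Bin 0 is the darkest grey and bin 8 is white; the bins for 15 through 39
# are an assumption (steps of 5), inferred from the pattern in the list.
quality_bin() {
    q=$1
    if   [ "$q" -eq 99 ]; then echo "high quality edited"
    elif [ "$q" -eq 98 ]; then echo "low quality edit"
    elif [ "$q" -ge 40 ]; then echo "bin 8 (white)"
    else echo "bin $((q / 5))"
    fi
}

quality_bin 3     # bin 0
quality_bin 12    # bin 2
quality_bin 57    # bin 8 (white)
quality_bin 98    # low quality edit
```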
The ends of the reads show bases that are grey and have a black background. These are the low quality ends of the reads or the unaligned ends of reads, as determined by phrap.
To see the quality of a base, click on it. You will see the quality displayed in the Info Box on the Aligned Reads Window.
6) Click on a base on a read. Then hold down the control key and type 'a'. You will move to the beginning of the read. Hold down the control key and type 'e'. You will move to the end of the read. (Emacs users will recognize these commands.)
7) Scroll so that location 490 is about in the middle of the aligned reads window. Push the left mouse button down on the menu item 'Dim'. There will be a list of choices that will appear. Drag the cursor down to 'Dim Nothing' and release. Now look what happened to the color of the bases. The ends of the reads that used to be with a black background now appear red with a grey background. You are seeing the clipped-off bases with all the same information as any other base. Since there is a huge amount of red (discrepant) bases, the screen becomes distracting and busy. Thus by default the low quality clipped-off bases are made with a black background and a grey foreground so they don't distract you.
Notice there is a distinction here between 'low quality ends of reads' and 'unaligned ends of reads'. Unaligned ends of reads can be low quality as well, or they can be high quality, as in the case of chimeric reads.
You can play with the dimming options a bit. Then return it to 'Dim Low Quality' for the rest of this tour.
TRACES AND EDITING
8) Point with the mouse at a base of one of the reads and click with both mouse buttons simultaneously. It's difficult at first, but you'll quickly get the hang of it. (If you have a 2 button mouse, see MONITORS AND MICE FOR CONSED below.) The Trace Window showing the traces for that stretch of read should pop up.
There are 3 rows of bases in the trace window:
'con' is the consensus
'edt' is where you can edit the base calls of the read
'phd' is the original phred base calls
Notice that a red rectangle blinks (the 'cursor') in the corresponding positions of the Aligned Reads Window and the Trace Window.
9) Try editing in the Trace Window. You can click the left mouse button on a base in the 'edt' line to set the cursor (a blinking red rectangle). You can directly overstrike a base by typing a letter. Try this. Try undoing it (by clicking on 'undo'). If you want to undo more than one edit, you will have to go back to the main consed window and click on the button labeled 'Undo Edit...'--you will learn that later.
You can move left and right with the arrow keys.
We believe that the user should change a base call only while examining the traces. That is why editing is done here--not in the Aligned Reads Window.
10) You can insert a column of pads by pushing the space bar. Try this. (You may need to click on a base on the 'edt' line first.)
(For those of you new to editing assemblies, a 'pad', which in consed and phrap is represented by the '*' character, is used to align two or more sequences such as these:
gttgacagtaatcta
gttgacataatcta
in which one sequence has an inserted or deleted base with respect to the other. By inserting the pad character, it is possible to get a good alignment:
gttgacagtaatcta
gttgaca*taatcta
This is the purpose of the pad character--it is just a placeholder.)
You can then overstrike a pad with a base. In this way you can insert a base, and still preserve the alignment.
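Because a pad is only a placeholder, removing the '*' characters from a padded read gives back the unpadded sequence. The sketch below shows this on the example alignment above; strip_pads is a hypothetical helper for illustration, not a consed or phrap command.

```shell
# Remove pad characters ('*') from an aligned sequence.
# strip_pads is a hypothetical helper, not a consed or phrap command.
strip_pads() {
    printf '%s\n' "$1" | tr -d '*'
}

strip_pads 'gttgaca*taatcta'    # prints gttgacataatcta
```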
11) Try highlighting a stretch of a read on the edt line by holding down both mouse buttons and dragging the cursor over some bases. They will turn yellow as you drag. Then release the mouse buttons. A window will pop up giving you some choices of what to do with those (yellow) bases:
Make High Quality--makes the highlighted bases edited high quality (99). This tells phrap (when it reassembles) that you are sure of the sequence here.
Change Consensus--makes the highlighted bases edited high quality and changes the consensus to agree with that stretch of the read. This is a directive to phrap (upon reassembly) to use that stretch of that read to be the consensus.
Make Low Quality--makes the highlighted bases edited low quality. This tells phrap (when it reassembles) that you are not sure of the bases here and phrap can go ahead and make a join even if the bases in this region don't match perfectly.
Make Low Quality to Left End--same as above, but all the way to the left end of the read.
Make Low Quality to Right End--same as above, but all the way to the right end of the read.
Change to n's--changes the highlighted bases to n's, which means they are unknown bases. This tells phrap (when it reassembles) to not make any join based on these bases. It is useful when you believe the bases may be in the chimeric portion of a read.
Change to n's to left--same as above but to left end.
Change to n's to right--same as above but to right end.
Add Comment Tag--allows user to add a comment to a stretch of read bases.
Add Tag--allows user to add any tag to a stretch of read bases.
Dismiss--you decided you don't really want to do anything with this stretch of bases.
This popup is made so that nothing else works until you choose something. Try each of these choices, except for tags, which you'll try below.
'Change Consensus' has an additional function--if a read extends out on the right beyond the end of the consensus, you can extend the consensus by using this function. You might want to do this, for example, if crossmatch did not correctly find the cloning site and thus clipped too much. You can add these bases back to the consensus by using 'Change Consensus'. (You can't try it with this dataset since no read extends beyond the end of the consensus, but you may see this phenomenon with your own data.)
12) To delete a base, overstrike it with a '*' character. (Phrap ignores '*', so this is the same as deleting the character.) If you overstrike all bases in a column with * characters so the entire column consists of *'s (including the consensus base), there is no way to remove the column. This is OK since when you export the consensus (try the exercise on EXPORTING THE CONSENSUS), the *'s are not exported. While you are editing in consed, we believe there should be a visual indication that a base was deleted.
SAVING THE ASSEMBLY
13) To save the assembly, pull down the 'File' menu on the Aligned Reads Window, and release on 'Save assembly'. A box will pop up with a suggested name. I suggest you always use the one it suggests. The idea is that the ace files:
(project).fasta.screen.ace.1
(project).fasta.screen.ace.2
(project).fasta.screen.ace.3
(project).fasta.screen.ace.4
(project).fasta.screen.ace.5
are in order of how old they are. If you feel you are taking up too much disk space, then start deleting the ace files starting at the oldest. I do not recommend that you overwrite existing ace files. The version numbers just keep growing, and that is not a problem.
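When the version numbers reach two digits, a plain ls no longer lists the ace files in age order (ace.10 sorts before ace.2). The sketch below lists them by numeric version instead, oldest first. It is an illustration only: list_ace_versions is a hypothetical helper, and the demonstration uses empty placeholder files in a scratch directory rather than real assemblies.

```shell
# List ace files from oldest to newest by their numeric version suffix.
# list_ace_versions is a hypothetical helper; $1 is the directory to scan.
list_ace_versions() {
    for f in "$1"/*.ace.*; do
        [ -e "$f" ] || continue                  # no matches: skip the literal pattern
        printf '%s\t%s\n' "${f##*.}" "${f##*/}"  # version <TAB> file name
    done | sort -n | cut -f 2-
}

# Demonstration on empty placeholder files in a scratch directory.
demo=$(mktemp -d)
touch "$demo/standard.fasta.screen.ace.1" \
      "$demo/standard.fasta.screen.ace.2" \
      "$demo/standard.fasta.screen.ace.10"
list_ace_versions "$demo"
```

The first names printed are the oldest versions, which are the ones to consider deleting if disk space is tight.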
EXPORTING THE CONSENSUS
14) Exporting the consensus. Bring the Aligned Reads Window into view again. Hold down the left mouse button on the 'File' menu and release the button on 'Export consensus sequence'. Notice that the consensus will be stored (in this case) in a file called 'Contig1.fasta'. Click 'OK'. There is now a file in your edit_dir directory called 'Contig1.fasta' that has the consensus sequence in it.
If you want to see the file, bring up another Xterm (if you are UNIX literate), and type:
cd standard/edit_dir
more Contig1.fasta
15) Fancier exporting the consensus. Bring the Aligned Reads Window into view again. Hold down the left mouse button on the 'File' menu but this time release on 'Export consensus sequence (with options)...'. Just export a little snip of the consensus, from 400 to 410. (You will notice this contains a pad * character.) Ask for both the bases file and the quality file. Click 'OK'. Consed will want to call this file 'Contig1.fasta' again. You can overwrite the existing file.
Look in your other Xterm at these files:
more Contig1.fasta
more Contig1.fasta.qual
The one file contains the bases (but no * pads) and the other contains the corresponding qualities of those bases.
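One quick sanity check on an exported FASTA file is to count its bases. The following is a minimal sketch under stated assumptions: count_bases is a hypothetical helper, and the demonstration runs on a made-up two-line sequence in a temporary file rather than on the real Contig1.fasta.

```shell
# Count the bases in a FASTA file, ignoring header lines that start with '>'.
# count_bases is a hypothetical helper; the demo file below is made up.
count_bases() {
    grep -v '^>' "$1" | tr -d '\n' | wc -c | tr -d ' '
}

demo_fasta=$(mktemp)
printf '>Contig1\ngttgacag\ntaatcta\n' > "$demo_fasta"
count_bases "$demo_fasta"    # prints 15
```

To try it on a real export, run count_bases Contig1.fasta from the edit_dir directory.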
16) Exporting the consensus of all contigs at once: Go to the Main Consed Window. Point to 'File', hold down the left mouse button, and release on 'Write all contigs to fasta file'. You then can choose a filename for all contigs to be written to.
17) (For this step, first click on the 'Dim' menu and release on 'Dim Nothing'.) Point to the 'Color' menu, hold down the left mouse button and release on 'Color Means Edited and Tags'. Notice that the bases that you have edited (make sure you have edited some bases) will stand out in either white or grey (depending on whether the base was made high quality or low quality). Observe this both in the Trace Window and the Aligned Reads window. This colormode is useful if you are interested in easily spotting which bases are edited.
Return to the 'Color Means Quality and Tags' colormode by the following: point to the 'Color' menu, hold down the left mouse button and release on 'Color Means Quality and Tags'.
FIND MAIN WINDOW
18) On the Aligned Reads window, click on 'Find Main Win'. This will cause the Consed Main Window to pop up in the event you have buried it under other windows or iconified it. (This may not work with some settings of your X emulator. In that case you will have to find and click on the Main Window to bring it up.)
MULTIPLE UNDO EDIT
19) Now that the Consed Main Window is visible, click the 'Undo Edit...' button. There will be a popup indicating the most recent edit. Click 'undo'. Then you will see the edit that was done before that. Click 'undo'. You can continue undoing if you like. You now know how to undo more than one edit. You cannot choose which edits to undo and which to not undo--edits can only be undone in precisely reverse order from the order you made them.
SCROLLING TRACES AND ALIGNED READS TOGETHER
20) In the Aligned Reads window, scroll along the contig to a different point. Click the left mouse button on a read whose trace is already up. Notice that the existing trace instantly scrolls to the corresponding location. Now go to the Trace Window and scroll the traces to a new location. Click on the edt line with the left mouse button. You will notice that the Aligned Reads window will instantly scroll to the corresponding location. Thus you can keep the Aligned Reads window and the traces scrolled to the same location.
EXAMINING ALL TRACES
21) Go to a region where there are lots of reads, say base 1660. Push down the right mouse button and release on 'Display traces for all reads'. You will see all traces displayed in a scrolling window. You can drag the scrollbar on the right down and up to see all the traces.
This feature is particularly useful for polymorphism/mutation detection work. This feature was added to work in cooperation with polyphred. To see it in action, exit consed.
CONSED-POLYPHRED INTERACTION
Polyphred is a program for finding polymorphic sites; it was developed by Debbie Nickerson's group (contact them at http://droog.mbt.washington.edu).
We have a test database, 'polyphred', which has had polyphred run on it already. Polyphred has put a polymorphism tag on each polymorphic site.
Type:
cd ../../polyphred/edit_dir
ls
../../consed_(computer type)
where (computer type) is one of solaris, hp, alpha, sgi, or linux.
Double click on example2.fasta.screen.ace.1
When consed comes up, you should see 2 contigs.
Double click on Contig2
In the Aligned Reads Window, push the left mouse button while pointing to the 'Navigate' menu and release on
'Toggle feature: when navigating to consensus location, pop up all traces (currently off)'
That will turn this feature on.
Now push the left mouse button while pointing to the 'Navigate' menu and release on 'Tags'. Up should pop a list of tag types. Double click on 'polymorphism'. Polyphred has already been run so the consensus is tagged with polymorphism tags at each polymorphic site.
Up will pop a window labelled 'Polymorphism Tags' with a list of sites. Click on 'Next'.
If you correctly followed the instructions above, all the traces should pop up at the first polymorphic site. You may want to reposition the traces window to see it better.
Now ignore the original 'Polymorphism Tags' window and instead click on 'Next' in the *traces* window. This will take you to the next polymorphic site. Pretty nice, huh?
After you are done playing with this feature, exit consed and go back to the previous database:
cd ../../standard/edit_dir
ls
../../consed_(computer type)
Double click on standard.fasta.screen.ace.1
Double click on Contig1 to bring up the Aligned Reads Window again in preparation for the next step.
NAVIGATING
22) In the Aligned Reads window, pull down the Navigate menu and release on 'Low consensus quality'. You will see a list of locations. Move the 'Low consensus quality' window down so you can see the Aligned Reads window. Repeatedly click on 'Next' until you reach the end of the list. (Low consensus quality means an area in which the bases each have too high a probability of being wrong.) This saves you from having to look through large amounts of high quality data trying to find problem areas.
Alternatively, you can click on the 'Prev' and 'Next' buttons on the Aligned Reads Window. Thus you can keep the Aligned Reads Window in front with input focus and keep the Low consensus quality window pushed out of the way.
You may want to click on the 'Save' button in the Low consensus quality Window to save to a file a copy of this list of problem areas as you work through them.
In our experience, this will be the most important navigate list you will use. In fact, finishing consists mainly of adding reads and rephrapping until this list is reduced to nothing.
23) Dismiss the Low consensus quality window. Pull down the 'Navigate' menu again and release on 'High quality discrepancies as above, but omitting tagged compressions and G_dropouts'. You will probably notice there are no entries (unless you created some yourself by editing). That is because there are no high quality discrepancies with this dataset. So let's force there to be some by lowering the quality threshold. First, dismiss the High quality discrepancies window.
Click on 'Find Main Win'. In the main consed window, pull down the 'Options' menu and release on 'General Preferences'. Notice that the default for 'Threshold for High Quality Discrepancy' is 40. Change it to 15 and click 'Apply & Dismiss'.
Then follow the steps above to bring up the High quality discrepancies menu. Now you will see several entries. Click 'next' repeatedly to go successively to the next high quality discrepancy in the Aligned Reads Window.
You can also double click on a particular line in the High quality discrepancies window to go to that location. Alternatively, you can single click on a line and then click the 'Go' button.
Dismiss the High quality discrepancies window.
24) Similarly, try the other navigate lists: Unaligned high quality regions (this list will be empty with this data set), Edits, Regions covered by only 1 strand and only 1 chemistry, and Regions covered by only 1 subclone.
Unaligned high quality regions are regions in which the traces are high quality so there is no question of the bases, but the region differs so much from other reads that phrap has given up trying to align the region with the consensus. This could be due to a chimeric read, or perhaps the read belongs somewhere else.
We believe that regions covered by only 1 subclone should be covered by a 2nd subclone to prevent the possibility of there being a deletion in the single subclone.
There are so many different problem lists that you may forget to check one of them and thus miss a serious problem. Thus we combined them all into a single list. This is the first menu item: 'Low Cons/High Qual Discrep/Single Stranded/Single Subclone/Unaligned High'. We suggest you use this list.
25) Also try navigate by tags by selecting 'tags' under navigate: when the Select Tag Type Window appears, double click on 'compression'. (Note that you can't do anything else until you deal with this window.) This gives a list of a particular tag type in a particular contig.
26) There is also a way of getting a list of a particular tag type in all contigs: Click on 'Find Main Win'. In the Main Consed Window, point to the 'Navigate' menu, hold down the left mouse button, and release on 'Tags in all contigs'. Continue as in the previous step.
PRIMER-PICKING
**** Temporary step ****
After you have completed the 'install vector files' step (below), you should never do this.
Click on 'Find Main Win'. On the Main Window, open the Options menu, and release on 'Primer Picking Preferences'. Notice the question 'Screen Primers Against Sequences in File?' (If you have trouble finding this question, scroll the Primer Picking Preferences list down. It is between 'PrimersNumberOfTemplatesToDisplayInFront' and 'Pick subclone templates for primers?'.) Click on 'False'. Then click 'Apply & Dismiss' and the Primer Picking Preferences box will pop down.
(In real use, 'Screen Primers Against Sequences in File?' should be set to 'True'. I have had you set it to False just this once so you can go ahead and see how this is supposed to work until your system administrator has time to correctly install the vector sequences file.)
**** end of temporary step ****
27) Go to some location near the right end of the contig, say base 2570. Click with the right mouse button on the consensus and click on either one of the top strand primer choices (either from subclone template or from clone template). Consed will pause a moment, and then there will appear a selection of primers that pass all of consed's requirements. Templates are also chosen for each primer. You may have to scroll the primer list to the right to see the templates. Consed lists these templates in order of quality--all of them will cover the read you want to make.
Double click on one of the primers in the Primers Window. That will cause the Aligned Reads Window to scroll to show that oligo in context. Click on 'Accept Primer'. Notice that a yellow oligo tag is created on the consensus for that primer. That tag contains all the information you need to order that oligo and do the reaction--you will learn how to pop it up below under 'tags'.
What is the difference between 'Pick Primer from Subclone Template' and 'Pick Primer from Clone Template'?
There are 3 differences:
A. which vector file the primers are screened against. In the former case, the primer is screened against the file primerSubcloneScreen.seq and in the latter case against the file primerCloneScreen.seq
B. In checking for false matches elsewhere in the assembly, if the template is the whole clone, then consed must check for false matches in the *entire* assembly, including all other contigs. But if the template is just going to be a subclone, consed only needs to check elsewhere in that subclone. Actually, to be conservative, consed checks for false matches +/- the maximum insert size of a subclone.
C. If you are picking primers for subclone template, then the primer picker can also pick the subclone templates. If it doesn't find any suitable subclone template, it will reject the primer. (By default, picking of subclone templates is turned off. You can turn it on temporarily or permanently. To turn it on temporarily, go to the Consed Main Window, point to the Options menu, hold down the left mouse button and release on 'Primer Picking Preferences'. Scroll down to 'Pick Subclone Templates for Primers' and click 'True'. Click on 'Apply and Dismiss'. To change this permanently, see CONSED CUSTOMIZATION below. Beware: you must correctly customize determineReadTypes.perl for template picking to work. See INSTALLING CONSED below.)
If you are interested in the details of primer-picking, see the section 'PRIMER PARAMETERS' (below).
When you are done editing and have saved the assembly and exited consed, run ace2Oligos.perl (supplied with this distribution--make sure your system administrator installed it), which will extract all the oligos you just created. This is handy for email ordering of oligos.
In the xterm, type:
ace2Oligos.perl standard.fasta.screen.ace.2 oligos.txt
where standard.fasta.screen.ace.2 is whatever the name is of the ace file you just saved.
SEARCH FOR STRING
28) Try the 'Search for String' button (left side of the Aligned Reads Window). Type in a string (such as aaaca), and click 'ok'. There should be a list of 'hits'. Double click on one of the hits (or single click on it and click on 'go'.) Notice that the Aligned Reads Window scrolls to that position and has the cursor on the found string. (It might be complemented.)
Dismiss this window. Try this again, only this time in the Search For String Window select 'Search Just Reads'. Then click 'OK'. You will notice there are many more hits. This is because this shows hits in each read, even if they are at the same consensus position.
COPY AND PASTE
29) In the Aligned Reads Window, swipe some bases by holding down the left mouse button. You should see the bases turn yellow, at least temporarily. Then click the 'Search for String' button. Use the middle mouse button to paste the bases you have just swiped into the 'Query string:' box. Notice that you can swipe bases either from the consensus or from a read.
The search for string is case-insensitive so don't worry about the pasting being upper or lowercase.
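To make the 'case-insensitive' and 'might be complemented' behavior concrete, here is a minimal Python sketch of that kind of search. It is only an illustration and not consed's own code: the function names are made up, and the positions are plain 1-based offsets into the string you pass in, not consed's consensus positions.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string, preserving case."""
    pairs = str.maketrans("acgtACGT", "tgcaTGCA")
    return seq.translate(pairs)[::-1]


def search_for_string(query, sequence):
    """Find a query on either strand, ignoring case.

    Returns a sorted list of (start, end, orientation) tuples with
    1-based, inclusive positions. Overlapping hits are all reported.
    """
    if not query:
        return []
    target = sequence.lower()
    forward = query.lower()
    patterns = [("uncomplemented", forward)]
    reverse = reverse_complement(forward)
    if reverse != forward:  # a palindromic query would otherwise hit twice
        patterns.append(("complemented", reverse))
    hits = []
    for orientation, pattern in patterns:
        start = target.find(pattern)
        while start != -1:
            hits.append((start + 1, start + len(pattern), orientation))
            start = target.find(pattern, start + 1)
    return sorted(hits)
```

Calling search_for_string("AAACA", some_consensus) returns the same hits as searching for "aaaca", which is the point of the note above about pasted bases.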
CORRECTING FALSE JOINS MADE BY PHRAP
30) Phrap may put several reads together that you believe do not belong together. (For example, you may see several high quality discrepancies between the reads.) If you are sure these reads do not belong together, you can force a subsequent reassembly by phrap to not assemble those reads together. You do this by finding a location where there is a high quality discrepancy. Then click on the read with the right mouse button and release on 'Tell phrap not to overlap reads discrepant at this location'. There are no high quality discrepancies with this dataset so consed won't let you do this. (Try it and see.) However, when you use your own data, you may get the chance!
ADDING READS
31) For this to work, your system administrator must have set up everything correctly. (See below in INSTALLING CONSED.) Assuming you have set everything up correctly, you can now experiment with adding reads.
Now bring up consed again using ace file standard.fasta.screen.ace.1
If it asks if you want to apply edits, just say 'no'.
On the Main Window, click on the Add New Reads button. There will appear a list of files ending with .fof. These are files that contain lists of chromatograms. Double click on 'reads_to_add.fof'. There should be lots of progress output in the xterm from which you started consed. When it completes, there will be a Reads Added Window popup with a report of which reads were added. In this case, it should say that 9 reads were successfully added and list them.
TEARS AND JOINS
32) When phrap really screws up, you may want to just tear the contig apart in several places and then join the pieces back together in a different way. Although we discourage you from doing this, we do give you the power to do it, if you want to. Let's try it:
Go to location 1550. Point the mouse at the consensus base at 1550 and push the right mouse button down. Release the button on 'Tear Contig at This Consensus Position'. Up will pop a list of reads with 2 little buttons next to them <- and ->. Leave everything as it is and just click 'Do Tear'. (If you want to play around with which reads go into which contig, do that another time.)
Now you should have 2 Aligned Reads Windows on top of each other. One should contain 'Contig2' and the other 'Contig3'.
Now let's join these 2 contigs back together:
Click on 'Search for String' and type in the following bases:
agctgccatc
Click 'OK'.
Search for string should find 2 locations, one in Contig2 and one in Contig3:
Contig2 (consensus) 1447-1456 (uncomplemented)
Contig3 (consensus) 829-838 (uncomplemented)
Double click on the first one. The Aligned Reads Window for Contig2 will scroll to location 1447 and the window will raise up. In that Aligned Reads Window, click on 'Compare Cont'.
Now double click on the 'Contig3' line in the above Search for String results. The Aligned Reads Window for Contig3 will scroll to location 829 and lift up. In that Aligned Reads Window, click on 'Compare Cont'.
Now the Compare Contigs Window should be visible. In the Compare Contigs Window, try scrolling back and forth. You can change the cursors (blinking red), but if you do, please return them to the locations 1447 and 829 for the next step. The cursors 'pin' these bases together when doing an alignment. (The algorithm is a pinned Smith-Waterman alignment.)
Click on Align. Try scrolling the alignment by dragging the thumb in the lower half of the Compare Contigs. An 'X' means there is a discrepancy between the 2 contigs. There is also a 'P' (see if you can find it!) The P indicates the bases that you pinned together.
Click with the left mouse button on either contig in the bottom alignment. You will notice that both contigs will have the red blinking cursor in the same position. Click on 'Scroll Both Aligned Reads Windows' and look at the Aligned Reads Windows to see that they scroll to the corresponding positions. You can have traces up for the contigs, and they will scroll as well. Experiment with this. Then click 'Join'. The 2 previous Aligned Reads Windows will disappear and there will be a new one which has a new contig 'Contig4'. You have made a join!
It is possible to have more than one Compare Contigs Window up at a time. This allows you to investigate a repeat that has more than 2 copies.
Compare Contigs is one method of exploring joins of contigs that were not made by phrap. Another method is to use phrapview, supplied with phrap. phrapview gives a high level view of all internal joins while 'compare contigs' shows the alignment of a single internal join. Some users have found them to work well together--phrapview to find a join and, having found it, 'compare contigs' to examine it in more detail.
REMOVING READS
33) You can also remove individual reads and put them into their own contigs. For example, in the Aligned Reads Window, go to location 2000. Point to the read name of read djs74_2664.s1 and hold down the right mouse button. Release on 'Put read djs74_2664.s1 into its own contig.' Consed will ask you 'Are you sure...?' Answer 'yes'. Presto-chango! The read is put into its own contig and the old contig is redrawn without the read in it. At this point you should save the assembly--you should always save the assembly after removing a read.
TAGS
34) Bring up a trace for a read (as above). Swipe some bases on the 'edt' line with the middle mouse button. A list of choices will popup. Select 'Add Comment Tag'. Type in a comment in the box that appears, and click 'OK'. You will now see a blue box both in the Aligned Reads Window and in the Traces Window on that read.
To see the comment, you can click on that blue tag in the Aligned Reads Window with the right mouse button and release on 'Tag: comment Show more info?'. Alternatively, you can click on the blue tag in the Traces Window with the right mouse button.
Try creating some other kinds of tags: again swipe some bases in the Trace Window. But this time instead of clicking 'Add Comment Tag', click on 'Add Tag'. Select another tag type. You will notice that different tags are in different colors. You can always click with the right mouse button on the tag (as above) if you forget what a particular color means.
You can also define your own tag types. See below CREATING CUSTOM TAG TYPES for how to do that.
35) You can create really, really long tags as follows: Just create a short version of the tag as above for where you want the tag to start. Then figure out the consensus position of where you want the tag to end. In the Aligned Reads Window, click on the short tag with the right mouse button and release on 'tag: show more info?' (as above). A Tag Window will appear for that tag. In the Tag Window, simply change the End Unpadded Consensus Position to the place you want it to end. Then click 'OK'. You will now notice that the tag will be as long as you wanted.
36) You can create tags on the consensus in the same way. In the Aligned Reads Window, use the middle mouse button to swipe some bases on the consensus. Up will pop a list of tag types. Click on one of them. Try it again somewhere else. Try it with the tag type being 'comment'. In this case, you must enter a comment. Notice the pretty colors! If you forget what a particular color means, you can click on the colored tag with the right mouse button and it will tell you.
37) Try creating some tags that overlap each other. You will notice that the overlapping region will be purple. If you want to know which tags overlap, you can click with the right mouse button on the purple and you will be told all tags that are on that base.
38) If you have many tags that overlap and thus are purple, you can hide some less relevant tag types so there is less purple and there is less distraction. Make sure you have a few tags visible. Then click on 'Find Main Win'. In the Main Window, open the Options menu, and release on 'Hide Some Tag Types'. A list of tag types will popup. Select the type that you have visible (above). Then click 'OK'. Go back to the Aligned Reads Window. That tag should still be visible. Click on the button 'Some Tags' in the upper right part of the Aligned Reads Window. Your tag should disappear. The 'Some Tags' button should have changed to 'Sh All Tags'. Click on it again. Your tags should have reappeared.
INCREMENTAL SEARCH FOR READ NAME
39) Restart consed. Instead of clicking on a read or contig name, type a read name into the 'Find read:' box. Try typing 'djs74_2'. You will notice that as you type each letter, the first item in the list that matches the letters typed will be highlighted. Experiment with deleting a few letters and typing others. This is a powerful method of quickly getting to the read name you are interested in. When you get to the read you want, just type carriage return or click the 'OK' button.
ONLINE DOCUMENTATION
40) On the Aligned Reads Window, click on the 'Help' menu and release on 'Show Documentation'. You will see this document.
GOTO POSITION
41) In the Aligned Reads Window, click in the 'Pos:' box in the upper right-hand corner. Type in a number, such as 540, and push the 'Return' or 'Enter' key. The Aligned Reads Window will scroll to position 540. We find this feature is particularly useful when one person wants another person to look at something in the sequence.
HIGHLIGHTING READ NAMES
42) In the Aligned Reads Window, click on a read name with the left mouse button. The name will turn magenta. Click again and it will turn yellow again. Try turning it magenta and then scrolling. This feature is helpful in keeping track of a particular read as you scroll.
COMPLEMENTING THE CONTIG
43) Push 'Comp Contig' in the Aligned Reads Window to complement the contig. This displays the opposite strand of the contig including the consensus and all reads. Push this button again to uncomplement it.
RECOVERY FROM CRASHES
44) It is important to feel that your data are safe, even if the computer (or consed) were to crash. Consed will recover your data from such a crash.
Make an edit (remember, edits are made in the Trace Window) and jot down its location. Also note the name of the ace file which is displayed in the upper left box in the Aligned Reads Window. Then simulate a crash by going to the xterm where you started consed and typing control-C. Restart consed and select the same ace file you noted (above). A box will come up saying 'There is an edit history (a .wrk file) Consed may have crashed during a previous session with this same file. Do you want to apply those edits?' Click on 'yes'. Go and find the edits you made before consed crashed--you will find them.
This is the purpose of the .wrk files--they are a log file of your edits and they are added to as you make edits.
45) You should save your edits by pulling open the 'File' menu on the Aligned Reads Window, and releasing on 'Save assembly'.
PROTEIN TRANSLATION AND OPEN READING FRAMES
46) If you would like, you can see the amino acid translation of the consensus in all reading frames. In the Aligned Reads Window, push down the left mouse button on the 'Misc' menu and release on 'Show Top Strand Protein Translation'. Try again but this time release on 'Show Bottom Strand Protein Translation'. Notice that there are 2 characters that are in magenta color. What are those characters? Why are they made in a different color? To not show the protein translation, push down the left mouse button on the 'Misc' menu and release on 'Don't show protein translation'.
47) You can search for open reading frames within a contig. In the Aligned Reads Window, push the left mouse button on 'Navigate' and release on 'Search for Open Reading Frames'. Notice that the open reading frames are shown for all 6 reading frames and are sorted by length.
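If you are curious what 'all 6 reading frames, sorted by length' amounts to, the following Python sketch scans 3 frames on each strand and lists the longest ORFs first. Please treat it as an assumption-laden illustration: here an ORF is taken to run from an ATG through the first in-frame stop codon, which may not be the definition consed uses, and the function names are hypothetical.

```python
STOP_CODONS = {"taa", "tag", "tga"}


def reverse_complement(seq):
    """Return the reverse complement of a lower-case DNA string."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]


def open_reading_frames(sequence):
    """Return (length, strand, frame, bases) tuples, longest first.

    An ORF here starts at an ATG and ends at the first in-frame stop
    codon (the stop is included in the length). ORFs that run off the
    end of the sequence without a stop are not reported.
    """
    top = sequence.lower()
    orfs = []
    for strand, seq in (("top", top), ("bottom", reverse_complement(top))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(seq) - 2, 3):
                codon = seq[pos:pos + 3]
                if start is None and codon == "atg":
                    start = pos
                elif start is not None and codon in STOP_CODONS:
                    orfs.append((pos + 3 - start, strand, frame + 1,
                                 seq[start:pos + 3]))
                    start = None
    # Python's sort is stable, so equal-length ORFs keep scan order.
    orfs.sort(key=lambda orf: orf[0], reverse=True)
    return orfs
```

Frames are numbered 1 to 3 on each strand, and bottom strand ORFs are reported as read on the bottom strand.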
ERROR RATE
48) In the Aligned Reads Window is a box (upper right) labelled 'Err/10kb'. This is the estimated error rate for this contig, and it is a good indicator of when you are done (or not done) finishing.
In addition, you can find the error rate for a particular region of the contig as follows: Point at 'Misc' menu, hold down the left mouse button, pull down and release on 'Show Error Info For Region'. Fill in the boxes for left and right consensus position, click on 'Calculate' and you will be given the error and single subclone data for that region.
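For a feel of where a number like 'Err/10kb' can come from, here is a small Python sketch. It uses the standard phred relationship (a quality of q means an error probability of 10^(-q/10)) and simply adds up those probabilities. Whether consed computes its figure in exactly this way is an assumption on my part, and the function names are made up for the example.

```python
def error_probability(quality):
    """Convert a phred-style quality value into an error probability."""
    return 10 ** (-quality / 10)


def errors_per_10kb(qualities):
    """Expected errors per 10,000 bases for a list of base qualities."""
    if not qualities:
        return 0.0
    expected_errors = sum(error_probability(q) for q in qualities)
    return expected_errors / len(qualities) * 10_000


def region_errors_per_10kb(qualities, left, right):
    """Same estimate for positions left..right (1-based, inclusive)."""
    return errors_per_10kb(qualities[left - 1:right])
```

A stretch of quality 20 bases works out to about 100 expected errors per 10 kb, while quality 40 bases give about 1 per 10 kb, which is why a few low quality bases dominate the estimate.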
RUNNING PHRED AND PHRAP
phred and phrap *must* be run via the phredPhrap perl script. If you run phred on its own, and then you run phrap on its own, you will get an ace file that will not be usable by consed. If you try to run phred and phrap without using the phredPhrap script, you are on your own. After you have run into problems (and you probably will), do not email me--instead please use the phredPhrap script. To use the phredPhrap script to run phred and phrap:
49) Type:
phredPhrap -V
It should say:
991019
If it does not, then you probably have not installed all the perl scripts from the scripts directory, as directed above.
50) Make a copy of the standard dataset. E.g.,
cp -r standard test
cd test
51) Delete all the files in phd_dir and edit_dir:
rm phd_dir/*
rm edit_dir/*
52) cd edit_dir
53) Run phredPhrap by typing
phredPhrap
That's it--you no longer need to type *any* arguments, and generally you should not. (Please do *not* use the -notags option any longer.)
If you want to add phrap options, you can do that: e.g.,
phredPhrap -forcelevel 3
Then run consed on the resulting ace file as indicated in the beginning of the Quick Tour (above). If you have any problems, this is the time to diagnose them before you use your own data.
After you have done this successfully, you are ready to use your own data.
AUTOFINISH
Note: Before you use autofinish on your own data, you must modify determineReadTypes.perl. See INSTALLING CONSED below for information about this.
54) cd to autofinish/edit_dir
55) Try starting consed by typing:
../../consed -autofinish -ace autofinish2.fasta.screen.ace.2
(Note 'consed' above may be 'consed_solaris', 'consed_alpha', 'consed_hp', 'consed_sgi', or 'consed_linux' depending on your executable. If you have trouble, use that 'ls' command (see above)!)
If autofinish says:
Run-time exception error; current exception: InputDataError
No handler for exception.
Abort
that means that you have not followed the instructions under 'INSTALLING CONSED' below. Please follow those instructions and then try this again.
Consed will create 5 files:
autofinish.fof
(project name).991014.155627.out
(project name).991014.155627.univForwards
(project name).991014.155627.univReverses
(project name).991014.155627.customPrimers
The '991014.155627' is the date and time in format YYMMDD.HHMISS.
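If you ever want to sort or group these files in a script, the YYMMDD.HHMISS stamp can be pulled out of the file name. The Python sketch below shows one way; the function name and the regular expression are my own illustration (not part of consed), and it assumes the name always ends in stamp-dot-extension as in the list above.

```python
import re
from datetime import datetime

# (project name).YYMMDD.HHMISS.(extension), e.g. myproject.991014.155627.out
NAME_PATTERN = re.compile(
    r"^(?P<project>.+)\.(?P<stamp>\d{6}\.\d{6})\.(?P<kind>[A-Za-z]+)$"
)


def parse_autofinish_name(filename):
    """Split an autofinish file name into (project, datetime, kind).

    Returns None for names that do not carry a time stamp, such as
    autofinish.fof.
    """
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    when = datetime.strptime(match.group("stamp"), "%y%m%d.%H%M%S")
    return match.group("project"), when, match.group("kind")
```

Note that Python's %y reads two-digit years 69 to 99 as 1969 to 1999, so '991014' comes out as 14 October 1999.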
The first file, autofinish.fof, is a file of filenames. It contains the names of the other files.
The .out file is the autofinish output file. If you want to know *why* autofinish picked the reads it did, it will tell you. It will tell you lots more, such as the orientation of the contigs.
If you correctly installed consed, it will print out a list of experiments you should do to make reads in order to reduce the number of errors below a target threshold.
(project name).991014.155627.univForwards is the summary file of the suggested universal forward subclone reads
(project name).991014.155627.univReverses is the summary file of the suggested universal reverse subclone reads
(project name).991014.155627.customPrimers is the summary file of the suggested custom primer reads
These are the files you will typically use for directing your bench work. If you like, you can import these files into Excel since the fields are separated by commas.
This finishing tool is designed to be run in batch after each assembly. In a high throughput operation, the production people can make these reads without anyone using consed to examine the assembly interactively. Only when autofinish cannot help you any longer (either it reduces the number of expected errors below your error threshold, or it says it can't help you further), must you bring up consed graphically and examine the assembly.
AUTOFINISH TARGET ERROR RATE
Now let's experiment with some of the autofinish options. By default, autofinish will suggest finishing reads until the error rate is less than 100 errors per megabase. Suppose you want fewer errors. Fine:
56) Create a file in edit_dir called .consedrc and put the following line in it:
consed.autoFinishMaxAcceptableErrorsPerMegabase: 10
(Note: I have put the following already in your .consedrc
consed.autoFinishAllowWholeCloneReads: false
That tells autofinish to not suggest any sequencing reactions directly off the BAC or cosmid, since most labs don't like these sequencing reactions--they prefer sequencing reactions off M13 or plasmids. So I suggest you leave this line the way it is.)
Run autofinish again the same as before:
../../consed -autofinish -ace autofinish.fasta.screen.ace.1
You will notice two differences in the output: First, near the top of the autofinish output file it will say:
consed.autoFinishMaxAcceptableErrorsPerMegabase: 10
whereas before it said:
consed.autoFinishMaxAcceptableErrorsPerMegabase: 100
A second difference is that this time it suggested additional experiments.
Note for UNIX novices: Earlier, I said that you only needed to know 3 UNIX commands: pwd, ls, and cd. Now I want you to learn one variant:
ls -tlr
This is the same as ls, but it puts one file on a line and prints the lines so that the most recent files are on the bottom. Since you will be creating many, many files as you work through these autofinish exercises, this command gives an easy way to see the files you have just created, without having to always look at autofinish.fof to look for the names of the files you just created.
AUTOFINISH: CHANGING COSTS
57) Now please change it back to
consed.autoFinishMaxAcceptableErrorsPerMegabase: 100
or else just comment out the line by putting a '!' in the first column like this:
!consed.autoFinishMaxAcceptableErrorsPerMegabase: 10
and run autofinish again:
../../consed -autofinish -ace autofinish.fasta.screen.ace.1
Check that it now says:
consed.autoFinishMaxAcceptableErrorsPerMegabase: 100
near the top of the autofinish output file.
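Since it is easy to lose track of which lines in .consedrc are active and which are commented out, here is a small Python sketch that lists the active settings. It is a simplified reading of the file based only on what is shown above ('name: value' lines, with '!' in the first column marking a comment); the real resource parsing in consed may accept more than this, and the function name is hypothetical.

```python
def read_consed_resources(text):
    """Return the active 'name: value' settings from .consedrc text.

    Lines with '!' in the first column are comments and are skipped,
    as are blank lines and lines without a colon. If a resource is
    given more than once, the last value wins (an assumption).
    """
    resources = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("!"):
            continue
        name, colon, value = line.partition(":")
        if not colon:
            continue
        resources[name.strip()] = value.strip()
    return resources
```

To use it on your own file, read the file first, e.g. read_consed_resources(open(".consedrc").read()). A resource that is commented out will simply be missing from the result, which means autofinish falls back to its default for it.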
Notice that it calls 3 custom primer subclone sequencing reactions and 3 universal primer sequencing reactions.
Suppose you want to indicate that your lab can make oligos very cheaply--as cheaply as doing a universal primer reaction. You can do this by lowering the relative cost of custom primer subclone sequencing reactions. Put the following in .consedrc
consed.autoFinishCostOfCustomPrimerSubcloneReaction: 20
And then run autofinish again:
../../consed -autofinish -ace autofinish.fasta.screen.ace.1
Check that it now says:
consed.autoFinishCostOfCustomPrimerSubcloneReaction: 20
near the top of the autofinish output file.
You will notice that there are now 4 custom primer experiments and 2 universal primer experiments.
AUTOFINISH: CHANGING MELTING TEMPERATURES
58) Look near the top of the autofinish output file and you will see the following lines:
consed.primersMinMeltingTemp: 50
consed.primersMaxMeltingTemp: 55
Some labs prefer to use primers with higher melting temperatures. In your .consedrc file, put the following lines:
consed.primersMinMeltingTemp: 55
consed.primersMaxMeltingTemp: 60
Then run autofinish again, the same as before. Check that it now says:
consed.primersMinMeltingTemp: 55
consed.primersMaxMeltingTemp: 60
near the top of the autofinish output file.
Compare the first experiment from the last 2 autofinish runs. Everything should be the same except that the primers are longer at their 3' ends; they are otherwise the same primers.
AUTOFINISH: OTHER CONTROL
59) Try adding to .consedrc the following:
consed.autoFinishCloseGaps: false
and run autofinish again.
What happened?
Another parameter that people sometimes change is:
consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.1
One finisher says that she prefers to set this at 0.5 errors and to decrease:
consed.autoFinishMaxAcceptableErrorsPerMegabase: 1
This has the effect of making autofinish try to resolve every region where errors are clustered tightly together, even if the total error rate for the entire BAC is very low.
You can change any of the parameters listed at the top of the autofinish output file (or actually any of the more exhaustive list of resources listed in the 'Info' menu, 'Show Consed Resources' list.)
We believe the defaults are an excellent starting point.
AUTOFINISH: NOT REPEATING FAILED EXPERIMENTS
60) If you are serious about doing the experiments autofinish suggests, run:
consed -ace (ace file name) -autofinish -doExperiments
-doExperiments causes autofinish to record its suggestions in the ace file. If one of these suggested reads fails to fix a problem, when autofinish is run again it won't pick the same read again.
If a forward or reverse universal primer read failed, autofinish (when run in a subsequent round) will not suggest that same experiment. If a custom primer read fails, autofinish will not pick that same experiment again, and it won't pick a custom primer read that is even close to the failed one. 'Close' is defined by the resource:
consed.autoFinishNewCustomPrimerReadThisFarFromOldCustomPrimerRead: 50
You can change the default of 50 if you like.
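The 'close' rule can be pictured with a few lines of Python. This is a sketch of the idea only: I am assuming the distance is measured in bases between read start positions and that a candidate exactly 50 away is allowed, neither of which is spelled out above, and the function name is made up.

```python
def far_enough_from_failures(candidate_pos, failed_positions, min_distance=50):
    """True if a candidate custom primer read is not 'close' to any
    previously failed custom primer read.

    min_distance plays the role of
    consed.autoFinishNewCustomPrimerReadThisFarFromOldCustomPrimerRead.
    """
    return all(
        abs(candidate_pos - failed) >= min_distance
        for failed in failed_positions
    )
```

Raising the resource value makes autofinish steer further away from a failed read; lowering it lets a new read sit nearer to the old one.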
In addition, autofinish (the next time it is run) will tell you how well each experiment did in solving the problem it was intended to solve. See the EVALUATING EXPERIMENTS section of the autofinish output file.
(Note to programmers: the format of the autoFinishExp tags is likely to change--parse them at your peril!)
-doExperiments will also cause oligos to be tagged. You can turn this off by setting:
consed.autoFinishTagOligosWhenDoExperiments: false
Primer id's created by autofinish use the same naming scheme as primers created in consed and they will not conflict with each other. For example, if autofinish creates oligos djs14.1, djs14.2, and djs14.3, then the next primer that a user accepts will be djs14.4. If autofinish is run a second time, it will start with primer djs14.5.
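The numbering in that example (djs14.1, djs14.2, djs14.3, then djs14.4) can be written down as a short Python sketch. It only reproduces the pattern shown in the example; how consed actually chooses the prefix and the next number is not described here, so treat the function as hypothetical.

```python
import re


def next_primer_id(prefix, existing_ids):
    """Return the next free id of the form prefix.N.

    Only ids with exactly this prefix and a numeric suffix are counted;
    with no such ids the numbering starts at 1.
    """
    pattern = re.compile(re.escape(prefix) + r"\.(\d+)$")
    numbers = [
        int(match.group(1))
        for match in map(pattern.match, existing_ids)
        if match
    ]
    return f"{prefix}.{max(numbers, default=0) + 1}"
```

Because the next number is taken from whatever oligos are already present, it does not matter whether the earlier ones came from autofinish or from a user accepting a primer by hand.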
You should not type '-doExperiments' if you do not intend to do the experiments autofinish suggests. If you use -doExperiments, but you don't really do the experiments, and then you run autofinish again, autofinish will be very upset--it will think that all of its suggested experiments failed (because it can't find them). It will see that all of the problems are still present but it will think that it should not choose any of those same experiments again so it will suggest different experiments that will not be as ideal.
AUTOFINISH: doNotFinish particular regions
61) If there is a region that you don't care to finish (e.g., it has already been finished or you know there is no gene there), then you can put a doNotFinish tag on the consensus and autofinish will not try to finish this area. Try putting a doNotFinish tag on the region from 1 to 200. Run autofinish again. You will notice that there will no longer be any experiments to solve weak regions in the consensus.
----------------------------------------------------------------------------
ADVANCED PHRAP/CONSED USAGE
62) BACKING OUT EDITS AFTER YOU HAVE SAVED THE ASSEMBLY
If you decide that all your edits are terrible and you want to start over (perhaps you have been training a new finisher), the cleanest solution is to delete everything in phd_dir and edit_dir, but leave everything in chromat_dir and just run phredPhrap again.
63) SELECTIVELY BACKING OUT EDITS AND REMOVING READS
If you want to back out all edits in just particular reads, I have provided a perl script to do this:
revertToUneditedRead (read name)
What it does is copy the .phd.1 file to a new version numbered 1 greater than the highest existing version.
Then you must reassemble using the phredPhrap script to create an ace file that has no edits for that particular read. It will have all edits for all other reads.
Why doesn't it just delete all phd files except for the .phd.1? In that case, consed could not read any previous ace file since all previous versions of ace files would refer to phd files that have been deleted.
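The version bookkeeping described above can be sketched in Python as follows. This is not the revertToUneditedRead script itself, just an illustration of 'copy .phd.1 to one more than the highest version'; the function name is hypothetical and it works on a list of file names rather than touching phd_dir.

```python
import re


def reverted_phd_name(read_name, phd_filenames):
    """Name the copy of (read name).phd.1 that would undo the edits.

    The new version number is 1 greater than the highest version found
    for this read, so every older phd file stays in place for old ace
    files to refer to.
    """
    pattern = re.compile(re.escape(read_name) + r"\.phd\.(\d+)$")
    versions = [
        int(match.group(1))
        for match in map(pattern.match, phd_filenames)
        if match
    ]
    if 1 not in versions:
        raise ValueError(f"no {read_name}.phd.1 to copy from")
    return f"{read_name}.phd.{max(versions) + 1}"
```

For a read with versions 1, 2 and 3 this names version 4, whose contents would be identical to version 1.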
64) REMOVING READS FROM AN ASSEMBLY
Create a file containing the filenames of all the reads you want to remove, one filename per line.
Then use the perl script
removeReads <file of filenames>
Then reassemble using the phredPhrap script.
65) ADDING READS WITHOUT CHROMATOGRAM FILES
This may happen if you, for example, download sequence from Genbank and want to assemble it along with your reads.
There are 2 ways to do this, depending on whether you want to edit the read or not.
a) If you want to edit the read, run mktrace to produce a fake trace. It will have all perfect peaks. Run:
mktrace (name of file with fasta sequence)
Then run the phredPhrap script normally. You will be able to bring up the traces in consed and edit the read.
b) If it is not important to edit the reads, there is a method that is a little faster. Create just a fake phd file using:
fasta2Phd.perl (name of file with fasta sequence)
It will create a file whose name is taken from the fasta file name: for example, if the fasta filename is Contig1.c.fasta, then the phd file will be called Contig1.c.phd.1 (the fasta name in the file is ignored). You can then put this in the phd_dir, and reassemble using the phredPhrap script.
Note: all fake reads should end with an extension .c or .a or .c1 or .c2 ... or .a1 or .a2 or ... This is important because it tells consed and autofinish that this data cannot be used as a template for a primer.
Note: when you are creating phd files such as this, you must start with (read name).phd.1 and not with (read name).phd.2 or any higher version number. This is because consed looks for the .1 version in order to find the original phred calls so it expects there to be a .1 version.
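The two naming rules above (the phd file name taken from the fasta file name, and the .c/.a style extensions for fake reads) are easy to get wrong by hand, so here is a Python sketch that checks them. Both functions are hypothetical helpers written for this page, not part of fasta2Phd.perl, and the extension pattern is just my reading of the list '.c or .a or .c1 or .c2 ... or .a1 or .a2'.

```python
import re

FAKE_READ_EXTENSION = re.compile(r"\.[ca]\d*$")


def phd_name_for_fasta(fasta_filename):
    """Contig1.c.fasta -> Contig1.c.phd.1 (always version 1)."""
    suffix = ".fasta"
    if not fasta_filename.endswith(suffix):
        raise ValueError("expected a file name ending in .fasta")
    return fasta_filename[:-len(suffix)] + ".phd.1"


def is_fake_read_name(read_name):
    """True if the read name ends in .c, .a, .c1, .c2, ..., .a1, .a2, ..."""
    return FAKE_READ_EXTENSION.search(read_name) is not None
```

A quick check such as is_fake_read_name("Contig1.c") before you reassemble can save you from fake reads being offered as primer templates.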
If the
reads are really fake (you don't want
templates
66) WHY ARE ALL THE READS NOT IN THE ASSEMBLY?
You will notice that there are some contigs that contain only one read. You will also notice that there are some reads that are not shown by consed at all, since phrap did not put them into the ace file. Why?
If a read does not have a significant match (with Smith-Waterman score exceeding minscore) to any other read, that read is not included in the ace file. Instead, that read is put in the '.singlets' file. That read will not appear in consed.
If a read does have a significant match to any other read, then it will appear in the ace file and be shown by consed. However, such a read might have other problems: it might not be possible to assemble such a read with other reads. In the case of EST's, this read may be a unique representative of a particular gene (or a genomic sequence contaminant) that happens to contain an Alu repeat and thus happens to match other reads in the data set; or it may represent the only read of a particular alternatively spliced form; or it may have data anomalies of some sort (chimeras, etc.). Such a read would end up in a contig all of its own.
67) VIEWING THE CHROMATOGRAM OF SINGLETS OR NON-ASSEMBLED READS
If you have a chromatogram, you can use consed to view it, even if it hasn't been assembled into the ace file. This is common with cDNA assemblies in which the reads don't overlap and thus phrap doesn't put them together into a contig.
To do this, make the same edit_dir, phd_dir, and chromat_dir as above, put the chromatogram into chromat_dir, and run phred on it to generate the phd file which goes into phd_dir. Then go to edit_dir and run:
phd2Ace.perl (name of phd file)
For example, if your phd file is myRead.phd.1 then, from edit_dir, type:
phd2Ace.perl myRead.phd.1
This will produce myRead.ace
Then just start consed normally:
consed -ace myRead.ace
and you can view the chromatogram.
MULTIPLE TRACE POPUP
68) Bring up dataset standard. In the Aligned Reads window, scroll to a region that has many reads and that has some discrepancies--try position 1162. Hold down the shift key, and click with the middle mouse button on the consensus. At this location 3 traces will popup--these are the 2 highest quality traces that agree with the consensus (on each strand) and the highest quality trace that disagrees with the consensus. This feature is useful in areas of high coverage when you want to rapidly examine just the most significant traces rather than looking at all of them.
MAXIMUM NUMBER OF TRACES DISPLAYED
69) Bring up dataset standard. Scroll to position 1162. Bring up 4 reads and then try bringing up additional reads. You will notice that new reads are put at the top of the stack of traces and, once there are 4 traces displayed, traces are automatically removed from the bottom of the stack. If you want to change this maximum number of traces to something besides 4, you can do that: In the Main Consed Window (click on 'Find Main Win' on the Aligned Reads window), pull down the 'Options' menu, and release on 'General Preferences'. Try changing the 'Max Number of Traces Shown' to 3. Then click 'Apply and Dismiss'. Now dismiss the Trace Window and again start adding additional traces to the Trace Window. You will notice that now the number of traces shown will not exceed 3.
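The stacking behavior you just saw (new traces on top, oldest dropped from the bottom once the maximum is reached) is the same as a bounded double-ended queue. Here is a tiny Python sketch of it; the function name is made up and this is only a model of the behavior described above, not how consed is written.

```python
from collections import deque


def displayed_traces(reads_brought_up, max_traces=4):
    """Return the visible traces, top of the stack first.

    Each new trace is pushed on top; once max_traces are showing, the
    bottom one is dropped to make room.
    """
    stack = deque(maxlen=max_traces)
    for read_name in reads_brought_up:
        stack.appendleft(read_name)  # a full deque discards from the far end
    return list(stack)
```

With the default of 4, bringing up a fifth trace pushes the first one off the bottom; with 'Max Number of Traces Shown' set to 3, the fourth trace already does so.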
HOTKEYS FOR EDITING
70) If you do a lot of editing, you will want to have a faster method of doing these edits than having the popup and selecting an option. Thus the following hot keys exist:
< and > (less than and greater than) to make n's to the left and the right (respectively) of the cursor
control-l and control-r to make low quality to the left and the right (respectively) of the cursor
overstriking with a capital letter (e.g., C instead of c) causes the base to become high quality rather than low quality
overstriking with a lower case letter causes the base to become low quality
Give these a try.
71) Now go to the menu labelled 'color', and pulldown and release on 'color means match'.
Now you will notice different colors. The colors have the following meaning:
Blue: agrees with consensus
Orange: disagrees with consensus
Yellow: this stretch of this read was used to form the consensus
Grey: Low quality or unaligned ends of reads
Now go back to the colormode 'color means quality and tags' (the default) for the next exercise.
(The other colormodes will mean more to you later.)
ALPHABETICAL
ORDERING OF READS
72) The reads can be ordered in two ways:
a) alphabetically
b) first all the top strand reads and then
all the bottom
strand reads. The top strand reads are then ordered
by the left end of the reads. Same with the bottom
strand reads.
Try changing
between a) and b). In the Main Consed
Window (click on
'Find
Main Win' on the Aligned Reads Window if you can't find the Main
Consed
Window because it is covered up with other windows), pull down
the
'Options' menu, and release on 'General Preferences'. Find
'Display
reads sorted alphabetically or by strand/left end of read.'
Switch
it between 'alpha' and 'strand'. Then
click 'Apply and
Dismiss'. Notice the effect in the Aligned Reads
Window. Many
polymorphism
and mutation detection labs find that alphabetically
sorting
is most useful, while many genomic sequencing labs find that
sorting
by strand/left end of read is most useful.
SCROLLING
TRACES INDEPENDENTLY
73)
Dismiss all of your Trace Windows. Then
popup traces for 2
different
reads in approximately the same location.
Scroll one of
them. You may want to scroll by clicking the
arrows or clicking to
the
left or right of the thumb. You will
notice that both will
scroll. Consed will do its best to have
corresponding peaks lined up.
(Consed
can't line all of them up because the peak spacing is not
uniform
and differs from read to read.) Try
removing a trace by
clicking
on one of the 'Remove' buttons in the Trace Window. Try
adding
other traces. Then click on 'No' for scrolling
the traces
together
and try scrolling. You will now observe
that they scroll
separately.
ABI
BASE CALLS
74) If
you want to see the ABI base calls, no problem. Just go to the
Main
Consed Window. Pull down the 'Options'
menu and release on
'General
Preferences'. Click on 'True' for 'Show
ABI Bases in Trace
Window'
and then click 'OK' at the bottom of the window. The ABI
bases
will not be shown immediately--you must first dismiss the trace
window
and bring it up again. You will then
see an additional line
with
the ABI base calls.
MEASURING
ERROR RATE AND SINGLE SUBCLONE BASES FOR A REGION
75) Some contigs have long tails of low quality
bases and you would
like to
find out the error rate for the contig without that long
tail. On the Aligned Reads Window, pull down the
Misc menu, and release
on
'Show Errors for a Region'. This will
tell you both the error rate
for the
region and the number of single subclone bases for that region.
------------------------------------------------------------------------
INSTALLING
CONSED
Consed
used to use .Xdefaults for consed parameters--no longer. Now
consed
uses ~/.consedrc for most of the same parameters. Thus you
should
remove consed parameters from .Xdefaults and put them in
.consedrc
in your home directory.
Before,
when you made a typo with one of the consed parameters, it was
just
silently ignored. Now consed makes a
big fuss. So you need to
be
prepared to find out all of the parameters that have not been
working
all this time.
To
start with, type:
cd ~
touch .consedrc
That will create a new empty consed parameter file (if one does not already exist). You can add
lines to it as you need to customize consed.
Although most consed
parameters
now go into .consedrc, there are still a very few that need
to stay
in .Xdefaults. Here is the rule: if the parameter starts
with consed. such as
consed.gunzipFullPath: /bin/uncompress
then it goes into .consedrc
If the parameter starts with consed* such as
consed*contigwin.background: Black
then it goes in .Xdefaults
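If you are moving many parameters out of .Xdefaults, the rule above is easy to apply mechanically. This is a small Python sketch of that rule only; the function name target_file is hypothetical and nothing here is part of consed itself.

```python
def target_file(resource_line):
    """Say which file a resource line belongs in, following the prefix rule above."""
    name = resource_line.strip()
    if name.startswith("consed."):
        return ".consedrc"
    if name.startswith("consed*"):
        return ".Xdefaults"
    return None  # not a consed parameter; leave it where it is


if __name__ == "__main__":
    for line in ("consed.gunzipFullPath: /bin/uncompress",
                 "consed*contigwin.background: Black"):
        print(line, "->", target_file(line))
```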
You can
also make such customizations system-wide (for everyone) or
for
just a specific project. See CONSED
CUSTOMIZATION (below) for
more
information.
76)
Follow the first few steps of USING CONSED GRAPHICALLY of the
Quick Tour (above). If you have problems, it may be due to your X
emulator.
See 'MONITORS AND MICE FOR CONSED' below.
77) The
default locations for most of consed, phred, and phrap require
that
there be a directory /usr/local/genome
I
strongly suggest you make such a location--it will save you many
headaches
of trying to customize scripts for other locations. If you
can't
actually use /usr/local/genome, then you could make
/usr/local/genome
be a link to the real location--that will work just
as
well.
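Making /usr/local/genome a link is the same as running "ln -s (real location) /usr/local/genome". A minimal Python sketch of the same step follows; the function name link_genome_dir is hypothetical, and creating anything under /usr/local will normally require administrator rights.

```python
import os


def link_genome_dir(real_dir, link_path="/usr/local/genome"):
    """Make link_path a symbolic link to real_dir unless something is already there."""
    if os.path.lexists(link_path):
        return False  # never overwrite an existing directory or link
    os.symlink(real_dir, link_path)
    return True

# Example call (the real location shown is only an illustration):
# link_genome_dir("/data/genome")
```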
78) Make sure that /usr/local/genome/bin is in
every consed user's PATH.
79) Put
the consed executable in /usr/local/genome/bin
80) Check this by logging on as a user and
typing:
consed -V
You
should see 'Version 9.0'. If you see
something else, you have
some
debugging to do.
81) Build phd2fasta:
Go to
the misc/phd2fasta directory and type 'make'
Move
the phd2fasta executable to /usr/local/genome/bin
82) Build mktrace:
Go to
the misc/mktrace/980701 directory and type 'make'
Move
the mktrace executable to /usr/local/genome/bin
83) Move all perl scripts from the scripts
directory to
/usr/local/genome/bin
Make
sure all are executable (chmod a+x *)
DELETE
ANY PREVIOUS VERSIONS OF THESE SCRIPTS OR YOU WILL BE SORRY!
(Bugs
have been fixed.)
84) Get perl 5.
You can check where to get perl via the perl web
site:
http://www.perl.com/perl/info/software.html
(If you
don't know about perl, try it--it will save you a
huge
amount of time over developing the same utilities in C, awk, or
csh or
sh.)
85) From the misc subdirectory, copy
primerCloneScreen.seq and
primerSubcloneScreen.seq
to the directory
/usr/local/genome/lib/screenLibs
(You
may have to create this directory.)
Take a
look at these files. They are dummy
files indicating the fasta
format
of the sequences that should be put in them.
You should put
into
primerCloneScreen.seq the vector sequence of the cloning vectors
you are
using (BAC or cosmid) and into primerSubcloneScreen.seq the
sequencing
vectors you are using (plasmid, M13, etc).
Don't be too
generous
in putting lots of vectors into the files!
The larger they
are,
the slower primer picking will be. Our
files are only this big:
-rw-r--r-- 1 root root 29938 Nov  7 1997 primerCloneScreen.seq
-rw-r--r-- 1 root root  7381 Aug 13 1997 primerSubcloneScreen.seq
and
primer picking is quite fast enough.
Now
that you have set this up, you should try the PRIMER PICKING
sections
(above) in the Quick Tour to make sure this works. Note that
you
should *not* do the temporary step in the beginning of PRIMER
PICKING. That is because you want the primers
screened against vector.
86) You should also create a file
/usr/local/genome/lib/screenLibs/vector.seq
This
contains all the vector that you want to mask out before
phrapping. In general, it is the combination of
primerCloneScreen.seq
and
primerSubcloneScreen.seq
87) You should also create a file
/usr/local/genome/lib/screenLibs/repeats.fasta
In this
file, put any repeats that you want to have automatically
tagged. These typically are ALU sequences. If you don't want to tag
anything,
then comment out (put '#' as the first character of the
line)
the following lines in phredPhrap:
Change:
!system( "$tagRepeats $szAceFileToBeProduced" )
    || die "some problem running $tagRepeats";
to:
#!system( "$tagRepeats $szAceFileToBeProduced" )
#    || die "some problem running $tagRepeats";
88)
determineReadTypes.perl
Phrap,
Consed's primer picking, and Consed/Autofinish all need the
following
information for each read:
is it a universal primer forward, a
universal primer reverse,
or a walking read?
what is its template name?
Generally
this information can be determined from the read name, using
*your*
naming convention. Modify the perl
script
determineReadTypes.perl
to put this information at the end of the phd file
using
WR info items.
Consed
allows you to check that you have correctly modified
determineReadTypes.perl: On the Main Consed Window, point to 'Info',
hold
down the left mouse button, and release on 'Show Info for Each
Read'. Check that the information presented is
correct. If, for
example,
consed thinks that there are templates that have 9 or more
reads,
it is likely that you have not correctly customized
determineReadTypes.perl
Once
you have correctly customized determineReadTypes.perl, then
uncomment
the line in phredPhrap which calls determineReadTypes.perl
TEST
RUNNING PHREDPHRAP
89) See
the section RUNNING PHRED and PHRAP (in the Quick Tour)
TESTING
ADDING NEW READS
90) It will make your life easier if phred,
phrap, and crossmatch are
all
where consed expects them: in
/usr/local/genome/bin
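A quick way to confirm this before testing is to look for each program in that directory. The Python sketch below is only an illustration; the function name missing_tools is hypothetical, and the executable name cross_match is taken from the consed.fullPathnameOfCrossMatch default shown later in this document.

```python
import os


def missing_tools(bin_dir="/usr/local/genome/bin",
                  names=("phred", "phrap", "cross_match")):
    """Return the names that are not executable files in bin_dir."""
    missing = []
    for name in names:
        path = os.path.join(bin_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", missing_tools())
```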
91) Decide where to put phred's parameter file
and edit both
addReads2Consed.perl
and phredPhrap to reflect this location.
I
generally
prefer to put it in /usr/local/genome/lib to keep all of the
phred/phrap/consed
files in one place. Alternatively, you
could put
it in
/usr/local/etc/PhredPar/phredpar.dat which is the historical
location
of this file.
92)
Next you should test the ADDING NEW READS step in the Quick Tour
(above). This step requires that everything be set up
correctly and
in the
correct location. Hopefully the error
messages are clear
enough
to help you if you have set up anything incorrectly.
USING
YOUR OWN DATA
93) Create the following directory structure:
Directory
structure:
top level directory (generally named after
the BAC or cosmid)
subdirectory
'chromat_dir'--chromatograms go in here
subdirectory 'phd_dir'--phd files will
automatically be put here
subdirectory 'edit_dir'--ace files
will automatically be put here
If you
already have your chromatograms somewhere else, you can make
chromat_dir
be a link to wherever you have them.
The
various phrap and crossmatch files will be put into edit_dir by
the phredPhrap
script.
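The directory layout described above can be created in one step. This is a minimal Python sketch; the function name make_project and the project name in the example are hypothetical.

```python
import os


def make_project(top_dir):
    """Create chromat_dir, phd_dir and edit_dir under top_dir."""
    for sub in ("chromat_dir", "phd_dir", "edit_dir"):
        # exist_ok lets the function be re-run on an existing project
        os.makedirs(os.path.join(top_dir, sub), exist_ok=True)
    return sorted(os.listdir(top_dir))

# Example call, naming the top level directory after the BAC or cosmid:
# make_project("myBAC")
```

If the chromatograms already live somewhere else, make chromat_dir a link to that location instead of letting this create an empty directory.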
94) cd to the edit_dir directory, and type:
phredPhrap
If you
are successful, the script will tell you so and you can bring
up
consed on the ace file:
95) Type:
consed
You
should see a file with the extension .ace.1
Double
click on it.
You
should see a list of contigs.
Double
click on the one you want to see.
Follow
the first few steps of the Quick Tour under USING CONSED
GRAPHICALLY
above. You should at least go as far as
viewing traces.
96)
Appending expid to the phd files
If you
are using autofinish, and would like autofinish to tell you how
well
your reads are succeeding, then the phd files must be appended
with
the experiment id's. In the 3
autofinish summary files
(*.univReverse,
*.univForwards, and *.customPrimers), you will see
information
like this:
univ
rev,,,->,-329,-249,71,Contig1,3,djs228_1034
or
this:
tgaagaaatggctgactcc,56,1,->,3258,3338,3658,Contig1,4,djs228_2813,5,djs228_168,6,djs228_1248
The '3'
just before the djs228_1034 is an experiment id. There is
also an
expid '4' just before djs228_2813, an expid '5' before
djs228_168,
and an expid '6' just before djs228_1248.
Autofinish
doesn't know what you will end up calling these reads it is
telling
you to make. Autofinish only knows
those reads by the numbers
3, 4,
5, and 6. So when you make the reads,
autofinish needs to be
informed
that this is 'experiment 3' or whatever.
You do this by
appending
in the phd file the following structure:
WR{
expid addExpid 990811:140818
5
}
where
WR stands for 'whole read item',
expid for 'expid'
addExpid is the name of the program that
you will write that
will append this information
990811:140818 is the date and time in
format YYMMDD:HHMISS
5 is the expid
This
program must be run *after* phred runs to create the phd files.
Thus
your program must have some method of determining what the expid
of each
read is. What the University of
Washington Genome Center does
is to
have the finishers put the expid as part of the filename. This
makes
it easy for a program to look at the phd file and figure out
what
the expid is and then write the WR item into that phd file.
Alternatively,
you could keep a database and, after the phd file is
created,
look into the database to see what the expid is.
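A program of this kind can be quite short. The following Python sketch builds the WR item and appends it to a phd file. It assumes that the item type, program name, and timestamp share one line inside the WR{ } block and that the phd file already ends with a newline. The function names are hypothetical, and how you obtain the expid (from the filename or from a database) is left to your own convention.

```python
import time


def expid_item(expid, program="addExpid", when=None):
    """Build the text of a WR expid item."""
    # Timestamp in YYMMDD:HHMISS form, e.g. 990811:140818
    stamp = time.strftime("%y%m%d:%H%M%S", when or time.localtime())
    return "WR{\nexpid %s %s\n%s\n}\n" % (program, stamp, expid)


def append_expid(phd_path, expid):
    """Append the WR item to the end of an existing phd file (never rewrite it)."""
    with open(phd_path, "a") as phd:
        phd.write(expid_item(expid))
```

Run this only after phred has created the phd file, as noted above.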
When
you have successfully added expid's to the phd files, the next
time
you run autofinish on this project, it will have in the
'EVALUATE'
section of the autofinish output file, lots of interesting
information
about how well the reads succeeded.
USING
NON-STANDARD LOCATIONS FOR FILES
You
have a lot of work to do. You will need
to edit nearly every
script
mentioned above. In addition, you will
need to make sure that
the
CONSED_PARAMETERS environment variable is set for every user and
that
the CONSED_PARAMETERS file points to the new locations for these files:
consed.primersSubcloneFullPathnameOfFileOfSequencesForScreening: /usr/local/genome/lib/screenLibs/primerSubcloneScreen.seq
consed.primersCloneFullPathnameOfFileOfSequencesForScreening: /usr/local/genome/lib/screenLibs/primerCloneScreen.seq
consed.primersBadTemplatesFile: badTemplates.txt
consed.fullPathnameOfAddReads2ConsedScript: /usr/local/genome/bin/addReads2Consed.perl
consed.fullPathnameOfCrossMatch: /usr/local/genome/bin/cross_match
consed.fullPathnameOfPhred: /usr/local/genome/bin/phred
As you
can see, sticking with the default of /usr/local/genome will
make
your life easier--not just at installation, but even in day to
day
operations.
(Remember--/usr/local/genome could be just a link)
--------------------------------------------------------------------------
NOTE TO
SGI USERS
In
/usr/lib, there must be a file: libCsup.so
If you
don't have this file, you must get it from SGI. To get it, if
you are
on Irix 6.2 through 6.4, request:
SG0001637
'C++ Exception handling patch for 7.00 (and above) compilers
on irix
6.2' (it's on the 'Development Options 7.1' CD).
If you
are on Irix 5.3, install patch 1600
To make
things easier for you, I've included my libCsup.so
This
might save you having to get the patches above.
----------------------------------------------------------------------------
FOR
PROGRAMMERS AND FELLOW TRAVELLERS ONLY
CONSED
VERSION
On the
command line, type:
consed -v
This is
particularly useful to system administrators to make sure the
latest
version is installed on all computers.
CONSED
CUSTOMIZATION
Click
on the 'Info' menu on the Main Consed Window and release on menu
item
'Show Consed Resources'. This shows you
what is available to be
changed
by putting in your ~/.consedrc file.
Changes
in ~/.consedrc only affect one user. If
you want to make a
change
to affect all consed users on the system, put a file in some
central
location (e.g., /usr/local/genome/lib/.consedrc ) and then
have
every user set the environment variable CONSED_PARAMETERS to
that
location:
setenv CONSED_PARAMETERS /usr/local/genome/lib/.consedrc
Anything
the user puts in ~/.consedrc will override whatever is in the
CONSED_PARAMETERS
file.
You can
also have different parameters for different projects. Put a
.consedrc
file in the edit_dir of a particular project.
When you are
working
on that project, whatever is in that .consedrc will override
whatever
is in your ~/.consedrc file or the
CONSED_PARAMETERS file.
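The order of precedence described above (project file over ~/.consedrc over the CONSED_PARAMETERS file) can be pictured as three dictionaries merged in turn. This Python sketch only illustrates that ordering and is not how consed is implemented; it assumes one "name: value" pair per line and treats lines beginning with "!" as comments, and the function names are hypothetical.

```python
def parse_rc(text):
    """Parse 'name: value' lines into a dictionary."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # assumption: blank lines and '!' comment lines carry no setting
        name, sep, value = line.partition(":")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def effective_settings(site_text, user_text, project_text):
    """Later sources override earlier ones: site-wide, then user, then project."""
    merged = {}
    for text in (site_text, user_text, project_text):
        merged.update(parse_rc(text))
    return merged
```

For example, a value set in the edit_dir .consedrc replaces the same parameter from ~/.consedrc, while parameters it does not mention keep their user or site-wide values.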
COMPRESSING
CHROMATOGRAMS
If you
are interested in compressing your chromatogram files, go into
chromat_dir
and gzip one of the chromatogram files.
Make sure that
gunzip
is in /usr/local/bin (You can change
this location via the
consed
resource
consed.gunzipFullPath: /usr/local/bin/gunzip
--see
CONSED CUSTOMIZATION (above), but it will be easiest for
you and
your users if you just put gunzip in /usr/local/bin and not
have to
bother with consed resources.)
Restart
consed and bring up the corresponding trace.
You will notice
no
appreciable delay.
CONSED
-ACE
Try bringing up consed like this:
consed -ace (name of ace file)
This
can be useful if you are going to have consed brought up from
some
other program.
NO PHD
FILES
Try bringing up consed like this:
consed -nophd
This
mode does not allow editing and does not show quality
information. It allows you to view an assembly when you
don't have
phd
files or chromatograms but you only have the ace file. You will
not be
able to see the quality information, since that information is
kept in
the phd files. I do not recommend nor
support this option!
CREATING
CUSTOM TAG TYPES
The
following consed resources are available for creating custom tag
types:
consed.tagColorCustomTag1:
consed.tagColorCustomTag2:
consed.tagColorCustomTag3:
consed.tagColorCustomTag4:
consed.tagColorCustomTag5:
consed.tagColorCustomTag6:
consed.tagColorCustomTag7:
consed.tagColorCustomTag8:
consed.tagColorCustomTag9:
consed.tagColorCustomTag10:
consed.tagColorCustomTag11:
consed.tagColorCustomTag12:
consed.tagColorCustomTag13:
consed.tagColorCustomTag14:
consed.tagColorCustomTag15:
consed.customTag1:
consed.customTag2:
consed.customTag3:
consed.customTag4:
consed.customTag5:
consed.customTag6:
consed.customTag7:
consed.customTag8:
consed.customTag9:
consed.customTag10:
consed.customTag11:
consed.customTag12:
consed.customTag13:
consed.customTag14:
consed.customTag15:
consed.tagColorCustomConsensusTag1:
consed.tagColorCustomConsensusTag2:
consed.tagColorCustomConsensusTag3:
consed.tagColorCustomConsensusTag4:
consed.tagColorCustomConsensusTag5:
consed.tagColorCustomConsensusTag6:
consed.tagColorCustomConsensusTag7:
consed.tagColorCustomConsensusTag8:
consed.tagColorCustomConsensusTag9:
consed.tagColorCustomConsensusTag10:
consed.tagColorCustomConsensusTag11:
consed.tagColorCustomConsensusTag12:
consed.tagColorCustomConsensusTag13:
consed.tagColorCustomConsensusTag14:
consed.tagColorCustomConsensusTag15:
consed.customConsensusTag1:
consed.customConsensusTag2:
consed.customConsensusTag3:
consed.customConsensusTag4:
consed.customConsensusTag5:
consed.customConsensusTag6:
consed.customConsensusTag7:
consed.customConsensusTag8:
consed.customConsensusTag9:
consed.customConsensusTag10:
consed.customConsensusTag11:
consed.customConsensusTag12:
consed.customConsensusTag13:
consed.customConsensusTag14:
consed.customConsensusTag15:
When
you create a custom tag type, you specify its name and the color
you
want it displayed in.
For
example:
consed.tagColorCustomTag1: SlateBlue2
consed.tagColorCustomTag2: SlateBlue2
consed.tagColorCustomTag3: SlateBlue2
consed.tagColorCustomTag4: brown
consed.tagColorCustomTag5: MediumPurple
consed.tagColorCustomTag6: purple
consed.customTag1: polymorphismInsertion
consed.customTag2: polymorphismDeletion
consed.customTag3: polymorphismSubstitution
consed.customTag4: qualityCoreComment
consed.customTag5: coordinatorApproval
consed.customTag6: coordinatorComment
(All of
these tag types are read tag types.
Consensus tag types are
specified
separately--see the consed resource names (above).)
Once
you have done this, the user of consed can add tags of these
types
in the method described in TAGS of the Quick Tour (above).
ADDING
TAGS FROM OTHER PROGRAMS
You can
also write external programs that add tags to the ace file
and/or
the phd files. Both RT (read) and CT
(consensus) tags can be
appended
to the end of the ace file. BEGIN_TAG
tags can be appended
to the
end of the phd files. Do not rewrite
the ace file or the phd
file--there
is no need to do so and it will cause problems.
CONTROL
OF CONSED FROM SOME OTHER PROGRAM
Consed
can be controlled by some other program.
For example, you
might
have a program that displays mapping data and you would like the
user to
be able to click on a location and have consed come up showing
the
bases in that region. This feature
allows a programmer to do
this.
The external program can start up consed as follows:
consed -socket (local port number) -ace (ace filename)
For example,
consed -socket 5432 -ace standard.fasta.screen.ace
After consed completes coming up (including you clicking whether you want to
apply edits), you will see the message in the xterm:
success bind to local port number: 5432
And
then you will see a file created by consed in the default
directory
called consedSocketLocalPortNumber
This
gives the port number of the Berkeley socket that consed has
opened
and is listening on. Thus your program
can read this file and
create
a connection to the Berkeley socket created by consed.
Once
the connection is established, your program can send commands to
consed
at that socket indicating to consed which contig to display and
what
consensus position to scroll to.
Currently, the only acceptable
commands
are:
Scroll (contigname) (consensus position)<return>
PopupTraces (read name) (unpadded read position in the direction of sequencing)<return>
'Unpadded
read position in the direction of sequencing' is the
position
from the right end, if the read is a bottom strand read.
Just
send such a command to the Berkeley socket, and consed will
respond
appropriately.
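A controlling program along these lines might look like the following Python sketch. Several details are assumptions rather than documented behavior: that the consedSocketLocalPortNumber file contains just the port number as text, that <return> means a newline character, and that consed is listening on the same machine. All function names are hypothetical.

```python
import socket


def read_port(path="consedSocketLocalPortNumber"):
    """Read the port number from the file consed writes (assumed to hold only the number)."""
    with open(path) as f:
        return int(f.read().strip())


def scroll_command(contig, position):
    return "Scroll %s %d\n" % (contig, position)


def popup_traces_command(read_name, position):
    # position is the unpadded read position in the direction of sequencing
    return "PopupTraces %s %d\n" % (read_name, position)


def send_command(port, command, host="localhost"):
    """Open a connection to consed's socket and send one command."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(command.encode("ascii"))

# Example, run from the directory where consed was started:
# send_command(read_port(), scroll_command("Contig1", 1162))
```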
AUTOMATIC
ORDERING OF OLIGOS
I heard
of a finisher who manually ordered 72 oligos.
She had to
cut/paste
the bases of each oligo. That is not
only painful, but also
error
prone. I've supplied you a script that
you can use to
automatically
determine which oligos have been newly requested since
the
last order, aggregate them into a single order, and email the
request
off.
The
script is ace2Oligos.perl. It takes as
parameters the name of an
ace
file and the name of the oligo file.
The oligo file is a list of
oligos
that have been ordered for that particular project, and looks
like
this:
name=G1980A181.1 sequence=ctgcatggctaggga template=seq from subclone date=980427 temp=52
name=G1980A181.2 sequence=tcttactttctgactttcattt template=seq from clone date=980427 temp=50
ace2Oligos.perl
finds all oligo tags in the ace file and makes sure
that
all of them are in this oligo file.
To
automatically order oligos each night, there is an additional
script
you will have to write. I suggest that
you run your script
each
night under cron and that it do the following:
for
each project, it will look for the most recent ace file. It will
run
ace2Oligos.perl on that ace file and direct the oligo file to be
in the
parent directory of edit_dir, phd_dir, and chromat_dir for that
project. Thus there will be one oligos file for each
project. Your
script
will run ace2Oligos.perl once for each project.
Then
your script would, for each project, look in the oligos file for
new
oligos, and aggregate the unordered oligos into a central file,
which
it would email to the oligo company. If
it finds any new oligos
in an
oligo file, it draws a line at the bottom:
-------------------------------
which
indicates that all oligos have been ordered.
When this script
looks
at this file the next night, it uses this line to determine
whether
any additional oligos have been requested since the previous
order. (The idea of this line came from St
Louis.) Thus the oligos
file
tells you which oligos have been ordered and which have not yet
been
ordered.
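The part of the nightly script that uses the dashed line can be sketched as follows in Python. This is only an illustration of the idea, with hypothetical function names: any line made up solely of dashes is treated as the "ordered up to here" marker, and every non-blank line after the last marker counts as not yet ordered. Emailing the aggregated order is left out.

```python
MARKER = "-" * 31  # the exact length is not significant in this sketch


def is_marker(line):
    s = line.strip()
    return len(s) > 0 and set(s) == {"-"}


def unordered_entries(lines):
    """Return the non-blank lines that follow the last marker line."""
    last = -1
    for i, line in enumerate(lines):
        if is_marker(line):
            last = i
    return [l.rstrip("\n") for l in lines[last + 1:] if l.strip()]


def mark_all_ordered(lines):
    """Add a marker at the bottom, but only if there was something new to order."""
    if unordered_entries(lines):
        return list(lines) + [MARKER + "\n"]
    return list(lines)
```

Your script would call unordered_entries on each project's oligos file, add the results to the central order, and then write back the output of mark_all_ordered.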
97) CUSTOM NAVIGATION
In the
Main Window, there is also a Navigate menu.
Pull it down and
release
on the Custom Navigation menu item. A
box will popup saying
'Select
custom navigation file:'
There
will be a file:
custom_navigation.nav
Double
click on it.
You will
see the now-familiar custom navigation box.
Click 'Next'
repeatedly
until you get to the end of the list.
Consed
doesn't write such a file--it just reads it.
This feature
allows
you the ability to write your own programs that select
locations
that you want your finishers to examine.
Your program
writes
a file, the user reads that file into consed in this manner,
and you
can go to each of the locations.
98) LONG, LONG, LONG READ NAMES
If you
have very long read names, you might not be able to see the
whole
name in the Aligned Reads Window. You
can solve this by
increasing
the consed resource:
consed.alignedReadsWindowMaxCharsForReadNames:
20
--------------------------------------------------------------------------
MONITORS
AND MICE FOR CONSED
If your
monitor is part of a Unix computer (a Sun, an HP, a DEC, an
SGI, or
a Linux box) or is an Xterminal, then you will have absolutely
no
problems.
You
must have a 3 button mouse or 3 button emulation. 3 Button
emulation
is tricky since consed uses all 3 buttons of the mouse and
it also
uses Control-Middle-Mouse-button, Shift-Middle-Mouse-Button
and
Control-Right-Mouse-Button. So if you
are going to try to just
use a 2
button mouse (or, God-forbid, a 1 button mouse), you should
make
sure that you can emulate each of those.
Often, if you push the
left
and right mouse buttons at the same time, your X server will
interpret
that to be the middle mouse button. But
you must consult
your X
emulator or X server to know what it will do--that is out of
consed's
control.
If your
monitor is a PC running Windows or NT, then you must have an X
emulator
installed and running. X emulators
include: Exceed, XWin32,
Reflection
X, and OpenNT. Any of these will work
if configured
correctly
(and the 'correctly' is the key). I
encourage you to use
single
window mode and then use a Unix window manager such as CDE,
fvwm,
or mwm.
If your
monitor is a MAC, then you must also have an X emulator, such
as
Exodus or MACX installed and running.
You *must* use this emulator
in
single window mode, and then use a Unix window manager such as CDE,
fvwm,
or mwm. (If you don't use single window
mode, consed might
crash
in some circumstances.)
--------------------------------------------------------------------------
PRIMER
PICKING PARAMETERS
The
following are primer picking resources.
Many are used for both
consed
and autofinish. There are some that
are just used for
autofinish
and some that are just used for consed.
A great
deal of science and experimentation has gone into
setting
these defaults and I suggest you do not change them until you
have
experimented and know what you are doing.
You can
set these via the .consedrc file.
In
addition, for a particular consed session, you can interactively change
many of
these in the following manner: On the main window, point to
'Options',
hold down the left mouse button and release on 'Primer
Picking
Preferences.' You can modify the
resource of interest and
then
click on 'Apply and Dismiss'. The new
value of the resource will
be in
effect only until you restart consed.
In the
following, I have annotated the parameters with the following
symbols:
(YES) freely customize to your own site
(OK) don't change unless you have a specific need
and know what you
are doing
(NO) don't change this!
This is
what they mean (I suggest you skip over this for now):
consed.primersAssumeTemplatesAreDoubleStrandedUnlessSpecified:
false
bool
! you
can put the template type in the phd file in a WR template item
!
consed will have a list of these and know which are single and
!
double stranded
(YES)
consed.primersLookThisFarForForwardVectorInsertJunction:
125
int
! don't
change this--if no X's this far from beginning of read, then
!
assume that you are in insert
(NO)
consed.primersMinimumLengthOfAPrimer:
15
int
(YES)
consed.primersMaximumLengthOfAPrimer:
25
int
(YES)
consed.primersALittleLessThanAverageInsertSizeOfASubclone:
1500
int
// for
finding templates
! used
to calculate extent of a template for choosing templates
(YES)
consed.primersMaxInsertSizeOfASubclone:
3000
int
// for
checking for false-annealing
! check
+/- this distance from the primer for false-annealing
(YES)
consed.primersDNAConcentrationNanomolar:
50.0
double
! used
for melting temperature--don't change this!
Consed
uses the nearest-neighbor (with salt concentration
correction)
formula, just as all modern primer picking
programs
do
(NO)
consed.primersMaxMatchElsewhereScore:
17
int
! used
for testing false-annealing to template and to vector
In
choosing a primer, it is important that the primer not
stick
somewhere besides the place you are trying to get a
read--a
'false match'. This can cause a primer
to fail even
if the
false match is not perfect. The worst
kind of false
matches
are those that extend to the 3' end of the primer, and
worse
yet if they have a high percentage of G/C matches since
G and C
bind more tightly than A and T. The
algorithm used
here
takes both of these effects into account.
This parameter
sets
the max acceptable false match.
In
practice, it is this parameter that eliminates most
primers. You can get consed to give you some primers
by
raising
this parameter, but if you do, you should be aware of
the
danger of mispriming. To make you aware
of that danger,
you can
do this: when you choose a primer (see above for how
to do
this), look in the xterm. It will show
you the best
alignment
of each primer with some other location in the
assembly. By looking at this you will gain an idea of
what
the
PrimersMaxMatchElsewhereScore means and you won't be too
free
about raising it above the default.
(OK)
consed.primersMaxMeltingTemp:
55
int
(YES)
consed.primersMaxSelfMatchScore:
6
int
!
cutoff for self-annealing of a primer
In
choosing a primer, you don't want the primer to bind to
itself
(form a hairpin) or bind to another copy of itself. It
is
particularly bad if it binds to another copy at its 3' end.
This
parameter is used in the algorithm that tests this.
(OK)
consed.primersMinMeltingTemp:
50
int
(YES)
consed.primersMinQuality:
30
int
! you
must be sure of the sequence of a primer or it won't anneal to where you want
Some
primers fail because the primers don't match where they
are
supposed to. This is because the
sequence where the
primer
is supposed to stick isn't accurately known.
Thus it
is
important to be certain of the sequence where the primer is
chosen
from. This parameter is an indication
of this
certainty--it
is the min quality of every base in an
acceptable
primer.
(NO)
consed.primersNumberOfBasesToBackUpToStartLooking:
50
int
Consed
is designed for you to put the cursor on the left-most (or
right-most)
edge of a region that you want to cover with a new read.
Since
the data quality immediately after an oligo is not good, you
don't
want the oligo immediately next to the region you want to cover,
but
rather a little bit back from it. This
parameter gives how far
back. e.g., if this is 50 and you want a read at
position 1000,
primers
will be searched before base 950 but not in the region 950 to
1000
This
parameter is not used for autofinish--just for consed.
(OK)
consed.primersNumberOfTemplatesToDisplayInFront:
2
int
! this
shows the number of templates to show in the interactive primer picking window
(OK)
consed.primersPickTemplatesForPrimers:
false
bool
! when
picking primers for subclone templates, pick templates also.
! If
there is no suitable template for a primer, do not pick the
!
primer. If you like to pick your own
templates, you might want to
! turn
this off for a little improvement in speed.
(YES)
consed.primersPrintInfoOnRejectedTemplates:
true
bool
!
whether to print which templates were rejected and why (this output can be
large )
(OK)
consed.primersSaltConcentrationMillimolar:
50.0
double
! used
for melting temperature--don't change this!
(NO)
consed.primersSubcloneFullPathnameOfFileOfSequencesForScreening:
/usr/local/genome/lib/screenLibs/primerSubcloneScreen.seq
RWCString
!
vector sequence file if choosing subclone (e.g., M13, plasmid) templates
(OK)
consed.primersCloneFullPathnameOfFileOfSequencesForScreening:
/usr/local/genome/lib/screenLibs/primerCloneScreen.seq
RWCString
!
vector sequence file if choosing clone (e.g., cosmid, BAC) template
(OK)
consed.primersScreenForVector:
true
bool
! whether or not to screen primers for annealing to vector
It is important that the primers not stick to the vector of the template. Thus you must provide consed with two files--a file in fasta format of all subclone vectors, and a file in fasta format of all clone vectors. Consed will not accept any primer that has a match against the appropriate one of these vectors (depending on whether you are choosing primers for clone template or from subclone template). A primer that has a false match to a vector is rejected if that false match has a score worse than PrimersMaxMatchElsewhereScore.
(OK)
consed.primersMaxLengthOfMononucleotideRepeat:
4
int
Finishers have seen that primers with mononucleotide repeats fail more often. This parameter says that a primer with AAAA is acceptable but AAAAA is not.
(OK)
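The rule above can be sketched in a few lines of Python. This is only an illustration of the AAAA / AAAAA rule as described, not consed's actual code; the function names are made up for this example, and the default of 4 simply mirrors consed.primersMaxLengthOfMononucleotideRepeat.

```python
from itertools import groupby


def longest_mononucleotide_run(primer: str) -> int:
    """Return the length of the longest run of one repeated base."""
    # groupby splits the sequence into runs of identical characters.
    return max((len(list(run)) for _, run in groupby(primer.upper())), default=0)


def passes_repeat_filter(primer: str, max_repeat: int = 4) -> bool:
    """True if no single base is repeated more than max_repeat times in a row.

    max_repeat mirrors consed.primersMaxLengthOfMononucleotideRepeat
    (hypothetical helper, not part of consed).
    """
    return longest_mononucleotide_run(primer) <= max_repeat


print(passes_repeat_filter("GCTAAAAGC"))   # run of 4 A's
print(passes_repeat_filter("GCTAAAAAGC"))  # run of 5 A's
```

With the default of 4, the first primer is accepted and the second is rejected.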
consed.primersBadTemplatesFile:
badTemplates.txt
FileName
! file of templates that you've tried, don't work, and you don't want to try again
(OK)
consed.primersToleranceForDifferentBeginningLocationOfUniversalPrimerReads:
100
int
! different forward reads or different reverse reads can differ by up to this amount in the starting location. If they differ by more, then there is something wrong with the template (is it mislabeled?) so don't use it again for walking
(NO)
consed.primersTooManyVectorBasesInWalkingRead:
10
int
! if there are this many x's, then don't walk again on this template
(OK)
consed.primersWhenChoosingATemplateMinPotentialReadLength:
500
int
! when choosing templates for a custom primer, only choose a template if the read can be chosen at least this long
This currently can only be set via your .consedrc file. It is used in picking templates for a primer. Clearly you don't want a template to end too soon after the primer. This parameter indicates the minimum number of bases that a template must extend after the primer location.
(OK)
consed.primersWindowSizeInLooking:
450
int
This is the width of the region in which consed looks for primers. So if PrimersNumberOfBasesToBackUpToStartLooking is 50 and PrimersWindowSizeInLooking is 450, and you are looking for a forward primer, then consed will look from 500 bases to the left of the cursor up to 50 bases to the left of the cursor. If you are looking for a reverse primer, then consed will start looking 50 bases to the right of the cursor and continue until 500 bases to the right of the cursor.
(OK)
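The window arithmetic can be written out as a short Python sketch. It only restates the description of these two parameters; the function name is hypothetical, and whether consed treats the end points as inclusive is not stated here, so the returned pair is just the two boundary positions.

```python
def primer_search_window(cursor, forward, back_up=50, window_size=450):
    """Return (left, right) consensus positions where primers are searched.

    back_up     mirrors consed.primersNumberOfBasesToBackUpToStartLooking
    window_size mirrors consed.primersWindowSizeInLooking
    (hypothetical helper, not part of consed)
    """
    if forward:
        # Forward primer: search to the left of the cursor.
        return (cursor - back_up - window_size, cursor - back_up)
    # Reverse primer: search to the right of the cursor.
    return (cursor + back_up, cursor + back_up + window_size)


print(primer_search_window(1000, forward=True))   # (500, 950)
print(primer_search_window(1000, forward=False))  # (1050, 1500)
```

For a cursor at position 1000 this gives the region 500 to 950 for a forward primer, which matches the "before base 950" example given for the back-up parameter.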
You can also read about this in the consed paper:
Gordon, D., C. Abajian, and P. Green. 1998. Consed: A graphical tool for sequence finishing. Genome Research. 8:195-202
--------------------------------------------------------------------------
AUTOFINISH PARAMETERS
Autofinish uses many of the primer picking parameters. Autofinish also has additional parameters.
In the following, I have annotated the parameters with the following symbols:
(YES) freely customize to your own site
(OK) don't change unless you have a specific need and know what you are doing
(NO) don't change this!
bool means the value must be true or false
int means the value must be an integer
double means the value must be a decimal number
consed.autoFinishAllowWholeCloneReads:
true
bool
A 'whole clone read' as opposed to a 'subclone read' is when the sequencing template for the sequencing reaction is the entire assembly. If you are assembling a BAC, a whole clone read is one that is sequenced directly off the BAC. If the assembly is a full length cDNA, then a whole clone read is one in which the sequencing reaction is off a complete cDNA.
This resource tells autofinish that it is ok to suggest whole clone reads, that is, reads whose template is the entire clone (BAC or cosmid). If you don't want to use the whole BAC as a template for any reads, change to false.
(YES)
consed.autoFinishAverageInsertSize:
1500
int
used for calling reverses. This determines the location of the potential reverse read. If you have a forward read already, autofinish uses this number as an estimate of how far away the beginning of the reverse read should end up.
(YES)
consed.autoFinishCallReversesToFlankGaps:
true
bool
If there is a forward-reverse pair flanking a gap, print it out. If there is not, suggest reverses to flank the gap. Useful to help align and orient the contigs.
(YES)
consed.autoFinishCallHowManyReversesToFlankGaps:
2
int
How many forward/reverse pairs must flank a gap. If there are fewer than this number, autofinish will try to suggest more reverses to do. If there are already this number or more forward/reverse pairs, it will list them and not suggest any more.
(YES)
consed.autoFinishCloseGaps:
true
bool
This allows you to turn off choosing reads to close gaps. For example, if you choose to close all gaps by PCRs using manually-picked primers, you should change this to false.
(YES)
Cost Parameters:
consed.autoFinishCostOfResequencingUniversalPrimerSubcloneReaction:
20.0
consed.autoFinishCostOfCustomPrimerSubcloneReaction:
60.0
consed.autoFinishCostOfCustomPrimerCloneReaction:
80.0
consed.autoFinishCostOfDeNovoUniversalPrimerSubcloneReaction:
60.0
double
(YES)
These parameters compare the costs of a universal primer subclone resequencing reaction, a universal primer subclone de novo reaction (a reverse where you just have a forward, or a forward where you just have a reverse), a custom primer subclone reaction, and a custom primer clone reaction, to decide which to favor. They give you control over which type of reaction autofinish prefers when it has a choice.
The default costs have been chosen by Seattle and St Louis. They reflect the fact that ordering an oligo is more expensive than using a universal primer. They also reflect the fact that whole clone reactions (sequencing off the BAC) are more difficult to do than subclone reactions (sequencing off the plasmid).
consed.autoFinishCoverSingleSubcloneRegions:
true
bool
! this allows you to turn off choosing reads to cover single subclone regions
(YES)
consed.autoFinishCoverLowConsensusQualityRegions:
true
bool
! this allows you to turn off choosing reads to cover low consensus quality regions
(YES)
consed.autoFinishCreateExpSummaryFiles:
true
bool
! this allows you to turn off creating the 3 experiment summary files: forward universal primer, reverse universal primer, and custom primers
(OK)
consed.autoFinishDoNotFinishWhereTheseTagsAre:
doNotFinish
RWCString
list of tag types separated by spaces. E.g.,
doNotFinish repeat
tells autofinish that you are not interested in finishing in this region
(OK)
consed.autoFinishDumpTemplates:
false
bool
! for debugging, this allows you to dump all information about the templates--insert locations
(OK)
consed.autoFinishExcludeContigIfOnlyThisManyReadsOrLess:
2
int
! exclude contigs that are probably E. coli contamination
(OK)
consed.autoFinishExcludeContigIfDepthOfCoverageOutOfLine:
true
bool
(OK)
consed.autoFinishExcludeContigIfDepthOfCoverageThisMuchMoreThanLargestContig:
2.0
double
! exclude contig if its depth of coverage is much greater than other contigs (this indicates contamination)
(NO)
consed.autoFinishHowManyTemplatesYouIntendToUseForCustomPrimerSubcloneReactions:
2
int
! this tells autofinish how many templates you are planning on using, which is necessary to figure out which regions will still be single subclone regions
(YES)
consed.autoFinishMaxAcceptableErrorsPerMegabase:
100
int
! target error rate
(YES)
consed.autoFinishMinNumberOfErrorsFixedByAnExp:
0.1
double
! if an experiment solves fewer errors than this, it isn't worth doing so won't be chosen, even if the target error rate has not yet been achieved
(OK)
consed.autoFinishMinNumberOfForwardReversePairsToCalculateAverageInsertSize:
100
int
! if there are fewer forward/reverse pairs than this, then the parameter consed.autoFinishAverageInsertSize is used instead. These parameters are used when calling reverses to figure out where the reverse should go
(NO)
consed.autoFinishMinNumberOfGapErrorsFixedByAGapClosingExp:
30
int
(NO)
consed.autoFinishNewCustomPrimerReadThisFarFromOldCustomPrimerRead:
50
int
! this tells autofinish, when it wants to make a new custom primer read, how far this read must be from any previous custom primer reads on the same strand
(NO)
consed.autoFinishLookForRepeatedForwardUniversalPrimerReadThisFarAway:
200
int
! this tells autofinish how far to look for the tag of a previously called universal primer read
(NO)
consed.autoFinishNumberOfGapClosingReadsPerContigEnd:
3
int
! don't make any more experiments than this to extend into a gap
(YES)
consed.autoFinishMinNumberOfSingleSubcloneBasesFixedByAnExp:
1
int
! if an experiment will fix fewer than this number of single subclone bases, don't do it even if the total number of single subclone bases in the contig is too high
(OK)
consed.autoFinishNumberOfBasesBetweenContigsAssumed:
200
int
! gap size--each base in the gap counts as 1 error so autofinish tries to extend into gaps
(NO)
consed.autoFinishPotentialHighQualityPartOfReadStart:
80
int
This is how far the high quality region of the read is from the beginning of the read.
(OK)
consed.autoFinishPotentialHighQualityPartOfReadEnd:
300
int
--------------------------------------------
^                 ^             ^
beginning         A             B
of read
<----------------->
consed.autoFinishPotentialHighQualityPartOfReadStart
<------------------------------->
consed.autoFinishPotentialHighQualityPartOfReadEnd
You can adjust these depending on your assessment of the typical quality of your data.
(OK)
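As a rough illustration of how the two offsets bound the expected high quality part of a potential read (points A and B in the diagram), here is a small Python sketch. It assumes the offsets are counted in bases from the beginning of the read in the direction of sequencing; that reading, the function name, and the strand handling are assumptions made for this example, not a description of autofinish internals.

```python
def potential_high_quality_region(read_start, top_strand=True,
                                  hq_start=80, hq_end=300):
    """Return (left, right) consensus positions of the expected good region.

    hq_start mirrors consed.autoFinishPotentialHighQualityPartOfReadStart
    hq_end   mirrors consed.autoFinishPotentialHighQualityPartOfReadEnd
    (hypothetical helper; assumes offsets run along the read direction)
    """
    if top_strand:
        # Read runs left to right: A and B lie to the right of the start.
        return (read_start + hq_start, read_start + hq_end)
    # Bottom strand read runs right to left: A and B lie to the left.
    return (read_start - hq_end, read_start - hq_start)


print(potential_high_quality_region(1000))                    # (1080, 1300)
print(potential_high_quality_region(1000, top_strand=False))  # (700, 920)
```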
consed.autoFinishReversesForFlankingGapsTemplateMustProtrudeFromContigThisMuch:
100
int
! we don't want templates that go into vector right at the end of the template
(OK)
consed.autoFinishTagOligosWhenDoExperiments:
true
bool
! when autofinish is run with -doExperiments, tags the oligos it chooses
(OK)
consed.autoFinishTryHarderToSuggestExperimentsToCoverLowQualityRegions:
true
bool
! consed tries to cover a low quality region with a read of a different strand or chemistry from the existing reads covering that area. If it can't find any read of a different strand or chemistry, should it suggest a read of the same strand and chemistry as an existing read? This parameter says "yes".
(OK)
----------------------------------------------------------------------------
NEW ACE FILE FORMAT
There is a new ace file format (since early 1998). If you still haven't changed to the new ace file format, you must do so now since it contains information that is not contained in the old ace file format. This additional information (e.g., the alignment and quality clipping values) is essential for some of the consed functions (e.g., navigate by single stranded, navigate by single subclone, autofinish) to work correctly.
Another reason to switch to the new ace format is that you will get faster consed startup performance. The new ace file format is also much smaller (about 60% as big as the old).
The new phrap (Aug 1998 and later) writes the new ace format (using the -new_ace switch). Since consed now uses the additional information found only in the new ace format, if you are editing an assembly, you should first re-phrap to take advantage of this additional information.
Consed can read either old or new ace format.
Consed can also write either new or old ace format. It writes the new ace format by default--see 'Options'/'General Preferences'. Also see the consed resource:
consed.writeThisAceFormat:
2
(where 2 means 'new' and 1 means 'old')
If you have scripts that read the ace file, you will need to modify those scripts for the new ace format. Here is the format:
Ace File Format
Refer to the accompanying sample_ace_file.txt (below)
AS <number of contigs> <total number of reads in ace file>
CO <contig name> <# of bases> <# of reads in contig> <# of base segments in contig> <U or C>
The U or C indicates whether the contig has been complemented from the way phrap originally created it. Thus this is always U for an ace file created by phrap.
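If you are updating a script to read the new format, the AS and CO lines can be split on whitespace. The following Python sketch assumes exactly the fields listed above; the class and function names are made up for this example, and the two sample lines are invented, not taken from a real ace file.

```python
from typing import NamedTuple


class AssemblyHeader(NamedTuple):
    num_contigs: int
    num_reads: int


class ContigHeader(NamedTuple):
    name: str
    num_bases: int
    num_reads: int
    num_base_segments: int
    complemented: bool  # True for C, False for U


def parse_as_line(line: str) -> AssemblyHeader:
    """Parse 'AS <number of contigs> <total number of reads>'."""
    fields = line.split()
    if len(fields) != 3 or fields[0] != "AS":
        raise ValueError(f"not an AS line: {line!r}")
    return AssemblyHeader(int(fields[1]), int(fields[2]))


def parse_co_line(line: str) -> ContigHeader:
    """Parse 'CO <name> <# bases> <# reads> <# base segments> <U or C>'."""
    fields = line.split()
    if len(fields) != 6 or fields[0] != "CO" or fields[5] not in ("U", "C"):
        raise ValueError(f"not a CO line: {line!r}")
    name, bases, reads, segments, flag = fields[1:]
    return ContigHeader(name, int(bases), int(reads), int(segments), flag == "C")


# Invented sample lines, for illustration only.
print(parse_as_line("AS 1 8"))
print(parse_co_line("CO Contig1 1475 8 156 U"))
```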
BQ
This starts the list of base qualities for the unpadded consensus bases. The contig is the one from the previous CO, hence no name is needed here.
AF <read name> <C or U> <padded start consensus position>
This line replaces the 'AssembledFrom*' line in the previous ace file format. C or U means complemented or uncomplemented. The <read name> is the true read name (no .comp on it as with the previous ace file format).
BS <padded start consensus position> <padded end consensus position> <read name>
This replaces the 'BaseSegment*' line from the previous ace file format.
RD <read name> <# of padded bases> <# of whole read info items> <# of read tags>
QA <qual clipping start> <qual clipping end> <align clipping start> <align clipping end>
This is new information not found in the previous ace file. If the entire read is low quality, then <qual clipping start> and <qual clipping end> will both be -1. These positions are offsets from the left end of the read (left, as shown in consed). Hence for bottom strand reads, the offsets are from the end of the read. The offsets are 1-based. That is, if the left-most base is in the aligned, high-quality region, <qual clipping start> = 1 and <align clipping start> = 1 (not zero).
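A script reading the QA line needs to handle the -1 case before using the clipping values. The Python sketch below shows one way to do that; the function name and the sample lines are invented for this example, and the positions are returned as the 1-based offsets described above without any conversion.

```python
def parse_qa_line(line):
    """Parse 'QA <qual start> <qual end> <align start> <align end>'.

    Returns (quality_clip, align_clip). quality_clip is None when the
    whole read is low quality (both quality values are -1). All
    positions stay 1-based offsets from the left end of the read.
    (hypothetical helper, not part of consed)
    """
    fields = line.split()
    if len(fields) != 5 or fields[0] != "QA":
        raise ValueError(f"not a QA line: {line!r}")
    qual_start, qual_end, align_start, align_end = (int(f) for f in fields[1:])
    if qual_start == -1 and qual_end == -1:
        quality_clip = None  # entire read is low quality
    else:
        quality_clip = (qual_start, qual_end)
    return quality_clip, (align_start, align_end)


# Invented sample lines, for illustration only.
print(parse_qa_line("QA 19 507 19 539"))
print(parse_qa_line("QA -1 -1 1 100"))
```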
DS CHROMAT_FILE: <name of chromat file> PHD_FILE: <name of phd file> TIME: <date/time of the phd file> CHEM: <prim, term, unknown, etc> DYE: <usually ET, big, etc> TEMPLATE: <template name> DIRECTION: <fwd or rev>
There can be additional information on this line. This replaces the DESCRIPTION line from the old ace file.
The following is for transient read tags (those generated by crossmatch and phrap). They are not fully implemented, and the format may eventually change. The read is implied by the location of the whole read info item within the ace file. They are found after the WR lines for a read.
RT{
<read name> <tag type> <what program created tag> <padded cons pos start> <padded cons pos end> <date when tag was created in form YYMMDD:HHMISS>
}
for example:
RT{
djs14_680.s1 matchElsewhereLowQual phrap 904 933 990823:114356
}
There are consensus tags now in the ace file. All consensus tags have the following format:
CT{
<contig name> <tag type> <what program created tag> <padded cons pos start> <padded cons pos end> <date when tag was created in form YYMMDD:HHMISS> <NoTrans>
(possibly additional information)
}
The NoTrans is optional--it indicates that, when you reassemble, this tag should not be transferred to the new assembly. This is true with tags that should be recreated each time because they have to do with the assembly (e.g., repeat tags).
e.g.,
CT{
Contig206 repeat tagRepeats.perl 118732 119060 990823:115033 NoTrans
AluY
}
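For scripts that need to read consensus tags, a CT block can be handled as a header line plus optional extra lines. This Python sketch assumes the header fields sit on one line between the CT{ and } lines, as in the example above; the function name and the dictionary keys are made up for this illustration.

```python
def parse_ct_tag(block: str) -> dict:
    """Parse one CT{ ... } consensus tag block into a dictionary.

    (hypothetical helper, not part of consed)
    """
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    if len(lines) < 3 or lines[0] != "CT{" or lines[-1] != "}":
        raise ValueError("not a CT{ ... } block")
    fields = lines[1].split()
    if len(fields) < 6:
        raise ValueError("CT header line needs at least 6 fields")
    return {
        "contig": fields[0],
        "tag_type": fields[1],
        "program": fields[2],
        "start": int(fields[3]),   # padded consensus position
        "end": int(fields[4]),     # padded consensus position
        "date": fields[5],
        "no_trans": "NoTrans" in fields[6:],  # optional flag
        "extra_lines": lines[2:-1],  # e.g. comment text or oligo info
    }


SAMPLE = """CT{
Contig206 repeat tagRepeats.perl 118732 119060 990823:115033 NoTrans
AluY
}"""

tag = parse_ct_tag(SAMPLE)
print(tag["contig"], tag["tag_type"], tag["start"], tag["end"], tag["no_trans"])
```

The extra lines are kept as plain text because their meaning depends on the tag type.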
In the case of most consensus tag types, there is only 1 line for the consensus tag. In the case of comment tags and oligo tags, there are additional lines of information. The comment tag includes the comment on the additional lines. The oligo tag has the following information:
<oligo name> <oligo bases from 5' to 3'> <melting temp> <C or U indicating whether the oligo is top strand or bottom strand relative to the orientation of the contig as created by phrap>
WA{
<tag type> <what program created tag> <date tag was created in form YYMMDD:HHMISS>
1 or more lines of data
}
This is a 'whole assembly' tag. It is used for information referring to the assembly as a whole. Currently, phrap puts its version and phrap command line options in a WA tag.
You can append CT, WA, and RT tags to the end of the ace file in any order you like.
------------------------------------------------------------------------
WHAT THE COLORS MEAN
See the beginning of the Quick Tour. But here is a very partial list of the colors:
Greyscale of background indicates quality
Grey base with black background--clipped off part of read (either due to low quality or due to alignment)
Red base--discrepant with consensus
Black base--agrees with consensus
Colored area covering half of a base--tag (see Quick Tour)
Purple tag--more than 1 tag covering a base
This document was last updated on October 27, 1999 by Andreas Matern