Mini User's Guide to Using Consed on Syntom

 

The purpose of this document is to introduce you to using syntom to examine genomic assemblies with the Consed software package.

 

Obtaining an Account

 

To obtain an account on syntom, please contact Andreas Matern (alm13@cornell.edu).

 

Logging On

 

You can log on to syntom from any computer that has an internet connection.  I'll provide examples for logging on from a Windows NT machine.  From the Start menu, select Run.

Enter telnet syntom.cit.cornell.edu in the Open field and click OK.  If all goes well, a telnet window with a login prompt should appear.

Enter your login name at the prompt and hit enter, and then enter your password and hit enter. 

 

The window will now tell you where your last login was from and then display a friendly message :-)

 

You are now logged into syntom.

 

Basic Commands

 

Command                  Function

ls                       lists all the files in the directory
cd                       puts you in your home directory
cd /path/to/directory    changes your directory
exit                     logs out
cp file1 file2           copies file1 to file2
mv file1 file2           moves (renames) file1 to file2
cd ..                    moves one directory up the directory tree
pwd                      prints the working directory

 

 

These are just some basic commands.  There are many more.  I have some Linux books on my shelf in Theresa's office; feel free to borrow them, but please don't steal them.

 

Starting X Windows

 

Consed uses a graphical interface which requires an X-windows emulator.  We'll use Exceed (which is installed on most of the machines in G-04).  To start Exceed on your machine, go to the Start menu -> Programs -> Exceed -> Exceed.  A Hummingbird picture should open and then disappear.  You now need to tell syntom the address of your machine so it can draw the windows on the appropriate machine.  The easiest way to find your machine's address is to log out of your current telnet session to syntom (type exit) and then log back in (Run -> telnet syntom.cit.cornell.edu).  After the Password: field there should be a line which says:

 

Last login: <date> <time> from genomics8.cit.cornell.edu (<- this will be the address)

 

That's the address (host name) of the machine you are using.

 

To tell syntom where to display the windows, simply type:

 

export DISPLAY=genomics8.cit.cornell.edu:0

 

Don't forget to append the colon zero (:0) !
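
As a rough sketch only (this is not part of syntom or consed), the Python snippet below shows how the DISPLAY value is put together from the 'Last login' line.  The function name display_from_last_login and the date in the sample line are made up for illustration; only the host name comes from the example above.

```python
def display_from_last_login(line, screen=":0"):
    """Build a DISPLAY value from a 'Last login: ... from <host>' line."""
    tokens = line.split()
    # Assumption: the host is the first token after the word 'from'.
    host = tokens[tokens.index("from") + 1]
    return host + screen


# The date and time here are placeholders, not real output from syntom.
sample = "Last login: Mon Jan 10 09:15:02 from genomics8.cit.cornell.edu"
print("export DISPLAY=" + display_from_last_login(sample))
```

The printed line is what you would then type at the syntom prompt.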

Getting to the Cereon BAC directory

 

The BAC19 consed information is here:  /home/amatern/sequences/bac19new/

 

To get to it, type cd /home/amatern/sequences/bac19new

 

To see what files are there, type ls

 

You'll see three directories:

 

chromat_dir    the chromatograms (trace files)
phd_dir        the phd files generated by phred
edit_dir       this is the directory you'll be working in

 

 

To get to edit_dir, simply type cd edit_dir

 

 

Typing ls gives you a list of all the files that are there.

To start consed, simply type consed_linux

 

If all goes well, there should be a consed window on your screen!

 

Here is the documentation from consed.  I've added a little bit of information, but for the most part this is the documentation that comes with the program.  A couple of consed features are not yet implemented on syntom.  When you find something that doesn't work, e-mail me and I'll get it working.

CONSED 9.0 DOCUMENTATION

 

CONTENTS:

    WHAT IS NEW IN CONSED 9.0

    QUICK TOUR OF CONSED

    ADVANCED PHRAP/CONSED USAGE

    INSTALLING CONSED

    NOTE TO SGI USERS

    FOR PROGRAMMERS AND FELLOW TRAVELLERS ONLY

    MONITORS AND MICE FOR CONSED

    PRIMER PICKING PARAMETERS

    AUTOFINISH PARAMETERS

    NEW ACE FILE FORMAT

    WHAT THE COLORS MEAN

   

 

 

------------------------------------------------------------------------

 

WHAT IS NEW IN CONSED 9.0

 

 

This section is mainly intended for advanced consed users.  Novice

users should consult the Quick Tour (below).

 

---------------------------------------------------------------------

 

Note to Linux users:  The 'scroll all traces' bug is fixed.

 

Note to Solaris 2.7 users:  Consed now works on this version of

solaris.

 

---------------------------------------------------------------------

Autofinish Improvements

 

 

*  Reverses (universal primer reverse reads) are now suggested in order

to close gaps and improve low quality regions in addition to flanking

gaps. 

 

*  Autofinish now evaluates itself--after you do the reads it

suggests, you can run it and it will tell you how well the reads

solved the problems they were supposed to solve.

 

*  Oligos are tagged (when you use -doExperiments). 

 

*  doNotFinish tags can be used to tell autofinish to not try to

finish particular regions.

 

*  There are many more flags, allowing you a great amount of control

over autofinish.  For example, if you wanted the first round of autofinish

to not choose any custom oligo experiments, fine.  If you wanted

autofinish to only close gaps and not improve the error rate within

contigs, fine.

 

*  The autofinish output is very detailed and verbose.  Thus in

addition there are 3 summary lists of experiments to do (one file for

forward universal primer experiments, one file for reverse universal

primer experiments, and one file for custom oligo walks.)  These

summary files are easily imported into Excel.  You can use the last

one to order oligos by email.

 

 

---------------------------------------------------------------------

Consed:

 

*  Consed already had the ability to tear a contig into 2 and join 2

contigs into one.  Now it also has the ability to move a single read

to a different location within an assembly.  Now you have much better

control in fixing a misassembly.

 

*  You used to be able to compare a contig to one other contig.  Now

you can compare a contig to many other contigs.

 

*  For sites with LONG, LONG read names:  you can now customize how

much space consed saves for displaying read names.  You can also

customize the initial size of the important windows.

 

*  In the Traces Window, you can move left and right with the arrow

keys.

 

*  In the Aligned Reads Window, you can instantly move to the

beginning and/or end of a read.  Similarly, you can move instantly

to the beginning and/or end of the consensus.

 

*  The ABI base calls can be hidden, if you like, thus allowing you

to see more traces at once.

 

*  All documentation windows can be searched and printed out.

 

*  In the past you could see all tags of a particular type for a

particular contig.  Now there is also a function to see all tags of a

particular type in any contig.

 

*  You can now write all contigs to a file in FASTA format with a

single click.

 

*  You can navigate to multiple locations while staying in the Aligned

Reads Window--you don't have to switch windows with each location.

 

*  For the primers that consed picks, consed will show you the

alignment of the closest false match.  This will help you in deciding

if you want to raise consed.primersMaxMatchElsewhereScore

 

*  The template picking part of primer picking has been further improved:

 

    ------------------------------------------------- (template)

            --->  (primer)

 

                <----distance to end of template---->

 

    This 'distance to end of template' gives the longest read you

could possibly make with this primer and this template.  If this

distance is too short, you can now reject the template.  The consed

resource to set is:

 

    consed.primersWhenChoosingATemplateMinPotentialReadLength: 500

 

*  If you want to pick templates yourself, you can turn off consed's

template picking.  This is particularly useful if you haven't bothered

to customize determineReadTypes.perl

 

*  If you are using an old version of phred or if you haven't

installed it correctly (with all kinds of bad effects), consed will

warn you. 

 

*  Previously, consed reported the error rate for a contig.  But some

contigs have long tails of low quality bases and you would like to

know the error rate for the contig without that long tail.  Now you

can do that:  You can get the error rate for a specified region.

 

*  Programmers can now append RT tags to the ace file.  (See FOR

PROGRAMMERS AND FELLOW TRAVELLERS in README.txt)

 

*  Programmers can pop up a trace by a command from a different

program.

 

 

----------------------------------------------------------------------------

 

 

 

QUICK TOUR OF CONSED

 

 

Release 9.0

 

Consed is a program for viewing and editing assemblies assembled with

the phrap assembly program.

 

If you are already an advanced consed user, you should read through

this and do any of the exercises on features that you are unfamiliar

with.  I frequently run across people who are doing something in

consed a hard way month after month, and request a new feature to make

things easier, when that new feature is already in consed.

 

If you have never used consed before, following this Quick Tour will

take you less than 2 hours.  However, it will save you approximately 2

days in agony.  If you have 2 extra days to spare, and prefer to waste

them in agony, then do not do this Quick Tour and instead immediately

skip down to 'INSTALLING CONSED' below.

 

When you do the quick tour, I encourage you to be free about changing

the data set.  If you really mess things up (such as changing all a

read's bases to N's), no problem--just delete the data set and start

again with a fresh copy.

 

Note from ALM: the software is already downloaded and your syntom profile should already be set up correctly, so you can skip step 1 below.

 

1) After downloading the distribution with netscape (see www.phrap.org

and click on 'consed'), copy the distribution to a unix computer (if

it is not already on one), and then unpack the files by typing the

appropriate line below (which one depends on what you named the file

downloaded by netscape):

 

zcat consed_solaris.tar.Z | tar -xvf -

zcat consed_alpha.tar.Z   | tar -xvf -

zcat consed_hp.tar.Z      | tar -xvf -

zcat consed_sgi.tar.Z     | tar -xvf -

zcat consed_linux.tar.Z   | tar -xvf -

 

Note:  You must untar on a UNIX computer--not on an NT computer.

 

2)  The only unix commands you must learn are the following 3:

pwd   -- this tells you where you are

ls    -- this tells you what files are there  (Same as DIR in DOS)

cd    -- this moves you  (Same as CD in DOS)

That's it--use them a lot!

 

USING CONSED GRAPHICALLY

 

3)  Type the following:

 

cd /home/amatern/sequences/standard/edit_dir

 

 

4) Start consed by typing the command below:

 

consed_linux

 

Two windows will appear.  One of these will have the list of .ace

files and say 'select assembly file to open' and

'standard.fasta.screen.ace.1'.  Double click on that name.  The first

window goes away.

 

You will now see a list of one contig and a list of reads.  This is the

'Main Consed Window'. 

 

Double click on 'Contig1'.

 

The 'Aligned Reads Window' will appear. 

 

Try scrolling back and forth.  Try scrolling by dragging the thumb of

the scrollbar.  Also try scrolling by clicking on the 4 << < > >>

buttons for scrolling by small amounts.  For scrolling by tiny

amounts, click on the arrows at either end of the scrollbar.  For

scrolling by huge amounts, use the middle mouse button and just click

on some location on the scrollbar.  For scrolling to the beginning or

end of the contig, use the <<< or >>> buttons. 

 

(Question: why can't you just move the scrollbar to the extreme left

in order to go to the beginning of the contig?  Answer: in typical

assemblies, there are reads that protrude beyond the beginning of the

contig and reads that protrude beyond the end of the contig.  Moving

the scrollbar to the extreme left will scroll the contig to the

beginning of the leftmost read--typically far to the left of the

beginning of the contig.  Thus you should get in the habit of using

the <<< and >>> buttons.)

 

Notice the colors.  The bases that are in red are the ones that

disagree with the consensus.

 

Notice the different shades of grey background (around the bases).

They have the following meanings, but first, you need to understand

the meaning of the quality values:

 

A quality value of 10 means 1 error in ten to the 1.0 power

A quality value of 20 means 1 error in ten to the 2.0 power

A quality value of 30 means 1 error in ten to the 3.0 power

A quality value of 40 means 1 error in ten to the 4.0 power

 

and for quality values in between:

 

A quality value of 25 means 1 error in ten to the 2.5 power

 

Get the idea?
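
If you prefer to see the rule as arithmetic, here is a minimal Python sketch of it.  It assumes the relationship the examples above describe, namely error probability = 10 to the power of (-quality / 10).  The function name error_probability is made up for this illustration and is not part of consed or phred.

```python
def error_probability(quality):
    """Chance that a base with this quality value is wrong."""
    return 10 ** (-quality / 10)


for q in (10, 20, 25, 30, 40):
    # 1 / probability gives the "1 error in N bases" figure.
    print(q, "means about 1 error in", round(1 / error_probability(q)), "bases")
```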

 

 

(These have actually been empirically verified--if you are interested

in the gory details, read the phred papers:

 

Ewing B, Hillier L, Wendl M, Green P: Basecalling of automated

sequencer traces using phred. I. Accuracy assessment.  Genome Research

8, 175-185 (1998).

 

Ewing B, Green P: Basecalling of automated sequencer traces using

phred. II. Error probabilities.  Genome Research 8, 186-194 (1998).

 

In that same copy of the journal is a paper about consed, as well.)

 

Also notice the upper and lowercase.  This is just a cruder indication

of the quality of the bases.

 

5)  To see the quality value of a particular base, point at it and click

with the left mouse button.

 

 

 

 

These quality values are shown in grey scales:

 

Quality 0 through 4 is given by dark grey

Quality 5 through 9 is given by a shade lighter

Quality 10 through 14 is given by a shade still lighter

.

.

.

Quality of 40 through 97 is given by white (the brightest shade)

 

A quality value of 99 is reserved for bases that have been edited and

the user is absolutely sure of the base ('high quality edited').

 

A quality value of 98 is reserved for bases that have been edited and

the user is not sure of the base ('low quality edit').
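
The shading rule can be sketched in a few lines of Python.  This is only an illustration: it assumes the grey steps keep going in blocks of 5 between the rows listed above (the list above elides them), and the name describe_quality and the step numbering are my own, not consed's.

```python
WHITE_STEP = 8  # 40 // 5; qualities 40 through 97 all get the brightest shade


def describe_quality(quality):
    """Say how a base with this quality value would be shown."""
    if quality == 99:
        return "high quality edited"
    if quality == 98:
        return "low quality edited"
    if not 0 <= quality <= 97:
        raise ValueError("quality must be between 0 and 99")
    # Assumption: one grey step per block of 5, step 0 being the darkest.
    step = min(quality // 5, WHITE_STEP)
    return "grey step %d of %d" % (step, WHITE_STEP)


for q in (3, 7, 12, 40, 97, 98, 99):
    print(q, describe_quality(q))
```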

 

The ends of the reads show bases that are grey and have a black

background.  These are the low quality ends of the reads or the

unaligned ends of reads, as determined by phrap.

 

To see the quality of a base, click on it.  You will see the quality

displayed in the Info Box on the Aligned Reads Window.

 

6)  Click on a base on a read.  Then hold down the control key and

type 'a'.  You will move to the beginning of the read.  Hold down the

control key and type 'e'.  You will move to the end of the read.

(Emacs users will recognize these commands.)

 

7) Scroll so that location 490 is about in the middle of the aligned

reads window.  Push the left mouse button down on the menu item 'Dim'.

There will be a list of choices that will appear.  Drag the cursor

down to 'Dim Nothing' and release.  Now look what happened to the

color of the bases.  The ends of the reads that used to have a

black background now appear red with a grey background.  You are

seeing the clipped-off bases with all the same information as any

other base.  Since there are a huge number of red (discrepant) bases,

the screen becomes distracting and busy.  Thus by default the low

quality clipped-off bases are made with a black background and a grey

foreground so they don't distract you.

 

Notice there is a distinction here between 'low quality ends of

reads' and 'unaligned ends of reads'.  Unaligned ends of reads can be

low quality as well, or they can be high quality, as in the case of

chimeric reads.

 

You can play with the dimming options a bit.  Then return it to 'Dim

Low Quality' for the rest of this tour.

 

 

 

TRACES AND EDITING

 

8) Point with the mouse at a base of one of the reads and click with

both mouse buttons simultaneously.  It's difficult at first, but you'll quickly get the hang of it.  (If you have a 2 button mouse, see MONITORS AND MICE FOR CONSED below.)  The Trace Window showing the traces for that stretch of read should pop up.

 

There are 3 rows of bases in the trace window:

 

'con' is the consensus

'edt' is where you can edit the base calls of the read

'phd' is the original phred base calls

 

Notice that a red rectangle blinks (the 'cursor') in the corresponding

positions of the Aligned Reads Window and the Trace Window.

 

 

9) Try editing in the Trace Window.  You can click the left mouse

button on a base in the 'edt' line to set the cursor (a blinking red

rectangle).  You can directly overstrike a base by typing a letter.

Try this.  Try undoing it (by clicking on 'undo' ).  If you want to

undo more than one edit, you will have to go back to the main consed

window and click on the button labeled 'Undo Edit...'--you will learn

that later.

 

You can move left and right with the arrow keys.

 

We believe that the user should change a base call only while

examining the traces.  That is why editing is done here--not in the

Aligned Reads Window.

 

10)  You can insert a column of pads by pushing the space bar.  Try

this.  (You may need to click on a base on the 'edt' line first.)

 

(For those of you new to editing assemblies, a 'pad', which in consed

and phrap is represented by the '*' character, is used to align

two or more sequences such as these:

     gttgacagtaatcta

     gttgacataatcta

in which one sequence has an inserted or deleted base with respect to

the other.  By inserting the pad character, it is possible to get a

good alignment:

     gttgacagtaatcta

     gttgaca*taatcta

This is the purpose of the pad character--it is just a placeholder.)

 

You can then overstrike a pad with a base.  In this way you

can insert a base, and still preserve the alignment.
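
The pad idea can be tried out with a small Python sketch.  It reuses the two example sequences from above; the helper names insert_pad and strip_pads are made up for illustration and have nothing to do with how consed stores pads internally.

```python
PAD = "*"


def insert_pad(sequence, index):
    """Put a pad placeholder in front of the base at the given 0-based index."""
    return sequence[:index] + PAD + sequence[index:]


def strip_pads(padded):
    """Drop the pads again, giving back the real bases only."""
    return padded.replace(PAD, "")


read_a = "gttgacagtaatcta"
read_b = "gttgacataatcta"          # one base shorter than read_a

padded_b = insert_pad(read_b, 7)   # line the two reads up column by column
print(read_a)
print(padded_b)
print(strip_pads(padded_b) == read_b)
```

Once the pad is in place the two reads have the same length, so they can be compared column by column.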

 

11) Try highlighting a stretch of a read on the edt line by holding

down both mouse buttons and dragging the cursor over some bases.

They will turn yellow as you drag.  Then release the mouse buttons.  A

window will pop up giving you some choices of what to do with those

(yellow) bases:

 

 

    Make High Quality--makes the highlighted bases edited high quality

        (99).  This tells phrap (when it reassembles) that you are

        sure of the sequence here.

    Change Consensus--make the highlighted bases edited high quality and

        change the consensus to agree with that stretch of the read.

        This is a directive to phrap (upon reassembly) to use that

        stretch of that read to be the consensus.

    Make Low Quality--makes the highlighted bases edited low quality.

        This tells phrap (when it reassembles) that you are not sure

        of the bases here and phrap can go ahead and make a join even

        if the bases in this region don't match perfectly.

    Make Low Quality to Left End--same as above, but all the way to

        the left end of the read.

    Make Low Quality to Right End--same as above, but all the way to

        the right end of the read.

    Change to n's--Change the highlighted bases to n's which means

        they are unknown bases.  This tells phrap (when it

        reassembles) to not make any join based on these bases.  It is

        useful when you believe the bases may be in the chimeric

        portion of a read.

    Change to n's to left--same as above but to left end.

    Change to n's to right--same as above but to right end.

    Add Comment Tag--allows user to add a comment to a stretch of read

        bases.

    Add Tag--allows user to add any tag to a stretch of read bases.

    Dismiss--you decided you don't really want to do anything with

        this stretch of bases.

 

This popup is made so that nothing else works until you choose

something.  Try each of these choices, except for tags, which you'll

try below.

 

'Change Consensus' has an additional function--if a read extends out

on the right beyond the end of the consensus, you can extend the

consensus by using this function.  You might want to do this, for

example, if crossmatch did not correctly find the cloning site and

thus clipped too much.  You can add these bases back to the consensus

by using 'Change Consensus'.  (You can't try it with this dataset

since no read extends beyond the end of the consensus, but you may see

this phenomenon with your own data.)

 

12) To delete a base, overstrike it with a '*' character.  (Phrap

ignores '*', so this is the same as deleting the character.)  If you

overstrike all bases in a column with * characters so the entire

column consists of *'s (including the consensus base), there is no way

to remove the column.  This is OK since when you export the

consensus (try the exercise on EXPORTING THE CONSENSUS), the

*'s are not exported.  While you are editing in consed,

we believe there should be a visual indication that a base was

deleted.

 

SAVING THE ASSEMBLY

 

13)  To save the assembly, pull down the 'File' menu on the Aligned

Reads Window, and release on 'Save assembly'.  A box will pop up with

a suggested name.  I suggest you always use the one it suggests.  The

idea is that the ace files:

 

 

(project).fasta.screen.ace.1

(project).fasta.screen.ace.2

(project).fasta.screen.ace.3

(project).fasta.screen.ace.4

(project).fasta.screen.ace.5

 

are in order of how old they are.  If you feel you are taking up too

much disk space, then start deleting the ace files starting at the

oldest.  I do not recommend that you overwrite existing ace files.

The version numbers just keep growing, and that is not a problem.
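
One thing to watch when you clean up: a plain alphabetical listing puts .ace.10 before .ace.2, so the 'oldest' file is not always the first one you see.  The Python sketch below sorts by the version number instead.  The function names are made up for illustration, and the sketch only assumes that ace file names end in '.ace.' followed by a number, as in the list above.

```python
import re


def ace_version(filename):
    """Return the number after '.ace.' at the end of an ace file name."""
    match = re.search(r"\.ace\.(\d+)$", filename)
    if match is None:
        raise ValueError("not a versioned ace file: " + filename)
    return int(match.group(1))


def oldest_first(filenames):
    """Order ace files from lowest (oldest) to highest (newest) version."""
    return sorted(filenames, key=ace_version)


def next_ace_name(filenames):
    """Name the next save would get if it never overwrites an old file."""
    newest = max(filenames, key=ace_version)
    stem = newest[: newest.rfind(".")]
    return "%s.%d" % (stem, ace_version(newest) + 1)


names = [
    "standard.fasta.screen.ace.10",
    "standard.fasta.screen.ace.2",
    "standard.fasta.screen.ace.1",
]
print(oldest_first(names))
print(next_ace_name(names))
```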

 

 

EXPORTING THE CONSENSUS

 

14)  Exporting the consensus.  Bring the Aligned Reads Window into view

 again.  Hold down the left mouse button on the 'File' menu and

 release the button on 'Export consensus sequence'.  Notice that the

 consensus will be stored (in this case) in a file called

 'Contig1.fasta'.  Click 'OK'.  There is now a file in your edit_dir

 directory called 'Contig1.fasta' that has the consensus sequence in

 it.  If you want to see the file, bring up another Xterm (if you are

 UNIX literate), and type:

 

 

 cd standard/edit_dir

 more Contig1.fasta

 

 

15)  Fancier exporting the consensus.  Bring the Aligned Reads Window

 into view again.  Hold down the left mouse button on the 'File' menu

 but this time release on 'Export consensus sequence (with

 options)...'.  Just export a little snip of the consensus, from 400

 to 410.  (You will notice this contains a pad * character.)  Ask for

 both the bases file and the quality file.  Click 'OK'.  Consed will

 want to call this file 'Contig1.fasta' again.  You can overwrite the

 existing file. 

 

 Look in your other Xterm at these files:

 

 more Contig1.fasta

 more Contig1.fasta.qual

 

 The one file contains the bases (but no * pads) and the other

 contains the corresponding qualities of those bases.
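
 If you want to check that the two exported files line up, a small Python sketch like the one below will do.  It is written under assumptions: that each file starts with a '>' header line, that the bases file holds plain sequence lines, and that the quality file holds whitespace-separated integers.  The sample contents are invented stand-ins, not real consed output, and the function names are my own.

```python
def read_fasta_bases(text):
    """Join every non-header line of a FASTA file into one string."""
    return "".join(
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def read_qualities(text):
    """Collect the whitespace-separated integers of a quality file."""
    values = []
    for line in text.splitlines():
        if line.strip() and not line.startswith(">"):
            values.extend(int(token) for token in line.split())
    return values


# Invented stand-ins for Contig1.fasta and Contig1.fasta.qual.
fasta_text = ">Contig1\nacgtacgtac\n"
qual_text = ">Contig1\n40 40 38 35 40 40 22 40 40 40\n"

bases = read_fasta_bases(fasta_text)
qualities = read_qualities(qual_text)
if len(bases) != len(qualities):
    raise ValueError("bases and qualities do not line up")

pairs = list(zip(bases, qualities))
print(pairs[:3])
```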

 

 

16)  Exporting the consensus of all contigs at once:  Go to the Main

 Consed Window.  Point to 'File', hold down the left mouse button, and

 release on 'Write all contigs to fasta file'.  You then can choose a

 filename for all contigs to be written to.

 

 

 

 

17) (For this step, first click on the 'Dim' menu and release on 'Dim

Nothing'.)  Point to the 'Color' menu, hold down the left mouse button

and release on 'Color Means Edited and Tags'.  Notice that the bases

that you have edited (make sure you have edited some bases) will stand

out in either white or grey (depending on whether the base was made

high quality or low quality).  Observe this both in the Trace Window

and the Aligned Reads window.  This colormode is useful if you are

interested in easily spotting which bases are edited.

 

Return to the 'Color Means Quality and Tags' colormode by the

following:  point to the 'Color' menu, hold down the left mouse button

and release on 'Color Means Quality and Tags'.

 

FIND MAIN WINDOW

 

18) On the Aligned Reads window, click on 'Find Main Win'.  This will

cause the Consed Main Window to pop up in the event you have buried it under

other windows or iconified it.  (This may not work with some settings of

your X emulator.  In that case you will have to find and click on the

Main Window to bring it up.)

 

 

MULTIPLE UNDO EDIT

 

19) Now that the Consed Main Window is visible, click the 'Undo Edit...'

button.  There will be a popup indicating the most recent edit.  Click

'undo'.  Then you will see the edit that was done before that.  Click

'undo'.  You can continue undoing if you like.  You now know how to

undo more than one edit.  You cannot choose which edits to undo and

which to not undo--edits can only be undone in precisely reverse order

from the order you made them.
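
This 'reverse order only' behaviour is what programmers call a stack.  The Python sketch below is just a picture of that idea; the class name EditHistory and the edit descriptions are invented, and this is not a claim about how consed keeps its edit history internally.

```python
class EditHistory:
    """Remember edits so they can be undone newest first."""

    def __init__(self):
        self._edits = []

    def record(self, description):
        """Note down an edit that was just made."""
        self._edits.append(description)

    def undo(self):
        """Take back the most recent edit and report which one it was."""
        if not self._edits:
            raise IndexError("no edits left to undo")
        return self._edits.pop()


history = EditHistory()
history.record("overstrike base with t")
history.record("insert pad")
history.record("make bases high quality")

undone = [history.undo(), history.undo()]
print(undone)
```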

 

SCROLLING TRACES AND ALIGNED READS TOGETHER

 

20) In the Aligned Reads window, scroll along the contig to a

different point.  Click the left mouse button on a read whose trace is

already up.  Notice that the existing trace instantly scrolls to the

corresponding location.  Now go to the Trace Window and scroll the

traces to a new location.  Click on the edt line with the left mouse

button.  You will notice that the Aligned Reads window will instantly

scroll to the corresponding location.  Thus you can keep the Aligned

Reads window and the traces scrolled to the same location.

 

EXAMINING  ALL  TRACES

 

21) Go to a region where there are lots of reads, say base 1660.  Push

down the right mouse button and release on 'Display traces for all

reads'.  You will see all traces displayed in a scrolling window.  You

can drag the scrollbar on the right down and up to see all the traces.

This feature is particularly useful for polymorphism/mutation

detection work.  This feature was added to work in cooperation with

polyphred.  To see it in action, exit consed.

 

CONSED-POLYPHRED INTERACTION

 

Polyphred is a program for finding polymorphic sites; it was developed by

Debbie Nickerson's group (contact them at http://droog.mbt.washington.edu).

 

We have a test database, 'polyphred', which has had polyphred run on

it already.  Polyphred has put a polymorphism tag on each polymorphic

site.

 

Type:

 

cd ../../polyphred/edit_dir

ls

../../consed_(computer type)

 

where (computer type) is one of solaris, hp, alpha, sgi, or linux.

 

Double click on example2.fasta.screen.ace.1

 

When consed comes up, you should see 2 contigs.

Double click on Contig2

 

In the Aligned Reads Window, push the left mouse button while pointing

to the 'Navigate' menu and release on

 

'Toggle feature:  when navigating to consensus location, pop up all

traces (currently off)'

 

That will turn this feature on.

 

Now push the left mouse button while pointing to the 'Navigate' menu

and release on 'Tags'.  Up should pop a list of tag types.  Double

click on 'polymorphism'.   Polyphred has already been run so the

consensus is tagged with polymorphism tags at each polymorphic site. 

Up will pop a window labelled 'Polymorphism Tags' with a list of

sites.  Click on 'Next'.

 

If you correctly followed the instructions above, all the traces should

pop up at the first polymorphic site.  You may want to reposition the

traces window to see it better. 

 

Now ignore the original 'Polymorphism Tags' window and instead click

on 'Next' in the *traces* window.  This will take you to the next

polymorphic site.  Pretty nice, huh?

 

 

After you are done playing with this feature, exit consed and go back

to the previous database:

 

cd ../../standard/edit_dir

ls

../../consed_(computer type)

Double click on standard.fasta.screen.ace.1

 

Double click on Contig1 to bring up the Aligned Reads Window again in

preparation for the next step.

 

 

 

NAVIGATING

 

22) In the Aligned Reads window, pull down the Navigate menu and

release on 'Low consensus quality'.  You will see a list of locations.

Move the 'Low consensus quality' window down so you can see the

Aligned Reads window.  Repeatedly click on 'Next' until you reach the

end of the list.  (Low consensus quality means an area in which the

bases each have too high a probability of being wrong.)  This saves you from

having to look through large amounts of high quality data trying to

find problem areas.

 

Alternatively, you can click on the 'Prev' and 'Next' buttons on the

Aligned Reads Window.  Thus you can keep the Aligned Reads Window in

front with input focus and keep the Low consensus quality window

pushed out of the way.

 

You may want to click on the 'Save' button in the Low consensus

quality Window to save to a file a copy of this list of problem areas

as you work through them.

 

In our experience, this will be the most important navigate list you

will use.  In fact, finishing consists mainly of adding reads and

rephrapping until this list is reduced to nothing.
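
To make the idea of a 'low consensus quality' list concrete, here is a Python sketch that groups neighbouring low quality consensus positions into regions.  It is an illustration only: the cutoff of 25 and the sample qualities are made up, and consed's own rule for building this list is not described here and may well differ.

```python
def low_quality_regions(qualities, threshold):
    """Return (start, end) pairs, 1-based, where quality is below threshold."""
    regions = []
    start = None
    for position, quality in enumerate(qualities, start=1):
        if quality < threshold:
            if start is None:
                start = position          # a new low quality run begins
        elif start is not None:
            regions.append((start, position - 1))
            start = None
    if start is not None:                 # run reaches the end of the contig
        regions.append((start, len(qualities)))
    return regions


sample_qualities = [40, 40, 12, 8, 40, 5]   # invented values
print(low_quality_regions(sample_qualities, 25))
```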

 

23) Dismiss the Low consensus quality window.  Pull down the

'Navigate' menu again and release on 'High quality discrepancies as

above, but omitting tagged compressions and G_dropouts'.  You will

probably notice there are no entries (unless you created some yourself

by editing).  That is because there are no high quality discrepancies

with this dataset.  So let's force there to be some by lowering the

quality threshold.  First, dismiss the High quality discrepancies

window.

 

Click on 'Find Main Win'.  In the main consed window, pull down the

'Options' menu and release on 'General Preferences'.  Notice that the

default for 'Threshold for High Quality Discrepancy' is 40.  Change it

to 15 and click 'Apply & Dismiss'.

 

Then follow the steps above to bring up the High quality discrepancies

menu.  Now you will see several entries.  Click 'next' repeatedly to

go successively to the next high quality discrepancy in the Aligned

Reads Window.

 

You can also double click on a particular line in the High quality

discrepancies window to go to that location.  Alternatively, you can

single click on a line and then click the 'Go' button.

 

Dismiss the High quality discrepancies window.
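
Why does lowering the threshold from 40 to 15 make entries appear?  The Python sketch below shows the idea under a simple assumption: a position counts when the read base differs from the consensus and the read's quality there is at or above the threshold.  Whether consed uses 'at or above' or 'above' is not stated here, and the sequences, qualities and function name are all invented for illustration.

```python
def high_quality_discrepancies(consensus, read, read_qualities, threshold=40):
    """Return 1-based positions where a confident read base disagrees."""
    hits = []
    columns = zip(consensus, read, read_qualities)
    for position, (cons_base, read_base, quality) in enumerate(columns, start=1):
        if read_base.lower() != cons_base.lower() and quality >= threshold:
            hits.append(position)
    return hits


consensus = "acgtacgt"
read = "acgaacgc"                              # differs at positions 4 and 8
read_qualities = [40, 40, 40, 20, 40, 40, 40, 45]

print(high_quality_discrepancies(consensus, read, read_qualities))
print(high_quality_discrepancies(consensus, read, read_qualities, threshold=15))
```

With the default of 40 only the second mismatch is listed; at 15 both are.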

 

 

24) Similarly, try the other navigate lists: Unaligned high quality

regions (this list will be empty with this data set), Edits, Regions

covered by only 1 strand and only 1 chemistry, and Regions covered by only 1

subclone.

 

Unaligned high quality regions are regions in which the traces are

high quality so there is no question of the bases, but the region

differs so much from other reads that phrap has given up trying to

align the region with the consensus.  This could be due to a chimeric

read, or perhaps the read belongs somewhere else.

 

We believe that regions covered by only 1 subclone should be covered

by a 2nd subclone to prevent the possibility of there being a deletion

in the single subclone.

 

There are so many different problem lists that you may forget to check

one of them and thus miss a serious problem.  Thus we combined them

all into a single list.  This is the first menu item: 'Low Cons/High

Qual Discrep/Single Stranded/Single Subclone/Unaligned High'.  We

suggest you use this list.

 

25) Also try navigating by tags by selecting 'Tags' under Navigate: when

the Select Tag Type Window appears, double click on 'compression'.

(Note that you can't do anything else until you deal with this

window.)  This gives a list of a particular tag type in a particular

contig.

 

26)  There is also a way of getting a list of a particular tag type in

all contigs:  Click on 'Find Main Win'.  In the Main Consed Window,

point to the 'Navigate' menu, hold down the left mouse button, and

release on 'Tags in all contigs'.  Continue as in the previous step.

 

 

 

PRIMER-PICKING

 

 

**** Temporary step ****

 

After you have completed the 'install vector files' step (below), you

should never do this.

 

Click on 'Find Main Win'.  On the Main Window, open the Options menu,

and release on 'Primer Picking Preferences'.  Notice the question

'Screen Primers Against Sequences in File?'  (If you have trouble

finding this question, scroll the Primer Picking Preferences list

down.  It is between 'PrimersNumberOfTemplatesToDisplayInFront' and

'Pick subclone templates for primers?'.)  Click on 'False'.  Then click

'Apply & Dismiss' and the Primer Picking Preferences box will pop

down.

 

(In real use, 'Screen Primers Against Sequences in File?' should be

set to 'True'.  I have had you set it to False just this once so you

can go ahead and see how this is supposed to work until your system

administrator has time to correctly install the vector sequences file.)

 

**** end of temporary step ****

 

 

 

 

27) Go to some location near the right end of the contig, say base

2570.  Click with the right mouse button on the consensus and click on

either one of the top strand primer choices (either from subclone

template or from clone template).  Consed will pause a moment, and

then there will appear a selection of primers that pass all of

consed's requirements.  Templates are also chosen for each primer.

You may have to scroll the primer list to the right to see the

templates.  Consed lists these templates in order of quality--all of

them will cover the read you want to make.

 

Double click on one of the primers in the Primers Window.  That will

cause the Aligned Reads Window to scroll to show that oligo in

context.  Click on 'Accept Primer'.  Notice that a yellow oligo tag is

created on the consensus for that primer.  That tag contains all the

information you need to order that oligo and do the reaction--you will

learn how to pop it up below under 'tags'.

 

What is the difference between 'Pick Primer from Subclone Template'

and 'Pick Primer from Clone Template'? 

 

There are 3 differences: 

 

A.  which vector file the primers are screened against.  In the former

case, the primer is screened against the file primerSubcloneScreen.seq

and in the latter case against the file primerCloneScreen.seq

 

B.  In checking for false matches elsewhere in the assembly, if the

template is the whole clone, then consed must check for false matches

in the *entire* assembly, including all other contigs.  But if the

template is just going to be a subclone, consed only needs to check

elsewhere in that subclone.  Actually, to be conservative, consed

checks for false matches +/- the maximum insert size of a subclone.

 

C.  If you are picking primers for subclone template, then the primer

picker can also pick the subclone templates.  If it doesn't find any

suitable subclone template, it will reject the primer.  (By default,

picking of subclone templates is turned off.  You can turn it on

temporarily or permanently.  To turn it on temporarily, go to the

Consed Main Window, point to the Options menu, hold down the left

mouse button and release on 'Primer Picking Preferences'.  Scroll down

to 'Pick Subclone Templates for Primers' and click 'True'.  Click on

'Apply and Dismiss'.  To change this permanently, see CONSED

CUSTOMIZATION below.  Beware:  you must correctly customize

determineReadTypes.perl for template picking to work.  See INSTALLING

CONSED below.)

 

If you are interested in the details of primer-picking, see the

section 'PRIMER PARAMETERS' (below).

 

When you are done editing and have saved the assembly and exited

consed, run ace2Oligos.perl (supplied with this distribution--make

sure your system administrator installed it) which will extract all

the oligos you just created.  This is handy for email ordering of

oligos.

 

In the xterm, type:

 

ace2Oligos.perl standard.fasta.screen.ace.2 oligos.txt

 

where standard.fasta.screen.ace.2 is whatever the name is of the ace

file you just saved.

 

 

 

 

 

 

 

 

 

SEARCH FOR STRING

 

28) Try the 'Search for String' button (left side of the Aligned Reads

Window).  Type in a string (such as aaaca), and click 'ok'.  There

should be a list of 'hits'.  Double click on one of the hits (or

single click on it and click on 'go'.)  Notice that the Aligned Reads

Window scrolls to that position and has the cursor on the found

string.  (It might be complemented.)

 

Dismiss this window.  Try this again, only this time in the Search For

String Window select 'Search Just Reads'.  Then click 'OK'.  You will

notice there are many more hits.  This is because this shows hits in

each read, even if they are at the same consensus position.

 

COPY AND PASTE

 

29) In the Aligned Reads Window, swipe some bases by holding down the

left mouse button.  You should see the bases turn yellow, at least

temporarily.  Then click the 'Search for String' button.  Use the

middle mouse button to paste the bases you have just swiped into the

'Query string:' box.  Notice that you can swipe bases either from the

consensus or from a read.

 

The search for string is case-insensitive so don't worry about the

pasting being upper or lowercase.
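
The two behaviours described above (hits may be complemented, and case
does not matter) can be illustrated with a minimal Python sketch.  This
is not consed's own code: the function names are made up for the
example, pad characters are ignored, and positions are reported as
1-based inclusive ranges only to resemble the hit list.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    table = str.maketrans("acgtnACGTN", "tgcanTGCAN")
    return seq.translate(table)[::-1]


def find_hits(sequence: str, query: str):
    """Return sorted (start, end, strand) hits, 1-based inclusive, ignoring case."""
    seq = sequence.lower()
    forward = query.lower()
    patterns = [("uncomplemented", forward)]
    reverse = reverse_complement(forward)
    if reverse != forward:  # do not report a palindromic query twice
        patterns.append(("complemented", reverse))

    hits = []
    for strand, pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            hits.append((start + 1, start + len(pattern), strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Illustrative sequence only; it is not taken from the standard data set.
consensus = "ttAAACAggcctgtttcc"
for hit in find_hits(consensus, "aaaca"):
    print(hit)
```

Searching reads as well as the consensus would amount to running the
same function over every read, which is why 'Search Just Reads' gives
many more hits.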

 

 

CORRECTING FALSE JOINS MADE BY PHRAP

 

30)  Phrap may put several reads together that you believe do not belong

together.  (For example, you may see several high quality

discrepancies between the reads.)  If you are sure these reads do not

belong together, you can force a subsequent reassembly by phrap to not

assemble those reads together.  You do this by finding a location

where there is a high quality discrepancy.  Then click on the read

with the right mouse button and release on 'Tell phrap not to overlap

reads discrepant at this location'.  There are no high quality

discrepancies with this dataset so consed won't let you do this.

(Try it and see.)  However, when you use your own data, you may get

the chance!

 

 

 

ADDING READS

 

31) For this to work, your system administrator must have set up

everything correctly. (See below in INSTALLING CONSED.)  Assuming you

have set everything up correctly, you can now experiment with adding

reads.

 

Now bring up consed again using ace file standard.fasta.screen.ace.1

If it asks if you want to apply edits, just say 'no'.

 

On the Main Window, click on the Add New Reads button.  There will

appear a list of files ending with .fof. These are files that contain

lists of chromatograms.  Double click on 'reads_to_add.fof' There

should be lots of progress output in the xterm from which you started

consed.  When it completes, there will be a Reads Added Window popup

with a report of which reads were added.  In this case, it should say

that 9 reads were successfully added and list them.

 

 

 

TEARS AND JOINS

     

32) When phrap really screws up, you may want to just tear the contig

apart in several places and then join the pieces back together in a

different way.  Although we discourage you from doing this, we do give

you the power to do it, if you want to.  Let's try it:

 

Go to location 1550.  Point the mouse at the consensus base at 1550

and push the right mouse button down.  Release the button on 'Tear

Contig at This Consensus Position'.  Up will pop a list of reads with

2 little buttons next to them <- and ->.  Leave everything as it is

and just click 'Do Tear'.  (If you want to play around with which

reads go into which contig, do that another time.)

 

Now you should have 2 Aligned Reads Windows on top of each other.  One

should contain 'Contig2' and the other 'Contig3'. 

 

Now let's join these 2 contigs back together:

 

 

Click on 'Search for String' and type in the following bases:

agctgccatc

 

Click 'OK'.

 

Search for string should find 2 locations, one in Contig2 and one in

Contig3:

 

Contig2     (consensus)     1447-1456   (uncomplemented)

Contig3     (consensus)     829-838     (uncomplemented)

 

Double click on the first one.  The Aligned Reads Window for Contig2

will scroll to location 1447 and the window will raise up.  In that

Aligned Reads Window, click on 'Compare Cont'.

 

Now double click on the 'Contig3' line in the above Search for String

results.  The Aligned Reads Window for Contig3 will scroll to location

829 and lift up.  In that Aligned Reads Window, click on 'Compare

Cont'.

 

Now the Compare Contigs Window should be visible.  In the Compare

Contigs Window, try scrolling back and forth.  You can change the

cursors (blinking red), but if you do, please return them to the

locations 1447 and 829 for the next step.  The cursors 'pin' these

bases together when doing an alignment.  (The algorithm is a pinned

Smith-Waterman alignment.)

 

Click on Align.  Try scrolling the alignment by dragging the thumb in

the lower half of the Compare Contigs.  An 'X' means there is a

discrepancy between the 2 contigs.  There is also a 'P' (see if you

can find it!)  The P indicates the bases that you pinned together.

 

Click with the left mouse button on either contig in the bottom

alignment.  You will notice that both contigs will have the red

blinking cursor in the same position.  Click on 'Scroll Both Aligned

Reads Windows' and look at the Aligned Reads Windows to see that they

scroll to the corresponding positions.  You can have traces up for the

contigs, and they will scroll as well.  Experiment with this.  Then

click 'Join'.  The 2 previous Aligned Reads Windows will disappear and

there will be a new one which has a new contig 'Contig4'.  You have

made a join!

 

It is possible to have more than one Compare Contigs window up at a

time.  This allows you to investigate a repeat that has more than 2 copies.

 

Compare Contigs is one method of exploring joins of contigs that were

not made by phrap.  Another method is to use phrapview, supplied with

phrap.  phrapview gives a high level view of all internal joins while

'compare contigs' shows the alignment of a single internal join.  Some

users have found them to work well together--phrapview to find a join

and, having found it, 'compare contigs' to examine it in more detail.

 

 

 

REMOVING READS

 

33)  You can also remove individual reads and put them into their own

contigs.  For example, in the Aligned Reads Window, go to location

2000.  Point to the read name of read djs74_2664.s1 and hold down the

right mouse button.   Release on 'Put read djs74_2664.s1 into its own

contig.'  Consed will ask you 'Are you sure...?'  Answer 'yes'.

Presto-chango!   The read is put into its own contig and the old

contig is redrawn without the read in it.  At this point you should

save the assembly--you should always save the assembly after removing

a read.

 

 

TAGS

 

34) Bring up a trace for a read (as above).  Swipe some bases on the

'edt' line with the middle mouse button.  A list of choices will popup.

Select 'Add Comment Tag'.  Type in a comment in the box that appears,

and click 'OK'.  You will now see a blue box both in the Aligned Reads

Window and in the Traces Window on that read.

 

To see the comment, you can click on that blue tag in the Aligned

Reads Window with the right mouse button and release on 'Tag: comment

Show more info?'.  Alternatively, you can click on the blue tag in the

Traces Window with the right mouse button.

 

Try creating some other kinds of tags: again swipe some bases in the

Trace Window.  But this time instead of clicking 'Add Comment Tag',

click on 'Add Tag'.  Select another tag type.  You will notice that

different tags are in different colors.  You can always click with the

right mouse button on the tag (as above) if you forget what a

particular color means.

 

You can also define your own tag types.  See below CREATING CUSTOM TAG

TYPES for how to do that.

 

35) You can create really, really long tags as follows: Just create a

short version of the tag as above for where you want the tag to start.

Then figure out the consensus position of where you want the tag to

end.  In the Aligned Reads Window, click on the short tag with the

right mouse button and release on 'tag: show more info?' (as above).

A Tag Window will appear for that tag.  In the Tag Window, simply

change the End Unpadded Consensus Position to the place you want it to

end.  Then click 'OK'.  You will now notice that the tag will be as

long as you wanted.

 

36) You can create tags on the consensus in the same way.  In the

Aligned Reads Window, use the middle mouse button to swipe some bases

on the consensus in the Aligned Reads Window.  Up will pop a list of

tag types.  Click on one of them.  Try it again somewhere else.  Try

it with the tag type being 'comment'.  In this case, you must enter a

comment.  Notice the pretty colors!  If you forget what a particular

color means, you can click on the colored tag with the right mouse

button and it will tell you.

 

37)  Try creating some tags that overlap each other.  You will notice

that the overlapping region will be purple.  If you want to know which

tags overlap, you can click with the right mouse button on the purple

and you will be told all tags that are on that base.

 

38) If you have many tags that overlap and thus are purple, you can

hide some less relevant tag types so there is less purple and there is

less distraction.  Make sure you have a few tags visible.  Then click

on 'Find Main Win'.  In the Main Window, open the Options menu, and

release on 'Hide Some Tag Types'.  A list of tag types will popup.

Select the type that you have visible (above).  Then click 'OK'.  Go

back to the Aligned Reads Window.  That tag should still be visible.

Click on the button 'Some Tags' in the upper right part of the Aligned

Reads Window.  Your tag should disappear.  The 'Some Tags' button

should have changed to 'Sh All Tags'.  Click on it again.  Your tags

should have reappeared.

 

 

INCREMENTAL SEARCH FOR READ NAME

 

39) Restart consed.  Instead of clicking on a read or contig name,

type a read name into the 'Find read:' box.  Try typing djs74_2 You

will notice that as you type each letter, the first item in the list

that matches the letters typed will be highlighted.  Experiment with

deleting a few letters and typing others.  This is a powerful method

of quickly getting to the read name you are interested in.  When you

get to the read you want, just type carriage return or click the 'OK'

button.

 

ONLINE DOCUMENTATION

 

40)  On the Aligned Reads Window, click on the 'Help' menu and release

on 'Show Documentation'.  You will see this document.

 

 

GOTO POSITION

 

41) In the Aligned Reads Window, click in the 'Pos:' box in the upper

right-hand corner.  Type in a number, such as 540, and push the

'Return' or 'Enter' key.  The Aligned Reads Window will scroll to

position 540.  We find this feature is particularly useful when one

person wants another person to look at something in the sequence.

 

HIGHLIGHTING READ NAMES

 

42)  In the Aligned Reads Window, click on a read name with the left

mouse button.  The name will turn magenta.  Click again and it will

turn yellow again.  Try turning it magenta and then scrolling.  This

feature is helpful in keeping track of a particular read as you scroll.

 

COMPLEMENTING THE CONTIG

 

43)  Push 'Comp Contig' in the Aligned Reads Window to complement the

contig.  This displays the opposite strand of the contig including the

consensus and all reads.  Push this button again to uncomplement it.

 

 

RECOVERY FROM CRASHES

 

44)  It is important to feel that your data are safe, even if the

computer (or consed) were to crash.  Consed will recover your data

from such a crash.

 

Make an edit (remember, edits are made in the Trace Window) and jot

down its location.  Also note the name of the ace file which is

displayed in the upper left box in the Aligned Reads Window.  Then

simulate a crash by going to the xterm where you started consed and

typing control-C.  Restart consed and select the same ace file you

noted (above).  A box will come up saying 'There is an edit history (a

.wrk file) Consed may have crashed during a previous session with this

same file.  Do you want to apply those edits?'  Click on 'yes'.  Go

and find the edits you made before consed crashed--you will find them.

 

This is the purpose of the .wrk files--they are a log file of your

edits and they are added to as you make edits.

 

45)  You should save your edits by pulling open the 'File' menu on the

Aligned Reads Window, and releasing on 'Save assembly'.

 

PROTEIN TRANSLATION AND OPEN READING FRAMES

 

46)  If you would like, you can see the amino acid translation of the

 consensus in all reading frames.  In the Aligned Reads Window, push

 down the left mouse button on the 'Misc' menu and release on 'Show

 Top Strand Protein Translation'.  Try again but this time release on

 'Show Bottom Strand Protein Translation'.  Notice that there are 2

 characters that are in magenta color.  What are those characters?

 Why are they made in a different color?  To not show the protein

 translation, push down the left mouse button on the 'Misc' menu and

 release on 'Don't show protein translation'.
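
For readers who want to see what a translation in all reading frames
involves, here is a minimal Python sketch using the standard genetic
code.  It is an illustration only: the function names are invented for
the example, pads and ambiguity codes are not handled (unknown codons
become 'X'), and consed's own display may differ in detail.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset frame (0, 1 or 2); '*' marks a stop."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(consensus: str) -> dict:
    """Return the three top-strand and three bottom-strand translations."""
    top = consensus.upper()
    bottom = reverse_complement(top)
    return {
        "top": [translate(top, frame) for frame in range(3)],
        "bottom": [translate(bottom, frame) for frame in range(3)],
    }


print(six_frame_translation("ATGGCCTAA"))
```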

 

47)  You can search for open reading frames within a contig.  In the

 Aligned Reads Window, push the left mouse button on 'Navigate' and

 release on 'Search for Open Reading Frames'.  Notice that the open

 reading frames are shown for all 6 reading frames and are sorted by

 length.
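
The idea of scanning all 6 reading frames and sorting the results by
length can be sketched in a few lines of Python.  The sketch assumes an
open reading frame runs from an ATG to the next in-frame stop codon;
this README does not say how consed defines one, so treat that
definition, the function names, and the coordinate convention (0-based
start, exclusive end, measured on the strand being read) as assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def orfs_in_strand(seq: str, strand: str):
    """Return (length, strand, frame, start, end) for each ATG..stop ORF."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                found.append((i + 3 - start, strand, frame, start, i + 3))
                start = None
    return found


def find_orfs(consensus: str):
    """Search both strands and return ORFs sorted longest first."""
    top = consensus.upper()
    orfs = orfs_in_strand(top, "top")
    orfs += orfs_in_strand(reverse_complement(top), "bottom")
    return sorted(orfs, reverse=True)


for orf in find_orfs("ATGAAATAGCATGCCCGGGTGA"):
    print(orf)
```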

 

 

ERROR RATE

 

48)  In the Aligned Reads Window is a box (upper right) labelled

'Err/10kb'.  This is the estimated error rate for this contig, and it

is a good indicator of when you are done (or not done) finishing.

In addition, you can find the error rate for a particular region of

contig as follows:  Point at 'Misc' menu, hold down the left mouse

button, pull down and release on 'Show Error Info For Region'.  Fill

in the boxes for left and right consensus position, click on

'Calculate' and you will be given the error and single subclone data

for that region.
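
To make the 'Err/10kb' figure less mysterious, here is a minimal Python
sketch of how an expected error rate can be derived from phred-style
quality values, where a quality q corresponds to an error probability
of 10^(-q/10).  Whether consed computes its figure in exactly this way
is an assumption, and the function names are invented for the example.

```python
def expected_errors(qualities):
    """Sum of per-base error probabilities for phred-style qualities."""
    return sum(10 ** (-q / 10) for q in qualities)


def errors_per_10kb(qualities):
    """Expected errors scaled to a 10,000 base region."""
    if not qualities:
        return 0.0
    return expected_errors(qualities) / len(qualities) * 10_000


def region_error_rate(qualities, left, right):
    """Errors per 10 kb for consensus positions left..right (1-based, inclusive)."""
    return errors_per_10kb(qualities[left - 1:right])


# Illustrative qualities only: ten good bases followed by ten poor ones.
consensus_quality = [40] * 10 + [10] * 10
print(region_error_rate(consensus_quality, 1, 10))
print(region_error_rate(consensus_quality, 11, 20))
```

A few low-quality bases dominate the total, which is why a short weak
region can keep the rate for a whole contig high.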

 

 

RUNNING PHRED and PHRAP

 

 

phred and phrap *must* be run via the phredPhrap perl script.  If you

run phred on its own and then run phrap on its own, you will get an

ace file that consed cannot use, and you are on your own.  If you run

into problems that way (and you probably will), please do not email me;

use the phredPhrap script instead.  To use the phredPhrap script to run

phred and phrap:

 

49)  Type:

phredPhrap -V

 

It should say:

991019

 

If it does not, then you probably have not installed all the perl

scripts from the scripts directory, as directed above.

 

50)  Make a copy of the standard dataset.  E.g.,

 

cp -r standard test

cd test

 

51)  Delete all the files in phd_dir and edit_dir:

 

rm phd_dir/*

rm edit_dir/*

 

52)  cd edit_dir

 

53)  Run phredPhrap by typing

 

phredPhrap

 

That's it--you no longer need to type *any* arguments, and generally

you should not.  (Please do *not* use the -notags option any longer.)

If you want to add phrap options, you can do that:

 

e.g.,

 

phredPhrap -forcelevel 3

 

Then run consed on the resulting ace file as indicated in the beginning of

the Quick Tour (above).  If you have any problems, this is the time to

diagnose them before you use your own data. 

 

After you have done this successfully, you are ready to use your own

data. 
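
If you repeat steps 50 to 53 often, they can be scripted.  The
following Python sketch mirrors the commands above; the function names
are made up for the example, and it assumes phredPhrap is on your PATH
and that the copied directory contains phd_dir and edit_dir.  Like
'rm phd_dir/*', it leaves subdirectories and dot files (such as
.consedrc) alone.

```python
import shutil
import subprocess
from pathlib import Path


def make_test_copy(source="standard", dest="test"):
    """Copy the data set, then empty phd_dir and edit_dir in the copy."""
    dest_path = Path(shutil.copytree(source, dest))
    for sub in ("phd_dir", "edit_dir"):
        for entry in (dest_path / sub).iterdir():
            # Mirror 'rm dir/*': plain files only, dot files untouched.
            if entry.is_file() and not entry.name.startswith("."):
                entry.unlink()
    return dest_path


def run_phred_phrap(project_dir, *phrap_options):
    """Run phredPhrap from edit_dir, passing any phrap options through."""
    subprocess.run(
        ["phredPhrap", *phrap_options],
        cwd=Path(project_dir) / "edit_dir",
        check=True,  # raise if phredPhrap exits with an error
    )
```

A typical use would be run_phred_phrap(make_test_copy()), or
run_phred_phrap("test", "-forcelevel", "3") to pass a phrap option.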

 

 

 

 

 

 

 

 

AUTOFINISH

 

Note:  Before you use autofinish on your own data, you must modify

determineReadTypes.perl.  See INSTALLING CONSED below for information

about this.

 

54) cd to autofinish/edit_dir

 

55)  Try starting consed by typing:

 

../../consed -autofinish -ace autofinish2.fasta.screen.ace.2

 

(Note 'consed' above may be 'consed_solaris', 'consed_alpha',

'consed_hp', 'consed_sgi', or 'consed_linux' depending on your

executable.  If you have trouble, use that 'ls' command (see above)! )

 

If autofinish says:

 

Run-time exception error; current exception: InputDataError

        No handler for exception.

Abort

 

that means that you have not followed the instructions under

'INSTALLING CONSED' below.  Please follow those instructions and then

try this again.

 

Consed will create 5 files:

 

autofinish.fof

(project name).991014.155627.out

(project name).991014.155627.univForwards

(project name).991014.155627.univReverses

(project name).991014.155627.customPrimers

 

The '991014.155627' is the date and time in format YYMMDD.HHMISS.

The first file, autofinish.fof, is a file of filenames.  It contains

the names of the other files.
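
If you need to sort or archive these files by run, the date and time
can be pulled back out of the name.  A minimal Python sketch, assuming
the name always ends in '.YYMMDD.HHMISS.(kind)' as shown above; the
function name and the project name 'myproject' are placeholders.

```python
from datetime import datetime


def parse_autofinish_name(filename):
    """Split '(project name).YYMMDD.HHMISS.(kind)' into its three parts."""
    # Split from the right so that dots inside the project name are kept.
    project, date_part, time_part, kind = filename.rsplit(".", 3)
    stamp = datetime.strptime(f"{date_part}.{time_part}", "%y%m%d.%H%M%S")
    return project, stamp, kind


project, stamp, kind = parse_autofinish_name("myproject.991014.155627.customPrimers")
print(project, stamp.isoformat(), kind)
```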

 

The .out file is the autofinish output file.  If you want to know

*why* autofinish picked the reads it did, it will tell you.  It will

tell you lots more, such as the orientation of the contigs. 

 

 

If you correctly installed consed, it will print out a list of

experiments you should do to make reads in order to reduce the number

of errors below a target threshold.

   

(project name).991014.155627.univForwards

    is the summary file of the suggested universal forward subclone reads

(project name).991014.155627.univReverses

    is the summary file of the suggested universal reverse subclone reads

(project name).991014.155627.customPrimers

    is the summary file of the suggested custom primer reads

 

These are the files you will typically use for directing your bench

work.  If you like, you can import these files into Excel since the

fields are separated by commas.
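
Because the fields are comma-separated, the summary files can also be
read by a script instead of Excel.  A minimal Python sketch follows;
the function name is invented, and since the meaning of each column is
not listed here, the rows are returned as plain lists of strings.

```python
import csv


def parse_summary(lines):
    """Return each non-empty line as a list of whitespace-trimmed fields."""
    return [
        [field.strip() for field in row]
        for row in csv.reader(lines)
        if row
    ]


# Placeholder lines only; real summary files have their own columns.
for row in parse_summary(["field1, field2, field3\n", "\n", "a,b,c\n"]):
    print(row)
```

To read a real file, open it with open(name, newline="") and pass the
file object to parse_summary.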

 

This finishing tool is designed to be run in batch after each

assembly.  In a high throughput operation, the production people can

make these reads without anyone using consed to examine the assembly

interactively.  Only when autofinish cannot help you any

longer (either it reduces the number of expected errors below your

error threshold, or it says it can't help you further), must you bring

up consed graphically and examine the assembly.

 

AUTOFINISH TARGET ERROR RATE

 

Now let's experiment with some of the autofinish options.  By default,

autofinish will suggest finishing reads until the error rate is less

than 100 errors per megabase.  Suppose you want fewer errors.  Fine:

 

56)  Create a file in edit_dir called .consedrc

and put the following line in it:

 

consed.autoFinishMaxAcceptableErrorsPerMegabase: 10

 

(Note: I have put the following already in your .consedrc

consed.autoFinishAllowWholeCloneReads: false

That tells autofinish to not suggest any sequencing reactions directly

off the BAC or cosmid, since most labs don't like these sequencing

reactions--they prefer sequencing reactions off M13 or plasmids.  So I

suggest you leave this line the way it is.)

 

Run autofinish again the same as before:

 

../../consed -autofinish -ace autofinish.fasta.screen.ace.1

 

You will notice two differences in the output:  First, near the top of

the autofinish output file it will say:

 

consed.autoFinishMaxAcceptableErrorsPerMegabase: 10

 

whereas before it said:

 

consed.autoFinishMaxAcceptableErrorsPerMegabase: 100

 

A second difference is that this time it suggested additional

experiments.

 

Note for UNIX novices:  Earlier, I said that you only needed to know 3

UNIX commands:  pwd, ls, and cd.  Now I want you to learn one variant:

ls -tlr

This is the same as ls, but it puts one file on each line and prints the

lines so that the most recent files are on the bottom.  Since you will

be creating many, many files as you work through these autofinish

exercises, this command gives an easy way to see the files you have

just created, without having to always look at autofinish.fof to look

for the names of the files you just created.
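
The same ordering is easy to get from a script.  This minimal Python
sketch lists the regular files in a directory with the most recently
modified last; unlike ls -tlr it skips subdirectories and prints only
the names.  The function name is made up for the example.

```python
import os


def files_oldest_first(directory="."):
    """Return file names in directory, most recently modified last."""
    entries = [entry for entry in os.scandir(directory) if entry.is_file()]
    entries.sort(key=lambda entry: entry.stat().st_mtime)
    return [entry.name for entry in entries]


for name in files_oldest_first("."):
    print(name)
```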

 

 

AUTOFINISH:  CHANGING COSTS

 

57)  Now please change it back to

consed.autoFinishMaxAcceptableErrorsPerMegabase: 100

or else just comment out the line by putting a '!' in the first column

like this:

 

!consed.autoFinishMaxAcceptableErrorsPerMegabase: 10

 

 

and run autofinish again:

 

../../consed -autofinish -ace autofinish.fasta.screen.ace.1

 

Check that it now says:

 

consed.autoFinishMaxAcceptableErrorsPerMegabase: 100

 

near the top of the autofinish output file.
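
If you keep several .consedrc files around, it can help to check which
settings are actually active.  The following Python sketch reads the
'name: value' lines and skips lines commented out with '!' in the
first column.  It is only an approximation of how such resource files
are read, and the function name is invented for the example.

```python
def parse_consedrc(text):
    """Return a dict of active 'name: value' settings from .consedrc text."""
    resources = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("!"):
            continue  # blank line, or commented out in the first column
        name, separator, value = line.partition(":")
        if separator:
            resources[name.strip()] = value.strip()
    return resources


example = """\
!consed.autoFinishMaxAcceptableErrorsPerMegabase: 10
consed.autoFinishAllowWholeCloneReads: false
"""
print(parse_consedrc(example))
```

With the first line commented out, only the second setting is reported,
so autofinish would fall back to its default of 100 for the other one.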

 

Notice that it calls 3 custom primer subclone sequencing reactions and

3 universal primer sequencing reactions.

 

Suppose you want to indicate that your lab can make oligos very

cheaply--as cheaply as doing a universal primer reaction.  You can do

this by lowering the relative cost of custom primer subclone sequencing

reactions.  Put the following in .consedrc

 

consed.autoFinishCostOfCustomPrimerSubcloneReaction: 20

 

And then run autofinish again:

 

../../consed -autofinish -ace autofinish.fasta.screen.ace.1

 

Check that it now says:

 

consed.autoFinishCostOfCustomPrimerSubcloneReaction: 20

 

near the top of the autofinish output file.

 

You will notice that there are now 4 custom primer experiments and 2

universal primer experiments.

 

 

 

AUTOFINISH:  CHANGING MELTING TEMPERATURES

 

58)  Look near the top of the autofinish output file and you will see the

following lines:

 

consed.primersMinMeltingTemp: 50

consed.primersMaxMeltingTemp: 55

 

Some labs prefer to use primers with higher melting temperatures.  In

your .consedrc file, put the following lines:

 

consed.primersMinMeltingTemp: 55

consed.primersMaxMeltingTemp: 60

 

Then run autofinish again with the same command as before.

 

Check that it now says:

 

consed.primersMinMeltingTemp: 55

consed.primersMaxMeltingTemp: 60

 

near the top of the autofinish output file.

 

Compare the first experiment from the last 2 autofinish runs.

Everything should be the same except that the primers are longer at

their 3' ends but are otherwise the same primers.
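
Why does a higher melting temperature give longer primers?  A rough
illustration in Python, using the Wallace rule of thumb (2 degrees for
each A or T, 4 degrees for each G or C).  This is NOT necessarily the
formula consed uses; the rule, the function names, the minimum length
of 15, and the template sequence are all assumptions made for the
example.

```python
def wallace_tm(primer: str) -> int:
    """Wallace rule estimate: 2*(A+T) + 4*(G+C), in degrees C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def shortest_primer(template: str, min_tm: int, min_length: int = 15) -> str:
    """Extend the primer at its 3' end until the estimate reaches min_tm."""
    for length in range(min_length, len(template) + 1):
        candidate = template[:length]
        if wallace_tm(candidate) >= min_tm:
            return candidate
    raise ValueError("template too short to reach the requested temperature")


# Hypothetical template, written 5' to 3'.
template = "ACGTTGCAAGCTTGACCGTAGGCTAAC"
for min_tm in (50, 55):
    primer = shortest_primer(template, min_tm)
    print(min_tm, len(primer), primer)
```

The primer for the higher temperature starts at the same base and
simply runs further toward the 3' end.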

 

AUTOFINISH:  OTHER CONTROL

 

59)  Try adding to .consedrc the following:

 

consed.autoFinishCloseGaps: false

 

and run autofinish again.

 

What happened?

 

 

Another parameter that people sometimes change is:

 

consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.1

 

One finisher says that she prefers to set this at 0.5 errors and to

decrease:

consed.autoFinishMaxAcceptableErrorsPerMegabase: 1

 

This has the effect of making autofinish try to resolve every

region where errors are clustered tightly together, even if the total

error rate for the entire BAC is very low.

 

You can change any of the parameters listed at the top of the

autofinish output file (or actually any of the more exhaustive list of

resources listed in the 'Info' menu, 'Show Consed Resources' list.)

 

We believe the defaults are an excellent starting point.

 

 

 

AUTOFINISH:  NOT REPEATING FAILED EXPERIMENTS

 

60) If you are serious about doing the experiments autofinish

suggests, run autofinish with the -doExperiments option:

 

consed -ace (ace file name) -autofinish -doExperiments

 

-doExperiments causes autofinish to record its suggestions in the ace

file.  If one of these suggested reads fails to fix a problem, when

autofinish is run again it won't pick the same read again.

 

If a forward or reverse universal primer read failed, autofinish (when

run in a subsequent round) will not suggest that same experiment.  If

a custom primer read fails, autofinish will not pick that same

experiment again, and it won't pick a custom primer read that is even

close to the failed one.  'Close' is defined by the resource:

 

consed.autoFinishNewCustomPrimerReadThisFarFromOldCustomPrimerRead: 50

 

You can change the default of 50 if you like.
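
As a rough picture of what 'close' means, here is a small Python
sketch.  It assumes the distance is measured in bases between read
start positions and that a candidate closer than the resource value is
rejected; the exact comparison autofinish makes is not given here, and
the function name is invented.

```python
def too_close(candidate_pos, failed_positions, min_distance=50):
    """True if the candidate lies within min_distance of any failed read."""
    return any(abs(candidate_pos - failed) < min_distance
               for failed in failed_positions)


failed = [1200]                 # position of a failed custom primer read
print(too_close(1230, failed))  # 30 bases away
print(too_close(1300, failed))  # 100 bases away
```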

 

In addition, autofinish (the next time it is run) will tell you how

well each experiment did in solving the problem it was intended to

solve.

 

See the

 

EVALUATING EXPERIMENTS

 

section of the autofinish output file.

 

(Note to programmers:  the format of the autoFinishExp tags is likely

to change--parse them at your peril!)

 

-doExperiments will also cause oligos to be tagged.  You can turn

this off by setting:

 

consed.autoFinishTagOligosWhenDoExperiments: false

 

Primer id's created by autofinish use the same naming scheme as

primers created in consed and they will not conflict with each other.

For example, if autofinish creates oligos djs14.1, djs14.2, and

djs14.3, then the next primer that a user accepts will be djs14.4.  If

autofinish is run a second time, it will start with primer djs14.5.
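
The numbering in this example behaves as if each new primer takes the
next unused number after the project prefix.  A minimal Python sketch
of that rule, as an illustration only (the function name is made up,
and it is not consed's code):

```python
def next_primer_id(prefix, existing_ids):
    """Return prefix.N where N is one more than the highest number in use."""
    start = prefix + "."
    numbers = [
        int(name[len(start):])
        for name in existing_ids
        if name.startswith(start) and name[len(start):].isdigit()
    ]
    return f"{start}{max(numbers, default=0) + 1}"


oligos = ["djs14.1", "djs14.2", "djs14.3"]   # created by autofinish
oligos.append(next_primer_id("djs14", oligos))  # accepted by a user
print(oligos[-1])
print(next_primer_id("djs14", oligos))          # next autofinish run
```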

 

You should not type '-doExperiments' if you do not intend to do the

experiments autofinish suggests.  If you use -doExperiments, but you

don't really do the experiments, and then you run autofinish again,

autofinish will be very upset--it will think that all of its suggested

experiments failed (because it can't find them).  It will see that all

of the problems are still present but it will think that it should not

choose any of those same experiments again so it will suggest

different experiments that will not be as ideal.

 

AUTOFINISH:  doNotFinish particular regions

 

61)  If there is a region that you don't care to finish (e.g., it has

already been finished or you know there is no gene there), then you

can put a doNotFinish tag on the consensus and autofinish will not try

to finish this area.  Try putting a doNotFinish tag on the region from

1 to 200.  Run autofinish again.  You will notice that there will no

longer be any experiments to solve weak regions in the consensus.

 

----------------------------------------------------------------------------

 

ADVANCED PHRAP/CONSED USAGE

 

 

62)  BACKING OUT EDITS AFTER YOU HAVE SAVED THE ASSEMBLY

 

If you decide that all your edits are terrible and you want to start

over (perhaps you have been training a new finisher), the cleanest

solution is to delete everything in phd_dir and edit_dir, but leave

everything in chromat_dir and just run

phredPhrap again. 

 

 

63)  SELECTIVELY BACKING OUT EDITS AND REMOVING READS

 

If you want to back out all edits in just particular reads, I have

provided a perl script to do this:

 

 

revertToUneditedRead (read name)

 

What it does is copy the .phd.1 file to a new version numbered 1

greater than the highest existing version.

 

Then you must reassemble using the phredPhrap script to create an ace

file that has no edits for that particular read.  It will have all

edits for all other reads.

 

Why doesn't it just delete all phd files except for the

.phd.1?  In that case, consed could not read any previous ace file

since all previous versions of ace files would refer to phd files that

have been deleted.
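
The copy-forward idea can be sketched in Python as follows.  This is an
illustration of the approach described above, not the supplied script:
the function name is invented, and the real script may do more (for
example, adjust information inside the phd file).

```python
import re
import shutil
from pathlib import Path


def revert_to_unedited(phd_dir, read_name):
    """Copy read_name.phd.1 to a new version one above the highest present."""
    phd_dir = Path(phd_dir)
    pattern = re.compile(re.escape(read_name) + r"\.phd\.(\d+)")
    versions = [
        int(match.group(1))
        for path in phd_dir.iterdir()
        if (match := pattern.fullmatch(path.name))
    ]
    if 1 not in versions:
        raise FileNotFoundError(f"{read_name}.phd.1 not found in {phd_dir}")
    new_file = phd_dir / f"{read_name}.phd.{max(versions) + 1}"
    shutil.copyfile(phd_dir / f"{read_name}.phd.1", new_file)
    return new_file
```

Older versions stay in place, so earlier ace files that refer to them
can still be opened.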

 

64)  REMOVING READS FROM AN ASSEMBLY

 

Create a file containing the filenames of all the reads you want to

remove, one filename per line.

Then use the perl script

 

removeReads  <file of filenames>

 

Then reassemble using the phredPhrap script.

 

 

65)  ADDING READS WITHOUT CHROMATOGRAM FILES

 

This may happen if you, for example, download sequence from Genbank

and want to assemble it along with your reads. 

 

There are 2 ways to do this, depending on whether you want to edit the

read or not. 

 

a)  If you want to edit the read, run mktrace to produce a fake trace.  It

will have all perfect peaks. 

 

Run:

 

mktrace (name of file with fasta sequence)

 

Then run the phredPhrap script normally.  You will be able to bring up

the traces in consed and edit the read.

 

b)  If it is not important to edit the reads, there is a method that

is a little faster.  Create just a fake phd file using:

 

fasta2Phd.perl (name of file with fasta sequence)

 

 

It will create a file whose name is taken from the fasta file name:

for example, if the fasta filename is Contig1.c.fasta, then the phd file

will be called Contig1.c.phd.1 The fasta name in the file is ignored.

You can then put this in the phd_dir, and reassemble using the

phredPhrap script.

 

Note: all fake reads should end with an extension .c or .a or .c1 or

.c2 ... or .a1 or .a2 or ...   This is important because it tells

consed and autofinish that this data cannot be used as a template for

a primer.

 

Note:  when you are creating phd files such as this, you must start with

(read name).phd.1   Do not start with (read name).phd.2 or any higher

version number.  This is because consed looks for the .1 version in

order to find the original phred calls so it expects there to be a .1

version.
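
The naming rules above can be sketched in Python.  This is only an
illustration, not part of consed: the function name fake_phd_name is
made up, and it assumes the fasta file name ends in '.fasta' as in the
Contig1.c.fasta example.

```python
import re
from pathlib import Path

# Extensions the text lists for fake reads: .c, .a, .c1, .c2, ..., .a1, .a2, ...
FAKE_READ_EXTENSION = re.compile(r"\.[ca]\d*$")


def fake_phd_name(fasta_filename):
    """Return the phd file name expected for a fake read (hypothetical helper)."""
    name = Path(fasta_filename).name
    # ASSUMPTION: the read name is the fasta file name minus a trailing ".fasta",
    # generalized from the single example Contig1.c.fasta -> Contig1.c.phd.1
    read_name = name[: -len(".fasta")] if name.endswith(".fasta") else name
    if not FAKE_READ_EXTENSION.search(read_name):
        raise ValueError(f"{read_name!r} does not end in .c, .a, .cN or .aN")
    # Fake phd files must start at version 1, never .phd.2 or higher
    return read_name + ".phd.1"
```

A check like this can catch a misnamed fake read before it is copied
into phd_dir.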

 

If the reads are really fake (you don't want

templates

 

 

66)  WHY ARE ALL THE READS NOT IN THE ASSEMBLY?

 

You will notice that there are some contigs that contain only one

read.  You will also notice that there are some reads that are not

shown by consed at all, since phrap did not put them into the ace

file.  Why?

 

If a read does not have a significant match (with Smith-Waterman score

exceeding minscore) to any other read, that read is not included in

the ace file.  Instead, that read is put in the '.singlets' file.

That read will not appear in consed.

 

If a read does have a significant match to any other read, then it

will appear in the ace file and be shown by consed.  However, such a

read might have other problems: it might not be possible to assemble

such a read with other reads.  In the case of ESTs, this read may be a

unique representative of a particular gene (or a genomic sequence

contaminant) that happens to contain an Alu repeat and thus happens to

match other reads in the data set; or it may represent the only read

of a particular alternatively spliced form; or it may have data

anomalies of some sort (chimeras, etc.).  Such a read would end up in

a contig all of its own.

 

 

67) VIEWING THE CHROMATOGRAM OF SINGLETS OR NON-ASSEMBLED READS

 

 

If you have a chromatogram, you can use consed to view it, even if it

hasn't been assembled into the ace file.  This is common with cDNA

assemblies in which the reads don't overlap and thus phrap doesn't put

them together into a contig.

 

To do this, make the same edit_dir, phd_dir,

and chromat_dir as above, put the chromatogram into chromat_dir, run

phred on it to generate the phd file which goes into phd_dir.

 

Then go to edit_dir and run:

 

phd2Ace.perl (name of phd file)

 

For example, if your phd file is myRead.phd.1

from edit_dir, type:

 

phd2Ace.perl myRead.phd.1

 

This will produce myRead.ace

 

Then just start consed normally:

consed -ace myRead.ace

and you can view the chromatogram.

 

 

MULTIPLE TRACE POPUP

 

68) Bring up dataset standard.  In the Aligned Reads window, scroll to

a region that has many reads and that has some discrepancies--try

position 1162.  Hold down the shift key, and click with the middle

mouse button on the consensus.  At this location 3 traces will

popup--these are the 2 highest quality traces that agree with the

consensus (on each strand) and the highest quality trace that

disagrees with the consensus.  This feature is useful in areas of high

coverage when you want to rapidly examine just the most significant

traces rather than looking at all of them.

 

 

MAXIMUM NUMBER OF TRACES DISPLAYED

 

69) Bring up dataset standard.  Scroll to position 1162.  Bring up 4

reads and then try bringing up additional reads.  You will notice that

new reads are put at the top of the stack of traces and, once there

are 4 traces displayed, traces are automatically removed from the

bottom of the stack.  If you want to change this maximum number of

traces to something besides 4, you can do that: In the Main Consed

Window (click on 'Find Main Win' on the Aligned Reads window), pull

down the 'Options' menu, and release on 'General Preferences'.  Try

changing the 'Max Number of Traces Shown' to 3.  Then click 'Apply and

Dismiss'.  Now dismiss the Trace Window and again start adding

additional traces to the Trace Window.  You will notice that now the

number of traces shown will not exceed 3.

 

HOTKEYS FOR EDITING

 

70)  If you do a lot of editing, you will want to have a faster method

of doing these edits than having the popup and selecting an option.

Thus the following hot keys exist:

 

 

    < and > (less than and greater than) to make n's to the left

        and the right (respectively) of the cursor

    control-l and control-r to make low quality to the left and

        the right (respectively) of the cursor

    overstriking with a capital letter (e.g., C instead of c) causes

        the base to become high quality rather than low quality

    overstriking with a lower case letter causes the base to become

        low quality

 

Give these a try.

 

71) Now go to the menu labelled 'color', and pulldown and release on

'color means match'.

 

Now you notice different colors:  The

colors have the following meaning:

 

    Blue:   agrees with consensus

    Orange: disagrees with consensus

    Yellow: this stretch of this read was used to form the consensus

    Grey:   Low quality or unaligned ends of reads

 

Now go back to the colormode 'color means quality and tags' (the

default) for the next exercise.

 

(The other colormodes will mean more to you later.)

 

 

ALPHABETICAL ORDERING OF READS

 

72)  The reads can be ordered in two ways:

 

      a) alphabetically

      b) first all the top strand reads and then all the bottom

            strand reads.  The top strand reads are then ordered

            by the left end of the reads.  Same with the bottom

            strand reads.

 

Try changing between a) and b).  In the Main Consed Window (click on

'Find Main Win' on the Aligned Reads Window if you can't find the Main

Consed Window because it is covered up with other windows), pull down

the 'Options' menu, and release on 'General Preferences'.  Find

'Display reads sorted alphabetically or by strand/left end of read.'

Switch it between 'alpha' and 'strand'.  Then click 'Apply and

Dismiss'.  Notice the effect in the Aligned Reads Window.  Many

polymorphism and mutation detection labs find that alphabetical

sorting is most useful, while many genomic sequencing labs find that

sorting by strand/left end of read is most useful.

 

 

SCROLLING TRACES INDEPENDENTLY

 

73) Dismiss all of your Trace Windows.  Then popup traces for 2

different reads in approximately the same location.  Scroll one of

them.  You may want to scroll by clicking the arrows or clicking to

the left or right of the thumb.  You will notice that both will

scroll.  Consed will do its best to have corresponding peaks lined up.

(Consed can't line all of them up because the peak spacing is not

uniform and differs from read to read.)  Try removing a trace by

clicking on one of the 'Remove' buttons in the Trace Window.  Try

adding other traces.  Then click on 'No' for scrolling the traces

together and try scrolling.  You will now observe that they scroll

separately.

 

 

ABI BASE CALLS

 

 

74) If you want to see the ABI base calls, no problem.  Just go to the

Main Consed Window.  Pull down the 'Options' menu and release on

'General Preferences'.  Click on 'True' for 'Show ABI Bases in Trace

Window' and then click 'OK' at the bottom of the window.  The ABI

bases will not be shown immediately--you must first dismiss the trace

window and bring it up again.  You will then see an additional line

with the ABI base calls.

 

MEASURING ERROR RATE AND SINGLE SUBCLONE BASES FOR A REGION

 

75)  Some contigs have long tails of low quality bases and you would

like to find out the error rate for the contig without that long

tail.  On the Aligned Reads Window, pull down the Misc menu, and release

on 'Show Errors for a Region'.  This will tell you both the error rate

for the region and the number of single subclone bases for that region.

 

 

------------------------------------------------------------------------

 

INSTALLING CONSED

 

Consed used to use .Xdefaults for consed parameters--no longer.  Now

consed uses ~/.consedrc for most of the same parameters.  Thus you

should remove consed parameters from .Xdefaults and put them in

.consedrc in your home directory.

 

Before, when you made a typo with one of the consed parameters, it was

just silently ignored.  Now consed makes a big fuss.  So you need to

be prepared to find out all of the parameters that have not been

working all this time.

 

To start with, type:

 

cd ~

touch .consedrc

 

That will create a new empty consed parameter file.  You can add

lines to it as you need to customize consed.  Although most consed

parameters now go into .consedrc, there are still a very few that need

to stay in .Xdefaults.  Here is the rule:  if the parameter starts

with

 

consed.

 

such as

 

consed.gunzipFullPath: /bin/uncompress

 

then it goes into .consedrc

 

If the parameter starts with

 

consed*

 

such as

 

consed*contigwin.background: Black

 

then it goes in .Xdefaults
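
The prefix rule above is simple enough to check mechanically.  The
following Python sketch is an illustration only (the function name
config_file_for is hypothetical); it looks at nothing but the first
seven characters of the resource line.

```python
def config_file_for(resource_line):
    """Return the file a consed resource line belongs in, per the prefix rule."""
    line = resource_line.strip()
    if line.startswith("consed."):
        return ".consedrc"
    if line.startswith("consed*"):
        return ".Xdefaults"
    # Anything else is not covered by the rule described in the text
    raise ValueError(f"not a consed resource line: {resource_line!r}")
```

Running every consed line of an old .Xdefaults through such a check
shows which lines have to move to .consedrc.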

 

You can also make such customizations system-wide (for everyone) or

for just a specific project.  See CONSED CUSTOMIZATION (below) for

more information.

 

 

76) Follow the first few steps of USING CONSED GRAPHICALLY of the

 Quick Tour (above).  If you have problems, it may be due to your X

 emulator.  See 'MONITORS FOR CONSED' below.

 

77) The default locations for most of consed, phred, and phrap require

that there be a directory /usr/local/genome  

 

I strongly suggest you make such a location--it will save you many

headaches of trying to customize scripts for other locations.  If you

can't actually use /usr/local/genome, then you could make

/usr/local/genome be a link to the real location--that will work just

as well. 

 

78)  Make sure that /usr/local/genome/bin is in every consed user's PATH.

 

79) Put the consed executable in /usr/local/genome/bin

 

80)  Check this by logging on as a user and typing:

 

consed -V

 

You should see 'Version 9.0'.  If you see something else, you have

some debugging to do.

 

81)  Build phd2fasta:

Go to the misc/phd2fasta directory and type 'make'

Move the phd2fasta executable to /usr/local/genome/bin

 

82)  Build mktrace:

Go to the misc/mktrace/980701 directory and type 'make'

Move the mktrace executable to /usr/local/genome/bin

 

83)  Move all perl scripts from the scripts directory to

/usr/local/genome/bin

Make sure all are executable (chmod a+x *)

 

DELETE ANY PREVIOUS VERSIONS OF THESE SCRIPTS OR YOU WILL BE SORRY!

(Bugs have been fixed.)

 

84)  Get perl 5.  You can check where to get perl via the perl web

site:

 

    http://www.perl.com/perl/info/software.html

 

 

(If you don't know about perl, try it--it will save you a

huge amount of time over developing the same utilities in C, awk, or

csh or sh.) 

 

 

85)  From the misc subdirectory, copy primerCloneScreen.seq and

primerSubcloneScreen.seq to the directory

/usr/local/genome/lib/screenLibs

(You may have to create this directory.)

 

Take a look at these files.  They are dummy files indicating the fasta

format of the sequences that should be put in them.  You should put

into primerCloneScreen.seq the vector sequence of the cloning vectors

you are using (BAC or cosmid) and into primerSubcloneScreen.seq the

sequencing vectors you are using (plasmid, M13, etc).  Don't be too

generous in putting lots of vectors into the files!  The larger they

are, the slower primer picking will be.  Our files are only this big:

 

-rw-r--r--   1 root     root       29938 Nov  7  1997 primerCloneScreen.seq

-rw-r--r--   1 root     root        7381 Aug 13  1997 primerSubcloneScreen.seq

 

and primer picking is quite fast enough.

 

Now that you have set this up, you should try the PRIMER PICKING

sections (above) in the Quick Tour to make sure this works.  Note that

you should *not* do the temporary step in the beginning of PRIMER

PICKING.  That is because you want the primers screened against vector.

 

86)  You should also create a file

 

/usr/local/genome/lib/screenLibs/vector.seq

 

This contains all the vector that you want to mask out before

phrapping.  In general, it is the combination of primerCloneScreen.seq

and primerSubcloneScreen.seq

 

 

87)  You should also create a file

/usr/local/genome/lib/screenLibs/repeats.fasta

 

In this file, put any repeats that you want to have automatically

tagged.  These typically are ALU sequences.  If you don't want to tag

anything, then comment out (put '#' as the first character of the

line) the following lines in phredPhrap:

 

Change:

!system( "$tagRepeats $szAceFileToBeProduced" )

  || die "some problem running $tagRepeats";

 

to:

#!system( "$tagRepeats $szAceFileToBeProduced" )

#  || die "some problem running $tagRepeats";

 

 

88) determineReadTypes.perl

 

Phrap, Consed's primer picking, and Consed/Autofinish all need the

following information for each read:

          is it a universal primer forward, a universal primer reverse, 

             or a walking read?

          what is its template name?

 

Generally this information can be determined from the read name, using

*your* naming convention.  Modify the perl script

determineReadTypes.perl to put this information at the end of the phd file

using WR info items.
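
To make the idea concrete, here is a Python sketch of pulling the
template name and read type out of a read name.  The naming convention
used here (template, then '.f', '.r' or '.w', then a number) is purely
hypothetical and is NOT the convention built into
determineReadTypes.perl; substitute your own.  The sketch only
classifies the read and does not write anything into the phd file.

```python
import re

# HYPOTHETICAL convention: <template>.<f|r|w><number>, e.g. djs228_1034.f1
READ_NAME = re.compile(r"^(?P<template>.+)\.(?P<kind>[frw])\d+$")

KINDS = {
    "f": "universal primer forward",
    "r": "universal primer reverse",
    "w": "walking read",
}


def classify_read(read_name):
    """Return (template name, read type) for a read named per the convention above."""
    match = READ_NAME.match(read_name)
    if match is None:
        raise ValueError(f"read name {read_name!r} does not follow the convention")
    return match.group("template"), KINDS[match.group("kind")]
```

Whatever convention you use, the point is the same: two reads from the
same template must yield the same template name.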

 

Consed allows you to check that you have correctly modified

determineReadTypes.perl:  On the Main Consed Window, point to 'Info',

hold down the left mouse button, and release on 'Show Info for Each

Read'.  Check that the information presented is correct.  If, for

example, consed thinks that there are templates that have 9 or more

reads, it is likely that you have not correctly customized

determineReadTypes.perl

 

Once you have correctly customized determineReadTypes.perl, then

uncomment the line in phredPhrap which calls determineReadTypes.perl

 

 

TEST RUNNING PHREDPHRAP

 

89) See the section RUNNING PHRED and PHRAP (in the Quick Tour)

 

 

TESTING ADDING NEW READS

 

90)  It will make your life easier if phred, phrap, and crossmatch are

all where consed expects them:  in /usr/local/genome/bin

 

91)  Decide where to put phred's parameter file and edit both

addReads2Consed.perl and phredPhrap to reflect this location.  I

generally prefer to put it in /usr/local/genome/lib to keep all of the

phred/phrap/consed files in one place.  Alternatively, you could put

it in /usr/local/etc/PhredPar/phredpar.dat which is the historical

location of this file.

 

92) Next you should test the ADDING NEW READS step in the Quick Tour

(above).  This step requires that everything be set up correctly and

in the correct location.  Hopefully the error messages are clear

enough to help you if you have set up anything incorrectly.

 

 

USING YOUR OWN DATA

 

 

93)  Create the following directory structure:

 

Directory structure:

    top level directory (generally named after the BAC or cosmid)

        subdirectory 'chromat_dir'--chromatograms go in here

        subdirectory 'phd_dir'--phd files will automatically be put here

        subdirectory 'edit_dir'--ace files will automatically be put here

 

If you already have your chromatograms somewhere else, you can make

chromat_dir be a link to wherever you have them. 

 

The various phrap and crossmatch files will be put into edit_dir by

the phredPhrap script.
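
If you set up many projects, the directory structure above can be
created by a short script.  This Python sketch is one way to do it;
the function name make_project_dirs and the project name in the usage
note are made up for illustration.

```python
from pathlib import Path

# The three subdirectories named in the text
SUBDIRS = ("chromat_dir", "phd_dir", "edit_dir")


def make_project_dirs(parent, project_name):
    """Create <parent>/<project_name>/{chromat_dir,phd_dir,edit_dir}."""
    top = Path(parent) / project_name
    for sub in SUBDIRS:
        # exist_ok lets the script be rerun without failing on existing dirs
        (top / sub).mkdir(parents=True, exist_ok=True)
    return top
```

For example, make_project_dirs(".", "myBAC") creates myBAC with the
three subdirectories under the current directory.  If your
chromatograms already live elsewhere, make chromat_dir a link by hand
instead.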

 

94)  cd to the edit_dir directory, and type:

 

phredPhrap

 

If you are successful, the script will tell you so and you can bring

up consed on the ace file:

 

95)  Type:

 

consed

 

You should see a file with the extension .ace.1

Double click on it.

 

You should see a list of contigs.

 

Double click on the one you want to see.

 

Follow the first few steps of the Quick Tour under USING CONSED

GRAPHICALLY above.  You should at least go as far as viewing traces.

 

 

96) Appending expid to the phd files

 

If you are using autofinish, and would like autofinish to tell you how

well your reads are succeeding, then the phd files must be appended

with the experiment id's.  In the 3 autofinish summary files

(*.univReverse, *.univForwards, and *.customPrimers), you will see

information like this:

 

univ rev,,,->,-329,-249,71,Contig1,3,djs228_1034

 

or this:

 

tgaagaaatggctgactcc,56,1,->,3258,3338,3658,Contig1,4,djs228_2813,5,djs228_168,6,djs228_1248

 

The '3' just before the djs228_1034 is an experiment id.  There is

also an expid '4' just before djs228_2813, an expid '5' before

djs228_168, and an expid '6' just before djs228_1248.
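
A program can pull the expids out of such a line.  The Python sketch
below is an illustration only.  It ASSUMES, from nothing more than the
two example lines above, that the contig name is always the 8th
comma-separated field and that everything after it is a series of
(expid, template) pairs; check this against your own summary files.

```python
def expids_from_summary_line(line):
    """Return (contig name, [(expid, template), ...]) from one summary line."""
    fields = line.strip().split(",")
    # ASSUMPTION: 7 fields precede the contig name, as in both examples
    contig = fields[7]
    rest = fields[8:]
    if len(rest) % 2 != 0:
        raise ValueError("expected (expid, template) pairs after the contig name")
    pairs = [(int(rest[i]), rest[i + 1]) for i in range(0, len(rest), 2)]
    return contig, pairs
```

For the first example line this gives Contig1 with the single pair
(3, 'djs228_1034').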

 

Autofinish doesn't know what you will end up calling these reads it is

telling you to make.  Autofinish only knows those reads by the numbers

3, 4, 5, and 6.  So when you make the reads, autofinish needs to be

informed that this is 'experiment 3' or whatever.  You do this by

appending in the phd file the following structure:

 

WR{

expid addExpid 990811:140818

5

}

 

where WR stands for 'whole read item',

      expid for 'experiment id'

      addExpid is the name of the program that you will write that

            will append this information

      990811:140818 is the date and time in format YYMMDD:HHMISS

      5 is the expid

 

This program must be run *after* phred runs to create the phd files.

Thus your program must have some method of determining what the expid

of each read is.  What the University of Washington Genome Center does

is to have the finishers put the expid as part of the filename.  This

makes it easy for a program to look at the phd file and figure out

what the expid is and then write the WR item into that phd file. 

 

Alternatively, you could keep a database and, after the phd file is

created, look into the database to see what the expid is.
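
As a starting point for such a program, here is a Python sketch that
builds the WR item shown above and appends it to a phd file.  It is
not a supplied script: the function names are made up, how you find
the expid for a read (filename or database) is left to you, and it
assumes the phd file already ends with a newline.

```python
from datetime import datetime


def expid_wr_item(expid, when=None, program="addExpid"):
    """Return the text of a WR expid item in the layout shown in the text."""
    when = when or datetime.now()
    # YYMMDD:HHMISS, e.g. 990811:140818
    stamp = when.strftime("%y%m%d:%H%M%S")
    return f"WR{{\nexpid {program} {stamp}\n{expid}\n}}\n"


def append_expid(phd_path, expid, when=None):
    """Append the WR item to an existing phd file without rewriting it."""
    # ASSUMPTION: the phd file ends with a newline, so the item starts on its own line
    with open(phd_path, "a") as phd:
        phd.write(expid_wr_item(expid, when))
```

Opening the file in append mode leaves the phred output above the new
item untouched.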

 

When you have successfully added expid's to the phd files, the next

time you run autofinish on this project, it will have in the

'EVALUATE' section of the autofinish output file, lots of interesting

information about how well the reads succeeded.

 

 

 

USING NON-STANDARD LOCATIONS FOR FILES

 

You have a lot of work to do.  You will need to edit nearly every

script mentioned above.  In addition, you will need to make sure that

the CONSED_PARAMETERS environment variable is set for every user and

that the CONSED_PARAMETERS file points to the new locations for these files:

 

consed.primersSubcloneFullPathnameOfFileOfSequencesForScreening: /usr/local/genome/lib/screenLibs/primerSubcloneScreen.seq

consed.primersCloneFullPathnameOfFileOfSequencesForScreening: /usr/local/genome/lib/screenLibs/primerCloneScreen.seq

consed.primersBadTemplatesFile: badTemplates.txt

consed.fullPathnameOfAddReads2ConsedScript: /usr/local/genome/bin/addReads2Consed.perl

consed.fullPathnameOfCrossMatch: /usr/local/genome/bin/cross_match

consed.fullPathnameOfPhred: /usr/local/genome/bin/phred

 

 

As you can see, sticking with the default of /usr/local/genome will

make your life easier--not just at installation, but even in day to

day operations.  (Remember--/usr/local/genome could be just a link)

 

 

--------------------------------------------------------------------------

NOTE TO SGI USERS

 

In /usr/lib, there must be a file: libCsup.so

 

If you don't have this file, you must get it from SGI.  To get it, if

you are on Irix 6.2 through 6.4, request:

 

SG0001637 'C++ Exception handling patch for 7.00 (and above) compilers

on irix 6.2' (it's on the 'Development Options 7.1' CD).

 

If you are on Irix 5.3, install patch 1600

 

To make things easier for you, I've included my libCsup.so

This might save you having to get the patches above.

 

 

 

----------------------------------------------------------------------------

 

FOR PROGRAMMERS AND FELLOW TRAVELLERS ONLY

 

 

CONSED VERSION

 

On the command line, type:

 

consed -v

 

This is particularly useful to system administrators to make sure the

latest version is installed on all computers.

 

CONSED CUSTOMIZATION

 

Click on the 'Info' menu on the Main Consed Window and release on menu

item 'Show Consed Resources'.  This shows you what is available to be

changed by putting in your ~/.consedrc file.

 

Changes in ~/.consedrc only affect one user.  If you want to make a

change to affect all consed users on the system, put a file in some

central location (e.g., /usr/local/genome/lib/.consedrc ) and then

have every user set the environment variable CONSED_PARAMETERS to

that location:

 

setenv CONSED_PARAMETERS /usr/local/genome/lib/.consedrc

 

Anything the user puts in ~/.consedrc will override whatever is in the

CONSED_PARAMETERS file.

 

You can also have different parameters for different projects.  Put a

.consedrc file in the edit_dir of a particular project.  When you are

working on that project, whatever is in that .consedrc will override

whatever is in your ~/.consedrc file or the  CONSED_PARAMETERS file.
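
The order of precedence described above (CONSED_PARAMETERS file, then
~/.consedrc, then the .consedrc in edit_dir) can be modeled in a few
lines of Python.  This is an illustration of the override order only,
not how consed itself reads the files; the function names are made up,
and treating lines that start with '!' as comments is an assumption.

```python
def parse_resources(text):
    """Parse 'name: value' lines into a dict."""
    resources = {}
    for raw in text.splitlines():
        line = raw.strip()
        # ASSUMPTION: blank lines and lines starting with '!' are ignored
        if not line or line.startswith("!"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            resources[name.strip()] = value.strip()
    return resources


def effective_resources(site_text="", user_text="", project_text=""):
    """Merge the three levels; later levels override earlier ones."""
    merged = {}
    for text in (site_text, user_text, project_text):
        merged.update(parse_resources(text))
    return merged
```

A value set in the project's .consedrc wins over the same resource in
~/.consedrc, which in turn wins over the system-wide file.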

 

 

COMPRESSING CHROMATOGRAMS

 

If you are interested in compressing your chromatogram files, go into

chromat_dir and gzip one of the chromatogram files.  Make sure that

gunzip is in /usr/local/bin   (You can change this location via the

consed resource

 

consed.gunzipFullPath: /usr/local/bin/gunzip

 

--see CONSED CUSTOMIZATION (above), but it will be easiest for

you and your users if you just put gunzip in /usr/local/bin and not

have to bother with consed resources.)

 

Restart consed and bring up the corresponding trace.  You will notice

no appreciable delay.

 

 

CONSED -ACE

 

Try bringing up consed like this:

 

consed -ace (name of ace file)

 

This can be useful if you are going to have consed brought up from

some other program.

 

 

NO PHD FILES

 

Try bringing up consed like this:

 

consed -nophd

 

This mode does not allow editing and does not show quality

information.  It allows you to view an assembly when you don't have

phd files or chromatograms but you only have the ace file.  You will

not be able to see the quality information, since that information is

kept in the phd files.  I do not recommend nor support this option!

 

 

 

CREATING CUSTOM TAG TYPES

 

 

The following consed resources are available for creating custom tag

types:

 

consed.tagColorCustomTag1:

consed.tagColorCustomTag2:

consed.tagColorCustomTag3:

consed.tagColorCustomTag4:

consed.tagColorCustomTag5:

consed.tagColorCustomTag6:

consed.tagColorCustomTag7:

consed.tagColorCustomTag8:

consed.tagColorCustomTag9:

consed.tagColorCustomTag10:

consed.tagColorCustomTag11:

consed.tagColorCustomTag12:

consed.tagColorCustomTag13:

consed.tagColorCustomTag14:

consed.tagColorCustomTag15:

consed.customTag1:

consed.customTag2:

consed.customTag3:

consed.customTag4:

consed.customTag5:

consed.customTag6:

consed.customTag7:

consed.customTag8:

consed.customTag9:

consed.customTag10:

consed.customTag11:

consed.customTag12:

consed.customTag13:

consed.customTag14:

consed.customTag15:

consed.tagColorCustomConsensusTag1:

consed.tagColorCustomConsensusTag2:

consed.tagColorCustomConsensusTag3:

consed.tagColorCustomConsensusTag4:

consed.tagColorCustomConsensusTag5:

consed.tagColorCustomConsensusTag6:

consed.tagColorCustomConsensusTag7:

consed.tagColorCustomConsensusTag8:

consed.tagColorCustomConsensusTag9:

consed.tagColorCustomConsensusTag10:

consed.tagColorCustomConsensusTag11:

consed.tagColorCustomConsensusTag12:

consed.tagColorCustomConsensusTag13:

consed.tagColorCustomConsensusTag14:

consed.tagColorCustomConsensusTag15:

consed.customConsensusTag1:

consed.customConsensusTag2:

consed.customConsensusTag3:

consed.customConsensusTag4:

consed.customConsensusTag5:

consed.customConsensusTag6:

consed.customConsensusTag7:

consed.customConsensusTag8:

consed.customConsensusTag9:

consed.customConsensusTag10:

consed.customConsensusTag11:

consed.customConsensusTag12:

consed.customConsensusTag13:

consed.customConsensusTag14:

consed.customConsensusTag15:

 

When you create a custom tag type, you specify its name and the color

you want it displayed in.

 

For example:

 

consed.tagColorCustomTag1: SlateBlue2

consed.tagColorCustomTag2: SlateBlue2

consed.tagColorCustomTag3: SlateBlue2

consed.tagColorCustomTag4: brown

consed.tagColorCustomTag5: MediumPurple

consed.tagColorCustomTag6: purple

consed.customTag1: polymorphismInsertion

consed.customTag2: polymorphismDeletion

consed.customTag3: polymorphismSubstitution

consed.customTag4: qualityCoreComment

consed.customTag5: coordinatorApproval

consed.customTag6: coordinatorComment

 

(All of these tag types are read tag types.  Consensus tag types are

specified separately--see the consed resource names (above).)

 

Once you have done this, the user of consed can add tags of these

types in the method described in TAGS of the Quick Tour (above).

 

 

ADDING TAGS FROM OTHER PROGRAMS

 

You can also write external programs that add tags to the ace file

and/or the phd files.  Both RT (read) and CT (consensus) tags can be

appended to the end of the ace file.  BEGIN_TAG tags can be appended

to the end of the phd files.  Do not rewrite the ace file or the phd

file--there is no need to do so and it will cause problems.

 

CONTROL OF CONSED FROM SOME OTHER PROGRAM

 

Consed can be controlled by some other program.  For example, you

might have a program that displays mapping data and you would like the

user to be able to click on a location and have consed come up showing

the bases in that region.  This feature allows a programmer to do

this.

 

 

The external program can start up consed as follows:

 

consed -socket (local port number) -ace (ace filename)

 

For example,

 

consed -socket 5432 -ace standard.fasta.screen.ace

 

After consed completes coming up (including you clicking whether you

want to apply edits), you will see the message in the xterm:

 

success bind to local port number: 5432

 

And then you will see a file created by consed in the default

directory called consedSocketLocalPortNumber

 

This gives the port number of the Berkeley socket that consed has

opened and is listening on.  Thus your program can read this file and

create a connection to the Berkeley socket created by consed.

 

Once the connection is established, your program can send commands to

consed at that socket indicating to consed which contig to display and

what consensus position to scroll to.  Currently, the only acceptable

commands are:

 

Scroll (contigname) (consensus position)<return>

PopupTraces (read name) (unpadded read position in the direction of sequencing)<return>

 

'Unpadded read position in the direction of sequencing' is the

position from the right end, if the read is a bottom strand read.

 

Just send such a command to the Berkeley socket, and consed will

respond appropriately.
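
A minimal client along these lines might look like the Python sketch
below.  Several things here are assumptions, not documented behavior:
that the consedSocketLocalPortNumber file holds just the port number
as text, that <return> means a newline character, and that consed is
running on the same machine.  The function names are made up.

```python
import socket


def read_port(path="consedSocketLocalPortNumber"):
    """Read the port number consed wrote into its port file."""
    # ASSUMPTION: the file contains only the port number as plain text
    with open(path) as port_file:
        return int(port_file.read().strip())


def scroll_command(contig_name, consensus_position):
    """Build a Scroll command; <return> is assumed to be a newline."""
    return f"Scroll {contig_name} {consensus_position}\n"


def popup_traces_command(read_name, unpadded_position):
    """Build a PopupTraces command; <return> is assumed to be a newline."""
    return f"PopupTraces {read_name} {unpadded_position}\n"


def send_command(port, command, host="localhost"):
    """Open a connection to consed's socket and send one command."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(command.encode("ascii"))
```

For example, after starting consed with -socket, calling
send_command(read_port(), scroll_command("Contig1", 1162)) from the
directory containing the port file should scroll Contig1 to position
1162, if the assumptions above hold.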

 

 

 

AUTOMATIC ORDERING OF OLIGOS

 

I heard of a finisher who manually ordered 72 oligos.  She had to

cut/paste the bases of each oligo.  That is not only painful, but also

error prone.  I've supplied you a script that you can use to

automatically determine which oligos have been newly requested since

the last order, aggregate them into a single order, and email the

request off.

 

The script is ace2Oligos.perl.  It takes as parameters the name of an

ace file and the name of the oligo file.  The oligo file is a list of

oligos that have been ordered for that particular project, and looks

like this:

 

name=G1980A181.1

sequence=ctgcatggctaggga

template=seq from subclone

date=980427 temp=52

 

name=G1980A181.2

sequence=tcttactttctgactttcattt

template=seq from clone

date=980427 temp=50

 

ace2Oligos.perl finds all oligo tags in the ace file and makes sure

that all of them are in this oligo file.

 

To automatically order oligos each night, there is an additional

script you will have to write.  I suggest that you run your script

each night under cron and that it do the following:

 

for each project, it will look for the most recent ace file.  It will

run ace2Oligos.perl on that ace file and direct the oligo file to be

in the parent directory of edit_dir, phd_dir, and chromat_dir for that

project.  Thus there will be one oligos file for each project.  Your

script will run ace2Oligos.perl once for each project.

 

Then your script would, for each project, look in the oligos file for

new oligos, and aggregate the unordered oligos into a central file,

which it would email to the oligo company.  If it finds any new oligos

in an oligo file, it draws a line at the bottom:

 

-------------------------------

 

which indicates that all oligos above the line have been ordered.  When this script

looks at this file the next night, it uses this line to determine

whether any additional oligos have been requested since the previous

order.  (The idea of this line came from St Louis.)  Thus the oligos

file tells you which oligos have been ordered and which have not yet

been ordered.
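
The part of your nightly script that handles the line could be
sketched in Python as follows.  This is an illustration, not a
supplied script: the function names are made up, a 'line' is taken to
be any row of five or more dashes, and emailing the order is left out.

```python
def is_order_line(line):
    """True for a row consisting only of dashes (the 'ordered' marker)."""
    stripped = line.strip()
    return len(stripped) >= 5 and set(stripped) == {"-"}


def unordered_oligos(oligo_file_text):
    """Return the names of oligos that appear after the last dashed line."""
    lines = oligo_file_text.splitlines()
    last_marker = max(
        (i for i, line in enumerate(lines) if is_order_line(line)), default=-1
    )
    return [
        line.split("=", 1)[1].strip()
        for line in lines[last_marker + 1:]
        if line.startswith("name=")
    ]


def mark_all_ordered(oligo_file_text):
    """Return the file text with a dashed line drawn at the bottom."""
    return oligo_file_text.rstrip("\n") + "\n\n" + "-" * 31 + "\n"
```

After the order has been sent, write the result of mark_all_ordered
back to the oligos file so that the next night's run starts from an
empty list.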

 

 

97)  CUSTOM NAVIGATION

 

In the Main Window, there is also a Navigate menu.  Pull it down and

release on the Custom Navigation menu item.  A box will popup saying

'Select custom navigation file:' 

There will be a file:

custom_navigation.nav

Double click on it.

 

You will see the now-familiar custom navigation box.  Click 'Next'

repeatedly until you get to the end of the list.

 

Consed doesn't write such a file--it just reads it.  This feature

allows you the ability to write your own programs that select

locations that you want your finishers to examine.  Your program

writes a file, the user reads that file into consed in this manner,

and you can go to each of the locations.

 

 

98)  LONG, LONG, LONG READ NAMES

 

If you have very long read names, you might not be able to see the

whole name in the Aligned Reads Window.  You can solve this by

increasing the consed resource:

 

consed.alignedReadsWindowMaxCharsForReadNames: 20

 

 

--------------------------------------------------------------------------

 

MONITORS AND MICE FOR CONSED

 

If your monitor is part of a Unix computer (a Sun, an HP, a DEC, an

SGI, or a Linux box) or is an Xterminal, then you will have absolutely

no problems.

 

You must have a 3 button mouse or 3 button emulation.  3 Button

emulation is tricky since consed uses all 3 buttons of the mouse and

it also uses Control-Middle-Mouse-button, Shift-Middle-Mouse-Button

and Control-Right-Mouse-Button.  So if you are going to try to just

use a 2 button mouse (or, God-forbid, a 1 button mouse), you should

make sure that you can emulate each of those.  Often, if you push the

left and right mouse buttons at the same time, your X server will

interpret that to be the middle mouse button.  But you must consult

your X emulator or X server to know what it will do--that is out of

consed's control.

 

If your monitor is a PC running Windows or NT, then you must have an X

emulator installed and running.  X emulators include:  Exceed, XWin32,

Reflection X, and OpenNT.  Any of these will work if configured

correctly (and the 'correctly' is the key).  I encourage you to use

single window mode and then use a Unix window manager such as CDE,

fvwm, or mwm.

 

If your monitor is a MAC, then you must also have an X emulator, such

as Exodus or MACX installed and running.  You *must* use this emulator

in single window mode, and then use a Unix window manager such as CDE,

fvwm, or mwm.  (If you don't use single window mode, consed might

crash in some circumstances.)

 

 

--------------------------------------------------------------------------

 

PRIMER PICKING PARAMETERS

 

 

The following are primer picking resources.  Many are used for both

consed and autofinish.   There are some that are just used for

autofinish and some that are just used for consed.

 

A great deal of science and experimentation has gone into

setting these defaults and I suggest you do not change them until you

have experimented and know what you are doing. 

 

You can set these via the .consedrc file.

 

In addition, for a particular consed session, you can interactively change

many of these in the following manner: On the main window, point to

'Options', hold down the left mouse button and release on 'Primer

Picking Preferences.'  You can modify the resource of interest and

then click on 'Apply and Dismiss'.  The new value of the resource will

be in effect only until you restart consed.

 

In the following, I have annotated the parameters with the following

symbols:

 

(YES)  freely customize to your own site

(OK)  don't change unless you have a specific need and know what you

        are doing

(NO)  don't change this!

 

 

This is what they mean (I suggest you skip over this for now):

 

consed.primersAssumeTemplatesAreDoubleStrandedUnlessSpecified: false

bool

! you can put the template type in the phd file in a WR template item

! consed will have a list of these and know which are single and

! double stranded

(YES)

 

consed.primersLookThisFarForForwardVectorInsertJunction: 125

int

! don't change this--if no X's this far from beginning of read, then

! assume that you are in insert

(NO)

 

 

consed.primersMinimumLengthOfAPrimer: 15

int

(YES)

 

consed.primersMaximumLengthOfAPrimer: 25

int

(YES)

 

consed.primersALittleLessThanAverageInsertSizeOfASubclone: 1500

int

// for finding templates

! used to calculate extent of a template for choosing templates

(YES)

 

consed.primersMaxInsertSizeOfASubclone: 3000

int

// for checking for false-annealing

! check +/- this distance from the primer for false-annealing

(YES)

 

consed.primersDNAConcentrationNanomolar: 50.0

double

! used for melting temperature--don't change this!

Consed uses the nearest-neighbor (with salt concentration

correction) formula, just as all modern primer picking

programs do

(NO)

 

consed.primersMaxMatchElsewhereScore: 17

int

! used for testing false-annealing to template and to vector

In choosing a primer, it is important that the primer not

stick somewhere besides the place you are trying to get a

read--a 'false match'.  This can cause a primer to fail even

if the false match is not perfect.  The worst kind of false

matches are those that extend to the 3' end of the primer, and

worse yet if they have a high percentage of G/C matches since

G and C bind more tightly than A and T.  The algorithm used

here takes both of these effects into account.  This parameter

sets the max acceptable false match.

 

In practice, it is this parameter that eliminates most

primers.  You can get consed to give you some primers by

raising this parameter, but if you do, you should be aware of

the danger of mispriming.  To make you aware of that danger,

you can do this: when you choose a primer (see above for how

to do this), look in the xterm.  It will show you the best

alignment of each primer with some other location in the

assembly.  By looking at this you will gain an idea of what

the PrimersMaxMatchElsewhereScore means and you won't be too

free about raising it above the default.

(OK)

 

consed.primersMaxMeltingTemp: 55

int

(YES)

 

consed.primersMaxSelfMatchScore: 6

int

! cutoff for self-annealing of a primer

In choosing a primer, you don't want the primer to bind to

itself (form a hairpin) or bind to another copy of itself.  It

is particularly bad if it binds to another copy at its 3' end.

This parameter is used in the algorithm that tests this.

(OK)

 

 

consed.primersMinMeltingTemp: 50

int

(YES)

 

consed.primersMinQuality: 30

int

! you must be sure of the sequence of a primer or it won't anneal to where you want

Some primers fail because the primers don't match where they

are supposed to.  This is because the sequence where the

primer is supposed to stick isn't accurately known.  Thus it

is important to be certain of the sequence where the primer is

chosen from.  This parameter is an indication of this

certainty--it is the min quality of every base in an

acceptable primer.

(NO)

 

consed.primersNumberOfBasesToBackUpToStartLooking: 50

int

Consed is designed for you to put the cursor on the left-most (or

right-most) edge of a region that you want to cover with a new read.

Since the data quality immediately after an oligo is not good, you

don't want the oligo immediately next to the region you want to cover,

but rather a little bit back from it.  This parameter gives how far

back.  For example, if this is 50 and you want a read at position 1000,

primers will be searched for before base 950 but not in the region 950 to

1000.

 

This parameter is not used for autofinish--just for consed.

 

(OK)

 

consed.primersNumberOfTemplatesToDisplayInFront: 2

int

! this shows the number of templates to show in the interactive primer picking window

(OK)

 

consed.primersPickTemplatesForPrimers: false

bool

! when picking primers for subclone templates, pick templates also.

! If there is no suitable template for a primer, do not pick the

! primer.  If you like to pick your own templates, you might want to

! turn this off for a little improvement in speed. 

(YES)

 

consed.primersPrintInfoOnRejectedTemplates: true

bool

! whether to print which templates were rejected and why (this output can be large )

(OK)

 

consed.primersSaltConcentrationMillimolar: 50.0

double

! used for melting temperature--don't change this!

(NO)

 

consed.primersSubcloneFullPathnameOfFileOfSequencesForScreening: /usr/local/genome/lib/screenLibs/primerSubcloneScreen.seq

RWCString

! vector sequence file if choosing subclone (e.g., M13, plasmid) templates

(OK)

 

 

consed.primersCloneFullPathnameOfFileOfSequencesForScreening: /usr/local/genome/lib/screenLibs/primerCloneScreen.seq

RWCString

! vector sequence file if choosing clone (e.g., cosmid, BAC) template

(OK)

 

consed.primersScreenForVector: true

bool

! whether or not to screen primers for annealing to vector

It is important that the primers not stick to the vector of the

template.  Thus you must provide consed with two files--a file in

fasta format of all subclone vectors, and a file in fasta format of

all clone vectors.  Consed will not accept any primer that has a match

against the appropriate one of these vectors (depending on whether you

are choosing primers for clone template or from subclone template).  A

primer that has a false match to a vector is rejected if that false

match has a score higher (worse) than PrimersMaxMatchElsewhereScore.

(OK)

 

consed.primersMaxLengthOfMononucleotideRepeat: 4

int

Finishers have seen that primers with mononucleotide repeats fail more

often.  This parameter says that a primer with AAAA is acceptable but

AAAAA is not.

(OK)
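
The AAAA/AAAAA rule above can be sketched in a few lines of Python.  This is only an illustration of the rule as described here, not consed's own code; the function names are made up for this example.

```python
from itertools import groupby


def longest_mononucleotide_run(primer):
    """Length of the longest run of one repeated base in the primer."""
    return max((len(list(run)) for _, run in groupby(primer.upper())), default=0)


def passes_repeat_check(primer, max_run=4):
    """True if no base repeats more than max_run times in a row.

    max_run plays the role of consed.primersMaxLengthOfMononucleotideRepeat.
    """
    return longest_mononucleotide_run(primer) <= max_run


print(passes_repeat_check("GCTAAAAGTC"))   # run of 4 A's
print(passes_repeat_check("GCTAAAAAGTC"))  # run of 5 A's
```

With the default of 4, the first primer is accepted and the second is rejected.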

 

consed.primersBadTemplatesFile: badTemplates.txt

FileName

! file of templates that you've tried, don't work, and you don't want to try again

(OK)

 

consed.primersToleranceForDifferentBeginningLocationOfUniversalPrimerReads: 100

int

! different forward reads or different reverse reads

! can differ by up to this amount in the starting location

! If they differ by more, then there is something wrong

! with the template (it is mislabeled?) so don't use it again for walking

(NO)

 

 

consed.primersTooManyVectorBasesInWalkingRead: 10

int

! if there are this many x's, then don't walk again on this template

(OK)

 

consed.primersWhenChoosingATemplateMinPotentialReadLength: 500

int

! when choosing templates for a custom primer, only choose a template

! if the read can be chosen at least this long

This currently can only be set via your .consedrc file.  It is used in

picking templates for a primer.  Clearly you don't want a template to

end too soon after the primer.  This parameter indicates the minimum

number of bases that a template must extend after the primer location.

(OK)

 

consed.primersWindowSizeInLooking: 450

int

This is the width of the region in which consed looks for

primers.  So if PrimersNumberOfBasesToBackUpToStartLooking is

50 and PrimersWindowSizeInLooking is 450, and you are looking

for a forward primer, then consed will look from 500 bases

to the left of the cursor up to 50 bases to the left of the

cursor.  If you are looking for a reverse primer, then consed

will start looking 50 bases to the right of the cursor and

continue until 500 bases to the right of the cursor.

(OK)
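
The arithmetic in the description above can be written out as a small Python sketch.  The function name and the way the cursor position is passed in are assumptions made for this example; only the two default values come from the parameters documented here.

```python
def primer_search_window(cursor, forward, back_up=50, window_size=450):
    """Return (start, end) of the region searched for primers.

    back_up     ~ consed.primersNumberOfBasesToBackUpToStartLooking
    window_size ~ consed.primersWindowSizeInLooking
    """
    if forward:
        # forward primer: search to the left of the cursor
        return cursor - back_up - window_size, cursor - back_up
    # reverse primer: search to the right of the cursor
    return cursor + back_up, cursor + back_up + window_size


print(primer_search_window(1000, forward=True))   # (500, 950)
print(primer_search_window(1000, forward=False))  # (1050, 1500)
```

For a cursor at base 1000 this gives 500 to 950 for a forward primer and 1050 to 1500 for a reverse primer, matching the 500-bases and 50-bases figures in the text.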

 

You can also read about this in the consed paper:

 

Gordon, D., C. Abajian, and P. Green. 1998. Consed: A graphical tool

for sequence finishing. Genome Research. 8:195-202

 

 

 

--------------------------------------------------------------------------

 

AUTOFINISH PARAMETERS

 

Autofinish uses many of the primer picking parameters.  Autofinish

also has additional parameters.

 

In the following, I have annotated the parameters with the following

symbols:

 

(YES)  freely customize to your own site

(OK)  don't change unless you have a specific need and know what you

        are doing

(NO)  don't change this!

 

bool means the value must be true or false

int means the value must be an integer

double means the value must be a decimal number
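
If you keep your own scripts for checking resource settings, the three value types can be validated roughly as follows.  This is a sketch only: it assumes each setting is written as "name: value" on one line, as in the listings in this document, and the helper names are hypothetical.

```python
def parse_resource_line(line):
    """Split 'consed.someName: value' into (name, value) at the first colon."""
    name, _, value = line.partition(":")
    return name.strip(), value.strip()


def parse_resource_value(kind, text):
    """Convert the text of a value according to its documented type."""
    text = text.strip()
    if kind == "bool":
        if text not in ("true", "false"):
            raise ValueError(f"bool must be true or false, got {text!r}")
        return text == "true"
    if kind == "int":
        return int(text)      # raises ValueError if not an integer
    if kind == "double":
        return float(text)    # raises ValueError if not a decimal number
    raise ValueError(f"unknown type {kind!r}")


name, value = parse_resource_line("consed.autoFinishAverageInsertSize: 1500")
print(name, parse_resource_value("int", value))
```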

 

consed.autoFinishAllowWholeCloneReads: true

bool

A 'whole clone read' as opposed to a 'subclone read' is when the

sequencing template for the sequencing reaction is the entire

assembly.  If you are assembling a BAC, a whole clone read is one

that is sequenced directly off the BAC.  If the assembly is a full

length cDNA, then a whole clone read is one in which the sequencing

reaction is off a complete cDNA.

 

This resource tells autofinish that it is ok to suggest reads off the

entire clone (BAC or cosmid).  If you don't want to use the whole BAC as a

template for any reads, change to false. 

(YES)

 

 

consed.autoFinishAverageInsertSize: 1500

int

Used for calling reverses.  This determines the location of the potential

reverse read.  If you have a forward read already, autofinish uses this

number as an estimate of how far away the beginning of the reverse read

should end up.

(YES)

 

 

consed.autoFinishCallReversesToFlankGaps: true

bool

If there is a forward-reverse pair flanking a gap, print it out.

If there is not, suggest reverses to flank the gap.  Useful to help align

and orient the contigs.

(YES)

 

 

consed.autoFinishCallHowManyReversesToFlankGaps: 2

int

How many forward/reverse pairs must flank a gap.  If there are fewer than

this number, autofinish will try to suggest more reverses to do. If there

are already this number or more forward/reverse pairs, it will list them

and not suggest any more.

(YES)

 

consed.autoFinishCloseGaps: true

bool

This allows you to turn off choosing reads to close gaps.   For example, if

you choose to close all gaps by PCRs using manually-picked primers, you

should change this to false.

(YES)

 

Cost Parameters:

consed.autoFinishCostOfResequencingUniversalPrimerSubcloneReaction: 20.0

consed.autoFinishCostOfCustomPrimerSubcloneReaction: 60.0

consed.autoFinishCostOfCustomPrimerCloneReaction: 80.0

consed.autoFinishCostOfDeNovoUniversalPrimerSubcloneReaction: 60.0

double

(YES)

 

Autofinish compares the costs of a universal primer subclone

resequencing reaction, a universal primer subclone de novo reaction (a

reverse where you just have a forward, or a forward where you just have

a reverse), a custom primer subclone reaction, and a custom primer

clone reaction to decide which to favor.  These parameters give you

control over which type of reaction autofinish prefers when it has a choice.

 

The default costs have been chosen by Seattle and St Louis.  They

reflect the fact that ordering an oligo is more expensive than using a

universal primer.  They also reflect the fact that whole clone

reactions (sequencing off the BAC) are more difficult to do than

subclone reactions (sequencing off the plasmid).
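
To see how the relative costs order the reaction types, here is a small Python sketch.  It only ranks reaction types by the default costs listed above; how autofinish actually weighs cost against the benefit of an experiment is not described in this document, so treat this as an illustration of the ordering and nothing more.  The dictionary keys are labels made up for this example.

```python
# Default costs copied from the four resources above.
DEFAULT_COSTS = {
    "resequencing universal primer subclone": 20.0,
    "custom primer subclone": 60.0,
    "custom primer clone": 80.0,
    "de novo universal primer subclone": 60.0,
}


def cheapest_reaction(candidates, costs=DEFAULT_COSTS):
    """Pick the lowest-cost reaction type among the candidates."""
    return min(candidates, key=lambda name: costs[name])


print(cheapest_reaction(["custom primer clone", "custom primer subclone"]))
```

Raising the cost of whole clone reactions in your .consedrc would push a choice like this even further toward subclone reactions.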

 

consed.autoFinishCoverSingleSubcloneRegions: true

bool

! this allows you to turn off choosing reads to cover single subclone regions

(YES)

 

 

consed.autoFinishCoverLowConsensusQualityRegions: true

bool

! this allows you to turn off choosing reads to cover low consensus quality regions

(YES)

 

consed.autoFinishCreateExpSummaryFiles: true

bool

! this allows you to turn off creating the 3 experiment summary files: forward universal primer, reverse universal primer, and custom primers

(OK)

 

consed.autoFinishDoNotFinishWhereTheseTagsAre: doNotFinish

RWCString

list of tag types separated by spaces.  E.g.,

doNotFinish repeat

tells autofinish that you are not interested in finishing in this region

(OK)

 

 

consed.autoFinishDumpTemplates: false

bool

! for debugging, this allows you to dump all information about the templates--insert locations

(OK)

 

consed.autoFinishExcludeContigIfOnlyThisManyReadsOrLess: 2

int

! exclude contigs that are probably E. coli contamination

(OK)

 

consed.autoFinishExcludeContigIfDepthOfCoverageOutOfLine: true

bool

(OK)

 

consed.autoFinishExcludeContigIfDepthOfCoverageThisMuchMoreThanLargestContig: 2.0

double

! exclude contig if its depth of coverage is much greater than other

! contigs (this indicates contamination)

(NO)

 

consed.autoFinishHowManyTemplatesYouIntendToUseForCustomPrimerSubcloneReactions: 2

int

! this tells autofinish how many templates you are planning on using, which is necessary to figure out which regions will still be single subclone regions

(YES)

 

consed.autoFinishMaxAcceptableErrorsPerMegabase: 100

int

! target error rate

(YES)
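
To get a feel for what 100 errors per megabase means, you can estimate an error rate from base qualities.  The sketch below assumes the qualities are phred-scaled (error probability 10^(-q/10)); whether autofinish computes its error estimate in exactly this way is not stated here, and the function name is made up for this example.

```python
def expected_errors_per_megabase(qualities):
    """Estimate errors per 1,000,000 bases from phred-scaled qualities."""
    if not qualities:
        raise ValueError("need at least one quality value")
    # a phred quality q corresponds to an error probability of 10^(-q/10)
    expected_errors = sum(10 ** (-q / 10) for q in qualities)
    return expected_errors / len(qualities) * 1_000_000


# a stretch of consensus where every base has quality 40
print(expected_errors_per_megabase([40] * 1000))
```

A consensus in which every base has quality 40 sits right at about 100 errors per megabase; quality 30 everywhere would be about 1000.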

 

consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.1

double

! if an experiment solves fewer errors than this, it isn't worth doing so won't be chosen, even if the target error rate has not yet been achieved

(OK)

 

consed.autoFinishMinNumberOfForwardReversePairsToCalculateAverageInsertSize: 100

int

! if there are fewer forward/reverse pairs than this, then the parameter

! consed.autoFinishAverageInsertSize is used instead.  These parameters are

! used when calling reverses to figure out where the reverse should go

(NO)

 

consed.autoFinishMinNumberOfGapErrorsFixedByAGapClosingExp: 30

int

(NO)

 

consed.autoFinishNewCustomPrimerReadThisFarFromOldCustomPrimerRead: 50

int

! this tells autofinish when it wants to make a new custom primer read, how far this read must be from any previous custom primer reads on the same strand

(NO)

 

consed.autoFinishLookForRepeatedForwardUniversalPrimerReadThisFarAway: 200

int

! this tells autofinish how far to look for the tag of a previously called universal primer read

(NO)

 

consed.autoFinishNumberOfGapClosingReadsPerContigEnd: 3

int

! don't make any more experiments than this to extend into a gap

(YES)

 

consed.autoFinishMinNumberOfSingleSubcloneBasesFixedByAnExp: 1

int

! if an experiment will fix fewer than this number of single subclone bases, don't do it even if the total number of single subclone bases in the contig is too high

(OK)

 

consed.autoFinishNumberOfBasesBetweenContigsAssumed: 200

int

! gap size--each base in the gap counts as 1 error so autofinish tries to extend into gaps

(NO)

 

consed.autoFinishPotentialHighQualityPartOfReadStart: 80

int

This is how far the high quality region of the read is from the

beginning of the read.

(OK)

 

consed.autoFinishPotentialHighQualityPartOfReadEnd: 300

int

       --------------------------------------------

       ^                 ^             ^

       beginning         A             B

       of read

 

       <----------------->

       consed.autoFinishPotentialHighQualityPartOfReadStart           

 

       <------------------------------->

       consed.autoFinishPotentialHighQualityPartOfReadEnd

 

You can adjust these depending on your assessment of the typical

quality of your data.

(OK)

 

consed.autoFinishReversesForFlankingGapsTemplateMustProtrudeFromContigThisMuch: 100

int

! we don't want these templates in which it goes into vector right at

! the end of the template

(OK)

 

consed.autoFinishTagOligosWhenDoExperiments: true

bool

! when autofinish is run with -doExperiments, tags the oligos

! it chooses

(OK)

 

consed.autoFinishTryHarderToSuggestExperimentsToCoverLowQualityRegions: true

bool

! consed tries to cover a low quality region with a read of a different strand

! or chemistry from the existing reads covering that area.  If it can't find

! any read of a different strand or chemistry, should it suggest a read of

! the same strand and chemistry as an existing read?  This parameter says "yes".

(OK)

 

 

----------------------------------------------------------------------------

 

NEW ACE FILE FORMAT

 

There is a new ace file format (since early 1998).  If you still

haven't changed to the new ace file format, you must do so now since

it contains information that is not contained in the old ace file

format.  This additional information (e.g., the alignment and quality

clipping values) is essential for some of the consed functions (e.g.,

navigate by single stranded, navigate by single subclone, autofinish)

to work correctly.

 

Another reason to switch to the new ace format is that you will get

faster consed startup performance.  The new ace file format is also

much smaller (about 60% as big as the old).

 

The new phrap (Aug 1998 and later) writes the new ace format (using

the -new_ace switch).  Since consed now uses the additional

information found only in the new ace format, if you are editing an

assembly, you should first re-phrap to take advantage of this

additional information.

 

Consed can read either old or new ace format.

Consed can also write either new or old ace format.  It writes the new

ace format by default--see 'Options'/'General Preferences'.  Also see

the consed resource:

 

consed.writeThisAceFormat: 2

 

(where 2 means 'new' and 1 means 'old')

 

If you have scripts that read the ace file, you will need to modify

those scripts for the new ace format.  Here is the format:

 

Ace File Format

 

Refer to the accompanying sample_ace_file.txt (below)

 

AS <number of contigs> <total number of reads in ace file>

 

CO <contig name> <# of bases> <# of reads in contig> <# of base segments in contig> <U or C>

 

The U or C indicates whether the contig has been complemented from the

way phrap originally created it.  Thus this is always U for an ace

file created by phrap.
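
If you are updating a script to read the new format, the AS and CO lines can be parsed along these lines.  This is a minimal sketch based only on the field order given above; the function names and the sample values in the example calls are made up.

```python
def parse_as_line(line):
    """AS <number of contigs> <total number of reads in ace file>"""
    tag, n_contigs, n_reads = line.split()
    if tag != "AS":
        raise ValueError(f"not an AS line: {line!r}")
    return int(n_contigs), int(n_reads)


def parse_co_line(line):
    """CO <contig name> <# bases> <# reads> <# base segments> <U or C>"""
    tag, name, n_bases, n_reads, n_segments, orientation = line.split()
    if tag != "CO" or orientation not in ("U", "C"):
        raise ValueError(f"not a CO line: {line!r}")
    return {
        "name": name,
        "bases": int(n_bases),
        "reads": int(n_reads),
        "base_segments": int(n_segments),
        "complemented": orientation == "C",
    }


print(parse_as_line("AS 1 8"))                   # made-up sample values
print(parse_co_line("CO Contig1 1475 8 156 U"))  # made-up sample values
```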

 

BQ

 

This starts the list of base qualities for the unpadded consensus

bases.  The contig is the one from the previous CO, hence no name is

needed here.

 

AF <read name> <C or U> <padded start consensus position>

 

This line replaces the 'AssembledFrom*' line in the previous ace file

format.  C or U means complemented or uncomplemented.  The <read name>

is the true read name (no .comp on it as with the previous ace file

format.)

 

BS <padded start consensus position> <padded end consensus position> <read name>

 

This replaces the 'BaseSegment*' line from the previous ace file format.

 

RD <read name> <# of padded bases> <# of whole read info items> <# of read tags>

 

QA <qual clipping start> <qual clipping end> <align clipping start> <align clipping end>

 

This is new information not found in the previous ace file.  If the

entire read is low quality, then <qual clipping start> and <qual

clipping end> will both be -1.  These positions are offsets from the

left end of the read (left, as shown in consed).  Hence for bottom

strand reads, the offsets are from the end of the read.  The offsets

are 1-based.  That is, if the left-most base is in the aligned,

high-quality region, <qual clipping start> = 1 and <align clipping

start> = 1 (not zero).
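
The -1 convention and the 1-based offsets are easy to get wrong in a script, so here is a small Python sketch.  The function names are hypothetical, and the conversion to a Python slice assumes the end position is inclusive, which this document does not state explicitly.

```python
def parse_qa_line(line):
    """QA <qual clip start> <qual clip end> <align clip start> <align clip end>"""
    fields = line.split()
    if len(fields) != 5 or fields[0] != "QA":
        raise ValueError(f"not a QA line: {line!r}")
    qual_start, qual_end, align_start, align_end = (int(f) for f in fields[1:])
    # -1 -1 means the entire read is low quality
    qual = None if (qual_start, qual_end) == (-1, -1) else (qual_start, qual_end)
    return {"qual": qual, "align": (align_start, align_end)}


def to_slice(clip):
    """Turn a 1-based (start, end) pair into a 0-based Python slice.

    Assumes the end position is inclusive.
    """
    start, end = clip
    return slice(start - 1, end)


qa = parse_qa_line("QA 1 4 1 8")          # made-up sample values
print("ACGTACGT"[to_slice(qa["qual"])])   # bases inside the quality clipping
```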

 

DS CHROMAT_FILE: <name of chromat file> PHD_FILE: <name of phd file> TIME: <date/time of the phd file> CHEM: <prim, term, unknown, etc> DYE: <usually ET, big, etc> TEMPLATE: <template name> DIRECTION: <fwd or rev>

 

There can be additional information on this line.

This replaces the DESCRIPTION line from the old ace file.
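
Since the DS line is a series of "KEY: value" pairs, a script can split it with a regular expression.  The sketch below assumes keys are upper-case words (possibly with underscores) followed by a colon and a space, as in the layout shown above; the sample line is made up, and any additional keys on the line are simply returned along with the rest.

```python
import re

# a key, then its value up to the next " KEY: " or the end of the line
_DS_FIELD = re.compile(r"([A-Z_]+): (.*?)(?=\s+[A-Z_]+: |$)")


def parse_ds_line(line):
    """Return the KEY: value pairs of a DS line as a dictionary."""
    if not line.startswith("DS "):
        raise ValueError(f"not a DS line: {line!r}")
    return dict(_DS_FIELD.findall(line[3:].strip()))


sample = ("DS CHROMAT_FILE: djs14_680.s1 PHD_FILE: djs14_680.s1.phd.1 "
          "TIME: Mon Aug 23 11:43:56 1999 CHEM: term DYE: big "
          "TEMPLATE: djs14_680 DIRECTION: fwd")
info = parse_ds_line(sample)
print(info["TIME"], info["DIRECTION"])
```

Note that the time value contains colons and spaces of its own; the pattern copes with that because a new key is only recognized when an upper-case word is followed by a colon and a space.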

 

The following is for transient read tags (those generated by

crossmatch and phrap).  They are not fully implemented, and the format

may eventually change.  The read is implied by the location of the

whole read info item within the ace file.  They are found after the WR

lines for a read.

 

RT{

<read name> <tag type> <what program created tag> <padded cons pos start> <padded cons pos end> <date when tag was created in form YYMMDD:HHMISS>

}

 

for example:

 

RT{

djs14_680.s1 matchElsewhereLowQual phrap 904 933 990823:114356

}

 

There are consensus tags now in the ace file.  All consensus tags have

the following format:

 

CT{

<contig name> <tag type> <what program created tag> <padded cons pos start> <padded cons pos end> <date when tag was created in form YYMMDD:HHMISS> <NoTrans>

(possibly additional information)

}

 

The NoTrans is optional--it indicates that, when you reassemble, this

tag should not be transferred to the new assembly.  This is true with

tags that should be recreated each time because they have to do with

the assembly (e.g., repeat tags).

 

e.g.,

 

CT{

Contig206 repeat tagRepeats.perl 118732 119060 990823:115033 NoTrans

AluY

}
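
A script that reads consensus tags might handle a CT block like the one above as follows.  This is a sketch under the layout described here: the function name is hypothetical, blank lines inside the block are ignored, and both the YYMMDD:HHMISS and the plain YYMMDD date forms are accepted.

```python
from datetime import datetime


def parse_ct_block(text):
    """Parse one CT{ ... } block into a dictionary."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) < 3 or lines[0] != "CT{" or lines[-1] != "}":
        raise ValueError("not a CT block")
    fields = lines[1].split()
    contig, tag_type, program, start, end, stamp = fields[:6]
    for fmt in ("%y%m%d:%H%M%S", "%y%m%d"):
        try:
            created = datetime.strptime(stamp, fmt)
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognized date: {stamp!r}")
    return {
        "contig": contig,
        "tag_type": tag_type,
        "program": program,
        "start": int(start),
        "end": int(end),
        "created": created,
        "no_trans": "NoTrans" in fields[6:],
        "extra": lines[2:-1],   # e.g. comment text or oligo information
    }


block = """CT{
Contig206 repeat tagRepeats.perl 118732 119060 990823:115033 NoTrans
AluY
}"""
tag = parse_ct_block(block)
print(tag["tag_type"], tag["start"], tag["end"], tag["extra"])
```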

 

In the case of most consensus tag types, there is only 1 line for the

consensus tag.  In the case of comment tags and oligo tags, there are

additional lines of information.  The comment tag includes the comment

on the additional lines.  The oligo tag has the following information:

<oligo name> <oligo bases from 5' to 3'> <melting temp> <C or U

indicating whether the oligo is top strand or bottom strand relative

to the orientation of the contig as created by phrap>

 

WA{

<tag type> <what program created tag> <date tag was created in form YYMMDD:HHMISS>

1 or more lines of data

}

 

This line is a 'whole assembly' tag.  It is used for information

referring to the assembly as a whole.  Currently, phrap puts its

version and phrap command line options in a WA tag.

 

You can append CT, WA, and RT tags to the end of the ace file in any

order you like.

 

 

 

------------------------------------------------------------------------

 

WHAT THE COLORS MEAN

 

 

See the beginning of the Quick Tour.  But here is a very partial list

of the colors:

 

Greyscale of background indicates quality

Grey base with black background--clipped off part of read (either due

to low quality or due to alignment)

Red base--discrepant with consensus

Black base--agrees with consensus

Colored area covering half of a base--tag (see Quick Tour)

Purple tag--more than 1 tag covering a base

 

 

This document was last updated on October 27, 1999 by Andreas Matern