BLASTing the Night Away

This is a brief tutorial on how to use command-line BLAST on the Dell server at the Theory Center. The examples are shown using Windows NT, but there should be only slight differences if you are using a Macintosh or a UNIX machine. In all of the examples the username is amatern (that's me). Anytime you see amatern in this tutorial, you should see your own userid on the screen.

 

Contents:

  1. Logging In
    1. telnet
    2. terminal server client
  2. Directory Structure  
  3. Examples
  4. TBLASTX and amino acid searches using nucleotide sequences
  5. BLASTing your own sequences: a GenBank example
  6. Batch Files - doing more than one BLAST at a time
  7. Information on Windows NT

Logging in: there are two ways to log into charybdis, via telnet or Terminal Server Client.
 
 

Logging In - Terminal Server Client


Terminal Server Client is generally in your Start menu. If it isn't, please use telnet for this demo and ask someone about getting Terminal Server Client installed on your machine.

Enter charybdis.tc.cornell.edu in the server field and enter 1280x1024 for the resolution.


 


(N.B. Your picture might vary slightly.  That's OK.  You might not be in the same domain as I am.)

Enter your userid at the logon prompt and your password at the password prompt and ctc_ith at the domain prompt:


 


Now open a Command Prompt window and continue.

 


 

 


Logging In - Telnet


Use telnet (Comet on a Macintosh) to connect to: charybdis.tc.cornell.edu

The easiest way to do this on a Windows machine is to go to the Start Menu and select Run...



Then, in the window that opens, type: telnet charybdis.tc.cornell.edu and hit Enter or click OK. (Windows 2000 users: see "Windows 2000 and Telnet" at the end of this tutorial.)



The following window should appear:



Enter your userid at the logon prompt and your password at the password prompt and ctc_ith at the domain prompt:



Enter N at the "Use Color Codes?" prompt.

The window then changes, giving you the following.



What happened?

You are now logged into the ctc_ith cluster at the Cornell Theory Center. ctc = Cornell Theory Center and ith = Ithaca. The ctc_ith cluster is a number of networked Windows NT machines. When you logged into charybdis, a little program called a logon script assumed you wanted to be in your 'home space', which is the H: drive. The H: drive is a terabyte hard drive where you should store ALL of your stuff.

The logon script executed two commands for you:

h: (all by itself) - "Change the active disk drive to h". In Windows, drive names are in the format X: where X is the 'name' of the drive (a single letter) and the colon signifies to DOS that it's a disk drive partition. (This is different from UNIX-like systems, where partitions and directories are accessed in almost the same way.)

cd \users\amatern - "Change the working directory to h:\users\amatern". The cd command stands for change directory. All of the users of Velocity have a directory on H: in the users directory. So, in this example, my home directory is h:\users\amatern

Directory Structure

Charybdis has two hard drives, one's called C: and the other's called D:

EVERYTHING YOU WANT/NEED/DESIRE IS ON D:

Please don't put files on C:

Please don't even go to C:

Thank you.

The D: drive has a number of directories. The important ones are:

d:\bin\

This is where all the available programs reside. To see what programs are there, simply type:



d:         changes the drive to the d: drive

cd bin   changes directory to the bin directory

dir        gives you a directory listing of all the files in bin

The most important file in this tutorial is the blastall.exe program. To see if it's there, type: dir blastall.exe



There it is: blastall.exe

D:\bin\ - this is where the executable files are, including BLAST. More are coming soon as we convert UNIX to NT. The path of any of the files in this directory is: d:\bin\filename, e.g. d:\bin\blastall

D:\blast\ - this is where the NCBI toolkit source code is. We'll try to write up how to use some of these cool tools at a later date. PLEASE don't erase, move, rename, or edit these files.

D:\blast\data\ - this is where the substitution matrices are.

D:\databases\ here’s where the databases are downloaded to and formatted for BLASTING. They are updated twice a month.

They are:

D:\databases\arabidopsis\arabidopsis -- all the NCBI Arabidopsis sequences

D:\databases\est\est – dbEST

D:\databases\nr\nr – the non-redundant amino acid database

D:\databases\nt\nt – the non-redundant nucleotide database

D:\databases\sol\soldb – the solanaceous database, includes everything from our lab as well as NCBI

D:\databases\vector\vector - all the cloning vector sequences from NCBI

D:\databases\ecoli\ecoli.nt - the nucleotide sequence of the E. coli genome

D:\databases\solestcontigs1199\sol.fasta.screen.contigs - contigs made from tomato ESTs with phrap in late October

D:\databases\atbacdb\atbacdb - the Arabidopsis BAC 'tiling path' database

Documentation on the NCBI derived databases is here:

http://www.ncbi.nlm.nih.gov/BLAST/blast_databases.html
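Because the database paths are long and easy to mistype, it can help to keep them in one lookup table. The following is only a sketch in Python: the paths are copied from the list above, but the short labels used as keys and the function name database_path are made up for this illustration and are not part of BLAST or of charybdis.

```python
# Database paths copied from the list above.
# The short keys are hypothetical labels chosen for this sketch.
DATABASES = {
    "arabidopsis": r"d:\databases\arabidopsis\arabidopsis",
    "est": r"d:\databases\est\est",
    "nr": r"d:\databases\nr\nr",
    "nt": r"d:\databases\nt\nt",
    "soldb": r"d:\databases\sol\soldb",
    "vector": r"d:\databases\vector\vector",
    "ecoli": r"d:\databases\ecoli\ecoli.nt",
    "solestcontigs": r"d:\databases\solestcontigs1199\sol.fasta.screen.contigs",
    "atbacdb": r"d:\databases\atbacdb\atbacdb",
}


def database_path(label):
    """Return the value to pass to blastall -d for a label, or raise ValueError."""
    try:
        return DATABASES[label.lower()]
    except KeyError:
        known = ", ".join(sorted(DATABASES))
        raise ValueError(f"unknown database {label!r}; known labels: {known}") from None


if __name__ == "__main__":
    print(database_path("soldb"))
```

Looking a path up by label means a typo produces an immediate error message instead of a failed BLAST run.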

The next directory of import today is d:\synteny\

This is where all of the FASTA-formatted files are. If you are looking for a sequence, it should be in here. This directory also has info re: the Arabidopsis-Tomato synteny project.

D:\synteny\bactilingpath-1199\         - the sequences and the Excel spreadsheet for the arabidopsis BAC 'tiling path' from Atdb

D:\synteny\EST\                                 - here's where the FASTA formatted EST sequences are

D:\synteny\EST\tigr\              - all the TIGR ESTs in appropriate subdirectories, so TMEA101 is in d:\synteny\EST\tigr\tme\

D:\synteny\EST\cereon\                    - divided into appropriate subdirectories, so cC-esflc010001 is in D:\synteny\EST\cereon\esflc\

D:\synteny\EST\monsanto\                - all the Monsanto ESTs, no subdirectories

D:\synteny\EST\novartis\                  - all the Novartis ESTs, no subdirectories

D:\synteny\tom-est-contig-1199\       -the tomato EST contigs made in late October 1999
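The subdirectory rules for the EST files can be sketched as a small Python helper. This is a guess based only on the examples above (TMEA101 in tme, cC-esflc010001 in esflc): it assumes TIGR subdirectories are the first three letters of the EST name in lower case, and that Cereon subdirectories are the letters between "cC-" and the first digit. The real layout may have exceptions, and the name est_path is hypothetical.

```python
import re

EST_ROOT = "d:\\synteny\\EST"


def est_path(collection, name):
    """Guess the full path of an EST file from its collection and name."""
    collection = collection.lower()
    if collection == "tigr":
        # Assumption: subdirectory = first three letters, lower-cased.
        return "\\".join([EST_ROOT, "tigr", name[:3].lower(), name])
    if collection == "cereon":
        # Assumption: subdirectory = letters between "cC-" and the first digit.
        match = re.match(r"cC-([A-Za-z]+)\d", name)
        if match is None:
            raise ValueError(f"unexpected Cereon EST name: {name!r}")
        return "\\".join([EST_ROOT, "cereon", match.group(1), name])
    if collection in ("monsanto", "novartis"):
        # These two collections have no subdirectories.
        return "\\".join([EST_ROOT, collection, name])
    raise ValueError(f"unknown EST collection: {collection!r}")


if __name__ == "__main__":
    print(est_path("tigr", "TMEA101"))
    print(est_path("cereon", "cC-esflc010001"))
```

If the guessed path does not exist, a dir listing of the collection's directory will show the actual subdirectory names.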

Examples

Get the FASTA sequence for TPSAD70TH and use BLASTN against the soldb database.

1)      make sure you are telnetted to charybdis.tc.cornell.edu

2)      make sure you are in your home directory (e.g. h:\users\amatern)

a.       to do this from where we left off before, just type h:

b.      if the directory isn't h:\users\<yourid> just type cd \users\<yourid>

3)      type the following command:

blastall -i d:\synteny\EST\tigr\tps\TPSAD70TH -o h:\users\amatern\TPSAD70TH.html -T T -a 4 -p blastn -d d:\databases\sol\soldb

Here are the blastall parameters I used:

-i          name of the input file

-o         name of the output file

-T        HTML output (T or F)

-a         number of processors to use (charybdis has 4)

-p           program name (blastn, blastx, blastp, tblastn, tblastx)

-d         database name

The output file (TPSAD70TH.html) will be an HTML-formatted BLASTN output in the h:\users\amatern\ directory.

If you use the -T T flag (HTML) you can view your results in a web browser (Netscape, etc.).
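If you build many of these commands, it can be less error-prone to assemble them from named pieces. The Python sketch below only uses the blastall flags described above (-i, -o, -T, -a, -p, -d, and optionally -M); the function name blastall_command and its parameter names are invented for this illustration. It returns the command as text and does not run anything, and it assumes none of the paths contain spaces.

```python
PROGRAMS = ("blastn", "blastx", "blastp", "tblastn", "tblastx")


def blastall_command(query, output, program, database,
                     html=True, processors=4, matrix=None):
    """Build a blastall command line from the flags used in this tutorial."""
    if program not in PROGRAMS:
        raise ValueError(f"program must be one of {PROGRAMS}, got {program!r}")
    parts = [
        "blastall",
        "-i", query,                 # input file
        "-o", output,                # output file
        "-T", "T" if html else "F",  # HTML output on or off
        "-a", str(processors),       # number of processors
        "-p", program,               # BLAST program
        "-d", database,              # database
    ]
    if matrix is not None:
        parts += ["-M", matrix]      # substitution matrix
    return " ".join(parts)


if __name__ == "__main__":
    print(blastall_command(
        r"d:\synteny\EST\tigr\tps\TPSAD70TH",
        r"h:\users\amatern\TPSAD70TH.html",
        "blastn",
        r"d:\databases\sol\soldb",
    ))
```

The printed line can be pasted at the command prompt or saved into a batch file.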

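Here's the same search against the Arabidopsis tiling path database. Note that it writes to the same output file as the previous search, so rename the earlier result first if you want to keep it.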

blastall -i d:\synteny\EST\tigr\tps\TPSAD70TH -o h:\users\amatern\TPSAD70TH.html -T T -a 4 -p blastn -d d:\databases\atbacdb\atbacdb

TBLASTX and amino acid searches using nucleotide sequences.

One of the benefits of having a local BLAST server is that we get to do comparisons that are not possible via the web.  For example:

Get the FASTA sequence for TPSAD70TH and use TBLASTX against the solanaceous database:

blastall -i d:\synteny\EST\tigr\tps\TPSAD70TH -o h:\users\amatern\TPSAD70TH.html -T T -a 4 -p tblastx -d d:\databases\sol\soldb -M d:\blast\data\blosum62

There are only two differences between this search and the last search:

-p    tblastx (compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a  nucleotide sequence database.)

-M    substitution matrix.  The default matrix is blosum62.  The matrices are all in d:\blast\data
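To make "six-frame translation" concrete, here is a small Python sketch of the idea: a nucleotide sequence is translated in three reading frames on the forward strand and three on the reverse complement. It uses the standard genetic code and is only an illustration of the concept, not the code that blastall runs. The function names are invented for this example, and codons containing anything other than A, C, G, or T are shown as X.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translations("ATGGCCTAA").items()):
        print(frame, protein)
```

TBLASTX does this for the query and for every database sequence, then compares the translations with each other, which is why it takes much longer than BLASTN.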
 

To see what the blastall parameters are, you can look at the web page at: http://charybdis.tc.cornell.edu/blastall.html
 

BLASTING your own sequence - GenBank Example

If you have a sequence that's not on charybdis, all you need to do is format it correctly (FASTA format) and save it to a text file.  All FASTA format means is that the sequence is in IUPAC codes and the header begins with a > (greater than) and is no longer than one line.  For example:

>mysequence

ACTGTCGATCGTCGATCGAT
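A quick way to catch formatting mistakes before running blastall is to check the file against the two rules above. The Python sketch below is one possible check, under a few assumptions: it handles a single nucleotide sequence per file, it accepts the IUPAC nucleotide letters only, and the function name looks_like_fasta is made up for this example.

```python
# IUPAC nucleotide codes (A, C, G, T, U plus the ambiguity letters).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def looks_like_fasta(text):
    """Check for one '>' header line followed by IUPAC nucleotide lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2 or not lines[0].startswith(">"):
        return False
    return all(set(line.upper()) <= IUPAC_NUCLEOTIDES for line in lines[1:])


if __name__ == "__main__":
    example = ">mysequence\n\nACTGTCGATCGTCGATCGAT\n"
    print(looks_like_fasta(example))
```

A file with several sequences, or with a protein sequence, would need a looser check than this one.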

One of the ways you might want to do this is by getting a sequence from GenBank.

Once the text is on the clipboard, you can paste it into a text editor.  All Windows machines come with a program called Notepad.  From the Start Menu select Accessories and then Notepad.  Paste your sequence into Notepad and then use Save As to give it whatever name you'd like.  The filename is important, as you'll need it for blastall later.

 


 

To blast this sequence, simply enter:  blastall -i h:\users\amatern\mysequence.txt -o h:\users\amatern\mysequence.html -T T -a 4 -p tblastx -d d:\databases\sol\soldb

If you'd like to get sequences from GenBank, simply go to:  http://www.ncbi.nlm.nih.gov/Entrez/

Retrieving sequences from GenBank is self-explanatory.  Remember to save the file as FASTA-formatted text in your h:\users\<yourid>\ directory.  Once you do that, you can simply issue the blastall command!

 

Doing Lots of BLASTs at once - making Batch files

If you want to perform a lot of BLASTs at once, you don't have to sit at the terminal and type them in one at a time. You can make a text file containing all the commands you want to run, name it something.bat, and execute that file. DOS will then run each command in order.

Here's a trivial example: example1.bat

REM this is a remark, a comment

REM it's useful to put comments in programs that you will be re-using

blastall -i d:\synteny\EST\monsanto\cd22-f2a -o h:\users\amatern\monsantoout\cd22-f2a.html -a 4 -T T -p tblastx -M d:\blast\data\blosum62 -d d:\databases\atbacdb\atbacdb

blastall -i d:\synteny\EST\novartis\tomato010104-t3 -o h:\users\amatern\novartisout\tomato010104-t3.html -a 4 -T T -p tblastx -M d:\blast\data\blosum62 -d d:\databases\atbacdb\atbacdb

blastall -i d:\synteny\EST\tigr\tca\tcaaa13th -o h:\users\amatern\tigrout\tcaaa13th.html -a 4 -T T -p tblastx -M d:\blast\data\blosum62 -d d:\databases\atbacdb\atbacdb

REM all done!

All the example file does is run three BLASTs one at a time.
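For more than a handful of sequences, typing the batch file by hand gets tedious, and a short script can write the lines for you. The Python sketch below produces the text of a batch file in the same shape as example1.bat. It is only an illustration: the function name batch_file_text and its parameters are invented here, it assumes every search uses the same program, matrix, and database, and it assumes the output directory already exists.

```python
def batch_file_text(query_paths, output_dir, database,
                    program="tblastx",
                    matrix=r"d:\blast\data\blosum62",
                    processors=4):
    """Return the text of a .bat file with one blastall command per query."""
    lines = ["REM generated list of blastall commands"]
    for query in query_paths:
        # Use the last part of the query path as the base of the output name.
        name = query.rsplit("\\", 1)[-1]
        output = f"{output_dir}\\{name}.html"
        lines.append(
            f"blastall -i {query} -o {output} -a {processors} -T T "
            f"-p {program} -M {matrix} -d {database}"
        )
    lines.append("REM all done!")
    # DOS batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    text = batch_file_text(
        [r"d:\synteny\EST\tigr\tca\tcaaa13th"],
        r"h:\users\amatern\tigrout",
        r"d:\databases\atbacdb\atbacdb",
    )
    print(text)
```

Saving the returned text to a file with the .bat extension (for example with open("example2.bat", "w", newline="")) gives a batch file that can be run like any other.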

Here's what the file looks like in Notepad:



  Then, to execute the commands, save the Notepad document as filename.bat

 


 


In this example, I saved the file in my h:\users\amatern\ directory because that's where you should save all of your files!  Remember to give the file the .bat extension!

Now, from charybdis, all I need to do is type the filename, and voilà!  All the blastall commands are executed by charybdis.

Information on Windows NT

This little tutorial is obviously not meant to show you how to do all of the file manipulations you'd probably like to do, but there's a pretty good tutorial on using Windows NT here: http://www.freeskills.com/
 

Windows 2000 and Telnet


1)      Go to the Start Menu, select Run and enter telnet

2)      A telnet window opens

3)      Type unset ntlm

4)      Type open charybdis.tc.cornell.edu