This is a brief tutorial on how to use command-line BLAST on the Dell server at the Theory Center. The examples are shown using Windows NT, but there should be only slight differences if you are using a Macintosh or a Unix machine. In all of the examples the username is amatern (that's me). Anytime you see amatern in this tutorial, you should see your own userid on your screen.
Logging in: there are two ways to log into charybdis: via telnet or via Terminal Server Client.
Terminal Server Client is generally in your Start menu. If it isn't, please use telnet for this demo and ask someone about getting Terminal Server Client installed on your machine.
Enter charybdis.tc.cornell.edu in the server field and enter 1280x1024 for the resolution.
(N.B. Your picture might vary slightly. That's OK. You might not be in the same domain as I am.)
Enter your userid at the logon prompt and your password at the password prompt and ctc_ith at the domain prompt:
Now open a Command Prompt window and continue.
Use telnet (Comet on a Macintosh) to connect to: charybdis.tc.cornell.edu
The easiest way to do this on a Windows machine is to go to the Start Menu and select Run...
Then, in the window that opens, type: telnet charybdis.tc.cornell.edu and hit Enter or click OK. (Windows 2000 users: follow the telnet steps at the end of this tutorial instead.)
The following window should appear:
Enter your userid at the logon prompt and your password at the password prompt and ctc_ith at the domain prompt:
Enter N at the Use Color Codes? prompt.
The window then changes, giving you the following.
You are now logged into the ctc_ith cluster at the Cornell Theory Center. ctc = Cornell Theory Center and ith = Ithaca. The ctc_ith cluster is a number of networked Windows NT machines. When you logged into charybdis, a little program called a logon script assumed you wanted to be in your 'home space', which is the H: drive. The H: drive is a terabyte hard drive where you should store ALL of your stuff.
The logon script executed two commands for you:
h: (all by itself) - "Change the active disk drive to h". In Windows, drive names are in the format X: where X is the 'name' of the drive (a single letter) and the colon signifies to DOS that it's a disk drive partition. (This is different from Unix, where partitions and directories are accessed in almost the same way.)
cd \users\amatern - "Change the working directory to h:\users\amatern". The cd command stands for change directory. All of the users of Velocity have a directory on H: in the users directory. So, in this example, my home directory is h:\users\amatern
Charybdis has two hard drives, one's called C: and the other's called D:
EVERYTHING YOU WANT/NEED/DESIRE IS ON D:
Please don't put files on C:
Please don't even go to C:
Thank you.
The D: drive has a number of directories. The important ones are:
d:\bin\
This is where all the available programs reside. To see what programs are there, simply type:
d: changes the drive to the d: drive
cd bin changes directory to the bin directory
dir gives you a directory listing of all the files in bin
The most important file in this tutorial is the blastall.exe program. To see if it's there, type: dir blastall.exe
There it is: blastall.exe
D:\bin\ this is where the executable files are including BLAST. More are coming soon as we convert UNIX to NT. The path of any of the files in this directory is: d:\bin\filename e.g. d:\bin\blastall
D:\blast\ this is where the ncbi toolkit source code is. We’ll try to write up how to use some of these cool tools at a later date. PLEASE don’t erase, move, rename, edit, etc. these files.
D:\blast\data\ this is where the substitution matrices are.
D:\databases\ here’s where the databases are downloaded to and formatted for BLASTING. They are updated twice a month.
They are:
D:\databases\arabidopsis\arabidopsis -- all the NCBI Arabidopsis sequences
D:\databases\est\est – dbEST
D:\databases\nr\nr – the non-redundant amino acid database
D:\databases\nt\nt – the non-redundant nucleotide database
D:\databases\sol\soldb – the solanaceous database, includes everything from our lab as well as NCBI
D:\databases\vector\vector - all the cloning vector sequences from NCBI
D:\databases\ecoli\ecoli.nt - the nucleotide sequence of the E. coli genome
D:\databases\solestcontigs1199\sol.fasta.screen.contigs - contigs made from tomato ESTs with phrap in late October
D:\databases\atbacdb\atbacdb - the Arabidopsis BAC 'tiling path' database
Documentation on the NCBI derived databases is here:
http://www.ncbi.nlm.nih.gov/BLAST/blast_databases.html
The next directory of import today is d:\synteny\
This is where all of the FASTA-formatted files are. If you are looking for a sequence, it should be in here. This directory also has info re: the Arabidopsis-Tomato synteny project.
D:\synteny\bactilingpath-1199\ - the sequences and the Excel spreadsheet for the arabidopsis BAC 'tiling path' from Atdb
D:\synteny\EST\ - here's where the FASTA formatted EST sequences are
D:\synteny\EST\tigr\ - all the TIGR ESTs in appropriate subdirectories, so TMEA101 is in d:\synteny\EST\tigr\tme\
D:\synteny\EST\cereon\ - divided into appropriate subdirectories, so cC-esflc010001 is in D:\synteny\EST\cereon\esflc\
D:\synteny\EST\monsanto\ - all the Monsanto ESTs, no subdirectories
D:\synteny\EST\novartis\ - all the Novartis ESTs, no subdirectories
D:\synteny\tom-est-contig-1199\ - the tomato EST contigs made in late October 1999
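If you'd rather not work out the TIGR subdirectory by hand, here is a minimal sketch in Python (which is not part of this tutorial's setup, so run it on your own machine). It assumes the subdirectory is always the first three letters of the EST name in lowercase, which matches the TMEA101 example above but is only a guess at the general rule. The function name tigr_est_path is made up for illustration.

```python
# Assumption: TIGR ESTs live in a subdirectory named after the first
# three letters of the EST name, lowercased (TMEA101 -> tme).
TIGR_ROOT = "d:\\synteny\\EST\\tigr"


def tigr_est_path(est_name):
    """Return the guessed full path of a TIGR EST file on charybdis."""
    if len(est_name) < 3:
        raise ValueError("EST name is too short to derive a subdirectory")
    subdirectory = est_name[:3].lower()
    return TIGR_ROOT + "\\" + subdirectory + "\\" + est_name


if __name__ == "__main__":
    print(tigr_est_path("TMEA101"))
```

If the path it prints doesn't exist, use dir in d:\synteny\EST\tigr\ to find the right subdirectory.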
Get the FASTA sequence for TPSAD70TH and use BLASTN against the soldb database.
1) make sure you are telnetted to charybdis.tc.cornell.edu
2) make sure you are in your home directory (e.g. h:\users\amatern)
a. to do this from where we left off before, just type h:
b. if the directory isn't h:\users\<yourid> just type cd \users\<yourid>
3) type the following command:
blastall -i d:\synteny\EST\tigr\tps\TPSAD70TH -o h:\users\amatern\TPSAD70TH.html -T T -a 4 -p blastn -d d:\databases\sol\soldb
Here are the blastall parameters I used:
-i name of the input file
-o name of the output file
-T HTML output (T or F)
-a number of processors to use (charybdis has 4)
-p program name (blastn, blastx, blastp, tblastn, tblastx)
-d database name
The output file (TPSAD70TH.html ) will be an html formatted BLASTN output which will be in the h:\users\amatern\ directory.
If you use the -T T flag (HTML) you can view your results in a web browser (Netscape, etc.).
Here's the same search against the arabidopsis tiling path database
blastall -i d:\synteny\EST\tigr\tps\TPSAD70TH -o h:\users\amatern\TPSAD70TH.html -T T -a 4 -p blastn -d d:\databases\atbacdb\atbacdb
One of the benefits of having a local BLAST server is that we get to do comparisons that are not possible via the web. For example:
Get the FASTA sequence for TPSAD70TH and use TBLASTX against the solanaceous database
blastall -i d:\synteny\EST\tigr\tps\TPSAD70TH -o h:\users\amatern\TPSAD70TH.html -T T -a 4 -p tblastx -d d:\databases\sol\soldb -M d:\blast\data\blosum62
There are only two differences between this search and the first BLASTN search against soldb:
-p tblastx (compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.)
-M substitution matrix.
The default matrix is blosum62. The matrices are all in d:\blast\data
To see what the blastall parameters are, you can look at the web page at: http://charybdis.tc.cornell.edu/blastall.html
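To keep the flags straight, here is a rough sketch in Python that assembles a blastall command line from the parameters described above. It only builds the text of the command and does not run anything. Python and the function name build_blastall_command are my own assumptions for illustration; the flags themselves (-i, -o, -T, -a, -p, -d, -M) are the ones used in this tutorial.

```python
# Programs listed for the -p flag in this tutorial.
PROGRAMS = ("blastn", "blastx", "blastp", "tblastn", "tblastx")


def build_blastall_command(query, output, program, database,
                           html=True, processors=4, matrix=None):
    """Return a blastall command line as a single string."""
    if program not in PROGRAMS:
        raise ValueError("unknown program: " + program)
    parts = [
        "blastall",
        "-i", query,                  # input file
        "-o", output,                 # output file
        "-T", "T" if html else "F",   # HTML output on or off
        "-a", str(processors),        # number of processors
        "-p", program,                # program name
        "-d", database,               # database name
    ]
    if matrix is not None:
        parts += ["-M", matrix]       # substitution matrix (optional)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_blastall_command(
        r"d:\synteny\EST\tigr\tps\TPSAD70TH",
        r"h:\users\amatern\TPSAD70TH.html",
        "tblastx",
        r"d:\databases\sol\soldb",
        matrix=r"d:\blast\data\blosum62",
    ))
```

Copy the printed line into your telnet or Command Prompt window on charybdis to run the search.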
If you have a sequence that's not on charybdis, all you need to do is format it correctly (FASTA format) and save it to a text file. FASTA format simply means that the file starts with a one-line header beginning with a > (greater than), followed by the sequence written in IUPAC codes. For example:
>mysequence
ACTGTCGATCGTCGATCGAT
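If you want to double-check a file before blasting it, here is a small sketch in Python of the FASTA rules just described. It assumes a single nucleotide sequence per file and accepts only the IUPAC nucleotide letters; a protein sequence would need a different alphabet. Python and the function name parse_single_fasta are my own choices for illustration, not something installed on charybdis.

```python
# IUPAC nucleotide codes (assumption: the sequence is DNA or RNA).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def parse_single_fasta(text):
    """Check a one-sequence FASTA text and return (header, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("the first line must be a header starting with '>'")
    header = lines[0][1:]
    # Everything after the header line is sequence, possibly on many lines.
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("no sequence found after the header")
    bad = set(sequence) - IUPAC_NUCLEOTIDES
    if bad:
        raise ValueError("non-IUPAC characters: " + "".join(sorted(bad)))
    return header, sequence


if __name__ == "__main__":
    print(parse_single_fasta(">mysequence\nACTGTCGATCGTCGATCGAT\n"))
```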
One way to do this is to get a sequence from GenBank and copy it to the clipboard.
Once the text is on the clipboard, you can paste it into a text editor. All Windows machines come with a program called Notepad. From the Start Menu select Accessories and then Notepad. Paste your sequence into Notepad and then Save As whatever you'd like it to be called. The filename is important, as you'll need it for blastall later.
To blast this sequence, simply enter: blastall -i h:\users\amatern\mysequence.txt -o h:\users\amatern\mysequence.html -T T -a 4 -p tblastx -d d:\databases\sol\soldb
If you'd like to get sequences from GenBank, simply go to: http://www.ncbi.nlm.nih.gov/Entrez/
Retrieving sequences from GenBank is self-explanatory. Remember to save the file as FASTA-formatted text in your h:\users\<yourid>\ directory. Once you do that, you can simply issue the blastall command!
If you want to perform a lot of BLASTs at once, you don't have to sit at the terminal and type them in one at a time. You can make a text file with all the commands you want to run, name it something.bat and execute that file. DOS will then run each command in order. Each command must be on a single line of the file.
Here's a trivial example: example1.bat
REM this is a remark, a comment
REM it's useful to put comments in programs that you will be re-using
blastall -i d:\synteny\EST\monsanto\cd22-f2a -o h:\users\amatern\monsantoout\cd22-f2a.html -a 4 -T T -p tblastx -M d:\blast\data\blosum62 -d d:\databases\atbacdb\atbacdb
blastall -i d:\synteny\EST\novartis\tomato010104-t3 -o h:\users\amatern\novartisout\tomato010104-t3.html -a 4 -T T -p tblastx -M d:\blast\data\blosum62 -d d:\databases\atbacdb\atbacdb
blastall -i d:\synteny\EST\tigr\tca\tcaaa13th -o h:\users\amatern\tigrout\tcaaa13th.html -a 4 -T T -p tblastx -M d:\blast\data\blosum62 -d d:\databases\atbacdb\atbacdb
REM all done!
All the example file does is run three BLASTs one at a time.
Here's what the file looks like in Notepad:
Then, to execute the commands, save the Notepad document as filename.bat
In this example, I saved the file in my h:\users\amatern\ directory because that's where you should save all of your files! Remember to give the file the .bat extension!
Now, from charybdis, all I need to do is type the filename, and voilà! All the blastall commands are executed by charybdis.
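If you have more than a handful of sequences, typing the .bat file by hand gets tedious. Here is a rough sketch in Python that writes the lines of such a batch file for a list of query files. It assumes you run it on your own machine and then copy the result to your h:\users\<yourid>\ directory. The function name make_batch_lines and the default settings are made up for illustration.

```python
import ntpath  # Windows-style path handling, works on any platform


def make_batch_lines(queries, out_dir, database,
                     program="tblastx", processors=4):
    """Return the lines of a .bat file with one blastall command per query."""
    lines = ["REM generated list of blastall commands"]
    for query in queries:
        # Name each output file after its query file, with .html added.
        name = ntpath.basename(query)
        output = ntpath.join(out_dir, name + ".html")
        lines.append(
            "blastall -i {0} -o {1} -a {2} -T T -p {3} -d {4}".format(
                query, output, processors, program, database))
    lines.append("REM all done!")
    return lines


if __name__ == "__main__":
    batch = make_batch_lines(
        [r"d:\synteny\EST\monsanto\cd22-f2a",
         r"d:\synteny\EST\tigr\tca\tcaaa13th"],
        r"h:\users\amatern\blastout",
        r"d:\databases\atbacdb\atbacdb",
    )
    # Each command is written on its own single line, as DOS requires.
    print("\n".join(batch))
```

Save the printed lines as something.bat. The output directory (blastout in this sketch) has to exist before you run the batch file.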
This little tutorial is obviously not meant to show you how to do all of the file manipulations you'd probably like to do, but:
There's a pretty good tutorial on using Windows NT here: http://www.freeskills.com/
Windows 2000 users: go to the Start Menu, select Run and enter telnet
A telnet window opens
Type unset ntlm
Type open charybdis.tc.cornell.edu