SYNTOM Meeting Notes - ALM - July 14, 1999

Last Time: (7/12/99)

Labs

People

Authorships

Tissues

Deleted-Sequences

Sequence_Synonyms

Sequencing_reactions

Clones

Clone Synonyms

Bases

Parameters

Steps

Processes

Notes:

· I've (hopefully) attached this morning's schema table output; please note what the included tables from other developers look like. IMHO, we should use Dave's software engineering idea of keeping a "stable release" version of the tables in addition to each developer's own tables. I don't think this is actually controlled by Projects, which appear to be specific to forms, graphics, PL/SQL, etc. Perhaps we should define a new user named Stable to own the stable tables.

· Using Schema Builder, I was unable to rename the field Organism to Taxon Id, as suggested at the last meeting. I suspect I'll have to change it manually using SQL.

· The "Tissues" table is included in the "block outline" but not in the schema. Last time we talked about creating a "tissue hierarchy" so that tissues in the library tables would form a controlled vocabulary with predefined relationships. I haven't tried to make those tables yet.

· Since we last met, I've spent my time trying to a) figure out how to get data into Oracle, b) figure out how "sequences" will be represented, and c) use some of the other tools to begin shaping a first cut of the user interface. I've played with Forms Builder a little:

This is a really simple form based on the People table; obviously, it won't look like this in the future. Note the large "box" at the bottom of the screenshot: that was supposed to be a graphic, and I somehow messed it up. While doing this, it became clear that most of Oracle's tools are meant to be guides, not solutions. We'll be doing LOTS of coding to make this thing work correctly.

I recommend Oracle PL/SQL Programming by Steven Feuerstein.
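Going back to the "tissue hierarchy" from the Tissues note above: one way to picture it is a parent-pointer tree, where each controlled term records its parent, so a question like "which libraries came from shoot tissue?" also picks up leaf libraries. The Python below is only a sketch of that data structure under that assumption; the terms in `TISSUE_PARENT` and the function names are hypothetical placeholders, not an agreed SynTom vocabulary or schema.

```python
# Hypothetical controlled vocabulary: each term maps to its parent term,
# and None marks the root. These terms are placeholders for illustration.
TISSUE_PARENT = {
    "whole plant": None,
    "shoot": "whole plant",
    "leaf": "shoot",
    "root": "whole plant",
}


def lineage(term):
    """Return [term, parent, grandparent, ...] up to the root term."""
    if term not in TISSUE_PARENT:
        # Controlled vocabulary: unknown terms are rejected, not stored.
        raise ValueError(f"{term!r} is not in the controlled vocabulary")
    path = []
    while term is not None:
        path.append(term)
        term = TISSUE_PARENT[term]
    return path


def is_part_of(term, ancestor):
    """True if `term` is `ancestor` itself or lies somewhere beneath it."""
    return ancestor in lineage(term)
```

In Oracle, the same shape could be a single table of terms with a self-referencing parent column, which a CONNECT BY query can walk; whether that suits the library tables is still an open question.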

Before we move on to the table descriptions, some quick notes:

Trying to enter data into Oracle, I found out some fun things:

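Sequences: not the biological kind, the Oracle kind. Sequences aren't a data type, as I naively thought; they are separate database objects that hand out successive numbers, which you then use as values in your tables.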

Sequences work like this (page 498 in the big book):

    create sequence customerID increment by 1 start with 1;

    insert into CUSTOMER (Name, Contact, ID)
    values ('Pizzeria Uno', 'Lisa', customerID.NextVal);
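To make the NextVal behavior concrete, here is a toy in-memory model in Python. `SequenceModel` is a hypothetical class written for illustration, not an Oracle interface; it only mimics `increment by 1 start with 1`, where each call hands out the next number and never reuses one.

```python
import itertools


class SequenceModel:
    """Toy model of an Oracle sequence (hypothetical, not an Oracle API).

    Mimics 'create sequence ... increment by 1 start with 1': every
    nextval() call returns the next number; no number is handed out twice.
    """

    def __init__(self, start=1, increment=1):
        self._numbers = itertools.count(start, increment)
        self.currval = None  # like CURRVAL: undefined until nextval() is called

    def nextval(self):
        self.currval = next(self._numbers)
        return self.currval


customer_id = SequenceModel(start=1, increment=1)
customers = []
# nextval() plays the role of customerID.NextVal in the INSERT above.
customers.append({"Name": "Pizzeria Uno", "Contact": "Lisa",
                  "ID": customer_id.nextval()})
```

One property worth remembering: Oracle does not give sequence numbers back when a transaction rolls back, so IDs drawn this way are unique but can have gaps. The toy behaves the same way, since `nextval()` never rewinds.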

Longs: you can't use string operators on a LONG, which might be problematic. For example, LENGTH(string) won't work on a LONG. That's going to be a pain when we want to know how many bp long a sequence is.
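One possible workaround for the LONG limitation, assuming we load sequences from a script rather than typing them in: compute the length once, at load time, and store it in its own numeric column beside the sequence. `prepare_sequence_row` and its column names are hypothetical; this is a sketch of the idea, not part of our schema.

```python
import re


def prepare_sequence_row(seq_name, raw_sequence):
    """Build one row for a hypothetical sequence table (illustrative names).

    LENGTH() can't be applied to a LONG column, so the length in bp is
    computed here, once, and kept in its own numeric column.
    """
    # Keep letters only: spaces, line breaks and position numbers copied
    # from a flat file must not count toward the length.
    bases = re.sub(r"[^A-Za-z]", "", raw_sequence).lower()
    return {"seq_name": seq_name, "sequence": bases, "length_bp": len(bases)}


# First line of the GenBank ORIGIN block below, numbers and spaces included.
row = prepare_sequence_row("cLES20P24", "1 gttcaagttg ttgattctga")
# row["length_bp"] is 20
```

The trade-off is redundancy: if a sequence is ever edited, its stored length has to be updated with it.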

To discuss: incorporating GenBank and other sequences into our model, now that the old "sequence" table is in fact a sequencing reaction table. N.B. Vectors won't point to sequencing reactions anymore.

I discussed GenBank sequence records on June 15, and I'll recreate some of those notes here.

Here's an example of one of our ESTs as it is stored in GenBank:

    LOCUS       AI782850      424 bp    mRNA    EST       29-JUN-1999
    DEFINITION  EST263729 tomato susceptible, Cornell Lycopersicon esculentum cDNA
                clone cLES20P24, mRNA sequence.
    ACCESSION   AI782850
    NID         g5280891
    VERSION     AI782850.1  GI:5280891
    KEYWORDS    EST.
    SOURCE      tomato.
      ORGANISM  Lycopersicon esculentum
                Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
                euphyllophytes; Spermatophyta; Magnoliophyta; eudicotyledons; core
                eudicots; Asteridae; euasterids I; Solanales; Solanaceae; Solanum;
                Potatoe; Lycopersicon.
    REFERENCE   1  (bases 1 to 424)
      AUTHORS   D' Ascenzo,M., He,X., Lyman,J., Matern,A.L., Vision,T., Holt,I.E.,
                Liang,F., Upton,J., Ronning,C.M., Craven,M.B., Fujii,C.Y.,
                Bowman,C.L., Nierman,W., Fraser,C.M., Venter,J.C., Tanksley,S.D.,
                Giovannoni,J.J. and Martin,G.B.
      TITLE     Generation of ESTs from Pseudomonas susceptible tomato
      JOURNAL   Unpublished (1999)
    COMMENT     Contact: David Frisch
                Clemson University Genomics Institute
                Clemson University
                100 Jordan Hall, Clemson, SC 29634, USA
                Tel: 864 656 4366
                Fax: 864 656 4293
                Email: dfrisch@CLEMSON.EDU
                5 prime sequence.
    FEATURES             Location/Qualifiers
         source          1..424
                         /organism="Lycopersicon esculentum"
                         /cultivar="R11-13 (Rio Grande x Money Maker)"
                         /note="Vector: pBlueScript SK(-); Site_1: EcoR1; Site_2:
                         Xho1; cLES - Tomato Pseudomonas Susceptible EST Library.
                         Directionally cloned cDNAs inserted into pBlueScript
                         SK(-) at 5' end with EcoRI and 3' end with XhoI site"
                         /db_xref="taxon:4081"
                         /clone="cLES20P24"
                         /clone_lib="tomato susceptible, Cornell"
                         /tissue_type="leaf"
                         /dev_stage="4-week old"
                         /lab_host="SOLR"
    BASE COUNT      116 a     70 c    116 g    122 t
    ORIGIN
            1 gttcaagttg ttgattctga ttcataacca gatagtgaac tttctgataa tcctccagag
           61 atatatgatt gtcctgatcc agagttcagt gattttgata agcataggga agaaagttgc
          121 tttgctgttg accaaatctg ggcttgttat gatacagctg atggaatgcc aagattctat
          181 tgtcagatta ggagagttgc gtgtcctgaa tttgagctac ggggcacctg gctcgaggct
          241 aatccagagg atcgaagaga catggagtgg gtagaggcag aattgcctgc tggttgtggg
          301 aaatttaaac gtgggagttc tcaaatcagt aatgatcggc ttacattctc tcatctagtg
          361 cagatcacac agggtaagag aggtgcattc attgtatatc ctaggaaagg ggagacatgg
          421 gctc
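As a sketch of what loading records like this one might involve, the Python below pulls out the locus name and length from the LOCUS line, the taxon ID from `/db_xref="taxon:..."` (the value a Taxon Id field would hold), the BASE COUNT line, and the ORIGIN bases, then checks that they agree. The function names are hypothetical, and the parser handles only these few fields; a real loader would need to cover much more of the GenBank flat-file format.

```python
import re
from collections import Counter


def parse_genbank_record(text):
    """Pull a few fields out of one GenBank flat-file record.

    Sketch only: reads the LOCUS line, the taxon db_xref qualifier, the
    BASE COUNT line and the ORIGIN block, and ignores every other line.
    """
    record = {}
    seq_parts = []
    in_origin = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if in_origin:
            if line.startswith("//"):  # GenBank end-of-record marker
                break
            # Drop position numbers and the spaces between 10-base blocks.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line).lower())
        elif line.startswith("LOCUS"):
            fields = line.split()  # LOCUS <name> <length> bp ...
            record["locus"] = fields[1]
            record["length_bp"] = int(fields[2])
        elif line.startswith('/db_xref="taxon:'):
            record["taxon_id"] = int(re.search(r"taxon:(\d+)", line).group(1))
        elif line.startswith("BASE COUNT"):
            # e.g. "BASE COUNT 116 a 70 c 116 g 122 t"
            pairs = re.findall(r"(\d+)\s+([acgt])\b", line)
            record["base_count"] = {base: int(n) for n, base in pairs}
        elif line.startswith("ORIGIN"):
            in_origin = True
    record["sequence"] = "".join(seq_parts)
    return record


def check_record(record):
    """Return a list of disagreements between header lines and ORIGIN bases."""
    problems = []
    seq = record["sequence"]
    if record.get("length_bp") != len(seq):
        problems.append(
            f"LOCUS says {record.get('length_bp')} bp, ORIGIN has {len(seq)}")
    observed = Counter(seq)
    for base, expected in record.get("base_count", {}).items():
        if observed[base] != expected:
            problems.append(
                f"BASE COUNT says {expected} {base}, ORIGIN has {observed[base]}")
    return problems
```

For the AI782850 record above, the ORIGIN block holds 424 bases whose per-base counts match its BASE COUNT line (116 a, 70 c, 116 g, 122 t), so `check_record` on the parsed record should return an empty list.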

I'd like to talk about:

· Data input - CGI?

· I'm turning these handouts into HTML; any objections?

· I'm not going to print out the whole schema anymore, just the changes and the topic of the day

· Sequences - GenBank et al

· The Sequencing Reactions table: we're not done; we still need to add quality.

· BLAST results

· Contigs

· Other tables we may have missed?

Things we still need to talk about:

· Mapping

· Expression

· Theresa's tables

· The day-to-day roles that SynTom plays as a laboratory information system and as a WWW-based database