Meeting Notes June 23, 1999

Some suggestions were made for using our time more effectively in designing the database:

"Textbook" database design involves an analysis phase -- a detailed investigation of the required functions and the information needed to execute them (entities with their attributes and relationships).
It was agreed that we should begin looking at the functions that are required of the database (questions, data generation/handling, etc.) and we should draw up functional hierarchies.
In addition, as the information required for the functions becomes clear, entity relationship diagrams should be presented to give the group an 'overview' of datatypes and data handling strategies.
As a general outline for future meetings we'd like to split the 2 hours into:

High-Level Design

Defining a mission statement

Stage Planning - Converting the unfathomable task of designing a database into several manageable tasks

General function definitions

General entity definitions

Low-Level Design

Table definitions

Column definitions

Table constraints

Unique identifiers

Relationships

Views, etc.
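
As a rough illustration of what the low-level design items above (table and column definitions, constraints, unique identifiers, relationships, views) could look like in practice, here is a minimal sketch using Python's built-in sqlite3 module. Nothing in it was decided at the meeting: the table names (organism, sequence_entry), the columns, the view name and the sample accession HYPO0001 are all hypothetical placeholders.

```python
import sqlite3

# In-memory database; every name below is a hypothetical placeholder.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces relationships only when asked

conn.executescript("""
-- Table and column definitions, with a unique identifier per row
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);

CREATE TABLE sequence_entry (
    sequence_id INTEGER PRIMARY KEY,
    -- Relationship: every sequence belongs to exactly one organism
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    accession   TEXT NOT NULL UNIQUE,
    -- Table constraint: an empty sequence is not allowed
    residues    TEXT NOT NULL CHECK (length(residues) > 0)
);

-- View: a read-only summary joining the two tables
CREATE VIEW sequence_summary AS
SELECT s.accession, o.name AS organism, length(s.residues) AS length
FROM sequence_entry AS s
JOIN organism AS o ON o.organism_id = s.organism_id;
""")

conn.execute("INSERT INTO organism (organism_id, name) VALUES (1, 'tomato')")
conn.execute(
    "INSERT INTO sequence_entry (organism_id, accession, residues) "
    "VALUES (1, 'HYPO0001', 'ATGGCC')"
)

rows = conn.execute(
    "SELECT accession, organism, length FROM sequence_summary"
).fetchall()
print(rows)
```

The actual tables and columns would come out of the function and entity definitions in the high-level design, not the other way around.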

The database requires specific goals -- it should NOT be a repository for literature, for example. In fact, the word "repository" in general was shunned. More "active" terms, e.g. "genome information network" and "problem solving environment", are to be used instead.
Sequences and mapping information should be the first tasks to be handled. These are the datatypes which we have right now, and these are the kinds of data which are required by the researchers to advance discovery.
However, the "big picture" includes understanding evolutionary relationships, not only within the Solanaceae, but within and among all plants. Metabolism was also listed as an important future goal.

Solanaceae (Plant) Genome Network (SGN or PGN)--Statement of Mission:

The SGN is intended to be both a relational database for information derived from genomic studies and a set of software tools with which to query that database for purposes of scientific discovery. The goal is to provide the informatics network in which to explore and understand the content, organization and function of genetic information in an evolutionary context. The database can be analyzed/queried from several different perspectives: 1) genetic/physical maps 2) sequence analysis 3) evolution/taxonomy 4) metabolic pathways. (ST)

Questions/Functions Required of the Database:

Instead of starting with the data we have on hand, we began by asking "What types of questions will users of the database be asking?" as a way of structuring the data we have (and will have) to best answer these questions. These questions are intentionally not species-specific; however, they must be applicable to the task at hand (defining tomato - Arabidopsis synteny).

Sequences:

I have a sequence in hand, show me its ortholog in another species. (Specifically, show the ortholog in Arabidopsis of the given solanaceous sequence)
Show sequences similar to a given sequence
Show the phylogeny of a given gene
Show the multiple alignment of a given sequence with the rest of the database or a subset of the database
How often is my sequence represented in a particular tissue, organism, etc.?
What is the putative function of my sequence?
What are the "features" of my sequence - e.g. 3D structure, repetitive elements, control elements, etc.?
How many unique sequences are there in a particular tissue?
How does a tissue in one organism compare with another tissue or another organism?
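
Two of the sequence questions above ("How often is my sequence represented in a particular tissue?" and "How many unique sequences are there in a particular tissue?") translate fairly directly into counting queries. The sketch below assumes a single hypothetical table, observation, with one row per time a sequence was seen in a tissue; the table, its columns, the function names and the sample accessions are illustrative only.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical table: one row each time a sequence was observed in a tissue.
db.execute("CREATE TABLE observation (accession TEXT NOT NULL, tissue TEXT NOT NULL)")
db.executemany(
    "INSERT INTO observation (accession, tissue) VALUES (?, ?)",
    [
        ("HYPO0001", "leaf"),
        ("HYPO0001", "leaf"),
        ("HYPO0002", "leaf"),
        ("HYPO0001", "root"),
    ],
)


def times_seen(accession, tissue):
    """How often is this sequence represented in the given tissue?"""
    row = db.execute(
        "SELECT COUNT(*) FROM observation WHERE accession = ? AND tissue = ?",
        (accession, tissue),
    ).fetchone()
    return row[0]


def unique_sequences(tissue):
    """How many distinct sequences were observed in the given tissue?"""
    row = db.execute(
        "SELECT COUNT(DISTINCT accession) FROM observation WHERE tissue = ?",
        (tissue,),
    ).fetchone()
    return row[0]


print(times_seen("HYPO0001", "leaf"))
print(unique_sequences("leaf"))
```

Questions such as orthology, phylogeny and multiple alignment need analysis tools on top of the database and are not covered by a simple query like this.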

Mapping:

Where does my sequence map to in (insert species here)?
Given a chromosomal region, tell me what other sequences map there, what QTL are located there, what physical clones span the region, etc.
Show me all the things from one organism that have been mapped in my favorite organism, and where they mapped to.
Show me all the things with putative function X that have been mapped in my favorite organism
What size PCR band does a given sequence give, and was it single copy in my favorite organism?
What % of a specific library is single copy in my favorite organism?
Show all the QTL associated with my favorite trait and tell me what has been mapped to those regions
Align the maps of two organisms
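
The "given a chromosomal region" question above can be sketched as a range query over map positions. This assumes a hypothetical map_position table in which sequences, QTL and other features each have a single position on a chromosome, and it assumes positions are stored in centimorgans. The names, units and sample rows are placeholders, and physical clones that span a region would need start and end coordinates, which this sketch leaves out.

```python
import sqlite3

maps = sqlite3.connect(":memory:")

# Hypothetical table: one mapped position per feature (position assumed to be in cM).
maps.execute("""
CREATE TABLE map_position (
    feature      TEXT NOT NULL,
    feature_type TEXT NOT NULL,   -- e.g. 'sequence' or 'QTL'
    organism     TEXT NOT NULL,
    chromosome   INTEGER NOT NULL,
    position_cm  REAL NOT NULL
)
""")
maps.executemany(
    "INSERT INTO map_position VALUES (?, ?, ?, ?, ?)",
    [
        ("HYPO0001", "sequence", "tomato", 1, 12.5),
        ("QTL_A", "QTL", "tomato", 1, 14.0),
        ("HYPO0002", "sequence", "tomato", 1, 40.0),
        ("HYPO0003", "sequence", "tomato", 2, 13.0),
    ],
)


def features_in_region(organism, chromosome, start_cm, end_cm):
    """List everything mapped within [start_cm, end_cm] on one chromosome."""
    return maps.execute(
        "SELECT feature, feature_type, position_cm FROM map_position "
        "WHERE organism = ? AND chromosome = ? AND position_cm BETWEEN ? AND ? "
        "ORDER BY position_cm",
        (organism, chromosome, start_cm, end_cm),
    ).fetchall()


print(features_in_region("tomato", 1, 10.0, 20.0))
```

Filtering the result on feature_type would answer the narrower versions of the question, for example only the QTL in a region.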

Andreas Matern

Last modified: Wed Jun 23 23:06:22 EDT 1999