PUMAdb: Glossary Terms
- Advanced Search
- A method of searching for hybridization data (experiments) using multiple terms.
- ATCC
- American Type Culture Collection; maintains
collections of strains and clones.
- BioSci
- BIOSCI is a set of internet
newsgroups and e-mail lists for biologists.
- Cellular Component
- One of the three categories used by the Gene Ontology project, cellular
component encompasses subcellular structures, locations, and
macromolecular complexes. Examples include nucleus,
telomere, and origin recognition complex.
- Curator
- A keeper of the microarray database, responsible for providing
online assistance to users of the database. The Staff page lists all current curators.
- DDBJ
- DNA Data Bank of Japan. DDBJ is a
repository of DNA sequences. DDBJ is produced in collaboration with
GenBank and EMBL.
- EMBL
- European Molecular Biology Laboratory. The EMBL
Nucleotide Sequence database is a comprehensive database of DNA and
RNA sequences. The database is produced in collaboration with GenBank and the DNA Data Bank of Japan (DDBJ).
- Entrez
- The Entrez
Search System was developed by NCBI.
Entrez allows you to retrieve molecular biology data and bibliographic
citations from integrated nucleotide (GenBank, DDBJ, EMBL), protein (Swiss-Prot, PIR, PRF, PDB), and bibliographic (PubMed) databases. Within database
pages, external links are provided to one or more of these databases.
- GCG
- The Genetics Computer Group is a
private company involved in the development of sequence analysis
software.
- Function
- One of the three categories used by the Gene Ontology project,
function describes the tasks performed by individual gene
products; examples are transcription factor and DNA
binding.
- GenBank
- GenBank is
the DNA sequence database sponsored by the US National Institutes of
Health. GenBank is produced in collaboration with EMBL and DDBJ.
- Gene Ontology (GO)
- The Gene Ontology (GO) project was established to provide a
common language to describe aspects of a gene product's biology. The
use of a consistent vocabulary allows genes from different species to
be compared based on their GO annotations. For each of three
categories of biological information--function, process, and cellular
component--a set of terms has been selected and organized. Each set of
terms uses a controlled vocabulary, and parent-child relationships
between terms are defined. This combination of a controlled vocabulary
with defined relationships between items is an ontology. Within an
ontology, a child may be a "part of" or an example ("instance") of its
parent. There are three independently organized controlled
vocabularies, or gene ontologies: one for function, one for process, and one for cellular component. Many-to-many
parent-child relationships are allowed in the process and cellular
component ontologies. A gene may be annotated to any level in an
ontology, and to more than one item within an ontology. The Gene
Ontology project is a collaboration between three model organism
databases, FlyBase (Drosophila), Saccharomyces Genome Database (SGD)
and Mouse Genome Informatics (MGI).
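As a rough illustration of the structure described in the Gene Ontology entry, the following Python sketch models a tiny ontology as a graph in which a term may have more than one parent, each linked by a "part of" or "instance" relationship. The term names come from the examples in this glossary; the specific edges and the `ancestors` helper are hypothetical and are not taken from GO project data or software.

```python
# Hypothetical miniature ontology: child term -> list of (parent, relationship).
# The edges below are illustrative only, not real GO data.
PARENTS = {
    "cellular component": [],
    "nucleus": [("cellular component", "instance")],
    "chromosome": [("nucleus", "part of")],
    "telomere": [("chromosome", "part of")],
    "origin recognition complex": [
        ("nucleus", "part of"),
        ("chromosome", "part of"),
    ],
}

def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent, _relationship in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# A gene annotated to "telomere" is implicitly associated with its ancestors.
print(sorted(ancestors("telomere")))
# ['cellular component', 'chromosome', 'nucleus']
```

Storing a list of parents per term, rather than a single parent, is what allows the many-to-many relationships mentioned in the entry.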
- Keyword
- A keyword is a word identified as particularly informative about an
object. In a sequence, a keyword often relates to the identity of a
gene or the function of the gene product. References often have a
list of keywords that are Medline MeSH terms. Keywords are good to
use in text searches.
- Medline
- Medline is the National Library of
Medicine's database of biomedical papers; it contains all citation
information for each paper, as well as abstracts for most of the
papers.
- NCBI
- The National Center for
Biotechnology Information (NCBI) is part of the National Library
of Medicine (NLM) in the National Institutes of Health (NIH). Its
mission is to develop new information technologies to aid in the
understanding of fundamental molecular and genetic processes that
control health and disease. NCBI developed and maintains the Entrez Search System and PubMed database.
- ORF
- An ORF (Open Reading Frame) corresponds to a stretch of DNA that
could potentially be translated into a polypeptide; i.e., it begins
with an ATG "start" codon and terminates with one of the 3 "stop"
codons. For an ORF to be considered as a good candidate for coding
a bona fide cellular protein, a minimum size requirement is often
set; for example, many of the systematic sequencing groups define an ORF as a
stretch of DNA that would code for a protein of 100 amino acids or
more. An ORF is not usually considered equivalent to a gene or locus
until there has been shown to be a phenotype associated with a
mutation in the ORF, and/or an mRNA transcript or a gene product
generated from the ORF's DNA has been detected. See ORF naming
conventions for how ORFs are named in Saccharomyces
cerevisiae.
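The ORF definition can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any sequencing group: the function name `find_orfs` and its `min_codons` parameter are hypothetical, it scans only the three reading frames of the strand it is given (the reverse complement would need a second pass), and it treats the first in-frame ATG after a stop as the start.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=100):
    """Return (start, end) index pairs of ORFs on the given strand.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon. `min_codons`
    counts coding codons (ATG included, stop codon excluded).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and look for the next ATG
    return orfs

# Toy sequence with a 3-codon ORF (ATG AAA TTT) followed by TAG.
print(find_orfs("CCATGAAATTTTAGGG", min_codons=3))  # [(2, 14)]
```

With the default `min_codons=100`, the toy sequence yields no ORFs, mirroring the 100 amino acid size threshold described in the entry.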
- ORF naming conventions (Yeast)
- All S. cerevisiae ORFs are designated by
a symbol consisting of three uppercase letters followed by a number
and then another letter, as follows: Y (for "Yeast"); A - P for the
chromosome upon which the ORF resides (where "A" is chromosome I, up
to "P" for chromosome XVI); L or R (for Left or Right arm); a 3-digit
number corresponding to the order of the open reading frame on the
chromosome arm (starting from the centromere and counting out to the
telomere); and W or C for whether the open reading frame is on the
"Watson" or "Crick" strand (where "Watson" runs 5' to 3' from left
telomere to right telomere). Most ORF designations by the systematic
sequencing groups use a predicted 100 amino acid polypeptide as the
minimum size limit, except when a smaller gene has already been
characterized and localized to the chromosomal sequence. When a new
ORF is discovered on a chromosome that has already had its ORFs
named, the new ORF will usually be named by taking the name of an
adjacent ORF and adding an "A" or "B" to the end of it (this avoids
re-numbering all the distal ORFs).
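The naming scheme can be decoded mechanically. The Python sketch below is one hypothetical way to do so; the function `parse_orf_name` and the field names it returns are made up for illustration. It assumes the trailing "A" or "B" may be written either directly after the strand letter or after a hyphen, which this glossary does not specify.

```python
import re

# Y + chromosome letter (A-P) + arm (L/R) + 3-digit number + strand (W/C),
# optionally followed by a suffix letter for ORFs added after the
# chromosome was first numbered. The optional hyphen is an assumption.
ORF_NAME = re.compile(r"^Y([A-P])([LR])(\d{3})([WC])(?:-?([A-Z]))?$")

def parse_orf_name(name):
    """Split a systematic ORF name into its parts, or return None."""
    match = ORF_NAME.match(name.upper())
    if match is None:
        return None
    chrom, arm, number, strand, suffix = match.groups()
    return {
        "chromosome": ord(chrom) - ord("A") + 1,  # "A" -> 1 ... "P" -> 16
        "arm": "left" if arm == "L" else "right",
        "order": int(number),  # counted outward from the centromere
        "strand": "Watson" if strand == "W" else "Crick",
        "suffix": suffix,  # None unless a trailing letter is present
    }

print(parse_orf_name("YAL001C"))
# {'chromosome': 1, 'arm': 'left', 'order': 1, 'strand': 'Crick', 'suffix': None}
```

A name whose second letter falls outside A to P, such as "YQL001C", returns `None` because there is no corresponding chromosome.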
- Orthologs
- Sequences from different species that perform the same
biological function and are likely to have evolved from a common ancestral
sequence. See Paralogs.
- Paralogs
- Sequences that perform
different biological functions in the same species that likely arose
by duplication and divergence from a common ancestral sequence. See Orthologs.
- Process
- One of the three categories used by the Gene Ontology project,
process describes broad biological goals, such as
mitosis or purine metabolism.
- PubMed
- PubMed is
a database of bibliographic information developed by NCBI.
- SGD
- Saccharomyces Genome Database. The SGD project collects
information and maintains a database of the molecular biology of the
yeast Saccharomyces cerevisiae. This database includes a
variety of genomic and biological information. SGD is funded by the
National Center for Human Genome Research (NCHGR) at the U.S. National
Institutes of Health. The SGD is in the Department of Genetics at the
School of Medicine, Stanford University. The SGD Homepage is located
at http://www.yeastgenome.org/.
- SMD
- Stanford Microarray Database. The SMD project stores raw and
normalized data from microarray experiments, as well as their
corresponding image files. In addition, SMD provides interfaces for
data retrieval, analysis and visualization. SMD is located at http://smd.stanford.edu. The
Princeton University MicroArray database (PUMAdb) is based on the SMD
software package.
- SUID
- A unique identifying number within the database which is specific
for a reporter sequence on an array. Typically, this corresponds to a
single arrayed clone or PCR-amplified region of genomic DNA.
- SwissProt
- SwissProt
is an annotated protein sequence database. Within a Locus page, an
external link is provided (at the "SwissProt" tag) to the SwissProt
entry for the gene, which includes the amino acid sequence for the
protein encoded by the gene.
- Wildcard character
- The database uses an asterisk "*" as a wildcard symbol. In a
search, the wildcard character shows where any text can be tolerated.
For example, searching for the category "DNA*" will produce all
categories that begin with "DNA". Since the database requires exact
matches to its format for searches to be productive, wise use of the
"*" wildcard character is needed for many types of searches.
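The wildcard behavior described in this entry can be approximated as follows. This Python sketch is not the database's own implementation: the function `wildcard_match` and the sample category names are hypothetical, and matching is assumed to be case-sensitive and anchored to the whole string.

```python
import re

def wildcard_match(pattern, candidates):
    """Return the candidates matching `pattern`, where "*" matches any text."""
    # Escape every character, then turn each escaped asterisk into ".*".
    regex = re.compile(re.escape(pattern).replace(r"\*", ".*"))
    return [c for c in candidates if regex.fullmatch(c)]

# Hypothetical category names, for illustration only.
categories = ["DNA binding", "DNA repair", "RNA binding", "mitosis"]

print(wildcard_match("DNA*", categories))      # ['DNA binding', 'DNA repair']
print(wildcard_match("DNA", categories))       # []
print(wildcard_match("*binding", categories))  # ['DNA binding', 'RNA binding']
```

The second call returns nothing because, without the wildcard, only an exact match to "DNA" would qualify.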