PUMAdb : Glossary Terms

Advanced Search: A method of searching for hybridization data (experiments) using multiple terms.
ATCC: American Type Culture Collection; maintains collections of strains and clones.
BioSci: BIOSCI is a set of internet newsgroups and e-mail lists for biologists.
Cellular Component: One of the three categories used by the Gene Ontology project, cellular component encompasses subcellular structures, locations, and macromolecular complexes. Examples include nucleus, telomere, and origin recognition complex.
Curator: A keeper of the microarray database, responsible for providing online assistance to users of the database. The Staff page lists all current curators.
DDBJ: DNA Data Bank of Japan. DDBJ is a repository of DNA sequences. DDBJ is produced in collaboration with GenBank and EMBL.
EMBL: European Molecular Biology Laboratory. The EMBL Nucleotide Sequence database is a comprehensive database of DNA and RNA sequences. The database is produced in collaboration with GenBank and the DNA Data Bank of Japan (DDBJ).
Entrez: The Entrez Search System was developed by NCBI. Entrez allows you to retrieve molecular biology data and bibliographic citations from integrated nucleotide (GenBank, DDBJ, EMBL), protein (Swiss-Prot, PIR, PRF, PDB), and bibliographic (PubMed) databases. Within database pages, external links are provided to one or more of these databases.
GCG: The Genetics Computer Group is a private company involved in the development of sequence analysis software.
Function: One of the three categories used by the Gene Ontology project, function describes the tasks performed by individual gene products; examples are transcription factor and DNA binding.
GenBank: GenBank is the DNA sequence database sponsored by the US National Institutes of Health. GenBank is produced in collaboration with EMBL and DDBJ.
Gene Ontology (GO): The Gene Ontology (GO) project was established to provide a common language to describe aspects of a gene product's biology. The use of a consistent vocabulary allows genes from different species to be compared based on their GO annotations. For each of three categories of biological information (function, process, and cellular component) a set of terms has been selected and organized. Each set of terms uses a controlled vocabulary, and parent-child relationships between terms are defined. This combination of a controlled vocabulary with defined relationships between items is an ontology. Within an ontology, a child may be a "part of" or an example ("instance") of its parent. There are three independently organized controlled vocabularies, or gene ontologies: one for function, one for process, and one for cellular component. Many-to-many parent-child relationships are allowed in the process and cellular component ontologies. A gene may be annotated to any level in an ontology, and to more than one item within an ontology. The Gene Ontology project is a collaboration between three model organism databases: FlyBase (Drosophila), Saccharomyces Genome Database (SGD), and Mouse Genome Informatics (MGI).
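
The ontology structure described in the Gene Ontology entry (a controlled vocabulary whose terms can have more than one parent, linked by "part of" or "instance" relationships) can be sketched as a small directed graph. This is only an illustration: the class name, the relation labels "part_of" and "instance_of", and the toy terms and links below are assumptions made for the example, not actual GO data or the GO project's software.

```python
from collections import defaultdict


class Ontology:
    """Toy ontology: terms linked to one or more parents by a named relation."""

    RELATIONS = ("part_of", "instance_of")  # labels assumed for this sketch

    def __init__(self):
        # child term -> set of (parent term, relation) pairs
        self.parents = defaultdict(set)

    def add_relationship(self, child, parent, relation):
        if relation not in self.RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.parents[child].add((parent, relation))

    def ancestors(self, term):
        """Return every term reachable by following parent links upward."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent, _relation in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Illustrative links only; they are not taken from the real GO files.
toy = Ontology()
toy.add_relationship("nucleus", "cell", "part_of")
toy.add_relationship("chromosome", "nucleus", "part_of")
toy.add_relationship("telomere", "chromosome", "part_of")
toy.add_relationship("telomere", "chromosomal region", "instance_of")

# A gene annotated to "telomere" is implicitly associated with all ancestors.
telomere_ancestors = toy.ancestors("telomere")
```

Because "telomere" has two parents here, the example also shows the many-to-many case: a term is reached through every path that leads to it.
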
Keyword: A keyword is a word identified as particularly informative about an object. In a sequence, a keyword often relates to the identity of a gene or the function of the gene product. References often have a list of keywords that are Medline MeSH terms. Keywords are good to use in text searches.
Medline: Medline is the National Library of Medicine's database of biomedical papers; it contains all citation information for each paper, as well as abstracts for most of the papers.
NCBI: The National Center for Biotechnology Information (NCBI) is part of the National Library of Medicine (NLM) in the National Institutes of Health (NIH). Its mission is to develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease. NCBI developed and maintains the Entrez Search System and PubMed database.
ORF: An ORF (Open Reading Frame) is a stretch of DNA that could potentially be translated into a polypeptide; i.e., it begins with an ATG "start" codon and terminates with one of the three "stop" codons. For an ORF to be considered a good candidate for encoding a bona fide cellular protein, a minimum size requirement is often set; e.g., many of the systematic sequencing groups define an ORF as a stretch of DNA that would code for a protein of 100 amino acids or more. An ORF is not usually considered equivalent to a gene or locus until a phenotype has been shown to be associated with a mutation in the ORF, and/or an mRNA transcript or a gene product generated from the ORF's DNA has been detected. See ORF naming conventions for how ORFs are named in Saccharomyces cerevisiae.
ORF naming conventions (Yeast): All S. cerevisiae ORFs are designated by a symbol consisting of three uppercase letters followed by a number and then another letter, as follows: Y (for "Yeast"); A - P for the chromosome upon which the ORF resides (where "A" is chromosome I, up to "P" for chromosome XVI); L or R (for Left or Right arm); a 3-digit number corresponding to the order of the open reading frame on the chromosome arm (starting from the centromere and counting out to the telomere); and W or C for whether the open reading frame is on the "Watson" or "Crick" strand (where "Watson" runs 5' to 3' from left telomere to right telomere). Most ORF designations by the systematic sequencing groups use a predicted 100 amino acid polypeptide as the minimum size limit, except when a smaller gene has already been characterized and localized to the chromosomal sequence. When a new ORF is discovered on a chromosome that has already had its ORFs named, the new ORF will usually be named by taking the name of an adjacent ORF and adding an "A" or "B" to the end of it (this avoids renumbering all the distal ORFs).
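
As a rough illustration of the ORF definition above, the following sketch scans one strand of a DNA string in all three reading frames for stretches that start with ATG and end at the first in-frame stop codon, keeping those that meet a minimum size. The function name, its parameters, and the choice to report only the given strand are assumptions made for this example; it is not the procedure used by any particular sequencing group.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=100):
    """Return (start, end, frame) tuples for ORFs on the given strand.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and min_codons counts amino acid codons (stop excluded).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the ATG opening the current candidate
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


# Tiny demonstration sequence: ATG, four GCT codons, then a TAA stop.
demo = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
short_orfs = find_orfs(demo, min_codons=5)
default_orfs = find_orfs(demo)  # too short for the 100-codon default
```

A fuller search would also scan the reverse complement, since an ORF may lie on either strand.
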
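
The naming scheme above can be decoded mechanically. The sketch below uses a regular expression to split a systematic name into its chromosome, arm, position, and strand. The function name and the returned field names are hypothetical, and accepting an optional hyphen before the trailing "A" or "B" style letter is an assumption of this example.

```python
import re

# Y, chromosome letter A-P, arm L/R, 3-digit number, strand W/C,
# then an optional single-letter suffix for later-added ORFs
# (the optional hyphen before the suffix is an assumption).
ORF_NAME = re.compile(r"Y([A-P])([LR])(\d{3})([WC])(?:-?([A-Z]))?")


def parse_orf_name(name):
    """Return the parts of a systematic ORF name, or None if it does not fit."""
    match = ORF_NAME.fullmatch(name.strip().upper())
    if match is None:
        return None
    chrom_letter, arm, number, strand, suffix = match.groups()
    return {
        "chromosome": ord(chrom_letter) - ord("A") + 1,  # A=1 ... P=16
        "arm": "left" if arm == "L" else "right",
        "order_from_centromere": int(number),
        "strand": "Watson" if strand == "W" else "Crick",
        "suffix": suffix,  # None unless an extra letter was appended
    }


first = parse_orf_name("YAL001C")
added = parse_orf_name("YPR123W-A")
invalid = parse_orf_name("YQL001C")  # "Q" is outside the A-P range
```
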
Orthologs: Sequences from different species that perform the same biological function and are likely to have evolved from a common ancestral sequence. See Paralogs.
Paralogs: Sequences that perform different biological functions in the same species and that likely arose by duplication and divergence from a common ancestral sequence. See Orthologs.
Process: One of the three categories used by the Gene Ontology project, process describes broad biological goals, such as mitosis or purine metabolism.
PubMed: PubMed is a database of bibliographic information developed by NCBI.
SGD: Saccharomyces Genome Database. The SGD project collects information and maintains a database of the molecular biology of the yeast Saccharomyces cerevisiae. This database includes a variety of genomic and biological information. SGD is funded by the National Center for Human Genome Research (NCHGR) at the U.S. National Institutes of Health. The SGD is in the Department of Genetics at the School of Medicine, Stanford University. The SGD Homepage is located at http://www.yeastgenome.org/.
SMD: Stanford Microarray Database. The SMD project stores raw and normalized data from microarray experiments, as well as their corresponding image files. In addition, SMD provides interfaces for data retrieval, analysis and visualization. SMD is located at http://smd.stanford.edu. The Princeton University MicroArray database (PUMAdb) is based on the SMD software package.
SUID: A unique identifying number within the database that is specific to a reporter sequence on an array. Typically, this corresponds to a single arrayed clone or PCR-amplified region of genomic DNA.
SwissProt: SwissProt is an annotated protein sequence database. Within a Locus page, an external link is provided (at the "SwissProt" tag) to the SwissProt entry for the gene, which includes the amino acid sequence for the protein encoded by the gene.
Wildcard character: The database uses an asterisk "*" as a wildcard symbol. In a search, the wildcard character shows where any text can be tolerated. For example, searching for the category "DNA*" will produce all categories that begin with "DNA". Since the database requires exact matches to its format for searches to be productive, wise use of the "*" wildcard character is needed for many types of searches.
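
To illustrate the matching behavior described in the Wildcard character entry, the sketch below treats "*" as "any text" and requires everything else to match exactly. It is a stand-alone approximation written for this glossary, not the database's own search code; the function names and the sample category list are invented for the example.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern using "*" into a compiled regular expression."""
    # Escape each literal piece so only "*" has special meaning.
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(pieces))


def wildcard_search(pattern, categories):
    """Return the categories that match the whole pattern exactly."""
    regex = wildcard_to_regex(pattern)
    return [c for c in categories if regex.fullmatch(c)]


# Hypothetical category names, for demonstration only.
categories = ["DNA repair", "DNA replication", "RNA processing", "mtDNA maintenance"]

starts_with_dna = wildcard_search("DNA*", categories)
contains_dna = wildcard_search("*DNA*", categories)
exact_only = wildcard_search("DNA", categories)
```

Without a wildcard, "DNA" matches nothing in this list because no category is exactly "DNA", which is why the entry recommends using "*" when the full text is not known.
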