PUMAdb : Retrieving Public Data

Description

Once data stored within the database have been used in a publication, the full raw data are made freely available to the public. In addition, at an experimenter's discretion, unpublished data may also be made publicly available. The search interfaces for selecting the particular experiments whose raw data you wish to download are described elsewhere (Advanced Results Search | Basic Search). This document describes in detail how to download raw data for one or many experiments, such as all public data for an organism, or all public data for a publication. The formats of the downloaded files are also detailed in this document.

Retrieving all public data for an organism

An FTP site holds the public data for each organism in a separate directory, with one file per experiment. The base address for this FTP site is:
ftp://gen-ftp.princeton.edu/puma/organisms

Under this directory is one directory per organism, each named with the two-letter code that PUMAdb uses to indicate the organism. A list of the organisms and their two-letter codes is available, as is a simple interface to the directories that shows each organism and the number of available experiments.

Within each organism's directory is one file per public experiment. These files are gzipped, so they need to be decompressed prior to use (using WinZip, StuffIt Expander, or gunzip, depending on your platform). Further details of the file format may be found below.
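As a minimal sketch of the steps above, the following Python builds the directory address for an organism and decompresses a downloaded experiment file. The function names are hypothetical, and "XX" stands in for a real two-letter organism code; the actual codes and file names must be taken from the organism list and directory listing.

```python
import gzip
import shutil
from pathlib import Path

# Base address given in this document.
BASE_URL = "ftp://gen-ftp.princeton.edu/puma/organisms"


def organism_url(code: str) -> str:
    """Return the FTP directory address for a two-letter organism code."""
    return f"{BASE_URL}/{code}"


def gunzip_file(src: Path, dest: Path) -> Path:
    """Decompress the gzipped file src into dest and return dest."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


if __name__ == "__main__":
    # "XX" is a placeholder, not a real organism code.
    print(organism_url("XX"))
```

Copying with shutil.copyfileobj streams the data in chunks, so large experiment files do not need to fit in memory.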

Retrieving all public data for a publication

PUMAdb provides a simple interface that lists all publications whose data reside in PUMAdb. Clicking the Data in PUMAdb icon for a publication shows a list of the experiment sets included in that paper. There you will find a link to download all the files for an experiment set as a single gzipped tarfile.
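Once a gzipped tarfile for an experiment set has been downloaded, its contents can be read directly without unpacking to disk. The sketch below assumes only that the archive is a standard gzipped tar; the function name is hypothetical, and the names of the files inside the archive are whatever the download contains.

```python
import tarfile


def read_experiment_set(archive_path):
    """Return {member name: raw bytes} for each regular file in a .tar.gz archive."""
    contents = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():  # skip directories and links
                contents[member.name] = tar.extractfile(member).read()
    return contents
```

Reading members in memory avoids writing files whose paths come from the archive itself; for very large experiment sets, extracting to disk may be more practical.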

File formats

Currently PUMAdb uses its own, somewhat ad hoc file format for raw data files. To indicate the organization of experiment sets within a publication, and of experiments within a set, PUMAdb produces .meta files. These are not strict XML, though they look somewhat like it. In future we will use the MAGE-ML standard, which is currently being defined.

A meta file for a publication, named publication_XX.meta, where XX is PUMAdb's internal publication number, looks something like:


<publication>
!Citation=Spellman PT et al.(1998) Mol Biol Cell 9:3273-97
!Title=Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.
!PubMedID=9843569
	<experiment_set>
		!Name=Spellman et al : Alpha-factor block-release
		!ExptSetNo=526
		!Description=These data are from a time series, where yeast were arrested in alpha-factor, then the alpha-factor was washed out, and the cells were release into fresh medium. Samples were taken every 7 minutes as the cells went through the cell cycle.
	</experiment_set>
	<experiment_set>
		!Name=Spellman et al : cdc15 block-release
		!ExptSetNo=528
		!Description=Yeast cells were blocked in telophase using a cdc15-2 temperature senstive mutant at restrictive temperature. The culture was then shifted to permissive temperature (25oC), and released into the cell cycle. Sample were then taken during the course of almost three full cell cycles.
	</experiment_set>
	<experiment_set>
		!Name=Spellman et al : Elutriation time course
		!ExptSetNo=529
		!Description=Small G1 daughter yeast cells were isolated by centrigugal elutriation. They were then released into YEP ethanol, and followed through one cell cycle, with samples being taken every 30 minutes.
	</experiment_set>
	<experiment_set>
		!Name=Spellman et al : Cyclin overexpression
		!ExptSetNo=530
		!Description=Yeast cell were arrested either in G1 (for CLN3 overexpression) or in G2/M (for CLB2 overexpression). The cyclin was then induced, and samples were taken.
	</experiment_set>
</publication>

Each individual item, such as an experiment set or a publication, is enclosed by tags that mark its start and end. Comment lines about an item begin with an exclamation point, followed by the name of the type of information, an equals sign, and the value.
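Because .meta files are not strict XML, a standard XML parser will not read them. The following is a minimal sketch of a line-based parser, assuming that every tag sits on its own line and every comment line has the form !Name=Value, as in the sample above. The function name and the shape of the returned dictionaries are illustrative choices, not part of PUMAdb.

```python
def parse_meta(text):
    """Parse .meta text into nested dicts with 'tag', 'fields' and 'children' keys."""
    root = {"tag": None, "fields": {}, "children": []}
    stack = [root]  # the innermost open item is stack[-1]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("</") and line.endswith(">"):
            # Closing tag: must match the innermost open item.
            if len(stack) == 1 or stack[-1]["tag"] != line[2:-1]:
                raise ValueError(f"unexpected closing tag: {line}")
            stack.pop()
        elif line.startswith("<") and line.endswith(">"):
            # Opening tag: start a new item nested in the current one.
            node = {"tag": line[1:-1], "fields": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif line.startswith("!"):
            # Comment line: split on the first '=' only.
            key, _, value = line[1:].partition("=")
            stack[-1]["fields"][key] = value
        else:
            raise ValueError(f"unrecognized line: {line}")
    if len(stack) != 1:
        raise ValueError(f"unclosed tag: {stack[-1]['tag']}")
    return root
```

For the sample above, the publication is the first child of the returned root, and each of its children is one experiment_set whose fields include Name, ExptSetNo and Description.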

The organization of the .meta file for an experiment set is very similar. For each individual experiment file, there is a series of comment lines at the top of the file, e.g.:


!Exptid=29
!Experiment Name=alpha factor release sample016
!Organism=Saccharomyces cerevisiae
!Category=Cell-cycle
!Subcategory=Alpha factor block-release
!Experimenter=Paul Spellman
!Contact email=spellman@genomics.stanford.edu
!Contact Address1=School of Medicine
!Contact Address2=Department of Genetics
!State=CA
!Country=USA
!Postal Code=94305
!SlideName=y744n101
!Printname=y744
!Tip Configuration=Standard 4-tip
!Columns per Sector=44
!Rows per Sector=44
!Column Spacing=135
!Row Spacing=135
!Channel 1 Description=asynchronous control (prep3)
!Channel 2 Description=16
!Scanning Software=ScanAlyze
!Software version=2.03
!Scanning parameters=

These comment lines are followed by the data themselves. The data include all the raw data from the image scan, the biological annotation attributed to each spot, and tracking information about the microtiter plates from which the samples were printed. The data are tab-delimited. For definitions of the various columns, please see the table specifications for the RESULT and PLATESAMPLE tables, as well as the relevant annotation tables.
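An experiment file laid out this way can be split into its comment lines and its tab-delimited rows. The sketch below assumes that comment lines appear only at the top of the file and start with an exclamation point. It makes no assumption about whether the first data row holds column names, so rows are returned as plain lists of strings. The function name is hypothetical.

```python
import csv


def parse_experiment(text):
    """Split experiment file text into (header dict, list of tab-delimited rows)."""
    header = {}
    data_lines = []
    in_header = True
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if in_header and line.startswith("!"):
            # Comment line: split on the first '=' only; the value may be empty.
            key, _, value = line[1:].partition("=")
            header[key] = value
        else:
            in_header = False
            data_lines.append(line)
    # QUOTE_NONE keeps quote characters in annotations as literal text.
    rows = list(csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE))
    return header, rows
```

A field with no value, such as Scanning parameters in the example above, is stored as an empty string.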