Contents
ftp://gen-ftp.princeton.edu/puma/organisms
Under this directory is one directory per organism, each named with the two-letter code that PUMAdb uses to indicate the organism. A list of the organisms and their two-letter codes is available, as is a simple interface to the directories that shows each organism and the number of experiments available for it.
Within each organism's directory is one file per public experiment. These files are gzipped, so they must be uncompressed before use (with Winzip, Stuffit Expander, or gunzip, depending on your platform). Further details of the file format may be found below.
The exact method you use to retrieve all the files from an organism's directory depends on your ftp client of choice. If you are simply using a web browser, such as Netscape or Internet Explorer, you can click on each file in turn and download it. However, for downloading many experiments we recommend an alternative method, as using a web browser will be tedious and time consuming.
A graphical ftp client, such as Fetch on the Macintosh, or FTP Explorer on the PC, will allow you to connect to an ftp site, select one, several, or all files in a directory, then download them. This will likely save you a lot of time. The example below uses Fetch on MacOSX:
A command line ftp client can be used to easily retrieve the entire contents of a directory from an ftp site. Users on a Unix system can typically use ftp on the command line. The following example is taken from the command line in MacOSX:
Note that the -i switch turns off interactive prompting, so that when you retrieve the files (using mget *gz) ftp does not ask you to confirm each one; it just gets them all. Command line ftp is likely to be the quickest way of retrieving all the files.
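The same bulk retrieval can also be scripted. The following is a minimal sketch in Python using the standard ftplib module, not an official PUMAdb tool. The host and base directory are taken from the URL at the top of this page; the function names fetch_matching and fetch_organism are illustrative, and anonymous login is an assumption about how the server is configured.

```python
import fnmatch
import os
from ftplib import FTP

HOST = "gen-ftp.princeton.edu"   # from the URL at the top of this page
BASE_DIR = "puma/organisms"      # from the URL at the top of this page


def fetch_matching(ftp, remote_dir, local_dir, pattern="*gz"):
    """Download every file in remote_dir whose name matches pattern.

    Mirrors "mget *gz" with prompting turned off: no confirmation is asked.
    Returns the list of file names that were downloaded.
    """
    os.makedirs(local_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    fetched = []
    for name in ftp.nlst():
        if not fnmatch.fnmatch(name, pattern):
            continue
        with open(os.path.join(local_dir, name), "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
        fetched.append(name)
    return fetched


def fetch_organism(code, local_dir):
    """Log in anonymously (an assumption) and fetch one organism's files."""
    with FTP(HOST) as ftp:
        ftp.login()
        return fetch_matching(ftp, BASE_DIR + "/" + code, local_dir)
```

For example, fetch_organism("XX", "downloads") would place every gzipped experiment file for the organism with code XX (a placeholder, not a real code) in a local directory named downloads.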
Currently PUMAdb uses its own, somewhat ad hoc file format for raw data files. To indicate the organization of experiment sets within a publication, and of experiments within a set, PUMAdb produces .meta files. These are not strict XML, though they look somewhat like it. In future we will use the MAGE-ML standard, which is currently being defined.
A meta file for a publication, named publication_XX.meta, where XX is PUMAdb's internal publication number, looks something like:
<publication>
!Citation=Spellman PT et al.(1998) Mol Biol Cell 9:3273-97
!Title=Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.
!PubMedID=9843569
<experiment_set>
!Name=Spellman et al : Alpha-factor block-release
!ExptSetNo=526
!Description=These data are from a time series, where yeast were arrested in alpha-factor, then the alpha-factor was washed out, and the cells were release into fresh medium. Samples were taken every 7 minutes as the cells went through the cell cycle.
</experiment_set>
<experiment_set>
!Name=Spellman et al : cdc15 block-release
!ExptSetNo=528
!Description=Yeast cells were blocked in telophase using a cdc15-2 temperature senstive mutant at restrictive temperature. The culture was then shifted to permissive temperature (25oC), and released into the cell cycle. Sample were then taken during the course of almost three full cell cycles.
</experiment_set>
<experiment_set>
!Name=Spellman et al : Elutriation time course
!ExptSetNo=529
!Description=Small G1 daughter yeast cells were isolated by centrigugal elutriation. They were then released into YEP ethanol, and followed through one cell cycle, with samples being taken every 30 minutes.
</experiment_set>
<experiment_set>
!Name=Spellman et al : Cyclin overexpression
!ExptSetNo=530
!Description=Yeast cell were arrested either in G1 (for CLN3 overexpression) or in G2/M (for CLB2 overexpression). The cyclin was then induced, and samples were taken.
</experiment_set>
</publication>
Each individual item, such as an experiment set or a publication, is enclosed by a pair of tags that mark its start and end. Comments about an item begin with an exclamation point, followed by the name of the type of information, an equals sign, and the value.
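Because the format is not strict XML, a standard XML parser will not read it, but the tag and comment conventions described above are simple enough to parse by hand. The following Python sketch is one way to do so. It assumes that each tag and each comment sits on its own line and that values do not span lines; neither point is guaranteed by this page. The function name parse_meta and the dictionary layout are illustrative only.

```python
def parse_meta(text):
    """Parse .meta text into a list of nested items.

    Each item is a dict with three keys:
      "tag"      - the tag name, e.g. "publication"
      "fields"   - the !Name=value comments belonging to the item
      "children" - items nested inside it, e.g. experiment sets
    """
    root = {"tag": None, "fields": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("</") and line.endswith(">"):
            # Closing tag: must match the item currently open.
            tag = line[2:-1]
            if len(stack) == 1 or stack[-1]["tag"] != tag:
                raise ValueError("unexpected closing tag: " + line)
            stack.pop()
        elif line.startswith("<") and line.endswith(">"):
            # Opening tag: start a new item inside the current one.
            node = {"tag": line[1:-1], "fields": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif line.startswith("!"):
            # Comment: split at the first "=" only, values may contain "=".
            key, _, value = line[1:].partition("=")
            stack[-1]["fields"][key] = value
        else:
            raise ValueError("unrecognized line: " + line)
    if len(stack) != 1:
        raise ValueError("unclosed tag: " + stack[-1]["tag"])
    return root["children"]
```

Applied to the publication example above, the result would be a one-element list whose single item has the tag publication and four children, one per experiment set, each with its own Name, ExptSetNo and Description fields.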
The organization of the .meta file for an experiment set is very similar. Each individual experiment file begins with a series of comment lines, e.g.:
!Exptid=29
!Experiment Name=alpha factor release sample016
!Organism=Saccharomyces cerevisiae
!Category=Cell-cycle
!Subcategory=Alpha factor block-release
!Experimenter=Paul Spellman
!Contact email=spellman@genomics.stanford.edu
!Contact Address1=School of Medicine
!Contact Address2=Department of Genetics
!State=CA
!Country=USA
!Postal Code=94305
!SlideName=y744n101
!Printname=y744
!Tip Configuration=Standard 4-tip
!Columns per Sector=44
!Rows per Sector=44
!Column Spacing=135
!Row Spacing=135
!Channel 1 Description=asynchronous control (prep3)
!Channel 2 Description=16
!Scanning Software=ScanAlyze
!Software version=2.03
!Scanning parameters=
These comment lines are followed by the data themselves. The data include all the raw data from the image scan, the biological annotation attributed to each spot, and tracking information about the microtiter plates from which the samples were printed. The data are tab-delimited. For definitions of the various columns, please see the table specifications for the RESULT and PLATESAMPLE tables, as well as the relevant annotation tables.
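A downloaded experiment file can be read without uncompressing it first. The Python sketch below separates the leading comment lines from the tab-delimited data. It assumes that the first line not starting with an exclamation point is a row of column names and that the text can be decoded as Latin-1; this page confirms neither. The function name read_experiment is illustrative.

```python
import csv
import gzip


def read_experiment(path):
    """Return (header, columns, rows) from one gzipped experiment file.

    header  - dict built from the leading !Name=value comment lines
    columns - the first data line, assumed here to hold column names
    rows    - the remaining tab-delimited lines, each a list of strings
    """
    header = {}
    data_lines = []
    # Latin-1 is an assumption; it decodes any byte sequence without error.
    with gzip.open(path, "rt", encoding="latin-1", newline="") as handle:
        for line in handle:
            if line.startswith("!"):
                key, _, value = line[1:].rstrip("\r\n").partition("=")
                header[key] = value
            elif line.strip():
                data_lines.append(line)
    # QUOTE_NONE keeps any quote characters in the data as literal text.
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    columns = rows[0] if rows else []
    return header, columns, rows[1:]
```

For the example header above, header["Organism"] would be "Saccharomyces cerevisiae" and header["Scanning parameters"] would be an empty string. The meaning of each entry in columns should still be checked against the RESULT and PLATESAMPLE table specifications.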