PUMAdb : Help Entering Results for a Batch of Arrays

Contents

Description
Specifying your Batch File and Data Directory
Assembling a Batch File
Example Batch Files
Running the Batch Load Program
Monitoring Your Request as it Progresses Within the Queue
Successful Result Entry into the Database
Common Problems

Description

Sometimes you will have so many microarray results that entering the file names one by one on a web page would be too time-consuming. In that case, you can construct a batch file that provides all the information you would otherwise enter on the web page. The Data Entry for Microarray Experiment form then enters the batch of experiments into the database according to the instructions specified in the batch file. This help describes the information you need to enter an experiment and how to create the batch file. Small numbers of experiments can also be entered individually; a separate help page describes that procedure.

It is assumed that you have already read "First Time Users" and "What You Need" in the previous help file, "Entering Results from Experiments".

Specifying your Batch File and Data Directory

First, choose the feature extraction software package used for your experiment. Then specify the directory containing your batch and data files. This can be your incoming directory on loader.princeton.edu or a subdirectory of it. Alternatively, if you have an account on the Microarray Core Facility's arrayfiles server, the files can reside there. The batch file must either be in the same directory as the data files or be uploaded from your own computer.

Assembling a Batch File

To load a group of experiments into the database, first assemble a batch file in which each line after the header contains all the information needed to enter one experiment, i.e. the same information that goes on the web form for individual experiment entry. The batch file must be tab-delimited, and its header row must contain the column labels listed below for your software. These labels are the same as the entry box titles on the individual experiment entry form. The columns can be in any order.

To start, save one of the example batch files as text (using "Save As" from your browser's File menu), then either copy or edit the resulting file. Do not change the name or spelling of the column labels. All fields are required except those marked "optional"; optional fields may either be omitted from the batch file entirely or included with empty values. Your data files must be in your loader account (or on the arrayfiles server) before you can enter the data, and the batch file must be there too unless you upload it from your own computer.
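If you keep your experiment metadata in a script or a spreadsheet export, a batch file can also be generated programmatically. The Python sketch below is not a PUMAdb tool; it only shows the required shape (tab-delimited, one header row of labels, one row per experiment). The column subset and values such as "PRINT1" and "slide_001" are hypothetical placeholders.

```python
import csv
import io


def write_batch_file(columns, rows):
    """Return tab-delimited batch-file text: a header row, then one row per experiment."""
    buf = io.StringIO()
    # lineterminator="\n" writes UNIX line endings (the csv default is "\r\n").
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Columns missing from a row (e.g. optional fields) are written empty.
        writer.writerow({col: row.get(col, "") for col in columns})
    return buf.getvalue()


# Hypothetical subset of GENEPIX columns with made-up values; a real batch
# file must contain every required column for your software.
columns = ["Print Name", "Slide Name", "Data File Location",
           "Experiment Name", "Experiment Description"]
rows = [{"Print Name": "PRINT1", "Slide Name": "slide_001",
         "Data File Location": "1234.gpr", "Experiment Name": "aging_day1"}]
text = write_batch_file(columns, rows)
print(text, end="")
```

To save the result, write `text` to a file opened with `newline=""` (so no extra carriage returns are added), for example a file named genepix_batch.txt, and place it in your loader account.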

AFFYMETRIX batch file columns (sample)

Result Set Name
Result Set Description (optional)
Add to Expt¹ (optional)
Print Name: must be in the Print list.
Slide Name: must be unique.
Exp File Location (file suffix is .exp)
Cell File Location (file suffix is .cel)
Gene File Location (file suffix is .txt or .calls; optional for Tiling experiments)
Single Scan File Location (file suffix is .dat; optional for Tiling experiments)
Single Channel Description
Experiment Date³ (optional, defaults to the current date)
Experiment Name: must be unique.
Experiment Category: must be in the Category list.
Experiment SubCategory: must be in the SubCategory list.
Normalization Type⁴
Norm Value⁴
Experimenter: must be in the Users list.
Experiment Description (optional)
Collaborative Group⁵ (optional): if specified, must be in the User Groups list. For multiple groups, separate with commas.
Individual User⁵ (optional): if specified, must be in the Users list. For multiple users, separate with commas.

AGILENT batch file columns (sample)

Result Set Name
Result Set Description (optional)
Add to Expt¹ (optional)
Print Name: must be in the Print list.
Slide Name: must be unique.
Data File Location² (file suffix is .dat, .dat2, .gpr, .srr or .txt)
Grid File Location (file suffix is .sag, .grd, .gps, .sra or .shp)
Green Scan File Location⁶ (file suffix is .tif or .scn)
Red Scan File Location⁶ (file suffix is .tif or .scn)
Experiment Date³ (optional, defaults to the current date)
Experiment Name: must be unique.
Experiment Category: must be in the Category list.
Experiment SubCategory: must be in the SubCategory list.
Green Channel (CH1) Description
Red Channel (CH2) Description
Is Reverse (optional, defaults to N)
Normalization Type⁴
Norm Value⁴
Experimenter: must be in the Users list.
Experiment Description (optional)
Collaborative Group⁵ (optional): if specified, must be in the User Groups list. For multiple groups, separate with commas.
Individual User⁵ (optional): if specified, must be in the Users list. For multiple users, separate with commas.

GENEPIX / SCANALYZE / SPOTREADER batch file columns (sample)

Print Name: must be in the Print list.
Slide Name: must be unique.
Data File Location² (file suffix is .dat, .dat2, .gpr, .srr or .txt)
Grid File Location (file suffix is .sag, .grd, .gps, .sra or .shp)
Green Scan File Location (file suffix is .tif or .scn)
Red Scan File Location (file suffix is .tif or .scn)
Experiment Date³ (optional, defaults to the current date)
Experiment Name: must be unique.
Experiment Category: must be in the Category list.
Experiment SubCategory: must be in the SubCategory list.
Green Channel (CH1) Description
Red Channel (CH2) Description
Is Reverse (optional, defaults to N)
Normalization Type⁴
Norm Value⁴
Experimenter: must be in the Users list.
Experiment Description (optional)
Collaborative Group⁵ (optional): if specified, must be in the User Groups list. For multiple groups, separate with commas.
Individual User⁵ (optional): if specified, must be in the Users list. For multiple users, separate with commas.

NIMBLEGEN batch file columns (sample)

Result Set Name
Result Set Description (optional)
Add to Expt¹ (optional)
Print Name: must be in the Print list.
Slide Name: must be unique.
FTR File Location (file suffix is .ftr)
Cell File Location (file suffix is .cel or .xys)
Gene File Location (file suffix is .txt or .calls; optional)
Single Scan File Location (file suffix is .tif)
Experiment Date³ (optional, defaults to the current date)
Experiment Name: must be unique.
Experiment Category: must be in the Category list.
Experiment SubCategory: must be in the SubCategory list.
Single Channel Description
ExptType: must be in the Experiment Type list.
Probe Set Algorithm: must be one of Probe_Calls, RMA or MAS; required only if you are loading a gene intensity file.
Experimenter: must be in the Users list.
Experiment Description (optional)
Collaborative Group⁵ (optional): if specified, must be in the User Groups list. For multiple groups, separate with commas.
Individual User⁵ (optional): if specified, must be in the Users list. For multiple users, separate with commas.

¹Add to Expt is used if you are adding a new result set to an existing experiment. The value must be "Y". The values for Experiment Name and Slide Name must be the same as the experiment to which you are adding the result set.

²Data files can be identified in the batch file by file name alone only if the batch file is in the same directory as all the data files. If some data files are in subdirectories inside incoming/, the batch file must give the path to those files relative to the batch file. For example, if the batch file is in incoming/ and some of the data files are in a "worm_aging" directory inside incoming/, the path would be "worm_aging/1234.gpr".
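Before submitting, you can check locally that every location in the batch file points at a real file relative to the batch file's directory. This Python sketch assumes you have a copy of the directory tree on your own machine; the function name missing_data_files is hypothetical, and a temporary directory stands in for incoming/.

```python
import tempfile
from pathlib import Path


def missing_data_files(batch_path, locations):
    """Return the locations that do not name an existing file.

    Each location is resolved against the directory that holds the batch
    file, so "1234.gpr" means a file next to the batch file and
    "worm_aging/1234.gpr" a file in its worm_aging subdirectory.
    """
    base = Path(batch_path).parent
    return [loc for loc in locations if not (base / loc).is_file()]


# Demonstration in a throwaway directory standing in for incoming/.
with tempfile.TemporaryDirectory() as incoming:
    root = Path(incoming)
    (root / "worm_aging").mkdir()
    (root / "worm_aging" / "1234.gpr").write_text("")
    batch = root / "batch.txt"
    batch.write_text("")
    missing = missing_data_files(batch, ["worm_aging/1234.gpr", "5678.gpr"])

print(missing)  # ['5678.gpr']
```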

³If the Experiment Date column is left blank, the date defaults to the date the experiment is entered. Two date formats are accepted: a 4-digit year, 2-digit month and 2-digit day (YYYY-MM-DD, e.g. 2006-11-22), or the Excel default (MM/DD/YY, e.g. 11/22/06).
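A minimal Python sketch of the date rule, assuming the two formats correspond to the strptime patterns %Y-%m-%d and %m/%d/%y. Note that Python reads two-digit years 00 to 68 as 2000 to 2068; how PUMAdb interprets two-digit years is not stated here, which is one reason to prefer YYYY-MM-DD. The function name is hypothetical.

```python
from datetime import date, datetime

# The two accepted formats: YYYY-MM-DD and the Excel default MM/DD/YY.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%m/%d/%y")


def parse_experiment_date(value, today=None):
    """Parse an Experiment Date cell; a blank cell falls back to the entry date."""
    value = value.strip()
    if not value:
        return today if today is not None else date.today()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized Experiment Date: {value!r}")


print(parse_experiment_date("2006-11-22"))  # 2006-11-22
print(parse_experiment_date("11/22/06"))    # 2006-11-22
```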

⁴Normalization Type is required and must be "Computed", "Regression" or "User Defined". If it is "User Defined", Norm Value is also required and can be any number. If it is "Computed", the default computed normalization is used. If it is either "Computed" or "Regression", leave the Norm Value column blank. Normalization Type must still be supplied when entering Agilent and Affymetrix data, although its value is ignored for those platforms.
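The Normalization Type and Norm Value rules above can be expressed as a small per-row check. This Python sketch is hypothetical (it is not the loader's own validation) and assumes the type names must match exactly, including capitalization.

```python
NORM_TYPES = {"Computed", "Regression", "User Defined"}


def check_normalization(norm_type, norm_value):
    """Return a list of problems with one row's Normalization Type / Norm Value pair."""
    norm_type, norm_value = norm_type.strip(), norm_value.strip()
    problems = []
    if norm_type not in NORM_TYPES:
        problems.append(f"Normalization Type must be one of "
                        f"{sorted(NORM_TYPES)}, got {norm_type!r}")
    elif norm_type == "User Defined":
        # Norm Value is required and must be a number (float("") also fails).
        try:
            float(norm_value)
        except ValueError:
            problems.append("Norm Value must be a number when "
                            "Normalization Type is 'User Defined'")
    elif norm_value:
        # "Computed" and "Regression" expect an empty Norm Value.
        problems.append(f"Norm Value should be blank when "
                        f"Normalization Type is {norm_type!r}")
    return problems


print(check_normalization("User Defined", "1.5"))  # []
```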

⁵Collaborative Group and Individual User list the groups and/or users to whom you give permission to view this experiment. They must exist in the User Groups list and the Users list, respectively. Separate multiple values with commas.
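Because several groups or users share one cell, a quick local check can split the cell on commas and compare each name against the lists shown in PUMAdb. In this Python sketch, the known-name set is a hypothetical local copy you would fill in yourself, and stripping spaces around the commas is an assumption about how the loader treats them.

```python
def unknown_names(cell, known):
    """Split a comma-separated Collaborative Group or Individual User cell and
    return the names that are not in `known` (a local copy of the User Groups
    or Users list)."""
    names = [name.strip() for name in cell.split(",")]
    return [name for name in names if name and name not in known]


# Hypothetical group names, for illustration only.
user_groups = {"group_a", "group_b"}
print(unknown_names("group_a, group_c", user_groups))  # ['group_c']
print(unknown_names("", user_groups))                  # []
```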

⁶Green Scan File and Red Scan File (Agilent only): if you have used TiffSplitter to split your image into two files, enter one file name in each of these columns. If you have NOT split the image, enter its name in the Green Scan File column and leave the Red Scan File column blank. A single batch file must contain either ALL split images or ALL unsplit images, not a combination of the two.
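For Agilent batches, the all-split or all-unsplit rule can be checked before submission. This Python sketch assumes each row is a dictionary keyed by the column labels (as csv.DictReader produces) and treats a row as split when both scan columns are filled; the function name is hypothetical.

```python
def check_scan_files(rows):
    """Check that an Agilent batch is either entirely split or entirely unsplit."""
    kinds = set()
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        green = (row.get("Green Scan File Location") or "").strip()
        red = (row.get("Red Scan File Location") or "").strip()
        if not green:
            problems.append(f"line {line_no}: Green Scan File Location is required")
        elif red:
            kinds.add("split")    # TiffSplitter output: one file per channel
        else:
            kinds.add("unsplit")  # single image in the green column only
    if len(kinds) > 1:
        problems.append("batch mixes split and unsplit images")
    return problems


rows = [
    {"Green Scan File Location": "a_green.tif", "Red Scan File Location": "a_red.tif"},
    {"Green Scan File Location": "b.tif", "Red Scan File Location": ""},
]
print(check_scan_files(rows))  # ['batch mixes split and unsplit images']
```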

Example Batch Files

Example batch files from which you can copy headings are available for:

Running the Batch Load Program

Initiating the Process: Choosing "Load Experiment(s) using a Batch File" from the Experiment and Result Entry form takes you to the batch entry form. Here you specify the feature extraction software, choose the directory containing your data files, and select the batch file (or upload it from your own computer). For convenience, files whose names contain the word "batch" are placed at the top of the selection list, so you may want to include "batch" in the filename, e.g. "affy_batch.txt". (The filename can have any extension.) You can then choose to:

Check Batch File: You should check your batch file before loading; feedback is displayed on the screen. Correct any errors in your batch file (re-uploading it if you edited it on your PC) and then re-submit it. Typical errors include malformed files (the file must be tab-delimited text), incorrect names (print names, categories, subcategories and experimenters must exist in the database), incorrect data-file paths, and duplicate values in columns that must be unique (slide names, experiment names).
Submit your Request: All that remains is to press the "Load Experiments using Batch File" button. The loading program scans your batch file one last time for incorrect entries, and then enters the request into a queue.
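Check Batch File is the authoritative check, but a few of the typical errors listed above (non-tab-delimited files, missing columns, duplicate slide or experiment names) are easy to catch locally first. This Python sketch is hypothetical: it cannot verify names against the database, and it assumes rows with Add to Expt set to "Y" are exempt from the uniqueness check because they intentionally reuse an existing experiment's names.

```python
import csv
import io
from collections import Counter


def precheck_batch(text, required_columns,
                   unique_columns=("Slide Name", "Experiment Name")):
    """Return a list of problems found by a few local batch-file checks."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    problems = []
    if len(header) == 1:
        problems.append("header has only one column; is the file tab-delimited?")
    problems += [f"missing column: {col}" for col in required_columns
                 if col not in header]
    # Rows with Add to Expt = "Y" deliberately reuse an existing experiment's
    # Slide Name and Experiment Name, so they are left out of the duplicate check.
    new_rows = [row for row in reader
                if (row.get("Add to Expt") or "").strip() != "Y"]
    for col in unique_columns:
        if col not in header:
            continue
        counts = Counter((row.get(col) or "").strip() for row in new_rows)
        problems += [f"duplicate {col}: {value!r}"
                     for value, n in counts.items() if value and n > 1]
    return problems


sample = "Slide Name\tExperiment Name\tPrint Name\ns1\te1\tP1\ns1\te2\tP1\n"
print(precheck_batch(sample, ["Slide Name", "Experiment Name", "Print Name"]))
```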

Monitoring Your Request as it Progresses Within the Queue

Loading begins when your request is entered into a queue. The loading rate depends on several factors, including the load on the database and how many other array-load requests were made before yours. Even without delays, loading usually takes at least five minutes per array, and it can take considerably longer if your arrays have a large number of spots (e.g. human arrays) or if many other users are using the database. An Affymetrix tiling array can take up to one hour to load, so, to avoid slowing the database to a crawl, only one tiling array is loaded at a time; all other jobs are queued.

During this time, you can check the progress of your request within the queue. After your data is successfully entered into the queue (note: this is not the same as final entry into the database), you should see a confirmation screen and receive an email similar to:

	Your database entry request (batch number XXXX) has been
queued for loading.

	Please note the data for your array(s) ARE NOT YET IN THE
DATABASE.  Do NOT delete any of your files until you receive email
confirmation that the data have been loaded.

	Progress of this batch within the queue can be viewed at:

http://puma.princeton.edu/cgi-bin/tools/queue/nph-ProgressQuery?batchno=XXXX

	If you have any questions please contact the database curators

	(array@princeton.edu).

You can check the progress of your load, identified by its batch number, either by following the link on the queue confirmation page or by visiting the URL in the email.
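The progress URL follows a fixed pattern, with the batch number as the batchno query parameter, so it can be rebuilt from the batch number alone. A Python sketch, assuming the URL pattern quoted in the email above is unchanged; the function name is hypothetical, and the page itself is only available while the batch is in the queue.

```python
from urllib.parse import urlencode

# Base URL as quoted in the confirmation email (assumed unchanged).
PROGRESS_BASE = "http://puma.princeton.edu/cgi-bin/tools/queue/nph-ProgressQuery"


def progress_url(batch_number):
    """Build the queue-progress URL for a batch number."""
    return f"{PROGRESS_BASE}?{urlencode({'batchno': int(batch_number)})}"


print(progress_url("1234"))
# http://puma.princeton.edu/cgi-bin/tools/queue/nph-ProgressQuery?batchno=1234
```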

Successful Result Entry into the Database

If all goes well, you will eventually get an email message that says:

    Loading of your array data (batch number XXXX) has completed.

    1 out of 1 were successfully loaded.

    Details of the load process have been written to:
        
    /loader/ftphome/username/logs/XXXX.log,

    or you can temporarily view the details via the web at:

    http://puma.princeton.edu/cgi-bin/tools/queue/nph-ProgressQuery?batchno=XXXX

    If you have any questions please contact the curators

    (array@princeton.edu).

At the bottom of the HTML confirmation page, or in the log file in your logs directory on loader.princeton.edu, you should see the message:

==== 1 out of 1 were successfully loaded. ====

For a batch of several arrays, the numbers will be larger, for example "10 out of 10".
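If you load many batches, the summary line can be extracted from the log mechanically. This Python sketch assumes the log contains the summary text exactly as quoted above; the function name is hypothetical. To use it on a real log, pass the contents of /loader/ftphome/username/logs/XXXX.log, for example read with pathlib.Path.read_text().

```python
import re

# Matches the summary line, e.g. "==== 10 out of 10 were successfully loaded. ===="
SUMMARY = re.compile(r"(\d+) out of (\d+) were successfully loaded")


def load_summary(log_text):
    """Return (loaded, total) from the last summary line in a load log, or None."""
    matches = SUMMARY.findall(log_text)
    if not matches:
        return None
    loaded, total = matches[-1]
    return int(loaded), int(total)


summary = load_summary("...\n==== 9 out of 10 were successfully loaded. ====\n")
print(summary)  # (9, 10)
if summary and summary[0] < summary[1]:
    print("some arrays were not loaded; check the log for details")
```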

Common Problems

If your results have not been loaded 1 day after entry into the queue, please notify the microarray database curators.
File location: All files must be in the same directory (or its subdirectories) in your incoming account on loader.princeton.edu or on the arrayfiles server.
UNIX file names: The names of your uploaded files should not contain spaces or any of the following characters: ' " # , / \ ? < > ; : ! @ % ^ & * ( )
Occasionally, we back up and re-index the database. This process can significantly delay the loading of data (and vice versa), so we suggest not loading during these periods. Consult the Scheduled Database Backups page for the times to avoid.
Sometimes the conversion of the two TIFF images to a proxy image (for web viewing) fails. Please check your loaded arrays by displaying them and verifying the clickable image. If you need to replace the GIF file that we created for you, see our help documentation on this. If no clickable-image icon is present, contact the microarray database curators.
Errors? What errors? Shortly after a queued batch request is processed (successfully or not), you will no longer be able to monitor its status in the queue, because it has been removed along with its web log. However, you can still check your logs directory on loader.princeton.edu for the text log file of the database entry. The log file is named after the batch number, e.g. "1234.log".
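The UNIX file-name rule can be checked with a few lines of Python. This sketch is hypothetical and not part of PUMAdb: it first splits a batch-file location on "/", because subdirectory paths such as "worm_aging/1234.gpr" legitimately use "/", and then reports any forbidden characters found in each path component.

```python
from pathlib import PurePosixPath

# Space plus the characters that the Common Problems list says file names
# must not contain.
FORBIDDEN = set(" '\"#,/\\?<>;:!@%^&*()")


def bad_characters(location):
    """Return forbidden characters, in order of first appearance, found in
    any component of a batch-file location."""
    found = []
    for part in PurePosixPath(location).parts:
        for ch in part:
            if ch in FORBIDDEN and ch not in found:
                found.append(ch)
    return found


print(bad_characters("worm_aging/1234.gpr"))  # []
print(bad_characters("my scan (1).gpr"))      # [' ', '(', ')']
```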