PUMAdb : Help Entering Results for a Single Array

Contents

Description
Completing the Data Entry for Microarray Experiment Form
Specifying array platform, software, organism, and file location
Result Set Section (Affymetrix and Agilent only)
Data File Locations Section
Experiment Description And Details Section
Array Access Section
Monitoring Your Request as it Progresses Within the Queue
Successful Result Entry into the Database
Common Problems

Description

The Data Entry for Microarray Experiment form is used to enter experiments into the database one at a time. This help describes how to fill out the form to enter an experiment. Experiments can also be entered by batch, and a separate batch help is available for that procedure.

This help assumes you have already read First Time Users and What You Need in the previous help file, Entering Results from Experiments.

Completing the Data Entry for Microarray Experiment Form

Specifying array platform, software, organism, and file location

The first step in entering an experiment is to specify the print technology type, software package, organism, and the directory that contains your files. Files can reside in your incoming directory on loader (or a subdirectory thereof) or, if you have an account on the Microarray Facility's server, in your arrayfiles directory (or a subdirectory thereof). All files for an experiment should reside in the same directory. Under Agilent multi-channel image support, indicate whether you have split the image with the TiffSplitter program or are uploading a single, multi-page TIFF file.

Result Set Section (Affymetrix and Agilent only)

A "Result Set" is one version of data for a given hybridization. Both Agilent and Affymetrix/dChip software allow you to set various parameters that can affect the set of extracted results. For example, Agilent Feature Extractor has several different normalization schemes, each of which would produce a distinct result set for the same array image. For Affymetrix, the cell file is considered a distinct result set from the results of the probe set derivation (you should have exactly one cell file result set, and one or more data file result sets, per Affymetrix array). Generally speaking, only power users who are exploring different feature extraction parameters or normalization schemes would produce multiple result sets from a single array. For example, a researcher may scan a slide once (defining a "slide" and "experiment" for the purposes of the database), but choose several different ways to normalize or compute the extracted data, producing multiple result sets for the single array. This situation is seldom seen.

Enter the name for the result set and a description specific to the set; e.g., details of your chosen normalization scheme, or the software used (say, MAS 5 vs. GCOS vs. dChip vs. SNPScanner). Information describing the experiment or encompassing all result sets (e.g., for Affymetrix, the cell data and every Probe Set intensity set) is entered in the Experiment Description and Details section.

Data File Locations Section

The file location pulldown menus list the files in the directory you specified on the previous page. Select the data file first, then press the button labeled "Autofill". Autofill will attempt to select the names of the other three files based on the name of the data file. If it cannot do so, you will need to specify each of the three remaining files yourself. Note that we now accept compressed (.gz) files, but make sure that they are transferred to loader as binary or "raw" data. (Files with the extensions .gps, .sag, .sra, and .tif, as well as Affymetrix .DAT files, should also be transferred in binary. All others should be transferred as ASCII text.) For more information about transferring files, refer to the Entering Results from Experiments document.
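
As an illustration only, the transfer rule above can be written as a small Python helper. The name transfer_mode is hypothetical and is not part of PUMAdb. The extension list comes from the paragraph above; comparing extensions without regard to case, and treating every .dat file as an Affymetrix .DAT file, are assumptions.

```python
import os

# Extensions that this help page says must be transferred in binary mode.
# Assumption: any file ending in .dat is an Affymetrix .DAT file.
BINARY_EXTENSIONS = {".gz", ".gps", ".sag", ".sra", ".tif", ".dat"}


def transfer_mode(filename):
    """Return "binary" or "ascii" for a file name (hypothetical helper)."""
    # Assumption: extensions are compared case-insensitively.
    extension = os.path.splitext(filename)[1].lower()
    return "binary" if extension in BINARY_EXTENSIONS else "ascii"
```

Only the final extension is examined, so a compressed text file such as array1.txt.gz is reported as binary.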

Experiment Description And Details Section

Determine the print your experiment uses and select it from the pop-up list. Clicking on Print Name will get you information about these prints. If you are unsure about which print to use, contact the microarray database curators. If you are entering data for an Agilent experiment and have selected the data file, the Autofill option will attempt to determine the print by mapping the barcode in the file to the database.
Enter a unique name for the slide used in this experiment in the Slide Name box. A systematic, unique slide name, usually dictated by the array producer, is preferred. If you are entering data for an Agilent experiment and have selected the data file, the Autofill option will attempt to construct the slide name by looking up the barcode in the file.

Note: All text entry on the forms is case sensitive. Also, please avoid the use of single and double quotes ('a' and "a") in your entries.
Enter a date you want associated with the experiment; the date of hybridization is recommended. The date format is 4-digit year, 2-digit month, and 2-digit day (YYYY-MM-DD). For Agilent experiments, the date is also looked up when you press Autofill.
For Experiment Name, enter a unique, descriptive name for your experiment. The maximum length is 100 characters.
You have space to enter 2000 characters of text description for the experiment. This is optional and is intended for useful notes, such as a description of the methods used in sample preparation. There are separate description fields below to specifically identify the red and green channels.
Choose an experiment category and subcategory. If you need more information about categories or subcategories, click on the links to the left of the pop-up menus. If you need new categories or subcategories created, contact the microarray database curators. Note that you can save these values as your preference for the form, so they will be selected the next time you use it.
Enter short descriptions of the samples used for the green and red channels in this experiment. The maximum length is 100 characters, and descriptions are usually phrases such as "reference pool", or "nitrogen starvation, 2 days".
If you wish to define your own normalization, check 'User Defined' and enter a normalization value in the box provided. Otherwise, accept the default computed normalization ('Computed' on the form) or select 'Using regression correlation' if you wish to use that method. A normalization type must be selected when entering Agilent or Affymetrix data, but the value is ignored for those platforms.
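
The limits described in this section (date format, 100-character names, 2000-character description, no quotes) can be checked before you fill in the form. The following Python sketch is illustrative only: check_entry is a hypothetical helper, not part of PUMAdb, and rejecting an empty name is an assumption.

```python
import re
from datetime import datetime

MAX_NAME_LENGTH = 100          # Experiment Name limit stated above
MAX_DESCRIPTION_LENGTH = 2000  # experiment description limit stated above


def check_entry(name, date, description=""):
    """Return a list of problems found; an empty list means none."""
    problems = []
    # Assumption: an empty experiment name is not acceptable.
    if not name or len(name) > MAX_NAME_LENGTH:
        problems.append("name must be 1-100 characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description exceeds 2000 characters")
    # The form asks users to avoid single and double quotes.
    if any(quote in text for text in (name, description) for quote in "'\""):
        problems.append("avoid single and double quotes")
    # The date must look like YYYY-MM-DD and be a real calendar date.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        problems.append("date must be YYYY-MM-DD")
    else:
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            problems.append("date is not a valid calendar date")
    return problems
```

The same length check applies to the green and red channel descriptions, which are also limited to 100 characters.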

Array Access Section

Enter the login name of the person who performed the experiment in the Experimenter box. This will be the only person, save the curators, with the ability to edit or delete the array. By default, your entire lab group will automatically have view access to your experiment.
Choose who will be able to see the data from your array using the multiple-select scrolling lists under Collaborative Group. You may select one or more Collaborative Groups and/or one or more individual users who will be able to view your data. (To select multiple choices in a list, hold down the command/apple key on MacOS or the control key on Windows while you click. On Unix, just click on multiple names.) Lists of collaborative groups and individual users can be obtained by clicking the links to the left of the scrolling menus or by clicking "User group" or "Users" under "List Data" on the main page.
Click the "Load Experiment into Database" button to enter your experiment.

Monitoring Your Request as it Progresses Within the Queue

Experiment loading begins when your load request is entered into a queue. The rate of loading is determined by a number of factors, including both the load on the database and how many other array-load requests were made prior to yours. If there are no delays, it usually takes at least five minutes per array, but it can take quite a bit longer if your arrays have a large number of spots (for example, human arrays) or if many other users are using the database. An Affymetrix tiling array can take up to one hour to load into the database. So as not to slow the database to a crawl, only one tiling array can be loaded at a time; all other jobs are queued.

During this time, you can check the progress of your experiment load within the queue. After your data is successfully entered into the queue (note: this is not the same thing as final entry into the database), you should receive a confirmation screen as well as an email notifying you:

	Your database entry request (batch number XXXX) has been
queued for loading.

	Please note the data for your array(s) ARE NOT YET IN THE
DATABASE.  Do NOT delete any of your files until you receive email
confirmation that the data have been loaded.

	Progress of this batch within the queue can be viewed at:

http://puma.princeton.edu/cgi-bin/tools/queue/nph-ProgressQuery?batchno=XXXX

	If you have any questions please contact the database curators 

	(array@princeton.edu).

You can check the progress of your experiment load using the batch number reported to you, either through the link on the queue confirmation page or through the URL in the email.
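
If you have lost the link but kept the batch number, the progress URL can be rebuilt from the pattern shown in the email above. This Python sketch is illustrative only: progress_url is a hypothetical helper, and it assumes that batch numbers are plain integers.

```python
from urllib.parse import urlencode

# Base address copied from the confirmation email shown above.
PROGRESS_URL = "http://puma.princeton.edu/cgi-bin/tools/queue/nph-ProgressQuery"


def progress_url(batch_number):
    """Build the queue progress URL for a batch number (hypothetical helper)."""
    # Assumption: batch numbers are integers; int() rejects anything else.
    return PROGRESS_URL + "?" + urlencode({"batchno": int(batch_number)})
```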

Successful Result Entry into the Database

If all goes well, you will eventually get an email message that says:


    Loading of your array data (batch number XXXX) has completed.

    1 out of 1 were successfully loaded.

    Details of the load process have been written to:
        
    /loader/ftphome/username/logs/XXXX.log,

    or you can temporarily view the details via the web at:

    http://puma.princeton.edu/cgi-bin/tools/queue/nph-ProgressQuery?batchno=XXXX

    If you have any questions please contact the curators

    (array@princeton.edu).

At the bottom of the HTML confirmation page, or of the log file in your logs directory on loader.princeton.edu, you should find the message:

==== 1 out of 1 were successfully loaded. ====

In the case of a batch load the numbers would be greater, for example, "10 out of 10".
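
To check a saved log file without reading it by eye, you can search for the summary line. The Python sketch below is illustrative only: load_summary is a hypothetical helper, and it assumes the summary wording matches the message shown above.

```python
import re

# Matches the summary wording quoted in this help page.
SUMMARY_PATTERN = re.compile(r"(\d+) out of (\d+) were successfully loaded")


def load_summary(log_text):
    """Return (loaded, total) from the last summary line, or None if absent."""
    matches = SUMMARY_PATTERN.findall(log_text)
    if not matches:
        return None
    loaded, total = matches[-1]
    return int(loaded), int(total)
```

A batch loaded completely when the two numbers are equal; if the first number is smaller, look in the log for the arrays that failed.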

Common Problems

If your results have not been loaded one day after entry into the queue, please notify the microarray database curators.
File location: All files must be in the same directory or subdirectory of your incoming account on loader.princeton.edu or the arrayfiles server.
UNIX file names: The names of your uploaded files should not contain spaces, or any of the following characters: ' " # , / \ ? < > ; : ! @ % ^ & * ( )
Occasionally, we back up and re-index the database. This process can significantly delay the loading of data (and vice versa). We suggest not loading during these time periods. Consult the Scheduled Database Backups page for the times to avoid.
Sometimes the conversion of the two TIFF images to a proxy image (for web viewing) fails. Please check your loaded arrays by displaying them and verifying the clickable image. If you need to replace the GIF file that we have created for you, please see our help documentation for this. If there is no clickable-image icon present, contact the microarray database curators.
Errors? What errors? Shortly after a queue batch request is processed (successfully or not), it is removed from the queue along with its web log, and you will no longer be able to monitor its status there. You can still check your logs directory on loader.princeton.edu for the text log file of the database entry. The log file name uses the batch number, e.g. "1234.log".
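
The file name rule above can be checked before you upload. This Python sketch is illustrative only: bad_characters is a hypothetical helper, not part of PUMAdb, and it expects a bare file name rather than a full path, since "/" is itself on the forbidden list.

```python
# A space plus the characters listed under "UNIX file names" above.
FORBIDDEN_CHARACTERS = set(" '\"#,/\\?<>;:!@%^&*()")


def bad_characters(filename):
    """Return the forbidden characters found in a file name, in sorted order."""
    return sorted(set(filename) & FORBIDDEN_CHARACTERS)
```

An empty list means the name is acceptable under this rule. For example, a name like "my array (v2).tif" would be flagged for its spaces and parentheses.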