Contents
- Accounts
- Data entry
- How do I enter an experiment?
- How do I get started?
- I lack some of the original files required for data loading (e.g. .gps or .tif files), and only have the data (i.e. gpr). How do I get my data loaded, and still "meet" the database's file requirements?
- Why don't my recently entered experiments have a clickable proxy image?
- How is data normalization accomplished?
- How do I enter a new print?
- How do I initiate the process of entering a new organism in the database?
- Data Retrieval and Analysis
- Data Retrieval
- Data Analysis
- Data and Database questions
- Data Questions
- What are the channel conventions used by the database?
- What do these column names mean?
- What is a SUID? Often phrased: What is this random number in my annotations?
- Why are some experiments lacking data for the various median results?
- Which part of the array is used for determining background?
- How are the top features determined for oligos on ChIP-on-chip arrays?
- Database questions
Princeton University MicroArray database accounts are given to only:
In order to load experiments into the database, the data files themselves must be located on our filesystem so that our software can parse and load them. The arraylists, genelists and logs directories of the loader account also provide additional database functionality.
Ask a database curator at array@princeton.edu.
The loader and PUMAdb account user ID and password should be the same (use lowercase to log in). When you change your password within the database, the change will be propagated to loader within 15 minutes. If you forget your login credentials, please email the microarray curators at array@princeton.edu.
> I am trying to batch submit files and cannot find the sample batch files. Can you please point me in the right direction?
If you are using FTP Explorer, the best way to get to a new directory is to open the Tools menu and select "Go To." You need to enter the full pathname of the directory you want.
If you are using Unix, type "cd " followed by the full pathname of the directory.
>What should I do if I don't have enough space in my directory?
If you don't have enough space in your directory, you need to delete some files. The following commands may help you.
You can also save space by depositing compressed files.
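As a minimal sketch, a data file can be compressed with Python's standard gzip module before you deposit it. This assumes gzip is an accepted compressed format; the text above does not say which formats the loader accepts, so check with the curators if in doubt. The function name `compress_for_upload` and the file name in the usage line are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def compress_for_upload(path: str) -> Path:
    """Write a gzip-compressed copy of `path` next to it (data.gpr -> data.gpr.gz)."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    # Stream the file in chunks so large result files need not fit in memory.
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

For example, `compress_for_upload("array42.gpr")` produces `array42.gpr.gz`, which you would then upload and, once it is safely on loader, delete the uncompressed copy.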
Entering an experiment requires that the files be in the "incoming" directory of your "loader.princeton.edu" account, so you need an account on this system (an SFTP loader account) to enter data. If you have never logged on to loader before, you may not have an account there. Keep in mind that a loader account is different from a PUMAdb database access account. Please refer to the PUMAdb Account and Access help page for additional information on PUMAdb accounts, viewing data, and entering experiments.
You can provide proxy files for the TIFF or grid files if the originals were lost. An archive of usable "proxy" files can be downloaded here; upload them to your incoming directory before loading.
With some experiments, the program that makes the proxy image fails when the experiments are entered into the database, so these experiments will not have any images in the database. You can upload a preferred replacement image or ask the curators for assistance.
Please send an e-mail to the curators at array@princeton.edu.
To download data from a publication, follow the Published data link on the PUMAdb homepage. The "PUMAdb" link in the last column takes you to the page where you can download the complete set of data in a compressed format ("raw data"). You can also display or analyze the arrays by choosing the appropriate buttons at the bottom. Alternatively, select "publications", the organism and the reference in the basic search window; the "display data" button then takes you to the window where you can download the data.
The Retrieving Public Data from PUMAdb help will give you more detailed information on this topic.
Database users can retrieve data from a print by using "method 2" and selecting the desired print name on the Advanced Results Search page.
>How do I get gene names or other biological information to show up on my clusters?
After selecting the experiments you wish to cluster, you are presented with a page of clustering options. These include "Gene Selection Options", "Gene Filtering", "Biological data To Select", and "Data Selection Options". Within the third field, "Biological data To Select", you can choose which annotations to display alongside your cluster. For example, for yeast you may display the gene name, process, and function. The default clustering display does not include biological annotations, only the systematic name. Also, biological information contained within PUMAdb varies depending on the level of annotation for your organism of interest.
The asterisks are there to warn users. A number of clones map to more than one UniGene cluster. These clones are often referred to as "chimeric clones" because their 5' and 3' ends are most likely derived from different messages. We flag these clones with double asterisks in their GENE_NAME annotation to signify that we are choosing one of the possible annotations. Users should be cautious when interpreting results from these clones (and not take them *too* seriously), and may want to study them more closely.
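If you retrieve results as a tab-delimited file, the flagged clones can be separated out for closer study. This is a minimal sketch, assuming the file has a GENE_NAME column and that a flagged name contains the literal string "**" somewhere in it (the exact placement of the asterisks is an assumption). The function name `split_by_chimeric_flag` and the ID column in the test data are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_by_chimeric_flag(tsv_text: str, column: str = "GENE_NAME") -> Tuple[List[Row], List[Row]]:
    """Split tab-delimited rows into (flagged, unflagged) by a '**' marker in `column`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged: List[Row] = []
    clean: List[Row] = []
    for row in reader:
        # Missing or empty annotations count as unflagged.
        name = row.get(column) or ""
        (flagged if "**" in name else clean).append(row)
    return flagged, clean
```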
Using the data analysis pipeline, the data must first be retrieved and the .pcl file placed in your repository. From the repository, click on for the .pcl file to be converted to yeast ORF names. Choose the curated synthetic gene list, "ORFs", with the radio button "Remove all original data". Click the "Calculate Synthetic Genes" button, and then either download the resulting PreClustering file or proceed to gene filtering. For more on the potential uses of collapsing data by "synthetic" gene(s), please see the help page.
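To illustrate what collapsing by synthetic gene does conceptually, the sketch below groups probe rows by ORF and averages each experiment column, ignoring missing values. The averaging rule, the dropping of unmapped probes (by analogy with "Remove all original data"), and the function name `collapse_to_synthetic_genes` are assumptions for illustration only; the pipeline's own combining rule is described on its help page and may differ.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = List[Optional[float]]  # one value per experiment; None marks missing data


def collapse_to_synthetic_genes(data: Dict[str, Row], probe_to_orf: Dict[str, str]) -> Dict[str, Row]:
    """Average all probe rows that map to the same ORF (assumed combining rule)."""
    groups: Dict[str, List[Row]] = {}
    for probe, values in data.items():
        orf = probe_to_orf.get(probe)
        if orf is None:
            continue  # probes without an ORF are not carried into the output
        groups.setdefault(orf, []).append(values)

    collapsed: Dict[str, Row] = {}
    for orf, rows in groups.items():
        out: Row = []
        for j in range(len(rows[0])):  # assumes every row has the same length
            present = [r[j] for r in rows if r[j] is not None]
            out.append(mean(present) if present else None)
        collapsed[orf] = out
    return collapsed
```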
New users are often unaware of historical microarray conventions, which provided the specification for the database model.
>How can I find out what the PUMAdb column names mean?
>For example, what is the LFRAT column?
Explanations of the column names can be found in the ScanAlyze manual in the MicroArray software section.
Other information is available from the database table specifications section; data columns are described in the results table.
A SUID (Sequence Unique IDentifier) is a number used to uniquely identify a PCR product, oligo, or clone used as an element within an array design (i.e. a spotted probe or reporter). A SUID is primarily for internal use, but registered users can link from external applications using a SUID with the template URL
http://puma.princeton.edu/cgi-bin/search/nameSearch.pl?suid=SUID
For example: http://puma.princeton.edu/cgi-bin/search/nameSearch.pl?suid=5456
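A link of this form can also be built programmatically. This small sketch fills in the template URL above using Python's standard `urllib.parse.urlencode`; the function name `suid_url` is hypothetical.

```python
from urllib.parse import urlencode

# Template URL from the text above; only the suid parameter varies.
NAME_SEARCH = "http://puma.princeton.edu/cgi-bin/search/nameSearch.pl"


def suid_url(suid: int) -> str:
    """Return the PUMAdb name-search link for one SUID."""
    return f"{NAME_SEARCH}?{urlencode({'suid': suid})}"
```

Calling `suid_url(5456)` reproduces the example link above.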
Data entered before January 2001 do not contain these median results. Before then, our default file format was based on ScanAlyze, which does not produce these columns. At the beginning of 2001, we changed our database so that it could store all columns produced by GenePix, which most people entering data into PUMAdb use, while continuing to support ScanAlyze.
Background is determined on a local, spot-by-spot basis.
If you are using GenePix, the background pixels include all of the pixels within a circular region that surrounds the feature of interest, unless they meet one of the conditions listed below. The circular region has a diameter three times the diameter of the current feature indicator. A pixel is excluded from the background calculation if any of the following is true:
1) the pixel resides in a neighboring feature indicator;
2) the pixel is not wholly outside a two-pixel-wide ring around a feature indicator;
3) the pixel is within the feature indicator of interest.
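The exclusion rules above can be sketched in Python for a single feature. This is a simplified approximation, not GenePix's own implementation: feature indicators are modeled as circles given by center and diameter in pixels, "wholly outside the ring" is approximated by measuring distances from pixel centers, and the median is used as the summary statistic. The names `background_pixels` and `local_background` are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

# A feature indicator: (center_x, center_y, diameter), all in pixels.
Feature = Tuple[float, float, float]


def background_pixels(feature: Feature, neighbors: Sequence[Feature],
                      ring_width: float = 2.0) -> List[Tuple[int, int]]:
    """Pixel coordinates used for the local background of `feature`."""
    cx, cy, diameter = feature
    region_radius = 1.5 * diameter  # region diameter = 3 x feature diameter
    indicators = [feature, *neighbors]
    pixels = []
    for y in range(math.floor(cy - region_radius), math.ceil(cy + region_radius) + 1):
        for x in range(math.floor(cx - region_radius), math.ceil(cx + region_radius) + 1):
            if math.hypot(x - cx, y - cy) > region_radius:
                continue  # outside the circular background region
            # Exclude pixels inside any feature indicator (rules 1 and 3) or
            # within ring_width of its edge (rule 2), measured from pixel centers.
            if any(math.hypot(x - fx, y - fy) <= fd / 2 + ring_width
                   for fx, fy, fd in indicators):
                continue
            pixels.append((x, y))
    return pixels


def local_background(image: Sequence[Sequence[float]], feature: Feature,
                     neighbors: Sequence[Feature] = ()) -> float:
    """Median intensity of the background pixels that fall inside the image."""
    values = [image[y][x] for x, y in background_pixels(feature, neighbors)
              if 0 <= y < len(image) and 0 <= x < len(image[0])]
    return median(values)
```

Because the region and the exclusion rings are recomputed for every feature, each spot gets its own background estimate, which is what "local, spot-by-spot" means here.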
The top features are determined by BLASTing the oligos against the genome and then examining the features in a window around the hit area. The algorithm we use is described here.
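The windowing step can be illustrated with a hypothetical sketch: given one BLAST hit in genome coordinates, keep the features that overlap the hit extended by a window on each side, and order them by distance from the hit. The window size, the distance-based ordering, the tuple layouts and the function name `features_near_hit` are all assumptions; the actual PUMAdb algorithm is the one referenced above.

```python
from typing import List, Tuple

Hit = Tuple[str, int, int]               # (chromosome, start, end)
GenomeFeature = Tuple[str, str, int, int]  # (feature_id, chromosome, start, end)


def features_near_hit(hit: Hit, features: List[GenomeFeature], window: int) -> List[str]:
    """IDs of features overlapping hit +/- window, nearest midpoint first."""
    chrom, start, end = hit
    lo, hi = start - window, end + window
    hit_mid = (start + end) / 2
    nearby = [
        (abs((fstart + fend) / 2 - hit_mid), fid)
        for fid, fchrom, fstart, fend in features
        # Overlap test between [fstart, fend] and the widened window [lo, hi].
        if fchrom == chrom and fstart <= hi and fend >= lo
    ]
    return [fid for _, fid in sorted(nearby)]
```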
Information on the hardware and software used by PUMAdb is available on PUMAdb's About page