PUMAdb : Frequently Asked Questions

Contents

Accounts

Princeton University MicroArray database (PUMAdb) Accounts

How do I acquire a PUMAdb account?
I've forgotten my PUMAdb username and password?
I can't open my PUMAdb account

SFTP account on loader.princeton.edu

Why would I need an SFTP loader account?
How do I acquire an SFTP loader account?
I've forgotten my loader account username and password?
I can't find files on loader
Freeing up space in your account

Data entry

How do I enter an experiment?

How do I get started?
I lack some of the original files required for data loading (e.g. .gps or .tif files), and only have the data (i.e. gpr). How do I get my data loaded, and still "meet" the database's file requirements?
Why don't my recently entered experiments have a clickable proxy image?
How is data normalization accomplished?

How do I enter a new print?
How do I initiate the process of entering a new organism in the database?

Data Retrieval and Analysis

Data Retrieval

How do I find the data for a publication?
How do I retrieve all the experiments for a print?

Data Analysis

Clustering : how do I get biological information included with my clusters?
What do the asterisks mean in front of a Gene Name?
How do I convert sequence identifiers of either Agilent or Affymetrix to the yeast ORF name they are reporting on?

Data and Database questions

Data Questions

What are the channel conventions used by the database?
What do these column names mean?
What is a SUID? Often phrased: What is this random number in my annotations?
Why are some experiments lacking data for the various median results?
Which part of the array is used for determining background?
How are the top features determined for oligos on ChIP-on-chip arrays?

Database questions

What hardware and software does PUMAdb run on?

Accounts
1. Princeton University MicroArray database (PUMAdb) Accounts
  1. How do I acquire a PUMAdb Account?
    Princeton University MicroArray database accounts are given to only:
    - Members of laboratories at Princeton University.
    - Their collaborators.
    PUMAdb currently relies on authentication through Princeton CAS; this ensures a current affiliation with the University and/or an active account sponsored by a Princeton-affiliated principal investigator.
2. SFTP account on "loader.princeton.edu"
  1. Why would I need an SFTP loader account?
    In order to load experiments into the database, the data-files themselves need to be located on our filesystem so that our software can parse and load them. In addition, the arraylists, genelists and logs directories of the loader account add additional database functionality.
  2. How do I acquire an SFTP loader account?
    Ask a database curator at array@princeton.edu.
  3. I've forgotten my loader account username and password?
    The loader and PUMAdb account userid and password should be the same (use lowercase to log in). When you change your password within the database, the change is propagated to loader within 15 minutes. If you have forgotten your login credentials, please email the microarray curators at array@princeton.edu.
  4. Can't Find Files on loader
```
	  > I am trying to batch submit files and cannot find the sample batch
	  > files. Can you please point me in the right
	  > direction?
```
    If you are using FTP Explorer, the best way to get to a new directory is to go under the Tools menu and select "Go To." You need to enter the full pathname of the file you want.
    If you are using Unix, type "cd " followed by the full pathname of the directory that contains the file.
  5. Freeing up space in your account
```
	  >What should I do if I don't have enough space in my directory?
	  
```
    If you don't have enough space in your directory, you need to delete some files. The following commands may help you.
    - get filename copies a file from your current directory on loader to the directory from which you connected to loader. The file is not removed from your current directory on loader.
    - delete filename removes the specified file from your current directory on loader. Once a file has been removed, that copy can no longer be accessed. Make sure you no longer need this file or have copied it elsewhere before using the delete command. SFTP will not ask you to confirm this command before completing it.
    You can also save space by depositing compressed files.
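    As a minimal sketch of the compression suggestion above, the following Python snippet writes a gzip-compressed copy of a file before you upload it over SFTP. The helper name `gzip_file` and the file name `experiment1.gpr` are hypothetical, and gzip is only an assumed choice of format; ask the curators which compressed formats the loading software accepts.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(path):
    """Write a gzip-compressed copy of `path` next to it and return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    # Stream the bytes so large data files are never held fully in memory.
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

    Calling `gzip_file("experiment1.gpr")` would leave the original untouched and create `experiment1.gpr.gz` beside it, which you can then deposit in place of the uncompressed file.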
Data entry
1. How do I load experiments?
  
  How do I get started?
  Entering an experiment requires that you have the files in the "incoming" directory of your "loader.princeton.edu" account, so you will need an account on this system (an SFTP loader account) to enter data. If you've never logged on to loader before, you may not have an account on loader. Keep in mind that a loader account is different from a PUMAdb database access account. Please refer to the PUMAdb Account and Access help page for additional information on PUMAdb accounts, viewing data, and entering experiments.
  
  I lack some of the original files required for data loading (e.g. .gps or .tif files), and only have the data (i.e. gpr). How do I get my data loaded, and still "meet" the database's file requirements?
  You can provide proxy files for the TIFFs or grid files if the originals were lost. An archive of usable "proxy" files can be downloaded here and used after uploading to your incoming directory.
  
  Why don't my recently entered experiments have a clickable proxy image?
  For some experiments, the software program that makes the proxy image fails when the experiments are entered into the database, so these experiments will not have any images in the database. It is possible to upload a preferred replacement, or to ask the curators for assistance.
  
  How is data normalization accomplished?
  
  For assistance with normalization, please refer to our normalization help page.
  For information on normalization issues and techniques, see the MGED Normalization Working Group page.
2. How do I enter a new print, based on a spotted array design?
  1. First you need to create a list containing essential information about your samples. There exists a help file that explains what information you need to collect.
  2. After depositing the list in your logs folder on loader.princeton.edu, you can validate it using the following program.
  3. If your samples require the assignment of new sequence identifiers (SUIDs), e-mail the database curators (array@princeton.edu) and they will assign them.
  4. After validating the list you will receive a message telling you what additional information you need to e-mail the curators (array@princeton.edu) about the printing. They will create the print for you.
3. How do I initiate the process of entering a new organism in the database?
  Please send an e-mail to the curators at array@princeton.edu.
Data Retrieval and Analysis
1. Data Retrieval
  1. How do I find the data for a publication?
    To download data from a publication, follow the Published data link on the PUMAdb homepage. The "PUMAdb" link in the last column will take you to the page where you can download the complete set of data in a compressed format ("raw data"). You can also display or analyze the arrays by choosing the appropriate buttons at the bottom.
    You can also do this by selecting "publications", organism and reference in the basic search window. The "display data" button will take you to the window where you download the data.
    The Retrieving Public Data from PUMAdb help will give you more detailed information on this topic.
  2. How do I retrieve all the experiments for a print?
    Database users can retrieve data from a print by using "method 2" and selecting the desired print name on the Advanced Results Search page.
2. Data Analysis
  1. Clustering and Biological Annotations
```
	  >how do I have gene names or other biological information show up on my clusters?
```
    After selecting the experiments you wish to cluster, you are presented with a page of clustering options. These include "Gene Selection Options", "Gene Filtering", "Biological data To Select", and "Data Selection Options". Within the third field, "Biological data To Select", you can choose which annotations to display alongside your cluster. For example, for yeast you may display the gene name, process, and function. The default clustering display does not include biological annotations, only the systematic name. Also, biological information contained within PUMAdb varies depending on the level of annotation for your organism of interest.
  2. What do the asterisks mean in front of a Gene Name?
    The asterisks are there to warn users. A number of clones map to more than one UniGene cluster. These clones are often referred to as "chimeric clones" because their 5' and 3' ends are most likely derived from different messages. We flag these clones with double asterisks under their GENE_NAME annotation to signify that we are choosing one of the possible annotations. Users should use caution in interpreting their results from these clones (and not take them *too* seriously), and perhaps study them more closely.
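    If you post-process downloaded annotations, it can help to separate the flag from the name. The sketch below assumes the flag appears as a literal `**` prefix on the GENE_NAME value; the function name `split_chimeric_flag` and the example gene names are hypothetical.

```python
def split_chimeric_flag(gene_name):
    """Return (name_without_flag, is_flagged) for a GENE_NAME value.

    Assumes a chimeric clone is marked by a leading "**"; this is an
    illustrative reading of the flag, not a documented file format.
    """
    flagged = gene_name.startswith("**")
    name = gene_name[2:].strip() if flagged else gene_name
    return name, flagged
```

    With this helper, flagged rows can be set aside for closer inspection rather than silently mixed with unambiguous annotations.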
  3. How do I convert sequence identifiers of either Agilent or Affymetrix to the yeast ORF name they are reporting on?
    Using the data analysis pipeline, retrieve the data and place the .pcl file in your repository. From the repository, click on the icon for the .pcl file to be converted to yeast ORF names. Choose the curated synthetic gene list, "ORFs", with the radio button "Remove all original data". Click the "Calculate Synthetic Genes" button, and then either download the resulting PreClustering file or proceed to gene filtering. For more on the potential uses of collapsing data by "synthetic" gene(s), please see the help page.
Data and Database questions
1. Data Questions
  1. What are the channel conventions used by the database?
    New users are often unaware of historical microarray conventions, which provided the specification for the database model.
    - The control/reference sample was labeled with a green fluorescein dye in channel 1. Nowadays this dye is usually Cy3, but the original convention is the reason channel 1 is usually depicted as green.
    - The experimental sample was labeled with a red rhodamine dye in channel 2. Nowadays this dye is usually Cy5, but the original convention is the reason channel 2 is usually depicted as red.
    - The ratio convention was experimental/control, or red/green. Therefore red typically means "enriched in the sample of interest" and green means a reduced transcript abundance.
    - "Number 1" is typically associated with being "more important", but in the convention, channel 1 is usually the reference/control, and thus perhaps considered "less important/interesting".
    - The more familiar stoplight meme (red==stop; green==go) causes confusion between what is up-regulated and down-regulated, because people naturally associate green with "go"/"up". However, if the historical/default channel convention is followed, the opposite is true.
    So new users must often fight their intuition because the database, like most microarray researchers, follows the historical conventions. Disregarding the conventions can cause issues later, as researchers may have to invert the data in order to compare the assay with data that follow the convention. As such, you should probably flag your data as reverse labeled, to make this inversion easier (if you did not follow the convention). Regardless of what channel convention you follow, or color scheme you choose, in order for you and others to interpret your data, it is critical to document the relationship between sample, dye-labeled extract, and channel.
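    The convention above can be sketched in a few lines of Python. This is an illustration only: the function name `log2_ratio` and the `reverse_labeled` parameter are hypothetical, and the base-2 logarithm is an assumed (though common) way of expressing the red/green ratio, not something the database prescribes here.

```python
import math


def log2_ratio(ch1_intensity, ch2_intensity, reverse_labeled=False):
    """Return log2(experimental / control) for one spot.

    Under the historical convention, channel 1 (green) holds the control
    and channel 2 (red) holds the experimental sample. For a reverse-labeled
    array the roles are swapped, so the ratio is inverted.
    """
    if ch1_intensity <= 0 or ch2_intensity <= 0:
        raise ValueError("intensities must be positive")
    ratio = ch2_intensity / ch1_intensity  # red / green
    if reverse_labeled:
        ratio = 1.0 / ratio
    return math.log2(ratio)
```

    A positive value then always means "enriched in the sample of interest", whichever dye the sample actually carried, which is why recording the reverse-labeled flag matters.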
  2. What do these PUMAdb data abbreviations (column names) mean?
```
	  >How can I find out what the PUMAdb column names mean?
	  >For example, what is the LFRAT column?
```
    Explanations of the column names can be found in the ScanAlyze manual in the MicroArray software section.
    Other information is available from the database table specifications section; data columns are described in the results table.
  3. What is a SUID? Often phrased: What is this random number in my annotations?
    A SUID (Sequence Unique IDentifier) is a number used to uniquely identify a PCR product, oligo, or clone used as an element within an array design (i.e. a spotted probe or reporter). A SUID is primarily for internal use. Registered users can link from external applications using a SUID with the template URL
```
http://puma.princeton.edu/cgi-bin/search/nameSearch.pl?suid=SUID
```
    For example:
    http://puma.princeton.edu/cgi-bin/search/nameSearch.pl?suid=5456
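    For external applications that generate such links, a small helper along these lines may be convenient. The function name `suid_url` is hypothetical; the base URL and the `suid` parameter are taken from the template above.

```python
from urllib.parse import urlencode

BASE_URL = "http://puma.princeton.edu/cgi-bin/search/nameSearch.pl"


def suid_url(suid):
    """Build the name-search link for a SUID, following the template URL."""
    # int() rejects anything that is not a whole number before it reaches the URL.
    return BASE_URL + "?" + urlencode({"suid": int(suid)})
```

    For example, `suid_url(5456)` reproduces the example link shown above.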
  4. Why are some experiments lacking data for the various median results?
    Data entered before January 2001 does not contain these median result data. Before this time, our default file format was based on ScanAlyze, which does not have these columns. At the beginning of 2001, we changed our database so that it could store all columns produced by GenePix, which is used by most people entering data into PUMAdb, while continuing to support ScanAlyze.
  5. Which part of the array is used for determining background?
    Background is determined on a local, spot-by-spot basis. If you are using GenePix, the background pixels include all of the pixels within a circular region that surrounds the feature of interest, unless they meet one of the conditions listed below. The circular region has a diameter of three times the diameter of the current feature indicator. A pixel is excluded from the background calculation if one of the following is true:
    1) the pixel resides in a neighboring feature-indicator
    2) the pixel is not wholly outside a two pixel wide ring around a feature-indicator
    3) the pixel is within the feature-indicator of interest.
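    The rules above can be approximated with a short geometric test. The sketch below is not GenePix's implementation: it treats each pixel as a point at its centre and each feature-indicator as a circle, and the names `Feature`, `RING_WIDTH`, and `is_background_pixel` are hypothetical.

```python
import math
from typing import NamedTuple


class Feature(NamedTuple):
    cx: float        # centre x, in pixels
    cy: float        # centre y, in pixels
    diameter: float  # feature-indicator diameter, in pixels


RING_WIDTH = 2.0  # two-pixel-wide exclusion ring around each feature-indicator


def is_background_pixel(x, y, feature, neighbors):
    """Return True if the pixel centred at (x, y) counts towards local background."""
    # The background region has three times the feature diameter,
    # i.e. a radius of 1.5 * diameter around the feature centre.
    if math.hypot(x - feature.cx, y - feature.cy) > 1.5 * feature.diameter:
        return False
    # Exclude pixels inside any feature-indicator or its two-pixel ring.
    # For point-like pixels this covers conditions 1, 2 and 3 at once.
    for f in (feature, *neighbors):
        if math.hypot(x - f.cx, y - f.cy) <= f.diameter / 2 + RING_WIDTH:
            return False
    return True
```

    The local background for a spot would then be summarized from the intensities of the pixels for which this test returns True.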
  6. How are the top features determined for oligos on ChIP-on-chip arrays?
    The top features are determined by BLASTing the oligos to the genome and then examining the features in a window around the hit area. The algorithm we use is described here.
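    To illustrate only the "window around the hit area" idea, the sketch below collects annotated features that overlap a BLAST hit once the hit is extended on both sides. It is not the PUMAdb algorithm: the function name `features_in_window`, the tuple layout, and the default window of 500 bp are all assumptions, and how the candidates are then ranked into "top" features is not covered.

```python
def features_in_window(hit_start, hit_end, features, window=500):
    """Return names of features overlapping the hit region extended by `window` bp.

    `features` is an iterable of (name, start, end) tuples on the same
    chromosome as the hit, with start <= end.
    """
    lo = hit_start - window
    hi = hit_end + window
    # Two intervals overlap when each one starts before the other ends.
    return [name for name, start, end in features if start <= hi and end >= lo]
```

    A hit at positions 1000 to 1060 with the default window, for instance, would pick up any feature that overlaps the range 500 to 1560.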
2. Database questions
  1. What hardware and software does PUMAdb run on?
    Information on the hardware and software used by PUMAdb is available on PUMAdb's About page.