PUMAdb: Help: Merge PCL Files

Contents

Description
Combine or distinguish arrays with the same identifier
Translate or match gene identifiers
Compute averages

Related Help Documents

Data Repository: A description of your repository for data and analyses, with instructions for using it.
File Formats: Information about preclustering (.pcl), clustered data table (.cdt), gene tree (.gtr) and array tree (.atr) files generated in the process of clustering data.

Description

The PCL merge tool allows you to combine PCL files. You may upload one or two files from your desktop computer, and/or select one or more PCL deposits from your repository. This process may be useful for co-clustering your data with data from another source, for adding rows of clinical values to gene expression data, etc. When the merging process is complete, you may download the new file, enter it into your repository, or cluster it.

Combine or distinguish arrays with the same identifier

You must choose what to do with columns (experiments) that appear in different files but share the same identifier. You may merge them, averaging the values for any rows (genes) they have in common, or make them distinct by appending the name of the file from which each came. Merging columns is a convenient way to combine data if you have hybridized a single sample to two or more microarrays comprising a single "chip set," e.g., Affymetrix HG-U133A and HG-U133B arrays. Note that in this case you would likely have to edit at least one of the files by hand to make the column headers (array identifiers) the same for the two arrays in each chip set. Note also that two columns with the same identifier in a single file will always be merged, whether or not you select the merge option.
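The two choices can be illustrated with a minimal Python sketch. This is not the tool's actual implementation: the function name combine_columns, the in-memory data layout, and the "_filename" suffix format are all assumptions made for illustration, and the mean is used here only as a stand-in for whichever averaging method you select.

```python
from collections import Counter, defaultdict
from statistics import mean


def combine_columns(files, merge):
    """Combine columns from several files (illustrative sketch only).

    files: {file name: [(column identifier, {row identifier: value}), ...]}
    merge: True to merge same-named columns across files,
           False to keep them distinct by appending the file name.
    """
    # Count how many different files contain each column identifier.
    files_with = Counter()
    for columns in files.values():
        for col in {col for col, _ in columns}:
            files_with[col] += 1

    combined = defaultdict(lambda: defaultdict(list))
    for file_name, columns in files.items():
        for col, rows in columns:
            shared = files_with[col] > 1
            # Assumption: only identifiers shared between files get a suffix,
            # and the suffix format "<column>_<file>" is hypothetical.
            key = col if (merge or not shared) else f"{col}_{file_name}"
            # Duplicate identifiers within one file map to the same key,
            # so they are always merged regardless of the option.
            for row_id, value in rows.items():
                combined[key][row_id].append(value)

    # Average the values collected for each row of each column.
    return {
        key: {row_id: mean(values) for row_id, values in rows.items()}
        for key, rows in combined.items()
    }


# Hypothetical chip-set files whose column headers were edited to match.
example = {
    "u133a.pcl": [("sample1", {"g1": 1.0, "g2": 2.0})],
    "u133b.pcl": [("sample1", {"g2": 4.0, "g3": 5.0})],
}
merged = combine_columns(example, merge=True)
distinct = combine_columns(example, merge=False)
```

With merge=True the result has a single sample1 column in which the shared row g2 is averaged. With merge=False the two columns are kept apart under file-specific names.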

Translate or match gene identifiers

You may optionally provide a "translation file" to match up different row identifiers (UIDs or gene names) in the various files. For example, if you are combining data from spotted cDNA arrays and Affymetrix GeneChips™, you might want to translate the Affymetrix probe set names into the nearest equivalent clone ID, or translate both into their corresponding UniGene clusters. To do this, you may upload a tab-delimited text file in the following format:

Column	Contents	Example
1	Final desired identifier	Hs.408312
2	Final desired annotation	TP53
3 and onward	Identifiers to be translated to the final desired identifier (one per column)	IMAGE:1208978, IMAGE:1508462, ...

Identifiers may appear on multiple rows, but note that this may cause a single row in an original file to contribute to more than one row in the final file. This may be the desired outcome if you are doing something more complex than a simple translation. Identifiers not found in the translation file will be preserved unchanged.
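The translation rules above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the tool's own code: the function names load_translation and translate are hypothetical, rows with fewer than three columns are simply skipped, and the second row of the sample file (GROUP_X) is invented to show an identifier appearing on more than one row.

```python
from collections import defaultdict


def load_translation(text):
    """Parse tab-delimited translation text into
    {original identifier: [(final identifier, final annotation), ...]}."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # assumption: rows with nothing to translate are ignored
        final_id, annotation, *originals = fields
        for original in originals:
            # Skip empty cells and avoid recording the same target twice.
            if original and (final_id, annotation) not in mapping[original]:
                mapping[original].append((final_id, annotation))
    return dict(mapping)


def translate(row_id, mapping):
    """Return every final identifier that a row contributes to."""
    targets = mapping.get(row_id)
    if not targets:
        return [row_id]  # identifiers not in the file are preserved unchanged
    return [final_id for final_id, _ in targets]


# First row uses the identifiers from the table above; the second row is made up.
sample = (
    "Hs.408312\tTP53\tIMAGE:1208978\tIMAGE:1508462\n"
    "GROUP_X\tillustrative group\tIMAGE:1208978\n"
)
mapping = load_translation(sample)
```

In this sketch, translate("IMAGE:1208978", mapping) returns two final identifiers because that clone appears on two rows, so one original row would contribute to two rows of the final file.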

Compute averages

Finally, you must select the method, mean or median, by which values will be averaged after merging and translation (if any). For example, if two columns are merged, and each contains three rows that are translated to the same final identifier, six values must be averaged to obtain the final value. Either the mean or median will be calculated, according to your selection. Note that each identifier will appear only once in the final file, with averaging as required to produce this result.
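The six-value case described above can be worked through with a short Python sketch. The function name average and the handling of missing values (skipped, here represented as None) are assumptions for illustration, and the numbers are made up.

```python
from statistics import mean, median


def average(values, method="mean"):
    """Collapse all values that map to one final cell into a single number."""
    # Assumption: missing measurements are represented as None and skipped.
    present = [v for v in values if v is not None]
    if not present:
        return None
    if method == "mean":
        return mean(present)
    if method == "median":
        return median(present)
    raise ValueError(f"unknown averaging method: {method!r}")


# Two merged columns, each with three rows translated to the same identifier.
column_a = [1.0, 2.0, 3.0]
column_b = [4.0, 5.0, 15.0]
six_values = column_a + column_b

by_mean = average(six_values, "mean")      # 30.0 / 6 = 5.0
by_median = average(six_values, "median")  # (3.0 + 4.0) / 2 = 3.5
```

The example shows why the choice matters: the single large value pulls the mean up to 5.0, while the median stays at 3.5.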

Gene and experiment weights will also be averaged by the method selected.