Help : Comparing Two Experiments

Contents

Overview
Setting up the comparison

Select array for comparison
Select measured data to compare
Select filters

Interpreting the comparison

The plot
The regression calculation

Overview

This tool compares measured values from two arrays/hybridizations. This gives a global view of the consistency between two samples (usually replicates or reverse-dye replicates, whether technical or biological). The measurement to compare, usually log ratios for two-color data or signal for single-channel data, and any filters to be applied are configurable by the user. Only data from the same print run or array design may be compared.

The tool fits a simple linear regression to all data that pass the filters, providing a measure of concordance between the two arrays. All data are plotted, with features that pass the filters displayed in blue and those that fail displayed in red.

Setting up the comparison

The first page of the tool provides several configuration options.

Select array for comparison

All accessible arrays from the same print run are displayed in one or two scrolling lists. One list contains arrays belonging to you, the other contains arrays belonging to others. Only one array may be selected for comparison.

By default, the selected array is considered a simple replicate, and values are compared directly. You may instead select the "reverse replicate" option, which indicates that the equivalent samples carry the opposite dye on the second array. In this case, the log ratio values from the second array (and only the second array) are inverted prior to comparison. This option has no effect on single-channel data, or on any measurement other than log ratios.
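As an illustration of the reverse-replicate adjustment, here is a minimal sketch. It assumes that "inverting" a log ratio means negating it, since log(G/R) = -log(R/G); the function name and sample values are hypothetical and are not taken from the tool itself.

```python
def align_second_array(log_ratios, reverse_replicate=False):
    """Return the second array's log ratios, negated for a dye-swap replicate.

    Assumption: inverting a log ratio means flipping its sign.
    Missing values (None) are passed through unchanged.
    """
    if reverse_replicate:
        return [None if v is None else -v for v in log_ratios]
    return list(log_ratios)


# Hypothetical log ratios for three features on a dye-swapped second array.
second_array = [-1.1, 0.9, None]
aligned = align_second_array(second_array, reverse_replicate=True)
```

After the adjustment, a concordant dye-swap pair should fall near the y=x line rather than near y=-x.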

Select measured data to compare

A limited list of measurements is available for comparison: mean intensity or summary signal for single-channel data; and various foreground and background intensities, and ratios, for two-color data. Except as noted in the list, all values will be log-transformed (base 10) in the plot. This is generally more appropriate than using untransformed values for the regression calculations.
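The base-10 transformation can be sketched as follows. How the tool itself handles zero, negative, or missing values is not documented here, so treating them as "no data" is an assumption of this example, as is the function name.

```python
import math


def log10_or_none(value):
    """Base-10 log of a measurement.

    Assumption: missing or non-positive values have no logarithm and are
    reported as None (no data) rather than raising an error.
    """
    if value is None or value <= 0:
        return None
    return math.log10(value)


# Hypothetical raw intensities for four features.
raw_intensities = [1000.0, 10.0, 0.0, None]
log_intensities = [log10_or_none(v) for v in raw_intensities]
```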

Select filters

The standard set of filtering options is available. All data will be presented, but features that do not pass the filters will be presented in red and will not be included in the regression calculation. For an indication of array quality, you might wish to filter out only relatively bad spots, in order to see how well the bulk of the data agree between the two arrays. Generally speaking, non-responsive genes might not be expected to have any particular correlation (especially in two-color ratio measurements), so to examine concordance between responsive genes you should filter for features with a high absolute log ratio.
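A filter on absolute log ratio could look like the sketch below. The threshold value, the function name, and the rule that a feature must pass on both arrays are assumptions made for illustration; the tool's actual filter options are chosen on its first page.

```python
def classify_features(pairs, min_abs_log_ratio=1.0):
    """Split (log_ratio_a, log_ratio_b) pairs into passing and failing lists.

    Assumption: a feature passes only if its absolute log ratio meets the
    (hypothetical) threshold on both arrays.
    """
    passed, failed = [], []
    for a, b in pairs:
        if abs(a) >= min_abs_log_ratio and abs(b) >= min_abs_log_ratio:
            passed.append((a, b))  # would be drawn in blue and used in the fit
        else:
            failed.append((a, b))  # would be drawn in red and left out of the fit
    return passed, failed


# Hypothetical log ratios for four features on two arrays.
pairs = [(1.5, 1.3), (0.1, -0.2), (-2.0, -1.8), (1.2, 0.4)]
passed, failed = classify_features(pairs)
```

Only the features in `passed` would contribute to the regression, while every feature would still appear in the plot.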

Interpreting the comparison

The second page of the tool presents some summary data, a scatter plot of the data, and the regression calculation indicating concordance between the two arrays.

The plot

All data are log-transformed, as noted above. Features that passed the filters in both arrays are shown in blue, while those that failed in either array are shown in red. Features that lack data in either array are omitted from the plot.

The regression calculation

A linear regression is calculated for all data that pass the filters in both arrays. In principle, the regression line for replicate experiments should have a slope of 1 and an intercept of 0; the y=x line is plotted (dashed) as a reference. (Reverse replicates should have a slope of -1 for log-ratio data, but choosing the "reverse replicate" option will invert the values for the second array, making the ideal slope 1 for easier comparison.) In addition to the slope and intercept, the regression correlation (R²) indicates the overall strength of the correlation: 1 is perfect correlation, 0 is no correlation. High-quality replicates should have an R² greater than 0.5.
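For readers who want to reproduce the numbers by hand, the following is a minimal sketch of an ordinary least-squares fit that returns the slope, intercept, and R². It uses the standard textbook formulas and is not the tool's own code; the function name and sample values are hypothetical.

```python
def linear_regression(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared). Requires at least two points
    with some spread in both xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical log-transformed values for four features that passed the filters.
array_1 = [0.0, 1.0, 2.0, 3.0]
array_2 = [0.1, 0.9, 2.1, 2.9]
slope, intercept, r_squared = linear_regression(array_1, array_2)
```

For these near-identical replicates the slope is close to 1, the intercept is close to 0, and R² is close to 1, which is the ideal pattern described above.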

The regression line is fit by the method of least squares, which makes it vulnerable to outlier values. Visual inspection of the plot may reveal that the overall correlation is better than the best-fit line suggests, especially if you concentrate on the apparently responsive features (those with a high or low log ratio).
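The sensitivity to outliers can be seen in a small, made-up example: five perfectly concordant features give a slope of 1, and adding a single discordant feature pulls the least-squares slope far from that value. The data and function name are hypothetical.

```python
def least_squares_slope(points):
    """Slope of the ordinary least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Five features that agree exactly between the two arrays.
concordant = [(-2.0, -2.0), (-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
# The same features plus one strongly discordant outlier.
with_outlier = concordant + [(3.0, -5.0)]

slope_clean = least_squares_slope(concordant)
slope_outlier = least_squares_slope(with_outlier)
```

Here one bad feature out of six is enough to turn the fitted slope negative, even though the remaining points still lie exactly on the y=x line.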
