Help: Comparing Two Experiments
Contents
- Overview
- Setting up the comparison
- Interpreting the comparison
Overview
This tool compares measured values from two arrays/hybridizations.
This gives a global view of the consistency between two samples
(usually replicates or reverse-dye replicates, whether technical or
biological). The measurement to compare, usually log ratios for
two-color data or signal for single-channel data, and any filters to
be applied are configurable by the user. Only data from the same
print run or array design may be compared.
The tool fits a simple linear regression to all data that pass the
filters, providing a measure of concordance between the two arrays.
All data are plotted, with features that pass the filters displayed
in blue, and those that fail displayed in red.
Setting up the comparison
The first page of the tool provides several configuration options.
Select array for comparison
All accessible arrays from the same print run are displayed in one or two
scrolling lists. One list contains arrays belonging to you, the other contains
arrays belonging to others. Only one array may be selected for comparison.
By default, the selected array is considered a simple replicate, and
values are compared directly. You may select the "reverse replicate"
option, meaning that the equivalent samples have the opposite dye on
the second array. In this case, only the log ratio values from the
second array will be inverted (negated) prior to comparison. This option has no
effect on single-channel data, or on any measurement except log
ratios.
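The effect of the "reverse replicate" option can be illustrated with a minimal Python sketch. The function name `orient_log_ratios` and the use of `None` for missing values are assumptions made for illustration; they are not part of the tool.

```python
def orient_log_ratios(log_ratios, reverse_replicate=False):
    """Return the second array's log ratios, negated for a dye swap.

    Hypothetical helper: missing values are represented as None (an assumption).
    """
    if not reverse_replicate:
        return list(log_ratios)
    # Inverting a ratio (A/B -> B/A) negates its logarithm.
    return [None if value is None else -value for value in log_ratios]


# Example values are invented for illustration only.
second_array = [1.2, -0.4, None, 2.0]
print(orient_log_ratios(second_array, reverse_replicate=True))
```

Only the second array is passed through this step; values from the first array are used as they are.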
Select measured data to compare
A limited list of measurements is available for comparison: mean
intensity or summary signal for single-channel data; and various
foreground and background intensities, and ratios, for two-color data.
Except as noted in the list, all values will be log-transformed (base
10) in the plot. This is generally more appropriate than using
untransformed values for the regression calculations.
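As a rough illustration of the base-10 transformation, the sketch below converts raw intensities to log values. How the tool itself treats zero, negative, or missing values is not stated here, so returning `None` for them is an assumption, as is the helper name `log10_or_none`.

```python
import math


def log10_or_none(value):
    """Return log10(value), or None when the logarithm is undefined."""
    # Assumption: missing or non-positive values are treated as "no data".
    if value is None or value <= 0:
        return None
    return math.log10(value)


# Example intensities are invented for illustration only.
intensities = [1500.0, 230.0, 0.0, None]
transformed = [log10_or_none(v) for v in intensities]
print(transformed)
```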
Select filters
The standard set of filtering options is available. All data will be
presented, but features that do not pass the filters will be
presented in red and will not be included in the regression
calculation.
For an indication of array quality, you might wish to filter out only
relatively bad spots, in order to see how well the bulk of the data
agree between the two arrays. Generally speaking, non-responsive
genes might not be expected to have any particular correlation
(especially in two-color ratio measurements), so to examine
concordance between responsive genes you should filter for features with a
high absolute log ratio.
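The idea of filtering for responsive features can be sketched as follows. This is only an illustration of an absolute log ratio cutoff applied to both arrays; the function name `split_by_abs_log_ratio` and the threshold of 0.5 are hypothetical and do not correspond to a specific filter setting in the tool.

```python
def split_by_abs_log_ratio(pairs, threshold):
    """Split (array 1, array 2) log-ratio pairs into passed and failed lists."""
    passed, failed = [], []
    for ratio_1, ratio_2 in pairs:
        # A feature passes only if it clears the threshold on both arrays.
        if abs(ratio_1) >= threshold and abs(ratio_2) >= threshold:
            passed.append((ratio_1, ratio_2))
        else:
            failed.append((ratio_1, ratio_2))
    return passed, failed


# Example values are invented for illustration only.
pairs = [(1.1, 0.9), (0.05, -0.02), (-0.8, -1.0), (0.6, 0.1)]
passed, failed = split_by_abs_log_ratio(pairs, threshold=0.5)
print(len(passed), "passed;", len(failed), "failed")
```

In terms of the plot, the `passed` features would be drawn in blue and used in the regression, while the `failed` features would be drawn in red and excluded from it.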
Interpreting the comparison
The second page of the tool presents some summary data, a scatter plot
of the data, and the regression calculation indicating concordance
between the two arrays.
The plot
All data are log-transformed, as noted above. Features that passed
the filters in both arrays are shown in blue, while those that failed
in at least one array are shown in red. Features with no data in either array are omitted
from the plot.
The regression calculation
A linear regression is calculated for all data that pass the filters
in both arrays. In principle, the regression line for replicate
experiments should have a slope of 1 and an intercept of 0; the y=x
line is plotted (dashed) as a reference. (Reverse replicates should
have a slope of -1 for log-ratio data, but choosing the "reverse
replicate" option will invert the values for the second array, making
this ideal slope 1 for easier comparison.) In addition to the slope
and intercept, the coefficient of determination (R²)
indicates the overall strength of the correlation: 1 is perfect
correlation, 0 is no correlation. High-quality replicates should have
an R² greater than 0.5.
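For readers who want to reproduce the numbers outside the tool, the following is a minimal sketch of a simple least-squares fit returning slope, intercept, and R². It is not the tool's own code; the function name `linear_regression` and the example values are assumptions for illustration.

```python
def linear_regression(xs, ys):
    """Fit y = slope * x + intercept by least squares; also return R squared."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("values on each array must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R squared is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Example log ratios (invented) for features passing the filters on both arrays.
array_1 = [-1.5, -0.7, 0.2, 0.9, 1.8]
array_2 = [-1.3, -0.9, 0.1, 1.1, 1.6]
slope, intercept, r_squared = linear_regression(array_1, array_2)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

Well-matched replicates would be expected to give a slope near 1 and an intercept near 0, which is what the dashed y=x reference line represents.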
The regression line is fit by the method of least squares,
making it vulnerable to outlier values. Visual inspection of the plot
may reveal that the overall correlation is better than the best-fit
line suggests, especially if you concentrate on the apparently
responsive features (those with a high or low log-ratio).
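The sensitivity of a least-squares fit to outliers can be seen in a small sketch. The data are invented: five features agree perfectly between the two arrays, and a single discordant feature is then added. The helper name `least_squares_slope` is hypothetical.

```python
def least_squares_slope(xs, ys):
    """Return the least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Five concordant features (invented values).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-2.0, -1.0, 0.0, 1.0, 2.0]
slope_clean = least_squares_slope(xs, ys)

# The same features plus one strongly discordant feature.
slope_with_outlier = least_squares_slope(xs + [3.0], ys + [-3.0])

print(f"without outlier: {slope_clean:.2f}")
print(f"with outlier:    {slope_with_outlier:.2f}")
```

A single extreme point pulls the fitted slope well away from 1 even though the other features agree exactly, which is why the plot is worth inspecting alongside the regression values.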