PUMAdb : Data Analysis and Clustering Help



Description

The Clustering and Image Generation tools at PUMAdb allow you to select methods to organize, center, cluster, and display microarray data. At this time, you must first select and filter the data you want to analyze using the Basic Search or Advanced Results Search. This help page describes the available algorithms and how to choose among them.

Partitioning Data

When organizing your data for analysis, you can elect to treat all the microarray data as a single group, or you can separate the data into related groups, or partitions. Hierarchical clustering can be performed on either partitioned or unpartitioned data. When the data are partitioned, hierarchical clustering does not join data in different partitions, so potentially unrelated patterns remain independent.
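The effect of partitioning can be sketched in a few lines of Python. This is only an illustration, not PUMAdb's implementation: the average-linkage rule, the Euclidean distance, the function names (`agglomerate`, `cluster_by_partition`) and the sample labels are all assumptions made for the example. The point it shows is that each partition gets its own tree, so items in different partitions are never joined even when their profiles are close.

```python
import math
from collections import defaultdict

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def agglomerate(names, vectors):
    """Naive average-linkage clustering; returns a nested-tuple tree.

    Average linkage is an assumption for this sketch only.
    """
    # Each cluster is (tree, list_of_member_names).
    clusters = [(name, [name]) for name in names]

    def linkage(c1, c2):
        dists = [euclidean(vectors[a], vectors[b]) for a in c1[1] for b in c2[1]]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

def cluster_by_partition(vectors, partition_of):
    """Cluster each partition separately, so no join crosses a partition."""
    groups = defaultdict(list)
    for name in vectors:
        groups[partition_of[name]].append(name)
    return {label: agglomerate(members, vectors) for label, members in groups.items()}

# Hypothetical data: normal1 looks very much like the tumor samples.
vectors = {
    "tumor1": [1.0, 1.0],
    "tumor2": [1.2, 0.9],
    "normal1": [1.1, 1.0],
    "normal2": [5.0, 5.0],
}
partition_of = {"tumor1": "tumor", "tumor2": "tumor",
                "normal1": "normal", "normal2": "normal"}

unpartitioned = agglomerate(list(vectors), vectors)
partitioned = cluster_by_partition(vectors, partition_of)
print(unpartitioned)
print(partitioned)
```

In the unpartitioned run, normal1 is joined with tumor1 first because their profiles are closest; in the partitioned run, it can only be joined with normal2.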

Distance Metrics and Centering

The most obvious method for comparing sets of microarray data is to compare the data profiles. A gene expression profile can be imagined as a vector of n dimensions, where n is the number of microarray measurements for that gene and the coordinates are the results of each measurement of that gene. By comparing these vectors, we can detect which genes show similar data profiles across a series of experiments. Similarly, experiments can be thought of as vectors of m dimensions, where m is the number of genes and each coordinate is the measurement of a single gene on that microarray. Comparing the array vectors shows which arrays behaved most similarly.

Once we consider these data as vectors, we can use standard mathematical techniques to measure their similarity. PUMAdb uses two distance metrics to measure the similarity between vectors: Pearson Correlation and Euclidean distance.

Pearson Correlation treats the vectors as if they were the same (unit) length, and is thus insensitive to the amplitude of changes that may be seen in the expression profiles. The Euclidean distance measures the absolute distance between two points in space, which in this case are defined by two vectors. Note that Euclidean distance is affected by both the direction and the amplitude of the vectors, so two genes that are coordinately expressed might not appear similar if one has a much higher signal than the other.
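The contrast between the two metrics can be seen in a small Python sketch. The profiles below are made-up values, and the functions are textbook definitions rather than PUMAdb's own code. gene_b has the same shape as gene_a at four times the amplitude, while gene_c has similar magnitudes to gene_a but the opposite shape.

```python
import math

def pearson(x, y):
    """Centered Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

def euclidean(x, y):
    """Straight-line distance between the end points of two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical expression profiles over four arrays.
gene_a = [1.0, 2.0, 3.0, 2.0]
gene_b = [4.0, 8.0, 12.0, 8.0]   # same shape as gene_a, 4x the amplitude
gene_c = [3.0, 2.0, 1.0, 2.0]    # similar magnitude, opposite shape

print("pearson   a-b:", pearson(gene_a, gene_b))
print("pearson   a-c:", pearson(gene_a, gene_c))
print("euclidean a-b:", euclidean(gene_a, gene_b))
print("euclidean a-c:", euclidean(gene_a, gene_c))
```

Pearson correlation rates gene_a and gene_b as perfectly correlated despite the amplitude difference, whereas Euclidean distance places gene_a closer to the anti-correlated gene_c than to gene_b.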

Making a file of well-correlated genes

You can make a list of genes with well-correlated data profiles within your dataset. Choose an upper limit on the number of genes to display, the correlation threshold, and the method of correlation (either centered or non-centered). Be aware that, unless you applied gene filters (for example, "Only use genes with greater than 80% good data") during Data Selection for Analysis, genes with only a single datapoint will be well-correlated to all other genes. The list of correlated genes is available for download; its filename ends with a ".stdCor" extension.
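The following Python sketch illustrates the three choices (limit, threshold, centered or non-centered) and why sparse genes need filtering. It is not PUMAdb's code: the function names, the use of `None` for missing data, the comparison over positions where both genes have data, and the reporting of gene pairs are all assumptions made for the example. Under these assumptions, a gene with a single datapoint gets a non-centered correlation of magnitude 1 with every other gene, because two one-dimensional vectors always point along the same line.

```python
import math
from itertools import combinations

def correlation(x, y, centered=True):
    """Correlation over positions where both profiles have data (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return None
    xs, ys = zip(*pairs)
    if centered:
        # Centered: subtract each profile's mean before comparing.
        mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
        xs = [a - mean_x for a in xs]
        ys = [b - mean_y for b in ys]
    denominator = math.sqrt(sum(a * a for a in xs) * sum(b * b for b in ys))
    if denominator == 0:
        return None  # undefined, e.g. a flat profile after centering
    return sum(a * b for a, b in zip(xs, ys)) / denominator

def well_correlated(profiles, threshold, limit, centered=True):
    """Return up to `limit` gene pairs with correlation >= threshold, best first."""
    hits = []
    for (g1, p1), (g2, p2) in combinations(profiles.items(), 2):
        r = correlation(p1, p2, centered)
        if r is not None and r >= threshold:
            hits.append((r, g1, g2))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:limit]

# Hypothetical profiles; "sparse" has a single good datapoint.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
    "sparse": [None, None, 0.5, None],
}

non_centered_hits = well_correlated(profiles, threshold=0.95, limit=10, centered=False)
centered_hits = well_correlated(profiles, threshold=0.95, limit=10, centered=True)
for r, g1, g2 in non_centered_hits:
    print("non-centered", g1, g2, round(r, 3))
for r, g1, g2 in centered_hits:
    print("centered", g1, g2, round(r, 3))
```

In this sketch the non-centered list pairs "sparse" with every other gene, which is the artifact a filter such as "Only use genes with greater than 80% good data" is meant to prevent.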

Image Generation Options

Visualizing Clustered Microarray Data

After the data have been clustered, you will be presented with an image and a list of links. There are several options for visualizing and exploring your clustered data. An example of the image is shown in Figure 3.

Figure 3. Image and options available after hierarchical clustering.
  1. Browse the cluster

    By clicking on the image itself, you can explore your data, zoom in, zoom out and more.

  2. View averaged spot images

    Spot images are simply square images with even signals that represent the actual spots on the array. It is often more convenient and intuitive to examine the spot images rather than the spots themselves. Figure 4 shows an example of spot images.

    Figure 4. Display of clustering results as spot images.
  3. View spots

    You can view your clustering results as images of the actual spots, as illustrated in Figure 5.

    Figure 5. Display of clustering results as spots.
  4. View spots and spot images

    Figure 6 shows how you can see both the spot images and the spots in your clustering results.

    Figure 6. Display of clustering results as joint spots and spot images.
  5. Notes on viewing Spot images

    When several instances of the same clone (SUID) are present on the same slide and data points are collapsed by SUID, these data points are averaged. Subsequently, when spot images are displayed during clustering, the image from one of the spots is picked at random. Naturally, this image might look quite different from the averaged cluster image.

    If you plan to look at spot images during clustering, the "Include SUID/LUID/SPOT in the UID column." box must be checked during data retrieval. If it is left unchecked, spot images cannot be properly assigned to cluster images when multiple instances of the same clone (SUID) are present on the slides. Instead, a random spot image is picked and used for all spots. An example is shown in Figure 7.

    Figure 7. An example in which data were collapsed by 'SPOT' (i.e. no averaging of spots), but the UID column was not retrieved. IMAGE:683659 was present in 4 spots on these slides, 3 of which clustered together (lower panel) while the 4th did not (upper panel). Spot images for this 4th spot were picked randomly and used for all instances of the clone, resulting in an obvious discrepancy with the cluster images (lower panel).
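The mismatch described in these notes can be reproduced with a short Python sketch. It is illustrative only: the record layout, the field names (`spot`, `suid`, `log_ratio`) and the helper functions are hypothetical, and the values are made up. Four spots share one SUID; collapsing by SUID averages them, while the displayed spot image comes from a single spot chosen at random.

```python
import random
from collections import defaultdict

# Hypothetical spot records: four spots of the same clone on one slide.
spots = [
    {"spot": 101, "suid": "S1", "log_ratio": 2.0},
    {"spot": 102, "suid": "S1", "log_ratio": 1.5},
    {"spot": 103, "suid": "S1", "log_ratio": 2.5},
    {"spot": 104, "suid": "S1", "log_ratio": -2.0},  # behaves differently
]

def collapse_by_suid(records):
    """Average the measurements of all spots that share an SUID."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["suid"]].append(record["log_ratio"])
    return {suid: sum(values) / len(values) for suid, values in grouped.items()}

def pick_display_spot(records, suid, rng):
    """Pick one spot at random to supply the displayed spot image."""
    candidates = [record for record in records if record["suid"] == suid]
    return rng.choice(candidates)

averaged = collapse_by_suid(spots)
shown = pick_display_spot(spots, "S1", random.Random())

print("value used for clustering:", averaged["S1"])
print("spot shown:", shown["spot"], "with value", shown["log_ratio"])
```

No single spot has the averaged value used for clustering, so whichever spot is shown can look different from the cluster image, and spot 104 would look very different.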