Significance Analysis

The largest source of unwanted variation in a microarray experiment is inherent to the samples being hybridized. Whether it is due to imperfect tissue isolation, poor RNA isolation, or simply the biological differences between individuals that have nothing to do with the variable being tested, these variations can never be eliminated by normalization. In fact, with a single sample, they're indistinguishable from the planned variation in the experiment.

This is why replicates are so important. By averaging several arrays that have been treated equivalently, the variations that have nothing to do with the treatment become smaller. The more (equivalent) arrays we average, the smaller these unwanted variations become. In practice, we strongly recommend that each experimental group have at least three replicates. With three or more, we can estimate how much a gene's signal varies within a group, which lets us apply much more powerful tests than simply comparing averages.
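As a rough illustration of why averaging helps, the sketch below simulates replicate measurements of a gene that does not change and reports how widely the group average scatters. It assumes the unwanted variation is independent, normally distributed noise with the same spread on every array, which real arrays only approximate. The function name and parameters are made up for this example.

```python
import random
import statistics

def spread_of_group_means(n_replicates, n_trials=2000, noise_sd=1.0, seed=0):
    """Standard deviation of the group average over many simulated experiments.

    Each simulated experiment draws n_replicates noisy measurements of a gene
    whose true signal is 0, then averages them.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, noise_sd) for _ in range(n_replicates))
        for _ in range(n_trials)
    ]
    return statistics.stdev(means)

for n in (1, 3, 9):
    print(n, "replicates -> spread of the average:", round(spread_of_group_means(n), 3))
```

Under these assumptions the spread falls roughly as one over the square root of the number of replicates, so going from one array to three removes a little over 40% of the noise in the average.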

One such test is Student's t-test. This is a common statistical test to determine whether the means of two distributions are significantly different. For microarrays, the distributions being compared are the signal values for a single gene across the replicates in each of the two groups. By running the t-test on every gene, we can create a list of genes that change between control and experiment at a specified confidence level.
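A minimal sketch of the per-gene t-test follows. It assumes the equal-variance (pooled) form of the test and three replicates per group, so there are 4 degrees of freedom and the tabulated two-sided critical value at the 0.05 level is 2.776. The gene names and signal values are invented for illustration, and a real analysis would normally compute an exact p-value with a statistics package rather than compare against a table entry.

```python
import math
import statistics

def student_t(control, treated):
    """Equal-variance (pooled) two-sample t statistic for one gene."""
    n1, n2 = len(control), len(treated)
    pooled_var = (
        (n1 - 1) * statistics.variance(control)
        + (n2 - 1) * statistics.variance(treated)
    ) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.fmean(treated) - statistics.fmean(control)) / std_err

# Tabulated two-sided critical value for alpha = 0.05 with 3 + 3 - 2 = 4
# degrees of freedom; it must be changed if the replicate counts change.
T_CRIT_DF4 = 2.776

# Hypothetical log-scale signals: (control replicates, treated replicates).
genes = {
    "gene_a": ([7.9, 8.1, 8.0], [9.0, 9.2, 9.1]),
    "gene_b": ([6.0, 7.5, 6.6], [6.4, 7.2, 6.9]),
}

changed = [
    name
    for name, (control, treated) in genes.items()
    if abs(student_t(control, treated)) > T_CRIT_DF4
]
print(changed)
```

Here gene_a shifts by much more than its replicate-to-replicate scatter and is kept, while gene_b has a small shift buried in large scatter and is not.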

In order to detect changes between more than two groups, we use ANOVA (Analysis of Variance). ANOVA is an extension of the t-test, and in fact gives the same results as the equal-variance t-test when run on two groups. When run on more than two groups, though, it allows us to see which genes change in at least one of the groups (at a given confidence level). It does not say which groups differ, so after the changing genes are identified, one of several post-hoc tests can be performed to determine which groups they changed in.
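The one-way ANOVA statistic for a single gene can be sketched as the ratio of between-group to within-group variance. This is an illustrative implementation, not necessarily the one used by any particular package; the group values are invented. With three groups of three replicates the degrees of freedom are (2, 6), for which the tabulated critical value at the 0.05 level is 5.14.

```python
import statistics

def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean(x for g in groups for x in g)
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical log-scale signals for one gene in three experimental groups.
control = [8.0, 8.1, 7.9]
low_dose = [8.1, 8.0, 8.2]
high_dose = [9.0, 9.2, 9.1]

F_CRIT_2_6 = 5.14  # tabulated critical value, alpha = 0.05, df = (2, 6)

f_stat = anova_f(control, low_dose, high_dose)
print(round(f_stat, 1), f_stat > F_CRIT_2_6)
```

When only two groups are passed in, this F statistic is the square of the equal-variance t statistic, which is why the two tests agree in that case. A large F says only that some group differs; here a post-hoc test would be needed to show that it is the high-dose group.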

A major problem with performing these tests on microarray data is that of false positives. Simply put, a p-value cutoff of 0.05 means that a gene that does not really change still has a 5% chance of being called significant by chance alone. We can accept that error rate for one test, but it becomes overwhelming in large numbers: if 10,000 genes are tested and none of them truly change, we would still expect about 500 to pass the cutoff as false positives. To reduce the volume of these errors, we typically apply a Multiple Testing Correction (MTC) to our list. There are several different MTC algorithms that vary in stringency. Eliminating false positives from a list inevitably removes true positives at the same time, and the analyst must balance the two to achieve a manageable result.
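To make the trade-off in stringency concrete, the sketch below applies two widely used corrections to a short list of p-values. The text above does not name specific algorithms, so the choice of Bonferroni (strict) and Benjamini-Hochberg (more permissive) is ours, and the p-values are invented for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Keep a gene only if its p-value is at most alpha divided by the test count."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value is at most (k / m) * alpha.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff_rank = rank
    # Every gene ranked at or below that cutoff is kept.
    keep = [False] * m
    for i in order[:cutoff_rank]:
        keep[i] = True
    return keep

# Expected false positives with no correction, if no gene truly changes.
n_genes, cutoff = 10_000, 0.05
print(round(n_genes * cutoff))

# Hypothetical per-gene p-values.
p_values = [0.001, 0.008, 0.012, 0.04, 0.2, 0.6]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

On this list, four genes pass an uncorrected 0.05 cutoff, three survive Benjamini-Hochberg, and two survive Bonferroni, showing how a stricter correction trades away possible true positives for fewer false ones.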