Microarray analysis techniques

Microarray analysis techniques are used to interpret the data generated from experiments on DNA, RNA, and protein microarrays, which allow researchers to investigate the expression state of a large number of genes (in many cases, an organism's entire genome) in a single experiment. Such experiments can generate very large volumes of data, allowing researchers to assess the overall state of a cell or organism. Data in such large quantities can be difficult to analyze, especially in the absence of good gene annotation.

Microarray data analysis involves several distinct steps, as outlined below. Changing any one of the steps will change the outcome of the analysis, so the MAQC Project was created to identify a set of standard strategies. Companies exist that use the MAQC protocols to perform a complete analysis.

Most microarray manufacturers, such as Affymetrix and Agilent, provide commercial data analysis software with microarray equipment such as plate readers.

Depending on the type of array, signal related to nonspecific binding of the fluorophore can be subtracted to achieve better results. One approach involves subtracting the average signal intensity of the area between spots. A variety of tools for background correction and further analysis are available from TIGR, Agilent (GeneSpring), and Ocimum Bio Solutions (Genowiz).
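The subtraction approach described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's algorithm: the function name, the sample intensities, and the choice to clamp negative results to a floor value are all assumptions made for the example.

```python
from statistics import mean

def subtract_local_background(spot_intensity, background_pixels, floor=0.0):
    """Subtract the mean inter-spot intensity from a spot's raw signal.

    background_pixels holds intensities measured in the area between spots.
    Clamping at `floor` is an assumption of this sketch; real tools handle
    negative corrected values in different ways.
    """
    background = mean(background_pixels)
    return max(spot_intensity - background, floor)

# Hypothetical readings: one spot and four nearby background pixels.
corrected = subtract_local_background(1520.0, [98.0, 102.0, 100.0, 100.0])
```

Clamping keeps the corrected value usable in a later log transform only if the floor is set above zero, which is one reason analysis packages offer alternative background models.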

Entire arrays may have obvious flaws that are detectable by visual inspection, by pairwise comparison with arrays in the same experimental group, or by analysis of RNA degradation. Removing these arrays from the analysis entirely may improve the results.
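One way to make the pairwise comparison concrete is to correlate each array with the others in its group and flag any array whose typical correlation is low. The sketch below assumes log-scale intensities listed in the same probe order for every array; the function names, the 0.8 cutoff, and the sample values are hypothetical choices for illustration, not an established standard.

```python
from math import sqrt
from statistics import mean, median

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def flag_outlier_arrays(arrays, min_median_corr=0.8):
    """Return names of arrays whose median correlation with the rest is low.

    arrays maps an array name to its intensities (same probe order in each).
    The cutoff is an arbitrary value chosen for this example.
    """
    flagged = []
    for name, values in arrays.items():
        corrs = [pearson(values, other)
                 for other_name, other in arrays.items()
                 if other_name != name]
        if median(corrs) < min_median_corr:
            flagged.append(name)
    return flagged

# Hypothetical group of four arrays; "a4" disagrees with the other three.
group = {
    "a1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "a2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "a3": [1.0, 2.1, 2.9, 4.2, 4.8],
    "a4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
suspect = flag_outlier_arrays(group)
```

Using the median of the correlations rather than the mean keeps a single bad array from dragging down the score of every good one.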

Visual identification of local artifacts, such as printing or washing defects, may likewise suggest the removal of individual spots. This can take a substantial amount of time depending on the quality of array manufacture. In addition, some procedures call for the elimination of all spots with an expression value below a certain intensity threshold.
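The intensity threshold mentioned above amounts to a simple filter. The following sketch assumes spots are stored as a mapping from a spot identifier to its expression value; the identifiers, values, and threshold are made up for illustration.

```python
def filter_low_intensity(spots, threshold):
    """Split spots into those kept and those removed by an intensity cutoff.

    spots maps a spot identifier to its expression value. Spots with a
    value below `threshold` are eliminated; the rest are kept.
    """
    kept = {spot_id: value for spot_id, value in spots.items()
            if value >= threshold}
    removed = sorted(set(spots) - set(kept))
    return kept, removed

# Hypothetical spot intensities and cutoff.
spots = {"spot_001": 35.0, "spot_002": 410.0, "spot_003": 99.9, "spot_004": 100.0}
kept, removed = filter_low_intensity(spots, threshold=100.0)
```

Returning the removed identifiers alongside the kept spots makes it easy to report how much of the array the filter discarded.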

Comparing two different arrays, or two different samples hybridized to the same array, generally involves making adjustments for systematic errors introduced by differences in procedures and dye intensity effects. Dye normalization for two-color arrays is often achieved by local regression. LIMMA provides a set of tools for background correction and scaling, as well as an option to average on-slide duplicate spots. A common way to evaluate how well an array is normalized is to draw an MA plot of the data.
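The quantities drawn on an MA plot are straightforward to compute: for each spot, M is the log ratio of the two channel intensities and A is their average log intensity. The sketch below computes both and then applies a global median shift to M. That shift is a deliberately simpler stand-in for the local regression described above, not an implementation of it, and the function names and intensities are assumptions for the example.

```python
from math import log2
from statistics import median

def ma_values(red, green):
    """Return (M, A) lists for paired red and green channel intensities.

    M = log2(R) - log2(G)          (log ratio)
    A = (log2(R) + log2(G)) / 2    (average log intensity)
    Intensities must be positive.
    """
    m_values, a_values = [], []
    for r, g in zip(red, green):
        m_values.append(log2(r) - log2(g))
        a_values.append(0.5 * (log2(r) + log2(g)))
    return m_values, a_values

def median_center(m_values):
    """Shift M so that its median is zero (a global, not local, adjustment)."""
    shift = median(m_values)
    return [m - shift for m in m_values]

# Hypothetical two-channel intensities for three spots.
red = [4.0, 16.0, 64.0]
green = [1.0, 8.0, 16.0]
m, a = ma_values(red, green)
centered = median_center(m)
```

On a well-normalized array the M values scatter around zero across the whole range of A; a trend in M against A is the dye intensity effect that local regression is meant to remove.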

Raw Affymetrix data contain about twenty probes for the same RNA target. Half of these are "mismatch spots", which do not precisely match the target sequence. In theory, these can measure the amount of nonspecific binding for a given target. Robust Multi-array Average (RMA) is a normalization approach that does not take advantage of these mismatch spots, but it must still summarize the perfect-match probes through median polish. The median polish algorithm, although robust, behaves differently depending on the number of samples analyzed. Quantile normalization, also part of RMA, is one sensible approach to normalizing a batch of arrays so that further comparisons are meaningful.
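The two steps named above can be illustrated on toy data. The sketch below is a simplified, pure-Python rendering of the general ideas and not the RMA implementation itself: it omits RMA's background adjustment, breaks ties by sort order, and runs a fixed number of polish iterations. All function names and values are assumptions for the example.

```python
from statistics import mean, median

def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays.

    arrays is a list of equal-length intensity lists, one per array.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(s[i] for s in sorted_arrays) for i in range(n_probes)]
    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]  # the probe at this rank gets the rank mean
        normalized.append(out)
    return normalized

def median_polish_summary(matrix, n_iter=10):
    """Summarize a probe set by median polish.

    matrix has one row per probe and one column per array (log-scale values).
    Returns one summary value per array: overall level plus column effect.
    """
    rows, cols = len(matrix), len(matrix[0])
    resid = [list(r) for r in matrix]
    overall, row_eff, col_eff = 0.0, [0.0] * rows, [0.0] * cols
    for _ in range(n_iter):
        for i in range(rows):                      # sweep out row medians
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        for j in range(cols):                      # sweep out column medians
            m = median(resid[i][j] for i in range(rows))
            col_eff[j] += m
            for i in range(rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return [overall + c for c in col_eff]

# Hypothetical data: two arrays of three probes, then one probe set on two arrays.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
summary = median_polish_summary([[8.0, 10.0], [9.0, 11.0], [7.0, 9.0]])
```

Because the column medians are taken across all arrays in the batch, adding or removing samples changes the fitted effects, which is the sample-size dependence noted above.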

...
Wikipedia