Lewin, A., Richardson, S., Marshall, C., Glazier, A. and Aitman, T. (2005) Bayesian Modelling of Differential Gene Expression. Biometrics (in press).

Abstract: We present a Bayesian hierarchical model for detecting differentially expressing genes that includes simultaneous estimation of array effects, and show how to use the output for choosing lists of genes for further investigation. We give empirical evidence that expression-level dependent array effects are needed, and explore different non-linear functions as part of our model-based approach to normalisation. The model includes gene-specific variances but imposes some necessary shrinkage through a hierarchical structure. Model criticism via posterior predictive checks is discussed. Modelling the array effects (normalisation) simultaneously with differential expression gives fewer false positive results. To choose a list of genes, we propose to combine various criteria (for instance, fold change and overall expression) into a single indicator variable for each gene. The posterior distribution of these variables is used to pick the list of genes, thereby taking into account uncertainty in parameter estimates.
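Illustration: the idea of combining several criteria into one indicator variable per gene, and then using its posterior distribution, can be sketched as follows. This is a minimal sketch and not the authors' implementation. It assumes posterior draws of each gene's log fold change and overall expression are already available (for instance from MCMC output), and the function names, thresholds, and probability cut-off are all hypothetical.

```python
def posterior_indicator_prob(log_fc_draws, expr_draws, fc_cut, expr_cut):
    """Fraction of posterior draws in which a gene meets both criteria.

    The indicator for one draw is 1 when |log fold change| > fc_cut and
    overall expression > expr_cut; averaging it over draws estimates its
    posterior probability. Cut-offs are illustrative assumptions.
    """
    hits = sum(
        1
        for d, a in zip(log_fc_draws, expr_draws)
        if abs(d) > fc_cut and a > expr_cut
    )
    return hits / len(log_fc_draws)


def select_genes(draws_by_gene, fc_cut=1.0, expr_cut=4.0, prob_cut=0.8):
    """Return genes whose indicator has posterior probability >= prob_cut.

    draws_by_gene maps a gene name to a pair
    (log fold change draws, overall expression draws).
    """
    selected = []
    for gene, (log_fc_draws, expr_draws) in draws_by_gene.items():
        p = posterior_indicator_prob(log_fc_draws, expr_draws, fc_cut, expr_cut)
        if p >= prob_cut:
            selected.append(gene)
    return sorted(selected)


if __name__ == "__main__":
    # Hand-made draws for two hypothetical genes.
    draws = {
        "geneA": ([1.5, 1.2, -2.0, 1.1], [5.0, 5.5, 6.0, 5.2]),
        "geneB": ([0.2, 1.3, 0.1, -0.4], [5.0, 5.0, 5.0, 5.0]),
    }
    print(select_genes(draws))
```

Because the indicator is evaluated draw by draw, a gene whose estimated fold change is large but highly uncertain receives a lower posterior probability than one with the same estimate and little uncertainty.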

Broët, P., Lewin, A., Richardson, S., Dalmasso, C. and Magdelenat, H. (2004) A mixture model based strategy for selecting sets of genes in multiclass response microarray experiments. Bioinformatics 20(16):2562-2571; doi:10.1093/bioinformatics/bth285.

Abstract: Multiclass response (MCR) experiments are those in which there are more than two classes to be compared. In these experiments, though the null hypothesis is simple, there are typically many patterns of gene expression changes across the different classes that lead to complex alternatives. In this paper, we propose a new strategy for selecting genes in MCR based on a flexible mixture model for the marginal distribution of a modified F statistic. Using this model, false positive and negative discovery rates can be estimated and combined to produce a rule for selecting a subset of genes. Moreover, the method proposed allows calculation of these rates for any predefined subset of genes. We illustrate the performance of our approach using simulated datasets and a real breast cancer microarray dataset from Hedenfalk et al. (2001). In this latter study, we investigate predefined subsets of genes and point out interesting differences between three distinct biological pathways.
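Illustration: one common way to estimate false discovery and false non-discovery rates for an arbitrary subset of genes is to average per-gene posterior null probabilities. The sketch below shows that calculation only. It assumes those probabilities have already been obtained from a fitted mixture model, and it does not reproduce the paper's mixture fitting or its specific rule for combining the two rates. All names and numbers are hypothetical.

```python
def estimated_fdr(null_probs, selected):
    """Mean posterior null probability among the selected genes."""
    if not selected:
        return 0.0
    return sum(null_probs[g] for g in selected) / len(selected)


def estimated_fnr(null_probs, selected):
    """Mean posterior non-null probability among the genes not selected."""
    rest = [g for g in null_probs if g not in selected]
    if not rest:
        return 0.0
    return sum(1.0 - null_probs[g] for g in rest) / len(rest)


def select_by_threshold(null_probs, cut):
    """Select genes whose posterior null probability is at most cut."""
    return {g for g, p in null_probs.items() if p <= cut}


if __name__ == "__main__":
    # Hypothetical posterior null probabilities for four genes.
    null_probs = {"a": 0.1, "b": 0.3, "c": 0.9, "d": 0.7}
    chosen = select_by_threshold(null_probs, 0.5)
    print(sorted(chosen))
    print(estimated_fdr(null_probs, chosen))
    print(estimated_fnr(null_probs, chosen))
```

The same two functions apply unchanged to a predefined subset, such as the genes of one pathway, by passing that set as `selected`.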