simulation {BGcom} | R Documentation |
The function simulates two vectors of p-values using the procedure described in Hwang et al.
simulation(n, GammaA, GammaB, epsilonM = 0, epsilonSD = 1, r1, r2, DEfirst, DEsecond, DEcommon)
n |
Number of features to simulate |
GammaA,GammaB |
Parameters of the Gamma distribution |
epsilonM,epsilonSD |
Parameters of the Gaussian noise |
r1,r2 |
Additional experiment-specific noise |
DEfirst,DEsecond |
Number of DE features in each experiment |
DEcommon |
Number of DE features in common between the two experiments |
Considering two experiments (k=1,2), each of them with two classes, and n genes, for each gene we simulate a true difference between the classes dg, drawn from a Gamma distribution with random sign. The true difference dg is 0 if the gene is not differentially expressed. We then add two normal random noise components, r(k) that is the experiment specific component and e(gk), that is the gene-experiment component. The former is assigned deterministically, whilst the latter is drawn from a standard Gaussian distribution. So the log fold change (T(gk)) is the sum of all these components for each gene and experiment. We divide the n genes in four groups: genes differentially expressed (DE) in both experiments, genes differentially expressed only in the first experiment, genes differentially expressed only in the second experiment and genes differentially expressed in neither experiment. When the genes are differentially expressed in both experiments, they share the same dg and the only difference between them is given by the random components: T(g1) = dg + r(1) times e(g1) T(g2) = dg + r(2) times e(g2) This group represents the true positive genes (i.e. truly DE in both experiments) that we are interested in finding using our method. The two groups of genes differentially expressed only in one of the two experiments act like additional noise and make the simulation more realistic. Together with the genes not differentially expressed they constitute the negative genes in our set up, i.e. genes that should not be listed if the procedure correctly clarifies the intersection.
Then, as described in Hwang et al., a two tails T-test is performed for each T(gk) and a p-value is generated as: P(gk) = 2 Normal cdf(-absolute value (T(gk)/r(k))).
names |
Which group each simulated gene expression valeue belongs to |
T1,T2 |
T statistic for the first and second experiment |
Pval1,Pval2 |
p-value for the first and the second experiment |
Marta Blangiardo
Hwang D, Rust A, Ramsey S, Smith J, Leslie D, Weston A, de Atauri P, Aitchison J, Hood L, Siegel A, Bolouri H: A data integration methodology for system biology. PNAS 2005, 29:17296–17301. M.Blangiardo and S.Richardson Statistical tools for synthesizing lists of differentially expressed features in related experiments (under revision)
data = simulation(n=2000,GammaA=1,GammaB=1,r1=0.5,r2=0.8,DEfirst=1000,DEsecond=800,DEcommon=700) ## The function is currently defined as function(n,GammaA,GammaB,epsilonM=0, epsilonSD=1, r1,r2,DEfirst,DEsecond,DEcommon){ T1=c() T2=c() delta=rgamma(n,GammaA,1/GammaB) epsilon1=rnorm(n,epsilonM,epsilonSD) epsilon2=rnorm(n,epsilonM,epsilonSD) names=c() #Group 1 : DE in common for(i in 1: DEcommon){ x=rbinom(1,1,0.5) if(x==1) { T1[i] <- delta[i] + epsilon1[i]*r1; T2[i] <- delta[i] + epsilon2[i]*r2 } if(x==0) {T1[i] <- -delta[i] - epsilon1[i]*r1; T2[i] <- -delta[i] - epsilon2[i]*r2} names[i] <- "DEcommon" } #Group 2 : DE in the first experiment for(i in (DEcommon+1):(DEfirst)){ x=rbinom(1,1,0.5) if(x==1){ T1[i] <- delta[i] + epsilon1[i]*r1 } if(x==0){T1[i] <- -delta[i] - epsilon1[i]*r1} T2[i] <- epsilon2[i]*r2 names[i] <- "DEfirst" } #Group 3 : DE in the second experiment for(i in (DEfirst+1):(DEfirst+DEsecond-DEcommon)){ x=rbinom(1,1,0.5) T1[i] <- epsilon1[i]*r1 if(x==1){ T2[i] <- delta[i] + epsilon2[i]*r2 } if(x==0){T2[i] <- -delta[i] - epsilon2[i]*r2} names[i] <- "DEsecond" } #Group 4 : Not DE in Both experiments for(i in (DEfirst+DEsecond-DEcommon+1):(n)){ T1[i] <- epsilon1[i]*r1 T2[i] <- epsilon2[i]*r2 names[i] <- "Null" } ############################################## #Assign the Pvalues Pval1 = c() Pval2 = c() for(i in 1:n){ Pval1[i] <- 2*pnorm(-abs(T1[i]/r1)) Pval2[i] <- 2*pnorm(-abs(T2[i]/r2)) #Pval1[i] <- 1-pnorm(T1[i]/r1) #Pval2[i] <- 1-pnorm(T2[i]/r2) } ############################################## return(list(names=names,T1=T1,T2=T2,Pval1=Pval1,Pval2=Pval2)) }