simulation.indep {BGcom}R Documentation

Simulation for independent experiments

Description

The function simulate two vectors of p-values using the procedure described in Hwang et al. for independent experiments

Usage

simulation.indep(n, GammaA, GammaB, epsilonM = 0, epsilonSD = 1, r1, r2, DEfirst, DEsecond)

Arguments

n Number of features to be simulated
GammaA,GammaB Parameters of the Gamma distribution
epsilonM,epsilonSD Parameters of the Gaussian noise
r1,r2 Additional experiment-specific noise
DEfirst,DEsecond Number of DE features in each experiment

Details

Considering two experiments (k=1,2), each of them with two classes, and n genes, for each gene we simulate a true difference between the classes dg, drawn from a Gamma distribution with random sign. The true difference dg is 0 if the gene is not differentially expressed. We then add two normal random noise components, r[k] that is the experiment specific component and e(gk), that is the gene-experiment component. The former is assigned deterministically, whilst the latter is drawn from a standard Gaussian distribution. So the log fold change (T(gk)) is the sum of all these components for each gene and experiment. We divide the n genes in three groups: genes differentially expressed only in the first experiment, genes differentially expressed only in the second experiment and genes differentially expressed in neither experiment. There are not true positive genes (i.e. truly DE in both experiments), so we should find no genes in common using our method.

Then, as described in Hwang et al., a two tails T-test is performed for each T(gk) and a p-value is generated as: P(gk) = 2 Normal cdf(-absolute value (T(gk)/r(k))) where T(gk) is the t statistic that evaluates the differential expression between the two classes for the g gene and k experiment.

Value

names Which group each simulated gene expression valeue belongs to
T1,T2 T statistic for the first and second experiment
Pval1,Pval2 p-value for the first and the second experiment

Author(s)

Marta Blangiardo

References

Hwang D, Rust A, Ramsey S, Smith J, Leslie D, Weston A, de Atauri P, Aitchison J, Hood L, Siegel A, Bolouri H: A data integration methodology for system biology. PNAS 2005, 29:17296–17301. M.Blangiardo and S.Richardson Statistical tools for synthesizing lists of differentially expressed features in related experiments (under revision)

Examples

simulated.indepP = simulation.indep(n=2000,GammaA=1,GammaB=1,r1=0.5,r2=0.8,DEfirst=1000,DEsecond=800)

## The function is currently defined as
function(n,GammaA=2,GammaB=2,epsilonM=0, epsilonSD=1, r1,r2,DEfirst,DEsecond){

T1=c()
T2=c()
delta=rgamma(n,shape=GammaA,scale=GammaB)
epsilon1=rnorm(n,epsilonM,epsilonSD)
epsilon2=rnorm(n,epsilonM,epsilonSD)
names=c()
#Group 1 : DE in the first experiment
for(i in (1):(DEfirst)){
    x=rbinom(1,1,0.5)
    if(x==1){T1[i] <- delta[i] + epsilon1[i]*r1}
    if(x==0){T1[i] <- -delta[i] - epsilon1[i]*r1}
    T2[i] <- epsilon2[i]*r2
    names[i] <- "DEfirst"
}

#Group 2 : DE in the second experiment
for(i in (DEfirst+1):(DEfirst+DEsecond)){
    x=rbinom(1,1,0.5)
    T1[i] <- epsilon1[i]*r1
    if(x==1){T2[i] <- delta[i] + epsilon2[i]*r2}
    if(x==0){T2[i] <- -delta[i] - epsilon2[i]*r2}
    names[i] <- "DEsecond"

}

#Group 3 : Not DE in Both experiments
for(i in (DEfirst+DEsecond+1):(n)){
    T1[i] <- epsilon1[i]*r1
    T2[i] <- epsilon2[i]*r2
    names[i] <- "Null"
}

##############################################
#Assign the Pvalues

Pval1 = c()
Pval2 = c()

for(i in 1:n){
Pval1[i] <- 2*pnorm(-abs(T1[i]/r1))
Pval2[i] <- 2*pnorm(-abs(T2[i]/r2))
}

##############################################
return(list(names=names,T1=T1,T2=T2,Pval1=Pval1,Pval2=Pval2))
        }

[Package BGcom version 1.0 Index]