simulation {BGcom}R Documentation

Simulate the data in form of p-values

Description

The function simulates two vectors of p-values using the procedure described in Hwang et al.

Usage

simulation(n, GammaA, GammaB, epsilonM = 0, epsilonSD = 1, r1, r2, DEfirst, DEsecond, DEcommon)

Arguments

n Number of features to simulate
GammaA,GammaB Parameters of the Gamma distribution
epsilonM,epsilonSD Parameters of the Gaussian noise
r1,r2 Additional experiment-specific noise
DEfirst,DEsecond Number of DE features in each experiment
DEcommon Number of DE features in common between the two experiments

Details

Considering two experiments (k=1,2), each of them with two classes, and n genes, for each gene we simulate a true difference between the classes dg, drawn from a Gamma distribution with random sign. The true difference dg is 0 if the gene is not differentially expressed. We then add two normal random noise components, r(k) that is the experiment specific component and e(gk), that is the gene-experiment component. The former is assigned deterministically, whilst the latter is drawn from a standard Gaussian distribution. So the log fold change (T(gk)) is the sum of all these components for each gene and experiment. We divide the n genes in four groups: genes differentially expressed (DE) in both experiments, genes differentially expressed only in the first experiment, genes differentially expressed only in the second experiment and genes differentially expressed in neither experiment. When the genes are differentially expressed in both experiments, they share the same dg and the only difference between them is given by the random components: T(g1) = dg + r(1) times e(g1) T(g2) = dg + r(2) times e(g2) This group represents the true positive genes (i.e. truly DE in both experiments) that we are interested in finding using our method. The two groups of genes differentially expressed only in one of the two experiments act like additional noise and make the simulation more realistic. Together with the genes not differentially expressed they constitute the negative genes in our set up, i.e. genes that should not be listed if the procedure correctly clarifies the intersection.

Then, as described in Hwang et al., a two tails T-test is performed for each T(gk) and a p-value is generated as: P(gk) = 2 Normal cdf(-absolute value (T(gk)/r(k))).

Value

names Which group each simulated gene expression valeue belongs to
T1,T2 T statistic for the first and second experiment
Pval p-value for the experiments to be compared

Author(s)

Marta Blangiardo

References

Hwang D, Rust A, Ramsey S, Smith J, Leslie D, Weston A, de Atauri P, Aitchison J, Hood L, Siegel A, Bolouri H: A data integration methodology for system biology. PNAS 2005, 29:17296–17301.

M.Blangiardo and S.Richardson Statistical tools for synthesizing lists of differentially expressed features in related experiments, Genome Biology, 8, R54

Examples

data = simulation(n=500,GammaA=1,GammaB=1,r1=0.5,r2=0.8,DEfirst=300,DEsecond=200,DEcommon=100)

## The function is currently defined as
function(n,GammaA,GammaB,epsilonM=0, epsilonSD=1, r1,r2,DEfirst,DEsecond,DEcommon){

T1=c()
T2=c()
delta=rgamma(n,GammaA,1/GammaB)
epsilon1=rnorm(n,epsilonM,epsilonSD)
epsilon2=rnorm(n,epsilonM,epsilonSD)
names=c()
#Group 1 : DE in common
for(i in 1: DEcommon){
x=rbinom(1,1,0.5)
if(x==1) {
    T1[i] <- delta[i] + epsilon1[i]*r1;
    T2[i] <- delta[i] + epsilon2[i]*r2
  }
if(x==0) {T1[i] <- -delta[i] - epsilon1[i]*r1;
    T2[i] <- -delta[i] - epsilon2[i]*r2}

names[i] <- "DEcommon"
  }

#Group 2 : DE in the first experiment
for(i in (DEcommon+1):(DEfirst)){
x=rbinom(1,1,0.5)
if(x==1){
T1[i] <- delta[i] + epsilon1[i]*r1
  }
if(x==0){T1[i] <- -delta[i] - epsilon1[i]*r1}
T2[i] <- epsilon2[i]*r2
names[i] <- "DEfirst"
  }

#Group 3 : DE in the second experiment
for(i in (DEfirst+1):(DEfirst+DEsecond-DEcommon)){
x=rbinom(1,1,0.5)
T1[i] <- epsilon1[i]*r1
if(x==1){
T2[i] <- delta[i] + epsilon2[i]*r2
  }
if(x==0){T2[i] <- -delta[i] - epsilon2[i]*r2}
names[i] <- "DEsecond"

  }

#Group 4 : Not DE in Both experiments
for(i in (DEfirst+DEsecond-DEcommon+1):(n)){
T1[i] <- epsilon1[i]*r1
T2[i] <- epsilon2[i]*r2
names[i] <- "Null"
  }

##############################################
#Assign the Pvalues

Pval1 = c()
Pval2 = c()

for(i in 1:n){
Pval1[i] <- 2*pnorm(-abs(T1[i]/r1))
Pval2[i] <- 2*pnorm(-abs(T2[i]/r2))
#Pval1[i] <- 1-pnorm(T1[i]/r1)
#Pval2[i] <- 1-pnorm(T2[i]/r2)
  }

##############################################
return(list(names=names,T1=T1,T2=T2,Pval=cbind(Pval1,Pval2)))
  }

[Package BGcom version 1.0 Index]