simulation               package:BGcom               R Documentation

_S_i_m_u_l_a_t_e _t_h_e _d_a_t_a _i_n _f_o_r_m _o_f _p-_v_a_l_u_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     The function simulates two vectors of p-values using the procedure
     described in Hwang et al.

_U_s_a_g_e:

     simulation(n, GammaA, GammaB, epsilonM = 0, epsilonSD = 1, r1, r2, DEfirst, DEsecond, DEcommon)

_A_r_g_u_m_e_n_t_s:

       n: Number of features to simulate

GammaA,GammaB: Parameters of the Gamma distribution

epsilonM,epsilonSD: Parameters of the Gaussian noise

   r1,r2: Additional experiment-specific noise

DEfirst,DEsecond: Number of DE features in each experiment

DEcommon: Number of DE features in common between the two experiments

_D_e_t_a_i_l_s:

     Considering two experiments (k=1,2), each of them with two
     classes, and n genes, for each gene we simulate a true difference
     between the classes dg, drawn from a Gamma distribution with
     random sign. The true difference dg is 0 if the gene is not
     differentially expressed. We then add two normal random noise
     components, r(k) that is the  experiment specific component and
     e(gk), that is the gene-experiment component. The former is
     assigned deterministically,  whilst the latter is drawn from a
     standard Gaussian distribution. So the log fold change  (T(gk)) is
     the sum of all these components for each gene and experiment. We
     divide the n genes in four groups: genes differentially expressed
     (DE) in both experiments, genes differentially expressed only in
     the first experiment, genes differentially expressed only in the
     second experiment and genes differentially expressed in neither
     experiment. When the genes are differentially expressed in both
     experiments, they share the same dg and the only difference
     between them is given by the random components: T(g1) = dg + r(1)
     times e(g1) T(g2) = dg + r(2) times e(g2) This group represents
     the true positive genes (i.e. truly DE in both experiments) that
     we are interested in finding using our method. The two groups of
     genes differentially expressed only in one of the two experiments
     act like additional noise and make the simulation more realistic.
     Together with the genes not differentially expressed they
     constitute the negative genes in our set up, i.e. genes that
     should not be listed if the procedure correctly clarifies the
     intersection.

     Then, as described in Hwang et al., a two tails T-test is
     performed for each T(gk) and a p-value is generated as: P(gk) = 2
     Normal cdf(-absolute value (T(gk)/r(k))).

_V_a_l_u_e:

   names: Which group each simulated gene expression valeue belongs to

   T1,T2: T statistic for the first and second experiment

    Pval: p-value for the experiments to be compared

_A_u_t_h_o_r(_s):

     Marta Blangiardo

_R_e_f_e_r_e_n_c_e_s:

     Hwang D, Rust A, Ramsey S, Smith J, Leslie D, Weston A, de Atauri
     P, Aitchison J, Hood L, Siegel A, Bolouri H: A data integration
     methodology for system biology. PNAS 2005, 29:1729617301.

     M.Blangiardo and S.Richardson Statistical tools for synthesizing
     lists of differentially expressed features in related experiments,
     Genome Biology, 8, R54

_E_x_a_m_p_l_e_s:

     data = simulation(n=500,GammaA=1,GammaB=1,r1=0.5,r2=0.8,DEfirst=300,DEsecond=200,DEcommon=100)

     ## The function is currently defined as
     function(n,GammaA,GammaB,epsilonM=0, epsilonSD=1, r1,r2,DEfirst,DEsecond,DEcommon){

     T1=c()
     T2=c()
     delta=rgamma(n,GammaA,1/GammaB)
     epsilon1=rnorm(n,epsilonM,epsilonSD)
     epsilon2=rnorm(n,epsilonM,epsilonSD)
     names=c()
     #Group 1 : DE in common
     for(i in 1: DEcommon){
     x=rbinom(1,1,0.5)
     if(x==1) {
         T1[i] <- delta[i] + epsilon1[i]*r1;
         T2[i] <- delta[i] + epsilon2[i]*r2
       }
     if(x==0) {T1[i] <- -delta[i] - epsilon1[i]*r1;
         T2[i] <- -delta[i] - epsilon2[i]*r2}

     names[i] <- "DEcommon"
       }

     #Group 2 : DE in the first experiment
     for(i in (DEcommon+1):(DEfirst)){
     x=rbinom(1,1,0.5)
     if(x==1){
     T1[i] <- delta[i] + epsilon1[i]*r1
       }
     if(x==0){T1[i] <- -delta[i] - epsilon1[i]*r1}
     T2[i] <- epsilon2[i]*r2
     names[i] <- "DEfirst"
       }

     #Group 3 : DE in the second experiment
     for(i in (DEfirst+1):(DEfirst+DEsecond-DEcommon)){
     x=rbinom(1,1,0.5)
     T1[i] <- epsilon1[i]*r1
     if(x==1){
     T2[i] <- delta[i] + epsilon2[i]*r2
       }
     if(x==0){T2[i] <- -delta[i] - epsilon2[i]*r2}
     names[i] <- "DEsecond"

       }

     #Group 4 : Not DE in Both experiments
     for(i in (DEfirst+DEsecond-DEcommon+1):(n)){
     T1[i] <- epsilon1[i]*r1
     T2[i] <- epsilon2[i]*r2
     names[i] <- "Null"
       }

     ##############################################
     #Assign the Pvalues

     Pval1 = c()
     Pval2 = c()

     for(i in 1:n){
     Pval1[i] <- 2*pnorm(-abs(T1[i]/r1))
     Pval2[i] <- 2*pnorm(-abs(T2[i]/r2))
     #Pval1[i] <- 1-pnorm(T1[i]/r1)
     #Pval2[i] <- 1-pnorm(T2[i]/r2)
       }

     ##############################################
     return(list(names=names,T1=T1,T2=T2,Pval=cbind(Pval1,Pval2)))
       }

