Tmc {BGcom}R Documentation

Function to calculate the empirical null distribution of max(T)

Description

This function use bootstrap for calculating the empirical distribution of max(T) under the null hypothesis of independence between the two experiments. An empirical p-value is calculated to evaluate how the data are far from the hypothesis of independence

Usage

Tmc(n, repl, output.ratio, dir, data1, data2)

Arguments

data1 The vector of the data from the first experiment
data2 The vector of the data from the second experiment
n length of the vector of simulated pvalue
dir directory for storing the plots
repl Number of replicates to be performed
output.ratio The output object from the real.ratio function

Details

This function uses bootstrap for calculating the empirical distribution of the maximum of T (i.e. T(q*)) under the null hypothesis of independence between the two experiments. The pvalues of one list are randomly permuted B times, while the ones for the other list are keeping fixed. In this way, any relationship between the two lists is destroyed. At each permutation b (b varies from 1 to B) a Tb(q) is calculated for each q and a maximum statistic Tb(q*) is returned; its distribution represents the null distribution under the condition of independence. The relative frequency of Tb(q*) larger than T(q*) is noted as empirical p value: it returns the proportion of Tb(q*) from permuted dataset greater than the observed one (so indicates where the observed T(q*) is located on the null distribution).

Value

Tmc: Returns the pvalue of the observed T(q*).

Note

Author(s)

M. Blangiardo

References

Stone et al.(1988), Investigations of excess environmental risks around putative sources: statistical problems and a proposed test,Statistics in Medicine, 7, 649-660. M.Blangiardo and S.Richardson Statistical tools for synthesizing lists of differentially expressed features in related experiments (under revision)

Examples

data = simulation(n=2000,GammaA=1,GammaB=1,r1=0.5,r2=0.8,DEfirst=1000,DEsecond=800,DEcommon=700)
T<- ratio(n=length(data$Pval1),data1=data$Pval1,data2=data$Pval2,interval=0.01,dir="D:/",name="CompData1Data2",pvalue=TRUE)
bootstrap<- Tmc(n=length(data$Pval1),data1=data$Pval1,data2=data$Pval2,repl=100,output.ratio=T,dir="D:/")

## The function is currently defined as
function(n,repl,output.ratio,dir,data1,data2){
    l=sum(output.ratio$int>0)
    max = max(output.ratio$ratios,na.rm=TRUE)
    max.null = rep(NA,repl)
    ratios.null = matrix(NA,l,repl)

for(k in 1:repl){
    sample1 = sample(data2)
    ID=seq(1,n)
    threshold = cbind(rep(NA,l),seq((101-l)/100,1,output.ratio$interval))

    for(i in 1:l){
        threshold[i,1] <- (length(ID[data1<=threshold[i,2] &    sample1<=threshold[i,2]]))
        }

    threshold.expected = cbind(rep(NA,l),rep(NA,l),rep(NA,l),seq((101-l)/100,1,output.ratio$interval))
    for(i in 1:l){
        threshold.expected[i,1] <- length(ID[data1<=threshold.expected[i,4]])/n;
        threshold.expected[i,2]<- length(ID[sample1<=threshold.expected[i,4]])/n;
        threshold.expected[i,3] <- threshold.expected[i,1]*threshold.expected[i,2]*n
        }

expected = threshold.expected[,3]
observed = threshold[,1] 
ratios = matrix(0,l,1)

for(i in 1:l){
    ratios[i,1] <- observed[i]/expected[i]
    }
ratios.null[,k] <- ratios
  }

ID=seq(1,repl)
p=length(ID[max.null>=max])
pvalue<- p/repl

postscript(paste("Pvalue","_",output.ratio$name,".ps"))
hist(max.null,main="",xlab="T",ylab="",cex.main=0.9,xlim=c(min(max.null),max(max(max.null),max)),yaxt="n")
legend(x=max/2,y=n/4,legend=paste("Pvalue =",pvalue),bty="n",cex=0.9)
abline(v=max,lty=2)
dev.off()
return(pvalue=pvalue)
        }

[Package BGcom version 1.0 Index]