Tmc {BGcom} | R Documentation |
This function use bootstrap for calculating the empirical distribution of max(T) under the null hypothesis of independence between the two experiments. An empirical p-value is calculated to evaluate how the data are far from the hypothesis of independence
Tmc(n, repl, output.ratio, dir, data1, data2)
data1 |
The vector of the data from the first experiment |
data2 |
The vector of the data from the second experiment |
n |
length of the vector of simulated pvalue |
dir |
directory for storing the plots |
repl |
Number of replicates to be performed |
output.ratio |
The output object from the real.ratio function |
This function uses bootstrap for calculating the empirical distribution of the maximum of T (i.e. T(q*)) under the null hypothesis of independence between the two experiments. The pvalues of one list are randomly permuted B times, while the ones for the other list are keeping fixed. In this way, any relationship between the two lists is destroyed. At each permutation b (b varies from 1 to B) a Tb(q) is calculated for each q and a maximum statistic Tb(q*) is returned; its distribution represents the null distribution under the condition of independence. The relative frequency of Tb(q*) larger than T(q*) is noted as empirical p value: it returns the proportion of Tb(q*) from permuted dataset greater than the observed one (so indicates where the observed T(q*) is located on the null distribution).
Tmc: Returns the pvalue of the observed T(q*).
M. Blangiardo
Stone et al.(1988), Investigations of excess environmental risks around putative sources: statistical problems and a proposed test,Statistics in Medicine, 7, 649-660. M.Blangiardo and S.Richardson Statistical tools for synthesizing lists of differentially expressed features in related experiments (under revision)
data = simulation(n=2000,GammaA=1,GammaB=1,r1=0.5,r2=0.8,DEfirst=1000,DEsecond=800,DEcommon=700) T<- ratio(n=length(data$Pval1),data1=data$Pval1,data2=data$Pval2,interval=0.01,dir="D:/",name="CompData1Data2",pvalue=TRUE) bootstrap<- Tmc(n=length(data$Pval1),data1=data$Pval1,data2=data$Pval2,repl=100,output.ratio=T,dir="D:/") ## The function is currently defined as function(n,repl,output.ratio,dir,data1,data2){ l=sum(output.ratio$int>0) max = max(output.ratio$ratios,na.rm=TRUE) max.null = rep(NA,repl) ratios.null = matrix(NA,l,repl) for(k in 1:repl){ sample1 = sample(data2) ID=seq(1,n) threshold = cbind(rep(NA,l),seq((101-l)/100,1,output.ratio$interval)) for(i in 1:l){ threshold[i,1] <- (length(ID[data1<=threshold[i,2] & sample1<=threshold[i,2]])) } threshold.expected = cbind(rep(NA,l),rep(NA,l),rep(NA,l),seq((101-l)/100,1,output.ratio$interval)) for(i in 1:l){ threshold.expected[i,1] <- length(ID[data1<=threshold.expected[i,4]])/n; threshold.expected[i,2]<- length(ID[sample1<=threshold.expected[i,4]])/n; threshold.expected[i,3] <- threshold.expected[i,1]*threshold.expected[i,2]*n } expected = threshold.expected[,3] observed = threshold[,1] ratios = matrix(0,l,1) for(i in 1:l){ ratios[i,1] <- observed[i]/expected[i] } ratios.null[,k] <- ratios } ID=seq(1,repl) p=length(ID[max.null>=max]) pvalue<- p/repl postscript(paste("Pvalue","_",output.ratio$name,".ps")) hist(max.null,main="",xlab="T",ylab="",cex.main=0.9,xlim=c(min(max.null),max(max(max.null),max)),yaxt="n") legend(x=max/2,y=n/4,legend=paste("Pvalue =",pvalue),bty="n",cex=0.9) abline(v=max,lty=2) dev.off() return(pvalue=pvalue) }