| Tmc {BGcom} | R Documentation |
This function uses a bootstrap to compute the empirical distribution of max(T) under the null hypothesis of independence between the two experiments. An empirical p-value is calculated to evaluate how far the data are from the hypothesis of independence.
Tmc(repl, output.ratio, dir, data)
| data | The data matrix for the experiments to be compared |
| dir | Directory for storing the plots |
| repl | Number of replicates to be performed |
| output.ratio | The output object from the ratio function |
This function uses a bootstrap to compute the empirical distribution of the maximum of T, i.e. T(q*), under the null hypothesis of independence between the two experiments. The p-values of one list are randomly permuted B times (B is set by repl), while those of the other list are kept fixed, so that any relationship between the two lists is destroyed. At each permutation b (b = 1, ..., B), Tb(q) is calculated for each threshold q and the maximum statistic Tb(q*) is recorded; the distribution of these maxima is the null distribution under independence. The empirical p-value is the proportion of permuted datasets whose Tb(q*) is at least as large as the observed T(q*) (the code compares with >=), so it indicates where the observed T(q*) lies on the null distribution.
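The permutation scheme described above can be sketched for two lists as follows. This is a minimal illustration and not the BGcom source: the helper t_stat, the toy p-values and the threshold grid are assumptions made for the example, and T(q) is taken as the ratio of the observed number of features in common to the number expected under independence.

```r
# Illustrative sketch only; t_stat and the toy data are hypothetical.
# T(q) = observed / expected number of features with p-value <= q in both lists.
t_stat <- function(p1, p2, q) {
  n <- length(p1)
  sapply(q, function(th) {
    observed <- sum(p1 <= th & p2 <= th)            # in common at threshold th
    expected <- sum(p1 <= th) * sum(p2 <= th) / n   # expected under independence
    observed / expected
  })
}

set.seed(1)
n  <- 200
p1 <- runif(n)
p2 <- runif(n)
shared <- 1:40                      # features made significant in both lists
p1[shared] <- p1[shared] / 50
p2[shared] <- p2[shared] / 50

q     <- seq(0.05, 1, by = 0.05)    # grid of thresholds
T_obs <- max(t_stat(p1, p2, q), na.rm = TRUE)       # observed T(q*)

B      <- 200                       # number of permutations
T_null <- replicate(B, max(t_stat(p1, sample(p2), q), na.rm = TRUE))

p_emp <- mean(T_null >= T_obs)      # empirical p-value
```

Only the second list is shuffled; keeping the first list fixed is enough to break the pairing between features while leaving the size of each list at every threshold unchanged.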
Tmc: Returns the empirical p-value from testing T(q*).
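Written out, with B the number of permutations (the repl argument), the returned value can be expressed as below. The indicator notation is added here for illustration; the non-strict inequality follows the comparison Tmax.null>=Tmax in the function definition.

```latex
\hat{p} \;=\; \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\!\left\{ T_b(q^*) \ge T(q^*) \right\}
```

Because the estimate is a count divided by B, it can only take values that are multiples of 1/B, so with repl=100 the smallest non-zero p-value is 0.01.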
M. Blangiardo
Stone et al. (1988), Investigations of excess environmental risks around putative sources: statistical problems and a proposed test, Statistics in Medicine, 7, 649-660.
M. Blangiardo and S. Richardson (2007), Statistical tools for synthesizing lists of differentially expressed features in related experiments, Genome Biology, 8, R54.
data <- simulation(n=500,GammaA=1,GammaB=1,r1=0.5,r2=0.8,DEfirst=300,DEsecond=200,DEcommon=100)
# Store the result under a name other than T, which would mask the shorthand for TRUE
ratio.out <- ratio(data$Pval,interval=0.01,dir="D:/",name="CompData1Data2",pvalue=TRUE)
bootstrap <- Tmc(data=data$Pval,repl=100,output.ratio=ratio.out,dir="D:/")
## The function is currently defined as
function(repl,output.ratio,dir,data){
  # When output.ratio$pvalue is FALSE the data are replaced by 1-data before thresholding
  if(output.ratio$pvalue==FALSE){
    data=1-data
  }
  lists = ncol(data)                           # number of lists (columns of data)
  l=length(output.ratio$DECommon)              # number of thresholds evaluated
  Tmax = max(output.ratio$ratios,na.rm=TRUE)   # observed T(q*)
  Tmax.null = rep(NA,repl)                     # Tb(q*) for each permutation
  ratios.null = matrix(NA,l,repl)
  sample = matrix(NA,dim(data)[1],lists)       # note: shares its name with base::sample()
  sample[,1] <- data[,1]
  for(k in 1:repl){
    int = c()
    L=matrix(0,l,lists)
    data1 = matrix(NA,dim(data)[1],lists)
    data1[,1] <- data[,1]                      # the first list is kept fixed
    for(j in 2:lists){
      sample[,j] = sample(data[,j])            # permute every other list
      data1[,j] = sample[,j]
    }
    threshold = output.ratio$q
    for(i in 1:l){
      temp = data1<=threshold[i]
      for(j in 1:lists){
        L[i,j] <- sum(temp[,j])                # size of list j at threshold i
        temp[temp[,j]==FALSE,j]<-0
        temp[temp[,j]==TRUE,j]<-1
      }
      int[i] <- sum(apply(temp,1,sum)==lists)  # features below the threshold in every list
    }
    expected = apply(L,1,prod)/(dim(data)[1])^(lists-1)   # expected in common under independence
    observed = int
    ratios = matrix(0,l,1)
    for(i in 1:l){
      ratios[i,1] <- observed[i]/expected[i]   # Tb(q)
    }
    ratios.null[,k] <- ratios
    ratios <- ratios[threshold>0]
    Tmax.null[k] = max(ratios)                 # Tb(q*); no na.rm here, unlike Tmax above
  }
  ID=seq(1,repl)
  p=length(ID[Tmax.null>=Tmax])                # permutations with Tb(q*) >= T(q*)
  pvalue<- p/repl
  # paste() separates its arguments with spaces, and dir is not used in the file name
  postscript(paste("Pvalue","_",output.ratio$name,".ps"))
  hist(Tmax.null,main="",xlab="T",ylab="",xaxt="n",cex.main=0.9,xlim=c(min(Tmax.null),max(c(Tmax,max(Tmax.null)))),yaxt="n",cex.axis=0.9)
  axis(1,at = seq(min(Tmax.null),max(c(Tmax,max(Tmax.null))),5),labels = round(seq(min(Tmax.null),max(c(Tmax,max(Tmax.null))),5),0))
  legend(x=Tmax/2,y=dim(data)[1]/100,legend=paste("P value =",pvalue),bty="n",cex=0.9)
  abline(v=Tmax,lty=2)                         # dashed line at the observed T(q*)
  dev.off()
  return(pvalue=pvalue)
}
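The plot file name in the definition above is built with paste(), whose default separator is a space, and the dir argument does not enter it. The short sketch below shows the name that results and one possible way of building a path inside a chosen directory; out_dir and the file.path() variant are assumptions for illustration, not part of BGcom.

```r
name    <- "CompData1Data2"   # the name used in the example call to ratio()
out_dir <- tempdir()          # hypothetical output directory

# As written in the function: paste() inserts spaces between the pieces
as_written <- paste("Pvalue", "_", name, ".ps")
as_written
# [1] "Pvalue _ CompData1Data2 .ps"

# A possible alternative that honours a directory argument (assumption)
with_dir <- file.path(out_dir, paste0("Pvalue_", name, ".ps"))
basename(with_dir)
# [1] "Pvalue_CompData1Data2.ps"
```

With the definition as listed, the PostScript file is therefore written to the current working directory, so that is where to look for the plot after calling Tmc.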