Tmc {BGcom}R Documentation

Function to calculate the empirical null distribution of max(T)

Description

This function use bootstrap for calculating the empirical distribution of max(T) under the null hypothesis of independence between the two experiments. An empirical p-value is calculated to evaluate how the data are far from the hypothesis of independence

Usage

Tmc(repl, output.ratio, dir, data)

Arguments

data The data matrix for the experiments to be compared
dir directory for storing the plots
repl Number of replicates to be performed
output.ratio The output object from the ratio function

Details

This function uses bootstrap for calculating the empirical distribution of the maximum of T (i.e. T(q*)) under the null hypothesis of independence between the two experiments. The pvalues of one list are randomly permuted B times, while the ones for the other list are keeping fixed. In this way, any relationship between the two lists is destroyed. At each permutation b (b varies from 1 to B) a Tb(q) is calculated for each q and a maximum statistic Tb(q*) is returned; its distribution represents the null distribution under the condition of independence. The relative frequency of Tb(q*) larger than T(q*) is noted as empirical p value: it returns the proportion of Tb(q*) from permuted dataset greater than the observed one (so indicates where the observed T(q*) is located on the null distribution).

Value

Tmc: Returns the empirical pvalue from testing T(q*).

Author(s)

M. Blangiardo

References

Stone et al.(1988), Investigations of excess environmental risks around putative sources: statistical problems and a proposed test,Statistics in Medicine, 7, 649-660.

M.Blangiardo and S.Richardson Statistical tools for synthesizing lists of differentially expressed features in related experiments, Genome Biology, 8, R54

Examples

data = simulation(n=500,GammaA=1,GammaB=1,r1=0.5,r2=0.8,DEfirst=300,DEsecond=200,DEcommon=100)
T<- ratio(data$Pval,interval=0.01,dir="D:/",name="CompData1Data2",pvalue=TRUE)
bootstrap<- Tmc(data$Pval,repl=100,output.ratio=T,dir="D:/")

## The function is currently defined as
function(repl,output.ratio,dir,data){
    if(output.ratio$pvalue==FALSE){
    data=1-data
    }

    lists = ncol(data)
    l=length(output.ratio$DECommon)
    Tmax = max(output.ratio$ratios,na.rm=TRUE)
    Tmax.null = rep(NA,repl)
    ratios.null = matrix(NA,l,repl)
    sample = matrix(NA,dim(data)[1],lists)
    
    sample[,1] <- data[,1]
    for(k in 1:repl){
    int = c()
    L=matrix(0,l,lists)
    data1 = matrix(NA,dim(data)[1],lists)
    data1[,1] <- data[,1]

        for(j in 2:lists){
                sample[,j] = sample(data[,j])
                data1[,j] = sample[,j]
                }
    
    threshold = output.ratio$q
    for(i in 1:l){
                temp = data1<=threshold[i]
                for(j in 1:lists){
                L[i,j] <- sum(temp[,j])
                temp[temp[,j]==FALSE,j]<-0
                temp[temp[,j]==TRUE,j]<-1
                }
                int[i] <- sum(apply(temp,1,sum)==lists) 
                        }


expected = apply(L,1,prod)/(dim(data)[1])^(lists-1)
observed = int
ratios = matrix(0,l,1)

for(i in 1:l){
    ratios[i,1] <- observed[i]/expected[i]
    }
ratios.null[,k] <- ratios
ratios <- ratios[threshold>0]
Tmax.null[k] = max(ratios)
}

ID=seq(1,repl)
p=length(ID[Tmax.null>=Tmax])
pvalue<- p/repl

postscript(paste("Pvalue","_",output.ratio$name,".ps"))
hist(Tmax.null,main="",xlab="T",ylab="",xaxt="n",cex.main=0.9,xlim=c(min(Tmax.null),max(c(Tmax,max(Tmax.null)))),yaxt="n",cex.axis=0.9)
axis(1,at = seq(min(Tmax.null),max(c(Tmax,max(Tmax.null))),5),labels = round(seq(min(Tmax.null),max(c(Tmax,max(Tmax.null))),5),0))
legend(x=Tmax/2,y=dim(data)[1]/100,legend=paste("P value =",pvalue),bty="n",cex=0.9)
abline(v=Tmax,lty=2)
dev.off()
return(pvalue=pvalue)
}


[Package BGcom version 1.0 Index]