simulation.indep {BGcom} | R Documentation |
The function simulate two vectors of p-values using the procedure described in Hwang et al. for independent experiments
simulation.indep(n, GammaA, GammaB, epsilonM = 0, epsilonSD = 1, r1, r2, DEfirst, DEsecond)
n |
Number of features to be simulated |
GammaA,GammaB |
Parameters of the Gamma distribution |
epsilonM,epsilonSD |
Parameters of the Gaussian noise |
r1,r2 |
Additional experiment-specific noise |
DEfirst,DEsecond |
Number of DE features in each experiment |
Considering two experiments (k=1,2), each of them with two classes, and n genes, for each gene we simulate a true difference between the classes dg, drawn from a Gamma distribution with random sign. The true difference dg is 0 if the gene is not differentially expressed. We then add two normal random noise components, r[k] that is the experiment specific component and e(gk), that is the gene-experiment component. The former is assigned deterministically, whilst the latter is drawn from a standard Gaussian distribution. So the log fold change (T(gk)) is the sum of all these components for each gene and experiment. We divide the n genes in three groups: genes differentially expressed only in the first experiment, genes differentially expressed only in the second experiment and genes differentially expressed in neither experiment. There are not true positive genes (i.e. truly DE in both experiments), so we should find no genes in common using our method.
Then, as described in Hwang et al., a two tails T-test is performed for each T(gk) and a p-value is generated as: P(gk) = 2 Normal cdf(-absolute value (T(gk)/r(k))) where T(gk) is the t statistic that evaluates the differential expression between the two classes for the g gene and k experiment.
names |
Which group each simulated gene expression valeue belongs to |
T1,T2 |
T statistic for the first and second experiment |
Pval |
p-values for the experiment to be compared |
Marta Blangiardo
Hwang D, Rust A, Ramsey S, Smith J, Leslie D, Weston A, de Atauri P, Aitchison J, Hood L, Siegel A, Bolouri H: A data integration methodology for system biology. PNAS 2005, 29:17296–17301.
M.Blangiardo and S.Richardson Statistical tools for synthesizing lists of differentially expressed features in related experiments, Genome Biology, 8, R54
data.indep = simulation(n=500,GammaA=1,GammaB=1,r1=0.5,r2=0.8,DEfirst=300,DEsecond=200,DEcommon=100) ## The function is currently defined as function(n,GammaA=2,GammaB=2,epsilonM=0, epsilonSD=1, r1,r2,DEfirst,DEsecond){ T1=c() T2=c() delta=rgamma(n,shape=GammaA,scale=GammaB) epsilon1=rnorm(n,epsilonM,epsilonSD) epsilon2=rnorm(n,epsilonM,epsilonSD) names=c() #Group 1 : DE in the first experiment for(i in (1):(DEfirst)){ x=rbinom(1,1,0.5) if(x==1){T1[i] <- delta[i] + epsilon1[i]*r1} if(x==0){T1[i] <- -delta[i] - epsilon1[i]*r1} T2[i] <- epsilon2[i]*r2 names[i] <- "DEfirst" } #Group 2 : DE in the second experiment for(i in (DEfirst+1):(DEfirst+DEsecond)){ x=rbinom(1,1,0.5) T1[i] <- epsilon1[i]*r1 if(x==1){T2[i] <- delta[i] + epsilon2[i]*r2} if(x==0){T2[i] <- -delta[i] - epsilon2[i]*r2} names[i] <- "DEsecond" } #Group 3 : Not DE in Both experiments for(i in (DEfirst+DEsecond+1):(n)){ T1[i] <- epsilon1[i]*r1 T2[i] <- epsilon2[i]*r2 names[i] <- "Null" } ############################################## #Assign the Pvalues Pval1 = c() Pval2 = c() for(i in 1:n){ Pval1[i] <- 2*pnorm(-abs(T1[i]/r1)) Pval2[i] <- 2*pnorm(-abs(T2[i]/r2)) } ############################################## return(list(names=names,T1=T1,T2=T2,Pval=cbind(Pval1,Pval2))) }