GUESS is a computationally optimised C++ implementation of a fully Bayesian variable selection approach that can analyse single and multiple responses in an integrated way, in a genome-wide context. The program uses routines from the GNU Scientific Library (GSL) and can optionally re-route computationally intensive linear algebra operations to the Graphics Processing Unit (GPU) through the proprietary CULA-dense library.
The multi-SNP model of GUESS seeks the best combinations of SNPs to predict the (possibly multivariate) outcome of interest.
In its current implementation, using its GPU capabilities, GUESS can handle hundreds of thousands of predictors, which enables genome-wide datasets to be analysed. However, GPU-based numerical libraries require extensive data transfer between host (CPU) memory and the GPU, and this transfer can itself be computationally expensive. As a consequence, for smaller datasets (such as the example provided in the package), for which matrix operations are not rate-limiting, the CPU version of GUESS may be more efficient. Hence, to ensure optimal use of the algorithm, and to allow GUESS to run on systems without CULA, the GPU-based calculations within GUESS can easily be switched off.
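The CPU/GPU trade-off described above can be illustrated with a rough cost model. The sketch below compares the estimated time of a Gram product X^T X (X of size n x p) computed on the CPU against the same product offloaded to the GPU, charging the GPU for copying X to the device and the result back. Everything in it is a hypothetical illustration: the `Hardware` figures, the choice of X^T X as the representative operation and the fixed per-transfer latency are assumptions, not measurements of GUESS, GSL or CULA.

```cpp
#include <cassert>

// Hypothetical hardware figures supplied by the caller (not measurements).
struct Hardware {
    double cpu_flops_per_s;   // sustained CPU throughput
    double gpu_flops_per_s;   // sustained GPU throughput
    double link_bytes_per_s;  // host <-> GPU transfer bandwidth
    double link_latency_s;    // fixed overhead paid by each transfer
};

// Floating-point operations for the Gram product X^T X, with X of size n x p:
// p * p output entries, each needing n multiplications and n additions.
double gram_flops(long n, long p) {
    assert(n > 0 && p > 0);
    return 2.0 * static_cast<double>(n) * static_cast<double>(p) * static_cast<double>(p);
}

// CPU estimate: computation only, since X already sits in host memory.
double cpu_seconds(long n, long p, const Hardware& hw) {
    return gram_flops(n, p) / hw.cpu_flops_per_s;
}

// GPU estimate: copy X (n * p doubles) to the device, compute, then copy the
// p x p result back to the host. Two transfers, so two latency charges.
double gpu_seconds(long n, long p, const Hardware& hw) {
    const double elems = static_cast<double>(n) * p + static_cast<double>(p) * p;
    const double bytes = elems * static_cast<double>(sizeof(double));
    const double transfer = 2.0 * hw.link_latency_s + bytes / hw.link_bytes_per_s;
    return transfer + gram_flops(n, p) / hw.gpu_flops_per_s;
}

// True when offloading is predicted to be faster under this cost model.
bool offload_pays_off(long n, long p, const Hardware& hw) {
    return gpu_seconds(n, p, hw) < cpu_seconds(n, p, hw);
}
```

With illustrative figures such as `Hardware{50e9, 1e12, 10e9, 20e-6}`, a 29 x 10 product is predicted to be faster on the CPU (the fixed transfer latency dominates), whereas a 5000 x 2000 product is predicted to be faster on the GPU. Real crossover points depend on the hardware and on which operations GUESS actually offloads.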
The GUESS code has been wrapped into an R package called R2GUESS, which contains the R scripts needed to monitor the MCMC run and visualise its main outputs. The beta version of the package is available here, together with its documentation. The stable version of R2GUESS will soon be released on CRAN. Like GUESS, it relies on the GNU Scientific Library (GSL) and the CULA-dense library; if these are not already installed, please see the instructions below. To install the R2GUESS package, type the following in the directory where the archive has been saved:
R CMD INSTALL R2GUESS_0.9.tar.gz
Extensive documentation detailing the implementation of GUESS as well as all its features and options is available here.
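To install GUESS, download the GUESS_v1.1.tgz archive file, then extract it and move to the Main directory:
tar xzvf GUESS_v1.1.tgz
cd GUESS_v1.1/Main
Users without CULA should then edit the makefile and set
CUDA=0
before compiling with:
make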
The makefile provided in the package assumes that the GSL library files are located under /opt/local/*; edit it if GSL is installed elsewhere on your system.
Users who intend to run only the CPU version of GUESS may benefit from replacing the GSL reference BLAS (GSL-BLAS) with a multi-threaded implementation such as OpenBLAS. The makefile should be modified accordingly.
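Run the example provided in the Example folder by typing:
cd ../Example
./GUESS_example.sh
Users without CULA should remove the -cuda option from the shell file; if GUESS has been compiled with CUDA=0, the -cuda option will be ignored.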
------------------------------------------------------------------------------
Source Code (Main directory)
------------------------------------------------------------------------------
GUESS.cc                         | Main program, calling the functions and
                                 | objects defined in the Routines and
                                 | Classes folders, respectively.
makefile                         | File specifying the compilation options.
                                 | For non-CULA users, set CUDA=0 in this file.
------------------------------------------------------------------------------
Input Files (Example/Input directory)
------------------------------------------------------------------------------
X_example.txt                    | The predictor matrix, containing 770 SNPs
                                 | (in columns) for 29 individuals (in rows).
                                 | The file header gives the matrix size.
Y_example.txt                    | The response matrix, containing 7 measures
                                 | (in columns) for 29 individuals (in rows).
                                 | The file header gives the matrix size.
Par_file_example.xml             | XML-formatted file defining the parameters
                                 | of the run. The full listing of the options
                                 | implemented in GUESS can be found in the
                                 | documentation (Table 1).
Init_example.txt                 | Text file specifying the variables to
                                 | include in the first model. If it is not
                                 | provided, the initial guess of the MCMC
                                 | algorithm is derived from a stepwise
                                 | regression model.
------------------------------------------------------------------------------
Output Files (Example/Output directory)
------------------------------------------------------------------------------
*_output_best_visited_models.txt | Describes the best models visited along
                                 | the run, ranked according to their
                                 | posterior probability (MPP). MPP
                                 | calculations include the null model and
                                 | all univariate models, even if these have
                                 | not been visited during the MCMC run.
*_output_marg_prob_incl.txt      | Displays the marginal posterior
                                 | probability of inclusion (MPPI) of each
                                 | SNP in the predictor matrix. MPPIs can be
                                 | viewed as the posterior strength of
                                 | association between a single SNP and a
                                 | group of phenotypes.
*_output_*_history.txt           | Written only when the -history
                                 | command-line option is used. These files
                                 | summarise the evolution along the run of
                                 | key features of all EMC (Evolutionary
                                 | Monte Carlo) moves, of the selection
                                 | coefficient g, of the temperature of each
                                 | chain (during burn-in), and of the model
                                 | size and marginal probability. They are
                                 | needed to assess the behaviour and
                                 | convergence of the algorithm.
------------------------------------------------------------------------------
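The input file descriptions state only that each matrix file starts with a header giving its size. The sketch below reads such a file, assuming that the header holds the number of rows followed by the number of columns and that the values are whitespace-separated and listed row by row; check the GUESS documentation for the exact format. `Matrix`, `read_matrix` and `read_matrix_from_string` are hypothetical names for illustration, not part of GUESS.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Dense row-major matrix (hypothetical helper type, not part of GUESS).
struct Matrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> data;  // rows * cols values, stored row by row

    double at(std::size_t i, std::size_t j) const {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
};

// Assumed layout: header "rows cols", then rows * cols whitespace-separated
// values listed row by row (individuals in rows, SNPs or measures in columns).
Matrix read_matrix(std::istream& in) {
    Matrix m;
    if (!(in >> m.rows >> m.cols)) {
        throw std::runtime_error("missing or malformed matrix-size header");
    }
    const std::size_t count = m.rows * m.cols;
    m.data.reserve(count);
    double value = 0.0;
    for (std::size_t k = 0; k < count; ++k) {
        if (!(in >> value)) {
            throw std::runtime_error("fewer values than announced in the header");
        }
        m.data.push_back(value);
    }
    return m;
}

// Convenience wrapper, handy for quick checks on small inline examples.
Matrix read_matrix_from_string(const std::string& text) {
    std::istringstream in(text);
    return read_matrix(in);
}
```

To read an actual input file, open it with a `std::ifstream` (from `<fstream>`) and pass the stream to `read_matrix`. If the header assumption holds, X_example.txt should yield 29 rows and 770 columns.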
R2GUESS
package.
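The MPPIs reported in *_output_marg_prob_incl.txt summarise, for each SNP, how strongly the posterior supports its inclusion. A generic Monte Carlo way to estimate an MPPI is the fraction of sampled models (after burn-in) that contain the SNP. The sketch below implements that frequency estimator; it is an illustration of the concept and may differ from the estimator GUESS actually uses, which is described in its documentation. `Model` and `estimate_mppi` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One sampled model: the distinct, 0-based indices of the SNPs it includes.
using Model = std::vector<std::size_t>;

// Frequency estimate of marginal posterior probabilities of inclusion:
// for each SNP, the fraction of sampled models that contain it.
std::vector<double> estimate_mppi(const std::vector<Model>& samples, std::size_t n_snps) {
    std::vector<double> mppi(n_snps, 0.0);
    if (samples.empty()) {
        return mppi;  // no draws: report zero for every SNP
    }
    for (const Model& model : samples) {
        for (std::size_t j : model) {
            assert(j < n_snps);
            mppi[j] += 1.0;  // count the draws that include SNP j
        }
    }
    for (double& p : mppi) {
        p /= static_cast<double>(samples.size());
    }
    return mppi;
}
```

For example, with the four draws {0, 2}, {2}, {1, 2} and the empty model over 4 SNPs, SNP 2 appears in three draws and receives an estimated MPPI of 0.75.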