
Bayesian Integrative Genomics

Phase I: Funded by the BBSRC Exploiting Genomics initiative, from May 2002 to February 2007.

Phase II: A collaborative programme started in 2008, coordinated by Prof Sylvia Richardson.



GUESS: Graphical processing Unit Evolutionary Stochastic Search



Overview


GUESS is a computationally optimised C++ implementation of a fully Bayesian variable selection approach that can analyse, in a genome-wide context, single and multiple responses in an integrated way. The program uses routines from the GNU Scientific Library (GSL) and can offload computationally intensive linear algebra operations to the Graphics Processing Unit (GPU) through the proprietary CULA-dense library.

The multi-SNP model of GUESS searches for the combinations of SNPs that best predict the (possibly multivariate) outcome of interest.
In its current implementation, using its GPU capabilities, GUESS can handle hundreds of thousands of predictors, which makes it possible to analyse genome-wide datasets. However, GPU-based numerical libraries require extensive data transfer between main memory/CPU and the GPU, which can itself be computationally expensive. Consequently, for smaller datasets (such as the example provided in the package), for which the matrix operations are not rate-limiting, the CPU version of GUESS may be more efficient. Hence, both to ensure optimal use of the algorithm and to allow GUESS to run on non-CULA-compatible systems, the calls to GPU-based calculations within GUESS can easily be switched off.

The GUESS code has been wrapped into an R package called R2GUESS, which contains the R scripts needed to monitor the MCMC run and visualize its main outputs. The beta version of the package is available here together with its documentation. The stable version of R2GUESS will soon be released on CRAN. Like GUESS, it relies on the GNU Scientific Library (GSL) and the CULA-dense library. If these are not already installed, please see the instructions below. To install the R2GUESS package, simply type:

R CMD INSTALL R2GUESS_0.9.tar.gz
in the directory where the archive has been saved.

Extensive documentation detailing the implementation of GUESS as well as all its features and options is available here.



Installation Guide


  1. If not already installed, install the GNU Scientific Library (GSL). The installation procedure is well documented on the GSL website.
  2. For users wishing to use the GPU features of GUESS: install the CULA-dense library.
  3. Download the GUESS_v1.1.tgz archive file
  4. Uncompress the file by typing
    tar xzvf GUESS_v1.1.tgz
  5. Move to the main directory by typing
    cd GUESS_v1.1/Main
  6. Create the executable file
    1. For non-CULA users only: open the makefile and set CUDA=0
    2. Compile the code by typing:
      make
      If the GSL library files are not installed under /opt/local/*, the paths in the makefile provided in the package need to be edited.

      For users who only intend to run the CPU version of GUESS, replacing GSL-BLAS with a multi-threaded BLAS (e.g. OpenBLAS) may be worthwhile. The makefile should be modified accordingly.

  7. Move to the Example folder by typing
    cd ../Example
    
  8. To check the installation and generate the example output files, run GUESS by typing
    ./GUESS_example.sh
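
For non-CULA users, the makefile edit in step 6 can also be scripted. The sketch below is only an illustration: the helper name disable_cuda is hypothetical, and it assumes the makefile contains a single assignment line of the form "CUDA=<value>" (possibly with spaces around the equals sign). Please check the makefile shipped with the package before relying on it.

```shell
#!/bin/sh
# disable_cuda MAKEFILE
# Sketch only: rewrites the CUDA assignment in MAKEFILE to CUDA=0.
# ASSUMPTION: the makefile has a line starting with "CUDA" followed by "=",
# e.g. "CUDA=1" or "CUDA = 1". This layout is not confirmed by the package docs.
disable_cuda() {
    makefile="$1"
    # Keep a backup copy, then write the edited version in place of the original.
    cp "$makefile" "$makefile.bak" &&
    sed 's/^CUDA[[:space:]]*=.*/CUDA=0/' "$makefile.bak" > "$makefile"
}

# Example use from the GUESS_v1.1/Main directory (not run automatically):
#   disable_cuda makefile && make
```

A backup of the original file is left next to it as makefile.bak, so the GPU setting can be restored by copying the backup back.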
    



I/O-Files Description


-----------------
Source Code (Main directory)

  GUESS.cc
      Main program calling the functions and objects defined in the
      Routines and Classes folders, respectively.
  makefile
      File specifying the compilation options.
      Non-CULA users should set CUDA=0 in this file.
-----------------
Input Files (Example/Input directory)

  X_example.txt
      The predictor matrix containing 770 SNPs (in columns) in
      29 individuals (in rows). The file header is the matrix size.
  Y_example.txt
      The response matrix containing 7 measures (in columns) in
      29 individuals (in rows). The file header is the matrix size.
  Par_file_example.xml
      XML-formatted file defining the parameters of the run.
      A full listing of the options implemented in GUESS can be found
      in the documentation (Table 1).
  Init_example.txt
      Text file specifying the variables to include in the first model.
      If undefined, the initial guess of the MCMC algorithm is derived
      from a step-wise regression model.
-----------------
Output Files (Example/Output directory)

  *_output_best_visited_models.txt
      Describes the best models visited along the run, ranked according
      to their posterior probability (MPP). MPP calculations include the
      null and all univariate models, even if these have not been visited
      during the MCMC run.
  *_output_marg_prob_incl.txt
      Lists the marginal probability of inclusion (MPPI) for each SNP in
      the predictor matrix. MPPIs can be viewed as the posterior strength
      of association between a single SNP and a group of phenotypes.
  *_output_*_history.txt
      These files are written when the -history command line option is set.
      They summarise the evolution along the run of key features of all
      EMC moves, of the selection coefficient g, of the temperature of each
      chain (during burn-in), and of the model size and marginal probability.
      They are needed to assess the behaviour/convergence of the model.
-----------------
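
Since the X and Y input files carry the matrix size in their header, it can be useful to check that the header agrees with the matrix body before starting a run. The sketch below is a hypothetical helper (check_matrix is not part of GUESS). It assumes a layout that is not spelled out above: the number of rows on the first line, the number of columns on the second line, then one whitespace-separated row per line. If your files use a different header layout, the script needs adapting.

```shell
#!/bin/sh
# check_matrix FILE
# Sketch only: verifies that the size header of FILE matches its body.
# ASSUMED layout (not confirmed by the description above):
#   line 1 = number of rows, line 2 = number of columns,
#   remaining lines = one whitespace-separated matrix row each.
check_matrix() {
    awk '
        NR == 1 { nrow = $1; next }          # declared number of rows
        NR == 2 { ncol = $1; next }          # declared number of columns
        NF == 0 { next }                     # ignore blank lines
        { seen++; if (NF != ncol) bad++ }    # count rows and wrong-width rows
        END {
            if (nrow > 0 && seen == nrow && bad == 0) {
                print "ok " nrow " x " ncol
                exit 0
            }
            print "mismatch: header " nrow " x " ncol ", found " seen + 0 " rows, " bad + 0 " with wrong width"
            exit 1
        }
    ' "$1"
}

# Example use from the Example directory (not run automatically):
#   check_matrix Input/X_example.txt
#   check_matrix Input/Y_example.txt
```

If the assumed layout holds, the example predictor file should be reported as 29 rows by 770 columns and the response file as 29 rows by 7 columns, matching the description above.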



Additional resources that ease the post-processing of GUESS runs are now included in the R2GUESS package.