Bayesian Integrative Genomics

Phase I: Funded by the BBSRC Exploiting Genomics initiative, from May 2002 to February 2007.

Phase II: A collaborative programme started in 2008, coordinated by Prof Sylvia Richardson.

GUESS: Graphical processing Unit Evolutionary Stochastic Search

Overview

GUESS is a computationally optimised C++ implementation of a fully Bayesian variable selection approach that can analyse single and multiple responses in an integrated way, in a genome-wide context. The program uses routines from the GNU Scientific Library (GSL) and can re-route computationally intensive linear algebra operations to the Graphical Processing Unit (GPU) through the proprietary CULA-dense library.

The multi-SNP model of GUESS searches for the best combinations of SNPs to predict the (possibly multivariate) outcome of interest.
In its current implementation, using its GPU capacities, GUESS can handle hundreds of thousands of predictors, which allows genome-wide datasets to be analysed. However, GPU-based numerical libraries require extensive data transfer between main memory and the GPU, which can itself be computationally expensive. As a consequence, for smaller datasets (such as the example provided in the package), where the matrix operations are not rate-limiting, the CPU version of GUESS may be more efficient. To allow the best use of the algorithm in both situations, and to allow GUESS to run on systems that are not CULA-compatible, the call to GPU-based calculations within GUESS can easily be switched off.

The GUESS code has been wrapped into an R package called R2GUESS, which contains the R scripts needed to monitor the MCMC run and visualize its main outputs. The beta version of the package is available here together with its documentation. The stable version of R2GUESS will soon be released on CRAN. Like GUESS, it relies on the GNU Scientific Library (GSL) and the CULA-dense library. If these are not already installed, please see the instructions below. To install the R2GUESS package, simply type:

R CMD INSTALL R2GUESS_0.9.tar.gz

in the directory where the archive has been saved.

Extensive documentation detailing the implementation of GUESS as well as all its features and options is available here.

Installation Guide

If not already installed, install the GNU Scientific Library (GSL). The installation procedure is well documented on the GSL website.
For users wishing to use the GPU features of GUESS:
- Check the CUDA compatibility of your system's graphics card
- Install the NVIDIA-CUDA drivers compatible with your graphics card
- Purchase and install the CULA-dense library
Download the GUESS_v1.1.tgz archive file
Uncompress the file by typing
```
tar xzvf GUESS_v1.1.tgz
```
Move to the main directory by typing
```
cd GUESS_v1.1/Main
```
Create the executable file
1. For non-CULA users only: open the makefile and set `CUDA=0`
2. Compile the code by typing:
```
make
```
  The makefile provided in the package may need to be edited if the GSL library files are not installed under /opt/local/*.
  For users wishing to use only the CPU version of GUESS, replacing the GSL BLAS with a multi-threaded implementation (e.g. OpenBLAS) may be worthwhile. The makefile should be modified accordingly.
Move to the Example folder by typing
```
cd ../Example
```
To check the installation and generate the example output files, run GUESS by typing
```
./GUESS_example.sh
```
- For CULA users: on this example the GPU version of GUESS will be slower than its CPU alternative, because the dataset is small. To disable the GPU feature of GUESS, simply remove the -cuda option from the shell file.
- For non-CULA users: the -cuda option will be ignored.

I/O-Files Description

| Location | File | Description |
|---|---|---|
| Source code (`Main` directory) | `GUESS.cc` | Main program, calling the functions and objects defined in the `Routines` and `Classes` folders respectively. |
| | `makefile` | File specifying the compilation options. Non-CULA users should set `CUDA=0` in this file. |
| Input files (`Example/Input` directory) | `X_example.txt` | The predictor matrix containing 770 SNPs (in columns) for 29 individuals (in rows). The file header is the matrix size. |
| | `Y_example.txt` | The response matrix containing 7 measures (in columns) for 29 individuals (in rows). The file header is the matrix size. |
| | `Par_file_example.xml` | `XML`-formatted file defining the parameters of the run. A full listing of the options implemented in GUESS can be found in the documentation (Table 1). |
| | `Init_example.txt` | `TXT` file specifying the variables to include in the first model. If undefined, the initial guess of the MCMC algorithm is derived from a step-wise regression model. |
| Output files (`Example/Output` directory) | `*_output_best_visited_models.txt` | Describes the best models visited along the run, ranked according to their posterior probability (MPP). MPP calculations include the null and all univariate models, even if these were not visited during the MCMC run. |
| | `*_output_marg_prob_incl.txt` | Gives the marginal posterior probability of inclusion (MPPI) for each SNP in the predictor matrix. MPPIs can be viewed as the posterior strength of association between a single SNP and a group of phenotypes. |
| | `*_output_*_history.txt` | Written only when the `-history` command line option is used. These files summarise the evolution along the run of the key features of all EMC moves, of the selection coefficient g, of the temperature of each chain (during burn-in), and of the model size and marginal probability. They are needed to assess the behaviour and convergence of the model. |
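
The input layout described above (a plain-text matrix preceded by its size) can be sketched in a few lines of Python. This is only an illustration and not part of GUESS: the function names `write_matrix`, `read_matrix` and `check_pair` are hypothetical, and the sketch assumes that the header consists of the number of rows followed by the number of columns, and that values are whitespace-separated. Please check the example files shipped with the package for the exact layout before relying on it.

```python
from pathlib import Path


def write_matrix(path, rows):
    """Write a numeric matrix as text, preceded by a size header.

    ASSUMPTION: the header is the number of rows, then the number of
    columns, each on its own line. Verify against X_example.txt.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if any(len(r) != n_cols for r in rows):
        raise ValueError("all rows must have the same number of columns")
    lines = [str(n_rows), str(n_cols)]
    lines += [" ".join(str(v) for v in r) for r in rows]
    Path(path).write_text("\n".join(lines) + "\n")


def read_matrix(path):
    """Read a matrix back and check that its content matches the header.

    The file is read token by token, so the two header values may sit on
    one line or on two; their order (rows, columns) is still an assumption.
    """
    tokens = Path(path).read_text().split()
    n_rows, n_cols = int(tokens[0]), int(tokens[1])
    values = [float(t) for t in tokens[2:]]
    if len(values) != n_rows * n_cols:
        raise ValueError(
            f"header announces {n_rows}x{n_cols} values, found {len(values)}"
        )
    return [values[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


def check_pair(x_path, y_path):
    """Check that predictors and responses describe the same individuals.

    Returns (individuals, predictors, responses).
    """
    x, y = read_matrix(x_path), read_matrix(y_path)
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of rows")
    return len(x), len(x[0]) if x else 0, len(y[0]) if y else 0


if __name__ == "__main__":
    import os
    import tempfile

    folder = tempfile.mkdtemp()
    x_file = os.path.join(folder, "X_toy.txt")
    y_file = os.path.join(folder, "Y_toy.txt")
    # Toy data: 2 individuals, 3 SNPs coded 0/1/2, 1 response.
    write_matrix(x_file, [[0, 1, 2], [2, 1, 0]])
    write_matrix(y_file, [[1.5], [2.5]])
    print(check_pair(x_file, y_file))
```

A check of this kind, run before launching GUESS, catches the most common formatting problems: a header that does not match the data, or X and Y files with different numbers of individuals.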

Additional resources to ease the post-processing of GUESS runs are now embedded in the R2GUESS package.
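
As an illustration of the kind of post-processing involved, the sketch below shows one common way of deriving marginal posterior probabilities of inclusion from a list of visited models: each model's posterior weight is renormalised over the visited set, and a predictor's MPPI is the sum of the weights of the models that contain it. This is a generic illustration rather than the GUESS or R2GUESS implementation. The function name `mppi_from_models` and the toy weights are hypothetical, and the layout of the actual output files is not assumed here.

```python
from collections import defaultdict


def mppi_from_models(models):
    """Approximate MPPIs from a list of visited models.

    `models` is a list of (variables, weight) pairs, where `variables`
    holds the indices of the predictors in the model and `weight` is its
    (possibly unnormalised) posterior probability.
    """
    total = sum(weight for _, weight in models)
    if total <= 0:
        raise ValueError("posterior weights must sum to a positive value")
    mppi = defaultdict(float)
    for variables, weight in models:
        # set() guards against an index listed twice in the same model
        for index in set(variables):
            mppi[index] += weight / total
    return dict(mppi)


if __name__ == "__main__":
    # Hypothetical visited models: the null model, one single-SNP model
    # and two two-SNP models, with made-up unnormalised weights.
    visited = [((), 1.0), ((3,), 5.0), ((3, 7), 3.0), ((7, 12), 1.0)]
    for snp, prob in sorted(mppi_from_models(visited).items()):
        print(snp, round(prob, 3))
```

Predictors that never appear in a visited model are simply absent from the result, which corresponds to an estimated MPPI of zero under this approximation.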
