GUESS is a computationally optimised C++ implementation of a fully Bayesian variable selection approach that can analyse single and multiple responses in an integrated way in a genome-wide context. The program uses routines from the GNU Scientific Library (GSL) and can reroute computationally intensive linear algebra operations to the Graphics Processing Unit (GPU) through the proprietary CULA-dense library.
The multi-SNP model of GUESS typically searches for the best
combinations of SNPs to predict the (possibly multivariate) outcome of interest.
In its current implementation, using its GPU capabilities,
GUESS is able to handle hundreds of thousands of
predictors, which enables the analysis of genome-wide sized datasets.
However, the use of GPU-based numerical libraries implies
extensive data transfer between the memory/CPU
and the GPU, which in turn can be computationally expensive. As a
consequence, for smaller datasets (such as the example provided in the
package), for which the matrix operations are not rate-limiting, the CPU version
of GUESS may be more efficient. Hence, both to ensure optimal use of
the algorithm and to enable running GUESS on systems that are not
CULA-compatible, the call to GPU-based calculations within
GUESS can easily be switched off.
Extensive documentation detailing the implementation of GUESS, as well as all its features and options, is available here.
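To install GUESS, download the GUESS_v1.0.tgz archive file, unpack it and move to the Main directory:
tar xzvf GUESS_v1.0.tgz
cd GUESS_v1.0/Main
Non-CULA users should edit the makefile and set
CUDA=0
GUESS is then compiled by typing
make
The makefile provided in the package may need to be edited if the GSL library files are not located under /opt/local/*.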
For users wishing to use only the CPU version of GUESS,
replacing the GSL BLAS with a multithreaded BLAS implementation (e.g. OpenBLAS) may be advisable.
The makefile should be modified accordingly.
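To run the example, move to the Example folder and launch the shell script by typing
cd ../Example
./GUESS_example.sh
Non-CULA users can remove the cuda option from the shell file; without GPU support, the cuda option will be ignored.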
  
Source Code (Main directory):
GUESS.cc
 Main program calling functions and objects defined in the Routines and Classes folders, respectively.
makefile
 File specifying the compilation options. For non-CULA users, set CUDA=0 in this file.


  
Input Files (Example/Input directory):
X_example.txt
 The predictor matrix containing 770 SNPs (in columns) in 29 individuals (in rows). The file header is the matrix size.
Y_example.txt
 The response matrix containing 7 measures (in columns) in 29 individuals (in rows). The file header is the matrix size.
Par_file_example.xml
 XML-formatted file defining the parameters of the run. A full listing of the available options implemented in GUESS can be found in the documentation (Table 1).
Init_example.txt
 Text file specifying the variables to include in the first model. If undefined, the initial guess of the MCMC algorithm will be derived from a stepwise regression model.
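As an illustration of the matrix input format described above, the following Python sketch reads a file whose header states the matrix size and checks that the data match it. It assumes that the header consists of the number of rows followed by the number of columns, given as the first two whitespace-separated values of the file; this layout and the function name read_guess_matrix are assumptions made for the example, and the GUESS documentation should be consulted for the exact format.

```python
import io


def read_guess_matrix(stream):
    """Parse a matrix whose first two values are ASSUMED to be n_rows and n_cols.

    Hypothetical helper, not part of GUESS. Returns a list of rows (lists of floats).
    """
    tokens = stream.read().split()
    if len(tokens) < 2:
        raise ValueError("missing size header")
    n_rows, n_cols = int(tokens[0]), int(tokens[1])
    values = [float(t) for t in tokens[2:]]
    if len(values) != n_rows * n_cols:
        raise ValueError(
            f"header announces {n_rows} x {n_cols} values, found {len(values)}"
        )
    # Slice the flat list of values into n_rows rows of n_cols entries each.
    return [values[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


# Toy stand-in for a predictor file: 2 individuals (rows), 3 SNPs (columns).
toy = io.StringIO("2\n3\n0 1 2\n1 0 2\n")
matrix = read_guess_matrix(toy)
print(len(matrix), "rows,", len(matrix[0]), "columns")
```

With real files, the same function could be applied to both X_example.txt and Y_example.txt (opened with open(...)) to confirm that the two matrices have the same number of rows, since both describe the same 29 individuals.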
  
Output Files (Example/Output directory):
*_output_best_visited_models.txt
 This file describes the best models visited along the run, ranked according to their model posterior probability (MPP). MPP calculations include the null and all univariate models, even if these have not been visited during the MCMC run.
*_output_marg_prob_incl.txt
 Displays the marginal posterior probability of inclusion (MPPI) for each SNP in the predictor matrix. MPPIs can be viewed as the posterior strength of association between a single SNP and a group of phenotypes.
*_output_*_history.txt
 The output of these files is enabled by the history command line option. They summarise the evolution along the run of key features of all EMC moves, of the selection coefficient g, of the temperature of each chain (during burn-in), and of the model size and marginal probability. These files are needed to assess the behaviour/convergence of the model.
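The link between the model posterior probabilities and the marginal probabilities of inclusion reported in the output files can be sketched in a few lines of Python. In this minimal illustration, the inclusion probability of a predictor is taken as the summed posterior probability of all listed models that contain it, after renormalising over the listed models. The function name, the data layout and the toy weights are hypothetical, and the calculation actually performed by GUESS may differ in its details (for instance in which models enter the normalisation).

```python
def marginal_inclusion(models, n_predictors):
    """Return one inclusion probability per predictor.

    models: list of (set of predictor indices, unnormalised posterior weight).
    Hypothetical helper for illustration only, not GUESS code.
    """
    total = sum(weight for _, weight in models)
    mppi = [0.0] * n_predictors
    for included, weight in models:
        for j in included:
            # Each model contributes its normalised weight to every predictor it contains.
            mppi[j] += weight / total
    return mppi


# Toy example with 3 predictors: the null model, {0} and {0, 2}.
models = [(frozenset(), 1.0), (frozenset({0}), 3.0), (frozenset({0, 2}), 4.0)]

# Rank models by posterior weight, best first.
ranked = sorted(models, key=lambda m: m[1], reverse=True)

print(marginal_inclusion(models, 3))  # [0.875, 0.0, 0.5]
```

In this toy case predictor 0 appears in the two most probable models and therefore receives the highest inclusion probability, while predictor 1 appears in none and receives zero.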
  
Scripts automating the post-processing of GUESS output files are available upon request from Dr Leonardo Bottolo or Dr Marc Chadeau-Hyam. These comprise:
R scripts converting outputs of GUESS into R-compatible objects;
R scripts plotting a typical set of traces to monitor the convergence of the run;
R scripts generating Manhattan-type plots highlighting the candidate SNPs found by GUESS;
Matlab codes to calculate empirical FDR from GUESS output (this also requires running GUESS on several permuted datasets);
Matlab codes to calculate Ratios of Bayes Factors.
These scripts are currently in beta version, and will be included in the package shortly.