INSTALLATION
The package distributed consists of:
      main source file        gmx.f
      auxiliary routines in
            fortran and C     algama.f dgauss4.f pnorm.f dgamma.f xflush.f sd.c
      makefile                Makefile_gmx
      example data files      enz.dat galx.dat lnacid.dat (used in Richardson and Green)
					AACRresiduals.dat (used in Broët et al)
      this file               readme.txt
All files are text files, so can be run under Unix or Windows (subject to correct format for line endings).
---------------------------------------------
RUNNING
Gmix is a Fortran program for Unix-like systems, implementing 
the methodology for univariate normal mixture
analysis described in Richardson and Green 
(J. R. Statist. Soc. B, 1997, 59, 731-792; see also the
correction in J. R. Statist. Soc. B, 1998, 60, 661).
These notes assume familiarity with this paper. 
The model described in the above papers has been extended
 so that one component has fixed mean and possible fixed 
variance. Instead of one parameter k for the number of components,
 there are now two parameters kleft and kright for the numbers of
 components to the left and right respectively of the fixed mean
 component. Either of these parameters can be zero.
The program reads a single main data file, and produces 
multiple output text files summarising the posterior
distribution of the model, for subsequent analysis
and display (R or Splus are good choices for these
tasks). (Only very limited summary information
is computed by this program itself.)
The run of the program is controlled by options, switches
and parameter settings that have sensible default values,
and can be set by command line arguments in Unix shell
style. 
For example, the run on the Breast Cancer dataset in Broët et al,
with run-length of 100000, burn-in of 100000, and minimal output
options could be replicated by:
gmix -range -empty -dfix1 -fixl -kil0 -kpois2 -nb100000 -n100000 AACRresiduals.dat
The principal options are as follows, listed in
several main groups. In this list, N denotes
a numerical value passed in as a parameter of the model
or sampler, other options are logical settings
that are by default off, turned on by specifying the
option.
1. Model options
      -prior      disregard the likelihood and the data (except possibly
                  in setting prior settings, etc.), thus computing the
                  prior instead of the posterior.
      -fixl       suppress dimension-changing moves, thus fixing the
      -fixr       number of mixture components (to the value set by the 
                  -kil and -kir options below).
  
2. Parameter settings
      -range      use range-based hyperparameters (default: don't use them)
      -kpoisN     set kleft,kright poisson(N) (default Uniform, default for N=2)
      -dN         set delta (hyperparameter for weights) (default 1)
      -spN        set s (spacing parameter in prior for means) (default 1)
                  (see rejoinder to discussion of Richardson and Green, page 786)
      -aN         set alpha (hyperparameter for variances) (default 2)
      -bN         set beta (hyperparameter for variances) (default Gamma(g,h))
      -gN         set g (hyperparameter for beta) (default 0.2)
      -hN         set h (hyperparameter for beta) (default 10)
                  The upper limit kmax of the number of components
                  is hard-wired equal to 15 for both kleft and kright in the program 			- it can be changed by editing line 1 of gmx.f (where they are 				termed ncmaxleft,ncmaxright)
3. Monte Carlo sampler options
      -nN         number of sweeps (default 10000)
      -nbN        length of burn-in (default 0)
      -kilN       initial number of components (default 1) similar for right
      -empty      use empty-component birth-death moves
      -seedN      random number seed (default 0, meaning use clock time to 
                  initialise). In the log file for the run, the seed
                  actually used is printed, and the run can be repeated
                  with the same random numbers by specifying that value
                  in this option on the subsequent run.
4. Output options
      -pLETTERS   additional output files, if LETTERS includes
                        d:    .avn .den .dev
                        c:    .pcl .scl
                        w:    .wms.*
                        a:    .z.*
                  (default: none of these)
                  See below for contents of these files.
      -nsN        sets nsamp: number of (equi-spaced) Monte Carlo samples 
                  dumped in .out file (default 100)
      -nspN       sets nspace: spacing in sweeps between successive records 
                  in trace files .ent, .bk, .dev, .wms*, .z* (default 20)
      -debug      turn on verbose output
Input and output file names
The input data should be in a file with extension .dat, 
with the sample size n on the first line, and the data
values y_1, y_2, ..., y_n following in free format, with arbitrary
spaces and newlines.
All output files are given names of the
form N.ext, where N is a unique integer assigned by the program
to label successive runs (it chooses the smallest positive integer
such that N.log does not already exist), and .ext signifies the
type of output the file contains.
The extensions and their meanings are:
(a) normal output:
      .log        logs information about run, parameter settings, 
                  acceptance rates, etc.
      .kl         posterior for kleft, Bayes factors, prior on kleft,
                  posterior for kposleft, empty-component statistics
                  and, optionally, average deviances, etc.  (.kr similar for right)      
      .pe         parameter estimates: posterior means for weights, means 
                  and variances for each component, conditional on each
                  pair of values of kleft,kright
      .out   [*]  sample of output states: k, kpos,
                  weight, mean, variance and number of observations
                  assigned, for each component
      .bdlogl [*] changes to kleft or kposleft: number of sweep (file not 
                  produced when -fix option set) .bdlogr same for right
      .ent [nsp]  k, kpos, entropy
(b) additionally, if beta or kappa are random:
      .bk  [nsp]  beta, kappa, k, kpos
(c) optionally, if -debug specified
      .db         verbose records of all moves of the sampler
(d) optionally, as determined by -p... options (see above)
      .avn        for each (kleft,kright), number of visits to (kleft,kright), average numbers 
                  of observations assigned to each component
      .den        posterior expected density, on a grid of 200 equi-spaced
                  y values (spanning the range of the data, expanded by
                  5% in each direction): output is grid, density, conditional
                  on k=1,2,...,6, and unconditionally
      .dev  [nsp] k, kpos, deviance, modified deviance
      .pcl        predictive classification: for each (kleft,kright), for each j=1,2,..,k-1,
                  for each y* on grid (see .den above), posterior probability 
                  that z*=j
      .scl        within-sample classification: for each (kleft,kright), for each j=1,2,..,k-1,
                  for each i = 1,2,...,n, posterior probability that z(i)=j
      .wms.K [nsp]in a separate file for each k, current values for weights,
                  means and variances of each component j=1,2,...,k (thus each
                  sample record is of k consecutive lines)
      .z.K  [nsp] in a separate file for each k, current values for allocations
                  z(i),i=1,...,n
key:  kpos = number of non-empty components
      [nsp] - trace files with output dumped every nspace sweeps (see
                  -nsp option above)
      [*]   - other trace files
      other output files contain information aggregated over the run.
Gmix is free of charge for educational and non-commercial research purposes.