FAQ-3.2

From MbWiki



Practical Issues

How do I cite the program?

If you want to cite the program, we suggest you use the two papers published in Bioinformatics (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003). If you are using the MPI version of the program, you may also want to cite Altekar et al. (2004).

How do I run MrBayes in batch mode?

When you become more familiar with MrBayes, you will undoubtedly want to run it in batch mode instead of typing all commands at the prompt. This is done by adding a MRBAYES block to a Nexus file, either the same file containing the DATA block or a separate Nexus file. The MRBAYES block simply contains the commands as you would have given them from the command line, except that each command is terminated by a semicolon. For instance, a MRBAYES block that performs three single-run analyses of the data set primates.nex under the GTR + Γ model and stores each result in a separate file is given below:

begin mrbayes;
   set autoclose=yes nowarn=yes;
   execute primates.nex;
   lset nst=6 rates=gamma;
   mcmc nruns=1 ngen=10000 samplefreq=10 file=primates.nex1;
   mcmc file=primates.nex2;
   mcmc file=primates.nex3;
end;

Since this file contains the “execute” command, it must be in a file separate from the primates.nex file. You start the analysis simply by typing execute <filename>, where filename is the name of the file containing the MRBAYES block. The set command is needed to change the behavior of MrBayes such that it is appropriate for batch mode. When autoclose = yes, MrBayes will finish the MCMC analysis without asking you whether you want to add more generations. When nowarn = yes, MrBayes will overwrite existing files without warning you, so make sure that your batch file does not inadvertently cause the deletion of previous result files that should be saved for future reference.

The UNIX version of MrBayes can execute batch files in the background from the command prompt. Just type mb <file> > log.txt & at the UNIX prompt, where <file> is the name of your Nexus batch file, to have MrBayes run in the background, logging its output to the file log.txt. If you want MrBayes to process more than one file, list the files one after the other, separated by spaces, before the output redirection sign (>). When MrBayes is run in this way, it will quit automatically when it has processed all files; it will also terminate with an error signal if it encounters an error.

Alternatively, the UNIX version of MrBayes can also be run in batch mode using input redirection. For that you need a text file containing the commands exactly as you would have typed them from the command line. For instance, assume that your data set is in primates.nex and that you want to perform the same analyses specified above. Then type mb < batch.txt > log.txt & with the batch.txt file containing this text:

set autoclose=yes nowarn=yes
execute primates.nex
lset nst=6 rates=gamma
mcmc nruns=1 ngen=10000 samplefreq=10 file=primates.nex1
mcmc file=primates.nex2
mcmc file=primates.nex3
quit

The quit command forces MrBayes to terminate. With previous versions of MrBayes, we had problems with infinite loops when the quit command was not included at the end of the file. This problem was solved in version 3.1, but we still advise you to use the quit command.
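If you run many replicate analyses, it can be convenient to generate the command file for input redirection instead of typing it. The Python sketch below is only an illustration: the function name make_batch_commands is hypothetical, and it simply reproduces the commands of the MRBAYES block example above, as plain lines ending with quit, for any number of replicates.

```python
def make_batch_commands(datafile, nreps, ngen=10000, samplefreq=10):
    """Build the lines of a MrBayes command file for use with `mb < batch.txt`.

    Mirrors the FAQ example: one single-run GTR + gamma analysis per replicate,
    each written to its own file prefix (<datafile>1, <datafile>2, ...).
    Later mcmc commands reuse the settings given to the first one.
    """
    commands = [
        "set autoclose=yes nowarn=yes",   # batch-friendly behaviour
        f"execute {datafile}",
        "lset nst=6 rates=gamma",
        f"mcmc nruns=1 ngen={ngen} samplefreq={samplefreq} file={datafile}1",
    ]
    for rep in range(2, nreps + 1):
        commands.append(f"mcmc file={datafile}{rep}")
    commands.append("quit")   # make MrBayes terminate at the end of the file
    return commands


if __name__ == "__main__":
    # Hypothetical usage: python make_batch.py > batch.txt
    # then: mb < batch.txt > log.txt &
    print("\n".join(make_batch_commands("primates.nex", 3)))
```

Redirecting the printed output to batch.txt gives a file equivalent to the hand-written one, so nowarn=yes will still overwrite existing result files with the same names.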

What do I do when it is difficult to get convergence?

There are several things you can do to improve the efficiency of your analysis. The simplest is to increase the length of the run, but the computational cost of doing so may be prohibitive. A better approach is then to try to improve the mixing behavior of the chain. First, examine the acceptance rates of the proposal mechanisms used in your analysis (these are output at the end of the run). The Metropolis proposals used by MrBayes work best when their acceptance rates are neither too low nor too high; as a rough guide, try to keep them within the range of 10 % to 70 %. Rates outside this range are not necessarily a big problem, but they typically mean that the analysis is inefficient. If the rate is too high, you can make the proposal bolder (and if it is too low, more modest) by changing its tuning parameters (see Appendix) using the propset command. Be warned, however, that changing the tuning parameters and probabilities of proposals may destroy any hope of getting convergence. For instance, you need at least one move that changes each parameter in your model.
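As a rough illustration of the 10 % to 70 % guideline, the hypothetical Python helper below flags proposals whose acceptance rates, copied by hand from the end-of-run output, fall outside that range. It does not parse MrBayes output, and the proposal names in the usage block are placeholders.

```python
def flag_acceptance_rates(rates, low=10.0, high=70.0):
    """Return a note for every proposal whose acceptance rate lies outside [low, high].

    `rates` maps a proposal name to its acceptance rate in percent,
    as printed at the end of a run (e.g. 34.5 for 34.5 %).
    """
    notes = {}
    for move, rate in rates.items():
        if rate > high:
            notes[move] = "too high: consider a bolder proposal (larger steps)"
        elif rate < low:
            notes[move] = "too low: consider a more modest proposal (smaller steps)"
    return notes


if __name__ == "__main__":
    # Placeholder proposal names and rates, typed in by hand.
    observed = {"move A": 85.0, "move B": 30.0, "move C": 4.0}
    for move, note in flag_acceptance_rates(observed).items():
        print(f"{move}: {note}")
```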

The next step is to examine the heating parameters if you are using Metropolis-coupled MCMC. If the acceptance rates for swaps between adjacent chains (the values close to the diagonal in the swap statistics matrix) are low, it might be a good idea to decrease the temperature so that the cold and heated chains become more similar to each other and can exchange states more easily. The efficiency of the Metropolis coupling can also be improved by increasing the number of parallel chains.

Another good way of improving convergence is to start the analysis from a good tree instead of a randomly chosen one. First define a good tree, with or without branch lengths, and put this tree in a separate NEXUS trees block. Then use this tree as the starting value for the tree parameter (tau) and the branch length parameter (V) using startvals tau=treename V=treename, where treename is the name you used in the trees block. For example:


begin trees;
  tree mystarttree = (...newick tree..);
end trees;
begin mrbayes;
   startvals 
      tau = mystarttree
      V = mystarttree
   ;
end;

A disadvantage of starting the analysis from a good tree is that it becomes more difficult to detect convergence problems using independent runs. A compromise is to start each chain from a slightly perturbed version of a good tree. MrBayes can introduce random perturbations of a starting tree; this is requested using mcmcp nperts=<integer_value>. You can also define a separate tree for each run and chain:

 begin trees;
    tree mystarttree11 = (...);
    tree mystarttree12 = (...);
    ...
 end trees;
 begin mrbayes;
     startvals 
          tau(1,1) = mystarttree11
          V(1,1)   = mystarttree11
          tau(1,2) = mystarttree12
          V(1,2)   = mystarttree12
     ...
     ;
 end;
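To see why lowering the temperature helps the chains swap states (see the discussion of heating parameters above), here is a sketch assuming an incremental heating scheme in which chain i (i = 0 for the cold chain) has heat β_i = 1/(1 + λi), with λ set by the temp option of mcmc, together with the standard Metropolis-coupling swap rule. The function names are hypothetical, and the log density values in the usage block are illustrative numbers, not MrBayes output.

```python
import math


def chain_heats(nchains, temp):
    """Heats under incremental heating: beta_i = 1 / (1 + temp * i), i = 0 is the cold chain."""
    return [1.0 / (1.0 + temp * i) for i in range(nchains)]


def swap_acceptance(beta_j, beta_k, logdens_j, logdens_k):
    """Acceptance probability for swapping the states of chains j and k.

    logdens_j and logdens_k are the unheated log densities of the states the
    chains currently hold; chain j's density is raised to beta_j, chain k's to beta_k.
    """
    log_r = (beta_j - beta_k) * (logdens_k - logdens_j)
    return 1.0 if log_r >= 0.0 else math.exp(log_r)


if __name__ == "__main__":
    for temp in (0.2, 0.05):
        cold, heated = chain_heats(2, temp)
        # Illustrative values: the heated chain holds a state 20 log units worse.
        p = swap_acceptance(cold, heated, -5000.0, -5020.0)
        print(f"temp={temp}: heats=({cold:.3f}, {heated:.3f}), swap acceptance={p:.3f}")
```

With a difference of 20 log units between the two states, the swap acceptance probability is about 0.04 at temp=0.2 but about 0.39 at temp=0.05, because the smaller heat difference shrinks the exponent.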

How many data partitions can I have in MrBayes?

MrBayes 3 allows up to 150 partitions. If you need more partitions, simply change the variable MAX_NUM_DIVS in the source file mb.h and recompile the program.

Does MrBayes run faster on a dual-processor or dual-core machine?

No, MrBayes 3.2 is not multithreaded, so it will not take advantage of more than one processor on a single machine. However, you should be able to run two copies of MrBayes without a noticeable decrease in performance on a dual-processor machine (provided you have enough RAM for both analyses). You could also install the lam-mpi package and run the parallel version as described in the lam/mpi instructions below.

How much memory is required?

You can roughly calculate the amount of memory needed to store the conditional likelihoods for an analysis as 2 * (# taxa) * (# characters) * (# states in the Q matrix) * (# gamma categories) * 4 bytes (for the single-precision float version of the code; double this for the double-precision code). The program will need slightly more memory for various book-keeping purposes, but the bulk of the memory required for an analysis is typically occupied by the conditional likelihoods.
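The rule of thumb can be turned into a quick calculation. The sketch below assumes that the count also scales with the number of characters, since a conditional likelihood is stored for each character (or distinct site pattern); the function name is hypothetical, and book-keeping overhead is ignored.

```python
def condlike_memory_bytes(ntaxa, nchar, nstates, ncat=1, double_precision=False):
    """Rough memory (bytes) for conditional likelihoods:
    2 * taxa * characters * states * rate categories * bytes per value.
    Book-keeping overhead is ignored, as in the rule of thumb."""
    bytes_per_value = 8 if double_precision else 4
    return 2 * ntaxa * nchar * nstates * ncat * bytes_per_value


if __name__ == "__main__":
    # 50 taxa, 1000 characters, 4 nucleotide states, 4 gamma categories
    single = condlike_memory_bytes(50, 1000, 4, 4)
    double = condlike_memory_bytes(50, 1000, 4, 4, double_precision=True)
    print(f"single precision: {single / 1e6:.1f} MB, double precision: {double / 1e6:.1f} MB")
```

For this example the estimate is 6,400,000 bytes (about 6.4 MB) in single precision and twice that in double precision.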

Problems Running MrBayes

Warnings

  • I get the warning "In LIKE_EPSILON - for standard division 0 char 130 has like = 0.00000000000000000000000000000"

(from the mailing list) Most probably it is the combination of model settings / proposal parameters that causes the problem, though I'd blame covarion=yes rather than aamodelpr in the first place. I've seen similar errors with codon models. My impression is that with certain complicated models one of the parameters sometimes goes out of range. Setting a simpler model often rectifies the situation. In your case, covarion=yes adds most of the complexity.

George

This could be a bug in MrBayes, so please feel free to report it on the MrBayes bug list. However, we cannot fix a bug that we cannot replicate, so we kindly ask you to send a (minimal) data file along with your bug report.

The likelihood values first increase and then drop. What is the problem?

Several users have observed that likelihood values can sometimes increase in the early phase of a run and then decrease to a stable value; one user referred to the phenomenon as “burn-out”. Actually, this type of behavior can be seen with certain types of data sets and models and is part of the normal burn-in. However, it does indicate a problem with the model. Typically, the problem is due to over-parameterization, a poor prior, or a combination of these factors.

In MrBayes, the starting value for most parameters is an arbitrarily chosen value that is likely to be close to the maximum likelihood estimate (MLE) of the parameter. The MLE value typically also corresponds to the mode (peak) in the posterior probability distribution. In most cases, you expect the bulk of the probability mass in the posterior probability distribution to be in the region close to the MLE. However, it is possible that there is a region in parameter space with only moderate height (lower likelihood values) but considerably larger probability mass than the MLE region. It is like comparing the mass of a tower to the mass of a huge office complex. Even if the tower is considerably higher, its mass is going to be only a fraction of the mass of the office complex. A typical situation in which this can occur is if you: (1) use a uniform prior on branch lengths, which puts considerable prior probability on long branches; (2) have data that are relatively uninformative about branch lengths; and (3) have a model, such as the gamma model, with a low but not insignificant probability associated with long branches for weak data. In such cases, the MLE region at short branch lengths can have considerably smaller probability mass than the less likely but much larger region at longer branch lengths.

Unless you feed MrBayes with your own starting tree, the run will start with all branch lengths set to 0.1. This is close to the MLE region, and in the early phase of the run you will see the likelihood values climb as the topology is improved by branch rearrangements while the branch lengths remain small. Eventually, however, the long branch length region will attract the chain through its high probability mass, and you will see the branch lengths increase and the likelihood values decrease to a stable region. There are basically two ways of fixing the “burn-out” problem. One is to change your priors so that they put more probability in the MLE region. An obvious step is to change a uniform prior on branch lengths to an exponential prior; as explained in the section on branch length priors in the MrBayes manual, an exponential prior is less informative than the uniform prior anyway. The other possibility is to simplify your model. For instance, assume equal rates over sites instead of a gamma model, or choose a substitution model with fewer free parameters.
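The tower-versus-office-complex argument can be made concrete by comparing how much prior probability two branch length priors place on long branches. The sketch below assumes a Uniform(0, 10) prior and an Exponential prior with rate 10 (mean 0.1); these are illustrative choices, not a statement of MrBayes defaults, and the function names are hypothetical.

```python
import math


def tail_prob_uniform(x, lower=0.0, upper=10.0):
    """Prior probability that a branch is longer than x under Uniform(lower, upper)."""
    if x <= lower:
        return 1.0
    if x >= upper:
        return 0.0
    return (upper - x) / (upper - lower)


def tail_prob_exponential(x, rate=10.0):
    """Prior probability that a branch is longer than x under Exponential(rate), mean 1/rate."""
    return 1.0 if x <= 0.0 else math.exp(-rate * x)


if __name__ == "__main__":
    for x in (0.1, 0.5, 1.0):
        print(f"P(branch > {x}): uniform {tail_prob_uniform(x):.3f}, "
              f"exponential {tail_prob_exponential(x):.2e}")
```

Under these assumptions, a single branch longer than 1.0 has prior probability 0.9 under the uniform prior but only about 4.5e-5 under the exponential prior, and the contrast compounds across all branches of the tree.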

Scientific Issues

How are gaps and missing characters treated?

MrBayes uses the same method as most maximum likelihood programs: it treats gaps and missing characters as missing data. Thus, gaps and missing characters will not contribute any phylogenetic information. There is no way in which you can treat gaps as a fifth state in MrBayes (but see below for information on how you can use gap information in your analysis).

How do I use gap information in my analysis?

Often, insertion and deletion events contain phylogenetically useful information. Although MrBayes 3 is not able to do statistical multiple sequence alignment, treating the insertion-deletion process under a realistic stochastic model, there is nevertheless a way of using some of the information in the indel events in your MrBayes analysis: code the indel events as binary characters (presence/absence of the gap) and include them as a separate binary (restriction) data partition in your analysis. See the section on the binary model in the MrBayes manual for more information on this possibility.

How do I specify the Mk model (Markov k model; Lewis, 2001) for my morphological data?

Short answer: Any data of type Standard (and Restriction) is automatically assigned the Mk model when data is executed in MrBayes.

Things to consider

Several parameters apply to a data partition under the Mk model, and their settings are potentially important. Below are examples for data type Standard. See Help Lset and the MrBayes manual for more information on data type Restriction.

 Lset Coding = <All/Variable>;

The Coding parameter specifies how characters were sampled. The options all and variable apply to data type Standard. Morphological data, for example, are rarely sampled in such a way that we observe characters with the same state in all taxa (e.g., "all 0", "all present"). If constant characters were sampled, the option all should be used. If, on the other hand, constant characters are absent from the matrix, the option variable should be used instead. Note that all is the default in MrBayes. The theoretical effect of not using variable when constant characters are absent is that branch lengths tend to be overestimated (Lewis, 2001).
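The branch length effect can be seen in a deliberately tiny example: two taxa, one character scored in different states, equal state frequencies, and the Mk transition probabilities of Lewis (2001). This is a toy illustration under those assumptions, not how MrBayes computes likelihoods, and the function names are hypothetical. The coding=variable case is modelled by dividing by the probability that the character is variable.

```python
import math


def mk_probs(k, t):
    """Mk model (Lewis, 2001) transition probabilities over branch length t.

    t is measured in expected changes per character. Returns (p_same, p_to_other):
    the probability of ending in the starting state, and of ending in one
    particular different state.
    """
    decay = math.exp(-k * t / (k - 1))
    p_same = 1.0 / k + (k - 1.0) / k * decay
    p_to_other = (1.0 - decay) / k
    return p_same, p_to_other


def two_taxon_variable_likelihood(k, t, coding_variable):
    """Likelihood of one variable character (the two taxa differ) on a two-taxon
    tree with path length t > 0, assuming equal state frequencies of 1/k."""
    p_same, p_to_other = mk_probs(k, t)
    like = p_to_other / k            # (1/k) for the first taxon's state
    if coding_variable:
        p_constant = p_same          # sum over k states of (1/k) * p_same
        like /= 1.0 - p_constant     # condition on the character being variable
    return like


if __name__ == "__main__":
    for t in (0.1, 0.5, 2.0, 5.0):
        print(f"t={t}: coding=all {two_taxon_variable_likelihood(2, t, False):.4f}, "
              f"coding=variable {two_taxon_variable_likelihood(2, t, True):.4f}")
```

Without the correction, the likelihood keeps increasing with t, so a matrix containing only variable characters pushes branch lengths upward. With the correction, on this two-taxon tree the likelihood is 1/(k(k-1)) for every t, because once the character is known to be variable the two taxa must differ.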

 Lset Rates = <Equal/Gamma/Propinv/Invgamma>;

The Rates parameter specifies how rates are distributed across the characters in a data matrix. For Standard data under the Mk model, the applicable options are Equal and Gamma, and in some cases also Propinv and Invgamma. Typically, neighboring "sites" in a morphological matrix are not correlated in the way that sites in a codon, or sites belonging to stems and loops in a secondary structure, are. Therefore, the option adgamma has no direct relevance to morphological data. Note that Propinv and Invgamma are only applicable if the data matrix is of coding type All (i.e., the matrix contains constant sites; see above).

 Ctype <ordering>:<characters>;

The Ctype parameter specifies the ordering of the characters. By default, all characters are of character type Unordered. Characters can also be changed to character type Ordered, or Irreversible.

Example

 #NEXUS
 Begin Data;
   Dimensions Ntax=4 Nchar=4;
   Format Datatype=Standard;
   Matrix
    Apa 0021
    Bpa 0121
    Cpa 0110
    Dpa 0100
   ;
 End;
 Begin MrBayes;
   Lset Rates=Gamma Coding=All;
   Ctype Ordered:3;
 End;

See also the section Standard Discrete (Morphology) Model for help on the Format, Ctype, Lset, and Prset commands.

How can I test models using Bayes factors?

The Bayesian approach provides a convenient way of comparing models through the calculation of Bayes factors, which can be interpreted as indicators of the strength of the evidence in favor of the better of two models. The Bayes factor values are typically interpreted according to recommendations developed by Kass and Raftery (1995).

Unlike a hierarchical likelihood ratio test, the models compared with Bayes factors need not be hierarchically nested. A Bayes factor is calculated simply as the ratio of the marginal likelihoods of the two models being compared. The logarithm of the Bayes factor is the difference in the logarithms of the marginal model likelihoods.

The marginal likelihood of a model is difficult to estimate accurately but a rough estimate may be obtained easily as the harmonic mean of the likelihood values of the MCMC samples (Newton and Raftery, 1994). MrBayes calculates this estimator when you summarize your samples with the command sump. In the output from the sump command, you will find the following table (it might look a little different depending on how many simultaneous runs you have performed; this table is for two runs):

    Estimated marginal likelihoods for runs sampled in files
     "replicase.nex.run1.p", "replicase.nex.run2.p", etc:
     (Use the harmonic mean for Bayes factor comparisons of models)
  
  Run   Arithmetic mean   Harmonic mean
  --------------------------------------
    1       -5883.41          -5892.10
    2       -5883.82          -5892.81
  --------------------------------------
  TOTAL     -5883.60          -5892.52
  --------------------------------------

For instance, assume we want to compare a GTR model with an HKY model. Simply run two separate analyses, one under each model, and estimate the logarithm of the marginal likelihood for each model (using only samples from the stationary phase of the runs). Then take the difference between the logarithms of the harmonic means and find the corresponding interpretation in the table of Kass and Raftery (1995); note that to use this table, you actually have to calculate twice the difference in the logarithms of the model likelihoods. The same approach can be used to compare any pair of models you are interested in. For instance, one model might have a group constrained to be monophyletic while the other is unconstrained, or one model can have gamma-shaped rate variation while the other assumes equal rates across sites. As stated above, the models need not be hierarchically nested. An interesting property of Bayes factor comparisons is that they can favor either the more complex or the simpler model, so they need not be corrected for the number of parameters in the models being compared. Additional discussion of Bayesian model testing, with several examples, is found in Nylander et al. (2004).
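The harmonic mean estimate and the Kass and Raftery scale can be sketched in Python as follows. The log-likelihoods would come from the stationary-phase samples in the .p files (reading those files is not shown), the function names are hypothetical, and the category boundaries are the Kass and Raftery (1995) ranges for 2 ln BF.

```python
import math


def log_harmonic_mean(loglikes):
    """Log of the harmonic mean of the sampled likelihoods, from log-likelihoods.

    HM = n / sum(1 / L_i), so log HM = log n - log(sum(exp(-lnL_i))).
    The log-sum-exp trick avoids overflow for values such as -5892.
    """
    neg = [-value for value in loglikes]
    top = max(neg)
    log_sum = top + math.log(sum(math.exp(v - top) for v in neg))
    return math.log(len(loglikes)) - log_sum


def kass_raftery(two_ln_bf):
    """Verbal category for 2 ln BF (model 1 vs model 2) following Kass and Raftery (1995)."""
    if two_ln_bf < 0:
        return "evidence favors model 2 (swap the models to grade it)"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


if __name__ == "__main__":
    ln_hm_model1 = -5892.52   # TOTAL harmonic mean from the sump table above
    ln_hm_model2 = -5900.00   # hypothetical value for a second model
    two_ln_bf = 2.0 * (ln_hm_model1 - ln_hm_model2)
    print(f"2 ln BF = {two_ln_bf:.2f}: {kass_raftery(two_ln_bf)}")
```

With the hypothetical second value, 2 ln BF = 2 × 7.48 = 14.96, which falls in the "very strong" category in favor of the first model.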

Can I do model-jumping in MrBayes?

Bayesian MCMC model jumping provides a convenient alternative to model selection prior to the analysis. In model jumping, the MCMC sampler explores different models and weights the results according to the posterior probability of each model. The only model jumping implemented in MrBayes 3 is the estimation of fixed-rate amino-acid substitution models (see the section on those models in this manual). General model jumping across models of different dimensionality will be implemented in version 4 of MrBayes.

ModelTest suggests a model for my data. How do I implement it in MrBayes?

A model selection procedure, such as that implemented in ModelTest and MrModelTest, often suggests a quite specific model for your analysis, including estimates of all parameters. This suggestion is often based on several simplifications; for instance, you might have fixed the topology when comparing models, and you might have used a small set of the possible models. In the Bayesian approach, there is only a moderate computational penalty associated with estimating parameters rather than fixing them prior to analysis. The Bayesian approach is typically also good at handling multi-parameter models. Therefore, we recommend that you take the general type of model suggested by your model selection method and then estimate all of the parameters in that model in MrBayes. If the suggested model is not implemented in MrBayes, use the next more complex model available in the program. If, for some reason, you feel that you really need to fix model parameters in MrBayes to specific values, you can do that using prset <parameter_prior_name> = fixed(<value or comma-separated values>). For instance, if you want to fix the shape parameter of the gamma model to 0.12, use prset shapepr=fixed(0.12).

How do I fix the tree topology during an analysis?

In principle, one can fix a tree topology by specifying constraints for all of the nodes in the tree. However, we do not recommend doing this because it is computationally very inefficient. A better way is to set the proposal probability of all topology moves to 0 using the propset command. Then you need to switch on one proposal that changes branch lengths but not topology by increasing its proposal probability from 0 to some reasonable positive value (like 5). The “node slider” is, in our experience, the best of these proposals.

For example

 MrBayes> showmoves
     ...
     5 -- Move        = eTBR(Tau)
          Type        = Extending TBR
          Parameters  = Tau [param. 5] (Topology)
                        V [param. 6] (Branch lengths)
          Tuningparam = p_ext (Extension probability)
                        lambda (Multiplier tuning parameter)
                p_ext = 0.800
               lambda = 0.940
          Rel. prob.  = 15.0
 MrBayes> propset eTBR(Tau)$prob=0
 MrBayes> showmoves allavailable=yes
     ...
     7 -- Move        = Nslider(V)
          Type        = Node slider (uniform on possible positions)
          Parameter   = V [param. 6] (Branch lengths)
          Tuningparam = lambda (Multiplier tuning parameter)
               lambda = 0.191
          Rel. prob.  = 0.0
 MrBayes> propset nslider(V)$prob=5

Type help propset for information on how to set different probabilities and tuning parameter values for different runs and chains. One word of warning: you should be extremely careful when modifying any of the chain parameters using propset. It is quite possible to completely wreck any hope of achieving convergence by inappropriately setting the tuning parameters.

Technical Issues (installation etc.)

How do I compile single- and multi-processor versions on SGI machines?

To compile MrBayes 3.1 on a Silicon Graphics machine (running IRIX or Linux) you need to use the -lm flag at the end of the command line when linking. A typical compile session would look like this:

gcc -DUNIX_VERSION -O3 -c -o mb.o mb.c
gcc -DUNIX_VERSION -O3 -c -o mcmc.o mcmc.c
gcc -DUNIX_VERSION -O3 -c -o bayes.o bayes.c
gcc -DUNIX_VERSION -O3 -c -o command.o command.c
gcc -DUNIX_VERSION -O3 -c -o mbmath.o mbmath.c
gcc -DUNIX_VERSION -O3 -c -o model.o model.c
gcc -DUNIX_VERSION -O3 -c -o plot.o plot.c
gcc -DUNIX_VERSION -O3 -c -o sump.o sump.c
gcc -DUNIX_VERSION -O3 -c -o sumt.o sumt.c
gcc -DUNIX_VERSION -O3 mb.o bayes.o command.o mbmath.o mcmc.o model.o plot.o sump.o sumt.o -o mb -lm

(Options for turning on various warnings have been omitted here for clarity). The only difference from the standard Makefile is on the last line above, where the -lm flag is now at the end. Under IRIX the cc compiler could have been used instead. Under Linux you could have used the icc compiler.

If you want to compile a multi-processor (parallel, MPI) version of MrBayes 3.1 for an SGI machine, then you should not use mpicc. Instead you should add the flag -lmpi when linking. A typical compile session would look like this:

gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c mb.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c bayes.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c command.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c mbmath.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c mcmc.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c model.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c plot.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c sump.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c sumt.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 mb.o bayes.o command.o mbmath.o mcmc.o model.o plot.o sump.o sumt.o -o mb_multi -lm -lmpi

Notice the last line where the -lmpi flag is given at the end. Again warning options have been omitted, and again you could have used the cc or icc compilers. Here the MPI version of the executable has been named "mb_multi" to avoid confusion with the non-parallel version.

How do I run MrBayes MPI (parallel) version in the background on an SGI machine?

In order to run the MPI version of MrBayes in the background on an SGI machine (running IRIX or Linux), you need to make a bogus redirection of stdin along the following lines:

mpirun -np 8 mb_multi myfile.nxs < /dev/null > log.txt &

Here, mb_multi is the MPI version of MrBayes compiled as described above. For this to work, you must have all relevant commands (set, lset, prset, mcmc, etc.) in the nexus file as a MRBAYES block.

The reason for the redirection trick is explained in the mpirun manpage:

      Running an MPI job in the background is supported only when stdin is
      redirected.

      The mpirun process is still connected to the tty when a job is placed
      in the background.  One of the things that mpirun polls for is input
      from stdin.  If it happens to be polling for stdin when a user types in
      a window after putting an MPI job in the background, and stdin has not
      been redirected, the job will abort upon receiving a SIGTTIN signal.
      This behavior is intermittent, depending on whether mpirun happens to
      be looking for and sees any stdin input.

      The following examples show how to run an MPI job in the background.

      For a job that uses input_file as stdin:

           mpirun -np 2 ./a.out < input_file > output &

      For a job that does not use stdin:

           mpirun -np 2 ./a.out < /dev/null  > output &

I am trying to run MrBayes under MPI on an SGI IRIX machine. What does the error message "array services not available" mean?

Most frequently, it means that the machine does not have the arrayd daemon running; enter the command (as root) "chkconfig array on" followed by a reboot, then retry running MrBayes with mpirun.

--AllenSmith

How can I convert Nexus files to and from Mac format?

(from the mailing list) Both Linux and OS X come with a handy little tool called "tr" to translate characters, which should be much quicker than opening the file in another program and resaving it; I used to use PAUP* for this until I found tr.

For example, to convert linux/unix line breaks to mac:

 tr "\012" "\015" < unix.txt > mac.txt

and from mac to linux:

 tr "\015" "\012" < mac.txt > unix.txt

I generally put this in a small shell script called convert.sh to make things a bit easier (and don't forget to make it executable: chmod +x convert.sh):

 #!/bin/sh
 tr "\012" "\015" < "$1" > "$2"

So, simply run:

 ./convert.sh unixfile macfile

--Simon
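The tr commands above only swap LF and CR, so a file with Windows (CRLF) line breaks would end up with doubled line breaks. A Python alternative that first normalizes any mix of line breaks is sketched below; the script name convert.py, the function name and the argument order are illustrative assumptions.

```python
import sys


def convert_newlines(data: bytes, target: str) -> bytes:
    """Convert all line breaks in `data` to Unix (LF), classic Mac (CR) or Windows (CRLF)."""
    endings = {"unix": b"\n", "mac": b"\r", "windows": b"\r\n"}
    # Normalize every existing line break to LF first (CRLF before lone CR).
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", endings[target])


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python convert.py <unix|mac|windows> infile outfile
    target, infile, outfile = sys.argv[1:4]
    with open(infile, "rb") as src:
        data = src.read()
    with open(outfile, "wb") as dst:
        dst.write(convert_newlines(data, target))
```

Working on bytes rather than text leaves all other characters in the Nexus file untouched.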

How can I run MrBayes on one or more Mac Quad G5 machine(s)?

MrBayes versions 3.1.2 and older are not multithreaded. Therefore, to make optimal use of the two dual-core G5 processors in your machine, you have to use the parallel version of MrBayes. To run this version, you need to install an MPI "virtual machine", for example lam/mpi.

Installing Lam

Download the Mac lam package from http://www.lam-mpi.org/7.1/download.php and install it; this will install lam/mpi on your machine. This assumes you have the developer package installed.

When you want to use lam/mpi on multiple machines, you have to install it on all machines. Furthermore, you have to make sure that

  • you are using secure shell (ssh) for connections between the machines, e.g. by setting the MPI_RSH shell variable to ssh.
  • all machines have access to the mrbayes files, e.g. by using a network filesystem like NFS or a script that copies all necessary files (see man rsync).

Compiling the parallel MrBayes version

From a terminal window you can compile mrbayes with

make MPI=yes USEREADLINE=no

or, if you have libreadline installed, with USEREADLINE=yes. If you get a "command not found" error, the mpicc command is not in your path or lam/mpi is not installed correctly. Add /usr/local/bin to your path and try again.

Running MrBayes

First we have to start LAM, which requires a text file called lamhosts. If you are using one quad G5 machine, you need only one line, "localhost cpu=4", which you can create by typing in the terminal window

 echo "localhost cpu=4" > lamhosts

When you have multiple machines, you need one line for every machine in the lamhosts file, with the full machine name instead of localhost. The file might look like this:

 $ cat lamhosts
 mac01 cpu=4
 mac02 cpu=4
 mac03 cpu=4

To start the lam you can type

 lamboot lamhosts

from a terminal window.

To run mrbayes on N processors, you use

 mpirun -np N ./mb

N is typically four times the number of machines you are using.

To stop the lam type

 wipe lamhosts

in a terminal window.
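For several machines, the lamhosts file and the matching process count can be generated with a small script. The Python sketch below assumes four CPUs per machine, as for a quad G5; the host names are placeholders and the function names are hypothetical.

```python
import sys


def lamhosts_text(hosts, cpus_per_host=4):
    """Contents of a LAM boot schema: one 'hostname cpu=N' line per machine."""
    return "".join(f"{host} cpu={cpus_per_host}\n" for host in hosts)


def mpirun_np(hosts, cpus_per_host=4):
    """Process count for `mpirun -np N`: one MrBayes process per CPU."""
    return len(hosts) * cpus_per_host


if __name__ == "__main__":
    hosts = ["mac01", "mac02", "mac03"]  # placeholder machine names
    # The boot schema goes to stdout, so it can be redirected into lamhosts.
    print(lamhosts_text(hosts), end="")
    # The follow-up commands go to stderr so they do not end up in the file.
    print(f"then: lamboot lamhosts; mpirun -np {mpirun_np(hosts)} ./mb", file=sys.stderr)
```

Running it as python make_lamhosts.py > lamhosts reproduces the three-machine example above and reports N = 12.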
