FAQ 3.2

Practical Issues

How do I cite the program?

If you want to cite the program, we suggest you cite the two papers published in Bioinformatics (Huelsenbeck and Ronquist, 2001; Ronquist and Huelsenbeck, 2003). If you are using the MPI version of the program, you may also want to cite Altekar et al. (2004).

How do I run MrBayes in batch mode?

When you become more familiar with MrBayes, you will undoubtedly want to run it in batch mode instead of typing all commands at the prompt. This is done by adding a MRBAYES block to a Nexus file, either the same file containing the DATA block or a separate Nexus file. The MRBAYES block simply contains the commands as you would have given them from the command line, with the difference that each command is terminated with a semicolon. For instance, a MRBAYES block that performs three single-run analyses of the data set primates.nex under the GTR + Γ model and stores each result in a separate file is given below:

begin mrbayes;
   set autoclose=yes nowarn=yes;
   execute primates.nex;
   lset nst=6 rates=gamma;
   mcmc nruns=1 ngen=10000 samplefreq=10 file=primates.nex1;
   mcmc file=primates.nex2;
   mcmc file=primates.nex3;
end;

Since this file contains the “execute” command, it must be in a file separate from the primates.nex file. You start the analysis simply by typing execute <filename>, where filename is the name of the file containing the MRBAYES block. The set command is needed to change the behavior of MrBayes such that it is appropriate for batch mode. When autoclose = yes, MrBayes will finish the MCMC analysis without asking you whether you want to add more generations. When nowarn = yes, MrBayes will overwrite existing files without warning you, so make sure that your batch file does not inadvertently cause the deletion of previous result files that should be saved for future reference.
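Because nowarn=yes overwrites files silently, it can be worth checking for earlier results before you launch a batch job. The following is a minimal shell sketch and not part of MrBayes: the function name existing_outputs is hypothetical, and it assumes that result files are written as the file= prefix followed by a dot and an extension (for example primates.nex1.p and primates.nex1.t).

```shell
#!/bin/sh
# existing_outputs BATCHFILE (hypothetical helper)
# Print each file= prefix found in BATCHFILE for which a file named
# PREFIX.<something> already exists in the current directory.
# Assumption: MrBayes writes its results as PREFIX.p, PREFIX.t, and so on.
existing_outputs() {
    sed -n 's/.*[[:space:]]file=\([^[:space:];]*\).*/\1/p' "$1" |
    while read -r prefix; do
        for f in "$prefix".*; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$f" ]; then
                echo "$prefix"
                break
            fi
        done
    done
}

# Example: list the prefixes that a run of batch.nex would overwrite.
# existing_outputs batch.nex
```

If the function prints nothing, no file matching the assumed naming pattern is in the way.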

The UNIX version of MrBayes can execute batch files in the background from the command prompt. Just type mb <file> > log.txt & at the UNIX prompt, where <file> is the name of your Nexus batch file, to have MrBayes run in the background, logging its output to the file log.txt. If you want MrBayes to process more than one file, list the files one after the other, separated by spaces, before the output redirection sign (>). When MrBayes is run in this way, it will quit automatically when it has processed all files; it will also terminate with an error signal if it encounters an error.
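When the replicates differ only in the output file name, the batch file can be generated rather than typed. The sketch below is an illustration only: write_batch and batch.nex are hypothetical names, and the mcmc settings are simply those of the example MRBAYES block above. As in that block, it assumes that later mcmc commands reuse the settings given to the first one.

```shell
#!/bin/sh
# write_batch DATAFILE NREPS (hypothetical helper)
# Print a Nexus file with a MRBAYES block that runs NREPS single-run
# analyses of DATAFILE and stores each result under DATAFILE<i>.
write_batch() {
    data=$1
    nreps=$2
    echo "#NEXUS"
    echo "begin mrbayes;"
    echo "   set autoclose=yes nowarn=yes;"
    echo "   execute $data;"
    echo "   lset nst=6 rates=gamma;"
    i=1
    while [ "$i" -le "$nreps" ]; do
        if [ "$i" -eq 1 ]; then
            # Chain settings are given once, on the first mcmc command.
            echo "   mcmc nruns=1 ngen=10000 samplefreq=10 file=$data$i;"
        else
            echo "   mcmc file=$data$i;"
        fi
        i=$((i + 1))
    done
    echo "end;"
}

write_batch primates.nex 3 > batch.nex
# To run it in the background:  mb batch.nex > log.txt &
```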

Alternatively, the UNIX version of MrBayes can also be run in batch mode using input redirection. For that you need a text file containing the commands exactly as you would have typed them from the command line. For instance, assume that your data set is in primates.nex and that you want to perform the same analyses specified above. Then type mb < batch.txt > log.txt & with the batch.txt file containing this text:

set autoclose=yes nowarn=yes
execute primates.nex
lset nst=6 rates=gamma
mcmc nruns=1 ngen=10000 samplefreq=10 savebrlens=yes file=primates.nex1
mcmc file=primates.nex2
mcmc file=primates.nex3
quit

The quit command forces MrBayes to terminate. Previous versions of MrBayes could enter an infinite loop when the quit command was not included at the end of the file. This problem was solved in version 3.1, but we still advise you to end the file with the quit command.
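A quick way to follow this advice is to check the command file before submitting it. The following is a small shell sketch, not a MrBayes feature; the function name ends_with_quit is hypothetical, and the check only looks at the last non-blank line of the file.

```shell
#!/bin/sh
# ends_with_quit FILE (hypothetical helper)
# Succeed only if the last non-blank line of FILE is the quit command.
# The comparison ignores case and surrounding whitespace.
ends_with_quit() {
    last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1 |
           tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    [ "$last" = "quit" ]
}

# Example:
# ends_with_quit batch.txt || echo "batch.txt does not end with quit" >&2
```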

How do I specify the Mk model (Markov k model; Lewis, 2001) for my morphological data?

Short answer: Any data of type Standard (and Restriction) are automatically assigned the Mk model when the data file is executed in MrBayes.

Things to consider

A couple of parameters apply to a data partition with the Mk model, and how they are set is potentially important. Below are examples for data type Standard. See Help Lset and the MrBayes manual for more information on data type Restriction.

 Lset Coding = <All/Variable>;

The Coding parameter specifies how characters were sampled. The options all and variable apply to data type Standard. As an example, morphological data are rarely sampled in such a way that we observe data columns with the same character state for all taxa (e.g., "all 0", "all present", etc). If constant characters were sampled, the option all should be used. If, on the other hand, constant characters are absent from the matrix, the option variable should be used instead. In MrBayes, variable is the default for standard data (and has been since at least version 3.0), while all is the default for all other types of data. The theoretical effect of using all instead of variable is that branch lengths tend to be overestimated (Lewis, 2001), which can lead to errors in phylogenetic inference. However, the correction for coding bias or ascertainment bias is typically negligible unless the tree is small (few taxa and short branch lengths).

 Lset Rates = <Equal/Gamma/Propinv/Invgamma>;

The Rates parameter specifies how rates are distributed across the characters in a data matrix. The options applicable to Standard data and the Mk model are Equal, Gamma, and sometimes also Propinv and Invgamma. Typically, neighboring "sites" in a morphological matrix are not correlated in the same way as sites in a codon, or sites belonging to stems and loops in a secondary structure. Therefore, the option adgamma has no direct relevance to morphological data. Note that Propinv and Invgamma are applicable only if the data matrix is of coding type All (the matrix contains constant sites, see above).

 Ctype <ordering>:<characters>;

The Ctype command specifies the ordering of the characters. By default, all characters are of character type Unordered. Characters can also be changed to character type Ordered or Irreversible.

Example

 #NEXUS
 Begin Data;
   Dimensions Ntax=4 Nchar=4;
   Format Datatype=Standard;
   Matrix
    Apa 0021
    Bpa 0121
    Cpa 0110
    Dpa 0100
   ;
 End;
 Begin MrBayes;
   Lset Rates=Gamma Coding=All;
   Ctype Ordered:3;
 End;
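Whether a matrix contains constant characters can be checked mechanically before choosing the Coding setting. The following shell sketch is a rough aid only and not part of MrBayes. The function name suggest_coding is hypothetical. It expects the matrix rows (taxon name, then states, as in the example above) on standard input, and it treats every symbol, including ? and -, as an ordinary state, which is a simplifying assumption.

```shell
#!/bin/sh
# suggest_coding (hypothetical helper)
# Read "taxon states" rows on stdin. Print "all" if at least one column
# has the same symbol in every row, otherwise print "variable".
suggest_coding() {
    awk '
    NF == 2 {
        n = length($2)
        for (i = 1; i <= n; i++) {
            s = substr($2, i, 1)
            if (rows == 0)
                first[i] = s          # remember the state seen in the first taxon
            else if (s != first[i])
                varies[i] = 1         # this column is not constant
        }
        if (n > ncol) ncol = n
        rows++
    }
    END {
        for (i = 1; i <= ncol; i++)
            if (!(i in varies)) { print "all"; exit }
        print "variable"
    }'
}

printf 'Apa 0021\nBpa 0121\nCpa 0110\nDpa 0100\n' | suggest_coding
```

For the example matrix the first column is 0 in all four taxa, so the sketch prints all, which matches the Coding=All setting used in the MRBAYES block.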

See also

MrBayes manual (http://mrbayes.csit.fsu.edu/wiki/index.php/Evolutionary_Models_Implemented_in_MrBayes_3#Standard_Discrete_.28Morphology.29_Model); help for commands Format, Ctype, Lset, Prset

References

Lewis, P. 2001. A likelihood approach to estimating phylogeny from discrete morphological character data. Systematic Biology 50:913-925.

Problems Running MrBayes

Warnings

  • I get the warning "In LIKE_EPSILON - for standard division 0 char 130 has like = 0.00000000000000000000000000000"

(from the mailing list) Most probably it is the combination of model settings and proposal parameters that causes the problem, though I'd blame covarion=yes rather than aamodelpr in the first place. I've seen similar errors with codon models. My impression is that with certain complicated models one of the parameters sometimes goes out of range. Setting a simpler model often rectifies the situation. In your case covarion=yes adds most of the complexity.

George

This could be a bug in MrBayes, so please feel free to report it on the MrBayes bug list. However, we cannot fix a bug that we cannot replicate, so we kindly ask you to send a (minimal) data file along with your bug report.

Scientific Issues

How are gaps and missing characters treated?

MrBayes uses the same method as most maximum likelihood programs: it treats gaps and missing characters as missing data. Thus, gaps and missing characters will not contribute any phylogenetic information. There is no way in which you can treat gaps as a fifth state in MrBayes (but see below for information on how you can use gap information in your analysis).

How do I use gap information in my analysis?

Often, insertion and deletion events contain phylogenetically useful information. Although MrBayes 3 is not able to do statistical multiple sequence alignment, treating the insertion-deletion process under a realistic stochastic model, there is nevertheless a way of using some of the information in the indel events in your MrBayes analysis: Code the indel events as binary characters (presence/absence of the gap) and include them as a separate binary (restriction) data partition in your analysis. See more information on this possibility in the section on the binary model in this manual.


Technical Issues (installation etc.)

How do I compile single- and multi-processor versions on SGI machines?

To compile MrBayes 3.1 on a Silicon Graphics machine (running IRIX or Linux) you need to use the -lm flag at the end of the command line when linking. A typical compile session would look like this:

gcc -DUNIX_VERSION -O3 -c -o mb.o mb.c
gcc -DUNIX_VERSION -O3 -c -o mcmc.o mcmc.c
gcc -DUNIX_VERSION -O3 -c -o bayes.o bayes.c
gcc -DUNIX_VERSION -O3 -c -o command.o command.c
gcc -DUNIX_VERSION -O3 -c -o mbmath.o mbmath.c
gcc -DUNIX_VERSION -O3 -c -o model.o model.c
gcc -DUNIX_VERSION -O3 -c -o plot.o plot.c
gcc -DUNIX_VERSION -O3 -c -o sump.o sump.c
gcc -DUNIX_VERSION -O3 -c -o sumt.o sumt.c
gcc -DUNIX_VERSION -O3 mb.o bayes.o command.o mbmath.o mcmc.o model.o plot.o sump.o sumt.o -o mb -lm

(Options for turning on various warnings have been omitted here for clarity.) The only difference from the standard Makefile is on the last line above, where the -lm flag is now at the end. Under IRIX the cc compiler could have been used instead. Under Linux you could have used the icc compiler.

If you want to compile a multi-processor (parallel, MPI) version of MrBayes 3.1 for an SGI machine, then you should not use mpicc. Instead you should add the flag -lmpi when linking. A typical compile session would look like this:

gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c bayes.c 
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c command.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c mbmath.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c mcmc.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c model.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c plot.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c sump.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 -c sumt.c
gcc -DUNIX_VERSION -DMPI_ENABLED -O3 bayes.o command.o mbmath.o mcmc.o model.o plot.o sump.o sumt.o -o mb_multi -lm -lmpi

Notice the last line where the -lmpi flag is given at the end. Again warning options have been omitted, and again you could have used the cc or icc compilers. Here the MPI version of the executable has been named "mb_multi" to avoid confusion with the non-parallel version.
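The repeated compile lines can also be produced by a loop, which makes it harder to misplace the library flags. The following shell sketch only prints the commands (a dry run). The function name print_build is hypothetical, and the module list is simply copied from the MPI session above. Pipe the output to sh to actually run the commands.

```shell
#!/bin/sh
# print_build OUT EXTRA_CFLAGS EXTRA_LIBS MODULE... (hypothetical helper)
# Print one compile command per MODULE.c, then a link command with the
# libraries placed last on the line.
print_build() {
    out=$1
    cflags=$2
    libs=$3
    shift 3
    objs=""
    for m in "$@"; do
        echo "gcc -DUNIX_VERSION $cflags -O3 -c -o $m.o $m.c"
        objs="$objs $m.o"
    done
    # -lm and any extra libraries go at the very end of the link line.
    echo "gcc -DUNIX_VERSION $cflags -O3$objs -o $out -lm $libs"
}

print_build mb_multi -DMPI_ENABLED -lmpi \
    bayes command mbmath mcmc model plot sump sumt
```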

How do I run MrBayes MPI (parallel) version in the background on an SGI machine?

In order to run the MPI version of MrBayes in the background on an SGI machine (running IRIX or Linux), you need to make a bogus redirection of stdin along the following lines:

mpirun -np 8 mb_multi myfile.nxs < /dev/null > log.txt &

Here, mb_multi is the MPI version of MrBayes compiled as described above. For this to work, you must have all relevant commands (set, lset, prset, mcmc, etc.) in the Nexus file as a MRBAYES block.

The reason for the redirection trick is explained in the mpirun manpage:

      Running an MPI job in the background is supported only when stdin is
      redirected.

      The mpirun process is still connected to the tty when a job is placed
      in the background.  One of the things that mpirun polls for is input
      from stdin.  If it happens to be polling for stdin when a user types in
      a window after putting an MPI job in the background, and stdin has not
      been redirected, the job will abort upon receiving a SIGTTIN signal.
      This behavior is intermittent, depending on whether mpirun happens to
      be looking for and sees any stdin input.

      The following examples show how to run an MPI job in the background.

      For a job that uses input_file as stdin:

           mpirun -np 2 ./a.out < input_file > output &

      For a job that does not use stdin:

           mpirun -np 2 ./a.out < /dev/null  > output &

I am trying to run MrBayes under MPI on an SGI IRIX machine. What does the error message "array services not available" mean?

Most frequently, it means that the machine does not have the arrayd daemon running; enter the command (as root) "chkconfig array on" followed by a reboot, then retry running MrBayes with mpirun.

--AllenSmith

How can I convert Nexus files to and from Mac format?

(from the mailing list) Both Linux and OS X come with a handy little tool called "tr" to translate characters in files, which should be much quicker than opening the file in another program and resaving it. I used to use PAUP* for this until I found tr.

For example, to convert Linux/Unix line breaks to Mac:

 tr "\012" "\015" < unix.txt > mac.txt

and from Mac to Linux:

 tr "\015" "\012" < mac.txt > unix.txt

I generally put this in a small shell script called convert.sh to make things a bit easier (don't forget to make it executable: chmod +x convert.sh):

 #!/bin/sh
 tr "\012" "\015" < "$1" > "$2"


So - simply run:

 convert.sh unixfile macfile

--Simon
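If you are not sure which kind of line breaks a file has, you can count the carriage-return and line-feed bytes before converting. The following shell sketch builds on the tr commands above. The function names eol_type and to_unix are hypothetical. A file that contains both kinds of byte is reported as dos (CR+LF), which is an assumption that will be wrong for files with mixed line breaks.

```shell
#!/bin/sh
# eol_type FILE (hypothetical helper)
# Print "unix", "mac", or "dos" depending on which line-break bytes
# FILE contains (CR = \015, LF = \012).
eol_type() {
    cr=$(($(tr -cd '\015' < "$1" | wc -c)))
    lf=$(($(tr -cd '\012' < "$1" | wc -c)))
    if [ "$cr" -eq 0 ]; then
        echo unix
    elif [ "$lf" -eq 0 ]; then
        echo mac
    else
        echo dos
    fi
}

# to_unix IN OUT (hypothetical helper)
# Write a copy of IN to OUT with Unix line breaks.
to_unix() {
    case $(eol_type "$1") in
        mac) tr '\015' '\012' < "$1" > "$2" ;;   # CR -> LF
        dos) tr -d '\015' < "$1" > "$2" ;;       # drop the CR of each CR+LF
        *)   cat "$1" > "$2" ;;                  # already Unix
    esac
}

# Example:
# to_unix macfile.nex unixfile.nex
```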

How can I run MrBayes on one or more Mac Quad-G5 machine(s)?

MrBayes versions 3.2 and older are not multi-threaded. Therefore, to make optimal use of the two dual-core G5 processors in your machine, you have to use the parallel version of MrBayes. To run this version, you need to install an MPI "virtual machine", for example lam/mpi.

Installing Lam

Download and install the Mac lam package from http://www.lam-mpi.org/7.1/download.php. This will install lam/mpi on your machine, assuming you have the developer package installed.

If you want to use lam/mpi on multiple machines, you have to install it on all of them. Furthermore, you have to make sure that

  • you are using secure shell (ssh) for connections between the machines, e.g. by setting the MPI_RSH shell variable to ssh.
  • all machines have access to the mrbayes files, e.g. by using a network filesystem like NFS or a script that copies all necessary files (see man rsync).

Compiling the parallel MrBayes version

From a terminal window you can compile MrBayes with

./configure --enable-mpi
make

If you receive a "command not found" error, the command mpicc may not be in your path, or lam/mpi may not be installed correctly. Add /usr/local/bin to your path and try again. See also the Compiling MrBayes section for more help on the configure script.

Running MrBayes

Before starting lam/mpi, create a text file called lamhosts that lists the machines to use. If you are using one quad G5 machine, the file needs only one line, "localhost cpu=4", which you can create by typing in the terminal window

 echo "localhost cpu=4" > lamhosts

If you have multiple machines, the lamhosts file needs one line for every machine, with the full machine name instead of localhost. The file might look like

 $ cat lamhosts
 mac01 cpu=4
 mac02 cpu=4
 mac03 cpu=4

To start lam/mpi, type

 lamboot lamhosts

from a terminal window.

To run MrBayes on N processors, use

 mpirun -np N ./mb

With quad G5 machines, N is typically four times the number of machines you are using, that is, one process per core.
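For more than a couple of machines, the lamhosts file and the value of N can be derived from a list of host names. The following shell sketch is illustrative only: write_lamhosts and count_cpus are hypothetical helper names, mac01 to mac03 are the example hosts used above, and every line of the file is assumed to carry a cpu= entry.

```shell
#!/bin/sh
# write_lamhosts CPUS HOST... (hypothetical helper)
# Print one "HOST cpu=CPUS" line per host.
write_lamhosts() {
    cpus=$1
    shift
    for h in "$@"; do
        echo "$h cpu=$cpus"
    done
}

# count_cpus FILE (hypothetical helper)
# Sum the cpu=N values in FILE; lines without a cpu= entry add nothing.
count_cpus() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^cpu=[0-9]+$/) {
                sub(/^cpu=/, "", $i)
                n += $i
            }
    }
    END { print n + 0 }' "$1"
}

write_lamhosts 4 mac01 mac02 mac03 > lamhosts
N=$(count_cpus lamhosts)
# Print the command that would start MrBayes on all listed processors.
echo "mpirun -np $N ./mb"
```

With three quad machines this prints mpirun -np 12 ./mb.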

To stop lam/mpi, type

 wipe lamhosts

in a terminal window.
