Sunday, June 26, 2016

Implementation of substitution models in phylogenetic software

Concluding the little series of posts on nucleotide substitution models, below is a summary of my current understanding of how to set several of the models discussed in the previous post in PAUP and MrBayes. But first a few comments.

For PAUP there are three possible options for the basefreq parameter. My current understanding is that equal sets equal base frequencies as in the JC model (duh), empirical fixes them to the frequencies observed in the data set, and estimate has them estimated across the tree. My understanding is further that while estimate is more rigorous, empirical is often 'close enough' and saves computation time. The point is that wherever one of the latter two options appears below, I believe the other could be used instead without changing the model as such. I hope that is accurate.

What I do not quite understand is how one would set models like Tamura-Nei in PAUP. At least one of the sources I consulted when researching this post (see below) suggests that PAUP can handle models with, for example, variable transition rates but equal transversion rates. However, the PAUP 4.0b10 manual states that the only options for the number of substitution rate categories are 1 (all equal), 2 (presumably to distinguish Tv and Ts), and 6 (all different). Would one not need nst = 3 or nst = 5 to set the relevant models? Perhaps the trick is to set nst = 6 but fix the substitution matrix? But that would mean one cannot estimate it during the analysis.
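One possibility, as far as I can tell, is PAUP's rclass setting, which keeps nst = 6 but ties some of the six rates together into shared classes while still estimating them. The line below is only a sketch: it assumes the rates are ordered AC, AG, AT, CG, CT, GT, the class letters are arbitrary labels, and I have not checked it against the wording of the 4.0b10 manual.

```nexus
[Tamura-Nei sketch: all four transversions share class a;]
[the two transitions, AG and CT, each get their own class]
lset nst=6 basefreq=estimate rmatrix=estimate rclass=(a b a a c a);
```

If this works as assumed, the same approach with equal base frequencies would give the equal-frequency variant of the model.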

For the MrBayes commands, note that the applyto parameter needs to be specified with whatever part(s) of the partition the particular model should apply to, as in applyto = (2) for the second part. The MrBayes commands I found in my sources appear to be rather short; I assume that default settings do the rest. Note also that the old MrBayes versions that I am familiar with have now been superseded by RevBayes, but I have no experience with the latter program's idiosyncratic scripting language.
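To illustrate how applyto fits into a partitioned analysis, here is a minimal sketch of a MrBayes block. The character set names (gene1, gene2), their site ranges, and the partition name (by_gene) are made up for the example; only the lset and prset settings come from the lists further down.

```nexus
begin mrbayes;
  [hypothetical character sets; adjust the ranges to your alignment]
  charset gene1 = 1-500;
  charset gene2 = 501-1200;

  [define a two-part partition and make it the active one]
  partition by_gene = 2: gene1, gene2;
  set partition = by_gene;

  [part 1: K80, i.e. two rate classes with fixed equal base frequencies]
  lset applyto=(1) nst=2;
  prset applyto=(1) statefreqpr=fixed(equal);

  [part 2: GTR + I + G]
  lset applyto=(2) nst=6 rates=invgamma;
end;
```

For an unpartitioned data set, applyto=(all) should apply the settings to everything.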

In addition to PAUP and MrBayes, I mention whether a model can be selected in three other programs I am familiar with, RAxML, BEAST and PhyML. I have no personal experience with most other Likelihood phylogenetics packages. I tried to check what is available in MEGA, but it wouldn't install for me, and I am not sure if the list of models in the online manual shows only those that are actually available for Likelihood search or whether it includes some that are only available for Distance analysis. The relevant list is linked from a page on Likelihood, but its own text implies it is about Distance. Either way, MEGA appears to have lots of options but I didn't indicate them below.

General Time-Reversible (GTR)

PAUP: lset nst = 6 basefreq = empirical rmatrix = estimate;

MrBayes: lset applyto=() nst=6;

Available in RAxML, BEAST 2.1.3 and PhyML 3.0.

Tamura-Nei (TN93 / TrN)

Available in BEAST 2.1.3 and PhyML 3.0.

Symmetrical (SYM)

PAUP: lset nst = 6 basefreq = equal rmatrix = estimate;

MrBayes: lset applyto=() nst=6; prset applyto=() statefreqpr=fixed(equal);

Hasegawa-Kishino-Yano (HKY / HKY85)

PAUP: lset nst=2 basefreq=estimate variant=hky tratio=estimate;

MrBayes: lset applyto=() nst=2;

Available in RAxML, BEAST 2.1.3 and PhyML 3.0.

Felsenstein 84 (F84)

PAUP: lset nst=2 basefreq=estimate variant=F84 tratio=estimate;

Available in PhyML 3.0.

Felsenstein 81 (F81 / TN84)

PAUP: lset nst=1 basefreq=empirical;

MrBayes: lset applyto=() nst=1;

Available in PhyML 3.0.

Kimura 2-parameter (K80 / K2P)

PAUP: lset nst=2 basefreq=equal tratio=estimate;

MrBayes: lset applyto=() nst=2; prset applyto=() statefreqpr=fixed(equal);

Available in RAxML and PhyML 3.0.

Jukes-Cantor (JC / JC69)

PAUP: lset nst=1 basefreq=equal;

MrBayes: lset applyto=() nst=1; prset applyto=() statefreqpr=fixed(equal);

Available in RAxML, BEAST 2.1.3 and PhyML 3.0.

Invariant sites (+ I)

For PAUP, add the following to the lset command: pinvar = estimate

For MrBayes, add the following to the lset command: rates = propinv

Gamma (+ G)

For PAUP, add the following to the lset command: pinvar = 0 rates = gamma ncat = 5 shape = estimate (or another number of categories of your choice for ncat).

For MrBayes, add the following to the lset command: rates = gamma

Invariant sites and Gamma (+ I + G)

For PAUP, add the following to the lset command: pinvar = estimate rates = gamma ncat = 5 shape = estimate (or another number of categories of your choice for ncat).

For MrBayes, add the following to the lset command: rates = invgamma
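Putting the pieces together, a GTR + I + G analysis might be set up roughly as follows. This is only a sketch that combines the commands listed above: the surrounding begin/end blocks, the set criterion and hsearch lines, and applyto=(all) are my additions for illustration and are not taken from the sources below.

```nexus
begin paup;
  [switch the optimality criterion to likelihood before setting the model]
  set criterion=likelihood;
  [GTR base model plus invariant sites and five gamma categories]
  lset nst=6 basefreq=empirical rmatrix=estimate pinvar=estimate rates=gamma ncat=5 shape=estimate;
  hsearch;
end;

begin mrbayes;
  [same model for the whole, unpartitioned data set]
  lset applyto=(all) nst=6 rates=invgamma;
end;
```

In practice the two blocks would go into separate files, one for each program.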

References

This post is partly based on the following very helpful sources, all accessed c. 20 June 2016.

Faircloth B. Substitution models in mrbayes. https://gist.github.com/brantfaircloth/895282

Lewis PO. PAUP* Lab. http://evolution.gs.washington.edu/sisg/2014/2014_SISG_12_7.pdf

Lewis PO. BIOL848 Phylogenetic Methods Lab 5. http://phylo.bio.ku.edu/slides/lab5ML/lab5ML.html

And a PDF posted on molecularevolution.org.
