Large-Scale Simultaneous Multiple
Alignment and Phylogeny Estimation
This project is funded by NSF DEB 0733029 (ATOL) and is a
collaboration between the University of Texas, the University of Georgia,
the University of Nebraska, and the University of Kansas.
Main goals
- New methods for simultaneous estimation of alignments and
trees.
- New simulation tools for evolution that produce
more realistic data (including rearrangements, duplications,
and indels)
- Educational outreach
Symposium and Workshop Announcement
We will hold a Symposium and Workshop on new methods for
phylogenomics and metagenomics at UT-Austin
on Feb 16-17, 2013. (This is a follow-up to the
symposium and workshop we had on new methods for
alignment and tree estimation at the Smithsonian
Institution in Washington, DC, on May 20-22, 2012, but
covers new topics.)
Participation is limited and
registration is required.
Please contact Laurie Alvarez
(lauriea@austin.utexas.edu) to
obtain registration forms, and
click here
for more information.
Project personnel
Senior Personnel:
Current and former students and postdocs
University of Texas at Austin
- Md. Shamsuzzoha
Bayzid, UT-Austin, current PhD student of Tandy Warnow (awarded
Fulbright Fellowship)
- Siavash Mirarab, UT-Austin, PhD student of Tandy Warnow (awarded HHMI
International Student Graduate Fellowship)
- Nam Nguyen, UT-Austin, PhD student of Tandy Warnow
- Jimmy Yang, former PhD student of Tandy Warnow
- Shel Swenson, former PhD student and postdoctoral fellow of Tandy Warnow
- Kevin Liu, former PhD student and postdoctoral fellow of Tandy Warnow
University of Kansas
- Jeet Sukumaran, University of Kansas, PhD student of Mark Holder
- Daniel Money, postdoc of Mark Holder at the University
of Kansas
- Jamie Oaks, PhD student of Mark Holder at the University of Kansas
- Jiaye Yu, University of Kansas, PhD student of Mark Holder
- Derrick Zwickl, University of Kansas, postdoctoral fellow of Mark Holder (now
postdoc at the University of Arizona)
University of Nebraska
- Cory Strope, University of Nebraska, PhD student and postdoc of Etsuko Moriyama (now postdoc
of Jeff Thorne at North Carolina State University)
- Catherine
Anderson, University of Nebraska, PhD student of Etsuko Moriyama
University of Georgia
- Michael McKain, PhD student of Jim Leebens-Mack at the
University of Georgia
Smithsonian Institution
- Sarah Kingston, PhD student of Mike Braun at the Smithsonian
Institution
Pennsylvania State University
- Aakrosh Ratan, PhD student of Webb Miller
- Giltae Song, PhD student of Webb Miller
News
Benchmark Datasets
Please see
this page for benchmark datasets enabling testing of
methods for estimating multiple sequence alignments,
phylogenetic trees, supertrees, and phylogenetic placements
of short metagenomic reads.
Software
-
SATé, Simultaneous
Alignment and Tree Estimation.
This is a collaboration between the University of Texas
at Austin and the University of Kansas.
The design of SATé is
ongoing research by Warnow and her students,
and the public distribution is provided by Mark Holder and
his group
at The University of Kansas.
-
PASTA,
Practical Alignment using SATé and TrAnsitivity.
PASTA is a direct improvement over SATé, offering
improved accuracy and speed.
The software is maintained at UT-Austin, and there is
a GUI with the same interface as SATé.
-
Software from the Miller Lab
at
Penn State University includes DIAL, YASRA, and CHAP2.
DIAL performs de novo identification of alleles,
and is useful for analyzing sequences from species that are
quite different from available
genomes.
YASRA is "yet another short read assembler".
Finally, CHAP2 is a tool for determining orthologous
regions in an inter-species comparison of gene clusters,
using both traditional and alternative definitions
of "orthology" (which may either consider or ignore
conversion events). CHAP2 includes graphical summaries
that can be particularly helpful in understanding
evolutionary relatedness.
-
GARLI: Genetic
Algorithms for Rapid Likelihood Inference, developed
by Derrick Zwickl.
-
SuperFine, a new supertree method, developed by Shel
Swenson, Rahul Suri, C. Randal Linder, and Tandy Warnow.
-
FastSP: a fast method for comparing two
multiple sequence alignments, developed by Siavash Mirarab and
Tandy Warnow.
-
SEPP:
SATé-enabled phylogenetic placement, developed
by S. Mirarab, N. Nguyen, and T. Warnow.
-
indel-Seq-Gen,
developed by Cory L. Strope, software for simulating
nucleotide or amino-acid sequence evolution.
-
SuiteMSA, visual tools for multiple sequence
alignment comparison and molecular sequence simulation.
Developed by C.L. Anderson, C.L. Strope, and E.N. Moriyama.
- Probtree (see "The effect of the guide tree on multiple
sequence alignments and subsequent phylogenetic analyses"). This
is a variant of ProbCons which uses a particular guide tree
in order to improve the alignment and also the topological accuracy
of the ML tree on the alignment.
(Not publicly available; contact us for instructions on running it.)
Publications
In press
2014
- S. Mirarab, N. Nguyen, and T. Warnow.
"PASTA: ultra-large multiple sequence alignment".
Accepted to RECOMB 2014
(PDF).
Supplementary materials are available at
figshare
(PDF).
-
McKain, M.R., N. Wickett, Y. Zhang, S. Ayyampalayam, W. R. McCombie,
M. W. Chase, J. C. Pires, C. W.
dePamphilis, and J. Leebens-Mack. 2012.
"Phylogenomic analysis of transcriptome data elucidates
co-occurrence of a paleopolyploid event and the origin of bimodal karyotypes in Agavoideae
(Asparagaceae)." American Journal of Botany,
99 (2), 397-406.
2013
- T. Warnow.
"Large-scale multiple sequence alignment and
phylogeny estimation."
Chapter 6 in "Models and Algorithms for Genome Evolution", edited by Cedric Chauve, Nadia El-Mabrouk and Eric Tannier, Springer series on "Computational Biology".
For a preprint (not in final form) of this chapter, see this
PDF.
-
M.S. Bayzid, S. Mirarab, and T. Warnow.
"Inferring
optimal species trees under gene duplication and loss."
Pacific Symposium on Biocomputing, 18:250-261 (2013).
(PDF).
2012
-
Vos, Rutger A., James P. Balhoff, Jason A. Caravas,
Mark T. Holder, Hilmar Lapp,
Wayne P. Maddison,
Peter E. Midford,
Anurag Priyam,
Jeet Sukumaran,
Xuhua Xia, and Arlin Stoltzfus.
"NeXML: rich, extensible, and verifiable representation
of comparative data and metadata",
Systematic Biology (2012) 61 (4):675-689.
-
Ayres DL, Darling A, Zwickl DJ, Beerli P,
Holder MT, Lewis PO, Huelsenbeck JP, Ronquist F, Swofford DL, Cummings
MP, Rambaut A, and Suchard M. 2012.
"BEAGLE:
an Application Programming Interface and High-Performance
Computing Library for Statistical Phylogenetics",
Systematic Biology. 61(1):170-173.
-
Nelesen, S., K. Liu, L.-S. Wang, C. R. Linder, and T. Warnow.
"DACTAL: fast and accurate
estimations of trees without computing full sequence alignments."
Proceedings of ISMB 2012 and
Bioinformatics, Vol. 28, pages i274-i282.
(PDF)
-
Bayzid, Md. S. and T. Warnow.
"Finding Optimal Species Trees from Incomplete
Gene Trees under Incomplete Lineage Sorting."
Journal of Computational Biology,
June 2012, Vol. 19, No. 6, pages 591-605,
special issue for Simon Tavare and Michael Waterman.
(HTML).
-
Swenson, M.S., R. Suri, C.R. Linder, and T. Warnow, 2012.
SuperFine: fast and accurate supertree estimation
(PDF),
Systematic Biology, 61(1): 90-106.
-
Liu, K., T.J. Warnow, M.T. Holder, S. Nelesen, J. Yu,
A. Stamatakis, and C.R. Linder, 2012.
"SATe-II: Very Fast and Accurate Simultaneous Estimation of Multiple
Sequence Alignments and Phylogenetic Trees."
(PDF)
-
Song, G., C. Riemer, B. Dickins,
H. L. Kim, L. Zhang, Y. Zhang,
C-H Hsu, R. C. Hardison, NISC Comparative Sequencing
Program, E.D. Green, and Webb Miller, 2012.
"Revealing Mammalian Evolutionary Relationships
by Comparative Analysis of Gene Clusters."
Genome Biology and Evolution,
Volume 4,
pages 586-601.
-
Heath, Tracy A., Mark T. Holder, and John P. Huelsenbeck.
"A Dirichlet Process Prior for Estimating Lineage-Specific Substitution Rates".
Mol Biol Evol (2012) 29(3): 939-955.
-
Neves, D. T., T. Warnow, J. L. Sobral and K.
Pingali.
"Parallelizing SuperFine" (PDF).
27th Symposium on Applied Computing (ACM-SAC), Bioinformatics, 2012.
- Liu, K. and T. Warnow.
"Treelength optimization for phylogeny estimation."
PLoS One,
7(3):e33104. doi:10.1371/journal.pone.0033104.
-
T. Warnow. "Standard maximum likelihood
analyses of alignments with gaps can be statistically
inconsistent."
PLoS Currents Tree of Life.
Available
here and
here.
-
Nguyen, N., S. Mirarab, and T. Warnow.
"MRL and SuperFine+MRL: new supertree methods."
Algorithms for Molecular Biology 7:3, 2012.
-
Mirarab, S., N. Nguyen, and T. Warnow.
"SEPP: SATe-Enabled Phylogenetic
Placement."
Proceedings of the 2012 Pacific Symposium on Biocomputing
(PDF).
2011
-
Song, G., Chih-Hao Hsu,
Cathy Riemer,
Yu Zhang,
Hie L Kim,
Federico Hoffmann,
Louxin Zhang,
Ross C Hardison,
NISC Comparative Sequencing Program,
Eric D Green, and Webb Miller.
"Conversion events in gene clusters",
BMC Evolutionary Biology 2011, 11:226.
-
Anderson, C.L., C. L. Strope, and E. N. Moriyama.
Assessing Multiple Sequence Alignments Using Visual Tools.
In Bioinformatics: Trends and Methodologies (ed. Mahdavi, M. A.). InTech (ISBN 978-953-307-282-1), 2011.
-
Mirarab, S. and T. Warnow. "FastSP: Linear-time calculation of alignment accuracy."
Bioinformatics
(2011) 27(23):3250-3258.
-
Liu, K., C. Randal Linder, and T. Warnow.
"RAxML and FastTree: Comparing Two Methods
for Large-Scale Maximum Likelihood Phylogeny Estimation."
PLoS ONE
6(11): e27731. doi:10.1371/journal.pone.0027731
-
Yu, Y., T. Warnow, and L. Nakhleh.
"Algorithms for MDC-based Multi-locus Phylogeny Inference."
Proceedings of RECOMB 2011
(PDF).
-
Yu, Y., T. Warnow, and L. Nakhleh.
Algorithms for MDC-Based Multi-Locus Phylogeny Inference: Beyond Rooted Binary Gene Trees on Single Alleles,
J. Computational Biology, November 2011, Vol. 18, No. 11,
pp 1543-1559
JCB.
-
Yang, J. and T. Warnow.
"Fast and accurate methods for phylogenomic analyses".
RECOMB-CG 2011, and BMC Bioinformatics 12(Suppl 9): S4 (5 October 2011)
- Anderson, C.L., C. L. Strope, and E. N. Moriyama (2011)
SuiteMSA: Visual Tools for Multiple Sequence Alignment Comparison and Molecular Sequence Simulation.
BMC Bioinformatics 12:184.
- Holder MT, Steel M. 2011. "Estimating phylogenetic trees from pairwise likelihoods and
posterior probabilities of substitution counts". Journal of Theoretical Biology.
280(1):159-166.
(PDF).
2010
-
Bradner, J.E., N. West, M.L. Grachan,
E. Greenberg, S.J. Haggarty, T. Warnow and R. Mazitschek.
"Chemical phylogenetics of histone deacetylases", 2010.
Nature Chemical Biology.
6, pp. 238-243; published online 7 February 2010.
-
Holder MT, Lewis PO, Swofford DL. 2010.
"The Akaike Information Criterion Will Not Choose
the No Common Mechanism Model". Systematic Biology. 59(4):477-485.
(PDF)
-
Swenson, M.S., F. Barbancon, C.R. Linder, and T. Warnow.
"A simulation study comparing supertree and combined analysis methods
using SMIDGen."
Algorithms for Molecular Biology, 2010,
5:8 (4 January 2010),
special issue of selected papers
from WABI 2009.
(PDF)
-
Ratan A, Zhang Y, Hayes VM, Schuster SC, Miller W. 2010.
"Calling SNPs without a reference sequence."
BMC Bioinformatics 11: 130.
-
Schuster S.C., Miller W., Ratan A., Tomsho L.P., Giardine B., Kasson L.R., Harris R.S., Petersen D.C.,
Zhao F., Qi J., Alkan C., Kidd J.M., Sun Y., Drautz D.I., Bouffard P., Muzny D.M., Reid J.G., Nazareth
L.V., Wang Q., Burhans R., Riemer C., Wittekindt N.E., Moorjani P., Tindall E.A., Danko C.G., Teo
W.S., Buboltz A.M., Zhang Z., Ma Q., Oosthuysen A., Steenkamp A.W., Oostuisen H., Venter P., Gajewski
J., Zhang Y., Pugh B.F., Makova K.D., Nekrutenko A., Mardis E.R., Patterson N., Pringle T.H.,
Chiaromonte F., Mullikin J.C., Eichler E.E., Hardison R.C., Gibbs R.A., Harkins T.T., Hayes V.M.
(2010) Complete Khoisan and Bantu genomes from southern Africa. Nature 463:943-7.
- Swenson, M.S., R. Suri, C.R. Linder, and T. Warnow.
"An experimental study of Quartets
MaxCut and other supertree methods."
Algorithms for Molecular Biology, 2010,
special issue of selected papers from WABI 2010.
See
(PDF) for the earlier (incomplete)
version of this paper.
-
Linder, C.R., R. Suri, K. Liu, and T. Warnow.
"Benchmark datasets and software for developing
and testing methods for large-scale multiple
sequence alignment and phylogenetic inference."
PLoS Currents: Tree of Life, 2010.
Available at knol.google.com,
here.
-
Liu, K., C.R. Linder, and T. Warnow.
"Multiple sequence alignment: a major challenge for
large-scale phylogenetics."
PLoS Currents: Tree of Life, 2010.
Available on knol.google.com,
here.
2009
-
Liu, K., S. Nelesen, S. Raghavan, C.R. Linder, and T. Warnow. Barking up the wrong treelength: The impact of gap penalty on alignment and tree accuracy.
IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 1, pp. 7-21, Jan.-Mar. 2009, doi:10.1109/TCBB.2008.63
-
Liu, K., S. Raghavan, S. Nelesen, C. R. Linder, T. Warnow.
Rapid and Accurate Large-Scale Coestimation of Sequence Alignments and
Phylogenetic Trees. Science, vol. 324, no. 5934, pp. 1561-1564, 19 June 2009, doi: 10.1126/science.1171243.
See the bottom of this page for a link to the paper.
-
Wang, Li-San, Jim Leebens-Mack, P. Kerr Wall, Kevin Beckmann, Claude W. de
Pamphilis, and Tandy Warnow.
The impact of multiple protein sequence alignment on
phylogenetic estimation.
IEEE Transactions on Computational Biology and Bioinformatics (TCBB), epub 01 Sept. 2009.
-
Strope, C.L., K. Abel, S.D. Scott, and E.N. Moriyama.
Biological Sequence Simulation For Complex Evolutionary
Hypotheses with indel-Seq-Gen version 2.
Molecular Biology and Evolution, (2009),
26:2581-2593.
-
Strope, C.L., S.D. Scott, and E.N. Moriyama.
Simulating sequence superfamilies for biological hypothesis testing.
Proc. Biotechnology and Bioinformatics Symposium (BIOT-2009),
pp. 69-70.
-
Swenson, M.S., F. Barbancon, R. Linder, and T. Warnow.
"A simulation study comparing supertree and combined analysis methods
using SMIDGen."
Proceedings of WABI (Workshop on Algorithms for
Bioinformatics) 2009.
2008
-
Nelesen, S., K. Liu, D. Zhao, C. R. Linder, and T. Warnow.
The Effect of the Guide Tree on Multiple Sequence Alignments and Subsequent Phylogenetic Analyses. Pacific Symposium on Biocomputing 13:15-24, 2008. Auxiliary materials available here.
Paper available in
(PDF) form.
-
Holder, M.T., J. Sukumaran, and P.O. Lewis.
A justification for reporting the majority-rule consensus
tree in Bayesian phylogenetics.
Systematic Biology, (2008), 57(5):814-821.
-
Holder, M.T., D.J. Zwickl, C. Dessimoz.
Evaluating the robustness of phylogenetic methods to among-site
variability in substitution processes.
Philosophical Transactions of the Royal Society B-Biological Sciences
vol. 363, (2008), p. 4013., doi: 10.1098/rstb.2008.016.