DIMACS Working Group Meeting on Mathematical and Computational Aspects Related to the Study of The Tree of Life

March 11-14, 2003
DIMACS Center, CoRE Building, Rutgers University, Piscataway, NJ

Organizers:
Melvin F. Janowitz, DIMACS, melj@dimacs.rutgers.edu
Francois-Joseph Lapointe, Universite de Montreal, lapoinf@biol.umontreal.ca
F. R. McMorris, Illinois Institute of Technology, mcmorris@iit.edu
Fred Roberts, DIMACS, froberts@dimacs.rutgers.edu
Presented under the auspices of the Special Focus on Computational Molecular Biology and the Special Focus on Computational and Mathematical Epidemiology.



DIMACS recently held two working group meetings on Bioconsensus. The general goal of these meetings was to investigate the use of consensus techniques in evolutionary biology, and in particular their applications to phylogenetics. The meetings were highly successful and will result in the publication of a volume in the DIMACS book series next year. The participants in these meetings felt strongly that there should be a third event, but that it should not focus on the specific topic of Bioconsensus. There was some sentiment for a further meeting on supertrees, and more generally for a working group designed to explore algorithmic and mathematical aspects related to the study of the Tree of Life. There is considerable interest in this type of project in both the biology and the mathematics communities, as witnessed by the following:
1. The Tree of Life web project (www.tolweb.org/tree/phylogeny.html).

2. The recent National Science Foundation program (NSF-02-074) whose proposal deadline was May 22, 2002, and whose title is Assembling the Tree of Life.

3. The lead article by Richard Karp in the May 2002 AMS Notices, "Mathematical Challenges from Genomics and Molecular Biology".

4. The symposium held May 30-June 1, 2002 at the American Museum of Natural History, entitled "Assembling the Tree of Life: Science, Relevance and Challenges".

5. A symposium on supertrees that will be held in conjunction with the annual Evolution Meeting (Chico, CA, June 21-24, 2003). Its title is "Phylogenetic Supertrees: the stage play".

6. Kluwer Academic will be publishing a book in 2003 called "Phylogenetic supertrees: the book".

Description of the project: Vast quantities of molecular data are becoming available, and there is a need for efficient computer algorithms that scale appropriately to the size and quality of the underlying data sets. We recognize that, for many reasons, viral evolution may have features not necessarily present in the evolution of organisms on a multicellular scale. Since a workshop on the role of evolution in epidemiology is in the planning stage, we do not intend to focus on viral evolution here.

There are a number of possible topics; we indicate only some possible themes here, and we intend to ask the Bioconsensus community for suggestions. First of all, should this Tree of Life really be a tree, or is some other data structure a more plausible model? Some local portions of evolutionary structure should surely be tree-like, but when these local structures are assembled, should they form a supertree or some more general structure? Here we wish to compare mathematical theory with current research trends in the biological community. We hope that the biologists will suggest areas they find useful, as opposed to mathematicians just suggesting models of interest to them.

How does one handle possible errors in the data? What about missing data? Certainly statistics must play a role here, but we also need to study combinatorial methods, consensus methods, some notion of approximate compatibility analysis, generalized pyramids, and some form of cluster analysis. How, for example, should one deal with reticulate evolution? Should errors in data be dealt with by allowing alternate models of evolution, or by allowing reversals and overlap, or is there some other technique dependent upon the data? There is a need to develop better techniques for handling microarray data as well; here new consensus or classification techniques need to be developed. Thought should also be given to the development of appropriate measures of dissimilarity. Are there any techniques other than string matching of interest to the biological community?
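
To make the notion of compatibility concrete, here is a minimal Python sketch. It assumes each cluster is given as a set of taxon labels, and it uses the usual condition that two clusters can sit in the same rooted tree only if they are disjoint or one contains the other. The function names and the example data are hypothetical, and counting conflicting pairs is only one crude way to quantify "approximate" compatibility, not a method proposed by the working group.

```python
from itertools import combinations

def compatible(a, b):
    """Two clusters fit in one rooted tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a

def incompatible_pairs(clusters):
    """Return every pair of clusters that cannot coexist in a single tree."""
    return [(a, b) for a, b in combinations(clusters, 2) if not compatible(a, b)]

# Hypothetical clusters on the taxa A, B, C, D.
clusters = [frozenset("AB"), frozenset("ABC"), frozenset("BD")]
conflicts = incompatible_pairs(clusters)

# An empty list would mean the clusters are pairwise compatible (tree-like).
print(len(conflicts), "conflicting pair(s)")
```

In this example the cluster {B, D} overlaps both {A, B} and {A, B, C} without being nested in either, so the collection is not tree-like as it stands.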

Existing methods of supertree construction should be compared using simulations. The NP-completeness of the problems underlying current techniques should be investigated, and where appropriate fast approximate algorithms (heuristics) should be developed. Is there a role for compatibility analysis? What about parsimony? There is a need to compare MRP (matrix representation with parsimony) methods with other methods of constructing a supertree. There is even a need to arrive at an acceptable definition of a supertree. How does one deal with massive data sets to derive an accurate Tree of Life?
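
For readers unfamiliar with MRP, the encoding step can be sketched as follows. This is a minimal Python illustration, assuming each source tree is supplied as a pair (set of taxa, list of clusters). Each cluster becomes one matrix column: taxa inside the cluster are coded 1, other taxa of that source tree are coded 0, and taxa absent from that tree are coded "?". The function name and the example trees are hypothetical, and the subsequent parsimony analysis of the matrix is not shown.

```python
def mrp_matrix(trees):
    """Build an MRP-style character matrix from source trees.

    trees: list of (taxa, clusters) pairs, where taxa is a set of labels
    and clusters is a list of subsets of taxa (one per internal node).
    Returns a dict mapping each taxon to its row as a string of 0, 1, ?.
    """
    all_taxa = sorted(set().union(*(taxa for taxa, _ in trees)))
    rows = {t: [] for t in all_taxa}
    for taxa, clusters in trees:
        for cluster in clusters:
            for t in all_taxa:
                if t not in taxa:
                    rows[t].append("?")   # taxon missing from this source tree
                elif t in cluster:
                    rows[t].append("1")   # taxon inside the cluster
                else:
                    rows[t].append("0")   # taxon in the tree, outside the cluster
    return {t: "".join(chars) for t, chars in rows.items()}

# Two hypothetical source trees with overlapping taxon sets.
tree1 = ({"A", "B", "C", "D"}, [{"A", "B"}, {"A", "B", "C"}])
tree2 = ({"B", "C", "E"}, [{"C", "E"}])
matrix = mrp_matrix([tree1, tree2])

for taxon, row in matrix.items():
    print(taxon, row)
```

The "?" entries are where the missing-data question raised above enters: taxon E, for instance, is unknown for every column derived from the first tree.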


Document last modified on March 11, 2003.