Visualizing the Evolutionary Tree of Life

Kannan Mahadevan

One of the grand challenges in biology is the discovery of the detailed pattern of relatedness among all life forms, reflecting the process of lineage diversification that formed the magnificent evolutionary tree of life. Phylogenetic trees have become central to a range of key questions in biology, from searching for evolutionary patterns and analyzing natural selection to examining genetic change and evolutionary rules that apply across divergent forms of life. However, when they get really big (more than a few hundred species), evolutionary trees are difficult to see and explore, a problem Carl Zimmer addressed in a recent New York Times article, “Crunching the Data for the Tree of Life.” One of the chief difficulties is balancing focus and context: how to zoom in on one area while still retaining a sense of one’s location in the whole tree. For example, how can a user zoom in on the hummingbirds while simultaneously retaining a more panoramic view, e.g. their position in relation to other birds, as well as dinosaurs, crocodiles and other animal groups not far away? In addition, how can we visualize large evolutionary trees in a platform rich with data and content, so that while zooming in, out and around the tree of life we have access to massive sources such as the Encyclopedia of Life and the Tree of Life (TOLweb) content set?

Because current software tools cannot perform these visualization tasks, a major challenge in computational biology, the problem of tree visualization belongs to computer scientists and graphic designers as well as evolutionary biologists. To address this challenge, a group of tree-thinking representatives from evolutionary biology, academic computing, and the software and graphics industry participated in a meeting from May 1-3 designed to synthesize various approaches to tree visualization. Sponsored by the EOL Biodiversity Synthesis Group, it brought together 32 participants from the U.S. and Canada and consisted of a series of talks and group discussions, followed by more focused break-out sessions.

The meeting made progress on several initiatives that will benefit biodiversity research and education. First, it committed a group of people to the goal of developing evolutionary visualization tools, and came up with proposal themes for several funding efforts. Second, new software tools were conceived and the design process begun: our software company participants were particularly interested in audience selection and software design. One of the most exciting aspects for a general audience was the synergy achieved between existing projects, such as TOLweb and Harvard’s INVOLV touch-table project, and industry representatives who can provide experienced guidance for reaching a public audience. Third, interoperability was a common theme at the meeting, with representatives from EOL, TOL, Mesquite and TreeBASE committed to the development of APIs (application programming interfaces) and standards that will allow these resources to communicate with one another.

Lastly, the meeting was a fun and highly active brainstorming session, an important facet of good synthesis meetings. A cursory glance at the meeting room, its walls covered with post-it notes and scrawled-on dry erase boards, might have been enough to convince an onlooker that the brainstorming process had swallowed up the goal of organized synthesis. However, Dr. Karen Cranston, postdoctoral research scientist at the Biodiversity Synthesis Center, called the meeting “amazingly productive,” especially considering that people from such diverse backgrounds were thrown together for only two and a half days. The general feeling was that the synthesis meeting is a fitting format for attacking interdisciplinary problems in science and for meeting the goal of developing an NSF grant proposal. By inviting diverse input from the very beginning, the meeting identified a broader range of basic questions and generated greater enthusiasm than a smaller-scale approach that tried to enlist outside interest for a project with a pre-defined scope.

Many of the presentations introduced concepts or programs in keeping with a Google Earth model for tree visualization. Tamara Munzner, of the University of British Columbia, presented a program called TreeJuxtaposer that can keep a desired clade or tip of the tree visible by treating it like a landmark. While anchoring themselves to one spot with this ‘guaranteed visibility,’ users can explore the rest of the two-dimensional tree space with fish-eye lenses and other zooming devices. Greg Jordan of the European Bioinformatics Institute described a similar program called PhyloWidget, which allows users to customize their view by hiding or showing different regions of tree data. However, both programs must still meet the challenge of working at a scale larger than a few thousand species.
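
To make the fish-eye idea concrete, here is a minimal sketch of a one-dimensional focus-plus-context distortion applied to leaf positions along one axis of a tree drawing. It uses the well-known Sarkar-Brown graphical fisheye function g(x) = (d + 1)x / (dx + 1); this is an illustrative assumption, not the method used by either program described above, and the names `fisheye` and `distortion` are hypothetical.

```python
def fisheye(pos, focus, distortion=3.0, lo=0.0, hi=1.0):
    """Move `pos` away from `focus` so that space near the focus is magnified.

    Positions stay inside [lo, hi] and keep their order, so the rest of the
    tree remains visible as compressed context. Assumes lo <= focus <= hi.
    """
    if pos == focus:
        return focus
    # Normalize the distance from the focus to the boundary on the same side.
    boundary = hi if pos > focus else lo
    span = boundary - focus
    if span == 0:
        return pos
    x = (pos - focus) / span  # 0 at the focus, 1 at the boundary
    g = (distortion + 1) * x / (distortion * x + 1)  # Sarkar-Brown fisheye
    return focus + g * span


# Five evenly spaced leaves, with the middle leaf as the point of interest.
leaves = [0.0, 0.25, 0.5, 0.75, 1.0]
stretched = [round(fisheye(p, focus=0.5), 3) for p in leaves]
print(stretched)
```

With the default distortion, the two neighbours of the focus leaf move from 0.25 and 0.75 to roughly 0.1 and 0.9, so the area around the focus gets most of the screen while the outermost leaves stay pinned at the edges.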

One of the larger themes of the meeting was that a successful tree visualization tool will mean more than a neat gadget with a fancy user interface. Dr. Cranston characterized “visual exploration” as a key step in the scientific process following data collection. Organizing morphological or molecular information into a tree structure allows scientists to immediately see patterns they might otherwise miss if they compared traits as numbers, that is, if the visual element were removed. Moreover, plotting data in trees imposes an evolutionary focus on the problem of species comparison from the very beginning. Instead of arbitrarily comparing numbers, scientists compare them according to evolutionary categories, such as homologous traits. Rather than hiding evolutionary relationships and questions, the data showcases them because it has already been parsed according to an evolutionary framework. But scientists cannot stop at visualizing one tree; they must compare a large collection of ‘best’ possible trees generated by phylogenetic analysis. David Hillis (University of Texas at Austin) and Katherine St. John (Lehman College, City University of New York) addressed this topic of tree set visualization: how to compare and summarize large collections of trees by structure and branching pattern.
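
As one illustration of comparing trees by branching pattern, the sketch below computes a Robinson-Foulds style distance between rooted trees: the number of clades (sets of descendant tips) found in one tree but not the other. This is a standard measure chosen here for illustration; the source does not say which methods the speakers presented. The nested-tuple tree encoding and the function names are assumptions made for this example.

```python
def _leaf_set(tree, found):
    """Return the tips under `tree`, recording each internal node's clade."""
    if isinstance(tree, str):  # a tip is just a species name
        return frozenset([tree])
    tips = frozenset().union(*(_leaf_set(child, found) for child in tree))
    found.add(tips)
    return tips


def clades(tree):
    """All clades of a rooted tree written as nested tuples of tip names."""
    found = set()
    _leaf_set(tree, found)
    return found


def rf_distance(tree_a, tree_b):
    """Count the clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Three hypothetical candidate trees over the same four tips.
trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
    (("A", "B"), ("C", "D")),
]

# A pairwise distance matrix is a common starting point for summarizing a tree set.
matrix = [[rf_distance(a, b) for b in trees] for a in trees]
for row in matrix:
    print(row)
```

Here the first tree differs from each of the other two by two clades, while the second and third differ by four, so the first tree sits "between" the others. A matrix like this can then be fed to clustering or a low-dimensional layout to show how a large collection of trees is distributed.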

Scientists and educators hope to embed tree visualization technology into third-party websites, thereby linking tree images with existing species descriptions or data. The tree visualization project initiated at the recent meeting will be tightly linked with the Encyclopedia of Life, so that people can browse the catalog of EOL species via phylogenetic trees rather than by taxonomic hierarchy. The educational potential of such a platform is immense: a single node on a tree would simultaneously reveal a species’ evolutionary history and biodiversity information, and direct users interested in either to learn about the other.

Such a platform would be a fitting tool for a science bound together by Theodosius Dobzhansky’s famous doctrine, ‘Nothing in biology makes sense except in the light of evolution,’ a science whose most basic subjects, species, are both universes unto themselves and parts of a larger whole.

The TreeViz participants outside the Field Museum in Chicago.
