Illinois Bioblitz

Alta Buden
July 3rd, 2008

torsteninthefield.jpg

Torsten Dikow, net in hand.

This past weekend (June 27-28) BioSynC post-doc Torsten Dikow, an expert on robber flies, was invited to participate in the 2008 Middlefork Savanna Bioblitz. On Saturday, he ventured into Middlefork Savanna, one of the “most important sites for biodiversity in northeastern Illinois” according to Chicago Wilderness, to look for specimens of robber flies. Here is the one he found (it was very windy, not stellar for fly catching).
bioblitz_middlefork_savanna_09.JPG

Bioblitzes are a brilliant way to simultaneously raise public awareness about local biodiversity and collect scientific data: Torsten will now identify that fly and add it to his research data. In addition, the preserve now has a better idea of what species live there, and thus how to better look out for them. This bioblitz featured over 100 scientists and many more civilians combing the preserve for 24 hours to identify as many species as possible. They discovered at least 1054 species, and more are still being counted by participating scientists and taxonomists. Check out their blog to see how the event progressed.

Bioblitzes are usually centered around a “Tally Tent” where people bring photographs and specimens for documentation, and where the public can watch scientists at work. These events have exciting implications for the EOL; we hope that in the future the EOL can both serve as a resource for people identifying species at them and act as a host for the detailed regional data that they produce. Here is the Flickr link to the best photos that they took, like the one below of a shy and elusive eastern milk snake (Lampropeltis triangulum triangulum).
picture-7.jpg

BioSynC Attends Annual Joint Meeting on Evolution

Alta Buden
June 27th, 2008

minneapolis-019.jpg minneapolis-018.jpg

From June 20-24th BioSynC staff members Mark Westneat, Audrey Aronowsky, Darolyn Striley and I hosted a booth at the 2008 Evolution meeting. The meeting attracts leading phylogeneticists, systematists, and evolutionary biologists from around the world, making it an ideal conference to recruit scientists to EOL and also to solicit interest in synthesis meetings. It was held in Minneapolis at the University of Minnesota and had an attendance of about a thousand people. Our booth got lots of attention (we were the only ones with candy :)) and we met several people who are going to propose synthesis meetings. Our booth was across from the NESCent booth and we had a lot of fun getting to know our fellow synthesis meeting organizers, who, for the record, are still leaps ahead of us in terms of booth set-up.

Mark gave a 15-minute talk to a full audience titled “Phylogenetic visualization tools and phyloinformatics in the Encyclopedia of Life”. In the presentation, Mark used his own fish research (a phylogenetic dataset), which incorporates systematics, developmental genomics, and biogeography, to demonstrate the many kinds of data that EOL will be able to handle and also how the research communities that focus on each of these kinds of data could work together using EOL. Here are some pictures of us and the booth in action:
minneapolis-026.jpg minneapolis-012.jpg minneapolis-024.jpg minneapolis-016.jpg

New Members of the Informatics Team

David Shorthouse
May 30th, 2008

Jonathan Clapp
Software Developer

clapp.jpg
I grew up on Cape Cod and join the EOL Informatics Group as a software developer. My experience is in database and web application development.
I will help ensure that the foundations of the Encyclopedia of Life are as solid as possible while allowing flexibility in the future. I have followed this important project for some time and am thrilled to be contributing to its success. It has the potential to be a great resource for learning about and advancing the conservation of the myriad species on earth.

Vitthal Kudal
Software Developer

vitthal.jpg
I am working as a Software Developer with the informatics group of the Encyclopedia of Life. I have a Master’s degree in Computer Science from the University of Pune, India. I have worked for the NCL Center for Biodiversity Informatics (NCBI) as a Project Assistant. My dream is to make all species data available over the internet at a single command from the user, a dream I hope to fulfill by working with the EOL Group. What has it been like working at EOL? Interesting, inspiring, insightful, impactful, fun and more such words.

Jeremy Rice
Software Developer

rice.jpg
Working with the Encyclopedia of Life is the realization of a long-time dream for me. I love being at a university, advancing research. The Encyclopedia’s vision to synthesize all information about life present on Earth… that makes it something really essential for me. I’m joining the team with over ten years of experience developing a variety of applications. What really drives me is turning abstract ideas into working products that people appreciate. There’s an abundance of ideas here, and I hope that we can produce some amazing tools to facilitate them…

Dimitry Mozzherin
Software Developer

dimitry.jpg
I was born in Russia, and from my early school years I wanted to become a biologist and a wildlife photographer. I happened to become both later. At some point I started to learn programming languages, and after discovering the Open Source movement I decided to make programming my profession. And I am now at EOL because here I can express my passion for wildlife and my passion for developing Free Software at the same time!

Anne Thessen
Post-doctoral Investigator

thessen.jpg
I’m working on data mobilization for EOL and the International Census of Marine Microbes. Lots of biological data can be found on the printed page, which must be read to retrieve information. I’m trying to find ways to make this information easier to retrieve and use. Prior to joining the EOL team, I worked on Arctic primary production, toxin-producing diatoms and shellfish grazing.

SOS - State of Observed Species

Rod Page
May 29th, 2008

Arizona State University’s “International Institute for Species Exploration” has released its first State of Observed Species Report. It reports that 16,969 new species were discovered in 2006 (approximately 46 species per day). Not surprisingly, most are insects:

sos.png

SOS have also published a list of the “top 10” species described in 2007.

2008_01th.jpg 2008_02th.jpg 2008_03th.jpg 2008_04th-1.jpg 2008_05th.jpg
2008_06th.jpg 2008_07th.jpg 2008_08th.jpg 2008_09th.jpg 2008_10th.jpg

This list has attracted some comment at The Other 95%, Zooillogix, and Catalogue of Organisms.

These lists have implications for EOL. The report gives us a lower bound on the rate of new species description: EOL will need to be able to add at least 46 species pages a day just to keep pace with new discoveries, never mind what has already been described. It isn’t doing anything like this at present, and hence none of the species in the SOS top ten list are in EOL (most are already in Wikipedia, and all return at least some information in iSpecies).
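As a quick check on that figure, the daily rate follows directly from the annual count quoted from the report. This is only a back-of-the-envelope sketch, assuming a 365-day year; the function and variable names are made up for illustration.

```python
# Annual count of newly described species quoted from the SOS report (2006).
NEW_SPECIES_2006 = 16_969
DAYS_PER_YEAR = 365  # assumption: a non-leap year


def species_per_day(annual_count: int, days: int = DAYS_PER_YEAR) -> float:
    """Average number of new species descriptions per day."""
    return annual_count / days


rate = species_per_day(NEW_SPECIES_2006)
print(f"{rate:.1f} new species per day")  # roughly 46.5
```

So "46 a day" is the annual total spread evenly over the year, rounded down.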

iNaturalist

Rod Page
May 17th, 2008

logo-1.gif
Ken-ichi Ueda told me about iNaturalist.org, a wonderful site he, Nathan Agrin, and Jessica Kline have created for their Masters at UC Berkeley’s School of Information. To quote from the web site:

iNaturalist.org is a place where you can record what you see in nature, meet other nature lovers, and learn about the natural world.

It looks gorgeous (lots of Flickr Creative Commons photos), and makes good use of Wikipedia and the TimeMap JavaScript library.
arachnida.png

Arguably the species pages are clearer than EOL’s (compare Anolis carolinensis on iNaturalist and EOL). But what makes it especially cool is the way it engages users with the ability to add observations of organisms, and request identifications. I like the emphasis on being

…a fun and efficient way to record, find, and share nature observations.

I think it’s a great project that could provide useful ideas for the design of EOL’s pages.

Citizen science podcast

Rod Page
May 15th, 2008

PodcastLogo.png
Jon Udell has a great podcast where he interviews Janis Dickinson, who directs the citizen science program at Cornell’s Laboratory of Ornithology. On his blog Jon writes:

Extracting signal from noise is, of course, one of the classic bread-and-butter activities of information science. What’s fascinating here is the Web 2.0 angle. Birdwatchers are famously passionate data collectors who develop reputations among their peers. When they contribute their data to eBird — and thence to the Avian Knowledge Network — those reputations can begin to be measured, and used to tune the analysis of a large body of contributed data.

These are, of course, issues directly relevant to EOL. Jon has long been interested in integrating information (including digital libraries), social networking, and how people interact with technology. His podcast is a mine of useful information. Click this link to subscribe to it in iTunes.

IAG review of BIG

Rod Page
May 1st, 2008

2415336890_84744a837e_t.jpg
On Monday and Tuesday, 14-15 April, the MBL at Woods Hole hosted the first review of EOL’s Biodiversity Informatics Group (BIG). This meeting was a chance for the Informatics Advisory Group (IAG; sorry, there are still more acronyms to come) to hear about progress to date, and where BIG wanted to go next. Chris Freeland (from BHL) has posted some photos of the meeting on Flickr, which give a sense of the number of people involved: members of BIG and IAG, together with representatives from BHL, BioSynC, the Steering Committee, and interested observers. What the photos can’t convey is the spirited nature of the discussion, which made the two days hugely enjoyable.

It was my task as chair of the IAG to try and condense the detailed reports given by BIG, and the subsequent discussions, into a written report. That has now been done, and the result presented to BIG. In this post I will summarise two key areas, namely content and vetting. The report also addressed topics such as the site design, globally unique identifiers, and organisational matters, but I think content and vetting are the two that generated the most debate.

Content

Now that EOL is live and people have had a chance to look around, it is striking that 76% of visitors don’t return, and 44% of all visitors left in under 10 seconds. After the initial launch where, if anything, EOL was too popular, interest seems to have dropped off markedly. One possible reason for this is the relative lack of content. As I noted elsewhere, for many pages EOL compares unfavourably with other sites, such as David Stang’s ZipcodeZoo, or my own mashup iSpecies. EOL’s current strategy has been to limit its content to “vetted” information from trusted providers. For 24 exemplar taxa EOL provides relatively detailed information, but for the rest of life the content it currently displays is pretty sparse.

The challenge is how to cover all life in reasonable detail. If we take the well-worn estimate of 1,800,000 described species, and EOL’s 10 year time frame, then BIG needs to add around 500 species pages per day! Doing this without massive automation simply won’t scale. Assembling the 24 exemplar pages required considerable effort, yet simple aggregators such as iSpecies can generate a roughly similar level of detail within seconds. The diagram below compares pages for Anolis carolinensis (an EOL exemplar taxon) in EOL (left, or go here) and iSpecies (right, or go here). The iSpecies account is assembled automatically on the fly from sources such as GenBank, GBIF, Google Scholar, Yahoo Images, and Wikipedia.
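The "around 500 pages per day" figure is simple division, and it may help to see it spelled out. This is a rough sketch that assumes a 365-day year and an even pace over the whole ten years; the names are illustrative only.

```python
DESCRIBED_SPECIES = 1_800_000  # the well-worn estimate of described species
TIME_FRAME_YEARS = 10          # EOL's stated time frame
DAYS_PER_YEAR = 365            # assumption: ignore leap years


def pages_per_day(total_species: int, years: int,
                  days_per_year: int = DAYS_PER_YEAR) -> float:
    """Pages that must be added each day to cover every species in time."""
    return total_species / (years * days_per_year)


required = pages_per_day(DESCRIBED_SPECIES, TIME_FRAME_YEARS)
print(round(required))  # 493, i.e. "around 500" pages per day
```

Any time already elapsed, and any newly described species, only push the required rate higher.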

ispecies.png

EOL is a long term project, and hence it may seem unfair to judge it so soon after it has been launched (and after Herculean efforts by BIG staff). However, given EOL’s current lack of content, and the existence of other web sites (such as ZipcodeZoo, DiscoverLife, and iSpecies) that already serve a much greater amount of information, my concern is that EOL risks being marginalised. I don’t think that EOL has anything like 10 years in which to prove itself.

How to add content quickly

For the report I prepared a cartoon plotting the cost of obtaining content against the amount of content obtained. “Costs” are in terms of developer time to import data (are they in a standard form, or a format unique to the provider), and time spent negotiating intellectual property agreements (such as how to display credit and attribution information, how the data will look, etc). At the bottom right (1) are large, freely available data sources such as GenBank, GBIF, and Wikipedia. At the left (2) are small sources that require tools to make their content available. In the middle (3) are well-established data providers that can require considerable effort to incorporate into EOL, due to both IPR issues and idiosyncratic data structures. The dotted line is an arbitrary cutoff, above which the effort required to obtain content outweighs the value that content would bring to EOL.
content.png

The report recommends going after content in category 1 first. These sources have massive amounts of data that are freely available and relatively easy to import. They include PubMed, GenBank, Wikipedia, ITIS, Flickr, and GBIF. As I noted on the iSpecies blog, GenBank records often contain metadata about organismal distribution, habitats, and ecological associations, which could be harvested. There are communities on Flickr building photo libraries of organisms, often tagged with scientific name and geographic location (e.g., Field Guide: Birds of the World). Harvesting these sources will provide considerable initial content for EOL. Of course, not all sources in this category are of comparable quality. GenBank and PubMed are publicly funded, curated archives of scientific research; Wikipedia and Flickr are not.

Category 2 is next, and this is where we need tools to enable smaller providers to manage their own content, and contribute to EOL at the same time. This content would be targeted by “LifeDesks” (similar to the scratchpads being developed at the Natural History Museum, London).

Content in category 3 may have high scientific value, but in the short term the effort involved in incorporating it may outweigh the value it brings.

It’s perhaps a glib phrase, but I’m reminded of the genius of “and” versus the tyranny of “or”. Harvesting resources in category 1 is not an argument against also going after resources in category 2; it’s a question of priorities. In the same way, tools developed for category 2 providers may well facilitate acquiring content from category 3 sources.

Vetting

The issue of “vetting” generated much discussion during the review meeting. It became clear that this term can mean different things:

  1. data that is error free (“correct”)
  2. data provided by scientific sources (“scientifically authenticated”)
  3. data that has been verified by experts

No data source is without error, so EOL will inevitably include erroneous information. Currently the bulk of its data comprise distribution maps from GBIF, which are known to contain errors. For example, some 16% of legume records are incorrect (doi:10.1371/journal.pone.0001124). The GBIF map below shows numerous erroneous records of the North American channel catfish (Ictalurus punctatus) in China.

At the scale at which EOL operates (hundreds of millions of items of information), manually vetting all information before it is displayed is not feasible, and indeed by displaying GBIF maps EOL tacitly acknowledges this.

Of course, EOL wants to be an authoritative resource (in other words, more than a simple mashup), so one of its biggest challenges is to develop methods to catch errors. Innovative methods of annotation will need to be developed. Human Computation (see also Luis von Ahn’s talk at Google) is one approach, recently used by Google’s Image Labeler to annotate web images. BIG will need to develop easy-to-use interfaces so that EOL users can annotate data and flag possible errors. These annotations should be publicly visible, so that users who take the trouble to make annotations get instant feedback, and other users can see which records are contested.
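One way to picture the annotation idea is as a small data structure: each record carries a publicly visible list of user flags, and counts as contested while any flag remains open. This is purely an illustrative sketch. The class names, fields, and example record are hypothetical and do not describe anything EOL or GBIF has built.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """A publicly visible note attached to a record by a user (hypothetical)."""
    user: str
    comment: str
    resolved: bool = False


@dataclass
class Record:
    """A single item of information, e.g. one occurrence point on a map."""
    record_id: str
    taxon: str
    locality: str
    annotations: List[Annotation] = field(default_factory=list)

    def flag(self, user: str, comment: str) -> Annotation:
        """Attach a new annotation; it is visible as soon as it is added."""
        note = Annotation(user=user, comment=comment)
        self.annotations.append(note)
        return note

    @property
    def contested(self) -> bool:
        """A record is contested while it has at least one unresolved flag."""
        return any(not note.resolved for note in self.annotations)


# Made-up example echoing the catfish case described above.
record = Record("example-1", "Ictalurus punctatus", "China")
note = record.flag("some_user", "Species is North American; locality looks wrong.")
print(record.contested)  # True
```

Keeping the flags on the record itself, rather than in a private review queue, is what would give annotators instant feedback and let other users see which records are disputed.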

Summary

A project on the scale of EOL is bound to take some time to settle in, and initial expectations were never going to be met, hence the generally underwhelmed reaction in the blogosphere (myself included). There is much to do, and the overall theme of the IAG report is that EOL needs more content, fast, and needs to tackle the issue of vetting in a way that will scale.

BioSynC Springing up at the Field Museum

Alta Buden
April 30th, 2008

March came in like a fifteen-degree Lion of Tsavo here at the Field Museum, but Mark Westneat wasn’t affected because he was at the Assembling the Tree of Life (AToL) Annual PI Meeting from March 7-9, which happened to be in warm New Orleans, Louisiana. Besides carousing in the music-filled New Orleans air, Mark and Paddy gave presentations to the meeting on EOL and BioSynC, accompanied by much (two hours!) productive discussion. Mark returned to Chicago to find that construction of the new Synthesis Center had been completed, and on March 12 the staff moved in to their comfortable, well-lit, energy-efficient new offices. Have I mentioned that we are now the greenest part of the Field Museum? Almost everything in the new space is recycled or energy efficient, down to our potato-starch eating utensils.

The new Synthesis Center has quickly become a place where people from all over the museum feel comfortable stopping by to say hello and talk about science. Audrey has found that if she leaves baked goods out on the counter, which she often does, they will be gone by the end of the day and our lounge is almost always filled during lunch time. We finished construction in time to debut it at Museum Members night (March 27-28th) which had a record high attendance of 10,687 guests in two nights. We had mild but constant traffic and screened the EOL demo video in our large conference room on a loop the entire time. We also encouraged people to explore the web site live on our new computer consoles in the lounge.

Moving on to April, from the 18th to the 20th, we hosted our first in-center Synthesis Meeting:

“MegaTree: Mega-Phylogeny Assembly by Literature-Mining and Grafting”

This method-focused meeting, led by Rick Ree (featured in the second picture), was designed to refine approaches for assembling large phylogenies (evolutionary trees) and to train students in their use. The goal was to assemble a “knowledge-based” phylogeny for vascular plants that synthesizes information from various sources, using a method that allows grafting of information from different sources while simultaneously keeping track of its origin. Basically this crew has found a way to take the large number of smaller evolutionary trees that have been created for plants by different people using different computer programs and graft them all together so that they still make sense. This way they can begin to piece together an extremely detailed and large evolutionary tree (a megatree!) for plants.

Here is the crew all together:

picture-019.jpg

That’s Darolyn peeking out behind them.

Here they are in action:
picture-008.jpg picture-015.jpg picture-023.jpg

(This meeting was not, as I had originally hoped, about a group of electric cars that through transformer-like powers could form a tree that would use solar power to charge their batteries, thus battling evil by eliminating parking lots and oil-dependence. Sigh, you can’t have it all.)

Nomina 2 workshop succeeded

Alexey Shipunov
April 24th, 2008

The EOL Bioinformatics Team just finished the three-day ‘Nomina 2’ workshop. More than 15 developers from all around the world assembled and, finally, created the Global Names Index, a tool and database which holds most names of living things, authorized by name specialists (nomenclators).
Cathy Norton (BHL Project) kindly provided us with some photos:
img_0849.jpg img_0848.jpg img_0847.jpg img_0846.jpg img_0845.jpg img_0844.jpg

img_0843.jpg img_0842.jpg

and video:

Biodiversity of the Week! Giant Magic Camouflage lovers!

Alta Buden
April 14th, 2008

picture-18.png

Not an octopus, not a squid…what is a cuttlefish?

For one, it’s a cephalopod, which means “head foot” in Greek (☺). In fact, all of the above are also cephalopods and are basically just giant mushy heads attached to various numbers of long suction-cup covered feet. Delicious.

These creatures, which also include the much stranger looking nautilus, are known for being the camouflage wizards of the world. If Gandalf were a sea creature, this would be it. Also, if you thought chameleons were cool, it’s time to get with it and realize who’s really got it going on. They are able to look like a nasty lumpy clump of brown seaweed one second and then be covered in moving iridescent rainbows the next. The species page has some excellent video of this happening. They also can squirt some seriously black ink at you if you scare them, and have an attitude if that doesn’t work. Here is a video of a baby one hunting like a fiend:


The EOL species page for this particular cuttlefish was curated by scientist Roger Hanlon, who has been studying cuttlefish for almost three decades. His work was featured in a New York Times article in February. The article and accompanying video describe Hanlon’s theory on how their skin and brains work to help them change shape and color in seconds. Here is another short video of him explaining a bit about his work and about the magical camo abilities of cuttlefish in under three minutes:

The Australian Giant Cuttlefish, Sepia apama, is the largest cuttlefish on earth. That means that this gem is, in fact, bigger than a bread box, but not by much, reaching about 20 inches long. They are world famous for, of all things, their mating habits. Every year during the austral fall (May - July) in northern Spencer Gulf, northwest of Adelaide (they are found all along the southern coast of Australia normally), they gather to spawn by the hundreds. This is the only place on earth where cuttlefish have been observed to gather en masse to mate, and it has received great attention, starting out as an easy catch for fishermen and then morphing into a mecca for diving ecotourists. As a result the area is now mostly protected from overfishing and the cuttlefish are free to love and make wild color-changing babies. (My advice, GO THERE! What better excuse to go to Australia! Here are some stories from tourists in the area.)

The species page features several awesome videos of them mating, fighting, changing color and also laying their eggs, which look like gooey lychee nuts that they stick to the undersides of rocks.

A second reason to go check them out is that during this mating event they exhibit a fascinating example of what some eco-tourists like to call “crossdressing”, something that was noticed by Hanlon during his observations of them there. This is the phenomenon of small males using their camouflage to disguise themselves as females in order to sneak under the eyes of protective and violent larger males and sneakily mate with the females they are guarding. I like to call this “The Woody Allen” maneuver (brain outwits brawn and wins the girl) and it’s pretty genius. It is also great fodder for pondering the different ways that organisms connive to pass on their genes in the struggle for survival.

Cuttlefish may sound familiar for a couple of reasons. Their bones, cuttlebones (which are inside their heads and are one of the things that distinguish them from squid and octopi), are sold in pet stores to provide calcium for domestic birds and to sharpen their beaks. The word “sepia” comes from the Greek word for cuttlefish; artists used to use cuttlefish and squid ink to paint with and tone things, hence the setting on your digital camera that makes pictures look all old. People also love to eat cuttlefish. Two fun examples are Risotto al Nero di Seppie, a tar-looking Italian risotto made with cuttlefish ink, and dried cuttlefish, which is a popular snack in Asia.

180px-surume39.JPG

Streetside snack of dried cuttlefish in Asia

Finally, in the popular novel Twenty Thousand Leagues Under the Sea by Jules Verne, Captain Nemo and his companions battle with a group of giant cuttlefish. While the cuttlefish lose, they do manage to kill one of the crew members. Jules Verne’s description of the cuttlefish in the book is not completely scientifically accurate, but here is a video of our friend Sepia apama attacking a scuba diver (not very scary, but it will give you an idea of Verne’s imaginative capabilities):