Archive for the ‘Biodiversity Informatics’ Category

Whoa! How come EOL has more pages than species?

David Patterson
Thursday, January 21st, 2010

Last year, the Australian Biological Resources Study published a study by Arthur Chapman on the ‘Number of living species in Australia and the world’ (http://www.environment.gov.au/biodiversity/abrs/publications/other/species-numbers/index.html). The revised estimate is 1,900,000. That we are still dealing with estimates reveals that the community of taxonomists has yet to compile a single list of all species. On January 7th this year, the EOL logs revealed that the number of pages that we deliver passed the 1,900,000 mark. Is this an ‘Oops’ moment? Is the estimate of the number of species wrong, or is EOL getting its numbers wrong?

Olenellus getzi, a Cambrian trilobite. Image by Bruce Liebermann, CC-BY license, from trilobites.lifedesks.org

There are a number of reasons why we have more than 1,900,000 pages. For a start, EOL prepares pages not only for living species, but also for extinct species. For example, there is a page for the thylacine, the wolf-like marsupial carnivore that went extinct in the 20th century (http://www.eol.org/pages/126716). EOL has recently added information from the ‘Trilobites Online Database’ (http://trilobites.lifedesks.org). All trilobites are extinct, but we still gather information about those organisms. Pages are therefore waiting, and become visible in a search or can be discovered when browsing if you use a classification that refers to trilobites – as a search on Olenellus will reveal. Estimates vary as to how many extinct species there are, but probably about 250,000 extinct species have been reported to date. Similarly, EOL has pages not only about species, but also about genera, families, orders and so on – which means that when complete, EOL should have about 2,500,000 pages.

That said, we will not simply converge on the two and a half million target soon, but will first grow to vastly exceed this number. The number of pages will slowly come down again to reach the target. The reason for this lies in how EOL collects information and displays it. EOL uses the names of species to gather information together. As the Global Names Index (at www.globalnames.org) will attest, there are many more names than there are species (GNI knows of almost 20,000,000 names). One cause for the excess of names is a requirement to change a name when the classification of a species changes. Linnaeus created the foundations of contemporary biological classification in the 18th century. At that time, he and his ‘apostles’ only distinguished about 10,000 species, and placed them in about 1300 genera. The expansion to the current number of 1,900,000 species results mostly from the discovery of new species. Those species are now placed in several hundred thousand genera. The intervening 250 years have seen a massive expansion of our awareness of biological diversity. In this process, scientists have tried to refine and debate what species are as they learn more about their nature and evolution. They move species from one genus to another so that closely related species are grouped together. As the names of species contain one word for the species and another for the genus in which the species is placed, these moves create new names for the species. The yellow fever mosquito was described by Linnaeus as Culex aegypti, then became known as Aedes aegypti, and more recently was transferred to Stegomyia, to give it yet another name, Stegomyia aegypti. Although the species is unchanged, we have several names for it. Until we find out that the names refer to one species, EOL may have information for it from sources that still use an old name. Only when we are advised that the two names refer to the same species do we know to bring content under both names together on the same page. Until then, we will have more pages than there are species.
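The page-merging behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not EOL's actual code: `PageStore` and its methods are hypothetical names, the content strings are placeholders, and the store lives in memory. Each name starts with its own page, and content is pooled only once two names are declared to refer to the same species.

```python
class PageStore:
    """Toy model (hypothetical): one page per name until names are declared synonyms."""

    def __init__(self):
        self._parent = {}    # name -> representative name (union-find forest)
        self._content = {}   # representative name -> list of content items

    def _find(self, name):
        # Register unseen names as their own (empty) page.
        if name not in self._parent:
            self._parent[name] = name
            self._content[name] = []
        # Walk up to the representative, shortening the path as we go.
        while self._parent[name] != name:
            self._parent[name] = self._parent[self._parent[name]]
            name = self._parent[name]
        return name

    def add_content(self, name, item):
        """Attach a content item to whichever page currently holds this name."""
        self._content[self._find(name)].append(item)

    def declare_synonyms(self, name_a, name_b):
        """Record that two names refer to the same species and merge their pages."""
        root_a, root_b = self._find(name_a), self._find(name_b)
        if root_a != root_b:
            self._parent[root_b] = root_a
            self._content[root_a].extend(self._content.pop(root_b))

    def page_count(self):
        return len(self._content)

    def page_for(self, name):
        return list(self._content[self._find(name)])


store = PageStore()
store.add_content("Culex aegypti", "placeholder: original description")
store.add_content("Aedes aegypti", "placeholder: text from a source using the older name")
store.add_content("Stegomyia aegypti", "placeholder: illustration")
pages_before = store.page_count()   # 3: one page per name

store.declare_synonyms("Culex aegypti", "Aedes aegypti")
store.declare_synonyms("Aedes aegypti", "Stegomyia aegypti")
pages_after = store.page_count()    # 1: all content now sits on a single page
```

A union-find structure is used here simply because synonymy is transitive; any mapping from names to a shared page identifier would illustrate the same point.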

aegypti2.jpg aegypti_description4001.jpg
Linnaeus’ original description (above) is: Culex aegypti with white articulations. The size of the common gnat. Color grey from dusky (tawny shading into grey). Legs grey with white rings, small ones about (around) the articulations and in the joints. White spots on the edge of the back on the body, beneath the wings on each side, several of them, placed longitudinally. One white ring at the base of the thorax between it and the body. A white perpendicular line near the eyes, on each side a single small one. Place: Egypt, rarer than the common gnat. The image of Stegomyia aegypti is by Goeldi and is out of copyright; the image of Linnaeus’ description is also out of copyright.

Another reason for the additional pages is that species are not well-defined objects like ‘a car’ or ‘a computer’. Rather they are like the smoke from a snuffed candle, conversations, or clouds. They differ every time you encounter them. They are living and transforming things, changing as the evolutionary process that produces them wends its way through time and across the globe. Genetic experiments are always being made. The pressures on, and opportunities available to, species change such that what a species looks like varies, as do the numbers of individuals we would assign to that species. Scientists define – as best they can – each evolving lineage and refer to the products of the evolutionary process as ‘species’. Because of the indefinite nature of species, different experts come up with different points of view. Some think we should treat Gorilla graueri (the lowland gorilla) as the same species as the eastern gorilla (Gorilla beringei). Others think they are different, and yet others want to represent them as different subspecies of the same species. To be able to accommodate all of these points of view, we need to have at least four pages, one for each possible species or subspecies – even though we might have, in the end, a single species.

gorilla2.jpg
Gorilla. Image by Brian Gatwicke, CC-BY license.

This is not an isolated case, and is more the norm than an exception. Should we continue to treat the polar bear (Ursus maritimus) as different from Ursus arctos (the brown or grizzly bear), with which it can interbreed? And are zebras and horses members of the same species or not? Most biologists would theorize they should be treated as the same species because they can form hybrids. In contrast, most non-scientists will continue to think of the polar bear as a separate species because it looks different and has an unusual life-style. The different definitions of species are ‘concepts’. In order to show all information about all life, EOL prepares pages for each of these concepts, and again the number of pages grows to exceed the number of species.

bears2.jpg
To the left is a museum specimen of a polar bear / grizzly hybrid. Examples of hybrids are known from museum specimens, zoos, and in the wild. Image by Sarah Hartwell, CC BY license; from WikiMedia

By far the biggest source of the extra pages comes from the different ways that people refer to species. We use names to distinguish each type of organism. Some names are scientific, are in Latin, and follow conventions that can be found in the codes of nomenclature. Others are common names. Even though scientific names are regulated by the codes of nomenclature, they can appear in many different forms. ‘Grevillea glauca’, ‘G. glauca’, ‘Grevillea glauca Banks & Sol. ex Knight’ and ‘Grevillea glauca Banks and Solander 1809’ are all legitimate ways of writing out the scientific name for the Australian shrub that is used for boomerangs or to assist in hanging out the washing. Although biologists know that these names refer to the same species, a computer registers that the names are different and makes the assumption that they refer to different species. Until the computers are told differently, we will have pages for the information that is attached to each of the names. We expect there to be at least 100,000,000 different names and forms of names that have been used for the 2,000,000 species. As this is the biggest source of the extra pages, EOL works hard to build new software and asks for help from the expert community to ‘reconcile’ these alternative names. A measure of our progress is how well we bring the number down to match the 2 million or so species that we believe exist today.
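To make the ‘many forms of one name’ problem concrete, here is a minimal Python sketch of how a program might group the four Grevillea glauca strings. It is a deliberately naive illustration, not EOL's reconciliation software: the function names are hypothetical, it assumes the first two words of a string are the genus and species epithet, and it expands an abbreviated genus such as ‘G.’ only when exactly one full name in the same batch matches.

```python
from collections import defaultdict


def canonical_form(name_string):
    """Return 'Genus epithet', dropping authorship and year.

    Naive assumption: the first two whitespace-separated tokens are the
    genus and the species epithet. Subgenera, hybrids and infraspecific
    ranks would need a proper name parser.
    """
    tokens = name_string.split()
    if len(tokens) < 2:
        return name_string.strip()
    return f"{tokens[0]} {tokens[1]}"


def group_name_strings(name_strings):
    """Group name strings that appear to share one canonical form."""
    groups = defaultdict(list)
    abbreviated = []
    for raw in name_strings:
        canon = canonical_form(raw)
        parts = canon.split()
        if len(parts) == 2 and len(parts[0]) == 2 and parts[0].endswith("."):
            abbreviated.append((raw, canon))   # e.g. 'G. glauca'
        else:
            groups[canon].append(raw)

    full_forms = list(groups)                  # snapshot before adding more keys
    for raw, canon in abbreviated:
        initial, epithet = canon[0], canon.split()[1]
        matches = [c for c in full_forms
                   if c.startswith(initial) and c.split()[1:] == [epithet]]
        if len(matches) == 1:
            groups[matches[0]].append(raw)     # unambiguous expansion
        else:
            groups[canon].append(raw)          # ambiguous: keep it separate
    return dict(groups)


names = [
    "Grevillea glauca",
    "G. glauca",
    "Grevillea glauca Banks & Sol. ex Knight",
    "Grevillea glauca Banks and Solander 1809",
]
groups = group_name_strings(names)   # one group keyed by 'Grevillea glauca'
```

Even this toy shows why expert help is still needed: the abbreviation can only be resolved from context, and strings that differ by more than authorship (misspellings, older genus placements) are not caught at all.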

grevillea2.jpg
Grevillea glauca, from Joseph Banks’ ‘journal’ of his 1768-1771 trip on HMS Endeavour that, under the command of James Cook, visited Australia. The fruits give the plant its common name “Bushmans clothes-peg”. Image out of copyright.

Recently EOL added information from GenBank – a place where people deposit molecular information. It is now possible to use molecular tools to explore nature. These new approaches are revealing species that seem to be new to science but cannot be identified. In the absence of formal names, those ‘species’ are listed under terms like “Uncultured bacterium HZ_056” or “bioreactor sludge metagenome”, or even as organisms that are just referred to as “unidentified”. We refer to those terms as ‘surrogates for names’ in the expectation that they will be given formal scientific names in due course. As molecular techniques become cheaper and more powerful, we expect to be deluged with information labeled with surrogate names. At this time, perhaps as much as 20% of the diversity known to EOL is in this form, and these also add significantly to the tally of pages in the site.

eol_page1.jpg
A (modified) species page from EOL that contains information about an organism that has not been given a name (an “unidentified bacterium”, itself included within a ‘class’ called “environmental samples”). Original.

Fortunately, many of these problems are hidden behind the scenes. When someone arrives at EOL, they can navigate among species using a classification scheme. They will see only the pages that match up with names in the classification. Obsolete names, or names of species not recognized by the classification, are hidden from view. But if you want to see everything that is in there, the hidden pages can be found through searches after setting the preferences to eliminate filtering.

The Encyclopedia of Life has picked up the challenge of bringing together and organizing information about all life. We do this in order to realize the vision of Ed Wilson (and others) to have a page for every species on Earth. We have invented new ways of managing information about organisms – that is, we are ‘biodiversity informaticians’. We work with other biodiversity informaticians around the world. We call on the tools emerging from computer sciences and invent new tools to organize information so that it can be used better to inform and guide the decisions that we need to make about the future of our world. Most of the current tools are first generation, and they will improve as the discipline of biodiversity informatics grows and matures, and as the need for information becomes more urgent.

eow.jpg
Professor E. O. Wilson, Harvard University. Image by Kevin Kelly, available under creative commons license (http://kk.org/ct2/new-media/).

We at EOL rely on innovative computer programs as one of four ways to improve the management of information about biology. The programs and algorithms give us ‘scalability’. That is, they provide the means of working through billions of pieces of information located at thousands of sites and assigning them to the correct species. Our second area addresses the ‘many names for one species’ problem. We are hard at work developing new ways of grouping together alternative names for the same species. The goal of assigning the expected 100,000,000 names to 2 million groups – one for each species – increasingly needs to be pursued through an open on-line environment that will allow initiatives and experts world-wide to co-operate in improving names management. As an open environment, all of us, from the search engines to school teachers, will benefit as names management will ensure that we find more and miss less information about each species. Thirdly, we will provide web tools with which experts can improve the quality of classifications that are used to organize information and to navigate around sites like EOL. Experts can work together to make sure that nothing is missing, that classifications reflect current thinking, that obsolete names and ideas are properly labeled and hidden from view, and that the groups that include alternative names for the same species are correct. Finally, EOL is building a community of curators who can correct any errors that persist. Anyone with a sharp eye and a commitment to quality is welcome. Balanced progress on these four fronts positions EOL to produce a robust, high-quality web environment about all life on Earth within the 10-year schedule that we set ourselves.

D Patterson

Senior Taxonomist

Jan 21 2010

Increasing undergraduate student authorship in EOL

Audrey Aronowsky
Friday, November 6th, 2009

Group photo

October 22-24, the Biodiversity Synthesis Center hosted an EOL workshop in conjunction with EOL partner Animal Diversity Web. The workshop goals were to broaden the reach and increase the effectiveness of university students authoring species accounts. A diverse group of 22 participants from four countries assembled to present different programs that involve student authorship and gathering of species data. The workshop was an effective mix of presentations, group discussions, and breakout groups. In the presentations, we heard about the successful methods of the ADW program and template, diverse ways that faculty implement the ADW template into their courses, methods and strategy of the complementary AmphibiaWeb authoring process, and other projects around the world that involve students of all ages in the collection of species data (including iSpot, High Tech High’s bay surveys, INBio, marine surveys in the Cape Verde islands, and others). The workshop will yield many important products, including instructor workflow models for integrating student authorship of species accounts into different types of courses, a best practices document for instructors and students, a 2010 RCN proposal to NSF, new content and curators for EOL, and increased use of the ADW species account template in the US and abroad.

The workshop was a first for EOL, co-sponsored by the Synthesis, Learning and Education, and Species Pages Groups, and can only be viewed as a great success.  Thanks to all who participated in 3 days of lively and highly productive discussion. Special thanks to meeting organizers Tricia Jones and Tanya Dewey!

e-Biosphere: a closing report

Audrey Aronowsky
Wednesday, June 3rd, 2009

The evening reception at the Natural History Museum

As e-Biosphere winds down, I wanted to report back on what happened, what I thought was interesting, and what my take-aways were.  The meeting had 483 participants from all over the world (42 countries!).

The set-up and program were excellent, with a good diversity of speakers and perspectives. Among my favorites were Jorge Soberon’s presentation on the need for integration across subdisciplines and fields and at varying scales. Nancy Knowlton’s presentation questioning whether we need to name biodiversity in order to study it and learn from it was probably the most controversial and thought-provoking. And Sandra Knapp’s presentation documenting the motivations and paths that the Solanum project used was informative and a great case study for how to effectively stimulate participation from a broad and diverse group of specialists.

The afternoon breakout session topics were broad and yet, interestingly, produced many common themes.  Group topics included ecology, cybertaxonomy, standards, developing countries, education, and conservation.  Common themes included a need for better standards, better metadata, simplification of contributions and involvement, and better templates. Also discussed were the importance of team building and collaboration, the key role of ecosystem services, and maintaining open access to information.

The main things that I will take away from the meeting are that there have been great advances in biodiversity informatics in the last 20 years, but there is still a long way to go. The field needs to articulate its goals better and communicate these goals with the public and policy makers in a more effective manner. Data sharing and collaboration are critically important to everyone, and major projects like EOL and GBIF need to take the lead in urging standards, creating templates and tools, and outreach.  We have a long way to go, but a great role to play.

Special thanks to the staff at the QE2 conference centre.  The facility is beautiful; bright and airy with stunning views of Westminster Abbey. Also special thanks to the organizing committee for bringing together a great group of speakers and participants.  I hope this becomes an annual thing!  A final thanks to NHM for hosting a lovely evening event (pictured above). Without them, I would not have been able to take the great photo of Jim Hanken and Sir Richard Owen below…

Jim Hanken and Richard Owen

Share your videos with EOL

Peter Mangiafico
Monday, February 16th, 2009

We are excited to announce that we’re now indexing videos as well as images from Flickr. Videos uploaded to the EOL group in Flickr and tagged with a species name will now be featured in EOL species pages. Visit the Honeybee (Apis mellifera) page to see some recent video additions by Arthur Chapman and Valter Jacinto. Since the group began less than 6 months ago, contributors have submitted over 13,000 photos and now over 200 videos, which are shown in EOL species pages. Follow the instructions on our group homepage and learn how to submit and tag your photos and videos. We encourage everyone to check out the EOL Flickr group and start submitting photos and videos today!

The EOL.org codebase is now open source

Peter Mangiafico
Thursday, February 12th, 2009

After a few weeks of preparation, we are now ready to release the code that runs the www.eol.org website into the open source community.  The code is written in Ruby on Rails and is released under the MIT License.  There are many moving parts to the system, including code needed for data indexing, harvesting, names finding algorithms, data model creation, as well as all the code used to generate the actual web pages.   This release includes all of the Ruby code used to generate the webpages for the www.eol.org website, including the full data model and includes all the code to set up your own miniature version of eol.org running on your Mac (or PC…).  All that is missing is all of the wonderful data, but the code comes with enough sample data, called “fixtures”, to get you going!  Some other parts of the system written in PHP (such as indexing and names finding algorithms) will be released separately.

To get started, check out the project homepages on Google Code or GitHub.

The Encyclopedia of Life, updated…

Peter Mangiafico
Wednesday, January 28th, 2009

Hi Folks,

As some of you may or may not have noticed, without any fanfare, we released a substantial update to the Encyclopedia of Life website on January 5, 2009.  We are calling it version 2, and in fact, if you look at the footer of the www.eol.org pages, you will now see our internal version number right there (today at 2.0.11).

So what’s new, you ask?  Well, a whole bunch actually:

  1. You can now tag images and then search for species associated with those images (e.g. “blue”, “marine”).  You can tag any image you want and once several people tag an image in the same way, it is promoted to a “public” tag viewable by all.
  2. You can comment on species, images or blocks of text.  Content providers will then be able to see your comments to help improve their material.
  3. You can add your own images to the EOL Flickr Group that, when tagged correctly, will automatically appear in EOL species pages.
  4. We now distinguish between “trusted” and “unknown” content with clear messages and colors and will soon invite curators to help indicate the trust level of content.
  5. And of course, many new species pages and more content, adding to the overall richness of the experience.

For a more complete listing of what’s been added, along with some screen-shots, head on over to the “What’s New?” page.  We’ve already begun working on the next set of features to add, with some great additions like public APIs high on the list.  The real fun will begin once the APIs are released and we begin seeing mashups between EOL and other projects - we can’t wait.

Thanks and enjoy,

Peter

Flickr, meet EOL

Cyndy Parr
Sunday, September 14th, 2008

We’ve opened up another way for everyone to help build the Encyclopedia of Life.

You may have noticed we still need lots of pictures. You can now share your best photos and videos of organisms with us by adding them to a Flickr group. For more details, see the description and instructions on the Encyclopedia of Life Images group page. Even if you don’t have your own images to share, you can help add “machine tags” with the species identifications to those that don’t yet have them; these will help us display them on the right pages.
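For readers curious how a program might read such tags, here is a small Python sketch. Flickr machine tags take the general form namespace:predicate=value; the specific tag used below, `taxonomy:binomial`, is an assumption for illustration, and the group page remains the authority on which tags EOL actually expects. The function names are hypothetical.

```python
import re

# General shape of a machine tag: namespace:predicate=value
_MACHINE_TAG = re.compile(
    r'^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.*)$'
)


def parse_machine_tag(tag):
    """Split 'namespace:predicate=value' into a 3-tuple, or return None."""
    match = _MACHINE_TAG.match(tag.strip())
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    # Values are sometimes quoted when they contain spaces.
    return namespace.lower(), predicate.lower(), value.strip().strip('"')


def species_names(tags):
    """Collect the values of taxonomy:binomial tags (assumed tag name)."""
    names = []
    for tag in tags:
        parsed = parse_machine_tag(tag)
        if parsed and parsed[:2] == ("taxonomy", "binomial"):
            names.append(parsed[2])
    return names


example_tags = ["honeybee", 'taxonomy:binomial="Apis mellifera"', "geo:lat=41.8"]
found = species_names(example_tags)   # ['Apis mellifera']
```

Ordinary tags such as "honeybee" are ignored because they carry no namespace, which is exactly why a machine tag is more reliable for placing an image on the right page.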

This won’t be the only way to contribute, but many of us already love Flickr, and we hope others will want to give it a try.   The images should start showing up on the site later this year. There already are more than a thousand from the first few enthusiastic group members, including these striking examples (all are CC-licensed; photo credits to Jeff Whitlock, Sarsifer, Valter Jacinto).

Eastern Screech Owl (Otus asio)
Sabella spallanzanii
Calêndula // Pot Marigold (Calendula officinalis)

Drupal Taxonomy Code Sprint Redux

David Shorthouse
Thursday, September 11th, 2008

The EOL-sponsored Drupal Sprint is in its final hour and has been an immense success. The crux of this 4-day event was to attach metadata to terms and to enable relationships among terms. This sort of background work means that, in the context of biology, we have a mechanism to store information about whether or not a term (i.e. a scientific name) is a synonym of another, or whether a species has a relationship with another species (e.g. is a parasite of, is a predator of, etc.). There were some contributed modules that started this whole process. These can be found on the Taxonomy Drupal Groups website for the sprint.
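The idea of terms that carry metadata and typed relationships can be sketched independently of Drupal. The Python below is a minimal illustration only: it is not the Drupal taxonomy API or the code written at the sprint, and the class names, relation labels and metadata keys are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A vocabulary term (e.g. a scientific name) with free-form metadata."""
    name: str
    metadata: dict = field(default_factory=dict)


class Vocabulary:
    """Holds terms plus typed, directed relationships between them."""

    def __init__(self):
        self.terms = {}          # name -> Term
        self.relations = set()   # (subject, relation_type, object) triples

    def add_term(self, name, **metadata):
        term = self.terms.setdefault(name, Term(name))
        term.metadata.update(metadata)
        return term

    def relate(self, subject, relation_type, obj):
        # Make sure both ends exist as terms before linking them.
        self.add_term(subject)
        self.add_term(obj)
        self.relations.add((subject, relation_type, obj))

    def related(self, subject, relation_type):
        """Names that `subject` points to via `relation_type`, sorted."""
        return sorted(o for s, r, o in self.relations
                      if s == subject and r == relation_type)


vocab = Vocabulary()
vocab.add_term("Stegomyia aegypti", rank="species")
vocab.relate("Aedes aegypti", "is_synonym_of", "Stegomyia aegypti")
vocab.relate("Culex aegypti", "is_synonym_of", "Stegomyia aegypti")
vocab.relate("Varroa destructor", "is_parasite_of", "Apis mellifera")

synonym_target = vocab.related("Aedes aegypti", "is_synonym_of")
hosts = vocab.related("Varroa destructor", "is_parasite_of")
```

Storing relationships as subject, type, object triples keeps synonymy and ecological links in one structure, so new relationship types can be added without changing the term records themselves.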

drupalers-007.jpg

Left to Right:

Simon Rycroft, Nathaniel Catchpole, Anthony Goddard, Lisa Walley, Roger Espinosa, Matthias Hutterer, Cyndy Parr, Dan Morrison, Chacha Sikes, David Shorthouse, Benjamin Doherty, Vitthal Kudal, Alexey Shipunov, Ben Melancon (missing: Vince Smith)

A big thanks goes out to the BioSync team at the Field Museum in Chicago who orchestrated much of the organization. Without their assistance and facility, the sprint would most certainly have been difficult to pull off.

Code Sprint!

David Shorthouse
Thursday, July 17th, 2008

UPDATE! (July 31, 2008): The code sprint will now take place September 8-11, 2008

The Encyclopedia of Life has been keenly interested in content management systems and social networking phenomena, especially in how these might benefit practicing taxonomists who are under pressure to get online. So, we have been getting serious about Drupal and want to take a stab at hosting sites called “LifeDesks” that as a start will focus on particular groups of organisms and will be similar to Scratchpads, a most amazing collection of Drupal-based sites hosted at the Natural History Museum in London, England. LifeDesks would sit just to the side of EOL, but will have the advantage of providing some extra, distinct visibility for participants while still feeling part of the EOL dream. A bit of work needs to be done on Drupal to make this work and we’re interested in sharing developments with the wider Drupal community.

So, although it’s short notice, we’re going to host a Drupal code sprint August 11-14 in Chicago, Illinois to kick-off our relationship with Drupal. Please visit http://sprint.eol.org to see what we have in store and please also spread the word to any Drupal developers you know.

New Members of the Informatics Team

David Shorthouse
Friday, May 30th, 2008

Jonathan Clapp
Software Developer

clapp.jpg
I grew up on Cape Cod and join the EOL Informatics Group as a software developer. My experience is in database and web application development.
I will help ensure that the foundations of the Encyclopedia of Life are as solid as possible while allowing flexibility in the future. I have followed this important project for some time and am thrilled to be contributing to its success. It has the potential to be a great resource for learning about and advancing the conservation of the myriad species on earth.

Vitthal Kudal
Software Developer

vitthal.jpg
I am working as a Software Developer with the informatics group of the Encyclopedia of Life. I have a Master’s degree in Computer Science from the University of Pune, India. I have worked for the NCL Centre for Biodiversity Informatics (NCBI) as a Project Assistant. My dream is to make all species data available over the internet at a single command from the user, a dream I hope to fulfill by working with the EOL Group. What has it been like working at EOL? Interesting, inspiring, insightful, impactful, fun and more such words.

Jeremy Rice
Software Developer

rice.jpg
Working with the Encyclopedia of Life is the realization of a long-time dream for me. I love being at a university, advancing research. The Encyclopedia’s vision to synthesize all information about life present on Earth… that makes it something really essential for me. I’m joining the team with over ten years of experience developing a variety of applications. What really drives me is turning abstract ideas into working products that people appreciate. There’s an abundance of ideas here, and I hope that we can produce some amazing tools to facilitate them…

Dimitry Mozzherin
Software Developer

dimitry.jpg
I was born in Russia, and from my early school years I wanted to become a biologist and a wildlife photographer. I happened to become both later. At some point I started to learn programming languages, and after discovering the Open Source movement I decided to make programming my profession. And I am now at EOL because here I can express my passion for wildlife and my passion for developing Free Software at the same time!

Anne Thessen
Post-doctoral Investigator

thessen.jpg
I’m working on data mobilization for EOL and the International Census of Marine Microbes. Lots of biological data can be found on the printed page, which must be read to retrieve information. I’m trying to find ways to make this information easier to retrieve and use. Prior to joining the EOL team, I worked on Arctic primary production, toxin-producing diatoms and shellfish grazing.