If you’ve been watching the EOL homepage closely over the last few weeks, you’ve probably wondered why the number of EOL “pages” went down from 750,000 to around 700,000 before suddenly going up to near 920,000. I’d like to give you some insight into what this number means and how it is calculated.
EOL has a lot of names (taxa) in it - well over 2 million of them. Each name (taxon) has a corresponding page in EOL. But just because we have a page for a taxon does not mean that there is any “content” associated with it. So rather than simply report on the total number of taxon pages in EOL, we report on the number of them that have at least one image, description (from a partner or user), video or sound.
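To make the counting rule above concrete, here is a minimal sketch in Python. It is not EOL's actual code: the `TaxonPage` record, its field names and the sample taxa are all hypothetical, chosen only to illustrate the rule "a page counts if it has at least one image, description, video or sound".

```python
from dataclasses import dataclass


# Hypothetical record for a taxon page; the field names are illustrative
# assumptions, not EOL's real data model.
@dataclass
class TaxonPage:
    name: str
    images: int = 0
    descriptions: int = 0  # from a partner or a user
    videos: int = 0
    sounds: int = 0

    def has_content(self) -> bool:
        # A page counts as soon as it holds at least one item of any kind.
        return (self.images + self.descriptions + self.videos + self.sounds) > 0


def count_pages_with_content(pages):
    """Count only the pages that have at least one content item."""
    return sum(1 for page in pages if page.has_content())


# Made-up sample data: three taxon pages, one of them empty.
pages = [
    TaxonPage("Aus bus", images=2, descriptions=1),
    TaxonPage("Aus cus"),           # a name with no content yet
    TaxonPage("Xus yus", sounds=1),
]

total_pages = len(pages)
pages_with_content = count_pages_with_content(pages)
print(f"{pages_with_content} of {total_pages} pages have content")
```

In this toy data set the empty page exists but is left out of the reported number, which is the distinction between “taxon pages” and “pages with content”.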
A perfect storm of changes in how we report this number, combined with the arrival of a very large new content partner (Tropicos), resulted in the swings you noted in the number of pages on EOL. At the time I wrote this post, that number was 921,327 “pages with content”.
Normally, adding a new content partner does not have the sort of impact on our “page count” that the arrival of Tropicos did. Typically a new content partner makes many existing pages richer (a statistic you can see on the Updates tab of any taxon page) while also bringing content for pages that used to have none. In the case of Tropicos, the import added images and text to a lot of pages for which we had no content before.
On any given day, the EOL update process may increase the number of pages (if a partner adds new content), decrease it (if a partner removes content or EOL gets enough data to merge previously separate pages), or leave it unchanged. We also often change the algorithms we use to compare the taxonomic information we get from one partner with the rest of the taxonomic information EOL holds, in our effort to reach a one-to-one correspondence between EOL pages and distinct taxa.
By way of example, EOL pages like this one were cleaned up recently, an action representative of our effort to make sure pages representing surrogate taxa (Aus sp., for example) are not confused with pages for genera. Duplicate pages like this one were also merged with equivalent pages to reduce duplication.
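The way merges and imports move the count in opposite directions can be sketched with a small Python example. This is an illustration under simplifying assumptions, not a description of EOL's update process: pages are modeled as a plain dictionary from a made-up page id to a set of made-up content ids, and `merge_pages` is a hypothetical helper.

```python
def pages_with_content(pages):
    """pages maps a page id to the set of content item ids on that page."""
    return sum(1 for items in pages.values() if items)


def merge_pages(pages, keep, duplicate):
    """Fold the duplicate page's content into the kept page, then drop the duplicate."""
    merged = dict(pages)  # work on a copy so the input is left untouched
    merged[keep] = merged[keep] | merged.pop(duplicate)
    return merged


# Made-up starting state: two pages for the same taxon, plus one empty page.
before = {"page_a": {"img1"}, "page_b": {"txt1"}, "page_c": set()}
count_before = pages_with_content(before)

# Merging the duplicates keeps all the content but lowers the count.
after_merge = merge_pages(before, keep="page_a", duplicate="page_b")
count_after_merge = pages_with_content(after_merge)

# A partner import that gives the empty page its first item raises the count.
after_import = dict(after_merge)
after_import["page_c"] = after_import["page_c"] | {"img2"}
count_after_import = pages_with_content(after_import)

print(count_before, count_after_merge, count_after_import)
```

The merge leaves every content item in place and only changes how many pages hold it, so a falling count does not by itself mean that content was lost.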
Patrick Leary at MBL summarized this challenge well when he said:
“While we strive to have EOL pages correspond one-to-one with taxa, this does not always happen for reasons of homonymy, synonymy and general failure of our algorithm to merge pages for the same taxon.”
The EOL team is constantly trying to make EOL the best online resource for biodiversity information and to ensure EOL has one and only one page for every taxon.
We are always bringing in new information and improving our data management tools to give our users the best experience we can. As a result, EOL is a living index of content that is constantly changing and, we hope, constantly improving to the benefit of our users.