Whoa! How come EOL has more pages than species?
David Patterson, Thursday, January 21st, 2010
Last year, the Australian Biological Resources Study published a study by Arthur Chapman on the ‘Number of living species in Australia and the world’ (http://www.environment.gov.au/biodiversity/abrs/publications/other/species-numbers/index.html). The revised estimate is 1,900,000. That we are still dealing with estimates reveals that the community of taxonomists has yet to compile a single list of all species. On January 7th this year, the EOL logs revealed that the number of pages we deliver had passed the 1,900,000 mark. Is this an ‘Oops’ moment? Is the estimate of the number of species wrong, or is EOL getting its numbers wrong?
Olenellus getzi, a Cambrian trilobite. Image by Bruce Liebermann, CC-BY license, from trilobites.lifedesks.org
There are several reasons why we have more than 1,900,000 pages. For a start, EOL prepares pages not only for living species but also for extinct ones. For example, there is a page for the thylacine, the wolf-like marsupial carnivore that went extinct in the 20th century (http://www.eol.org/pages/126716). EOL has recently added information from the ‘Trilobites Online Database’ (http://trilobites.lifedesks.org). All trilobites are extinct, but we still gather information about them. Their pages already exist: they appear in search results, and they can be found while browsing if you use a classification that includes trilobites, as a search on Olenellus will reveal. Estimates vary as to how many extinct species there are, but probably about 250,000 have been reported to date. Similarly, EOL has pages not only for species but also for genera, families, orders and so on, which means that when complete, EOL should have about 2,500,000 pages.
That said, we will not converge directly on the two-and-a-half-million target. The number of pages will first grow to vastly exceed it, and only then slowly come down to reach it. The reason lies in how EOL collects and displays information. EOL uses the names of species to gather information together. As the Global Names Index (at www.globalnames.org) will attest, there are many more names than there are species (GNI knows of almost 20,000,000 names). One cause of the excess of names is that a name must change when the classification of a species changes. Linnaeus laid the foundations of contemporary biological classification in the 18th century. At that time, he and his ‘apostles’ distinguished only about 10,000 species and placed them in about 1,300 genera. The expansion to the current 1,900,000 species results mostly from the discovery of new species, which are now placed in several hundred thousand genera. The intervening 250 years have seen a massive expansion of our awareness of biological diversity. In this process, scientists have refined and debated what species are as they learned more about their nature and evolution. They move species from one genus to another so that closely related species are grouped together. Because a species name combines the name of its genus with a second word for the species itself, each such move creates a new name for the species. The yellow fever mosquito was described by Linnaeus as Culex aegypti, then became known as Aedes aegypti, and more recently was transferred to Stegomyia, giving it yet another name, Stegomyia aegypti. Although the species is unchanged, we have several names for it. Until we find out that these names refer to one species, EOL may hold information about it from sources that still use an old name.
Only when we are advised that two names refer to the same species do we know to bring the content under both names together on one page. Until then, we will have more pages than there are species.
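To make this ‘many names, one page’ bookkeeping concrete, here is a minimal Python sketch, not EOL's actual software. It assumes a hypothetical `NameGroups` class (with hypothetical `merge` and `page_count` methods) that treats every name string as its own page until an expert advises that two names refer to the same species; the two groups are then merged with a union-find (disjoint-set) structure, and the page count is simply the number of groups that remain.

```python
class NameGroups:
    """Hypothetical sketch: groups name strings that refer to one species.

    Each group corresponds to one page; the group's root is the name shown
    on that page.
    """

    def __init__(self):
        self.parent = {}  # name -> another name in the same group

    def add(self, name):
        # A newly seen name starts out as its own group (its own page).
        self.parent.setdefault(name, name)

    def find(self, name):
        # Follow parent links up to the group's root name.
        self.add(name)
        root = name
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every name on the way directly at the root.
        while self.parent[name] != root:
            next_name = self.parent[name]
            self.parent[name] = root
            name = next_name
        return root

    def merge(self, accepted, synonym):
        # Expert advice: both names refer to the same species. The root of
        # the accepted name's group becomes the root of the combined group.
        root_a, root_s = self.find(accepted), self.find(synonym)
        if root_a != root_s:
            self.parent[root_s] = root_a

    def page_count(self):
        # One page per distinct group.
        return len({self.find(name) for name in list(self.parent)})


groups = NameGroups()
for name in ["Culex aegypti", "Aedes aegypti", "Stegomyia aegypti", "Ursus maritimus"]:
    groups.add(name)
pages_before = groups.page_count()  # 4: every name has its own page

groups.merge("Stegomyia aegypti", "Aedes aegypti")
groups.merge("Stegomyia aegypti", "Culex aegypti")
pages_after = groups.page_count()  # 2: the mosquito names now share a page
print(pages_before, pages_after, groups.find("Culex aegypti"))
```

Union-find suits this sketch because advice arrives one pair of names at a time and merges are cheap. It cannot undo a merge, so splitting a group again (for instance, when experts later disagree) would need a different structure.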
Another reason for the additional pages is that species are not well-defined objects like ‘a car’ or ‘a computer’. Rather, they are like the smoke from a snuffed candle, conversations, or clouds: they differ every time you encounter them. They are living, transforming things, changing as the evolutionary process that produces them wends its way through time and across the globe. Nature is constantly running new genetic experiments. As the pressures on species, and the opportunities available to them, change, so do what a species looks like and the number of individuals we would assign to it. Scientists define each evolving lineage as best they can and refer to the products of the evolutionary process as ‘species’. Because species are so indefinite, different experts come to different conclusions. Some think we should treat Gorilla graueri (the eastern lowland gorilla) as the same species as the eastern gorilla (Gorilla beringei). Others think they are different, and yet others want to treat them as different subspecies of the same species. To accommodate all of these points of view, we need at least four pages, one for each possible species or subspecies, even though in the end there may be only a single species.
Gorilla. Image by Brian Gatwicke, CC-BY license.
This is not an isolated case; it is more the norm than the exception. Should we continue to treat the polar bear (Ursus maritimus) as different from Ursus arctos (the brown or grizzly bear), with which it can interbreed? Are zebras and horses members of the same species or not? Most biologists would argue that they should be treated as the same species because they can form hybrids. In contrast, most non-scientists will continue to think of the polar bear as a separate species because it looks different and has an unusual lifestyle. Each of these different definitions of a species is a ‘concept’. To show all information about all life, EOL prepares a page for each of these concepts, and once again the number of pages grows to exceed the number of species.
A museum specimen of a polar bear / grizzly hybrid. Hybrids are known from museum specimens, from zoos, and from the wild. Image by Sarah Hartwell, CC-BY license; from Wikimedia.
By far the biggest source of extra pages is the variety of ways in which people refer to species. We use names to distinguish each type of organism. Some names are scientific: they are in Latin and follow conventions set out in the codes of nomenclature. Others are common names. Even though scientific names are regulated by the codes of nomenclature, they can appear in many different forms. ‘Grevillea glauca’, ‘G. glauca’, ‘Grevillea glauca Banks & Sol. ex Knight’ and ‘Grevillea glauca Banks and Solander 1809’ are all legitimate ways of writing the scientific name of the Australian shrub that is used for boomerangs or to help hang out the washing. Although biologists know that these names refer to the same species, a computer sees only that the strings are different and assumes that they refer to different species. Until the computers are told otherwise, we will have a page for the information attached to each of the names. We expect that at least 100,000,000 different names and forms of names have been used for the 2,000,000 species. Because this is the biggest source of extra pages, EOL works hard to build new software, and asks the expert community for help, to ‘reconcile’ these alternative names. One measure of our progress is how far we bring the number of pages down to match the 2 million or so species that we believe exist today.
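The Grevillea example shows why plain string comparison fails. As a rough illustration, and not how EOL or the Global Names Index actually parse names, the hypothetical `canonical_name` function below reduces each written form of a binomial to a ‘Genus epithet’ key by keeping the first two words and dropping any authorship and year. It expands an abbreviated genus such as ‘G.’ only when a caller-supplied list of known genera (an assumption of this sketch) leaves exactly one candidate.

```python
import re


def canonical_name(name, known_genera=()):
    """Hypothetical sketch: reduce a binomial name string to 'Genus epithet'.

    Keeps the first two words and drops whatever follows them (authorship,
    year). Deliberately naive: it does not handle subspecies, hybrids, or
    common names.
    """
    words = name.split()
    if len(words) < 2:
        return name.strip()
    genus, epithet = words[0], words[1]
    # An abbreviated genus such as "G." is expanded only if exactly one of
    # the caller's known genera starts with that initial.
    if re.fullmatch(r"[A-Z]\.", genus):
        candidates = [g for g in known_genera if g.startswith(genus[0])]
        if len(candidates) == 1:
            genus = candidates[0]
    return f"{genus} {epithet.lower()}"


variants = [
    "Grevillea glauca",
    "G. glauca",
    "Grevillea glauca Banks & Sol. ex Knight",
    "Grevillea glauca Banks and Solander 1809",
]
keys = {canonical_name(v, known_genera=["Grevillea"]) for v in variants}
print(keys)  # {'Grevillea glauca'}
```

A key like this only catches the easy cases. It would silently collapse a subspecies (a three-word name) into its species, it leaves ‘G.’ unexpanded whenever more than one known genus shares that initial, and it cannot connect common names at all, which is why reconciliation still needs expert review.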
A (modified) species page from EOL that contains information about an organism that has not been given a name (an “unidentified bacterium”, itself included within a ‘class’ called “environmental samples”).
Recently, EOL added information from GenBank, a database where researchers deposit molecular sequence data. It is now possible to use molecular tools to explore nature, and these new approaches are revealing species that seem to be new to science but cannot be identified. In the absence of formal names, those ‘species’ are listed under terms like “Uncultured bacterium HZ_056” or “bioreactor sludge metagenome”, or are even just referred to as “unidentified”. We call such terms ‘surrogates for names’, in the expectation that the organisms will be given formal scientific names in due course. As molecular techniques become cheaper and more powerful, we expect to be deluged with information labeled with surrogate names. At present, perhaps as much as 20% of the diversity known to EOL is in this form, and these surrogates also add significantly to the tally of pages on the site.
Fortunately, many of these problems are hidden behind the scenes. When someone arrives at EOL, they can navigate among species using a classification scheme, and they will see only the pages that match names in that classification. Obsolete names, and names of species not recognized by the classification, are hidden from view. But if you want to see everything that is in there, you can find the hidden pages through searches after changing your preferences to turn off filtering.
The Encyclopedia of Life has taken up the challenge of bringing together and organizing information about all life. We do this to realize the vision of Ed Wilson (and others) of a page for every species on Earth. We have invented new ways of managing information about organisms; that is, we are ‘biodiversity informaticians’. We work with other biodiversity informaticians around the world. We draw on the tools emerging from computer science, and invent new ones, to organize information so that it can better inform and guide the decisions we need to make about the future of our world. Most of the current tools are first generation; they will improve as the discipline of biodiversity informatics grows and matures, and as the need for information becomes more urgent.
Professor E. O. Wilson, Harvard University. Image by Kevin Kelly, available under a Creative Commons license (http://kk.org/ct2/new-media/).
We at EOL are pursuing four ways to improve the management of information about biology. The first is innovative computer programs. Programs and algorithms give us ‘scalability’: they provide the means to work through billions of pieces of information located at thousands of sites and assign each one to the correct species. Our second area addresses the ‘many names for one species’ problem. We are hard at work developing new ways of grouping together alternative names for the same species. Assigning the expected 100,000,000 names to 2 million groups, one for each species, increasingly needs to be done in an open online environment that allows initiatives and experts worldwide to cooperate in improving names management. Because that environment is open, all of us, from search engines to school teachers, will benefit: good names management will ensure that we find more, and miss less, information about each species. Thirdly, we will provide web tools with which experts can improve the quality of the classifications used to organize information and to navigate sites like EOL. Experts can work together to make sure that nothing is missing, that classifications reflect current thinking, that obsolete names and ideas are properly labeled and hidden from view, and that the groups of alternative names for the same species are correct. Finally, EOL is building a community of curators who can correct any errors that persist. Anyone with a sharp eye and a commitment to quality is welcome. Balanced progress on these four fronts positions EOL to produce a robust, high-quality web environment about all life on Earth within the 10-year schedule we set ourselves.
D Patterson
Senior Taxonomist
Jan 21 2010

The Encyclopedia of Life has been keenly interested in content management systems and social networking phenomena, especially relating to how well these might be of benefit to practicing taxonomists who are under pressure to get online. So, we have been getting serious about 



