Progress of Colin’s 2013 Rubenstein Project (Ⅰ)
As a 2013 EOL Rubenstein Fellow, I am working on the project “Using the multiple classifications harvested by EOL for analysis to obtain the degree of coverage and congruence among hierarchies and nomenclatures.” Many different taxonomic hierarchies exist, because:
a. Taxonomists hold different views on biological classification.
b. New technologies such as phylogenetics offer new perspectives on the relationships among taxa.
c. Biological classification is an ongoing effort, and older classifications must be updated when new groups of species or new specimens are found.
d. The scope and coverage of individual classifications are often geographically restricted.
Analyzing these hierarchies (or taxonomic trees) to find where they agree and disagree is interesting in itself, and our first-stage requirement analysis showed that both biodiversity informatics researchers and taxonomists need it. With long-term use and the needs of potential users in mind, we plan to implement a mature tool, the Taxonomic Hierarchy Comparator (THC), for managing and comparing different taxonomic hierarchies.
Potential Users of THC:
1. Researchers in biodiversity informatics who want to know where the hierarchies differ, which biological groups are receiving the most attention, and where gaps remain to be filled.
2. Taxonomists who want to find where their own taxonomic views are incongruent with those of others. They can analyze their own hierarchies against the EOL hierarchies.
Main Functional Requirements
1. User management
Users need an account before using THC. An account lets them create and manage their own taxonomic hierarchies and keep the results of analysis experiments permanently for reuse. User management covers registration, authorization, login/logout, and account updates.
2. Hierarchy management
a. Create hierarchy: a hierarchy can be created in several ways. Users can upload an XML file in BSBC format or a Darwin Core Archive (DwC-A) file, or copy a hierarchy from the pool of shared hierarchies. EOL hierarchies will be imported into THC through the web API provided by EOL.
b. Edit hierarchy: a simple editor for modifying hierarchies, which lets users edit scientific names, move a taxon to a different position, insert new taxa, and delete taxa.
c. Share hierarchy: users can share their own hierarchies with others for analysis, but others cannot modify the shared hierarchies.
d. Export hierarchy: users can save their hierarchies as standard DwC-A files or BSBC XML.
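The hierarchy operations above can be pictured as a small in-memory tree. This is a minimal sketch under stated assumptions, not THC's actual data model: the `Taxon` and `Hierarchy` classes and all their method names are hypothetical. The input is assumed to be the taxon core of a DwC-A already extracted as tab-delimited text whose columns are the Darwin Core terms `taxonID`, `parentNameUsageID` and `scientificName` (a real archive declares its delimiter and columns in `meta.xml`). Read-only sharing is modelled as a simple owner check.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Taxon:
    taxon_id: str
    name: str
    parent_id: Optional[str]
    children: List[str] = field(default_factory=list)


class Hierarchy:
    """Hypothetical in-memory taxonomic tree; a sketch, not THC's real model."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.taxa: Dict[str, Taxon] = {}

    @classmethod
    def from_taxon_rows(cls, owner: str, text: str) -> "Hierarchy":
        # Assumes tab-delimited rows using the Darwin Core terms
        # taxonID, parentNameUsageID and scientificName.
        h = cls(owner)
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            parent = row["parentNameUsageID"] or None  # empty cell = root
            h.taxa[row["taxonID"]] = Taxon(row["taxonID"], row["scientificName"], parent)
        # Second pass so that rows may appear in any order.
        for t in h.taxa.values():
            if t.parent_id is not None:
                h.taxa[t.parent_id].children.append(t.taxon_id)
        return h

    def _check_owner(self, user: str) -> None:
        # Shared hierarchies stay read-only for everyone except the owner.
        if user != self.owner:
            raise PermissionError(f"{user} may not modify {self.owner}'s hierarchy")

    def path(self, taxon_id: str) -> List[str]:
        """Scientific names from the root down to the given taxon."""
        names: List[str] = []
        current: Optional[str] = taxon_id
        while current is not None:
            names.append(self.taxa[current].name)
            current = self.taxa[current].parent_id
        return names[::-1]

    def rename(self, user: str, taxon_id: str, new_name: str) -> None:
        self._check_owner(user)
        self.taxa[taxon_id].name = new_name

    def insert(self, user: str, taxon_id: str, name: str, parent_id: Optional[str]) -> None:
        self._check_owner(user)
        if taxon_id in self.taxa:
            raise ValueError(f"taxon {taxon_id} already exists")
        self.taxa[taxon_id] = Taxon(taxon_id, name, parent_id)
        if parent_id is not None:
            self.taxa[parent_id].children.append(taxon_id)

    def move(self, user: str, taxon_id: str, new_parent_id: str) -> None:
        self._check_owner(user)
        # Refuse to move a taxon under itself or one of its descendants.
        ancestor: Optional[str] = new_parent_id
        while ancestor is not None:
            if ancestor == taxon_id:
                raise ValueError("cannot move a taxon into its own subtree")
            ancestor = self.taxa[ancestor].parent_id
        taxon = self.taxa[taxon_id]
        if taxon.parent_id is not None:
            self.taxa[taxon.parent_id].children.remove(taxon_id)
        taxon.parent_id = new_parent_id
        self.taxa[new_parent_id].children.append(taxon_id)

    def delete(self, user: str, taxon_id: str) -> None:
        # Removes the taxon together with its whole subtree.
        self._check_owner(user)
        taxon = self.taxa.pop(taxon_id)
        if taxon.parent_id is not None:
            self.taxa[taxon.parent_id].children.remove(taxon_id)
        stack = list(taxon.children)
        while stack:
            stack.extend(self.taxa.pop(stack.pop()).children)
```

For example, loading three rows (Animalia, Chordata, Aves), inserting Reptilia under Chordata and moving Aves beneath it makes `path` return the four names from Animalia down to Aves, while any edit attempted by a user other than the owner raises `PermissionError`. Keeping both a parent pointer and a child list makes upward walks (paths, cycle checks) and downward walks (subtree deletion) cheap, at the cost of updating both on every edit.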
3. Analysis experiment
a. Create experiment: the user gives the experiment an identifier and a description and selects the hierarchies to analyze. THC keeps the analysis result.
b. Share experiment: analysis results can be shared with other users.
c. Implementation: the analysis task is submitted to the server, and the user waits for a response message. Because analysis is time-consuming, a task queue is required to handle multiple analysis tasks. The analysis is based on an algorithm that we have proposed.
d. Visualization: an important function for presenting analysis results. It should show explicitly where the hierarchies are congruent and where they are incongruent.
e. Computation: based on the analysis result, the “intersection” of two hierarchies extracts their common part, and the “difference” produces their incongruent part.
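This post does not describe the algorithm the project proposes, so the sketch below uses a deliberately simple stand-in to illustrate what “intersection” and “difference” could mean. Assumptions: each hierarchy is reduced to a hypothetical map from scientific name to parent name, taxa are matched by name only, and the function names (`intersection`, `difference`, `coverage_gap`, `congruence`) are illustrative, not THC's API.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: scientific name -> parent name (None for a root).
Hier = Dict[str, Optional[str]]
Edge = Tuple[str, str]  # (parent name, child name)


def edges(h: Hier) -> Set[Edge]:
    """All parent-child placements in a hierarchy."""
    return {(parent, child) for child, parent in h.items() if parent is not None}


def intersection(a: Hier, b: Hier) -> Set[Edge]:
    # Placements that both hierarchies agree on (the common part).
    return edges(a) & edges(b)


def difference(a: Hier, b: Hier) -> Set[Edge]:
    # Placements in `a` missing from `b`: either `b` puts the child under a
    # different parent, or `b` does not contain the name at all.
    return edges(a) - edges(b)


def coverage_gap(a: Hier, b: Hier) -> Set[str]:
    # Names present in `a` that `b` does not cover.
    return set(a) - set(b)


def congruence(a: Hier, b: Hier) -> float:
    # Fraction of shared names that both hierarchies place under the same
    # parent (two roots without a parent count as agreeing).
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[n] == b[n] for n in shared) / len(shared)
```

For instance, if hierarchy A places Aves under Chordata and also lists Mammalia, while hierarchy B follows a phylogenetic view and places Aves under Reptilia, the intersection holds Animalia→Chordata and Chordata→Reptilia, `difference(a, b)` reports Chordata→Aves and Chordata→Mammalia, `coverage_gap(a, b)` is {Mammalia}, and `congruence` is 0.75 (three of the four shared names have the same parent). Matching by name alone ignores synonyms and homonyms, which a real comparison would need to reconcile first.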
We have designed several use cases to show how users and managers interact with THC and how the requirements are realized in software. Fig1 shows the use case for experiments.
Fig1 Use Case of Experiment
In addition, a paper describing the method and algorithm is being written, and we hope to publish it in a relevant journal in the next stage.
Further progress on database design and interface design will be posted at the end of July.
This article is excerpted from “requirement analysis of THC”
Author: Colin, 2013 EOL Rubenstein Fellow