Automated species identification


Automated species identification is a method of making the expertise of taxonomists available to ecologists, parataxonomists and others via computers, mobile devices and other digital technology.

The automated identification of biological objects such as insects (individuals) and/or groups (e.g., species, guilds, characters) has been a dream among systematists for centuries. The goal of some of the first multivariate biometric methods was to address the perennial problem of group discrimination and inter-group characterization. Despite much preliminary work in the 1950s and '60s, progress in designing and implementing practical systems for fully automated identification of biological objects has proven frustratingly slow. As recently as 2004, Dan Janzen updated the dream for a new audience:

The spaceship lands. He steps out. He points it around. It says ‘friendly–unfriendly—edible–poisonous—safe– dangerous—living–inanimate’. On the next sweep it says ‘Quercus oleoides—Homo sapiens—Spondias mombin—Solanum nigrum—Crotalus durissus—Morpho peleides— serpentine’. This has been in my head since reading science fiction in ninth grade half a century ago.

Janzen’s preferred solution to this classic problem involved building machines to identify species from their DNA. His predicted budget and proposed research team were “US$1 million and five bright people.” However, recent developments in computer architectures, as well as innovations in software design, have placed the tools needed to realize Janzen’s vision in the hands of the systematics and computer science communities, not several years hence but now, and not just for creating DNA barcodes but also for identification based on digital images.

A seminal survey published in 2004 examined why automated species identification had not become widely employed by that time and whether it would be a realistic option for the future. The authors found that "a small but growing number of studies sought to develop automated species identification systems based on morphological characters". An overview of 20 studies analyzing species’ structures, such as cells, pollen, wings, and genitalia, showed identification success rates between 40% and 100% on training sets with 1 to 72 species. However, the authors also identified four fundamental problems with these systems: (1) training sets: these were too small (5-10 specimens per species), and extending them may be difficult, especially for rare species; (2) errors in identification: these had not been studied well enough to handle them or to find systematic patterns in them; (3) scaling: studies considered only small numbers of species (<200 species); and (4) novel species: systems are restricted to the species they have been trained on and will classify any novel observation as one of the known species.
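
The fourth problem, novel species, can be illustrated with a minimal sketch. The example below is not taken from the survey or from any of the systems it reviewed: it assumes a simple nearest-centroid classifier over two invented morphometric measurements, with hypothetical species names and an arbitrary rejection threshold. It shows that a closed-set classifier forces a novel specimen into a known species, and that a distance threshold is one possible (and crude) way to report "unknown" instead.

```python
import math

# Hypothetical training data: two morphometric measurements per specimen
# (say, wing length and width in mm). All values are invented for illustration.
TRAINING = {
    "species_a": [(10.0, 4.0), (10.4, 4.2), (9.8, 3.9)],
    "species_b": [(15.0, 6.0), (15.3, 6.1), (14.8, 5.8)],
}


def centroids(training):
    """Return the mean feature vector of each species."""
    result = {}
    for name, specimens in training.items():
        n = len(specimens)
        result[name] = tuple(sum(column) / n for column in zip(*specimens))
    return result


def classify(sample, cents, reject_distance=None):
    """Return the nearest species, or 'unknown' if it is farther away
    than reject_distance (an assumed, arbitrary threshold)."""
    name, dist = min(
        ((n, math.dist(sample, c)) for n, c in cents.items()),
        key=lambda pair: pair[1],
    )
    if reject_distance is not None and dist > reject_distance:
        return "unknown"
    return name


CENTS = centroids(TRAINING)
novel = (30.0, 12.0)  # a specimen unlike either trained species

# Closed-set behaviour: the novel specimen is forced into a known species.
closed_set = classify(novel, CENTS)
# Open-set variant: the same specimen is rejected as too far from any centroid.
open_set = classify(novel, CENTS, reject_distance=2.0)
# A specimen close to the training data is still identified normally.
known = classify((10.1, 4.1), CENTS, reject_distance=2.0)

print(closed_set, open_set, known)
```

In practice the threshold would have to be chosen from data, and with the small training sets noted in problem (1) a reliable choice may be hard to make.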


...
Wikipedia

...