Biodiversity-DL: Create a training set on a particular group of living organisms for machine learning applications. Any developer or data scientist working on a citizen observatory will be able to train an Artificial Intelligence (AI) model to identify a particular group of species very easily.

Service description:

Service that allows users to create a training set on a particular group of living organisms on demand, i.e. to request specific data, such as images of specific species and/or from specific platforms, or images whose identification has reached a sufficient level of expert validation.

Example: A data scientist or developer who wants to train an artificial intelligence model on a particular group of species using PyTorch will be able to do so very easily.

Development & functioning:

To deliver this service, it will be necessary to:

  • Ensure a common species dictionary across projects. This will take a global backbone dictionary, such as that provided by GBIF or the Catalogue of Life, and ensure that regional dictionaries, such as the UK Species Inventory, work with it to form a single dictionary that can be used by all participants and global repositories.
  • Provide data access services with quality assessment. This will enable training across different taxonomic domains. Not only does the basic observation data need to be supplied to the AI as a web service, but so does information on the confidence of the identification.
  • Provide an on-demand training set creation service. To train AI models on potentially any group or set of living species, the aggregation of verified data from several platforms is necessary.
  • Track AI performance indicators by measuring which species the AI becomes better at identifying and which species it still struggles with. Beyond this, it is vital to assess user behaviour, i.e. whether users simply choose the most likely species offered by the AI even if other users later show this to be the wrong ID. It is equally important to know the effect of the AI on the learning cycle and on the platform's reputation system.
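
The performance indicators mentioned in the last point could, for example, be computed per species as recall against expert-validated identifications. The sketch below is an illustration only, not the service's implementation: the function names `per_species_recall` and `struggling_species`, the 0.8 threshold and the sample labels are all assumptions made for this example.

```python
from collections import Counter


def per_species_recall(true_labels, predicted_labels):
    """Return {species: share of its observations the model identified correctly}."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {species: correct[species] / n for species, n in totals.items()}


def struggling_species(recall_by_species, threshold=0.8):
    """List species whose recall is below the (assumed) threshold, worst first."""
    low = [(r, s) for s, r in recall_by_species.items() if r < threshold]
    return [s for r, s in sorted(low)]


# Made-up labels: expert-validated identifications vs. the model's suggestions.
validated = ["Parus major", "Parus major", "Cyanistes caeruleus", "Cyanistes caeruleus"]
suggested = ["Parus major", "Parus major", "Parus major", "Cyanistes caeruleus"]

recall = per_species_recall(validated, suggested)
needs_more_data = struggling_species(recall)
```

Comparing these values between successive model versions would show which species the AI is improving on and which may need more verified images.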

Innovation for citizen observatories:

The service will allow for the aggregation of massive sets of image data related to large groups of species, using the APIs of several platforms and citizen observatories, in order to facilitate learning by efficient AI models for automated identification. 
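
To illustrate the aggregation step, the sketch below merges records from several platforms into one training set. It is a simplified example under stated assumptions, not the service's code: the record fields (`name`, `image_url`, `confidence`), the `BACKBONE` name mapping, the 0.9 confidence threshold and the sample records are all hypothetical.

```python
# Hypothetical mapping from platform-specific names to one backbone name.
BACKBONE = {
    "great tit": "Parus major",
    "parus major": "Parus major",
    "blue tit": "Cyanistes caeruleus",
}


def build_training_set(records, wanted_species, min_confidence=0.9):
    """Merge records from several platforms into (image_url, species) pairs."""
    seen_urls = set()
    training_set = []
    for rec in records:
        # Harmonise the platform's name against the shared dictionary.
        species = BACKBONE.get(rec["name"].strip().lower())
        if species is None or species not in wanted_species:
            continue
        # Keep only identifications with sufficient confidence.
        if rec["confidence"] < min_confidence:
            continue
        # Skip images already collected from another record.
        if rec["image_url"] in seen_urls:
            continue
        seen_urls.add(rec["image_url"])
        training_set.append((rec["image_url"], species))
    return training_set


# Made-up records standing in for the responses of two platforms, A and B.
records = [
    {"platform": "A", "name": "Great Tit", "image_url": "https://example.org/1.jpg", "confidence": 0.95},
    {"platform": "B", "name": "Parus major", "image_url": "https://example.org/2.jpg", "confidence": 0.60},
    {"platform": "B", "name": "Blue tit", "image_url": "https://example.org/3.jpg", "confidence": 0.99},
    {"platform": "A", "name": "parus major", "image_url": "https://example.org/1.jpg", "confidence": 0.97},
]

pairs = build_training_set(records, {"Parus major"})
```

The resulting list of image and label pairs is the kind of input a model training pipeline could consume.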

Nothing similar is currently available.

Questions & answers:

  • Who is this service meant for? Do I need a technical background?

This service is meant for developers and data scientists with some background in data engineering and Python, so a technical background is needed.

  • Can I choose which source I want to extract the training list from, e.g. iNaturalist?

Yes, you can specify one or several GBIF data publishers.
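
As an illustration of restricting the source to given publishers, the sketch below builds a query URL for the public GBIF occurrence search API, using its `taxonKey`, `mediaType` and `publishingOrg` parameters. Whether Biodiversity-DL issues exactly this request is an assumption; the function name `gbif_image_query`, the taxon key and the publisher keys are placeholders for this example.

```python
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def gbif_image_query(taxon_key, publisher_keys, limit=100):
    """Build a search URL for image-bearing occurrences from the given publishers."""
    params = [
        ("taxonKey", taxon_key),
        ("mediaType", "StillImage"),  # only occurrences that have images
        ("limit", limit),
    ]
    # Repeating the parameter requests records from any of the listed publishers.
    params += [("publishingOrg", key) for key in publisher_keys]
    return GBIF_OCCURRENCE_SEARCH + "?" + urlencode(params)


# Placeholder values: replace with a real taxon key and real publisher keys.
url = gbif_image_query(212, ["publisher-key-1", "publisher-key-2"])
```

The URL can then be fetched with any HTTP client, paging through the results to collect the image links.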

  • Can I integrate this service into existing apps?

Yes, you can integrate Biodiversity-DL into your own workflow (under the MIT open-source licence).

  • Can I create lists of both plant and animal species?

Yes, you can create both animal and plant species lists.

  • Can I create as many lists as I want?

Yes, there is no limit. You can create as many training sets as you need.

We are working on this service. It will be available soon!


Artificial Intelligence, AI, training data, data quality, species identification, biodiversity data. 


Help us to co-create and test the new generation of services for citizen observatories!