DAPlankton: a benchmark dataset for fine-grained domain adaptation
Description
The DAPlankton dataset consists of over 110k expert-labeled plankton images. The data is divided into two subsets: DAPlankton_LAB and DAPlankton_SEA. DAPlankton_LAB consists of images captured from multiple mono-specific phytoplankton cultures, which were analysed using three different imaging instruments: the Imaging FlowCytoBot (IFCB), the CytoSense (CS) flow cytometer, and the FlowCam (FC) imaging microscope, each producing cropped images containing a single plankton particle. An expert further verified the class of each image, ensuring that there was no cross-contamination between different cultures. This process resulted in a balanced dataset with negligible label uncertainty. DAPlankton_SEA consists of images captured from water samples collected from the Baltic Sea using two different imaging instruments: IFCB and CS. Each image was manually labeled by an expert. DAPlankton_SEA provides a realistic and more challenging dataset with a large class imbalance and natural intra-class variance.
If you use this dataset in your research, we kindly ask that you reference the following paper:
D. Batrakhanov, T. Eerola, K. Kraft, L. Haraguchi, L. Lensu, S. Suikkanen, M.T. Camarena-Gomez, J. Seppälä, H. Kälviäinen, DAPlankton: Benchmark Dataset for Multi-instrument Plankton Recognition via Fine-grained Domain Adaptation, arXiv, 2024.
**Data composition**
DAPlankton_LAB contains, in total, 47 471 images from 15 phytoplankton species and 3 different domains (imaging instruments). The number of images per class-domain combination varies between 286 and 2618. The list of classes (species) is as follows:
- Aphanizomenon flosaquae
- Apocalathium malmogiense
- Chrysotila roscoffensis
- Diatoma tenuis
- Gymnodinium corollarium
- Kryptoperidium foliaceum
- Levanderina fissa
- Melosira arctica
- Nephroselmis pyriformis
- Peridiniella catenata
- Pseudopedinella sp.
- Rhinomonas nottbecki
- Rhodomonas salina
- Teleaulax acuta
- Tetraselmis sp.
DAPlankton_SEA contains, in total, 64 453 images from 31 plankton classes and 2 different domains. The number of images per class-domain combination varies between 5 and 12 280. The list of classes is as follows:
- Aphanizomenon flosaquae
- Centrales sp
- Chaetoceros sp
- Chaetoceros sp (single)
- Chlorococcales
- Chroococcales
- Ciliata
- Cryptomonadales
- Cryptophyceae Teleaulax
- Cyclotella choctawhatcheeana
- Dinophyceae
- Dinophysis acuminata
- Dolichospermum Anabaenopsis
- Dolichospermum Anabaenopsis (coiled)
- Euglenophyceae
- Eutreptiella sp
- Gymnodiniales
- Gymnodinium like
- Heterocapsa rotundata
- Heterocapsa triquetra
- Heterocyte
- Katablepharis remigera
- Mesodinium rubrum
- Monoraphidium contortum
- Nitzschia paleacea
- Nodularia spumigena
- Oocystis sp
- Pseudopedinella sp.
- Pyramimonas sp.
- Skeletonema marinoi
- Snowella Woronichinia
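
The per-combination image counts quoted above can be checked after download with a short script. The following is a minimal sketch only: it assumes a hypothetical directory layout of `<root>/<domain>/<class name>/<image file>`, which this description does not document, and the helper names `count_images` and `imbalance_ratio` are illustrative rather than part of the dataset.

```python
from collections import Counter
from pathlib import Path

# Hypothetical layout (an assumption, not documented in the dataset description):
#   <root>/<domain>/<class name>/<image file>
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def count_images(root):
    """Return a Counter mapping (domain, class_name) to the number of image files."""
    counts = Counter()
    for path in Path(root).glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # The two directories directly above the file are domain and class.
            domain, class_name = path.parts[-3], path.parts[-2]
            counts[(domain, class_name)] += 1
    return counts


def imbalance_ratio(counts):
    """Largest class-domain count divided by the smallest (1.0 means balanced)."""
    return max(counts.values()) / min(counts.values())
```

With the figures given in this description, the ratio would be 12 280 / 5 = 2456 for DAPlankton_SEA and 2618 / 286, roughly 9, for DAPlankton_LAB, which illustrates how much stronger the class imbalance is in the field data.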
Publication year
2024
Dataset type
Authors
Instituto Español de Oceanografia
Maria Teresa Camarena-Gomez - Other contributor, Author
Daniel Batrakhanov - Other contributor, Author
Heikki Kälviäinen - Other contributor
Lasse Lensu - Other contributor
Jukka Seppälä - Other contributor
Kaisa Kraft - Other contributor, Author
Lumi Haraguchi - Other contributor, Author
Sanna Suikkanen - Other contributor
Project
Other information
Fields of science
Computer and information sciences; Environmental sciences
Language
Availability
Open