Threatened Species News Dataset

Description

This data is part of the research article: Automated retrieval of information on threatened species from online sources using machine learning, Ritwik Kulkarni and Enrico Di Minin, 2021, Methods in Ecology and Evolution. Kindly cite this article when using the dataset.

1. Considering limited conservation resources, gathering and analyzing information from digital data sources can help investigate the global biodiversity crisis in a cost-efficient manner. Development and application of methods for automated content analysis of digital data sources are especially important in the context of investigating human-nature interactions.

2. In this study, we introduce methods to automatically collect information on species threatened by wildlife trade from online news. An end-to-end pipeline is constructed that begins with searching and downloading news articles about species listed in Appendix I of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) and proceeds with natural language processing and machine learning methods that filter and retain only relevant articles. Additional relevant information is then extracted from each article using a Named Entity Recognition model.

3. The data collected over a one-month period included 15,088 articles and focused on 585 species listed in Appendix I of CITES. The neural network detected relevant articles with an accuracy of 95.91%, while the Named Entity Recognition model helped extract information on prices, locations, and quantities of traded animals. The system generates a regularly updated database, which can be queried and analysed for various research purposes and to inform conservation decision-making.

4. The results demonstrate that natural language processing can be used efficiently to extract information from digital text content. The proposed methods can be applied to multiple digital data platforms at the same time and used to investigate human-nature interactions in conservation science and practice.

Publication year

2021

Dataset type

Authors

Enrico Di Minin - Other contributor, Rights holder, Curator, Creator

Ritwik Kulkarni - Other contributor, Rights holder, Curator, Creator, Publisher

Project

Other information

Fields of science

Environmental sciences

Language

English

Availability

Open

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Keywords

conservation, machine learning, Natural language processing, CITES, Online News, threatened species

Subject headings

nature conservation

Temporal coverage


Related datasets