This data is part of the research article: Automated retrieval of information on threatened species from online sources using machine learning, Ritwik Kulkarni and Enrico Di Minin, 2021, Methods in Ecology and Evolution
Please cite this article when using the dataset.
1. Considering limited conservation resources, gathering and analyzing information from digital data sources can help investigate the global biodiversity crisis in a cost-efficient manner. Development and application of methods for automated content analysis of digital data sources are especially important in the context of investigating human-nature interactions.
2. In this study, we introduce methods to automatically collect information on species threatened by wildlife trade from online news. An end-to-end pipeline is constructed that begins with searching for and downloading news articles about species listed in Appendix I of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES), and proceeds with natural language processing and machine learning methods that filter the articles and retain only the relevant ones. Additional relevant information is then extracted from each article using a Named Entity Recognition model.
3. The data collected over a one-month period included 15,088 articles and focused on 585 species listed in Appendix I of CITES. The accuracy of the neural network in detecting relevant articles was 95.91%, while the Named Entity Recognition model helped extract information on prices, locations, and quantities of traded animals. The system generates a regularly updated database, which can be queried and analysed for various research purposes and to inform conservation decision-making.
4. The results demonstrate that natural language processing can be used in an efficient manner to extract information from digital text content. The proposed methods can be applied to multiple digital data platforms at the same time and used to investigate human-nature interactions in conservation science and practice.
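The pipeline described in points 2 and 3 (filter articles for relevance, extract prices and quantities, store the results in a queryable database) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword filter stands in for the trained neural network, the regular expressions stand in for the Named Entity Recognition model, and all names used here (`TRADE_TERMS`, `Article`, `is_relevant`, `extract_entities`, `run_pipeline`, `store`) as well as the table layout are hypothetical.

```python
import re
import sqlite3
from dataclasses import dataclass, field

# ASSUMPTION: hypothetical cue words; the study uses a trained neural network.
TRADE_TERMS = {"seized", "smuggled", "trafficking", "poaching", "poached", "illegal"}

# ASSUMPTION: simple regexes standing in for a Named Entity Recognition model.
PRICE_RE = re.compile(r"(?:US)?\$\s?\d+(?:,\d{3})*(?:\.\d+)?")
QUANTITY_RE = re.compile(
    r"\b\d+(?:,\d{3})*\s+(?:kg|kilograms|tonnes|animals|specimens|skins|tusks)\b",
    re.IGNORECASE,
)


@dataclass
class Article:
    species: str
    text: str
    prices: list = field(default_factory=list)
    quantities: list = field(default_factory=list)


def is_relevant(article, min_hits=1):
    """Keep an article if it contains at least `min_hits` trade-related cue words."""
    words = set(re.findall(r"[a-z]+", article.text.lower()))
    return len(words & TRADE_TERMS) >= min_hits


def extract_entities(article):
    """Attach price and quantity mentions found in the article text."""
    article.prices = PRICE_RE.findall(article.text)
    article.quantities = QUANTITY_RE.findall(article.text)
    return article


def run_pipeline(articles):
    """Filter for relevance, then extract entities from the retained articles."""
    return [extract_entities(a) for a in articles if is_relevant(a)]


def store(articles, conn):
    """Write the retained articles to a hypothetical `articles` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (species TEXT, prices TEXT, quantities TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [(a.species, "; ".join(a.prices), "; ".join(a.quantities)) for a in articles],
    )
    conn.commit()
```

A usage example under the same assumptions: passing a list of `Article` objects to `run_pipeline` and then to `store` with a `sqlite3.connect(":memory:")` connection yields a table that can be queried with ordinary SQL, for instance `SELECT species, prices FROM articles`.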
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Keywords: conservation, machine learning, natural language processing, CITES, online news, threatened species