Yle multimodal media and machine translation dataset

Description

This dataset contains browse-quality video files for 113 news, current affairs and factual programs, accompanied by parallel multilingual subtitles and program metadata from Yle production systems, such as production years, genre classifications and topical segmentation timecodes. The dataset is split into three subtitle language pairs (FIN-ENG, FIN-SWE and SWE-ENG), with some additional content to demonstrate typical professional media products such as news broadcasts. It contains 59.95 hours of media in total.

Yle has released three datasets under an experimental license for a limited time to support the development of language and media related technologies. These datasets were originally created by the MeMAD research and innovation project, a collaboration between media industry members and research groups. The MeMAD project received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 780069.

LICENSE INFORMATION: The data is available for research purposes upon specific request from Yle. The party requesting the data must be located in Finland to gain access to it; other project partners do not need to be. Please see the website at https://developer.yle.fi/en/data/avdata/index.html for more detailed terms and conditions. Requests can be made until the end of 2022 by submitting the form available via the website.

Year of publication

2022

Dataset type

Authors

Yleisradio Oy

Lauri Saarikoski - Curator

Tuomas Nolvi - Curator

Project

Other information

Fields of science

Linguistics

Language

English, Finnish, Swedish

Availability

Restricted access

License

Other

Keywords

Subject headings

Temporal coverage
