Donate Speech (Lahjoita puhetta) dataset: Development data (10h)

Description

This resource is available for download from Kielipankki - The Language Bank of Finland as part of "Donate Speech: Selected dataset", http://urn.fi/urn:nbn:fi:lb-2022060127. It contains a 10-hour subset of transcribed speech that was selected from the Donate Speech Corpus and used to develop an ASR system at Aalto University. The development data includes at least ten minutes of speech for each metadata class in each of the five metadata domains (age, dialect, gender, native/non-native and theme). The set contains speech from 103 different speakers, according to the metadata accompanying the original recordings. The gender ratio has been debiased so that over 40% of the speakers in the set are male. This is similar to the puhelahjat-test set, whereas the puhelahjat-train set has just over 20% male speakers. For speech technology development, the development dataset can be used together with the puhelahjat-test and puhelahjat-train datasets. No speaker appears in more than one of these three sets.

Publication year

2022

Dataset type

Authors

Aalto University

Anssi Moisio - Curator

University of Helsinki - Curator

Project

Other information

Fields of science

Linguistics

Language

Finnish

Availability

Restricted access

License

CLARIN RES (Restricted) End User License 1.0

Keywords

Subject headings

Temporal coverage

Related datasets