Finnish First Encounter

Description

## Introduction

The research material consists of the Finnish first encounters dialogue corpus, collected as part of the NOMCO project, a Nordic cooperation project. The project aims to develop and analyze multimodal spoken language corpora in the Nordic countries, and to compare communication strategies in three languages (Danish, Finnish, and Swedish). The main goals of the project are the following:

1. Providing comparative annotated multimodal data.
2. Using these data to investigate specific communicative phenomena such as feedback and turn-taking.
3. Developing, extending and adapting models of multimodal interactive communication management that can serve as a basis for interactive systems.
4. Applying machine learning techniques in order to test the possibilities for automatically recognizing or predicting hand gestures, head movements and facial expressions with different interactive communication functions.

Within the project, dialogue corpora of the same activity type, first encounters, have been collected in the different languages. Sharing the activity type makes comparisons between languages and cultures possible.

## Data collection

The interactions in the first encounters corpora involve two subjects who are standing in front of a light background. The participants were instructed to get to know each other in a short interaction, as they might do at a party or a reception. After the recording they answered a questionnaire about their reactions to both the interlocutor and the interaction setting. The dataset consists of 16 video recordings of first-encounter interactions in Finnish. The video recordings show both participants at the same time as well as each participant individually, and there is also a mosaic version with the individual and the joint video streamed together in one video.

In this study we concentrate on individual behaviours and not on interpersonal communication, and thus used the individual videos rather than the videos with both participants as research material. The 14 participants did not know each other in advance, and were given the task of getting to know each other by having a conversation. Eleven of the participants had two conversations, each with a different partner, while three took part in only one conversation. Although one's conversational activity depends on the partner, it seems useful to distinguish between the first and the second conversation, since in the second one the speaker is familiar with the recording situation and can therefore feel more relaxed and attentive to the partner. In fact, in terms of speaking time, the participants seem to speak about 7% more in their first conversation than in the second one. This can be interpreted as supporting the assumption: in the second conversation they feel more experienced and in control of the situation, and so are more ready to follow the partner and let the partner speak.

## Dataset statistics and organization

There are 14 participants, 4 males and 10 females, all native speakers of Finnish. Of the 16 collected conversations, there are 2 male-male conversations, 6 male-female conversations, and 8 female-female conversations. The shortest conversation is 3 minutes 49 seconds and the longest 8 minutes 2 seconds. The average length of the conversations is 6 minutes 25 seconds.

## Content and annotations

The dataset provides recordings in two basic types of data, audio and video, which are annotated for three different tasks:

* The transcriptions are provided in Finnish and in English translation.
* The laughter annotations mark the laughing events of each speaker.
* The topic annotations, in English, specify the topic under discussion over the course of the conversation.

For more information on the structure of the dataset, see the Readme.txt included in the dataset.
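The duration figures given under the dataset statistics can be turned into a rough estimate of the total amount of recorded conversation. The following is a minimal sketch in Python; the helper `to_seconds` and the variable names are illustrative only and are not part of the dataset. Since the stated average is presumably rounded to the nearest second, the total should be read as an approximation.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a duration given as minutes and seconds into seconds."""
    return minutes * 60 + seconds


# Figures taken from the dataset description.
N_CONVERSATIONS = 16
shortest = to_seconds(3, 49)   # shortest conversation
longest = to_seconds(8, 2)     # longest conversation
mean = to_seconds(6, 25)       # average conversation length

# Approximate total recording time: number of conversations times the mean.
total = N_CONVERSATIONS * mean
total_min, total_sec = divmod(total, 60)

print(f"shortest: {shortest} s, longest: {longest} s, mean: {mean} s")
print(f"estimated total: {total} s (about {total_min} min {total_sec} s)")
```

Under these figures the 16 conversations amount to roughly 100 minutes of dialogue in total.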

Publication year

2019

Type of material

Authors

University of Helsinki - Publisher

Kristiina Jokinen - Curator, Rights holder, Contributor, Creator

Project

Other information

Fields of science

Computer and information sciences; HUMANITIES; Linguistics

Language

English, Finnish

Availability

Requires applying for permission in the Fairdata service

License

Other

Keywords

conversational video, interactive engagement, multimodal corpora

Subject headings

Temporal coverage

Related datasets