Speech Processing Group

Datasets

VoxAccent Dataset

Here we present VoxAccent, a list of audio files selected from the VoxCeleb dataset. It is a .txt file containing the id, gender, and nationality of the speakers whose voices appear in the selected recordings. VoxAccent is a test set, collected to study fairness in speaker verification systems.
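As a rough illustration of how such a list might be used, the Python sketch below parses speaker entries and counts speakers per (gender, nationality) group, which is a typical first step before comparing verification performance across groups. The file layout is an assumption: one speaker per line, with whitespace-separated id, gender, and nationality fields. The names `Speaker`, `parse_speaker_list`, and `count_by_group` are hypothetical, and the sample rows are made up; check the actual file before relying on this.

```python
from collections import Counter
from typing import NamedTuple


class Speaker(NamedTuple):
    speaker_id: str
    gender: str
    nationality: str


def parse_speaker_list(text: str) -> list[Speaker]:
    """Parse lines of the ASSUMED form '<id> <gender> <nationality>'."""
    speakers = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only on the first two whitespace runs so that
        # multi-word nationalities stay in one field.
        speaker_id, gender, nationality = line.split(maxsplit=2)
        speakers.append(Speaker(speaker_id, gender, nationality))
    return speakers


def count_by_group(speakers: list[Speaker]) -> Counter:
    """Count speakers per (gender, nationality) pair."""
    return Counter((s.gender, s.nationality) for s in speakers)


# Made-up sample rows; the real file may use a different layout.
sample = """\
spk001 f Ireland
spk002 m India
spk003 m India
spk004 f New Zealand
"""

speakers = parse_speaker_list(sample)
groups = count_by_group(speakers)
```

Group counts like these help reveal whether some accents or genders are too sparsely represented for a per-group error rate to be meaningful.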

EpaDB: A database for development of pronunciation assessment systems

EpaDB is a database designed for the development and evaluation of pronunciation assessment systems. It contains 3200 phonetically balanced English sentences produced by Argentine speakers who are learning English. Each sentence is annotated at the allophone level by two expert annotators. The sentences were recorded on the participants' own computers to simulate the expected usage environment for the systems being developed.

Trust-UBA: Detecting trust from the trustor's voice

The dataset protocol consists of an interactive session in which the subject is asked to answer a series of factual questions with the help of a virtual assistant. To induce subjects to either trust or distrust the agent's skills, they are first told that other users previously rated it as either good or bad; the agent then answers the subjects' questions in a manner consistent with its alleged abilities. All interactions are speech-based, with subject and agent communicating verbally, which allows the recording of speech produced under different trust conditions.

UBA Games Corpus

The UBA Games Corpus is a collection of spontaneous dialogues and monologues produced by native speakers of Argentine Spanish. It includes 706 minutes of dialogues in which speakers solve collaborative tasks (the same games as in the Columbia Games Corpus), as well as 119 minutes of monologues consisting of directions for navigating the city, in both spontaneous and read versions (similar to the Boston Directions Corpus). The corpus includes audio recordings, orthographic transcriptions, and other annotations.

Spanish DAL: Diccionario de Afectos en Español

SDAL is a lexicon of 2880 words manually annotated along three affective dimensions: pleasantness, activation, and imaginability.