Speech Processing Group

Datasets

VoxAccent Dataset

Here we present VoxAccent, a list of audio files selected from the VoxCeleb dataset. It is distributed as a .txt file containing the ID, gender, and nationality of the speakers in the selected audio files. VoxAccent is a test set, collected with the objective of studying fairness in speaker verification systems.


EpaDB: A database for development of pronunciation assessment systems

EpaDB is a database designed for the development and evaluation of pronunciation rating systems. It contains 3200 phonetically balanced English sentences produced by Argentine speakers learning English. Each sentence is annotated at the allophone level by two expert annotators. The sentences were recorded on the participants' own computers to simulate the expected usage environment of the systems being developed.


Trust-UBA: Detecting trust from the trustor's voice

The protocol of the dataset consists of an interactive session in which the subject is asked to answer a series of factual questions with the help of a virtual assistant. To induce subjects to either trust or distrust the agent's skills, they are first told that the agent was previously rated by other users as either good or bad; the agent then answers the subjects' questions in a way consistent with its alleged abilities. All interactions are speech-based, with subjects and agents communicating verbally, which allows the recording of speech produced under different trust conditions.


UBA Games Corpus

The UBA Games Corpus is a collection of spontaneous dialogues and monologues produced by native speakers of Argentine Spanish. It includes 706 minutes of dialogues in which collaborative tasks are solved (the same games as in the Columbia Games Corpus), as well as 119 minutes of monologues consisting of directions for navigating the city, with both spontaneous and read versions (similar to the Boston Directions Corpus). The corpus includes audio recordings, orthographic transcriptions, and other annotations.


Spanish DAL: Diccionario de Afectos en Español

SDAL is a lexicon of 2880 words manually annotated along three affective dimensions: pleasantness, activation, and imaginability.


Code Repositories

Confidence intervals

This toolkit provides a simple implementation of the bootstrap method for computing confidence intervals on evaluation metrics in machine learning.
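As a rough illustration of the technique (not the toolkit's own code or interface), the sketch below computes a percentile bootstrap interval with NumPy: the evaluation set is resampled with replacement, the metric is recomputed on each resample, and the alpha/2 and 1 - alpha/2 percentiles of the resulting values give the interval. The function names `bootstrap_ci` and `accuracy` and the synthetic data are hypothetical.

```python
import numpy as np

def bootstrap_ci(labels, preds, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for metric(labels, preds).

    Hypothetical helper: resamples individual samples with replacement,
    recomputes the metric on each bootstrap set, and returns the point
    estimate together with the (alpha/2, 1 - alpha/2) percentiles.
    """
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    rng = np.random.default_rng(seed)
    n = len(labels)
    values = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices drawn with replacement
        values[i] = metric(labels[idx], preds[idx])
    low, high = np.percentile(values, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(labels, preds), (low, high)

def accuracy(labels, preds):
    """Fraction of samples whose prediction matches the label."""
    return float(np.mean(labels == preds))

# Synthetic example: a classifier that is correct roughly 80% of the time.
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=500)
preds = np.where(rng.random(500) < 0.8, labels, 1 - labels)
point, (low, high) = bootstrap_ci(labels, preds, accuracy)
print(f"accuracy = {point:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Any metric with the signature `metric(labels, preds)` can be plugged in; the interval narrows as the evaluation set grows.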


Calibration Tutorial

A tutorial and toolkit for the calibration of classification systems.
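To give a concrete idea of what calibrating a classifier means, here is a minimal sketch of one common approach, affine calibration of binary scores via two-parameter logistic regression (also known as Platt scaling), fitted with Newton's method in NumPy. It is an illustration under our own assumptions; the tutorial may cover other methods, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_entropy(probs, labels):
    """Mean binary cross-entropy (in nats) of the probabilities for class 1."""
    probs = np.clip(probs, 1e-12, 1 - 1e-12)
    return float(-np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs)))

def fit_affine_calibration(scores, labels, n_iter=25):
    """Fit p(class 1 | s) = sigmoid(a * s + b) by maximum likelihood
    using Newton's method (a two-parameter logistic regression)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    X = np.column_stack([scores, np.ones_like(scores)])  # columns: score, bias
    w = np.zeros(2)
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - labels) / len(labels)
        hess = (X * (p * (1 - p))[:, None]).T @ X / len(labels)
        w -= np.linalg.solve(hess, grad)  # Newton step
    return w[0], w[1]  # slope a, offset b

# Synthetic, overconfident scores: the true log-odds is about 0.4 * s,
# but applying the sigmoid to s directly treats s itself as the log-odds.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=2000)
scores = 5.0 * rng.normal(loc=2.0 * labels - 1.0, scale=1.0)
a, b = fit_affine_calibration(scores, labels)
print(f"a = {a:.3f}, b = {b:.3f}")
print("raw CE:", cross_entropy(sigmoid(scores), labels))
print("calibrated CE:", cross_entropy(sigmoid(a * scores + b), labels))
```

On this synthetic data the fitted slope should land near 0.4, and the calibrated cross-entropy should be clearly lower than that of the raw scores, since calibration corrects the overconfidence without changing how the scores rank the samples.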


EnCodecMAE

EnCodecMAE is a general audio embedding presented in "EnCodecMAE: Leveraging neural codecs for universal audio representation learning". This codebase makes it easy to extract these embeddings and to replicate the paper's results.


Expected cost

Methods for computing the expected cost (EC) on an evaluation dataset. Methods for calibrating system outputs and for making categorical decisions using Bayes decision theory are also provided.
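A minimal NumPy sketch of these two ideas follows, assuming a cost matrix `costs[i, j]` that gives the cost of deciding `j` when the true class is `i`. The Bayes decision for each sample is the one that minimizes the posterior-weighted cost, and the EC is the prior-weighted average of the costs incurred on the evaluation data. The function names and the cost-matrix convention are our own assumptions, not the repository's API.

```python
import numpy as np

def bayes_decisions(posteriors, costs):
    """For each sample, pick the decision with the lowest expected cost.

    posteriors: (N, K) array, row t holds p(class i | x_t).
    costs:      (K, D) array, costs[i, j] = cost of deciding j when the true class is i.
    """
    expected = posteriors @ np.asarray(costs)  # (N, D): expected cost of each decision
    return np.argmin(expected, axis=1)

def expected_cost(labels, decisions, costs, priors=None):
    """EC = sum_i priors[i] * sum_j costs[i, j] * R[i, j], where R[i, j] is the
    fraction of class-i samples that received decision j."""
    labels = np.asarray(labels)
    decisions = np.asarray(decisions)
    costs = np.asarray(costs)
    K, D = costs.shape
    if priors is None:
        priors = np.bincount(labels, minlength=K) / len(labels)  # empirical priors
    ec = 0.0
    for i in range(K):
        mask = labels == i
        if not mask.any():
            continue  # classes absent from the data contribute nothing here
        R_i = np.bincount(decisions[mask], minlength=D) / mask.sum()
        ec += priors[i] * np.dot(costs[i], R_i)
    return float(ec)

costs01 = 1.0 - np.eye(3)  # 0-1 costs: every error costs 1, correct decisions cost 0
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.6, 0.3],
                       [0.2, 0.3, 0.5],
                       [0.5, 0.3, 0.2]])
labels = np.array([0, 1, 1, 0])
decisions = bayes_decisions(posteriors, costs01)
print(decisions, expected_cost(labels, decisions, costs01))
```

With 0-1 costs the Bayes decision reduces to picking the class with the highest posterior, and with empirical priors the EC reduces to the error rate (here one of the four samples is misclassified, giving 0.25). Asymmetric cost matrices shift the decisions toward avoiding the more expensive errors.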


ser-with-w2v2

Official implementation of the paper 'Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings'. Code for reproducing the results, as well as model checkpoints, is available.


ast-pe

Implementation of the paper 'Study of positional encoding approaches for Audio Spectrogram Transformers'. Code to reproduce the paper's results, model checkpoints, and Colab tutorials are provided.


DCA-PLDA

This repository implements the Discriminative Condition-Aware PLDA Backend (DCA-PLDA) for speaker verification.