# Text Preprocessing for Spanish TTS

<img style="float: right" src="./files/tts.png">

A **Text-To-Speech (TTS)** system takes text as input and produces audio of the corresponding speech as output. This project implements the input-text preprocessing modules for a screen reader of Spanish-language Wikipedia articles. Preprocessing involves several steps:

1. extracting the relevant text from the HTML page;
1. segmenting the input text into sentences;
1. phonetically transcribing foreign words (e.g., 'hello' → 'jalóu');
1. expanding abbreviations and numeric expressions (e.g., '$2' → 'dos pesos', 'DGI' → 'de ge i').

These modules were implemented as part of the Licenciatura theses of Ezequiel Saudino (March 2015) and Verónica Pechersky (December 2012), both advised by A. Gravano.

## Documentation

The implemented modules are described in detail in:

* Ezequiel Saudino, "[Preprocesamiento y Normalización del Texto de un Sistema de Conversión Texto-Habla](http://www.dc.uba.ar/academica/tesis-de-licenciatura/2015/saudino.pdf)", Licenciatura thesis, Departamento de Computación, FCEyN, Universidad de Buenos Aires. March 2015.

The methodology used in the text normalization module is further described in:

* Verónica Pechersky, "[Normalización del texto de entrada para un sistema de síntesis del habla](http://www.dc.uba.ar/academica/tesis-de-licenciatura/2012/pechersky.pdf)", Licenciatura thesis, Departamento de Computación, FCEyN, Universidad de Buenos Aires. December 2012.

## Source code and other resources

The source code of the implemented modules is released under the [Apache License 2.0](http://www.apache.org/licenses/) and may be downloaded for free from [GitHub](https://github.com/aganha/tts-preprocessing).

Other available resources:

* [List of abbreviations](https://github.com/aganha/tts-preprocessing/blob/master/Normalizador-0.3/Abreviaturas.txt).
* [List of acronyms and initialisms](https://github.com/aganha/tts-preprocessing/blob/master/Normalizador-0.3/siglas.txt).
* [List of phonetic transcriptions of English words](https://github.com/aganha/tts-preprocessing/blob/master/Traductor-0.3/PalabrasIngles.txt).

---

**Last updated:** 30 Aug 2016
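The preprocessing steps described at the top of this page (sentence segmentation, transcription of foreign words, and expansion of initialisms and currency amounts) can be illustrated with a toy Python sketch. This is not the project's code. The function names `segment`, `normalize` and `preprocess`, the tiny lexicons, and the Spanish letter-name and number tables are all hypothetical stand-ins for the real resources linked above (`Abreviaturas.txt`, `siglas.txt`, `PalabrasIngles.txt`). Step 1 (HTML extraction) is omitted.

```python
import re

# Hypothetical toy lexicons for illustration only; the real modules load
# much larger lists (abbreviations, initialisms, English words) from files.
FOREIGN = {"hello": "jalóu"}

# Assumed Spanish letter names, used to spell out initialisms such as "DGI".
LETTER_NAMES = {
    "A": "a", "B": "be", "C": "ce", "D": "de", "E": "e", "F": "efe",
    "G": "ge", "H": "hache", "I": "i", "J": "jota", "K": "ka", "L": "ele",
    "M": "eme", "N": "ene", "Ñ": "eñe", "O": "o", "P": "pe", "Q": "cu",
    "R": "erre", "S": "ese", "T": "te", "U": "u", "V": "uve",
    "W": "uve doble", "X": "equis", "Y": "ye", "Z": "zeta",
}

# Toy number table covering only 0..15.
NUMBERS = ["cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete",
           "ocho", "nueve", "diez", "once", "doce", "trece", "catorce", "quince"]


def segment(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def expand_currency(match):
    """Expand '$N' into Spanish words, e.g. '$2' -> 'dos pesos'."""
    n = int(match.group(1))
    if n >= len(NUMBERS):
        return match.group(0)  # outside the toy table: leave unchanged
    if n == 1:
        return "un peso"  # 'uno' apocopates to 'un' before a masculine noun
    return f"{NUMBERS[n]} pesos"


def spell_initialism(word):
    """Spell an all-caps initialism letter by letter: 'DGI' -> 'de ge i'."""
    return " ".join(LETTER_NAMES[c] for c in word)


def normalize_token(tok):
    # Foreign words are looked up first, then all-caps initialisms.
    if tok.lower() in FOREIGN:
        return FOREIGN[tok.lower()]
    if len(tok) >= 2 and tok.isupper() and all(c in LETTER_NAMES for c in tok):
        return spell_initialism(tok)
    return tok


def normalize(sentence):
    """Expand currency amounts, then rewrite each word token."""
    sentence = re.sub(r"\$(\d+)", expand_currency, sentence)
    return re.sub(r"\w+", lambda m: normalize_token(m.group(0)), sentence)


def preprocess(text):
    """Segment the text into sentences and normalize each one."""
    return [normalize(s) for s in segment(text)]


print(preprocess("Hello, la DGI cobra $2. ¿Qué tal?"))
```

Run as a script, this prints `['jalóu, la de ge i cobra dos pesos.', '¿Qué tal?']`. A splitter this naive would break sentences after abbreviations such as "Sr.", which is one reason a real segmenter needs an abbreviation list. Likewise, spelling every all-caps token would mispronounce acronyms that are read as words (e.g., "ONU"), which is why the project distinguishes them with its own list of acronyms and initialisms.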