Research reports
Conference papers:
Language Identification based on a Discriminative Text Categorization Technique
Year: 2012

Research areas
  • Electronic and communications technology
  • Electrical, electronic, and automatic engineering

Details
Description
In this paper, we describe new results and improvements to a language identification (LID) system based on PPRLM, previously introduced in [1] and [2]. Here, we use the parallel phone recognizers provided by the Brno University of Technology for Czech, Hungarian, and Russian, and instead of traditional n-gram language models we use a language model built from a ranking of the most frequent and discriminative n-grams. In this language model approach, the distance between the ranking for the input sentence and the ranking for each language is computed from the difference in relative positions of each n-gram. This approach can reliably model longer-span information than traditional language models, yielding more reliable estimates. We also describe the modifications that we have introduced over time to the original ranking technique, e.g., different discriminative formulas to establish the ranking, variations of the template size, the suppression of repeated consecutive phones, and a new clustering technique for the ranking scores. Results show that this technique provides a 12.9% relative improvement over PPRLM. Finally, we also describe results where the traditional PPRLM and our ranking technique are combined.
International
Yes
Conference name
IberSPEECH 2012 - VII Jornadas en Tecnología del Habla and III Iberian SLTech Workshop
Type of participation
960
Conference location
Madrid, Spain
Reviewers
Yes
ISBN or ISSN
84-616-1535-2
DOI
Conference start date
21/11/2012
Conference end date
22/11/2012
From page
193
To page
203
Proceedings title
IberSPEECH 2012 - VII Jornadas en Tecnología del Habla and III Iberian SLTech Workshop
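
The ranking-based language model summarized in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a plain frequency ranking (the paper's discriminative formulas and score clustering are not reproduced), an out-of-place penalty equal to the template size for n-grams missing from a language ranking, and hypothetical function names (`collapse_repeats`, `ngram_ranking`, `ranking_distance`, `identify`) chosen only for this example.

```python
from collections import Counter


def collapse_repeats(phones):
    """Suppress repeated consecutive phones, e.g. a a b a -> a b a."""
    out = []
    for p in phones:
        if not out or out[-1] != p:
            out.append(p)
    return out


def ngram_ranking(phones, max_n=3, template_size=100):
    """Map each of the top `template_size` n-grams to its rank (0 = most frequent).

    Assumption: ranking is by raw frequency only; the paper also uses
    discriminative criteria that are not modeled here.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    # Sort by descending count; break ties on the n-gram itself for determinism.
    ordered = sorted(counts, key=lambda g: (-counts[g], g))
    return {g: rank for rank, g in enumerate(ordered[:template_size])}


def ranking_distance(input_rank, lang_rank, template_size=100):
    """Average difference in rank position between two rankings.

    Assumption: an n-gram absent from the language ranking costs
    `template_size`, the maximum possible displacement.
    """
    total = 0
    for gram, rank in input_rank.items():
        if gram in lang_rank:
            total += abs(rank - lang_rank[gram])
        else:
            total += template_size
    return total / max(len(input_rank), 1)


def identify(phones, lang_rankings, max_n=3, template_size=100):
    """Return the language whose ranking is closest to the input's ranking."""
    phones = collapse_repeats(phones)
    input_rank = ngram_ranking(phones, max_n, template_size)
    return min(
        lang_rankings,
        key=lambda lang: ranking_distance(
            input_rank, lang_rankings[lang], template_size
        ),
    )
```

In a PPRLM-style setup, one such ranking per target language would be trained on the output of each phone recognizer, and the per-recognizer distances would then be combined; how they are combined is left open here because the description does not specify it.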

This activity belongs to research reports

Participants

Related research groups, departments, centers, and R&D&I institutes
  • Creator: Research group: Grupo de Tecnología del Habla
  • Department: Ingeniería Electrónica