Adaptation approaches for pronunciation scoring with sparse training data

Bibliographic Details
Main authors: Landini, F., Ferrer, L., Franco, H., Karpov A., Mporas I., Potapova R., ASM Solutions Ltd.
Format: SER
Online access: http://hdl.handle.net/20.500.12110/paper_03029743_v10458LNAI_n_p87_Landini
Description
Summary: In Computer Assisted Language Learning systems, pronunciation scoring consists of providing a score that grades the overall pronunciation quality of the speech uttered by a student. In this work, a log-likelihood ratio obtained with respect to two automatic speech recognition (ASR) models was used as the score. One model represents native pronunciation, while the other captures non-native pronunciation. Different approaches to obtaining each model and different amounts of training data were analyzed. The best results were obtained by training an ASR system on a separate large corpus without pronunciation quality annotations and then adapting it, sequentially, to the native and non-native data. Nevertheless, when the models are trained directly on the native and non-native data, pronunciation scoring performance is similar. This is a surprising result considering that word error rates for these models are significantly worse, indicating that ASR performance is not a good predictor of pronunciation scoring performance for this system. © Springer International Publishing AG 2017.
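
As a rough illustration of the scoring idea in the summary, the sketch below computes a log-likelihood ratio from per-frame log-likelihoods under a native and a non-native model. The function name `llr_score`, the per-frame input lists, and the averaging over frames are assumptions made for illustration only; the summary does not say how the ratio is normalized or how the likelihoods are extracted from the ASR models.

```python
from typing import Sequence


def llr_score(native_loglik: Sequence[float],
              nonnative_loglik: Sequence[float]) -> float:
    """Frame-averaged log-likelihood ratio (hypothetical helper).

    Both inputs hold one log-likelihood per speech frame, computed for the
    same utterance under the native and the non-native model respectively.
    Averaging over frames is an assumption, used here so that utterances of
    different lengths yield comparable scores.
    """
    if len(native_loglik) != len(nonnative_loglik):
        raise ValueError("both models must score the same number of frames")
    if len(native_loglik) == 0:
        raise ValueError("at least one frame is required")

    # log p(X | native) - log p(X | non-native), summed over frames
    total_ratio = sum(native_loglik) - sum(nonnative_loglik)
    return total_ratio / len(native_loglik)


# Toy values, not taken from the paper: the native model fits better,
# so the score is positive.
score = llr_score([-1.0, -2.0], [-2.0, -4.0])
```

Under this convention, a higher score means the utterance is better explained by the native model than by the non-native one.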