Fine-tuning an Automatic Speech Recognition Model for a Canadian Indigenous Counselling Program
Keywords:
Speech Recognition, Data Science, Data Selection, Machine Learning
Abstract
Automatic Speech Recognition (ASR) systems are programs that transcribe or identify spoken language. Most modern ASR systems are built with end-to-end neural networks, and their performance depends largely on the quantity and quality of the available speech training data. A lack of accented speech data can therefore lead to poor ASR performance on niche accents and voice types. The ASR model presented in this paper is designed to work within interactive VR counselling software in which Canadian Indigenous youth interact with an elder. This paper outlines the use of fine-tuning and other data processing techniques to minimize the Word Error Rate (WER) of our ASR model. These techniques provide valuable insight into data selection and processing.