Enhancing the performance of machine learning models developed for neurodiagnostic testing of children with Auditory Processing Disorder
Keywords: Machine Learning, Auditory Processing Disorder, Auditory Brainstem Response, Signal Feature Extraction
Introduction: Auditory Processing Disorder (APD), present in 2-3% of school-aged children worldwide, disrupts the ability to perceive and understand auditory information. Neurophysiologic evidence of APD can be seen through evaluation of the auditory brainstem response (ABR) at the suprathreshold level, but this evaluation requires extensive training and can be labor intensive. A machine learning (ML) model was recently developed at Western University to automate ABR analysis with an accuracy of 92%. Further development of the model was, however, limited by a lack of clinical datasets. Active learning (AL) techniques can be used to train ML models selectively and enhance their performance while requiring fewer datasets. The present work aimed to improve the accuracy of the previously developed ML model by utilizing an AL strategy.
Methods: Signal features were extracted from the auditory waveforms of 136 children (age range: 5-16 years; 88 male, 48 female) suspected of APD and used to develop an ML model capable of detecting abnormal ABRs with an accuracy of 92%. Next, AL techniques were used to improve the accuracy of the model. In AL, the learner (the ML model) queries the oracle, or human annotator, for the labels of the samples it is least certain how to classify. Once it receives the requested labels, it adds the newly labeled samples to the training set and retrains the model. This iterative procedure allows the model to learn from the queries and use its newly acquired knowledge to improve its accuracy. In designing the active learning loop, a Query by Committee (QBC) technique with pool-based sampling was used. In QBC, a group of ML models with competing hypotheses is used to identify the most informative instance to be queried.
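The query loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model used in this work: the one-dimensional feature, the threshold classifier (train_stump), the bootstrap-resampled committee, the vote-entropy disagreement score, and all function names are hypothetical choices made for the example. The abstract does not specify the committee members or the disagreement measure.

```python
import math
import random
from collections import Counter


def vote_entropy(votes):
    """Committee disagreement for one sample (zero when the vote is unanimous)."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def train_stump(samples):
    """Toy 1-D classifier (assumption): threshold at the midpoint of the class means.

    Assumes class 1 (abnormal) tends to have the larger feature value.
    """
    normal = [x for x, y in samples if y == 0]
    abnormal = [x for x, y in samples if y == 1]
    threshold = (sum(normal) / len(normal) + sum(abnormal) / len(abnormal)) / 2
    return lambda x: int(x > threshold)


def train_committee(labeled, size, rng):
    """Build competing hypotheses by training each member on a bootstrap resample.

    Resampling is done per class so every member sees both classes.
    """
    normal = [s for s in labeled if s[1] == 0]
    abnormal = [s for s in labeled if s[1] == 1]
    committee = []
    for _ in range(size):
        boot = rng.choices(normal, k=len(normal)) + rng.choices(abnormal, k=len(abnormal))
        committee.append(train_stump(boot))
    return committee


def active_learning_loop(labeled, pool, oracle, n_queries, committee_size=5, seed=0):
    """Pool-based QBC: query the oracle for the sample the committee disagrees on most."""
    rng = random.Random(seed)
    labeled, pool = list(labeled), list(pool)
    for _ in range(n_queries):
        if not pool:
            break
        committee = train_committee(labeled, committee_size, rng)
        scores = [vote_entropy([member(x) for member in committee]) for x in pool]
        idx = max(range(len(pool)), key=scores.__getitem__)
        x = pool.pop(idx)                # remove the queried sample from the pool
        labeled.append((x, oracle(x)))   # the oracle supplies the label; retrain next round
    return labeled, pool
```

In practice the oracle is the human annotator rather than a function, and the committee members would be the competing ML models trained on the extracted ABR signal features.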
Results: The accuracy of the model improved to 97% (an increase of 5 percentage points) following 50 AL queries (Fig. 1). In addition, the AL technique required 127 clinical waveforms to reach this accuracy, whereas the previous supervised learning technique required 231 waveforms.
Conclusion: APD is a rare disorder with limited availability of clinical data needed to train ML models. Here, the use of active learning (AL) was found to be a suitable approach to improve the accuracy of a supervised learning model while requiring a smaller number of clinical datasets.
Acknowledgment: This work was supported by the Ontario Research Fund and the Natural Sciences and Engineering Research Council of Canada.