This work explores how mismatches between adults' and children's speech, arising from differences in various acoustic correlates, affect automatic speech recognition performance under mismatched conditions. The correlates studied include the pitch, the speaking rate, the glottal parameters (open quotient, return quotient, and speech quotient), and the formant frequencies. An effort is made to quantify the effect of these correlates by explicitly normalizing each of them using existing techniques from the literature. Our initial study, done on a connected digit recognition task, shows that among these parameters only the formant frequencies, the pitch, and the speaking rate affect automatic speech recognition performance. Normalizing these three parameters yields significant improvements in performance. With combined normalization of the pitch, the speaking rate, and the formant frequencies, relative improvements of 80% and 70% over the baseline are obtained for children's speech and adults' speech recognition, respectively, under mismatched conditions.

In recent years, the development of speech recognition systems has enhanced the use of machines and other interactive multimedia systems in diverse areas. Children have now also become potential users of these systems, and there is therefore a need for children's speech recognition. This will make it possible for them to interact with machines for tasks such as reading tutors, language learning, information retrieval, and entertainment applications.
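For readers unfamiliar with the term, the "relative improvement over the baseline" reported in the abstract can be illustrated with a short calculation. This is a minimal sketch that assumes the underlying metric is word error rate (WER), which the abstract does not state. The function name `relative_improvement` and the numbers used are hypothetical and are not results from the study.

```python
def relative_improvement(baseline_wer: float, system_wer: float) -> float:
    """Return the percentage reduction in error rate relative to the baseline.

    Assumes both arguments are error rates on the same scale (e.g. WER in %).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical values chosen only to illustrate the arithmetic;
# they are NOT taken from the study.
baseline = 20.0    # assumed WER (%) without normalization
normalized = 4.0   # assumed WER (%) after normalization

print(relative_improvement(baseline, normalized))  # 80.0
```

Under this definition, an 80% relative improvement means the error rate falls to one fifth of its baseline value, regardless of how large the baseline error was.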