Development of an Optimal Extracted Feature Classification Scheme in a Voice Recognition System Using a Dynamic Cuckoo Search Algorithm
CHAPTER ONE
Aim and Objectives
This research aims to develop an optimal extracted feature classification scheme for a voice recognition system using a dynamic cuckoo search algorithm. The aim was accomplished through the following objectives:
- Obtaining standard voice data from the English Language Speech Database for Speaker Recognition (ELSDSR) of the Technical University of Denmark (DTU), then processing the data and extracting key features for the voice recognition system (VRS).
- Developing a dynamic Cuckoo Search Algorithm (dCSA) for an optimal extracted feature classification scheme in the voice recognition system, using the same dataset obtained from the ELSDSR database of DTU.
- Validating the scheme by comparing the performance of the VRS under a standard CSA-based scheme with its performance under the dCSA-based scheme, using accuracy as the performance metric, in order to determine the improvement in the VRS.
CHAPTER TWO
LITERATURE REVIEW
Introduction
This chapter comprises a review of fundamental concepts and a review of similar works. The review of fundamental concepts gives an overview of the concepts and theories that form the basis of the study, while the review of similar works examines the literature to establish limitations and gaps in knowledge related to this study.
Review of Fundamental Concepts
This section reviews the concepts and theories that are fundamental to this study, including voice, speech production, VRS, classification techniques, feature extraction, feature matching, CSA, Lévy flight, inertia weight and standard optimization test functions.
Voice
Voice, as defined earlier, refers to sound produced in a person’s larynx and articulated through the mouth, as speech or song (Das & Nahar, 2016). Speech conveys several levels of information to the listener: at the primary level it conveys a message via words, while at the secondary level it conveys information about the language being spoken and the emotion, gender and, more generally, the identity of the speaker (Reynolds, 2002). Speech is the foundation of self-expression and the primary means of communication with others. It is the dominant mode of human social bonding and information exchange. To understand speech, humans consider not only the specific information conveyed to the ear, but also the context in which the information is being discussed (Huang et al., 2001). Driven both by technological curiosity about the mechanical realization of human speech capabilities and by the desire to automate simple tasks requiring human-machine interaction, research in speech recognition has attracted attention over the past decades (Juang & Rabiner, 2005).
Speech production
The fundamental element of a voice recognition system is the speech signal produced as voice or sound. Speech production is a common phenomenon encountered in day-to-day human life and is part of communication between people. Human beings can generate a variety of sounds whose loudness and frequency spectrum change rapidly (Mahendru, 2014). The speech production process is shown in Fig. 2.1 (Honda, 2003).
The motor control function is driven by the human brain, which generates a thought of what to speak and at the same time sends control signals through sensory nerves to the speech production organs. These organs then move and take the proper form according to the words to be spoken or the sound to be formed (Mahendru, 2014).
The articulation process is the most obvious one. It takes place in the mouth, and it is the process through which most speech sounds can be differentiated. In the mouth one can distinguish between the oral cavity, which acts as a resonator, and the articulators, which can be active or passive: upper and lower lips, upper and lower teeth, tongue (tip, blade, front, back) and roof of the mouth (alveolar ridge, palate and velum). Speech sounds are therefore distinguished from one another by the place and the manner in which they are articulated (Phonetics & Trujillo). Numerous organs are involved in the production of speech and sound. These organs are flexible, and their shape and size adjust in response to the motor control signals received from the brain, according to the type of speech or sound to be produced. The lungs provide the air force required to generate sound in the form of an acoustic wave. The air passes through the vocal tract (the pipe that links lungs and throat), vocal cords, glottis, epiglottis and other organs in the mouth, and finally comes out through the mouth and nasal cavities in the form of a sound wave (Mahendru, 2014). The organs involved in the production of speech and sound are shown in Fig. 2.2 (Kinnunen & Li, 2010).
CHAPTER THREE
MATERIALS AND METHODS
Introduction
This chapter gives details of the methods, materials and procedures employed for the successful completion of this research work. A standard dataset was obtained, and a voice recognition system using a newly formulated algorithm (dCSA) was developed. The algorithm was used to optimally classify extracted feature vectors from the voice signal. The steps of the methodology were as listed in section 1.6.
Development of Speakers’ Database
Voice samples are primary inputs in the development of a voice recognition system; hence the need for a good database. Standard voice datasets were obtained from the English Language Speech Database for Speaker Recognition (ELSDSR) of the Technical University of Denmark (DTU). The voice data were recorded in a controlled environment to minimize noise. Details about ELSDSR, the recording of the voice data and the recording environment are explained in the following sub-sections.
CHAPTER FOUR
RESULTS AND DISCUSSIONS
Introduction
In this chapter, the behavior of some of the selected voice data is analyzed in order to understand the signal characteristics of the selected voice samples, and the result of the feature extraction process is discussed. The performance of the developed dynamic Cuckoo Search Algorithm (dCSA) and that of the standard Cuckoo Search Algorithm (CSA) were evaluated using the optimization test functions discussed in subsection 2.2.13; the relevant results are reported, and the percentage improvement of dCSA over CSA was determined and recorded. The effectiveness of the developed algorithm was also demonstrated by comparing the dCSA-based classification scheme with the CSA-based scheme in a voice recognition system, using precision and accuracy as the measures of evaluation.
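For orientation, the kind of baseline against which dCSA is compared can be sketched as follows. This is a minimal illustration of a generic cuckoo search with Lévy flights, written in Python; it is not the dCSA developed in this work and not the MATLAB implementation used to obtain the reported results. The function names, the parameter values (15 nests, abandonment probability pa = 0.25, step scale alpha = 0.01, Lévy exponent beta = 1.5) and the choice of the Sphere function as objective are all assumptions made for the example. Lévy step lengths are drawn with Mantegna's algorithm.

```python
import math
import random


def sphere(x):
    """Sphere test function; global minimum 0 at the origin."""
    return sum(v * v for v in x)


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length with Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cuckoo_search(objective, dim, bounds, n_nests=15, pa=0.25,
                  alpha=0.01, iterations=300, seed=0):
    """Minimise `objective` over a box; returns (best_position, best_value).

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return min(max(v, lo), hi)

    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    fitness = [objective(n) for n in nests]

    for _ in range(iterations):
        best = nests[min(range(n_nests), key=fitness.__getitem__)][:]
        # Phase 1: candidate solutions by Levy flights, scaled by the
        # distance of each nest to the current best nest.
        for i in range(n_nests):
            cand = [clip(x + alpha * levy_step(rng) * (x - b))
                    for x, b in zip(nests[i], best)]
            f = objective(cand)
            if f < fitness[i]:  # keep the candidate only if it is better
                nests[i], fitness[i] = cand, f
        # Phase 2: with probability pa per component, perturb a nest using
        # the difference of two randomly chosen nests (nest abandonment).
        for i in range(n_nests):
            j, k = rng.sample(range(n_nests), 2)
            cand = [clip(x + rng.random() * (a - b)) if rng.random() < pa else x
                    for x, a, b in zip(nests[i], nests[j], nests[k])]
            f = objective(cand)
            if f < fitness[i]:
                nests[i], fitness[i] = cand, f

    best_idx = min(range(n_nests), key=fitness.__getitem__)
    return nests[best_idx], fitness[best_idx]


best, best_value = cuckoo_search(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_value)
```

Because a candidate replaces a nest only when it improves the objective, the best value found never gets worse from one iteration to the next, which makes runs with different iteration budgets directly comparable.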
CHAPTER FIVE
SUMMARY AND CONCLUSION
Summary
This research aimed to develop an optimal extracted feature classification scheme for a voice recognition system using a dynamic cuckoo search algorithm. A standard dataset was obtained from the English Language Speech Database for Speaker Recognition (ELSDSR) of the Technical University of Denmark (DTU), and the data were processed and used to train the voice recognition system.
A swarm-intelligence metaheuristic algorithm referred to as the dynamic Cuckoo Search Algorithm (dCSA) was developed to optimally classify the extracted feature vectors used in the Voice Recognition System (VRS). The performance of the VRS with a CSA-based classification scheme was compared with its performance with the developed dCSA-based classification scheme, and the results showed that the dCSA-based scheme produced more accurate results in the VRS than the CSA-based scheme.
Conclusion
Classification is an active and integral part of any voice recognition system and determines the accuracy of recognition. Classical and other traditional methods have been the predominant techniques used in VRS, and their recognition levels can be low. In order to overcome this common problem and increase recognition accuracy, a metaheuristic algorithm was developed to optimally choose and classify the feature vectors of the voice signals used in voice recognition systems. The optimal extracted feature classification scheme using the dynamic Cuckoo Search Algorithm in a VRS was developed in MATLAB R2013b. The performance of the developed algorithm (dCSA) used for the classification technique was evaluated using ten unimodal and multimodal mathematical optimization test functions (Ackley, De Jong, Easom, Rosenbrock, Griewangk, Michalewicz, Rastrigin, Schwefel, Shubert and Sphere). The simulation results show that dCSA performed better than the standard CSA, with a 52% improvement in performance. The dCSA was then used in a voice recognition system to optimally select and classify extracted feature vectors of the speech signal, and the results demonstrated that the dCSA-based scheme has a higher recognition rate (accuracy) than the CSA-based scheme, with an accuracy increase of 3.2%. The results also suggest that optimal classification techniques are promising and may eventually replace the traditional and classical methods currently in use.
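Several of the test functions listed above have simple closed forms. As an illustration, the sketch below gives textbook definitions of five of them in Python (rather than the MATLAB used in this work). The exact variants, dimensions and search ranges used in the experiments are not restated here, so the definitions should be read as the commonly quoted forms and not as the precise ones used to obtain the 52% figure. Sphere and Rosenbrock are usually treated as unimodal, while Rastrigin, Ackley and Griewank (also spelled Griewangk) are multimodal.

```python
import math


def sphere(x):
    """Sphere: minimum 0 at x = (0, ..., 0)."""
    return sum(v * v for v in x)


def rosenbrock(x):
    """Rosenbrock valley: minimum 0 at x = (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


def rastrigin(x):
    """Rastrigin: many regularly spaced local minima; minimum 0 at the origin."""
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v)
                               for v in x)


def ackley(x):
    """Ackley: nearly flat outer region with a deep central hole at the origin."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return (-20.0 * math.exp(-0.2 * math.sqrt(mean_sq))
            - math.exp(mean_cos) + 20.0 + math.e)


def griewank(x):
    """Griewank: minimum 0 at the origin."""
    total = sum(v * v for v in x) / 4000.0
    prod = 1.0
    for i, v in enumerate(x, start=1):
        prod *= math.cos(v / math.sqrt(i))
    return 1.0 + total - prod


# Each function evaluates to (approximately) zero at its known minimiser.
for name, f, x_star in [("sphere", sphere, [0.0, 0.0]),
                        ("rosenbrock", rosenbrock, [1.0, 1.0]),
                        ("rastrigin", rastrigin, [0.0, 0.0]),
                        ("ackley", ackley, [0.0, 0.0]),
                        ("griewank", griewank, [0.0, 0.0])]:
    print(name, f(x_star))
```

Evaluating an optimizer on a mix of unimodal and multimodal functions such as these probes both its convergence speed and its ability to escape local minima.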
Significant Contributions
The significant contributions of this research work are as follows:
- A graphical user interface (GUI)-based optimal classification scheme for extracted feature vectors in a voice recognition system was developed using the dynamic cuckoo search algorithm.
- The developed dCSA-based optimal extracted feature classification scheme produced a recognition accuracy of 93.18%, compared with 90% for the standard CSA-based classification scheme.
- The dynamic cuckoo search algorithm produced an improvement of 52% over the standard cuckoo search algorithm on the optimization test functions.
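The two recognition accuracies reported above can be compared in two ways, and the short calculation below makes the distinction explicit. It uses only the reported values (93.18% and 90%); the function names are illustrative. The absolute difference, about 3.2 percentage points, matches the accuracy increase quoted in the conclusion, which suggests that the quoted figure is an absolute rather than a relative gain.

```python
def percentage_point_gain(new_pct, old_pct):
    """Absolute difference between two accuracies given in percent."""
    return new_pct - old_pct


def relative_gain(new_pct, old_pct):
    """Improvement expressed as a percentage of the baseline accuracy."""
    return 100.0 * (new_pct - old_pct) / old_pct


dcsa_accuracy = 93.18  # reported accuracy of the dCSA-based scheme (%)
csa_accuracy = 90.0    # reported accuracy of the CSA-based scheme (%)

print(round(percentage_point_gain(dcsa_accuracy, csa_accuracy), 2))  # 3.18
print(round(relative_gain(dcsa_accuracy, csa_accuracy), 2))          # 3.53
```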
Recommendations for Further Work
The following areas are recommended for further research:
- Extension of the work to taking university class attendance, in order to check student impersonation.
- Implementation of the work for the control of automated vehicles.
- Modification of the algorithm by hybridization for enhanced exploitation capabilities.
- Extension of the areas of application of the dCSA to other constrained and/or unconstrained optimization problems.
REFERENCES
- Aggarwal, C. C., & Reddy, C. K. (2014). Data clustering: Algorithms and applications. Chapman & Hall/CRC.
- Amarasinghe, A., & Wimalaratne, P. (2017). An Assistive Technology Framework for Communication with Hearing Impaired Persons. GSTF Journal on Computing (JoC), 5(2).
- Atal, B. S. (1976). Automatic recognition of speakers from their voices. Proceedings of the IEEE, 64(4), 460-475.
- Bansal, D., Turk, N., & Mendiratta, S. (2015). Automatic speech recognition by cuckoo search optimization based artificial neural network classifier. Paper presented at the 2015 International Conference on Soft Computing Techniques and Implementations (ICSCTI).
- Bansal, J. C., Singh, P., Saraswat, M., Verma, A., Jadon, S. S., & Abraham, A. (2011). Inertia weight strategies in particle swarm optimization. Paper presented at the 2011 Third World Congress on Nature and Biologically Inspired Computing (NaBIC).
- Barthelemy, P., Bertolotti, J., & Wiersma, D. S. (2008). A Lévy flight for light. Nature, 453(7194), 495-498.
- Bhalla, A. V., Khaparkar, S., & Bhalla, M. R. (2012). Performance improvement of speaker recognition system. International Journal of Advanced Research in Computer Science and Software Engineering, 2(3).
- Brown, C. T., Liebovitch, L. S., & Glendon, R. (2007). Lévy flights in Dobe Ju/’hoansi foraging patterns. Human Ecology, 35(1), 129-138.
- Campbell, J. P. (1997). Speaker recognition: A tutorial. Proceedings of the IEEE, 85(9), 1437-1462.
- Chauhan, P., Deep, K., & Pant, M. (2013). Novel inertia weight strategies for particle swarm optimization. Memetic Computing, 5(3), 229-251.
- Das, T., & Nahar, K. M. (2016). A Voice Identification System using Hidden Markov Model. Indian Journal of Science and Technology, 9(4).
- Dash, M., & Mohanty, R. (2014). Cuckoo search algorithm for speech recognition. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 3(10).
- Davies, N. B. (2000). Cuckoos, cowbirds and other cheats. T. & A. D. Poyser, London.