Abstract This thesis proposes a deep-learning-based speaker separation and identification system that improves the quality of Voice over Internet Protocol (VoIP) calls by reducing the noise introduced by multiple speakers. Existing approaches in online call systems focus on noise cancellation and general call quality enhancement, and they do not effectively distinguish between multiple speakers. The proposed system performs noise reduction and also separates and identifies the main speaker's voice, so that only that speaker's speech is transmitted over the call. By combining deep neural networks, the Short-Time Fourier Transform (STFT), and Mel-Frequency Cepstral Coefficients with a Gaussian Mixture Model (MFCC-GMM), the system achieves satisfactory signal-to-noise ratios for up to four speakers. The thesis also discusses remaining challenges, including processing time and adaptation to different VoIP systems. This practical solution improves the call experience, particularly given the increasing adoption of work-from-home and study-from-home programs during the pandemic. By isolating and transmitting only the main speaker's voice, regardless of other voices present, the system demonstrates how deep neural networks can be integrated with voice signal processing.