This technology relates to a sound source localization method and device robust to reverberation and diffuse noise. A binary mask derived from the CDR (coherence-to-diffuseness ratio), the ratio of coherent to diffuse signal power, is applied to the multichannel signal captured by multiple microphones in a noisy, reverberant environment, and the direction of the target sound source is then estimated using a cross-correlation technique.
Previously, AI voice-recognition speakers suffered significant performance degradation at long distances and in noisy, reverberant environments. To overcome these limitations, this technology proposes a sound source localization method and device based on a CDR-derived diffuseness mask.
The input signal is pre-processed with a CDR-based binary mask, and the GCC-PHAT or SRP-PHAT algorithm is then applied, making direction estimation robust to noise and reverberation and enabling accurate localization of the target source. This markedly improves speech recognition rates and supports stable AI services.
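The pipeline above can be illustrated with a minimal sketch: a coherence-based binary mask (used here as a simplified stand-in for the patented CDR mask) zeroes out time-frequency bins dominated by diffuse noise before a standard GCC-PHAT cross-correlation estimates the time difference of arrival (TDOA) between two microphones. All function names, the 0.8 coherence threshold, and the STFT parameters are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def stft(x, nfft=512, hop=256):
    """Simple framed rFFT with a Hann window (illustrative parameters)."""
    win = np.hanning(nfft)
    frames = [win * x[i:i + nfft] for i in range(0, len(x) - nfft, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def gcc_phat_masked(x1, x2, fs, coh_thresh=0.8):
    """Estimate TDOA of x2 relative to x1 via GCC-PHAT with a
    coherence-based binary mask (simplified proxy for a CDR mask)."""
    X1, X2 = stft(x1), stft(x2)
    # Long-term average auto/cross spectra for the coherence estimate
    cross = np.mean(X2 * np.conj(X1), axis=0)
    p1 = np.mean(np.abs(X1) ** 2, axis=0)
    p2 = np.mean(np.abs(X2) ** 2, axis=0)
    coherence = np.abs(cross) / np.sqrt(p1 * p2 + 1e-12)
    # Binary mask: keep only bins dominated by the coherent (direct) source
    mask = (coherence > coh_thresh).astype(float)
    # PHAT weighting: whiten magnitudes, keep only masked bins' phase
    gcc_spec = mask * cross / (np.abs(cross) + 1e-12)
    cc = np.fft.fftshift(np.fft.irfft(gcc_spec))
    n = len(cc)
    lags = np.arange(-n // 2, n // 2)
    return lags[np.argmax(cc)] / fs  # TDOA in seconds

# Usage: white noise at mic 1, the same signal delayed 8 samples at mic 2
rng = np.random.default_rng(0)
fs, delay = 16000, 8
x1 = rng.standard_normal(fs)
x2 = np.concatenate([np.zeros(delay), x1[:-delay]])
tdoa = gcc_phat_masked(x1, x2, fs)
```

In a full SRP-PHAT system, the same masked cross-spectra would be summed over all microphone pairs and candidate directions, with the steered-response power peak giving the source bearing.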
This technology was developed through support from the National Research Foundation of Korea's research project on robust continuous speech recognition based on multimodal deep learning for audio-visual information.
USA US10593344B2