This technology relates to a sound source localization method and device robust to reverberation and diffuse noise. A binary mask derived from the CDR (coherence-to-diffuseness ratio), the ratio of coherent to diffuse signal power, is applied to the multichannel signal captured by multiple microphones in a noisy, reverberant environment, and the direction of the target sound source is then estimated using a cross-correlation technique.
Previously, AI voice-recognition speakers suffered significant performance degradation at long distances and in noisy, reverberant environments. To overcome these limitations, this technology proposes a sound source localization method and device based on a CDR-derived diffuseness mask.
The input signal is pre-processed with a CDR-based binary mask, and the GCC-PHAT or SRP-PHAT algorithm is then applied, making direction estimation robust to noise and reverberation and enabling accurate localization of the target source. This markedly improves speech recognition rates and supports stable AI services.
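The pipeline above can be illustrated with a minimal sketch: a coherence-based binary mask (used here as a simplified stand-in for the patented CDR mask) zeroes out time-frequency bins dominated by diffuse noise before a standard GCC-PHAT cross-correlation estimates the time difference of arrival (TDOA) between two microphones. All function names, the 0.8 coherence threshold, and the STFT parameters are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def stft(x, nfft=512, hop=256):
    """Simple framed rFFT with a Hann window (illustrative parameters)."""
    win = np.hanning(nfft)
    frames = [win * x[i:i + nfft] for i in range(0, len(x) - nfft, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def gcc_phat_masked(x1, x2, fs, coh_thresh=0.8):
    """Estimate TDOA of x2 relative to x1 via GCC-PHAT with a
    coherence-based binary mask (simplified proxy for a CDR mask)."""
    X1, X2 = stft(x1), stft(x2)
    # Long-term average auto/cross spectra for the coherence estimate
    cross = np.mean(X2 * np.conj(X1), axis=0)
    p1 = np.mean(np.abs(X1) ** 2, axis=0)
    p2 = np.mean(np.abs(X2) ** 2, axis=0)
    coherence = np.abs(cross) / np.sqrt(p1 * p2 + 1e-12)
    # Binary mask: keep only bins dominated by the coherent (direct) source
    mask = (coherence > coh_thresh).astype(float)
    # PHAT weighting: whiten magnitudes, keep only masked bins' phase
    gcc_spec = mask * cross / (np.abs(cross) + 1e-12)
    cc = np.fft.fftshift(np.fft.irfft(gcc_spec))
    n = len(cc)
    lags = np.arange(-n // 2, n // 2)
    return lags[np.argmax(cc)] / fs  # TDOA in seconds

# Usage: white noise at mic 1, the same signal delayed 8 samples at mic 2
rng = np.random.default_rng(0)
fs, delay = 16000, 8
x1 = rng.standard_normal(fs)
x2 = np.concatenate([np.zeros(delay), x1[:-delay]])
tdoa = gcc_phat_masked(x1, x2, fs)
```

In a full SRP-PHAT system, the same masked cross-spectra would be summed over all microphone pairs and candidate directions, with the steered-response power peak giving the source bearing.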
This technology was developed through support from the National Research Foundation of Korea's research project on robust continuous speech recognition based on multimodal deep learning for audio-visual information.
USA US10593344B2