IBL-26-0069

Sound source localization method based on a CDR mask and sound source localization apparatus using the method

Listed on 2026-03-23
Improved AI voice recognition performance through noise- and reverberation-robust sound source localization

This technology relates to a sound source localization method and a sound source localization apparatus that are robust to reverberation and diffuse noise. A diffuseness mask, created using the CDR (Coherence-to-Diffuseness Ratio, i.e., the ratio of coherent to diffuse power), is applied to the mixed signal captured by multiple microphones in a noisy, reverberant environment, and the direction of the target sound source is then estimated with a cross-correlation technique.
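Cross-correlation methods typically yield a time difference of arrival (TDOA) for each microphone pair, which is then converted into a direction. The sketch below shows only that final conversion for a single pair, under assumptions the listing does not state: a far-field source, two omnidirectional microphones spaced `mic_distance` meters apart, a speed of sound of 343 m/s, and an angle measured from the array broadside. The function name `tdoa_to_doa_deg` is hypothetical.

```python
import numpy as np

def tdoa_to_doa_deg(tau, mic_distance, c=343.0):
    """Map a TDOA (seconds) to a far-field arrival angle in degrees.

    Hypothetical helper: assumes a plane wave, so tau = d * sin(theta) / c,
    with theta measured from the array broadside.
    """
    # Clip because noisy TDOA estimates can slightly exceed the physical
    # maximum d / c, which would leave arcsin undefined.
    s = np.clip(c * np.asarray(tau, dtype=float) / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(s))

# Example: 10 cm spacing, source at 30 degrees, so tau = 0.1 * sin(30 deg) / 343 s.
print(tdoa_to_doa_deg(0.1 * 0.5 / 343.0, 0.1))  # approximately 30
```

With more than two microphones, each pair gives its own TDOA, and the per-pair estimates have to be combined; how the patent does this is not described in the listing.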

Previously, AI speakers with voice recognition suffered from degraded performance at long distances and in noisy, reverberant environments. To overcome these limitations, this technology proposes a new sound source localization method and apparatus that use a diffuseness mask.

The input signal is pre-processed with a CDR-based binary mask, and the GCC-PHAT or SRP-PHAT algorithm is then applied. This makes the estimate robust to noise and reflections and enables accurate estimation of the sound source direction, which in turn improves voice recognition rates and supports stable AI services.
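GCC-PHAT whitens the cross-power spectrum of a microphone pair so that only phase (delay) information remains, then picks the lag with the largest correlation. The sketch below shows one way a binary time-frequency mask could be applied before that step. It assumes one-sided STFTs of shape `(n_fft // 2 + 1, frames)` for two microphones and a mask of the same shape (1 keeps a bin, 0 discards it); the function name and the choice to sum the whitened spectra over all frames are illustrative assumptions, not details taken from the patent. SRP-PHAT would instead sum such PHAT-weighted correlations over many microphone pairs, evaluated at the delays implied by each candidate direction.

```python
import numpy as np

def masked_gcc_phat_tdoa(X1, X2, mask, fs, n_fft, max_tau=None, eps=1e-12):
    """Estimate the TDOA between two microphones with a masked GCC-PHAT.

    X1, X2 : complex one-sided STFTs, shape (n_fft // 2 + 1, n_frames)
    mask   : real array of the same shape; 1 keeps a bin, 0 removes it
    Returns tau = t1 - t2 in seconds (positive when the sound reaches
    microphone 1 later than microphone 2).
    """
    # Cross-power spectrum restricted to the bins the mask keeps.
    cross = X1 * np.conj(X2) * mask
    # PHAT weighting: keep only the phase; masked bins stay (near) zero.
    cross = cross / (np.abs(cross) + eps)
    # Accumulate over frames, then go back to the lag domain.
    cc = np.fft.irfft(cross.sum(axis=1), n=n_fft)
    # Search lags in [-max_shift, max_shift]; negative lags wrap to the end.
    max_shift = n_fft // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    lags = np.concatenate((cc[n_fft - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(lags)) - max_shift
    return shift / fs

# Usage: with an all-ones mask this reduces to plain GCC-PHAT.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(512)
x2 = np.roll(x1, 5)  # microphone 2 hears the same signal 5 samples later
X1 = np.fft.rfft(x1)[:, None]  # single "frame" for brevity
X2 = np.fft.rfft(x2)[:, None]
tau = masked_gcc_phat_tdoa(X1, X2, np.ones(X1.shape), fs=16000, n_fft=512)
print(round(tau * 16000))  # -5 (microphone 1 leads microphone 2 by 5 samples)
```

Limiting the search with `max_tau` (for example to the microphone spacing divided by the speed of sound) discards physically impossible lags, which is a common way to reduce spurious peaks.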

Key Features:
  • Uses the difference in coherence and diffuseness characteristics between the voice signal and the noise signal
  • Calculates the CDR (Coherence-to-Diffuseness Ratio), which carries information about the target sound source and the noise, to distinguish regions where the voice signal is dominant from regions where the noise is dominant
  • Creates a binary diffuseness mask that effectively suppresses noise and reverberation components and applies it to the input signal as pre-processing, improving the accuracy of sound source direction estimation
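The CDR and mask steps above can be sketched under a common signal model that the listing itself does not spell out: two omnidirectional microphones spaced `mic_distance` meters apart, a spherically isotropic diffuse noise field with coherence sin(2πfd/c)/(2πfd/c), and a direct-path component whose coherence has unit magnitude. Under that model the measured coherence is Γx = (CDR·Γs + Γn)/(CDR + 1); eliminating the unknown phase of Γs gives a quadratic in CDR whose non-negative root does not require knowing the source direction. All function names, the smoothing factor `alpha`, and the mask threshold are hypothetical choices for illustration; the patent may estimate the CDR or binarize the mask differently.

```python
import numpy as np

def diffuse_coherence(freqs, mic_distance, c=343.0):
    """Coherence of a spherically isotropic diffuse field between two omni mics."""
    # np.sinc(x) = sin(pi*x)/(pi*x), so x = 2*f*d/c yields sin(2*pi*f*d/c)/(2*pi*f*d/c).
    return np.sinc(2.0 * np.asarray(freqs, dtype=float) * mic_distance / c)

def smoothed_coherence(X1, X2, alpha=0.9, eps=1e-12):
    """Complex coherence per STFT bin from recursively averaged (cross-)PSDs.

    X1, X2: complex STFTs of shape (n_freqs, n_frames).
    """
    n_freqs, n_frames = X1.shape
    p11 = np.zeros(n_freqs)
    p22 = np.zeros(n_freqs)
    p12 = np.zeros(n_freqs, dtype=complex)
    gamma = np.empty((n_freqs, n_frames), dtype=complex)
    for t in range(n_frames):
        p11 = alpha * p11 + (1 - alpha) * np.abs(X1[:, t]) ** 2
        p22 = alpha * p22 + (1 - alpha) * np.abs(X2[:, t]) ** 2
        p12 = alpha * p12 + (1 - alpha) * X1[:, t] * np.conj(X2[:, t])
        gamma[:, t] = p12 / np.sqrt(p11 * p22 + eps)
    return gamma

def estimate_cdr(gamma_x, gamma_n, max_mag=1.0 - 1e-6):
    """Direction-independent CDR estimate from measured and diffuse coherence.

    Model: gamma_x = (cdr * gamma_s + gamma_n) / (cdr + 1) with |gamma_s| = 1.
    Requiring |gamma_x * (cdr + 1) - gamma_n| = cdr gives a quadratic in cdr;
    its non-negative root is returned. gamma_n must broadcast against gamma_x.
    """
    gamma_x = np.asarray(gamma_x, dtype=complex)
    g = np.asarray(gamma_n, dtype=float)
    # Keep |gamma_x| strictly below 1 so the leading coefficient (a - 1) is nonzero.
    mag = np.abs(gamma_x)
    gamma_x = np.where(mag > max_mag, gamma_x * (max_mag / np.maximum(mag, 1e-12)), gamma_x)
    a = np.abs(gamma_x) ** 2
    b = gamma_x.real
    disc = g**2 * b**2 - g**2 * a + g**2 - 2.0 * g * b + a
    cdr = (g * b - a - np.sqrt(np.maximum(disc, 0.0))) / (a - 1.0)
    return np.maximum(cdr, 0.0)

def binary_diffuseness_mask(cdr, threshold=1.0):
    """1 where the coherent (direct-path) component dominates, 0 where diffuse."""
    return (np.asarray(cdr) >= threshold).astype(float)

# Noise-free check: cdr = 1, broadside source (gamma_s = 1), gamma_n = 0.5.
print(estimate_cdr((1 * 1.0 + 0.5) / (1 + 1), 0.5))  # approximately 1.0
```

In practice one would compute `smoothed_coherence` on the STFTs of a microphone pair, evaluate `diffuse_coherence` at the STFT bin frequencies (reshaped to a column so it broadcasts over frames), pass both to `estimate_cdr`, and multiply the resulting mask into the input spectra before the GCC-PHAT or SRP-PHAT stage.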

This technology was developed with support from the National Research Foundation of Korea's research project on robust continuous speech recognition based on multimodal deep learning for audio-visual information.

Sogang University
H. M. Park | R. Lee
Document
Application date: 2018-01-25 | Patent registration number: 10-2088222
Industry: IT•internet, software
Technology: Artificial Intelligence, Computer
Country: Korea
Family Patent: USA US10593344B2

Price: negotiable