Communications Blog • 8 MIN READ

AI Separates Human's Voice from Background Noise to Measure VNR

VNR – a Metric to Determine Intelligibility of Voice Recordings

The “Dodd-Frank” Wall Street Reform and Consumer Protection Act is a massive piece of financial reform legislation passed in 2010 in response to the financial crisis of 2008. Part of the act has compelled many banks in the U.S., as well as in Europe and Asia, to look at ways to verify the audibility and intelligibility of voice recordings. Such recordings contain the calls of traders, wealth brokers, and contact center workers.

One of the main obstacles to verifying the audibility and intelligibility of voice recordings is the presence of background noise, such as music, babble, street noise, car noise, and white noise.

A few months ago, we proposed IR's own deep learning-based method to address the audibility of a human's voice in the presence of background noise. VNR (Voice-to-Noise Ratio) is our more recent proposal, which addresses the intelligibility of voice recordings. Simply put, VNR is a metric for how easily a human's voice can be understood over various types of background noise. It measures the ratio of the power of the voice to the power of the background noise. The lower the VNR, the less likely it is that the voice is intelligible.
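As a worked form of the ratio described above, VNR can be written as follows. This is a sketch under assumptions: the decibel scale is inferred from the negative values reported in the results and is not stated explicitly in this post, and the symbols $v[n]$, $b[n]$, and $N$ (voice samples, background-noise samples, and number of samples) are introduced here for illustration only.

```latex
\mathrm{VNR}_{\mathrm{dB}} = 10 \log_{10} \frac{P_{\mathrm{voice}}}{P_{\mathrm{noise}}},
\qquad
P_{\mathrm{voice}} = \frac{1}{N} \sum_{n=1}^{N} v[n]^{2},
\qquad
P_{\mathrm{noise}} = \frac{1}{N} \sum_{n=1}^{N} b[n]^{2}
```

Under this reading, a VNR of 0 means the voice and the background noise carry equal power, positive values mean the voice dominates, and negative values mean the noise dominates.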

To provide the VNR metric, I have trained a deep learning model that separates a human's voice from the above-mentioned types of background noise. The figures below show some examples of voice and background noise extracted from original recordings using the trained deep learning model:

The extracted voice and extracted background noise are then used to measure the power of the voice and the power of the background noise, respectively. The table below shows the mean and standard deviation of VNR for various conditions:

| Condition | Case | VNR (Mean) | VNR (Std. dev.) |
|---|---|---|---|
| Intelligible | Human's Voice with Background Noise (Music) | 19.1147 | 4.2948 |
| Intelligible | Human's Voice with Background Noise (Babble) | 19.2605 | 4.1948 |
| Intelligible | Human's Voice Only | 26.7468 | 2.7451 |
| Unintelligible | Human's Voice with Background Noise (Music) | -4.5900 | 4.3836 |
| Unintelligible | Human's Voice with Background Noise (Babble) | -9.2067 | 4.4564 |
| Unintelligible | Background Noise Only (Music) | -12.1703 | 4.0922 |
| Unintelligible | Background Noise Only (Babble) | -12.5714 | 4.0340 |
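To make the measurement step concrete, here is a minimal sketch of how a VNR value could be computed once the voice and the background noise have been separated. It is not the production implementation: the decibel scale is an assumption (suggested by the negative values in the table), the function names `mean_power` and `vnr_db` are hypothetical, and the two synthetic signals merely stand in for the outputs of the trained separation model.

```python
import numpy as np


def mean_power(signal: np.ndarray) -> float:
    """Average power of a signal: the mean of its squared samples."""
    samples = np.asarray(signal, dtype=np.float64)
    return float(np.mean(samples ** 2))


def vnr_db(voice: np.ndarray, noise: np.ndarray, eps: float = 1e-12) -> float:
    """Voice-to-noise ratio in decibels (assumed scale).

    eps is a small constant that keeps the ratio finite when either
    signal is silent; it is an illustrative choice, not from the post.
    """
    ratio = (mean_power(voice) + eps) / (mean_power(noise) + eps)
    return float(10.0 * np.log10(ratio))


# Synthetic stand-ins for the separator's two outputs (assumption):
# a 220 Hz tone plays the role of the voice, white noise the background.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(0)
voice = 0.5 * np.sin(2.0 * np.pi * 220.0 * t)
noise = 0.05 * rng.standard_normal(t.size)

print(f"VNR: {vnr_db(voice, noise):.2f} dB")
```

With real recordings, `voice` and `noise` would be the two waveforms produced by the separation model for the same recording, so both arrays cover the same time span.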

VNR, along with our previous solution, can dramatically reduce the risks, and therefore the costs, associated with non-compliance. We are very excited about the results achieved on the test data and optimistic about their real-world impact.
