ADAPTIVE NOISE CANCELLATION
THE PROJECT
In my senior year at DigiPen, my capstone project in the Computer Science and Digital Audio program was on noise cancellation. I've always had an interest in noise cancellation, and I'm an active user of noise cancellation devices for sensory disorders. This project was a way for me to understand a little bit more about the different techniques for noise cancellation.

When I initially started my project, I researched phase-inversion cancellation. This is a time-domain noise cancellation technique that plays a signal 180° out of phase with the unwanted noise to cancel it out. After building a demo of this in JUCE, it became apparent that this technique has more application in hardware, as it is what noise-cancelling headphones use. I wanted to stick with software, so I decided to take a different approach.

Next, I did some research into spectral subtraction, a noise reduction technique performed in the frequency domain. The rest of my project consisted of developing and implementing the spectral subtraction algorithm, and expanding on it with adaptive noise estimation.
SPECTRAL SUBTRACTION
Spectral subtraction starts from the idea that noise is additive, so any signal can be broken up into two separate signals: a desired signal and an unwanted noise. The desired signal is often a speech or instrument signal, and the unwanted noise is the background noise distorting it. We can then simply say that s(m) = y(m) − d(m), where s(m) is the desired signal, y(m) is the input signal, and d(m) is the unwanted noise. Spectral subtraction also works off of the assumption that the unwanted noise distorts the magnitude spectrum, and not the phase. We can then say that |S(ω)| = |Y(ω)| − |D(ω)| and φ_S(ω) = φ_Y(ω). Since we don't explicitly know what the unwanted noise is, our goal is now to estimate its frequency content.

AVERAGE SPECTRUM
The implementation is based on the Short-Time Fourier Transform (STFT). The STFT is a way to see how a signal's frequency content changes over time, rather than analyzing one block of audio data. It consists of breaking up a signal into "frames" of data of size N, overlapping by 50%. The frames are then multiplied by a windowing function and transformed into the frequency domain with an FFT. We can then analyze the frequency content of each frame to see how it changes from frame to frame. Throughout the project, a "frame" is considered to be windowed and transformed.

The other half of the STFT processing is reconstructing the signal from the frequency data. To do this, we simply take an inverse FFT of each frame and overlap-add the frames to account for the windowing.

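The project's original listing did not survive here, but the STFT analysis and overlap-add resynthesis described above can be sketched as follows (a Python/NumPy illustration rather than the project's JUCE code; the frame size and Hann window are assumptions):

```python
import numpy as np

def stft(x, frame_size=1024):
    """Split x into 50%-overlapping frames, apply a periodic Hann
    window, and FFT each frame (real input, so rfft)."""
    hop = frame_size // 2
    window = np.hanning(frame_size + 1)[:-1]  # periodic Hann: windows sum to 1 at 50% overlap
    n_frames = 1 + (len(x) - frame_size) // hop
    frames = np.stack([x[i * hop : i * hop + frame_size] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spectra, frame_size=1024):
    """Inverse-FFT each frame and overlap-add. Because the periodic
    Hann windows at 50% overlap sum to 1, no extra normalization
    is needed."""
    hop = frame_size // 2
    frames = np.fft.irfft(spectra, n=frame_size, axis=1)
    out = np.zeros(hop * (len(spectra) - 1) + frame_size)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_size] += frame
    return out
```

Aside from edge effects in the first and last half-frame, istft(stft(x)) reconstructs x exactly, which is what lets us modify the spectra in between.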
Using the STFT, we can then compute the Short-Time Power Spectrum (also called the average noise spectrum). This simply consists of taking all the frames over a chunk of data and averaging their power spectra. We can then write an equation for the Short-Time Power Spectrum:

|D̂(ω)|² = (1/M) ∑_(i=0)^(M−1) |Y_i(ω)|²

where M is the number of frames in the chunk and Y_i(ω) is the spectrum of the i-th frame.
We then have a representation of the average frequency content present in some chunk of time, and it turns out that this makes for a pretty good estimate to use in our subtraction. However, this average must be taken during an initialization period where only the unwanted noise is present.

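With STFT frames in hand, the averaging itself is one line. A minimal sketch, assuming the frames come from a noise-only initialization period as described above (the function name is illustrative, not from the project):

```python
import numpy as np

def estimate_noise_spectrum(noise_frames):
    """Average the power spectrum |Y_i(w)|^2 over M noise-only STFT
    frames (rows) to get the short-time power spectrum estimate
    |D(w)|^2, one value per frequency bin."""
    return np.mean(np.abs(noise_frames) ** 2, axis=0)
```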
SUBTRACTION ALGORITHM
Spectral subtraction is processed frame by frame, and then individually on each frequency bin. We can write an equation for the standard subtraction algorithm:

|Ŝ(ω)|² = |Y(ω)|² − α|D̂(ω)|²
Here, α is a constant that determines the strength of the subtraction. We also clip negative values to zero to prevent distortion when transforming back into the time domain.

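One frame of the standard algorithm can be sketched like this (power-spectrum form, matching the averaged noise estimate above; a Python illustration, not the project's exact code):

```python
import numpy as np

def spectral_subtract(frame_spectrum, noise_power, alpha=2.0):
    """Subtract alpha times the estimated noise power from the frame's
    power spectrum, clip negative bins to zero, and reattach the
    original (noisy) phase."""
    power = np.abs(frame_spectrum) ** 2
    clean_power = np.maximum(power - alpha * noise_power, 0.0)
    return np.sqrt(clean_power) * np.exp(1j * np.angle(frame_spectrum))
```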
After doing this, the background noise is definitely reduced, although a new type of noise is introduced that is often referred to as "musical noise". Musical noise consists of peaks in the frequency spectrum that were not subtracted out by the estimate. These peaks occur at frequency bins where the estimate was not accurate enough, even though it was accurate in the surrounding frequencies. Because each peak is isolated, we hear a distinct pitch; hence the name, musical noise.
(Audio examples: noisy input and processed output.)
In order to reduce the amount of musical noise, two different methods are introduced: over-subtraction and spectral flooring. The idea behind over-subtraction is to detect points where it is safe to subtract a larger amount of our estimate, to try to bring those peaks down.
We can use a calculation of the segmental SNR to detect points where we can safely over-subtract. The idea is that when speech is active, over-subtracting the estimate will likely distort the speech spectrum. When there isn't speech, we can safely subtract a large amount without worrying about distortion. The segmental SNR is calculated with the equation:

SNR = 10·log₁₀(∑_ω |Y(ω)|² / ∑_ω |D̂(ω)|²)
We then use it in a linear function to calculate the subtraction factor α that is used in the subtraction algorithm:

α = α₀ − (3/20)·SNR,  for −5 dB ≤ SNR ≤ 20 dB

where α₀ sets the subtraction strength at 0 dB SNR.

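A sketch of the SNR-driven subtraction factor (the clamp range and slope follow the common Berouti-style linear rule; α₀ = 4 is an assumed default, not necessarily the project's value):

```python
import numpy as np

def subtraction_factor(frame_spectrum, noise_power, alpha0=4.0):
    """Compute the frame's segmental SNR against the noise estimate,
    then map it linearly to an over-subtraction factor alpha: large
    when the frame is mostly noise, small when speech dominates."""
    snr_db = 10.0 * np.log10(np.sum(np.abs(frame_spectrum) ** 2)
                             / np.sum(noise_power))
    snr_db = np.clip(snr_db, -5.0, 20.0)  # only adapt inside this range
    return alpha0 - (3.0 / 20.0) * snr_db
```

At 0 dB (frame power equal to the noise estimate) this gives α = α₀; at high SNR, where speech dominates, α shrinks so the speech spectrum is not over-subtracted.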
The other part of musical noise reduction is spectral flooring. This takes the idea that the musical peaks stand out against the "valleys" where the noise was fully subtracted. Instead of clipping those values to 0, we can fill in the valleys with a small amount of our estimate to smooth out the peaks. To do this, we simply introduce a parameter β, where β ≈ 0.03. Our improved subtraction algorithm then becomes:

|Ŝ(ω)|² = |Y(ω)|² − α|D̂(ω)|²   if |Y(ω)|² − α|D̂(ω)|² > β|D̂(ω)|²
|Ŝ(ω)|² = β|D̂(ω)|²             otherwise
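Putting both ideas together, one frame of the improved algorithm might look like this (again a sketch under the same assumptions as above, not the original code):

```python
import numpy as np

def spectral_subtract_floored(frame_spectrum, noise_power, alpha, beta=0.03):
    """Over-subtract alpha times the noise power, but floor any bin
    that would fall below beta times the noise power instead of
    clipping it to zero, filling in the 'valleys' that make
    musical-noise peaks stand out."""
    power = np.abs(frame_spectrum) ** 2
    subtracted = power - alpha * noise_power
    floor = beta * noise_power
    clean_power = np.where(subtracted > floor, subtracted, floor)
    return np.sqrt(clean_power) * np.exp(1j * np.angle(frame_spectrum))
```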