[30-04-2020] Root mean square normalization in Python


Audio normalization is a fundamental audio processing technique that consists of applying a constant amount of gain to an audio recording in order to bring its amplitude to a target level. A commonly used variant is Root Mean Square (RMS) normalization. This blog post introduces RMS normalization and provides a Python implementation of it.

Keywords: Audio normalization, RMS normalization, Python

What is RMS normalization?

In general there are two principal types of audio normalization:

  • Peak normalization, which adjusts the recording based on its highest signal level.

  • Loudness normalization, which adjusts the recording based on its perceived loudness.

RMS normalization is a form of loudness normalization in which the root mean square of the signal serves as the estimate of its perceived loudness. The RMS value is then used to compute the gain applied in the normalization. Since the gain is constant and applied across the entire recording, the normalization does not affect the signal-to-noise ratio or the relative dynamics [She12]. The approach can be summarized in the following formula [MM20]:

\begin{equation} y[n]=\sqrt{\frac{N \cdot 10^{\left(\frac{r}{10}\right)}}{\sum_{i=0}^{N-1} x^{2}[i]}} \cdot x[n] \end{equation}

where:

  • \(x[n]\) is the original signal.

  • \(y[n]\) is the normalized signal.

  • \(N\) is the length of \(x[n]\).

  • \(r\) is the target RMS level in dB, which corresponds to a linear RMS amplitude of \(10^{r/20}\).
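To make the arithmetic concrete, here is a minimal sketch that applies the gain to a synthetic sine wave and checks the resulting level. It uses only the standard library so that each term is visible; the helper names `rms_gain` and `rms_db` are made up for this illustration, and samples are assumed to be floats with full scale at 1.0.

```python
import math


def rms_gain(x, target_db):
    """Constant gain that brings the RMS of x to target_db (dB re. full scale 1.0)."""
    energy = sum(s * s for s in x)
    if energy == 0:
        raise ValueError("cannot normalize a silent signal")
    # sqrt(N * 10^(r/10) / sum(x^2)), i.e. target linear RMS divided by current RMS
    return math.sqrt(len(x) * 10 ** (target_db / 10.0) / energy)


def rms_db(x):
    """RMS level of x in dB."""
    rms = math.sqrt(sum(s * s for s in x) / len(x))
    return 20.0 * math.log10(rms)


# one second of a 440 Hz sine sampled at 8 kHz, amplitude 0.1
fs = 8000
x = [0.1 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]

gain = rms_gain(x, target_db=-20.0)
y = [gain * s for s in x]

print(round(rms_db(x), 2))  # -23.01
print(round(rms_db(y), 2))  # -20.0
```

The input sits about 3 dB below the target, so the gain comes out close to \(\sqrt{2}\). Every sample is multiplied by that same value, which is why the relative dynamics are unchanged.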

How to implement it in Python?

Implementing the RMS normalization is fairly simple in Python and the algorithm can be summarized in the following steps:

  • Read the audio file as an array.

  • Compute the linear RMS level and its scaling factor.

  • Normalize using the scaling factor.

  • Write the resulting array to an audio file.

import os

import numpy as np

# read_file and write_file are Pydiogment's I/O helpers;
# the import path below is an assumption
from pydiogment.utils.io import read_file, write_file


def normalize(infile, rms_level=0):
    """
    Normalize the signal to a target RMS level.

    Args:
        - infile    (str) : input filename/path.
        - rms_level (int) : target RMS level in dB.
    """
    # read input file
    fs, sig = read_file(filename=infile)

    # linear RMS level (an amplitude ratio, hence dB / 20) and scaling factor
    r = 10**(rms_level / 20.0)
    a = np.sqrt((len(sig) * r**2) / np.sum(sig**2))

    # normalize
    y = sig * a

    # construct file names
    output_file_path = os.path.dirname(infile)
    name_attribute = "output_file.wav"

    # export data to file
    write_file(output_file_path=output_file_path,
               input_file_name=infile,
               name_attribute=name_attribute,
               sig=y,
               fs=fs)

This implementation is available as part of the Pydiogment library.
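For readers who want to try the four steps without installing anything, the following is a rough standard-library sketch of the same idea. It is not Pydiogment's API: `normalize_wav` is a hypothetical helper, it assumes 16-bit PCM input, it computes one RMS value over all channels together, and it clips any sample that would exceed the 16-bit range after scaling (something the formula itself does not address).

```python
import array
import math
import sys
import wave


def normalize_wav(infile, outfile, rms_level=-20.0):
    """RMS-normalize a 16-bit PCM WAV file and return the applied gain."""
    # 1. read audio as an array
    with wave.open(infile, "rb") as wf:
        params = wf.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wf.readframes(params.nframes)
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":  # WAV data is little-endian
        samples.byteswap()

    # 2. compute the scaling factor on samples scaled to [-1, 1)
    x = [s / 32768.0 for s in samples]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        raise ValueError("cannot normalize a silent signal")
    gain = math.sqrt(len(x) * 10 ** (rms_level / 10.0) / energy)

    # 3. normalize, clipping to the 16-bit range
    out = array.array(
        "h",
        (max(-32768, min(32767, round(v * gain * 32768.0))) for v in x),
    )
    if sys.byteorder == "big":
        out.byteswap()

    # 4. write the resulting array as audio
    with wave.open(outfile, "wb") as wf:
        wf.setparams(params)
        wf.writeframes(out.tobytes())
    return gain
```

A call such as `normalize_wav("speech.wav", "speech_norm.wav", rms_level=-20.0)` (file names are placeholders) would write the normalized copy next to the input. A target well below 0 dB is used as the default here because a 0 dB RMS target would push most signals into clipping.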

Conclusion

This blog post provided a brief introduction to the RMS normalization technique, which is commonly used in speech processing to bring recordings to a consistent level. We also provided a small implementation of the approach that is part of the Pydiogment library.

References and Further readings

MM20

Ayoub Malek and Hasna Marwa Malek. Pydiogment: A Python package for audio augmentation. 2020. [Online; accessed 30.04.2020]. URL: https://github.com/SuperKogito/pydiogment/blob/master/paper/paper.pdf.

She12

Matt Shelvock. Audio Mastering as Musical Practice. Master’s thesis, The University of Western Ontario: The School of Graduate and Postdoctoral Studies, London, Ontario, Canada, 2012. URL: https://ir.lib.uwo.ca/etd/530.