Posts tagged Audio
Root mean square normalization in Python
- 30 April 2020
- 08 April 2022
- Munich
- English
- Signal processing
Audio normalization is a fundamental audio processing technique that consists of applying a constant amount of gain to an audio signal in order to bring its amplitude to a target level. A commonly used normalization technique is Root Mean Square (RMS) normalization. This blog post introduces RMS normalization and provides a Python implementation of it.
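As a rough illustration of the idea (a minimal sketch, not the post's own implementation), RMS normalization computes the signal's RMS level and multiplies every sample by one gain that moves that level to the target. It assumes a floating-point signal scaled to [-1, 1] and treats 0 dBFS as an RMS of 1.0 (conventions vary); the function name `rms_normalize` and the -20 dBFS default are illustrative choices.

```python
import numpy as np


def rms_normalize(signal: np.ndarray, target_dbfs: float = -20.0) -> np.ndarray:
    """Apply one constant gain so the RMS level of `signal` equals `target_dbfs`.

    Assumes a float signal in [-1, 1]; here 0 dBFS means an RMS of 1.0.
    The -20 dBFS default is an illustrative choice.
    """
    rms = np.sqrt(np.mean(np.square(signal)))
    if rms == 0:
        return signal.copy()  # pure silence: no gain can reach the target
    target_rms = 10 ** (target_dbfs / 20)  # dBFS -> linear amplitude
    gain = target_rms / rms                # single gain for the whole signal
    return signal * gain


# Usage: a 440 Hz tone sampled at 16 kHz, brought to -20 dBFS RMS.
t = np.arange(16000) / 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
normalized = rms_normalize(tone, target_dbfs=-20.0)
```

Because the gain is unbounded, boosting a quiet recording can push peaks beyond ±1, so a clipping check after normalization is a sensible addition.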
How to pipe an FFmpeg output and pass it to a Python variable?
- 19 March 2020
- 08 April 2022
- Munich
- English
- Signal processing
When writing code, the key optimization points are speed and efficiency. I often run into this when using FFmpeg with Python, for example when I need to convert an mp3 file to a wave file and then process it in Python. The simple way to do this is to use FFmpeg to convert the mp3 input to a wave file, then read the wave file in Python and process it. Although this works, it is clearly neither optimal nor the fastest solution, since the audio makes a detour through an intermediate file on disk. In this blog post, I will present an improved solution to this inconvenience: piping the output of FFmpeg to Python and passing it directly to a numpy variable.
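A minimal sketch of the piping approach, assuming FFmpeg is on the `PATH`: FFmpeg writes raw 16-bit PCM to stdout (`pipe:1`) instead of a file, and `np.frombuffer` turns the captured bytes into an array. The mono downmix, the 16 kHz rate and the helper names `pcm_bytes_to_array` and `read_audio_with_ffmpeg` are illustrative assumptions; the post's exact command may differ.

```python
import subprocess

import numpy as np


def pcm_bytes_to_array(raw: bytes) -> np.ndarray:
    """Convert raw little-endian signed 16-bit PCM bytes to floats in [-1, 1)."""
    return np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0


def read_audio_with_ffmpeg(path: str, sample_rate: int = 16000) -> np.ndarray:
    """Decode `path` with FFmpeg and read the samples straight from its stdout."""
    cmd = [
        "ffmpeg", "-i", path,
        "-f", "s16le",           # raw PCM container, no WAV header
        "-acodec", "pcm_s16le",  # signed 16-bit little-endian samples
        "-ac", "1",              # downmix to mono (assumption)
        "-ar", str(sample_rate), # resample (16 kHz is an assumption)
        "-loglevel", "error",
        "pipe:1",                # write to stdout instead of a file
    ]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return pcm_bytes_to_array(result.stdout)
```

Requesting headerless `s16le` output keeps the parsing trivial: every two bytes are one sample, so no WAV header has to be skipped.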
Spectral leakage and windowing
- 13 March 2020
- 08 April 2022
- Munich
- English
- Signal processing
Windowing is an important part of almost any signal processing system; it helps remove or reduce spectral leakage when processing a non-periodic signal. This blog post provides a short overview of what spectral leakage is, when it occurs and how to use windowing to suppress it.
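To make the effect concrete, here is a small sketch (not taken from the post) comparing a rectangular window with numpy's Hann window on a 50.5 Hz sine. The tone does not complete a whole number of cycles in the frame, so it is non-periodic with respect to the frame and leaks. The 1 kHz rate, the 200 Hz cut-off and the leakage measure (largest far-away bin relative to the peak) are arbitrary illustrative choices.

```python
import numpy as np


def magnitude_spectrum(frame: np.ndarray, window: np.ndarray) -> np.ndarray:
    """Magnitude of the one-sided FFT of `frame` after applying `window`."""
    return np.abs(np.fft.rfft(frame * window))


fs, n = 1000, 1000          # 1 s frame, so FFT bin k corresponds to k Hz
t = np.arange(n) / fs
# 50.5 Hz leaves half a cycle dangling at the frame edge: a non-periodic frame.
x = np.sin(2 * np.pi * 50.5 * t)

rect = magnitude_spectrum(x, np.ones(n))     # no windowing (rectangular)
hann = magnitude_spectrum(x, np.hanning(n))  # Hann window tapers the edges

# Leakage measure: strongest bin at 200 Hz and above, relative to the peak.
far = slice(200, None)
rect_leak = rect[far].max() / rect.max()
hann_leak = hann[far].max() / hann.max()
print(f"rectangular: {rect_leak:.2e}, Hann: {hann_leak:.2e}")
```

The Hann window trades a wider main lobe for side lobes that decay much faster, which is why far-away bins end up orders of magnitude quieter.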
Naive voice activity detection using short time energy
- 09 February 2020
- 08 April 2022
- Munich
- English
- Signal processing
An important part of speech and speaker recognition tasks is distinguishing voiced segments from silent ones. This helps, for example, to align phonemes with their associated voiced segments and to discard extra information related to silence or noise that would degrade the system's accuracy. This problem is known as Voice Activity Detection (VAD). This blog post introduces voice activity detection and presents a simple VAD implementation based on short-time energy.
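A minimal sketch of short-time-energy VAD, under stated assumptions: the signal is cut into 25 ms frames with a 10 ms hop, the mean energy of each frame is computed, and a frame is marked voiced when its energy exceeds a fixed fraction of the loudest frame's energy. The frame sizes, the `threshold_ratio` rule and the function names are hypothetical choices for illustration, not necessarily what the post uses.

```python
import numpy as np


def short_time_energy(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Mean energy of each frame; assumes len(signal) >= frame_len."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.sum(frames ** 2, axis=1) / frame_len


def energy_vad(signal, sample_rate, frame_ms=25.0, hop_ms=10.0, threshold_ratio=0.1):
    """Return one boolean per frame: True where the frame is considered voiced."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    energy = short_time_energy(signal, frame_len, hop)
    # Hypothetical decision rule: voiced if energy exceeds a fraction of the max.
    return energy > threshold_ratio * energy.max()


# Usage: 0.3 s silence, 0.4 s of a 200 Hz tone, 0.3 s silence at 16 kHz.
fs = 16000
x = np.zeros(fs)
start, stop = 3 * fs // 10, 7 * fs // 10
x[start:stop] = 0.5 * np.sin(2 * np.pi * 200 * np.arange(stop - start) / fs)
voiced = energy_vad(x, fs)
```

A threshold relative to the loudest frame adapts to the recording level, but it is naive: on noisy audio a noise-floor estimate or additional features (such as zero-crossing rate) are usually needed.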
Signal framing
- 25 January 2020
- 08 April 2022
- Munich
- English
- Signal processing
When it comes to non-stationary signals, spectral features computed over short segments are of great use. Therefore, decomposing the signal into multiple short segments is the way to go for this type of feature extraction. This technique is known as frame blocking or framing. The following blog post explains why we need framing and how to do it in Python.
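Framing can be sketched with a single index matrix: row `i` holds the sample indices of frame `i`, and consecutive frames start `hop` samples apart, so they overlap whenever `hop < frame_len`. This is a minimal illustration rather than the post's code; zero-padding the tail (instead of dropping leftover samples) and the name `frame_signal` are assumptions.

```python
import numpy as np


def frame_signal(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into (possibly overlapping) frames of `frame_len` samples.

    Consecutive frames start `hop` samples apart. The tail is zero-padded so
    that no samples are dropped (an assumption; dropping them is also common).
    """
    if len(signal) <= frame_len:
        n_frames = 1
    else:
        n_frames = 1 + int(np.ceil((len(signal) - frame_len) / hop))
    padded_len = (n_frames - 1) * hop + frame_len
    padded = np.pad(signal, (0, padded_len - len(signal)))
    # Row i contains the indices i*hop, i*hop + 1, ..., i*hop + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return padded[idx]


# Usage: 11 samples, frames of 4 with a hop of 3; the last frame is zero-padded.
frames = frame_signal(np.arange(11), frame_len=4, hop=3)
```

For speech, frames of 20 to 40 ms with a 10 ms hop are typical, for example `frame_len=400, hop=160` at 16 kHz; each row can then be windowed and transformed separately.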
Voice based gender recognition using Gaussian mixture models
- 09 May 2019
- 10 April 2022
- Munich
- English
- Machine learning
The implementation uses The Free ST American English Corpus dataset (SLR45), a free American English corpus by Surfingtech containing utterances from 10 speakers (5 female and 5 male).