We use cookies on this website to distinguish you from other users. We use this data to improve our content experience and for targeted advertising. By continuing to use this website you consent to our use of cookies. For more information, please see our Cookie Policy.

spafe.features.gfcc#

Description : Gammatone Frequency Cepstral Coefﬁcients (GFCCs) extraction algorithm implementation.
Copyright (c) 2019-2024 Ayoub Malek. This source code is licensed under the terms of the BSD 3-Clause License. For a copy, see <https://github.com/SuperKogito/spafe/blob/master/LICENSE>.

spafe.features.gfcc.erb_spectrogram(sig: ndarray, fs: int = 16000, pre_emph: bool = True, pre_emph_coeff: float = 0.97, window: Optional[SlidingWindow] = None, nfilts: int = 24, nfft: int = 512, low_freq: float = 0, high_freq: Optional[float] = None, scale: typing_extensions.Literal[ascendant, descendant, constant] = 'constant', fbanks: Optional[ndarray] = None, conversion_approach: typing_extensions.Literal[Glasberg] = 'Glasberg') → Tuple[ndarray, ndarray][source]#

Compute the Gammatone/ erb scale spectrogram also known as Cochleagram.

Parameters

sig (numpy.ndarray) – a mono audio signal (Nx1) from which to compute features.
fs (int) – the sampling frequency of the signal we are working with. (Default is 16000).
pre_emph (bool) – apply pre-emphasis if 1. (Default is True).
pre_emph_coeff (float) – pre-emphasis filter coefficient. (Default is 0.97).
window (SlidingWindow) – sliding window object. (Default is None).
nfilts (int) – the number of filters in the filter bank. (Default is 40).
nfft (int) – number of FFT points. (Default is 512.
low_freq (float) – lowest band edge of mel filters (Hz). (Default is 0).
high_freq (float) – highest band edge of mel filters (Hz). (Default is samplerate / 2).
scale (str) – monotonicity behavior of the filter banks. (Default is “constant”).
fbanks (numpy.ndarray) – filter bank matrix. (Default is None).
conversion_approach (str) – approach to use for conversion to the erb scale. (Default is “Glasberg”).

Returns

(numpy.ndarray) : the erb spectrogram (num_frames x nfilts)
(numpy.ndarray) : the fourrier transform matrix.

Return type

(tuple)

Tip

scale : can take the following options [“constant”, “ascendant”, “descendant”].
conversion_approach : can take the following options [“Glasberg”]. Note that the use of different options than the default can lead to unexpected behavior/issues.

Note

../_images/gammatone_spectrogram.png — Architecture of the Gammatone spectrogram computation algorithm.#

Examples

from spafe.features.gfcc import erb_spectrogram
from spafe.utils.vis import show_spectrogram
from spafe.utils.preprocessing import SlidingWindow
from scipy.io.wavfile import read

# read audio
fpath = "../../../tests/data/test.wav"
fs, sig = read(fpath)

# compute erb spectrogram
gSpec, gfreqs = erb_spectrogram(sig,
                                fs=fs,
                                pre_emph=0,
                                pre_emph_coeff=0.97,
                                window=SlidingWindow(0.03, 0.015, "hamming"),
                                nfilts=128,
                                nfft=2048,
                                low_freq=0,
                                high_freq=fs/2)

# visualize spectrogram
show_spectrogram(gSpec.T,
                 fs=fs,
                 xmin=0,
                 xmax=len(sig)/fs,
                 ymin=0,
                 ymax=(fs/2)/1000,
                 dbf=80.0,
                 xlabel="Time (s)",
                 ylabel="Frequency (kHz)",
                 title="Erb spectrogram (dB)",
                 cmap="jet")

spafe.features.gfcc.gfcc(sig: ndarray, fs: int = 16000, num_ceps: int = 13, pre_emph: bool = True, pre_emph_coeff: float = 0.97, window: Optional[SlidingWindow] = None, nfilts: int = 24, nfft: int = 512, low_freq: float = 0, high_freq: Optional[float] = None, scale: typing_extensions.Literal[ascendant, descendant, constant] = 'constant', dct_type: int = 2, use_energy: bool = False, lifter: Optional[int] = None, normalize: Optional[typing_extensions.Literal[mvn, ms, vn, mn]] = None, fbanks: Optional[ndarray] = None, conversion_approach: typing_extensions.Literal[Glasberg] = 'Glasberg') → ndarray[source]#

Compute the Gammatone-Frequency Cepstral Coefﬁcients (GFCC features) from an audio signal as described in [Jeevan] and [Xu].

Parameters

sig (numpy.ndarray) – a mono audio signal (Nx1) from which to compute features.
fs (int) – the sampling frequency of the signal we are working with. (Default is 16000).
num_ceps (int) – number of cepstra to return). (Default is 13).
pre_emph (bool) – apply pre-emphasis if 1. (Default is True).
pre_emph_coeff (float) – pre-emphasis filter coefficient. (Default is 0.97).
window (SlidingWindow) – sliding window object. (Default is None).
nfilts (int) – the number of filters in the filter bank. (Default is 40).
nfft (int) – number of FFT points. (Default is 512).
low_freq (float) – lowest band edge of mel filters (Hz). (Default is 0).
high_freq (float) – highest band edge of mel filters (Hz). (Default is samplerate / 2).
scale (str) – monotonicity behavior of the filter banks. (Default is “constant”).
dct_type (int) – type of DCT used. (Default is 2).
use_energy (int) – overwrite C0 with true log energy. (Default is 0).
lifter (int) – apply liftering if value given. (Default is None).
normalize (str) – apply normalization if type specified. (Default is None).
fbanks (numpy.ndarray) – filter bank matrix. (Default is None).
conversion_approach (str) – erb scale conversion approach. (Default is “Glasberg”).

Returns

2d array of GFCC features (num_frames x num_ceps)

Return type

(numpy.ndarray)

Raises

ParameterError – if nfilts < num_ceps

Tip

scale : can take the following options [“constant”, “ascendant”, “descendant”].
dct : can take the following options [1, 2, 3, 4].
normalize : can take the following options [“mvn”, “ms”, “vn”, “mn”].
conversion_approach : can take the following options [“Glasberg”]. Note that the use of different options than the default can lead to unexpected behavior/issues.

Note

../_images/gfccs.png — Architecture of the Gammatone frequency cepstral coefﬁcients extraction algorithm.#

References

Jeevan: : Jeevan, M., Dhingra, A., Hanmandlu, M., & Panigrahi, B. K. (2016). Robust Speaker Verification Using GFCC Based i-Vectors. Proceedings of the International Conference on Signal, Networks, Computing, and Systems, 85–91. doi:10.1007/978-81-322-3592-7_9
Xu: : Xu, H., Lin, L., Sun, X., & Jin, H. (2012). A New Algorithm for Auditory Feature Extraction. 2012 International Conference on Communication Systems and Network Technologies. doi:10.1109/csnt.2012.57

Examples

from scipy.io.wavfile import read
from spafe.features.gfcc import gfcc
from spafe.utils.preprocessing import SlidingWindow
from spafe.utils.vis import show_features

# read audio
fpath = "../../../tests/data/test.wav"
fs, sig = read(fpath)

# compute mfccs and mfes
gfccs  = gfcc(sig,
              fs=fs,
              pre_emph=1,
              pre_emph_coeff=0.97,
              window=SlidingWindow(0.03, 0.015, "hamming"),
              nfilts=128,
              nfft=2048,
              low_freq=0,
              high_freq=8000,
              normalize="mvn")

# visualize features
show_features(gfccs, "Gammatone Frequency Cepstral Coefﬁcients", "GFCC Index", "Frame Index")

spafe.features.cqcc

spafe.features.lfcc