
spafe.features.cqcc

Description : Constant Q-transform Cepstral Coefficients (CQCCs) extraction algorithm implementation.
Copyright (c) 2019-2024 Ayoub Malek. This source code is licensed under the terms of the BSD 3-Clause License. For a copy, see <https://github.com/SuperKogito/spafe/blob/master/LICENSE>.

spafe.features.cqcc.cqt_spectrogram(sig: ndarray, fs: int = 16000, pre_emph: bool = True, pre_emph_coeff: float = 0.97, window: Optional[SlidingWindow] = None, nfft: int = 512, low_freq: float = 0, high_freq: Optional[float] = None, number_of_octaves: int = 7, number_of_bins_per_octave: int = 24, spectral_threshold: float = 0.005, f0: float = 120, q_rate: float = 1.0)

Compute the constant-Q transform (CQT) spectrogram from an audio signal as in [Todisco].

Parameters

sig (numpy.ndarray) – a mono audio signal (Nx1) from which to compute features.
fs (int) – the sampling frequency of the signal we are working with. (Default is 16000).
pre_emph (bool) – apply pre-emphasis if True. (Default is True).
pre_emph_coeff (float) – pre-emphasis filter coefficient. (Default is 0.97).
window (SlidingWindow) – sliding window object. (Default is None).
nfft (int) – number of FFT points. (Default is 512).
low_freq (float) – lowest frequency band edge (Hz). (Default is 0).
high_freq (float) – highest frequency band edge (Hz). (Default is None, interpreted as fs/2).
number_of_octaves (int) – number of octaves. (Default is 7).
number_of_bins_per_octave (int) – number of bins per octave. (Default is 24).
spectral_threshold (float) – spectral threshold. (Default is 0.005).
f0 (float) – fundamental frequency. (Default is 120).
q_rate (float) – Q-rate parameter of the constant-Q transform. (Default is 1.0).
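
As a rough illustration of how f0, number_of_octaves and number_of_bins_per_octave interact, the sketch below lists the center frequencies of a textbook constant-Q filter bank, where bin k sits at f0 * 2^(k / number_of_bins_per_octave). It assumes this standard geometric spacing and the common quality-factor definition Q = 1 / (2^(1/B) - 1). The helpers cqt_center_frequencies and quality_factor are hypothetical and not part of spafe, whose internal computation may differ.

```python
import math


def cqt_center_frequencies(f0=120.0, number_of_octaves=7, number_of_bins_per_octave=24):
    """Center frequencies (Hz) of a textbook constant-Q filter bank (hypothetical helper)."""
    num_bins = number_of_octaves * number_of_bins_per_octave
    # Bins are spaced geometrically: every number_of_bins_per_octave steps doubles the frequency.
    return [f0 * 2.0 ** (k / number_of_bins_per_octave) for k in range(num_bins)]


def quality_factor(number_of_bins_per_octave=24):
    """Ratio of center frequency to bandwidth, identical for every bin (assumed definition)."""
    return 1.0 / (2.0 ** (1.0 / number_of_bins_per_octave) - 1.0)


freqs = cqt_center_frequencies()
print(len(freqs))            # total number of bins: octaves * bins per octave
print(freqs[0], freqs[24])   # the first bin sits at f0; one octave higher doubles it
print(round(quality_factor(), 2))
```

Under this spacing the number of bins grows linearly with number_of_octaves, while the covered frequency range grows exponentially.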

Returns

2d array of the spectrogram matrix (num_frames x num_ceps)

Return type

(numpy.ndarray)

Note

Figure (../_images/cqt_spectrogram.png): Architecture of the constant Q-transform spectrogram computation algorithm.

Examples

from spafe.features.cqcc import cqt_spectrogram
from spafe.utils.vis import show_spectrogram
from spafe.utils.preprocessing import SlidingWindow
from scipy.io.wavfile import read

# read audio
fpath = "../../../tests/data/test.wav"
fs, sig = read(fpath)

# compute spectrogram
qSpec = cqt_spectrogram(sig,
                        fs=fs,
                        pre_emph=0,
                        pre_emph_coeff=0.97,
                        window=SlidingWindow(0.03, 0.015, "hamming"),
                        nfft=2048,
                        low_freq=0,
                        high_freq=fs/2)

# visualize spectrogram
show_spectrogram(qSpec,
                 fs=fs,
                 xmin=0,
                 xmax=len(sig)/fs,
                 ymin=0,
                 ymax=(fs/2)/1000,
                 dbf=80.0,
                 xlabel="Time (s)",
                 ylabel="Frequency (kHz)",
                 title="CQT spectrogram (dB)",
                 cmap="jet")

spafe.features.cqcc.cqcc(sig, fs: int = 16000, num_ceps: int = 13, pre_emph: bool = True, pre_emph_coeff: float = 0.97, window: Optional[SlidingWindow] = None, nfft: int = 512, low_freq: float = 0, high_freq: Optional[float] = None, dct_type: int = 2, lifter: Optional[int] = None, normalize: Optional[typing_extensions.Literal[mvn, ms, vn, mn]] = None, number_of_octaves: int = 7, number_of_bins_per_octave: int = 24, resampling_ratio: float = 1.0, spectral_threshold: float = 0.005, f0: float = 120, q_rate: float = 1.0)

Compute the Constant-Q Cepstral Coefficients (CQCC features) from an audio signal as described in [Todisco].

Parameters

sig (numpy.ndarray) – a mono audio signal (Nx1) from which to compute features.
fs (int) – the sampling frequency of the signal we are working with. (Default is 16000).
num_ceps (int) – number of cepstra to return. (Default is 13).
pre_emph (bool) – apply pre-emphasis if True. (Default is True).
pre_emph_coeff (float) – pre-emphasis filter coefficient. (Default is 0.97).
window (SlidingWindow) – sliding window object. (Default is None).
nfft (int) – number of FFT points. (Default is 512).
low_freq (float) – lowest frequency band edge (Hz). (Default is 0).
high_freq (float) – highest frequency band edge (Hz). (Default is None, interpreted as fs/2).
dct_type (int) – type of DCT used. (Default is 2).
lifter (int) – apply liftering if value given. (Default is None).
normalize (str) – normalization approach. (Default is None).
number_of_octaves (int) – number of octaves. (Default is 7).
number_of_bins_per_octave (int) – number of bins per octave. (Default is 24).
resampling_ratio (float) – ratio to use for the uniform resampling. (Default is 1.00).
spectral_threshold (float) – spectral threshold. (Default is 0.005).
f0 (float) – fundamental frequency. (Default is 120).
q_rate (float) – Q-rate parameter of the constant-Q transform. (Default is 1.0).
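
The pre_emph and pre_emph_coeff parameters refer to a pre-emphasis step applied to the signal before analysis. A minimal sketch of that idea follows, assuming the conventional first-order filter y[n] = x[n] - coeff * x[n-1]. The function pre_emphasize is a hypothetical helper for illustration and is not spafe's own implementation.

```python
def pre_emphasize(samples, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1] to a list of samples (hypothetical helper)."""
    if not samples:
        return []
    # The first sample has no predecessor, so it is passed through unchanged.
    out = [float(samples[0])]
    for previous, current in zip(samples, samples[1:]):
        out.append(current - coeff * previous)
    return out


constant = pre_emphasize([1.0, 1.0, 1.0, 1.0])        # slowly varying input is attenuated
alternating = pre_emphasize([1.0, -1.0, 1.0, -1.0])   # rapidly varying input is amplified
print(constant)
print(alternating)
```

With a coefficient close to 1, low-frequency content is strongly reduced while high-frequency content is roughly doubled in amplitude.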

Returns

2d array of CQCC features (num_frames*resampling_ratio x num_ceps).

Return type

(numpy.ndarray)
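
To make the role of num_ceps concrete, here is a minimal sketch of the final cepstral step for a single frame: take the log of a power spectrum and apply an unnormalized DCT-II (the variant that dct_type=2 conventionally denotes), keeping the first num_ceps coefficients. It deliberately omits the uniform resampling that [Todisco] applies before the DCT, and cepstrum_from_power is a hypothetical helper rather than a spafe function.

```python
import math


def cepstrum_from_power(power_frame, num_ceps=13):
    """Log followed by an unnormalized DCT-II of one power-spectrum frame (hypothetical helper)."""
    n = len(power_frame)
    # A small floor avoids log(0) on silent bins.
    log_power = [math.log(max(p, 1e-12)) for p in power_frame]
    return [
        sum(log_power[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(min(num_ceps, n))
    ]


flat_frame = [math.e] * 32   # flat spectrum: the log power is 1.0 in every bin
ceps = cepstrum_from_power(flat_frame, num_ceps=4)
print(ceps)  # energy concentrates in coefficient 0; the others are close to zero
```

A flat spectrum has no spectral shape to describe, which is why only the zeroth coefficient is non-negligible in this toy input.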

Tip

dct_type : can take the following options [1, 2, 3, 4].
normalize : can take the following options [“mvn”, “ms”, “vn”, “mn”].
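
As an illustration of what a normalization option typically does, the sketch below applies mean and variance normalization to each coefficient across frames. It assumes that "mvn" denotes this operation; that reading is an assumption, so check the spafe source for the exact behavior of each option. The function mean_variance_normalize is a hypothetical helper, not a spafe API.

```python
import statistics


def mean_variance_normalize(features):
    """Per-coefficient mean/variance normalization over frames (hypothetical helper)."""
    columns = list(zip(*features))   # one tuple per cepstral coefficient
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(frame, means, stds)]
        for frame in features
    ]


frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]   # 3 frames x 2 coefficients
normalized = mean_variance_normalize(frames)
print(normalized)
```

After this step each non-constant coefficient has zero mean and unit variance across frames, which reduces the influence of channel and level differences between recordings.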

References

[Todisco] Todisco M., Delgado H., Evans N., Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification, Computer Speech & Language, Volume 45, 2017, Pages 516-535, ISSN 0885-2308, https://doi.org/10.1016/j.csl.2017.01.001.

Note

Figure (../_images/cqccs.png): Architecture of the constant Q-transform cepstral coefficients extraction algorithm.

Examples

from scipy.io.wavfile import read
from spafe.features.cqcc import cqcc
from spafe.utils.preprocessing import SlidingWindow
from spafe.utils.vis import show_features

# read audio
fpath = "../../../tests/data/test.wav"
fs, sig = read(fpath)

# compute cqccs
cqccs  = cqcc(sig,
              fs=fs,
              pre_emph=1,
              pre_emph_coeff=0.97,
              window=SlidingWindow(0.03, 0.015, "hamming"),
              nfft=2048,
              low_freq=0,
              high_freq=fs/2,
              normalize="mvn")

# visualize features
show_features(cqccs, "Constant Q-Transform Cepstral Coefficients", "CQCC Index", "Frame Index")
