spafe.features.cqcc

  • Description : Constant Q-transform Cepstral Coefficients (CQCCs) extraction algorithm implementation.

  • Copyright (c) 2019-2024 Ayoub Malek. This source code is licensed under the terms of the BSD 3-Clause License. For a copy, see <https://github.com/SuperKogito/spafe/blob/master/LICENSE>.

spafe.features.cqcc.cqt_spectrogram(sig: ndarray, fs: int = 16000, pre_emph: bool = True, pre_emph_coeff: float = 0.97, window: Optional[SlidingWindow] = None, nfft: int = 512, low_freq: float = 0, high_freq: Optional[float] = None, number_of_octaves: int = 7, number_of_bins_per_octave: int = 24, spectral_threshold: float = 0.005, f0: float = 120, q_rate: float = 1.0)

Compute the constant Q-transform (CQT) spectrogram from an audio signal as in [Todisco].

Parameters
  • sig (numpy.ndarray) – a mono audio signal (Nx1) from which to compute features.

  • fs (int) – the sampling frequency of the signal we are working with. (Default is 16000).

  • pre_emph (bool) – apply pre-emphasis if True. (Default is True).

  • pre_emph_coeff (float) – pre-emphasis filter coefficient. (Default is 0.97).

  • window (SlidingWindow) – sliding window object. (Default is None).

  • nfft (int) – number of FFT points. (Default is 512).

  • low_freq (float) – lowest frequency band edge (Hz). (Default is 0).

  • high_freq (float) – highest frequency band edge (Hz). (Default is fs/2).

  • number_of_octaves (int) – number of octaves. (Default is 7).

  • number_of_bins_per_octave (int) – number of bins per octave. (Default is 24).

  • spectral_threshold (float) – spectral threshold. (Default is 0.005).

  • f0 (float) – fundamental frequency. (Default is 120).

  • q_rate (float) – scaling rate applied to the Q-factor. (Default is 1.0).
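To make the roles of f0, number_of_octaves and number_of_bins_per_octave concrete, here is a minimal sketch of the textbook constant-Q geometry: bin centers spaced geometrically from f0, and a single Q-factor shared by all bins. The helper names `cq_center_frequencies` and `q_factor` are hypothetical, and whether spafe computes its bins in exactly this way is an assumption, not something this page states.

```python
import math

def cq_center_frequencies(f0=120.0, number_of_octaves=7, number_of_bins_per_octave=24):
    """Hypothetical helper: geometrically spaced centers f_k = f0 * 2**(k / B)."""
    n_bins = number_of_octaves * number_of_bins_per_octave
    return [f0 * 2.0 ** (k / number_of_bins_per_octave) for k in range(n_bins)]

def q_factor(number_of_bins_per_octave=24):
    """Hypothetical helper: center-frequency-to-bandwidth ratio, equal for every bin."""
    return 1.0 / (2.0 ** (1.0 / number_of_bins_per_octave) - 1.0)

# Defaults taken from the signature above.
freqs = cq_center_frequencies()
print(len(freqs), freqs[0], round(freqs[24], 3))
print(round(q_factor(), 2))
```

With the default arguments this yields 168 bins, the first at 120 Hz and the bin one octave later (index 24) at 240 Hz. The highest center sits close to 15 kHz, so for fs=16000 part of this grid lies above fs/2; how the library handles such bins is not covered here.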

Returns

2d array of the spectrogram matrix (num_frames x num_ceps)

Return type

(numpy.ndarray)

Note

../_images/cqt_spectrogram.png

Architecture of the constant Q-transform spectrogram computation algorithm.

Examples

from spafe.features.cqcc import cqt_spectrogram
from spafe.utils.vis import show_spectrogram
from spafe.utils.preprocessing import SlidingWindow
from scipy.io.wavfile import read

# read audio
fpath = "../../../tests/data/test.wav"
fs, sig = read(fpath)

# compute spectrogram
qSpec = cqt_spectrogram(sig,
                        fs=fs,
                        pre_emph=0,
                        pre_emph_coeff=0.97,
                        window=SlidingWindow(0.03, 0.015, "hamming"),
                        nfft=2048,
                        low_freq=0,
                        high_freq=fs/2)

# visualize spectrogram
show_spectrogram(qSpec,
                 fs=fs,
                 xmin=0,
                 xmax=len(sig)/fs,
                 ymin=0,
                 ymax=(fs/2)/1000,
                 dbf=80.0,
                 xlabel="Time (s)",
                 ylabel="Frequency (kHz)",
                 title="CQT spectrogram (dB)",
                 cmap="jet")
../_images/cqcc-1.png

spafe.features.cqcc.cqcc(sig, fs: int = 16000, num_ceps: int = 13, pre_emph: bool = True, pre_emph_coeff: float = 0.97, window: Optional[SlidingWindow] = None, nfft: int = 512, low_freq: float = 0, high_freq: Optional[float] = None, dct_type: int = 2, lifter: Optional[int] = None, normalize: Optional[typing_extensions.Literal[mvn, ms, vn, mn]] = None, number_of_octaves: int = 7, number_of_bins_per_octave: int = 24, resampling_ratio: float = 1.0, spectral_threshold: float = 0.005, f0: float = 120, q_rate: float = 1.0)

Compute the Constant-Q Cepstral Coefficients (CQCC features) from an audio signal as described in [Todisco].

Parameters
  • sig (numpy.ndarray) – a mono audio signal (Nx1) from which to compute features.

  • fs (int) – the sampling frequency of the signal we are working with. (Default is 16000).

  • num_ceps (int) – number of cepstra to return. (Default is 13).

  • pre_emph (bool) – apply pre-emphasis if True. (Default is True).

  • pre_emph_coeff (float) – pre-emphasis filter coefficient. (Default is 0.97).

  • window (SlidingWindow) – sliding window object. (Default is None).

  • nfft (int) – number of FFT points. (Default is 512).

  • low_freq (float) – lowest frequency band edge (Hz). (Default is 0).

  • high_freq (float) – highest frequency band edge (Hz). (Default is fs/2).

  • dct_type (int) – type of DCT used. (Default is 2).

  • lifter (int) – apply liftering if value given. (Default is None).

  • normalize (str) – normalization approach. (Default is None).

  • number_of_octaves (int) – number of octaves. (Default is 7).

  • number_of_bins_per_octave (int) – number of bins per octave. (Default is 24).

  • resampling_ratio (float) – ratio to use for the uniform resampling. (Default is 1.00).

  • spectral_threshold (float) – spectral threshold. (Default is 0.005).

  • f0 (float) – fundamental frequency. (Default is 120).

  • q_rate (float) – scaling rate applied to the Q-factor. (Default is 1.0).
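As an illustration of what pre_emph and pre_emph_coeff control, the following is a minimal sketch of the standard first-order pre-emphasis filter, y[n] = x[n] - coeff * x[n-1]. The function name `pre_emphasis_sketch` is hypothetical, and it is an assumption that the library applies this exact form.

```python
import numpy as np

def pre_emphasis_sketch(sig, pre_emph_coeff=0.97):
    """Hypothetical helper: y[0] = x[0], y[n] = x[n] - coeff * x[n-1]."""
    sig = np.asarray(sig, dtype=float)
    # Keep the first sample, then subtract the scaled previous sample.
    return np.append(sig[:1], sig[1:] - pre_emph_coeff * sig[:-1])

# A constant signal is almost fully suppressed after the first sample,
# which is the intended high-pass behaviour.
out = pre_emphasis_sketch([1.0, 1.0, 1.0, 1.0])
print(out)
```

A coefficient close to 1 attenuates slowly varying (low-frequency) content more strongly, while 0 leaves the signal unchanged.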

Returns

2d array of CQCC features (num_frames*resampling_ratio x num_ceps).

Return type

(numpy.ndarray)

Tip

  • dct_type : can take the following options [1, 2, 3, 4].

  • normalize : can take the following options [“mvn”, “ms”, “vn”, “mn”].
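To give a feel for the normalize option, here is a rough sketch of mean-variance normalization applied per coefficient across frames. Reading "mvn" as mean-variance normalization, the choice of axis, and the helper name `mvn_sketch` are all assumptions made for illustration; the library's own implementation may differ.

```python
import numpy as np

def mvn_sketch(feats, eps=1e-12):
    """Hypothetical helper: zero mean and unit variance per column (coefficient)."""
    feats = np.asarray(feats, dtype=float)
    mean = feats.mean(axis=0, keepdims=True)   # one mean per coefficient
    std = feats.std(axis=0, keepdims=True)     # one std per coefficient
    return (feats - mean) / (std + eps)        # eps guards against division by zero

# Toy feature matrix: 3 frames x 2 coefficients with very different scales.
feats = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
normed = mvn_sketch(feats)
print(normed.mean(axis=0), normed.std(axis=0))
```

After this step both columns share the same scale, which is the usual motivation for normalizing cepstral features before feeding them to a classifier.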

References

[Todisco] Todisco M., Delgado H., Evans N., Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification, Computer Speech & Language, Volume 45, 2017, Pages 516-535, ISSN 0885-2308, https://doi.org/10.1016/j.csl.2017.01.001.

Note

../_images/cqccs.png

Architecture of the constant Q-transform cepstral coefficients extraction algorithm.
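The final cepstral stage of such a pipeline can be sketched as a log of the power spectrum followed by a DCT, keeping the first num_ceps coefficients. The code below is a simplified illustration only: it writes out an orthonormal DCT-II by hand (corresponding to dct_type=2), skips the uniform resampling, liftering and normalization steps, and uses hypothetical helper names (`dct2_ortho`, `cepstra_from_power`) and random input data. It is not the library's implementation.

```python
import numpy as np

def dct2_ortho(x):
    """Hypothetical helper: orthonormal DCT-II along the last axis."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]                    # output (cepstral) index
    m = np.arange(n)[None, :]                    # input (frequency bin) index
    basis = np.cos(np.pi * (m + 0.5) * k / n)    # shape (n, n)
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def cepstra_from_power(power_spec, num_ceps=13, floor=1e-10):
    """Hypothetical helper: log power spectrum -> DCT-II -> first num_ceps columns."""
    log_spec = np.log(np.maximum(power_spec, floor))  # floor avoids log(0)
    return dct2_ortho(log_spec)[:, :num_ceps]

# Illustrative input: 5 frames x 64 frequency bins of positive "power" values.
rng = np.random.default_rng(0)
power = rng.random((5, 64)) + 0.1
ceps = cepstra_from_power(power)
print(ceps.shape)
```

The DCT decorrelates the log spectrum, so most of the spectral envelope is captured by the first few coefficients; this is why only num_ceps of them are kept.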

Examples

from scipy.io.wavfile import read
from spafe.features.cqcc import cqcc
from spafe.utils.preprocessing import SlidingWindow
from spafe.utils.vis import show_features

# read audio
fpath = "../../../tests/data/test.wav"
fs, sig = read(fpath)

# compute cqccs
cqccs = cqcc(sig,
              fs=fs,
              pre_emph=1,
              pre_emph_coeff=0.97,
              window=SlidingWindow(0.03, 0.015, "hamming"),
              nfft=2048,
              low_freq=0,
              high_freq=fs/2,
              normalize="mvn")

# visualize features
show_features(cqccs, "Constant Q-Transform Cepstral Coefficients", "CQCC Index", "Frame Index")
../_images/cqcc-2.png