spafe.features.rplp

Description: (RASTA) perceptual linear prediction coefficients (RPLPs/PLPs) extraction algorithm implementation.
Copyright (c) 2019-2024 Ayoub Malek. This source code is licensed under the terms of the BSD 3-Clause License. For a copy, see <https://github.com/SuperKogito/spafe/blob/master/LICENSE>.

spafe.features.rplp.plp(sig: ndarray, fs: int = 16000, order: int = 13, pre_emph: bool = False, pre_emph_coeff: float = 0.97, window: Optional[SlidingWindow] = None, nfilts: int = 24, nfft: int = 512, low_freq: float = 0, high_freq: Optional[float] = None, scale: typing_extensions.Literal[ascendant, descendant, constant] = 'constant', lifter: Optional[int] = None, normalize: Optional[typing_extensions.Literal[mvn, ms, vn, mn]] = None, fbanks: Optional[ndarray] = None, conversion_approach: typing_extensions.Literal[Wang, Tjomov, Schroeder, Terhardt, Zwicker, Traunmueller] = 'Wang') → ndarray

Compute perceptual linear prediction coefficients according to [Hermansky] and [Ajibola].

Parameters

sig (numpy.ndarray) – a mono audio signal (Nx1) from which to compute features.
fs (int) – the sampling frequency of the signal we are working with. (Default is 16000).
order (int) – number of cepstra to return. (Default is 13).
pre_emph (bool) – apply pre-emphasis if True. (Default is False).
pre_emph_coeff (float) – pre-emphasis filter coefficient. (Default is 0.97).
window (SlidingWindow) – sliding window object. (Default is None).
nfilts (int) – the number of filters in the filter bank. (Default is 24).
nfft (int) – number of FFT points. (Default is 512).
low_freq (float) – lowest band edge of the Bark filters (Hz). (Default is 0).
high_freq (float) – highest band edge of the Bark filters (Hz). (Default is None, which resolves to fs/2).
scale (str) – monotonicity behavior of the filter banks. (Default is “constant”).
lifter (int) – apply liftering if specified. (Default is None).
normalize (str) – apply normalization if approach specified. (Default is None).
fbanks (numpy.ndarray) – filter bank matrix. (Default is None).
conversion_approach (str) – bark scale conversion approach. (Default is “Wang”).
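The pre_emph and pre_emph_coeff parameters refer to a pre-emphasis step. As a rough illustration, the sketch below uses the textbook first-order filter y[n] = x[n] - a·x[n-1]. The helper name pre_emphasis is hypothetical and this is not claimed to be spafe's internal code; it only shows what a coefficient of 0.97 does to a signal.

```python
import numpy as np


def pre_emphasis(sig, coeff=0.97):
    """Hypothetical helper: textbook pre-emphasis y[n] = x[n] - coeff * x[n-1]."""
    sig = np.asarray(sig, dtype=float)
    # The first sample has no predecessor, so it is passed through unchanged.
    return np.append(sig[:1], sig[1:] - coeff * sig[:-1])


# A constant (DC) signal is strongly attenuated after the first sample.
x = np.ones(4)
y = pre_emphasis(x)
print(y)
```

Low-frequency content is attenuated while fast changes pass through, which is why the step is often used to balance the spectrum of speech before analysis.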

Returns

2d array of PLP features (num_frames x order)

Return type

(numpy.ndarray)

Tip

normalize : can take the following options [“mvn”, “ms”, “vn”, “mn”].
conversion_approach : can take the following options [“Tjomov”, “Schroeder”, “Terhardt”, “Zwicker”, “Traunmueller”, “Wang”]. Note that using an option other than the default can lead to unexpected behavior.
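To make the normalize options more concrete, here is a minimal sketch of mean and variance normalization. It assumes that “mvn” stands for mean-variance normalization and that statistics are computed per coefficient across frames. Both points are assumptions, and the helper name is hypothetical rather than part of spafe.

```python
import numpy as np


def mean_variance_normalize(feats, eps=1e-12):
    """Hypothetical helper: zero mean and unit variance per feature column."""
    feats = np.asarray(feats, dtype=float)
    mean = feats.mean(axis=0)  # one mean per coefficient, taken over frames
    std = feats.std(axis=0)    # one standard deviation per coefficient
    return (feats - mean) / (std + eps)  # eps guards against division by zero


# Toy feature matrix shaped (num_frames x order), as in the Returns section.
feats = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
normed = mean_variance_normalize(feats)
print(normed.mean(axis=0), normed.std(axis=0))
```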

Note

Figure (../_images/plps.png): Architecture of the perceptual linear prediction coefficients extraction algorithm.

Examples

from scipy.io.wavfile import read
from spafe.features.rplp import plp
from spafe.utils.preprocessing import SlidingWindow
from spafe.utils.vis import show_features

# read audio
fpath = "../../../tests/data/test.wav"
fs, sig = read(fpath)

# compute plps
plps = plp(sig,
           fs=fs,
           pre_emph=0,
           pre_emph_coeff=0.97,
           window=SlidingWindow(0.03, 0.015, "hamming"),
           nfilts=128,
           nfft=1024,
           low_freq=0,
           high_freq=fs/2,
           lifter=0.9,
           normalize="mvn")

# visualize features
show_features(plps, "Perceptual linear predictions", "PLP Index", "Frame Index")
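The sig parameter is documented as a mono signal, while scipy.io.wavfile.read returns a 2-D array of shape (num_samples, num_channels) for multi-channel files. A small sketch of one way to downmix before calling plp follows. The helper to_mono is hypothetical and averaging channels is only one possible choice.

```python
import numpy as np


def to_mono(sig):
    """Hypothetical helper: average the channels of a (num_samples, num_channels) array."""
    sig = np.asarray(sig, dtype=np.float64)
    # 1-D input is already mono; 2-D input is averaged across the channel axis.
    return sig.mean(axis=1) if sig.ndim == 2 else sig


# Two-sample stereo signal stored as int16, as a WAV reader might return it.
stereo = np.array([[100, 300], [-200, 0]], dtype=np.int16)
mono = to_mono(stereo)
print(mono.shape)
```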

spafe.features.rplp.rplp(sig: ndarray, fs: int = 16000, order: int = 13, pre_emph: bool = False, pre_emph_coeff: float = 0.97, window: Optional[SlidingWindow] = None, nfilts: int = 24, nfft: int = 512, low_freq: float = 0, high_freq: Optional[float] = None, scale: typing_extensions.Literal[ascendant, descendant, constant] = 'constant', lifter: Optional[int] = None, normalize: Optional[typing_extensions.Literal[mvn, ms, vn, mn]] = None, fbanks: Optional[ndarray] = None, conversion_approach: typing_extensions.Literal[Wang, Tjomov, Schroeder, Terhardt, Zwicker, Traunmueller] = 'Wang') → ndarray

Compute RASTA perceptual linear prediction coefficients according to [Hermansky] and [Ajibola].

Parameters

sig (numpy.ndarray) – a mono audio signal (Nx1) from which to compute features.
fs (int) – the sampling frequency of the signal we are working with. (Default is 16000).
order (int) – number of cepstra to return. (Default is 13).
pre_emph (bool) – apply pre-emphasis if True. (Default is False).
pre_emph_coeff (float) – pre-emphasis filter coefficient. (Default is 0.97).
window (SlidingWindow) – sliding window object. (Default is None).
nfilts (int) – the number of filters in the filter bank. (Default is 24).
nfft (int) – number of FFT points. (Default is 512).
low_freq (float) – lowest band edge of the Bark filters (Hz). (Default is 0).
high_freq (float) – highest band edge of the Bark filters (Hz). (Default is None, which resolves to fs/2).
scale (str) – monotonicity behavior of the filter banks. (Default is “constant”).
lifter (int) – apply liftering if specified. (Default is None).
normalize (str) – apply normalization if approach specified. (Default is None).
fbanks (numpy.ndarray) – filter bank matrix. (Default is None).
conversion_approach (str) – bark scale conversion approach. (Default is “Wang”).

Returns

2d array of rasta PLP features (num_frames x order)

Return type

(numpy.ndarray)

Tip

normalize : can take the following options [“mvn”, “ms”, “vn”, “mn”].
conversion_approach : can take the following options [“Tjomov”, “Schroeder”, “Terhardt”, “Zwicker”, “Traunmueller”, “Wang”]. Note that using an option other than the default can lead to unexpected behavior.
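The conversion_approach options name different published Hz-to-Bark mappings. The sketch below shows two of them using the formulas commonly attributed to Wang et al. (6·asinh(f/600)) and Traunmüller (26.81·f/(1960+f) - 0.53). The function names are hypothetical, and it is an assumption that spafe uses exactly these expressions.

```python
import numpy as np


def hz2bark_wang(f):
    """Hypothetical helper: Bark = 6 * asinh(f / 600)."""
    return 6.0 * np.arcsinh(np.asarray(f, dtype=float) / 600.0)


def hz2bark_traunmueller(f):
    """Hypothetical helper: Bark = 26.81 * f / (1960 + f) - 0.53."""
    f = np.asarray(f, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53


# Compare both mappings on a few frequencies up to the default Nyquist (8 kHz).
freqs = np.array([0.0, 500.0, 1000.0, 4000.0, 8000.0])
wang = hz2bark_wang(freqs)
traun = hz2bark_traunmueller(freqs)
print(np.round(wang, 2))
print(np.round(traun, 2))
```

The two curves differ by a noticeable margin at some frequencies, so the filter bank centers shift with the chosen approach. That is one plausible reason non-default options may behave differently.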

Note

Figure (../_images/rplps.png): Architecture of the RASTA perceptual linear prediction coefficients extraction algorithm.
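What distinguishes RASTA-PLP from plain PLP is a band-pass filter applied along time to each band of the log spectrum. The sketch below uses the coefficients found in widely circulated reference implementations (numerator [0.2, 0.1, 0, -0.1, -0.2], pole at 0.94) with zero initial state. The function name is hypothetical, and neither the coefficients nor the initialization are confirmed to match spafe.

```python
import numpy as np


def rasta_filter(log_spec, pole=0.94):
    """Hypothetical helper: filter each band trajectory of a (num_frames, num_bands) array."""
    x = np.asarray(log_spec, dtype=float)
    numer = np.array([0.2, 0.1, 0.0, -0.1, -0.2])  # FIR part: a smoothed time derivative
    y = np.zeros_like(x)
    for t in range(x.shape[0]):
        acc = np.zeros(x.shape[1])
        # Feed-forward taps over the current and previous four frames.
        for k, b in enumerate(numer):
            if t - k >= 0:
                acc += b * x[t - k]
        # Feedback term: a single pole that integrates the derivative.
        if t >= 1:
            acc += pole * y[t - 1]
        y[t] = acc
    return y


# A constant trajectory (e.g. a fixed channel offset in the log domain) decays to zero.
const = np.ones((200, 3))
out = rasta_filter(const)
print(np.abs(out[-1]).max())
```

Because the numerator sums to zero, slowly varying components such as a stationary channel response are suppressed, while modulations at speech-like rates are retained.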

Examples

from scipy.io.wavfile import read
from spafe.features.rplp import rplp
from spafe.utils.preprocessing import SlidingWindow
from spafe.utils.vis import show_features

# read audio
fpath = "../../../tests/data/test.wav"
fs, sig = read(fpath)

# compute rplps
rplps = rplp(sig,
             fs=fs,
             pre_emph=0,
             pre_emph_coeff=0.97,
             window=SlidingWindow(0.03, 0.015, "hamming"),
             nfilts=128,
             nfft=1024,
             low_freq=0,
             high_freq=fs/2,
             lifter=0.9,
             normalize="mvn")

# visualize features
show_features(rplps, "Rasta perceptual linear predictions", "PLP Index", "Frame Index")

References

[Ajibola] Ajibola Alim, S., & Khair Alang Rashid, N. (2018). Some Commonly Used Speech Feature Extraction Algorithms. From Natural to Artificial Intelligence - Algorithms and Applications. doi:10.5772/intechopen.80419
