Signal framing#
For non-stationary signals, spectral features are most useful when they are computed over short segments of the signal. The signal is therefore split into multiple short segments before feature extraction. This technique is known as frame blocking or framing. This blog post explains why we need framing and how to do it in Python.
What is framing?#
Frame blocking or framing is a fundamental signal processing technique that consists of dividing the original signal into \(\#F\) blocks, often called frames, with a length \(N_f\), an overlap \(M\) and a framing hop \(H\) (\(H = N_f - M\)). Overlapping the frames helps avoid information loss between adjacent frames.
Assuming a signal \(S = \sum_{n=0}^{N_s-1} x[n]\), this can be mathematically formulated as follows: \(S = \sum_{n=0}^{N_s-1} x[n] = \sum_{i=0}^{\#F-1} f[i]\) where:
\(S\): Discrete signal.
\(x[n]\): Signal samples in time domain.
\(N_s\) : Signal length in samples.
\(f[i]\): Signal frame.
\(\#F\) : Number of frames.
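To make the notation concrete, here is a minimal sketch in plain Python, assuming a toy signal of 10 samples with \(N_f = 4\) and \(M = 2\) (so \(H = 2\)). The helper name `split_into_frames` is hypothetical and only illustrates the definitions; it keeps complete frames only and ignores padding, which is discussed later.

```python
def split_into_frames(x, frame_length, overlap):
    """Split the list x into overlapping frames (complete frames only)."""
    hop = frame_length - overlap  # H = N_f - M
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_length")
    # number of complete frames that fit into the signal
    num_frames = (len(x) - overlap) // hop
    return [x[i * hop: i * hop + frame_length] for i in range(num_frames)]


signal = list(range(10))  # toy signal with x[n] = n and N_s = 10
frames = split_into_frames(signal, frame_length=4, overlap=2)
for i, frame in enumerate(frames):
    print(i, frame)
```

Each frame shares its last two samples with the start of the next one, which is exactly the overlap \(M\).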
Why do we need framing?#
Speech is a non-stationary signal: its statistical properties are not constant over time. Therefore, its spectral features and other characteristic properties (for example, short-time energy, MFCCs, etc.) should be extracted from small blocks of the signal. This is based on the assumption that the signal is stationary (i.e. its statistical properties are constant) within each small frame [1]. In addition, frame blocking is often used in real-time systems, as it improves efficiency by distributing the fixed processing overhead across many samples.
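A minimal sketch of why a single global statistic can be misleading for a non-stationary signal. The synthetic signal below is an assumption made for illustration (it is not real speech): a tone that is quiet in its first half and loud in its second half. The mean energy over the whole signal hides this change, whereas the short-time energy computed on small blocks reveals it. The function name `short_time_energy` is hypothetical, and the blocks are non-overlapping to keep the sketch short.

```python
import math


def short_time_energy(x, frame_length):
    """Mean energy of consecutive, non-overlapping frames of x."""
    return [
        sum(s * s for s in x[start:start + frame_length]) / frame_length
        for start in range(0, len(x) - frame_length + 1, frame_length)
    ]


fs = 8000  # sampling rate in Hz (illustrative value)
# 0.5 s of a 200 Hz tone: amplitude 0.1 in the first half, 1.0 in the second half
x = [
    (0.1 if n < fs // 4 else 1.0) * math.sin(2 * math.pi * 200 * n / fs)
    for n in range(fs // 2)
]

global_energy = sum(s * s for s in x) / len(x)
frame_energies = short_time_energy(x, frame_length=400)  # 50 ms frames

print(f"global energy: {global_energy:.4f}")
print("per-frame energy:", [round(e, 4) for e in frame_energies])
```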
How to implement it?#
The first approach is pretty straightforward and can be summarized in the following steps:
Convert hop step and frame length from seconds to samples.
To understand this, we need to clarify the terminology and the difference between a frame and a sample. Assume a discrete audio signal with a duration of 3 seconds and a sampling rate of 8 kHz. The sampling rate defines the number of samples (signal measurements) per second: 8 kHz = 8000 Hz = 8000 1/s = 8000 samples per second. So in this case, the signal has 3 x 8000 = 24 000 samples. A frame, on the other hand, is a part of the signal, i.e. a series of consecutive samples. A 1-second-long frame, in this case, contains 1 x 8000 = 8000 samples. To convert the framing hop and length from seconds to samples, we just need to multiply them by the sampling rate:
frame_hop_in_samples = sampling_rate x frame_hop_in_seconds
frame_length_in_samples = sampling_rate x frame_length_in_seconds
Compute the expected number of frames.
The expected number of frames can be computed using the following formula:
\begin{eqnarray} \#F = \left\lfloor \frac{N_s - M}{N_f - M} \right\rfloor \end{eqnarray}
where:
\(\#F\) : Number of frames.
\(N_s\) : Signal length in samples.
\(N_f\) : Frame length in samples.
\(M\) : Frames overlap.
Pad the signal if needed. Sometimes the signal needs to be padded so that all of its samples are included in frames of equal length. Therefore, we also compute the number of rest samples, i.e. the samples left over after the last complete frame. This value determines how many samples must be padded so that the last frame is as long as the other frames.
\begin{eqnarray} \#R = \left\lfloor (N_s - M) \bmod (N_f - M) \right\rfloor \end{eqnarray}
where:
\(\#R\) : Number of rest samples.
\(N_s\) : Signal length in samples.
\(N_f\) : Frame length in samples.
\(M\) : Frames overlap.
Consequently, when \(\#R \neq 0\), the number of samples to pad the signal with is equal to \((N_f - M) - \#R\), and the number of frames becomes \(\#F + 1\).
Compute frame indices. There are various ways to do this, but once you have the number of frames and their length, it is only a matter of computing where each frame starts and ends.
Get frames
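The steps above can be sketched numerically. The values below are assumptions chosen for illustration: the 3-second signal sampled at 8 kHz from the example above, with a frame length of 25 ms and a hop of 10 ms. The sketch uses plain Python and only computes the bookkeeping (lengths, counts and frame boundaries), not the frames themselves.

```python
sampling_rate = 8000             # Hz
signal_duration = 3.0            # seconds
frame_length_in_seconds = 0.025
frame_hop_in_seconds = 0.010

# Step 1: convert from seconds to samples
signal_length = int(round(sampling_rate * signal_duration))         # N_s
frame_length = int(round(sampling_rate * frame_length_in_seconds))  # N_f
frame_hop = int(round(sampling_rate * frame_hop_in_seconds))        # H = N_f - M
overlap = frame_length - frame_hop                                  # M

# Steps 2 and 3: number of complete frames (#F) and rest samples (#R)
num_frames, rest_samples = divmod(signal_length - overlap, frame_hop)

# Step 3 (continued): pad with H - #R = (N_f - M) - #R zeros so that the
# rest samples fill one extra frame ending exactly at the padded signal's end
pad_length = frame_hop - rest_samples if rest_samples else 0
if rest_samples:
    num_frames += 1

# Step 4: start (inclusive) and end (exclusive) index of every frame
boundaries = [(i * frame_hop, i * frame_hop + frame_length)
              for i in range(num_frames)]

print(signal_length, frame_length, frame_hop, overlap)
print(num_frames, rest_samples, pad_length)
print(boundaries[0], boundaries[-1])
```

The last frame ends at `signal_length + pad_length`, which is a quick way to check that the padding is consistent with the number of frames.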
Code#
The previously listed steps can be implemented in Python as follows:
import numpy as np


def framing(sig, fs=16000, win_len=0.025, win_hop=0.01):
    """
    Transform a signal into a series of overlapping frames.

    Args:
        sig     (array) : a mono audio signal (Nx1) from which to compute features.
        fs        (int) : the sampling frequency of the signal we are working with.
                          Default is 16000.
        win_len (float) : window length in sec.
                          Default is 0.025.
        win_hop (float) : step between successive windows in sec.
                          Default is 0.01.

    Returns:
        array of frames with shape (num_frames, frame_length).
    """
    # flatten the input so that (N,) and (N, 1) signals are handled alike
    sig = np.asarray(sig).ravel()

    # compute frame length and frame step (convert from seconds to samples)
    # and round to integers so that they can be used as sizes and indices
    frame_length = int(round(win_len * fs))
    frame_step = int(round(win_hop * fs))
    if not 0 < frame_step <= frame_length:
        raise ValueError("win_hop must be positive and not larger than win_len.")
    signal_length = len(sig)
    frames_overlap = frame_length - frame_step

    if signal_length < frame_length:
        # signal shorter than one frame: pad it up to exactly one frame
        num_frames = 1
        pad_signal_length = frame_length - signal_length
    else:
        # number of complete frames and number of left-over (rest) samples
        num_frames, rest_samples = divmod(signal_length - frames_overlap, frame_step)
        # pad the signal so that the rest samples fill one extra frame: all frames
        # have equal length and no sample of the original signal is truncated
        pad_signal_length = (frame_step - rest_samples) if rest_samples else 0
        num_frames += 1 if rest_samples else 0
    pad_signal = np.append(sig, np.zeros(pad_signal_length, dtype=sig.dtype))

    # compute indices: row i holds the sample indices of frame i
    idx1 = np.tile(np.arange(0, frame_length), (num_frames, 1))
    idx2 = np.tile(np.arange(0, num_frames * frame_step, frame_step),
                   (frame_length, 1)).T
    indices = idx1 + idx2
    frames = pad_signal[indices]
    return frames
Alternatively, one can use the stride trick, a sliding window technique that is already implemented in NumPy (`np.lib.stride_tricks.as_strided`), to get a much faster framing, since the frames are built as views on the signal instead of copies. This is done as follows:
import numpy as np


def stride_trick(a, stride_length, stride_step):
    """
    Apply framing using the stride trick from numpy.

    Args:
        a             (array) : signal array (1-D).
        stride_length   (int) : length of the stride.
        stride_step     (int) : stride step.

    Returns:
        blocked/framed array (an overlapping view on a: avoid writing to it in place).
    """
    nrows = ((a.size - stride_length) // stride_step) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a,
                                           shape=(nrows, stride_length),
                                           strides=(stride_step * n, n))


def framing(sig, fs=16000, win_len=0.025, win_hop=0.01):
    """
    Transform a signal into a series of overlapping frames (=Frame blocking).

    Args:
        sig     (array) : a mono audio signal (Nx1) from which to compute features.
        fs        (int) : the sampling frequency of the signal we are working with.
                          Default is 16000.
        win_len (float) : window length in sec.
                          Default is 0.025.
        win_hop (float) : step between successive windows in sec.
                          Default is 0.01.

    Returns:
        array of frames.
        frame length (in samples).

    Notes:
    ------
        Uses the stride trick to accelerate the processing.
    """
    # run checks and assertions
    if win_hop <= 0 or win_len < win_hop:
        raise ValueError("win_hop must be positive and not larger than win_len.")

    # flatten the input so that (N,) and (N, 1) signals are handled alike
    sig = np.asarray(sig).ravel()

    # compute frame length and frame step (convert from seconds to samples)
    frame_length = int(round(win_len * fs))
    frame_step = int(round(win_hop * fs))
    signal_length = len(sig)
    frames_overlap = frame_length - frame_step

    # compute the number of left-over samples in order to pad if needed, to make
    # sure all frames have an equal number of samples without truncating any
    # samples from the original signal
    if signal_length < frame_length:
        # signal shorter than one frame: pad it up to exactly one frame
        pad_length = frame_length - signal_length
    else:
        rest_samples = (signal_length - frames_overlap) % frame_step
        pad_length = (frame_step - rest_samples) if rest_samples else 0
    # np.append always returns a new contiguous array, safe to stride over
    pad_signal = np.append(sig, np.zeros(pad_length, dtype=sig.dtype))

    # apply stride trick
    frames = stride_trick(pad_signal, frame_length, frame_step)
    return frames, frame_length
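As a cross-check, here is a minimal, self-contained sketch that compares index-based framing with strided framing on a toy signal. It assumes NumPy 1.20 or later, because it uses `np.lib.stride_tricks.sliding_window_view`, a safer relative of `as_strided` that returns a read-only view. The helper names `frames_by_indexing` and `frames_by_strides` are hypothetical, and the toy signal is chosen so that no padding is needed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def frames_by_indexing(x, frame_length, frame_step):
    """Build frames with an explicit (num_frames, frame_length) index matrix."""
    num_frames = (x.size - frame_length) // frame_step + 1
    starts = frame_step * np.arange(num_frames)[:, None]  # column of frame starts
    offsets = np.arange(frame_length)[None, :]             # row of in-frame offsets
    return x[starts + offsets]                             # fancy indexing copies data


def frames_by_strides(x, frame_length, frame_step):
    """Build frames as a read-only strided view (no data is copied)."""
    # all windows with a hop of 1, then keep every frame_step-th window
    return sliding_window_view(x, frame_length)[::frame_step]


x = np.arange(20, dtype=float)  # toy signal, N_s = 20
a = frames_by_indexing(x, frame_length=8, frame_step=4)
b = frames_by_strides(x, frame_length=8, frame_step=4)

print(a.shape, b.shape)
print("same frames:", np.array_equal(a, b))
print("strided frames share memory with the signal:", np.shares_memory(b, x))
```

Both approaches yield the same frames; the strided version avoids the copy, which is where the speed-up comes from, at the cost of the frames being views on the signal's memory.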
Conclusion#
This blog post presented framing, a fundamental signal processing technique that divides a signal into multiple, equally sized blocks. Each resulting block is assumed to be stationary, which helps extract useful characterizing features of the signal. This operation can be implemented in Python in a classical fashion or using the stride trick for faster processing.
References and Further readings#
- [1] John R. Deller, John G. Proakis, and John H. Hansen. Discrete Time Processing of Speech Signals. Prentice Hall PTR, USA, 1st edition, 1987. ISBN 0023283017.