Let’s see how to manipulate audio data by adding noise, using NumPy.
Adding noise to audio data during training helps the model become more robust in real-world scenarios, where there might be background noise or interference. By exposing a model to a variety of noisy conditions, it learns to generalize better.
Augmenting audio data with noise also helps prevent the model from memorizing specific patterns in the training data. Instead, it encourages the model to focus on more general features, which can lead to better generalization on unseen data:
import numpy as np

def add_noise(data, noise_factor):
    noise = np.random.randn(len(data))
    augmented_data = data + noise_factor * noise
    # Cast back to the same data type
    augmented_data = augmented_data.astype(type(data[0]))
    return augmented_data
This code defines a function named add_noise that adds random noise to an input data array, with the level of noise controlled by the noise_factor parameter. The function generates random noise using NumPy, adds it to the original data, and returns the augmented data. To ensure data type consistency, the augmented data is cast back to the same type as the elements of the original array. This function can be used for data augmentation, a technique commonly employed in machine learning to enhance model robustness by introducing variations into the training data.
Let’s test this function using sample audio data, as follows:
# Sample audio data
sample_data = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

# Sample noise factor
sample_noise_factor = 0.05

# Apply augmentation
augmented_data = add_noise(sample_data, sample_noise_factor)

# Print the original and augmented data
print("Original Data:", sample_data)
print("Augmented Data:", augmented_data)
Here is the output:
Figure 11.8 – Representation of the augmented data
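Before wiring this into the training pipeline, we can sanity-check the augmentation on a real recording. The following is a minimal sketch; the filename is a placeholder, and soundfile is used only to write out the augmented clip so that it can be listened to:

import librosa
import soundfile as sf

# Hypothetical file path; replace it with any .wav file from your dataset
audio_data, sample_rate = librosa.load('dog_bark.wav', sr=None)

# Apply the augmentation and save the noisy clip for listening
augmented_audio = add_noise(audio_data, noise_factor=0.05)
sf.write('dog_bark_noisy.wav', augmented_audio, sample_rate)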
Now, let’s retrain our CNN model using data augmentation for the classification of dog and cat sounds, which we covered in the Hands-on – labeling audio data using a CNN section:
import os

import librosa
import numpy as np
from tensorflow.image import resize  # assumed, as used in the earlier CNN section

# Load and preprocess audio data
def load_and_preprocess_data(data_dir, classes, target_shape=(128, 128)):
    data = []
    labels = []
    noise_factor = 0.05
    for i, class_name in enumerate(classes):
        class_dir = os.path.join(data_dir, class_name)
        for filename in os.listdir(class_dir):
            if filename.endswith('.wav'):
                file_path = os.path.join(class_dir, filename)
                audio_data, sample_rate = librosa.load(file_path, sr=None)
                # Apply noise manipulation
                noise = np.random.randn(len(audio_data))
                augmented_data = audio_data + noise_factor * noise
                augmented_data = augmented_data.astype(type(audio_data[0]))
                # Perform preprocessing (e.g., convert to Mel spectrogram and resize)
                mel_spectrogram = librosa.feature.melspectrogram(
                    y=augmented_data, sr=sample_rate)
                mel_spectrogram = resize(
                    np.expand_dims(mel_spectrogram, axis=-1), target_shape)
                print(mel_spectrogram)
                data.append(mel_spectrogram)
                labels.append(i)
    return np.array(data), np.array(labels)
In this code, we introduced data augmentation by adding random noise (noise_factor * noise) to the audio data before converting it into a spectrogram. This exposes the model to varied instances of the same class during training, which improves its robustness.
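The rest of the pipeline mirrors the Hands-on – labeling audio data using a CNN section: load the augmented features, split them, and retrain the network. The following is a rough sketch under placeholder assumptions; the directory path, class names, and CNN architecture are illustrative and should be replaced with the ones used in that section:

from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models
from tensorflow.keras.utils import to_categorical

# Hypothetical dataset layout; adjust the path and class names to your data
data_dir = 'audio_dataset'
classes = ['cat', 'dog']

# Load the noise-augmented Mel spectrograms and their labels
data, labels = load_and_preprocess_data(data_dir, classes)
labels = to_categorical(labels, num_classes=len(classes))

# Hold out a test set for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, random_state=42)

# Minimal CNN for illustration; reuse the architecture from the earlier section instead if preferred
model = models.Sequential([
    layers.Input(shape=X_train[0].shape),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(len(classes), activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# Retrain on the augmented training data
model.fit(X_train, y_train, epochs=20, batch_size=32,
          validation_data=(X_test, y_test))

Once training finishes, we evaluate the retrained model on the held-out test set: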
test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(test_accuracy[1])
Here is the output:
Figure 11.9 – Accuracy of the model