SoFunction
Updated on 2025-04-14

Detailed explanation of audio processing operations using Librosa in Python

1. Introduction

Librosa is a Python library for audio and music analysis that provides rich functionality for processing and analyzing audio signals. It covers tasks such as music information retrieval, audio feature extraction, and audio visualization. This article introduces Librosa's main features and shows how to use them through code examples.

2. Install Librosa

Before you start, you need to install the Librosa library first. You can install it through the following command:

pip install librosa

3. Main functions and code examples

3.1 Loading audio files

Librosa can easily load audio files and convert them into NumPy arrays. The loaded audio data can be used for subsequent analysis and processing.

import librosa

# Load the audio file (set audio_path to the location of your file)
audio_path = ''
y, sr = librosa.load(audio_path)

print(f"Audio sampling rate: {sr}")
print(f"Audio data: {y}")

Explanation:

  • The librosa.load() function loads an audio file and returns two values: y is the audio time series and sr is the sampling rate. By default it resamples the audio to 22050 Hz and mixes it down to mono.
  • audio_path is the path to the audio file.
  • sr represents the number of samples per second, and y is a NumPy array containing audio samples.
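The relationship between y, sr, and the length of a clip can be checked with plain arithmetic. The sketch below uses only the standard library and assumes the 22050 Hz default target rate of librosa.load(); the helper names are hypothetical and not part of Librosa.

```python
# Plain-Python sketch of how sample counts, sampling rate, and time relate.
# DEFAULT_SR mirrors the default target rate of librosa.load (22050 Hz);
# the helper names are illustrative and not part of Librosa.
DEFAULT_SR = 22050


def duration_seconds(num_samples, sr=DEFAULT_SR):
    """Length of a clip in seconds, i.e. len(y) / sr."""
    return num_samples / sr


def sample_index(time_s, sr=DEFAULT_SR):
    """Index into y of the sample closest to the given time."""
    return int(round(time_s * sr))


print(duration_seconds(66150))   # 3.0
print(sample_index(1.5))         # 33075
```

If you need the file's native sampling rate instead of the default, pass sr=None to librosa.load().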

3.2 Extract audio features

Librosa provides a variety of audio feature extraction methods, such as Mel-frequency cepstral coefficients (MFCCs), chroma features, and the spectral centroid.

3.2.1 Extracting MFCC features

import librosa
import numpy as np

# Load the audio file (fill in the path to your file)
y, sr = librosa.load('')

# Extract MFCC features
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(f"MFCC feature shape: {mfccs.shape}")

Explanation:

  • The librosa.feature.mfcc() function is used to extract MFCC features.
  • The n_mfcc parameter specifies the number of MFCC coefficients to be extracted.
  • mfccs is a two-dimensional array where each row corresponds to an MFCC coefficient and each column corresponds to a frame.
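The number of columns in mfccs follows from the signal length. As a rough sketch, assuming Librosa's default framing (hop_length=512 with centered frames), the expected shape can be predicted as follows; the helper name is hypothetical.

```python
# Sketch: predict the shape of the MFCC matrix from the signal length.
# Assumes Librosa's default framing: hop_length=512 with centered frames,
# which gives 1 + num_samples // hop_length frames. The helper name is
# illustrative and not part of Librosa.
def expected_mfcc_shape(num_samples, n_mfcc=13, hop_length=512):
    n_frames = 1 + num_samples // hop_length
    return (n_mfcc, n_frames)


# A 3-second clip at 22050 Hz has 66150 samples.
print(expected_mfcc_shape(66150))   # (13, 130)

# Time between consecutive frames, in seconds (about 23 ms).
print(round(512 / 22050, 4))        # 0.0232
```

If you pass a different hop_length to the feature function, the frame count changes accordingly.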

3.2.2 Extracting chroma features

# Extract chroma features
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

print(f"Chroma feature shape: {chroma.shape}")

Explanation:

  • The librosa.feature.chroma_stft() function is used to extract chroma features.
  • Chroma features represent how the energy of the audio signal is distributed across the 12 pitch classes (C, C#, D, ..., B).
  • chroma is a two-dimensional array where each row corresponds to a pitch class and each column corresponds to a frame.
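To make the idea of a pitch class concrete, the sketch below maps a frequency to one of the 12 classes using the standard MIDI note formula. It is a simplification: chroma_stft() folds the energy of a whole spectrum into 12 bins rather than classifying single frequencies. The helper name is hypothetical, and the row order starting at C is assumed to match Librosa's default.

```python
import math

# Pitch-class names in the order assumed for chroma rows (row 0 = C).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_index(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch-class index (0 = C, ..., 11 = B).

    Uses the MIDI convention in which A4 (440 Hz) is note number 69.
    Illustrative helper, not part of Librosa.
    """
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    return int(round(midi)) % 12


print(PITCH_CLASSES[pitch_class_index(440.0)])    # A
print(PITCH_CLASSES[pitch_class_index(261.63)])   # C
print(PITCH_CLASSES[pitch_class_index(880.0)])    # A
```

Notes an octave apart (440 Hz and 880 Hz) land in the same class, which is why chroma features are insensitive to octave.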

3.3 Audio visualization

Librosa provides a variety of visualization tools to help users better understand audio data.

3.3.1 Drawing a waveform

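import matplotlib.pyplot as plt
import librosa.display

# Draw a waveform
plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr)
plt.title('Waveform')
plt.xlabel('Time (seconds)')
plt.ylabel('Amplitude')
plt.show()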

Explanation:

  • The librosa.display.waveshow() function is used to draw an audio waveform. Older releases named this function waveplot.
  • The figsize parameter sets the size of the image.
  • The waveform graph shows the amplitude of the audio signal over time.

3.3.2 Drawing a spectrogram

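# Compute the short-time Fourier transform (STFT), take its magnitude,
# and convert it to decibels relative to the peak
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

# Draw a spectrogram
plt.figure(figsize=(10, 4))
librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Spectrogram')
plt.show()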

Explanation:

  • The librosa.stft() function calculates the short-time Fourier transform (STFT), which converts the time-domain signal into a complex-valued time-frequency representation.
  • The librosa.amplitude_to_db() function converts the magnitude of that representation into decibel (dB) units.
  • The librosa.display.specshow() function is used to draw the spectrogram.
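The decibel conversion is ordinary logarithm arithmetic. The following standard-library sketch mirrors what the conversion does when the reference is the peak amplitude, assuming Librosa's documented defaults of amin=1e-5 and top_db=80; the function names are hypothetical, and the real function operates on NumPy arrays.

```python
import math


def amplitude_to_db_sketch(amplitudes, amin=1e-5, top_db=80.0):
    """Convert amplitudes to dB relative to the largest amplitude.

    Illustrative re-implementation for a flat list of numbers; amin and
    top_db are assumed to match Librosa's defaults.
    """
    ref = max(abs(a) for a in amplitudes)
    ref_db = 20.0 * math.log10(max(amin, ref))
    db = [20.0 * math.log10(max(amin, abs(a))) - ref_db for a in amplitudes]
    floor = max(db) - top_db          # clip everything below peak - top_db
    return [max(d, floor) for d in db]


def bin_frequency(k, sr=22050, n_fft=2048):
    """Center frequency in Hz of STFT bin k (n_fft=2048 is the assumed default)."""
    return k * sr / n_fft


print([round(v, 2) for v in amplitude_to_db_sketch([1.0, 0.5, 0.1, 0.0])])
# [0.0, -6.02, -20.0, -80.0]
print(bin_frequency(1024))   # 11025.0
```

The peak maps to 0 dB, halving the amplitude costs about 6 dB, and silence is clipped at 80 dB below the peak, which is why the colorbar of such a plot typically spans -80 dB to 0 dB.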

3.4 Beat and rhythm analysis

Librosa can be used to analyze the beat and rhythm of audio.

3.4.1 Extract beat information

# Extract beat information
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

print(f"Estimated tempo (BPM): {tempo}")
print(f"Beat frames: {beat_frames}")

Explanation:

  • The librosa.beat.beat_track() function estimates the tempo (in BPM) and the beat positions of the audio.
  • tempo is the estimated tempo in beats per minute, and beat_frames contains the frame indices of the detected beats.

3.4.2 Drawing beat positions

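# Convert beat frames to times in seconds
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

# Draw the beats on top of the waveform
plt.figure(figsize=(10, 4))
librosa.display.waveshow(y, sr=sr, alpha=0.6)
plt.vlines(beat_times, -1, 1, color='r', linestyle='--', linewidth=2, alpha=0.9, label='Beat')
plt.legend()
plt.title('Beat positions')
plt.show()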

Explanation:

  • The librosa.frames_to_time() function converts beat frames to time.
  • The plt.vlines() function draws vertical red dashed lines on the waveform to mark the beat positions.
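The frame-to-time conversion is a simple scaling, and the resulting beat times give a quick sanity check on the reported tempo. The sketch below assumes the default hop_length of 512 and a sampling rate of 22050 Hz. The helper names are hypothetical, and the median-interval estimate is only a rough cross-check, not the method beat_track() uses internally.

```python
import statistics


def frames_to_time_sketch(frames, sr=22050, hop_length=512):
    """Convert frame indices to seconds: time = frame * hop_length / sr."""
    return [f * hop_length / sr for f in frames]


def tempo_from_beat_times(beat_times):
    """Rough tempo in BPM from the median gap between consecutive beats."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return 60.0 / statistics.median(intervals)


# Hypothetical beat frames spaced 43 frames apart (about one beat per second).
beat_frames = [0, 43, 86, 129]
beat_times = frames_to_time_sketch(beat_frames)

print([round(t, 3) for t in beat_times])             # [0.0, 0.998, 1.997, 2.995]
print(round(tempo_from_beat_times(beat_times), 1))   # 60.1
```

If the tempo printed by beat_track() differs from this estimate by a factor of about two, the tracker has probably locked onto half or double the perceived beat.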

3.5 Audio time stretching and pitch shifting

Librosa can time-stretch audio and shift its pitch.

3.5.1 Time stretching

# Time stretching
y_stretch = librosa.effects.time_stretch(y, rate=1.5)

# Play the stretched audio (displays a player in a Jupyter notebook)
import IPython.display as ipd
ipd.Audio(y_stretch, rate=sr)

Explanation:

  • The librosa.effects.time_stretch() function time-stretches the audio, changing its speed without changing its pitch.
  • The rate parameter specifies the stretch factor: a value greater than 1 speeds the audio up, and a value less than 1 slows it down.
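The effect of rate on the output length can be estimated before running the stretch. This is a minimal sketch, assuming the stretched signal has roughly len(y) / rate samples; the helper name is hypothetical.

```python
def stretched_length(num_samples, rate):
    """Approximate number of samples after time stretching by `rate`.

    Illustrative helper, not part of Librosa: rate > 1 shortens the
    signal (faster playback), rate < 1 lengthens it (slower playback).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return int(round(num_samples / rate))


sr = 22050
num_samples = 66150                          # 3 seconds at 22050 Hz

print(stretched_length(num_samples, 1.5))        # 44100
print(stretched_length(num_samples, 1.5) / sr)   # 2.0
print(stretched_length(num_samples, 0.5))        # 132300
```

With rate=1.5, a 3-second clip plays back in about 2 seconds.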

3.5.2 Pitch shifting

# Pitch shifting (recent Librosa releases require sr as a keyword argument)
y_pitch = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)

# Play the pitch-shifted audio
ipd.Audio(y_pitch, rate=sr)

Explanation:

  • The librosa.effects.pitch_shift() function shifts the pitch of the audio without changing its duration.
  • The n_steps parameter specifies the number of semitones to shift: a positive value raises the pitch, and a negative value lowers it.
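In equal temperament, each semitone multiplies the frequency by the twelfth root of two, so the effect of n_steps can be worked out by hand. The sketch below assumes 12 steps per octave; the helper name is hypothetical.

```python
def semitone_ratio(n_steps, bins_per_octave=12):
    """Frequency ratio produced by shifting n_steps (12 steps = one octave).

    Illustrative helper, not part of Librosa.
    """
    return 2.0 ** (n_steps / bins_per_octave)


print(round(semitone_ratio(4), 4))           # 1.2599
print(round(440.0 * semitone_ratio(4), 2))   # 554.37
print(semitone_ratio(12))                    # 2.0
print(semitone_ratio(-12))                   # 0.5
```

Shifting by n_steps=4 therefore moves an A at 440 Hz up to roughly 554 Hz, a major third higher.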

4. Summary

Librosa is a powerful audio processing library suitable for a variety of audio analysis tasks. This article introduced its main functions: audio loading, feature extraction, visualization, beat analysis, time stretching, and pitch shifting. With these features, users can perform common audio signal processing and analysis tasks in a few lines of code.

Its ease of use and broad feature set make Librosa a common choice for audio work in both academic research and practical applications. Hopefully, the code examples and explanations in this article help you understand and use the library.
