Audio Detectors

PyOD detects audio anomalies through two paths: the lightweight EmbeddingOD.for_audio(), which runs handcrafted acoustic features through any detector (see Text and Image Detectors), and the dedicated AudioAE deep detector described below. Install the audio dependencies with pip install pyod[audio].

pyod.models.audio_ae module

AudioAE: a log-mel reconstruction autoencoder for audio anomaly detection.

Each clip is turned into overlapping log-mel context windows; a dense autoencoder is fit on the windows of the (mostly normal) training clips, and each clip is scored by its mean per-window reconstruction error. This is the DCASE-style audio anomaly detection baseline, expressed through PyOD’s AutoEncoder so the training loop and preprocessing are shared with the rest of the library.

class pyod.models.audio_ae.AudioAE(n_mels=64, context=5, hop_length=512, sr=22050, contamination=0.1, epoch_num=40, batch_size=1024, lr=0.001, hidden_neuron_list=None, device=None, random_state=42, verbose=0)

Bases: BaseDetector

Log-mel reconstruction autoencoder for audio anomaly detection.

The detector extracts overlapping log-mel context windows from each clip, fits a dense autoencoder (PyOD’s AutoEncoder) on the windows of the training clips, and scores each clip by its mean per-window reconstruction error. Training assumes the input is mostly normal, the usual unsupervised setting.

Requires torch (for the autoencoder) and pyod[audio] (librosa, soundfile).
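As a rough illustration of the clip-level scoring described above, the sketch below averages per-window errors within each clip using plain NumPy. It does not call AudioAE; the helper name clip_scores, the error values, and the clip index array are all hypothetical and only show the aggregation step.

```python
import numpy as np

def clip_scores(window_errors, clip_ids, n_clips):
    """Average per-window errors within each clip (hypothetical helper)."""
    # Sum of errors per clip, then number of windows per clip.
    sums = np.bincount(clip_ids, weights=window_errors, minlength=n_clips)
    counts = np.bincount(clip_ids, minlength=n_clips)
    # Assumes every clip contributed at least one window.
    return sums / counts

# Made-up per-window errors for 3 clips with 2, 3, and 1 windows.
window_errors = np.array([1.0, 3.0, 2.0, 2.0, 2.0, 10.0])
clip_ids = np.array([0, 0, 1, 1, 1, 2])

scores = clip_scores(window_errors, clip_ids, n_clips=3)
print(scores)  # clip means: 2.0, 2.0, 10.0
```

Because the score is a mean rather than a sum, clips of different lengths remain comparable.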

Parameters

n_mels : int, optional (default=64)

Number of mel bands in the spectrogram.

context : int, optional (default=5)

Number of consecutive frames stacked into one autoencoder input window. The window dimensionality is n_mels * context.

hop_length : int, optional (default=512)

STFT hop length in samples.

sr : int, optional (default=22050)

Target sample rate. File inputs are loaded at this rate; (waveform, sample_rate) tuples are resampled to it.

contamination : float, optional (default=0.1)

Expected proportion of outliers, used for the clip-level threshold and labels.

epoch_num : int, optional (default=40)

Autoencoder training epochs.

batch_size : int, optional (default=1024)

Autoencoder mini-batch size (over frames, not clips).

lr : float, optional (default=1e-3)

Learning rate.

hidden_neuron_list : list of int or None, optional (default=None)

Encoder hidden sizes. None uses [128, 32, 8], which gives the DCASE-style 320-128-32-8 contraction for the default 320-dimensional window (n_mels=64, context=5).

device : str or None, optional (default=None)

Torch device. None auto-selects.

random_state : int, optional (default=42)

Seed forwarded to the autoencoder.

verbose : int, optional (default=0)

Autoencoder verbosity.
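To make the relationship between n_mels, context, and the autoencoder input size concrete, here is a minimal NumPy sketch of stacking consecutive frames into windows. It is not AudioAE's own code: the helper stack_context_windows is hypothetical, the frame-major flattening order is an assumption, and random data stands in for a real log-mel spectrogram. The 44 frames correspond to roughly one second at sr=22050 and hop_length=512 if framing is centered (1 + 22050 // 512), which is also an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def stack_context_windows(log_mel, context=5):
    """Stack `context` consecutive frames into one flat vector per window.

    log_mel: array of shape (n_mels, n_frames).
    Returns an array of shape (n_frames - context + 1, n_mels * context).
    """
    n_mels, n_frames = log_mel.shape
    if n_frames < context:
        raise ValueError("clip is shorter than one context window")
    # Sliding over the frame axis gives shape (n_mels, n_windows, context).
    view = sliding_window_view(log_mel, context, axis=1)
    # Reorder to (n_windows, context, n_mels), then flatten each window.
    return view.transpose(1, 2, 0).reshape(-1, n_mels * context)

n_mels, context = 64, 5
log_mel = np.random.RandomState(0).randn(n_mels, 44)  # stand-in spectrogram

windows = stack_context_windows(log_mel, context)
input_dim = windows.shape[1]

# Default encoder sizes from the parameter description above.
layer_sizes = [input_dim] + [128, 32, 8]
print(windows.shape, layer_sizes)  # (40, 320) [320, 128, 32, 8]
```

Consecutive windows overlap by context - 1 frames, so a clip with n_frames frames yields n_frames - context + 1 windows under this scheme.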

Attributes

decision_scores_ : numpy array of shape (n_clips,)

Clip-level outlier scores of the training data.

threshold_ : float

Score threshold based on contamination.

labels_ : numpy array of shape (n_clips,)

Binary labels of training clips (0: inlier, 1: outlier).

ae_ : AutoEncoder

The fitted frame-level autoencoder.
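The sketch below shows one way a contamination value can be turned into a threshold and binary labels: take the (1 - contamination) percentile of the training scores and flag everything above it. This follows the convention PyOD detectors generally use, but whether AudioAE computes threshold_ in exactly this way is an assumption, and the scores are made up.

```python
import numpy as np

contamination = 0.1
scores = np.arange(1.0, 21.0)  # 20 hypothetical clip-level scores: 1.0 ... 20.0

# Threshold at the (1 - contamination) percentile of the training scores.
threshold = np.percentile(scores, 100 * (1 - contamination))

# Scores strictly above the threshold are labeled as outliers (1).
labels = (scores > threshold).astype(int)

print(threshold)     # close to 18.1 with linear interpolation
print(labels.sum())  # 2 of the 20 clips are flagged
```

With contamination=0.1 and 20 clips, about 10% of the training clips end up labeled as outliers, which is the intended effect of the parameter.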

Examples

>>> import numpy as np
>>> from pyod.models.audio_ae import AudioAE
>>> clips = [np.random.RandomState(s).randn(22050) for s in range(20)]
>>> clf = AudioAE(epoch_num=5)
>>> clf.fit(clips)
>>> scores = clf.decision_function(clips)

decision_function(X)

Predict clip-level anomaly scores for X.

Parameters

X : list

Audio clips in the same formats accepted by fit.

Returns

anomaly_scores : numpy array of shape (n_clips,)

fit(X, y=None)

Fit the frame autoencoder and score the training clips.

Parameters

X : list

Audio clips as file paths, waveform arrays, or (waveform, sample_rate) tuples.

y : Ignored

Not used, present for API consistency.

Returns

self : object

set_predict_proba_request(*, method: bool | None | str = '$UNCHANGED$', return_confidence: bool | None | str = '$UNCHANGED$') -> AudioAE

Configure whether metadata should be requested to be passed to the predict_proba method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict_proba.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters

method : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for method parameter in predict_proba.

return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for return_confidence parameter in predict_proba.

Returns

self : object

The updated object.

set_predict_request(*, return_confidence: bool | None | str = '$UNCHANGED$') -> AudioAE

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters

return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for return_confidence parameter in predict.

Returns

self : object

The updated object.