Audio Detectors

PyOD detects audio anomalies through two paths: the lightweight EmbeddingOD.for_audio(), which runs handcrafted acoustic features through any detector (see Text and Image Detectors), and the dedicated AudioAE deep detector described below. Install the audio dependencies with pip install pyod[audio].

pyod.models.audio_ae module

AudioAE: a log-mel reconstruction autoencoder for audio anomaly detection.

Each clip is turned into overlapping log-mel context windows; a dense autoencoder is fit on the windows of the (mostly normal) training clips, and each clip is scored by its mean per-window reconstruction error. This is the DCASE-style audio anomaly detection baseline, expressed through PyOD’s AutoEncoder so the training loop and preprocessing are shared with the rest of the library.
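The windowing and clip-level scoring described above can be sketched with NumPy alone. This is a minimal illustration, not AudioAE's actual code: the helper names stack_context and clip_score are hypothetical, the stride of one frame between windows and the squared-error metric are assumptions, and a stand-in reconstruction function replaces the trained autoencoder.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stack_context(log_mel, context=5):
    """Stack `context` consecutive frames into flat windows.

    log_mel has shape (n_mels, n_frames). The result has shape
    (n_frames - context + 1, n_mels * context), assuming a stride of one frame.
    """
    n_mels, n_frames = log_mel.shape
    if n_frames < context:
        raise ValueError("clip is shorter than one context window")
    # sliding_window_view appends the window axis: (n_mels, n_windows, context)
    windows = sliding_window_view(log_mel, context, axis=1)
    # Put the window index first, then flatten each (n_mels, context) patch.
    return windows.transpose(1, 0, 2).reshape(-1, n_mels * context)


def clip_score(windows, reconstruct):
    """Mean over windows of the per-window squared reconstruction error."""
    errors = np.mean((windows - reconstruct(windows)) ** 2, axis=1)
    return float(errors.mean())


# Stand-in for a log-mel spectrogram: 64 mel bands, 44 frames.
rng = np.random.RandomState(0)
log_mel = rng.randn(64, 44)
windows = stack_context(log_mel, context=5)

# Stand-in "autoencoder" that reconstructs every window as all zeros.
score = clip_score(windows, reconstruct=np.zeros_like)
print(windows.shape)  # (40, 320)
```

A clip whose windows are reconstructed poorly gets a high mean error and therefore a high anomaly score.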

class pyod.models.audio_ae.AudioAE(n_mels=64, context=5, hop_length=512, sr=22050, contamination=0.1, epoch_num=40, batch_size=1024, lr=0.001, hidden_neuron_list=None, device=None, random_state=42, verbose=0)

Bases: BaseDetector

Log-mel reconstruction autoencoder for audio anomaly detection.

The detector extracts overlapping log-mel context windows from each clip, fits a dense autoencoder (PyOD’s AutoEncoder) on the windows of the training clips, and scores each clip by its mean per-window reconstruction error. Training assumes the input is mostly normal, as is usual in the unsupervised setting.

Requires torch (for the autoencoder) and pyod[audio] (librosa, soundfile).

Parameters

n_mels : int, optional (default=64)
    Number of mel bands in the spectrogram.
context : int, optional (default=5)
    Number of consecutive frames stacked into one autoencoder input window. The window dimensionality is n_mels * context.
hop_length : int, optional (default=512)
    STFT hop length in samples.
sr : int, optional (default=22050)
    Target sample rate. File inputs are loaded at this rate; (waveform, sample_rate) tuples are resampled to it.
contamination : float, optional (default=0.1)
    Expected proportion of outliers, used for the clip-level threshold and labels.
epoch_num : int, optional (default=40)
    Autoencoder training epochs.
batch_size : int, optional (default=1024)
    Autoencoder mini-batch size (over frames, not clips).
lr : float, optional (default=1e-3)
    Learning rate.
hidden_neuron_list : list of int or None, optional (default=None)
    Encoder hidden sizes. None uses [128, 32, 8], which gives the DCASE-style 320-128-32-8 contraction for the default 320-dimensional window (n_mels=64, context=5).
device : str or None, optional (default=None)
    Torch device. None auto-selects.
random_state : int, optional (default=42)
    Seed forwarded to the autoencoder.
verbose : int, optional (default=0)
    Autoencoder verbosity.
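As a quick check of the sizes implied by the defaults, the arithmetic below computes the window dimensionality and an estimate of how many windows a one-second clip yields. The frame count assumes librosa-style centered framing (1 + n_samples // hop_length) and a stride of one frame between windows; neither is confirmed by this page, and the helper name window_stats is hypothetical.

```python
def window_stats(n_samples, n_mels=64, context=5, hop_length=512):
    """Return (input_dim, n_frames, n_windows) under the stated assumptions."""
    input_dim = n_mels * context                # autoencoder input size
    n_frames = 1 + n_samples // hop_length      # assumes centered STFT framing
    n_windows = max(n_frames - context + 1, 0)  # assumes stride-1 windows
    return input_dim, n_frames, n_windows


input_dim, n_frames, n_windows = window_stats(n_samples=22050)
layer_sizes = [input_dim] + [128, 32, 8]        # encoder contraction
print(layer_sizes)          # [320, 128, 32, 8]
print(n_frames, n_windows)  # 44 40
```

Under these assumptions, 20 one-second clips yield about 800 windows, fewer than the default batch_size of 1024, so each epoch would consist of a single mini-batch.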

Attributes

decision_scores_ : numpy array of shape (n_clips,)
    Clip-level outlier scores of the training data.
threshold_ : float
    Score threshold based on contamination.
labels_ : numpy array of shape (n_clips,)
    Binary labels of training clips (0: inlier, 1: outlier).
ae_ : AutoEncoder
    The fitted frame-level autoencoder.
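The relationship between decision_scores_, threshold_, and labels_ can be sketched as follows. This mirrors the percentile rule that PyOD's BaseDetector conventionally applies to training scores; whether AudioAE follows it exactly is an assumption here, and the scores are made up for illustration.

```python
import numpy as np

contamination = 0.1
# Hypothetical clip-level scores for 10 training clips (higher = more anomalous).
decision_scores = np.array(
    [0.9, 1.1, 1.0, 0.8, 1.2, 0.95, 1.05, 3.5, 0.85, 1.15]
)

# Threshold at the (1 - contamination) percentile of the training scores.
threshold = np.percentile(decision_scores, 100 * (1 - contamination))
# Clips scoring strictly above the threshold are labeled outliers (1).
labels = (decision_scores > threshold).astype(int)

print(labels)  # [0 0 0 0 0 0 0 1 0 0]
```

With contamination=0.1 and ten clips, only the single highest-scoring clip ends up above the threshold.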

Examples

>>> import numpy as np
>>> from pyod.models.audio_ae import AudioAE
>>> clips = [np.random.RandomState(s).randn(22050) for s in range(20)]
>>> clf = AudioAE(epoch_num=5)
>>> clf.fit(clips)
>>> scores = clf.decision_function(clips)

decision_function(X)

Predict clip-level anomaly scores for X.

Parameters

X : list
    Audio clips in the same formats accepted by fit.

Returns

anomaly_scores : numpy array of shape (n_clips,)

fit(X, y=None)

Fit the frame autoencoder and score the training clips.

Parameters

X : list
    Audio clips as file paths, waveform arrays, or (waveform, sample_rate) tuples.
y : Ignored
    Not used; present for API consistency.
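To make the three accepted input formats concrete, the sketch below builds a mixed list and classifies each element. The helper input_kind is hypothetical and not part of PyOD, the file name is a placeholder, and treating a bare array as already sampled at sr is an assumption rather than something this page states.

```python
import os

import numpy as np


def input_kind(item):
    """Classify one element of X; a hypothetical helper, not part of PyOD."""
    if isinstance(item, (str, os.PathLike)):
        return "path"
    if isinstance(item, tuple) and len(item) == 2:
        return "waveform_with_rate"
    if isinstance(item, np.ndarray):
        return "waveform"
    raise TypeError(f"unsupported clip type: {type(item).__name__}")


# One second of a 440 Hz tone at the detector's default rate (22050 Hz).
sr = 22050
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)
# The same tone sampled at 16 kHz, paired with its rate so it can be resampled.
tone_16k = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)

X = ["machine_01.wav", tone, (tone_16k, 16000)]  # the file name is hypothetical
print([input_kind(item) for item in X])
# ['path', 'waveform', 'waveform_with_rate']
```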

Returns

self : object

set_predict_proba_request(*, method: bool | None | str = '$UNCHANGED$', return_confidence: bool | None | str = '$UNCHANGED$') → AudioAE

Configure whether metadata should be requested to be passed to the predict_proba method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters

method : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
    Metadata routing for method parameter in predict_proba.
return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
    Metadata routing for return_confidence parameter in predict_proba.

Returns

self : object
    The updated object.

set_predict_request(*, return_confidence: bool | None | str = '$UNCHANGED$') → AudioAE

Configure whether metadata should be requested to be passed to the predict method.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters

return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
    Metadata routing for return_confidence parameter in predict.

Returns

self : object
    The updated object.