Audio Detectors
PyOD detects audio anomalies through two paths: the lightweight EmbeddingOD.for_audio() (handcrafted acoustic features fed to any PyOD detector; see Text and Image Detectors) and the dedicated AudioAE deep detector below. Install the audio dependencies with pip install pyod[audio].
pyod.models.audio_ae module
AudioAE: a log-mel reconstruction autoencoder for audio anomaly detection.
Each clip is turned into overlapping log-mel context windows; a dense
autoencoder is fit on the windows of the (mostly normal) training clips,
and each clip is scored by its mean per-window reconstruction error. This
is the DCASE-style audio anomaly detection baseline, expressed through
PyOD’s AutoEncoder so the training loop and preprocessing are shared
with the rest of the library.
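The windowing step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not AudioAE's implementation: context_windows is a hypothetical helper, windows are assumed to advance by one frame, each row is assumed to concatenate whole frames in time order, and a random array stands in for a real log-mel spectrogram. It shows how a (n_mels, n_frames) spectrogram becomes a matrix of n_mels * context-dimensional autoencoder inputs.

```python
import numpy as np


def context_windows(log_mel: np.ndarray, context: int = 5) -> np.ndarray:
    """Stack `context` consecutive frames of a (n_mels, n_frames) log-mel
    spectrogram into rows of length n_mels * context.

    Assumptions (not taken from AudioAE's source): windows advance by one
    frame, and each row concatenates whole frames in time order.
    """
    n_mels, n_frames = log_mel.shape
    n_windows = n_frames - context + 1
    if n_windows < 1:
        raise ValueError("clip is shorter than one context window")
    # Row t holds frames t, t+1, ..., t+context-1, each of length n_mels.
    return np.stack(
        [log_mel[:, t:t + context].T.reshape(-1) for t in range(n_windows)]
    )


rng = np.random.default_rng(0)
log_mel = rng.standard_normal((64, 44))  # stand-in for a real log-mel spectrogram
windows = context_windows(log_mel, context=5)
print(windows.shape)  # (40, 320)
```

Under these assumptions a clip with n_frames frames yields n_frames - context + 1 windows, and neighbouring rows share all but one frame.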
class pyod.models.audio_ae.AudioAE(n_mels=64, context=5, hop_length=512, sr=22050, contamination=0.1, epoch_num=40, batch_size=1024, lr=0.001, hidden_neuron_list=None, device=None, random_state=42, verbose=0)
Bases: BaseDetector

Log-mel reconstruction autoencoder for audio anomaly detection.

The detector extracts overlapping log-mel context windows from each clip, fits a dense autoencoder (PyOD’s AutoEncoder) on the windows of the training clips, and scores each clip by its mean per-window reconstruction error. Training assumes the input is mostly normal, the usual unsupervised setting.

Requires torch (for the autoencoder) and pyod[audio] (librosa, soundfile).

Parameters
- n_mels : int, optional (default=64)
Number of mel bands in the spectrogram.
- context : int, optional (default=5)
Number of consecutive frames stacked into one autoencoder input window. The window dimensionality is n_mels * context.
- hop_length : int, optional (default=512)
STFT hop length in samples.
- sr : int, optional (default=22050)
Target sample rate. File inputs are loaded at this rate; (waveform, sample_rate) tuples are resampled to it.
- contamination : float, optional (default=0.1)
Expected proportion of outliers, used for the clip-level threshold and labels.
- epoch_num : int, optional (default=40)
Autoencoder training epochs.
- batch_size : int, optional (default=1024)
Autoencoder mini-batch size (over frames, not clips).
- lr : float, optional (default=1e-3)
Learning rate.
- hidden_neuron_list : list of int or None, optional (default=None)
Encoder hidden layer sizes. None uses [128, 32, 8], which gives the DCASE-style 320-128-32-8 contraction for the default 320-dimensional window (n_mels=64, context=5).
- device : str or None, optional (default=None)
Torch device. None selects one automatically.
- random_state : int, optional (default=42)
Seed forwarded to the autoencoder.
- verbose : int, optional (default=0)
Autoencoder verbosity.
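The defaults above fix the autoencoder's input size, and the sr/hop_length pair fixes how many windows a clip yields. The sketch below only does that arithmetic. default_geometry is a hypothetical helper, and it assumes a centered STFT (librosa's default center=True, giving 1 + n_samples // hop_length frames) and a one-frame window stride; neither assumption is stated in this reference.

```python
def default_geometry(n_mels=64, context=5, hop_length=512, sr=22050,
                     n_samples=22050, hidden_neuron_list=None):
    """Back-of-envelope sizes implied by AudioAE's documented defaults.

    Assumptions: a centered STFT (1 + n_samples // hop_length frames) and
    context windows that advance by one frame.
    """
    window_dim = n_mels * context
    # None falls back to the documented default encoder sizes.
    hidden = [128, 32, 8] if hidden_neuron_list is None else list(hidden_neuron_list)
    n_frames = 1 + n_samples // hop_length
    n_windows = max(n_frames - context + 1, 0)
    return {
        "window_dim": window_dim,
        "encoder_sizes": [window_dim] + hidden,
        "frames_per_second": sr / hop_length,
        "n_frames": n_frames,
        "n_windows": n_windows,
    }


geo = default_geometry()  # a one-second clip at the default settings
print(geo["encoder_sizes"])  # [320, 128, 32, 8]
print(geo["n_frames"], geo["n_windows"])  # 44 40
```

Under these assumptions a one-second clip contributes about 40 training windows, so one mini-batch of batch_size=1024 spans windows from roughly 25 such clips.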
Attributes
- decision_scores_ : numpy array of shape (n_clips,)
Clip-level outlier scores of the training data.
- threshold_ : float
Score threshold based on contamination.
- labels_ : numpy array of shape (n_clips,)
Binary labels of training clips (0: inlier, 1: outlier).
- ae_ : AutoEncoder
The fitted frame-level autoencoder.
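The relationship between decision_scores_, threshold_ and labels_ can be sketched as follows. To our understanding this mirrors the percentile rule in PyOD's base detector (threshold at the (1 - contamination) quantile of the training scores, strictly greater scores labelled as outliers), but threshold_and_labels is a hypothetical helper written for illustration and the scores are made up.

```python
import numpy as np


def threshold_and_labels(decision_scores, contamination=0.1):
    """Derive a score threshold and binary labels from training scores.

    The threshold is the (1 - contamination) quantile of the training
    scores; clips scoring strictly above it are labelled 1 (outlier).
    """
    scores = np.asarray(decision_scores, dtype=float)
    threshold = np.percentile(scores, 100 * (1 - contamination))
    labels = (scores > threshold).astype(int)
    return threshold, labels


scores = np.arange(20, dtype=float)  # 20 hypothetical clip scores
thr, labels = threshold_and_labels(scores, contamination=0.1)
print(float(round(thr, 2)), int(labels.sum()))  # 17.1 2
```

With 20 training clips and contamination=0.1, the two highest-scoring clips end up above the threshold.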
Examples
```python
>>> import numpy as np
>>> from pyod.models.audio_ae import AudioAE
>>> clips = [np.random.RandomState(s).randn(22050) for s in range(20)]
>>> clf = AudioAE(epoch_num=5)
>>> clf.fit(clips)
>>> scores = clf.decision_function(clips)
```
decision_function(X)
Predict clip-level anomaly scores for X.
Parameters
- X : list
Audio clips in the same formats accepted by fit.
Returns
anomaly_scores : numpy array of shape (n_clips,)
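The clip-level scoring rule (fit on the pooled windows of the training clips, then average each clip's per-window error) can be illustrated without torch. In this sketch a rank-k PCA reconstruction stands in for PyOD's AutoEncoder, per-window error is assumed to be mean squared error, LinearAEStandIn and clip_scores are hypothetical names, and the data are synthetic windows rather than real log-mel features.

```python
import numpy as np


class LinearAEStandIn:
    """Rank-k PCA reconstruction used here in place of PyOD's AutoEncoder.

    Like the frame autoencoder, it is fit on the pooled windows of all
    training clips; it is NOT what AudioAE actually trains.
    """

    def __init__(self, n_components=8):
        self.n_components = n_components

    def fit(self, windows):
        self.mean_ = windows.mean(axis=0)
        # Top principal directions of the centred training windows.
        _, _, vt = np.linalg.svd(windows - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def reconstruct(self, windows):
        codes = (windows - self.mean_) @ self.components_.T  # encode
        return codes @ self.components_ + self.mean_         # decode


def clip_scores(model, clips_windows):
    """Score each clip by the mean of its per-window reconstruction errors.

    Per-window error is mean squared error here (an assumption).
    """
    scores = []
    for w in clips_windows:
        per_window = np.mean((w - model.reconstruct(w)) ** 2, axis=1)
        scores.append(per_window.mean())
    return np.array(scores)


rng = np.random.default_rng(0)
basis = rng.standard_normal((4, 320))
# Hypothetical "normal" clips: 40 windows each, lying near a 4-dim subspace.
train = [rng.standard_normal((40, 4)) @ basis + 0.01 * rng.standard_normal((40, 320))
         for _ in range(10)]
model = LinearAEStandIn(n_components=8).fit(np.vstack(train))
odd = rng.standard_normal((40, 320))  # a clip that ignores that subspace
scores = clip_scores(model, train + [odd])
print(int(scores.argmax()))  # 10
```

Because each score is an average over windows, clips of different lengths produce scores on a comparable scale.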
fit(X, y=None)
Fit the frame autoencoder and score the training clips.
Parameters
- X : list
Audio clips as file paths, waveform arrays, or (waveform, sample_rate) tuples.
- y : Ignored
Not used, present for API consistency.
Returns
self : object
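The three accepted input formats can be prepared as below. This sketch uses only NumPy and the standard library's wave module to build one clip of each kind; write_wav is a hypothetical helper, and treating a bare waveform array as already sampled at sr is an assumption inferred from the sr parameter description rather than stated in it.

```python
import os
import tempfile
import wave

import numpy as np


def write_wav(path, waveform, sample_rate):
    """Write a mono float waveform in [-1, 1] to a 16-bit PCM WAV file."""
    pcm = (np.clip(waveform, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())


t = np.arange(22050) / 22050
tone = 0.5 * np.sin(2 * np.pi * 440 * t)  # 1 s of a 440 Hz tone at 22050 Hz

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "tone.wav")
write_wav(path, tone, 22050)

clips = [
    path,               # file path: loaded at AudioAE's sr
    tone,               # waveform array (assumed to already be at sr)
    (tone[::2], 11025), # (waveform, sample_rate) tuple: resampled to sr
]
```

With torch and pyod[audio] installed, the resulting list can be passed to fit or decision_function, for example AudioAE(epoch_num=5).fit(clips), as in the Examples above.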
set_predict_proba_request(*, method: bool | None | str = '$UNCHANGED$', return_confidence: bool | None | str = '$UNCHANGED$') → AudioAE
Configure whether metadata should be requested to be passed to the predict_proba method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:
- True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.
Parameters
- method : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the method parameter in predict_proba.
- return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the return_confidence parameter in predict_proba.
Returns
- self : object
The updated object.
set_predict_request(*, return_confidence: bool | None | str = '$UNCHANGED$') → AudioAE
Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). See the scikit-learn User Guide for how the routing mechanism works.

The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in scikit-learn version 1.3.
Parameters
- return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the return_confidence parameter in predict.
Returns
- self : object
The updated object.