Text and Image Detectors¶

PyOD’s EmbeddingOD chains foundation model encoders (sentence-transformers, OpenAI, HuggingFace) with any PyOD detector for text and image anomaly detection. The default detector choices follow the rankings reported in NLP-ADBench.

See Layer 1: Text and Image Anomaly Detection for usage.

pyod.models.embedding module¶

EmbeddingOD and MultiModalOD: Anomaly detection via foundation model embeddings.

EmbeddingOD chains any embedding encoder with any PyOD detector, enabling anomaly detection on text, image, and other non-tabular data through PyOD’s standard API. MultiModalOD extends this to multi-modal data by running separate detectors per modality and fusing their scores.

class pyod.models.embedding.EmbeddingOD(encoder, detector='LUNAR', contamination=0.1, batch_size=32, cache_embeddings=False, reduce_dim=None, standardize=True, random_state=None)[source]¶

Bases: BaseDetector

Anomaly detection on raw data via embedding + detector pipeline.

Chains any embedding encoder with any PyOD detector. Encode raw data (text, images, or other modalities) into numeric embeddings, then apply outlier detection in the embedding space.

This implements the two-step approach shown to outperform end-to-end methods in NLP-ADBench (Li et al., EMNLP 2025) and TAD-Bench (Cao et al., 2025).

Parameters¶

encoder : str, BaseEncoder, SentenceTransformer instance, or callable

Embedding encoder. Accepts:

Registry shortcut: 'all-MiniLM-L6-v2', 'text-embedding-3-small', 'dinov2-base'
HuggingFace model ID: 'sentence-transformers/all-MiniLM-L6-v2'
Local filesystem path: '/path/to/local/weights', loaded without any network call and suitable for air-gapped environments.
Pre-instantiated SentenceTransformer: passed directly, no reload.
BaseEncoder instance
Callable: fn(X) -> np.ndarray of shape (n_samples, n_features)

detector : str or BaseDetector, optional (default='LUNAR')

Any PyOD detector. A string resolves to a default-configured instance. Default is LUNAR (best performer in NLP-ADBench).

contamination : float, optional (default=0.1)

Expected proportion of outliers in the dataset. Must be in (0, 0.5].

batch_size : int, optional (default=32)

Batch size for encoding.

cache_embeddings : bool, optional (default=False)

Cache training embeddings to avoid re-encoding. Recommended for API-based encoders (e.g., OpenAI).

reduce_dim : int or None, optional (default=None)

If set, apply PCA to reduce embedding dimensionality before detection. Recommended for embeddings with more than 1000 dimensions when using distance-based detectors (KNN, LOF).

standardize : bool, optional (default=True)

Apply StandardScaler to embeddings before detection. Matches the preprocessing pipeline in NLP-ADBench.

random_state : int, RandomState instance or None, optional (default=None)

Controls stochastic parts of EmbeddingOD. The seed is forwarded to (a) the dimensionality-reduction PCA when reduce_dim is set (PCA may pick a randomized SVD solver on high-dimensional embeddings) and (b) the string-resolved inner detector when that detector class declares an explicit random_state parameter (e.g., the default 'LUNAR' preset, or 'IForest'). It does NOT control the external encoder’s own inference (e.g., sentence-transformers, DINOv2), which is treated as deterministic given fixed weights. When ADEngine(random_state=...) builds a preset plan, the engine seed flows here automatically.
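The conditional forwarding described for random_state (the seed reaches the inner detector only when its class declares a random_state parameter) can be pictured with a signature check. This is a minimal sketch, not PyOD's actual code; SeededDetector, UnseededDetector, and build_detector are hypothetical names introduced only for illustration.

```python
import inspect


class SeededDetector:
    """Hypothetical detector whose constructor declares random_state."""

    def __init__(self, n_estimators=100, random_state=None):
        self.n_estimators = n_estimators
        self.random_state = random_state


class UnseededDetector:
    """Hypothetical detector with no random_state parameter."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors


def build_detector(cls, random_state=None):
    """Instantiate cls, forwarding the seed only if cls declares it."""
    params = inspect.signature(cls.__init__).parameters
    if random_state is not None and "random_state" in params:
        return cls(random_state=random_state)
    return cls()


seeded = build_detector(SeededDetector, random_state=42)
unseeded = build_detector(UnseededDetector, random_state=42)
```

In this sketch the seed is silently dropped for a detector that cannot accept it, rather than raising an error.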

Attributes¶

decision_scores_ : numpy array of shape (n_samples,): Outlier scores of the training data. Higher is more abnormal.
threshold_ : float: Score threshold based on contamination.
labels_ : numpy array of shape (n_samples,): Binary labels of training data (0: inlier, 1: outlier).
encoder_ : BaseEncoder: The resolved encoder instance.
detector_ : BaseDetector: The resolved and fitted detector instance.
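How threshold_ and labels_ relate to decision_scores_ and contamination can be shown with a few lines of NumPy. The sketch assumes the usual percentile rule (the threshold is the score below which a fraction 1 - contamination of the training scores fall); the exact computation inside EmbeddingOD may differ, and the scores are toy values.

```python
import numpy as np

contamination = 0.1
# Toy training scores: nine ordinary points and one clear outlier.
decision_scores = np.array([0.2, 0.3, 0.1, 0.4, 0.25, 0.35, 0.15, 0.3, 0.2, 5.0])

# Score value below which (1 - contamination) of the training scores fall.
threshold = np.percentile(decision_scores, 100 * (1 - contamination))

# Scores strictly above the threshold are labelled outliers (1).
labels = (decision_scores > threshold).astype(int)
```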

Examples¶

>>> from pyod.models.embedding import EmbeddingOD
>>> clf = EmbeddingOD(encoder='all-MiniLM-L6-v2', detector='KNN')
>>> clf.fit(train_texts)
>>> scores = clf.decision_function(test_texts)
>>> labels = clf.predict(test_texts)

>>> # Air-gapped: local filesystem weights
>>> clf = EmbeddingOD(encoder='/path/to/local/weights', detector='KNN')
>>> clf.fit(texts)

>>> # Pre-instantiated model (e.g., shared across multiple classifiers)
>>> from sentence_transformers import SentenceTransformer
>>> my_model = SentenceTransformer('all-MiniLM-L6-v2')
>>> clf = EmbeddingOD(encoder=my_model, detector='IForest')
>>> clf.fit(texts)
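The encoder parameter also accepts a plain callable fn(X) -> np.ndarray of shape (n_samples, n_features). The following sketch shows one function that satisfies that contract; char_histogram_encoder is a hypothetical toy encoder, not part of PyOD, and a letter-frequency vector is only a stand-in for a real embedding.

```python
import numpy as np


def char_histogram_encoder(texts):
    """Toy encoder: map each string to a 26-dim letter-frequency vector."""
    out = np.zeros((len(texts), 26), dtype=float)
    for i, text in enumerate(texts):
        for ch in text.lower():
            if "a" <= ch <= "z":
                out[i, ord(ch) - ord("a")] += 1.0
        total = out[i].sum()
        if total > 0:
            out[i] /= total  # normalise counts to frequencies
    return out


embeddings = char_histogram_encoder(["hello world", "zzzz", ""])

# Such a callable would then be passed as the encoder, for example:
# clf = EmbeddingOD(encoder=char_histogram_encoder, detector='KNN')
```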

decision_function(X)[source]¶

Predict raw anomaly scores for X.

Parameters¶

X : list or array-like: Raw input data in the same format as fit().

Returns¶

anomaly_scores : numpy array of shape (n_samples,): Anomaly scores. Higher is more abnormal.

fit(X, y=None)[source]¶

Fit detector on raw input data.

Encodes X into embeddings, applies preprocessing, then fits the inner detector.

Parameters¶

X : list or array-like: Raw input data (e.g., list of strings for text, list of PIL Images for images).
y : Ignored: Not used, present for API consistency.

Returns¶

self : object: Fitted estimator.
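The encode, preprocess, detect sequence that fit() describes can be sketched end to end with NumPy alone. Several things here are assumptions for illustration: random vectors stand in for encoder output, standardization is applied before the PCA projection (the source does not state the order), and a k-nearest-neighbour distance stands in for the inner detector.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for encoder output: 50 inlier embeddings plus one distant point.
embeddings = rng.normal(size=(51, 16))
embeddings[-1] += 8.0

# Step 1 (standardize=True): zero mean, unit variance per dimension.
mean = embeddings.mean(axis=0)
std = embeddings.std(axis=0)
std[std == 0] = 1.0  # guard against constant dimensions
scaled = (embeddings - mean) / std

# Step 2 (reduce_dim=4): project onto the top principal components via SVD.
# The scaled data is already centered, as PCA requires.
reduce_dim = 4
_, _, vt = np.linalg.svd(scaled, full_matrices=False)
reduced = scaled @ vt[:reduce_dim].T

# Step 3: distance to the k-th nearest neighbour as the outlier score.
k = 5
dists = np.linalg.norm(reduced[:, None, :] - reduced[None, :, :], axis=2)
np.fill_diagonal(dists, np.inf)  # ignore self-distances
scores = np.sort(dists, axis=1)[:, k - 1]
```

The shifted point ends up with the largest score because it sits far from every other embedding in the reduced space.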

classmethod for_audio(quality='balanced', **kwargs)[source]¶

Create an EmbeddingOD configured for audio anomaly detection.

Uses a handcrafted 74-dim acoustic feature encoder (20 MFCC, 12 chroma, and 5 spectral descriptors, each as mean and standard deviation over frames) followed by a classical PyOD detector. This embed-then-detect pattern with classical detectors is competitive on standard audio anomaly detection benchmarks and needs no GPU. Requires pyod[audio] (librosa, soundfile).

Input clips may be file paths, waveform arrays, or (waveform, sample_rate) tuples.

Parameters¶

quality : str, optional (default='balanced')

'fast': handcrafted features + IForest.
'balanced': handcrafted features + KNN.
'best': handcrafted features + LUNAR (requires torch).

**kwargs

Override any EmbeddingOD parameter.

Returns¶

clf : EmbeddingOD
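The 74-dimensional size of the audio feature vector follows from the description of for_audio: 20 + 12 + 5 = 37 frame-level descriptors, each summarised by a mean and a standard deviation. The sketch below reproduces only that aggregation step. Random arrays stand in for the real frame-level features, and placing the means before the standard deviations is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 120
# Stand-ins for frame-level features; a real pipeline would compute these from audio.
mfcc = rng.normal(size=(20, n_frames))      # 20 MFCC coefficients
chroma = rng.random(size=(12, n_frames))    # 12 chroma bins
spectral = rng.random(size=(5, n_frames))   # 5 spectral descriptors

frames = np.vstack([mfcc, chroma, spectral])  # shape (37, n_frames)

# Summarise each descriptor over time by its mean and standard deviation.
clip_vector = np.concatenate([frames.mean(axis=1), frames.std(axis=1)])
```

Because the summary is taken over frames, clips of different lengths map to vectors of the same size.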

classmethod for_image(quality='balanced', **kwargs)[source]¶

Create an EmbeddingOD configured for image anomaly detection.

Configurations are informed by AnomalyDINO (WACV 2025).

Parameters¶

quality : str, optional (default='balanced')

'fast': DINOv2-small (384d) + KNN.
'balanced': DINOv2-base (768d) + LOF.
'best': DINOv2-large (1024d) + KNN.

**kwargs

Override any EmbeddingOD parameter.

Returns¶

clf : EmbeddingOD

classmethod for_text(quality='balanced', **kwargs)[source]¶

Create an EmbeddingOD configured for text anomaly detection.

Configurations are informed by NLP-ADBench (EMNLP 2025).

Parameters¶

quality : str, optional (default='balanced')

'fast': MiniLM encoder (384d) + KNN. No API key needed.
'balanced': mpnet encoder (768d) + LUNAR. No API key needed.
'best': OpenAI large (3072d) + LUNAR. Requires API key.

**kwargs

Override any EmbeddingOD parameter.

Returns¶

clf : EmbeddingOD
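The way a quality preset combines with **kwargs overrides can be sketched as a small factory. PresetPipeline is a hypothetical stand-in class, not PyOD code, and the encoder labels are descriptive placeholders taken from the preset descriptions rather than actual registry names.

```python
class PresetPipeline:
    """Hypothetical stand-in for an encoder + detector pipeline."""

    # Encoder labels below are descriptive placeholders, not registry names.
    TEXT_PRESETS = {
        "fast": {"encoder": "MiniLM", "detector": "KNN"},
        "balanced": {"encoder": "mpnet", "detector": "LUNAR"},
        "best": {"encoder": "OpenAI large", "detector": "LUNAR"},
    }

    def __init__(self, encoder, detector="LUNAR", contamination=0.1):
        self.encoder = encoder
        self.detector = detector
        self.contamination = contamination

    @classmethod
    def for_text(cls, quality="balanced", **kwargs):
        if quality not in cls.TEXT_PRESETS:
            raise ValueError(f"unknown quality: {quality!r}")
        # Start from the preset, then let explicit keyword arguments win.
        config = {**cls.TEXT_PRESETS[quality], **kwargs}
        return cls(**config)


clf = PresetPipeline.for_text("fast", contamination=0.05)
custom = PresetPipeline.for_text("best", detector="IForest")
```

Merging the preset first and the keyword arguments second means any explicitly passed parameter replaces the preset value.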

predict_proba(X, method='linear', return_confidence=False)[source]¶

Predict the probability of a sample being an outlier.

Overrides the base implementation to handle list inputs (raw data such as text or images) which do not have a .shape attribute.

Parameters¶

X : list or array-like: Raw input data in the same format as fit().
method : str, optional (default='linear'): Probability conversion method. One of 'linear' or 'unify'.
return_confidence : boolean, optional (default=False): If True, also return the confidence of prediction.

Returns¶

outlier_probability : numpy array of shape (n_samples, n_classes)
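The two conversion methods can be illustrated on toy scores. The sketch assumes that 'linear' means min-max scaling against the training score range and that 'unify' means Gaussian scaling through the error function, as these conversions are commonly implemented for PyOD detectors. Treat it as an illustration rather than the exact code path of EmbeddingOD.

```python
import math

import numpy as np

train_scores = np.array([0.1, 0.2, 0.3, 0.4, 1.0])
test_scores = np.array([0.05, 0.55, 2.0])

# 'linear': min-max scale with the training range, then clip to [0, 1].
lo, hi = train_scores.min(), train_scores.max()
p_linear = np.clip((test_scores - lo) / (hi - lo), 0.0, 1.0)

# 'unify': Gaussian scaling of the scores through the error function.
mu, sigma = train_scores.mean(), train_scores.std()
erf = np.vectorize(math.erf)
p_unify = np.clip(erf((test_scores - mu) / (sigma * math.sqrt(2))), 0.0, 1.0)

# Column 0 is the inlier probability, column 1 the outlier probability.
proba_linear = np.column_stack([1.0 - p_linear, p_linear])
```

Test scores outside the training range are clipped, so a score below the training minimum maps to probability 0 and a score above the maximum maps to 1.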

set_predict_proba_request(*, method: bool | None | str = '$UNCHANGED$', return_confidence: bool | None | str = '$UNCHANGED$') → EmbeddingOD¶

Configure whether metadata should be requested to be passed to the predict_proba method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters¶

method : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for method parameter in predict_proba.
return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for return_confidence parameter in predict_proba.

Returns¶

self : object: The updated object.

set_predict_request(*, return_confidence: bool | None | str = '$UNCHANGED$') → EmbeddingOD¶

Configure whether metadata should be requested to be passed to the predict method.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters¶

return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for return_confidence parameter in predict.

Returns¶

self : object: The updated object.

class pyod.models.embedding.MultiModalOD(modalities, combination='average', contamination=0.1, standardize_scores=True)[source]¶

Bases: BaseDetector

Multi-modal anomaly detection via score fusion.

Runs a separate detector per modality and combines their anomaly scores. Each modality can use a different detector and encoder. Score combination uses PyOD’s existing combination functions.

This is complementary to using MultiModalEncoder with EmbeddingOD (early/feature fusion). Score fusion is preferred when modalities have very different characteristics or when per-modality anomaly scores are independently meaningful.

Parameters¶

modalities : dict of {str: BaseDetector}: Maps modality name to a detector. Each detector can be an EmbeddingOD instance (for text/image modalities) or any BaseDetector instance (for tabular modalities).
combination : str, optional (default='average'): Score combination method. One of 'average', 'maximization', 'median'.
contamination : float, optional (default=0.1): Expected proportion of outliers. Used for threshold and labels on the combined scores.
standardize_scores : bool, optional (default=True): Standardize per-modality scores to zero mean and unit variance before combination. Recommended when detectors produce scores on different scales.

Attributes¶

decision_scores_ : numpy array of shape (n_samples,): Combined outlier scores of the training data.
threshold_ : float: Score threshold based on contamination.
labels_ : numpy array of shape (n_samples,): Binary labels (0: inlier, 1: outlier).
detectors_ : dict of {str: BaseDetector}: The fitted detectors per modality.

Examples¶

>>> from pyod.models.embedding import EmbeddingOD, MultiModalOD
>>> from pyod.models.knn import KNN
>>> clf = MultiModalOD(modalities={
...     'text': EmbeddingOD(encoder='all-MiniLM-L6-v2', detector='KNN'),
...     'tabular': KNN(),
... })
>>> data = {'text': train_texts, 'tabular': X_train}
>>> clf.fit(data)
>>> scores = clf.decision_function(data)
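Score fusion itself is a small computation once each modality has produced its scores. The sketch below standardizes toy per-modality scores and applies the three combination options named in the parameters. It uses plain NumPy rather than PyOD's combination functions, and zscore is a hypothetical helper.

```python
import numpy as np

# Toy per-modality training scores on very different scales.
train_scores = {
    "text": np.array([0.1, 0.2, 0.15, 0.9]),
    "tabular": np.array([10.0, 12.0, 11.0, 30.0]),
}


def zscore(scores, reference):
    """Standardise scores with the mean and std of the reference scores."""
    std = reference.std()
    return (scores - reference.mean()) / (std if std > 0 else 1.0)


standardized = np.column_stack(
    [zscore(s, s) for s in train_scores.values()]
)  # shape (n_samples, n_modalities)

fused = {
    "average": standardized.mean(axis=1),
    "maximization": standardized.max(axis=1),
    "median": np.median(standardized, axis=1),
}
```

Without the standardization step, the tabular scores (tens) would dominate the text scores (fractions) in an average.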

decision_function(X)[source]¶

Predict combined anomaly scores for X.

Parameters¶

X : dict of {str: data}: Maps modality name to test data. A modality value of None means that modality is entirely missing for all test samples; its score is imputed as 0. When standardize_scores=True (default), 0 is the training mean, so the missing modality contributes an “average” score to the combined score. When standardize_scores=False, 0 is a raw score and may not be neutral; enable standardization for principled missing-data handling. Note that imputation reduces the variance of the fused score compared to training, so predict() thresholds may be less well calibrated. For best results with missing modalities, use decision_function() and apply custom thresholds.

Returns¶

anomaly_scores : numpy array of shape (n_samples,)
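The effect of a missing modality on the fused score can be seen on a two-sample toy case. The sketch assumes average combination with standardized scores; train_stats and fused_score are hypothetical names, and the statistics are made-up values standing in for what would be learned at fit time.

```python
import numpy as np

# Training statistics per modality (toy values), as stored at fit time.
train_stats = {"text": (0.3, 0.2), "tabular": (12.0, 4.0)}  # (mean, std)


def fused_score(raw_scores, n_samples):
    """Average standardized scores; a modality given as None is imputed as 0."""
    columns = []
    for name, (mean, std) in train_stats.items():
        raw = raw_scores.get(name)
        if raw is None:
            # After z-scoring, 0 corresponds to the training mean.
            columns.append(np.zeros(n_samples))
        else:
            columns.append((np.asarray(raw, dtype=float) - mean) / std)
    return np.mean(columns, axis=0)


full = fused_score({"text": [0.9, 0.3], "tabular": [24.0, 12.0]}, n_samples=2)
partial = fused_score({"text": [0.9, 0.3], "tabular": None}, n_samples=2)
```

The first sample scores 3.0 with both modalities present but only 1.5 when the tabular modality is missing, which is why a threshold fitted on fully observed training data can under-flag such samples.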

fit(X, y=None)[source]¶

Fit a detector per modality on the input data.

Parameters¶

X : dict of {str: data}: Maps modality name to training data. Keys must match the modalities dict.
y : Ignored: Not used, present for API consistency.

Returns¶

self : object: Fitted estimator.

set_predict_proba_request(*, method: bool | None | str = '$UNCHANGED$', return_confidence: bool | None | str = '$UNCHANGED$') → MultiModalOD¶

Configure whether metadata should be requested to be passed to the predict_proba method.

The options for each parameter are:

True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters¶

method : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for method parameter in predict_proba.
return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for return_confidence parameter in predict_proba.

Returns¶

self : object: The updated object.

set_predict_request(*, return_confidence: bool | None | str = '$UNCHANGED$') → MultiModalOD¶

Configure whether metadata should be requested to be passed to the predict method.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters¶

return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED: Metadata routing for return_confidence parameter in predict.

Returns¶

self : object: The updated object.