API CheatSheet

The full API Reference is split by modality: Tabular Detectors, Time Series Detectors, Graph Detectors, Text and Image Detectors, Audio Detectors, ADEngine, and Utility Functions. Below is a quick cheatsheet for the shared detector API:

Key Attributes of a fitted model:

  • pyod.models.base.BaseDetector.decision_scores_: Outlier scores of the training data. Higher scores indicate more abnormal behavior, so outliers tend to have higher scores.

  • pyod.models.base.BaseDetector.labels_: Binary labels of the training data, where 0 indicates inliers and 1 indicates outliers/anomalies.

See base class definition below:

pyod.models.base module

Base class for all outlier detector models

class pyod.models.base.BaseDetector(contamination=0.1)[source]

Bases: BaseEstimator

Abstract class for all outlier detection algorithms.

Parameters

contamination : float in (0., 0.5), optional (default=0.1)

The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

Attributes

decision_scores_ : numpy array of shape (n_samples,)

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

threshold_ : float

The threshold is derived from contamination: it is the score cutoff that separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest. It is used to generate binary outlier labels.

labels_ : int, either 0 or 1

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.
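The relationship between contamination, threshold_, and labels_ can be illustrated with plain NumPy. This is a sketch under the assumption that the cutoff is the 100 * (1 - contamination) percentile of the training scores; the score values are made up for illustration.

```python
import numpy as np

contamination = 0.1
# Hypothetical training scores; in practice these would be decision_scores_.
decision_scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])

# Cutoff at the (1 - contamination) percentile of the training scores.
threshold = np.percentile(decision_scores, 100 * (1 - contamination))

# Scores strictly above the cutoff are labelled 1 (outlier), the rest 0 (inlier).
labels = (decision_scores > threshold).astype(int)
print(threshold, labels)
```

With ten samples and contamination=0.1, only the single most abnormal score ends up above the cutoff.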

compute_rejection_stats(T=32, delta=0.1, c_fp=1, c_fn=1, c_r=-1, verbose=False)[source]
Add a reject option to the unsupervised detector.

This comes with guarantees: an estimate of the expected rejection rate (return_rejectrate=True), an upper bound of the rejection rate (return_ub_rejectrate=True), and an upper bound on the cost (return_ub_cost=True).

Parameters

T : int, optional (default=32)

Sets the rejection threshold to 1 - 2exp(-T). The higher the value of T, the more rejections are made.

delta : float, optional (default=0.1)

The upper bound on the rejection rate holds with probability 1 - delta.

c_fp, c_fn, c_r : floats (positive), optional (default=[1, 1, contamination])

Costs for false positive predictions (c_fp), false negative predictions (c_fn), and rejections (c_r).

verbose : bool, optional (default=False)

If True, prints the expected rejection rate, the upper bound on the rejection rate, and the upper bound on the cost.

Returns

expected_rejection_rate : float

The expected rejection rate.

upperbound_rejection_rate : float

The upper bound for the rejection rate, satisfied with probability 1 - delta.

upperbound_cost : float

The upper bound for the cost.
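To get a feel for the T parameter, the rejection threshold formula 1 - 2exp(-T) can be evaluated directly. The helper name rejection_threshold below is hypothetical and only restates the formula; it is not a library function.

```python
import math

def rejection_threshold(T):
    """Evaluate the documented rejection threshold 1 - 2*exp(-T)."""
    return 1 - 2 * math.exp(-T)

for T in (1, 2, 4, 8, 32):
    print(T, rejection_threshold(T))
```

The threshold moves toward 1 as T grows, which matches the statement that a higher T leads to more rejections.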

abstractmethod decision_function(X)[source]

Predict raw anomaly scores of X using the fitted detector.

The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.

Parameters

X : numpy array of shape (n_samples, n_features)

The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns

anomaly_scores : numpy array of shape (n_samples,)

The anomaly score of the input samples.

abstractmethod fit(X, y=None)[source]

Fit detector. y is ignored in unsupervised methods.

Parameters

X : numpy array of shape (n_samples, n_features)

The input samples.

y : Ignored

Not used, present for API consistency by convention.

Returns

self : object

Fitted estimator.

fit_predict(X, y=None)[source]

Fit detector first and then predict whether a particular sample is an outlier or not. y is ignored in unsupervised models.

Parameters

X : numpy array of shape (n_samples, n_features)

The input samples.

y : Ignored

Not used, present for API consistency by convention.

Returns

outlier_labels : numpy array of shape (n_samples,)

For each observation, tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

Deprecated since version 0.6.9: fit_predict will be removed in pyod 0.8.0; it will be replaced by calling the fit function first and then accessing the labels_ attribute for consistency.

fit_predict_score(X, y, scoring='roc_auc_score')[source]

Fit the detector, predict on samples, and evaluate the model by predefined metrics, e.g., ROC.

Parameters

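X : numpy array of shape (n_samples, n_features)

The input samples.

y : numpy array of shape (n_samples,)

The ground-truth labels of the input samples, used to compute the evaluation metric.

scoring : str, optional (default='roc_auc_score')

Evaluation metric:

  • 'roc_auc_score': Area under the ROC curve

  • 'prc_n_score': Precision @ rank n score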

  • ‘prc_n_score’: Precision @ rank n score

Returns

score : float

Deprecated since version 0.6.9: fit_predict_score will be removed in pyod 0.8.0; it will be replaced by calling the fit function first and then accessing the labels_ attribute for consistency. Scoring can be done by calling an evaluation method, e.g., AUC ROC.

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns

routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]

Get parameters for this estimator.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep : bool, optional (default=True)

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : mapping of string to any

Parameter names mapped to their values.

predict(X, return_confidence=False)[source]

Predict if a particular sample is an outlier or not.

Parameters

X : numpy array of shape (n_samples, n_features)

The input samples.

return_confidence : boolean, optional (default=False)

If True, also return the confidence of prediction.

Returns

outlier_labels : numpy array of shape (n_samples,)

For each observation, tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

confidence : numpy array of shape (n_samples,)

Only returned if return_confidence is set to True.

predict_confidence(X)[source]

Predict the model’s confidence in making the same prediction under slightly different training sets. See [MPVD20].

Parameters

X : numpy array of shape (n_samples, n_features)

The input samples.

Returns

confidence : numpy array of shape (n_samples,)

For each observation, tells how consistently the model would make the same prediction if the training set were perturbed. Returns a probability in the range [0, 1].

predict_proba(X, method='linear', return_confidence=False)[source]

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.

  2. Use unifying scores; see [MKKSZ11].

Parameters

X : numpy array of shape (n_samples, n_features)

The input samples.

method : str, optional (default='linear')

Probability conversion method. It must be one of 'linear' or 'unify'.

return_confidence : boolean, optional (default=False)

If True, also return the confidence of prediction.

Returns

outlier_probability : numpy array of shape (n_samples, n_classes)

For each observation, tells whether or not it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).
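The 'linear' approach can be illustrated with plain NumPy. This is a sketch under the assumption that the minimum and maximum are taken from the training scores and that results are clipped to [0, 1]; the score values are made up and the library's exact scaling may differ.

```python
import numpy as np

# Hypothetical scores; in practice: decision_scores_ and decision_function(X).
train_scores = np.array([0.2, 0.5, 1.0, 1.5, 4.2])
test_scores = np.array([0.1, 1.0, 2.2, 5.0])

# Min-max scaling with the range of the training scores.
lo, hi = train_scores.min(), train_scores.max()
p_outlier = np.clip((test_scores - lo) / (hi - lo), 0.0, 1.0)

# Two columns, matching the documented layout: [proba of normal, proba of outliers].
proba = np.column_stack([1.0 - p_outlier, p_outlier])
print(proba)
```

Test scores below the training minimum or above the training maximum are clipped, so each row still sums to 1.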

predict_with_rejection(X, T=32, return_stats=False, delta=0.1, c_fp=1, c_fn=1, c_r=-1)[source]
Predict if a particular sample is an outlier or not, allowing the detector to reject (i.e., output = -2) low-confidence predictions.

Parameters

X : numpy array of shape (n_samples, n_features)

The input samples.

T : int, optional (default=32)

Sets the rejection threshold to 1 - 2exp(-T). The higher the value of T, the more rejections are made.

return_stats : bool, optional (default=False)

If True, also returns three additional float values: the estimated rejection rate, the upper bound on the rejection rate, and the upper bound on the cost.

delta : float, optional (default=0.1)

The upper bound on the rejection rate holds with probability 1 - delta.

c_fp, c_fn, c_r : floats (positive), optional (default=[1, 1, contamination])

Costs for false positive predictions (c_fp), false negative predictions (c_fn), and rejections (c_r).

Returns

outlier_labels : numpy array of shape (n_samples,)

For each observation, it tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers, 1 for outliers and -2 for rejection.

expected_rejection_rate : float, if return_stats is True

upperbound_rejection_rate : float, if return_stats is True

upperbound_cost : float, if return_stats is True

set_params(**params)[source]

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self : object

set_predict_proba_request(*, method: bool | None | str = '$UNCHANGED$', return_confidence: bool | None | str = '$UNCHANGED$') → BaseDetector

Configure whether metadata should be requested to be passed to the predict_proba method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict_proba.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters

method : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for method parameter in predict_proba.

return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for return_confidence parameter in predict_proba.

Returns

self : object

The updated object.

set_predict_request(*, return_confidence: bool | None | str = '$UNCHANGED$') → BaseDetector

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters

return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for return_confidence parameter in predict.

Returns

self : object

The updated object.