API CheatSheet#

The full API Reference is available at PyOD Documentation. Below is a quick cheatsheet for all detectors:

Key Attributes of a fitted model:

  • pyod.models.base.BaseDetector.decision_scores_: Outlier scores of the training data. Higher scores indicate more abnormal behavior, so outliers tend to have higher scores.

  • pyod.models.base.BaseDetector.labels_: Binary labels of the training data, where 0 indicates inliers and 1 indicates outliers/anomalies.

See base class definition below:

pyod.models.base module#

Base class for all outlier detector models

class pyod.models.base.BaseDetector(contamination=0.1)[source]#

Bases: object

Abstract class for all outlier detection algorithms.

Parameters#

contamination : float in (0., 0.5), optional (default=0.1)

The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

Attributes#

decision_scores_ : numpy array of shape (n_samples,)

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

threshold_ : float

The threshold is based on contamination. It is the score cutoff that separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest. The threshold is used to generate binary outlier labels.

labels_ : int, either 0 or 1

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
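The relationship between contamination, threshold_, and labels_ can be illustrated with a minimal pure-Python sketch. The helper name `threshold_and_labels` is hypothetical, and the cutoff rule used here (the lowest score among the n_samples * contamination highest scores) is a simplifying assumption rather than the library's exact computation.

```python
def threshold_and_labels(scores, contamination=0.1):
    """Derive a score cutoff and 0/1 labels from training scores (illustrative only)."""
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination must lie in (0., 0.5)")
    # Number of training samples to flag: n_samples * contamination, at least one.
    n_outliers = max(1, round(len(scores) * contamination))
    # The cutoff is the lowest score among the n_outliers most abnormal samples.
    threshold = sorted(scores, reverse=True)[n_outliers - 1]
    # Samples at or above the cutoff get label 1 (outlier); ties can flag extra samples.
    labels = [1 if s >= threshold else 0 for s in scores]
    return threshold, labels


scores = [0.10, 0.20, 0.15, 0.90, 0.12, 0.18, 0.11, 0.14, 0.13, 0.16]
threshold, labels = threshold_and_labels(scores, contamination=0.1)
print(threshold, labels)
```

With ten scores and contamination=0.1, exactly one sample (the one scoring 0.90) ends up labeled as an outlier.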

abstract decision_function(X)[source]#

Predict raw anomaly scores of X using the fitted detector.

The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.

Parameters#

X : numpy array of shape (n_samples, n_features)

The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns#

anomaly_scores : numpy array of shape (n_samples,)

The anomaly score of the input samples.

abstract fit(X, y=None)[source]#

Fit detector. y is ignored in unsupervised methods.

Parameters#

X : numpy array of shape (n_samples, n_features)

The input samples.

y : Ignored

Not used, present for API consistency by convention.

Returns#

self : object

Fitted estimator.

fit_predict(X, y=None)[source]#

Fit detector first and then predict whether a particular sample is an outlier or not. y is ignored in unsupervised models.

Parameters#

X : numpy array of shape (n_samples, n_features)

The input samples.

y : Ignored

Not used, present for API consistency by convention.

Returns#

outlier_labels : numpy array of shape (n_samples,)

For each observation, tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

Deprecated since version 0.6.9: fit_predict will be removed in pyod 0.8.0; it will be replaced by calling the fit function first and then accessing the labels_ attribute for consistency.

fit_predict_score(X, y, scoring='roc_auc_score')[source]#

Fit the detector, predict on samples, and evaluate the model with a predefined metric, e.g., ROC AUC.

Parameters#

X : numpy array of shape (n_samples, n_features)

The input samples.

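y : numpy array of shape (n_samples,)

The ground-truth labels (0 for inliers, 1 for outliers) against which the outlier scores are evaluated. Unlike in fit, y is required here because the scoring metric needs it.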

scoring : str, optional (default='roc_auc_score')

Evaluation metric:

  • 'roc_auc_score': ROC AUC score

  • 'prc_n_score': Precision @ rank n score

Returns#

score : float

Deprecated since version 0.6.9: fit_predict_score will be removed in pyod 0.8.0; it will be replaced by calling the fit function first and then accessing the labels_ attribute for consistency. Scoring can then be done by calling an evaluation function, e.g., ROC AUC.
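To make the two metrics named above concrete, here is a rough pure-Python sketch of ROC AUC and precision @ rank n computed from ground-truth labels and outlier scores. The function names `roc_auc` and `precision_at_rank_n` are hypothetical, the inputs are assumed to be 0/1 labels with 1 marking outliers, and n is assumed to default to the number of true outliers. In practice an established evaluation library would normally be used instead.

```python
def roc_auc(y_true, scores):
    """Probability that a random outlier scores higher than a random inlier."""
    outlier_scores = [s for y, s in zip(y_true, scores) if y == 1]
    inlier_scores = [s for y, s in zip(y_true, scores) if y == 0]
    if not outlier_scores or not inlier_scores:
        raise ValueError("both classes must be present")
    wins = 0.0
    for o in outlier_scores:
        for i in inlier_scores:
            if o > i:
                wins += 1.0   # outlier correctly ranked above the inlier
            elif o == i:
                wins += 0.5   # ties count as half
    return wins / (len(outlier_scores) * len(inlier_scores))


def precision_at_rank_n(y_true, scores):
    """Fraction of true outliers among the n highest-scoring samples."""
    n = sum(y_true)  # assumption: n is the number of true outliers
    if n == 0:
        raise ValueError("y_true contains no outliers")
    top = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)[:n]
    return sum(y_true[idx] for idx in top) / n


y_true = [0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.30]
auc = roc_auc(y_true, scores)
prc_n = precision_at_rank_n(y_true, scores)
print(auc, prc_n)
```

In this toy data, one outlier is ranked above every inlier and the other above only one, so four of the six outlier/inlier pairs are ordered correctly.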

get_params(deep=True)[source]#

Get parameters for this estimator.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters#

deep : bool, optional (default=True)

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns#

params : mapping of string to any

Parameter names mapped to their values.

predict(X, return_confidence=False)[source]#

Predict if a particular sample is an outlier or not.

Parameters#

X : numpy array of shape (n_samples, n_features)

The input samples.

return_confidence : bool, optional (default=False)

If True, also return the confidence of prediction.

Returns#

outlier_labels : numpy array of shape (n_samples,)

For each observation, tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

confidence : numpy array of shape (n_samples,)

Only if return_confidence is set to True.
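The fit / decision_function / predict contract described so far can be sketched with a toy detector written in plain Python. `MeanDistanceDetector` is a hypothetical stand-in and not a PyOD class: it does not inherit from BaseDetector, it scores samples by Euclidean distance to the training mean, and its thresholding rule is a simplification chosen for illustration.

```python
import math


class MeanDistanceDetector:
    """Toy stand-in that mimics the detector contract (not part of PyOD)."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X, y=None):
        # y is accepted for API consistency and ignored.
        n_features = len(X[0])
        self.center_ = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
        # Training scores: higher means more abnormal.
        self.decision_scores_ = self.decision_function(X)
        # Simplified cutoff: lowest score among the most abnormal training samples.
        n_outliers = max(1, round(len(X) * self.contamination))
        self.threshold_ = sorted(self.decision_scores_, reverse=True)[n_outliers - 1]
        self.labels_ = [int(s >= self.threshold_) for s in self.decision_scores_]
        return self

    def decision_function(self, X):
        # Raw anomaly score: distance of each sample to the training mean.
        return [math.dist(row, self.center_) for row in X]

    def predict(self, X):
        # Apply the fitted threshold to the scores of new samples.
        return [int(s >= self.threshold_) for s in self.decision_function(X)]


X_train = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.2, 0.0], [0.0, 0.2],
           [-0.2, 0.0], [0.0, -0.2], [0.1, 0.1], [-0.1, -0.1], [5.0, 5.0]]
detector = MeanDistanceDetector(contamination=0.1).fit(X_train)
predictions = detector.predict([[0.0, 0.1], [6.0, 6.0]])
print(detector.labels_, predictions)
```

Note that fit returns self, which is what allows construction and fitting to be chained on one line.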

predict_confidence(X)[source]#

Predict the model’s confidence in making the same prediction under slightly different training sets. See [BPVD20].

Parameters#

X : numpy array of shape (n_samples, n_features)

The input samples.

Returns#

confidence : numpy array of shape (n_samples,)

For each observation, tells how consistently the model would make the same prediction if the training set were perturbed. Returns a probability in the range [0, 1].

predict_proba(X, method='linear', return_confidence=False)[source]#

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.

  2. Use unifying scores; see [BKKSZ11].

Parameters#

X : numpy array of shape (n_samples, n_features)

The input samples.

method : str, optional (default='linear')

Probability conversion method. It must be one of 'linear' or 'unify'.

return_confidence : bool, optional (default=False)

If True, also return the confidence of prediction.

Returns#

outlier_probability : numpy array of shape (n_samples, n_classes)

For each observation, gives the probability, in the range [0, 1], that it is an outlier according to the fitted model. The number of columns depends on the number of classes, which by default is 2 ([probability of normal, probability of outlier]).
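The linear (min-max) conversion can be sketched in a few lines of plain Python. The function name `linear_outlier_proba` is hypothetical, and two details are assumptions not stated above: the minimum and maximum are taken from the training scores, and values falling outside that range are clipped to [0, 1].

```python
def linear_outlier_proba(train_scores, new_scores):
    """Min-max scale scores into [0, 1] and return [P(normal), P(outlier)] rows."""
    lo, hi = min(train_scores), max(train_scores)
    if hi == lo:
        raise ValueError("training scores are constant; cannot rescale")
    proba = []
    for s in new_scores:
        # Linear rescaling relative to the range seen during training.
        p = (s - lo) / (hi - lo)
        # Clip scores that fall outside the training range (assumed behavior).
        p = min(1.0, max(0.0, p))
        proba.append([1.0 - p, p])
    return proba


train_scores = [1.0, 2.0, 3.0, 5.0]
proba = linear_outlier_proba(train_scores, [0.0, 3.0, 6.0])
print(proba)
```

Each row sums to 1, with the second column holding the outlier probability, which matches the two-class layout described above.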

set_params(**params)[source]#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns#

self : object