API CheatSheet

The following APIs apply to all detector models.

Key Attributes of a fitted model:

  • pyod.models.base.BaseDetector.decision_scores_: The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores.

  • pyod.models.base.BaseDetector.labels_: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.

See base class definition below:

pyod.models.base module

Base class for all outlier detector models

class pyod.models.base.BaseDetector(contamination=0.1)

Bases: object

Abstract class for all outlier detection algorithms.

pyod will stop supporting Python 2 in the future. Consider moving to Python 3.5+.

contamination : float in (0., 0.5), optional (default=0.1)

The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

decision_scores_ : numpy array of shape (n_samples,)

The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.

threshold_ : float

The threshold is derived from contamination. It is the score cutoff that separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest, and it is used to generate the binary outlier labels.

labels_ : int, either 0 or 1

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
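
The relationship among contamination, threshold_, and labels_ described above can be sketched with NumPy alone. This is an illustration rather than the library's code: it assumes the cutoff is taken as the (1 - contamination) percentile of the training scores, and the helper name threshold_and_labels is hypothetical.

```python
import numpy as np

def threshold_and_labels(decision_scores, contamination=0.1):
    """Hypothetical helper: derive a score cutoff and binary labels."""
    scores = np.asarray(decision_scores, dtype=float)
    # Assumption: the cutoff is the (1 - contamination) percentile of the scores.
    threshold = np.percentile(scores, 100 * (1 - contamination))
    # Scores strictly above the cutoff become outliers (1), the rest inliers (0).
    labels = (scores > threshold).astype(int)
    return threshold, labels

# Ten training scores with one clearly abnormal value.
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 5.0])
threshold, labels = threshold_and_labels(scores, contamination=0.1)
```

With ten samples and contamination=0.1, roughly one sample is expected to be labeled as an outlier; a larger contamination lowers the cutoff and flags more samples.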

abstract decision_function(X)

Predict raw anomaly scores of X using the fitted detector.

The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.

X : numpy array of shape (n_samples, n_features)

The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

anomaly_scores : numpy array of shape (n_samples,)

The anomaly score of the input samples.

abstract fit(X, y=None)

Fit detector. y is ignored in unsupervised methods.

X : numpy array of shape (n_samples, n_features)

The input samples.

y : Ignored

Not used, present for API consistency by convention.

self : object

Fitted estimator.
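
To make the fit and decision_function contract concrete, here is a minimal sketch of a toy detector that follows the same interface. The class ZScoreDetector and its scoring rule (largest absolute z-score across features) are hypothetical and chosen only for illustration; the class does not inherit from BaseDetector and uses NumPy only.

```python
import numpy as np

class ZScoreDetector:
    """Hypothetical toy detector mirroring the fit / decision_function contract.

    It does not inherit from BaseDetector; it only imitates the interface.
    """

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learn per-feature location and scale from the training data.
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-12  # guard against division by zero
        # Scores of the training samples, available once the detector is fitted.
        self.decision_scores_ = self.decision_function(X)
        return self  # y is ignored; the fitted estimator is returned

    def decision_function(self, X):
        X = np.asarray(X, dtype=float)
        # Largest absolute z-score across features: higher means more abnormal.
        return np.abs((X - self.mean_) / self.std_).max(axis=1)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
detector = ZScoreDetector().fit(X_train)

# Score unseen samples: one near the center, one far away.
new_scores = detector.decision_function(np.array([[0.0, 0.0], [8.0, 8.0]]))
```

Because fit returns the estimator itself, construction and fitting can be chained, and decision_function can then be called on any array with the same number of features.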

fit_predict(X, y=None)

DEPRECATED

Fit detector first and then predict whether a particular sample is an outlier or not. y is ignored in unsupervised models.

X : numpy array of shape (n_samples, n_features)

The input samples.

y : Ignored

Not used, present for API consistency by convention.

outlier_labels : numpy array of shape (n_samples,)

For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

Deprecated since version 0.6.9: fit_predict will be removed in pyod 0.8.0. For consistency, call fit first and then access the labels_ attribute instead.

fit_predict_score(X, y, scoring='roc_auc_score')

DEPRECATED

Fit the detector, predict on samples, and evaluate the model by predefined metrics, e.g., ROC.

X : numpy array of shape (n_samples, n_features)

The input samples.

y : Ignored

Not used, present for API consistency by convention.

scoring : str, optional (default=’roc_auc_score’)

Evaluation metric:

  • ‘roc_auc_score’: ROC score

  • ‘prc_n_score’: Precision @ rank n score

score : float

Deprecated since version 0.6.9: fit_predict_score will be removed in pyod 0.8.0. For consistency, call fit first and then access the labels_ attribute instead. Scoring can be done by calling an evaluation method, e.g., ROC AUC.
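
As a rough illustration of what the two scoring options measure, the following NumPy-only sketch computes a ROC AUC and a precision @ rank n value from ground-truth labels and outlier scores. The function names roc_auc and precision_at_n are hypothetical, and the definitions used here (pairwise comparison for AUC, top-n ranking for precision) are assumptions made for clarity; they are not pyod's evaluation utilities.

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC AUC as the probability that an outlier outscores an inlier."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every outlier score with every inlier score; ties count as half.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def precision_at_n(y_true, scores):
    """Precision among the n top-scored samples, n = number of true outliers."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n = int(y_true.sum())
    # Indices of the n highest scores.
    top_n = np.argsort(scores)[::-1][:n]
    return float(y_true[top_n].mean())

# Ground truth (1 = outlier) and scores such as a fitted detector might produce.
y_true = np.array([0, 0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.3])
auc = roc_auc(y_true, scores)
p_at_n = precision_at_n(y_true, scores)
```

In practice the scores would come from decision_scores_ (training data) or decision_function (new data), and an established metrics library is usually preferable to hand-written metrics.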

get_params(deep=True)

Get parameters for this estimator.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

deep : bool, optional (default=True)

If True, will return the parameters for this estimator and contained subobjects that are estimators.

params : mapping of string to any

Parameter names mapped to their values.

predict(X, return_confidence=False)

Predict if a particular sample is an outlier or not.

X : numpy array of shape (n_samples, n_features)

The input samples.

return_confidence : boolean, optional (default=False)

If True, also return the confidence of prediction.

outlier_labels : numpy array of shape (n_samples,)

For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

confidence : numpy array of shape (n_samples,)

Only if return_confidence is set to True.

predict_confidence(X)

Predict the model’s confidence in making the same prediction under slightly different training sets. See [BPVD20].

X : numpy array of shape (n_samples, n_features)

The input samples.

confidence : numpy array of shape (n_samples,)

For each observation, tells how consistently the model would make the same prediction if the training set were perturbed. Returns a probability in the range [0, 1].

predict_proba(X, method='linear', return_confidence=False)

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.

  2. Use unifying scores; see [BKKSZ11].

X : numpy array of shape (n_samples, n_features)

The input samples.

method : str, optional (default=’linear’)

Probability conversion method. It must be one of ‘linear’ or ‘unify’.

return_confidence : boolean, optional (default=False)

If True, also return the confidence of prediction.

outlier_probability : numpy array of shape (n_samples, n_classes)

For each observation, gives the probability of being an outlier according to the fitted model, in the range [0, 1]. The number of columns depends on the number of classes, which is 2 by default ([probability of normal, probability of outlier]).
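
The two conversion approaches can be sketched as follows. This is a NumPy-only illustration under stated assumptions, not the library's implementation: the function names proba_linear and proba_unify are hypothetical, the linear variant scales by the minimum and maximum of the training scores (which must not all be equal), and the unify variant assumes a Gaussian scaling of the form max(0, erf((s - mu) / (sigma * sqrt(2)))) in the spirit of [BKKSZ11].

```python
import math
import numpy as np

def proba_linear(train_scores, test_scores):
    """Min-max scaling of scores into [0, 1] using the training score range."""
    train_scores = np.asarray(train_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    lo, hi = train_scores.min(), train_scores.max()
    # Scores outside the training range are clipped to stay within [0, 1].
    p_outlier = np.clip((test_scores - lo) / (hi - lo), 0.0, 1.0)
    # Columns: [probability of normal, probability of outlier].
    return np.column_stack([1.0 - p_outlier, p_outlier])

def proba_unify(train_scores, test_scores):
    """Assumed Gaussian scaling: max(0, erf((s - mu) / (sigma * sqrt(2))))."""
    train_scores = np.asarray(train_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    mu, sigma = train_scores.mean(), train_scores.std()
    z = (test_scores - mu) / (sigma * math.sqrt(2))
    p_outlier = np.clip(np.array([math.erf(v) for v in z]), 0.0, 1.0)
    return np.column_stack([1.0 - p_outlier, p_outlier])

train_scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
test_scores = np.array([0.5, 3.0, 6.0])
linear = proba_linear(train_scores, test_scores)
unified = proba_unify(train_scores, test_scores)
```

Both helpers return an array of shape (n_samples, 2) whose rows sum to 1, matching the two-column layout described above.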

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

self : object
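
The <component>__<parameter> convention used by get_params and set_params can be illustrated with a small pure-Python stand-in. Everything below (ParamsMixin, ToyDetector, ToyEnsemble) is hypothetical and only imitates the behavior described above, assuming parameters are discovered from the constructor signature; real estimators inherit this behavior rather than reimplementing it.

```python
import inspect

class ParamsMixin:
    """Hypothetical stand-in imitating the get_params / set_params convention."""

    def get_params(self, deep=True):
        # Assumption: parameter names are read from the constructor signature.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        params = {}
        for name in names:
            value = getattr(self, name)
            params[name] = value
            # With deep=True, nested parameters appear as <component>__<parameter>.
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    params[f"{name}__{sub_name}"] = sub_value
        return params

    def set_params(self, **params):
        valid = self.get_params(deep=False)
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            if sep:
                # Delegate "<component>__<parameter>" to the nested object.
                valid[name].set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self

class ToyDetector(ParamsMixin):
    def __init__(self, contamination=0.1):
        self.contamination = contamination

class ToyEnsemble(ParamsMixin):
    def __init__(self, base, n_estimators=5):
        self.base = base
        self.n_estimators = n_estimators

ensemble = ToyEnsemble(base=ToyDetector())
# Update a top-level parameter and a nested one in a single call.
ensemble.set_params(n_estimators=10, base__contamination=0.05)
params = ensemble.get_params()
```

Returning self from set_params allows calls to be chained, and the flat key format lets tools such as parameter searches address nested components by name.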