API CheatSheet

The following APIs apply to all detector models.

Key Attributes of a fitted model:

Note: fit_predict() and fit_predict_score() are deprecated in V0.6.9 due to consistency issues and will be removed in V0.8.0. To get the binary labels of the training data X_train, call clf.fit(X_train) and use pyod.models.base.BaseDetector.labels_ instead of calling clf.predict(X_train).

See base class definition below:

pyod.models.base module

Base class for all outlier detector models

class pyod.models.base.BaseDetector(contamination=0.1)

Bases: object

Abstract class for all outlier detection algorithms.

PyOD will stop supporting Python 2 in the future. Consider moving to Python 3.5+.

Parameters

contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

decision_scores_

The outlier scores of the training data. The higher the score, the more abnormal the sample, so outliers tend to have higher scores. This value is available once the detector is fitted.

Type

numpy array of shape (n_samples,)

threshold_

The threshold is derived from contamination: it is the (1 - contamination) percentile of decision_scores_, so that roughly n_samples * contamination of the training samples score above it. The threshold is used to generate the binary outlier labels.

Type

float

labels_

The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.

Type

numpy array of shape (n_samples,), each entry an int, either 0 or 1
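To make the relationship between contamination, threshold_ and labels_ concrete, here is a minimal stdlib-only sketch. It assumes the threshold is the (1 - contamination) percentile of the training scores computed with linear interpolation (the default convention of numpy.percentile), and that a sample is labeled 1 when its score is strictly above the threshold. The helpers percentile and fit_threshold are hypothetical, not part of PyOD.

```python
def percentile(values, q):
    """Linear-interpolation percentile (numpy.percentile's default convention)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)  # floor, since pos >= 0
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def fit_threshold(decision_scores, contamination=0.1):
    """Hypothetical helper: derive threshold_ and labels_ from training scores."""
    threshold = percentile(decision_scores, 100 * (1 - contamination))
    # Higher score = more abnormal, so samples above the threshold are outliers (1).
    labels = [1 if s > threshold else 0 for s in decision_scores]
    return threshold, labels


scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 0.4, 5.0]
threshold, labels = fit_threshold(scores, contamination=0.1)
print(labels)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

With 10 samples and contamination=0.1, exactly one sample (10 * 0.1) ends up above the threshold.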

abstract decision_function(X)

Predict raw anomaly scores of X using the fitted detector.

The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.

Parameters

X (numpy array of shape (n_samples, n_features)) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns

anomaly_scores – The anomaly score of the input samples.

Return type

numpy array of shape (n_samples,)

abstract fit(X, y=None)

Fit detector. y is ignored in unsupervised methods.

Parameters
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns

self – Fitted estimator.

Return type

object
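The abstract methods fit and decision_function, together with the fitted attributes, form the contract every detector follows. The sketch below is a hypothetical, stdlib-only ZScoreDetector that imitates that contract; it does not subclass pyod.models.base.BaseDetector, and its scoring rule (absolute z-score of the first feature) is an illustrative assumption. The usage lines follow the recommended pattern from the deprecation note: call fit(X_train) and read labels_, rather than predicting on the training data again.

```python
import statistics


def _percentile(values, q):
    """Linear-interpolation percentile, like numpy.percentile's default."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


class ZScoreDetector:
    """Hypothetical toy detector that follows the BaseDetector conventions."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X, y=None):
        # y is ignored, as in unsupervised detectors.
        values = [row[0] for row in X]  # toy: only the first feature is used
        self.mean_ = statistics.mean(values)
        self.std_ = statistics.pstdev(values) or 1.0  # guard against zero spread
        self.decision_scores_ = self.decision_function(X)
        self.threshold_ = _percentile(self.decision_scores_, 100 * (1 - self.contamination))
        self.labels_ = [int(s > self.threshold_) for s in self.decision_scores_]
        return self  # fit returns the fitted estimator

    def decision_function(self, X):
        # Higher score = more abnormal: distance from the training mean in std units.
        return [abs(row[0] - self.mean_) / self.std_ for row in X]

    def predict(self, X):
        # Binary labels for new samples: 1 if the score exceeds threshold_.
        return [int(s > self.threshold_) for s in self.decision_function(X)]


X_train = [[1.0], [1.2], [0.9], [1.1], [1.0], [0.95], [1.05], [1.15], [0.85], [9.0]]
clf = ZScoreDetector(contamination=0.1).fit(X_train)
y_train_pred = clf.labels_  # use labels_ instead of fit_predict()
y_test_pred = clf.predict([[1.0], [8.0]])
print(y_train_pred, y_test_pred)
```

Returning self from fit is what allows the chained ZScoreDetector(...).fit(X_train) call.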

fit_predict(X, y=None)

DEPRECATED

Fit detector first and then predict whether a particular sample is an outlier or not. y is ignored in unsupervised models.

Parameters
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns

outlier_labels – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

Return type

numpy array of shape (n_samples,)

Deprecated since version 0.6.9: fit_predict will be removed in pyod 0.8.0. It will be replaced by calling fit first and then accessing the labels_ attribute, for consistency.

fit_predict_score(X, y, scoring='roc_auc_score')

DEPRECATED

Fit the detector, predict on samples, and evaluate the model by predefined metrics, e.g., ROC.

Parameters
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (numpy array of shape (n_samples,)) – The ground truth labels of the input samples (0 for inliers, 1 for outliers), used to compute the evaluation score.

  • scoring (str, optional (default='roc_auc_score')) – Evaluation metric, either 'roc_auc_score' (ROC score) or 'prc_n_score' (Precision @ rank n score).

Returns

score

Return type

float

Deprecated since version 0.6.9: fit_predict_score will be removed in pyod 0.8.0. It will be replaced by calling fit first and then accessing the labels_ attribute, for consistency. Scoring can be done by calling an evaluation method, e.g., ROC AUC.
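In place of fit_predict_score, the model can be fitted first and its decision_scores_ compared against ground-truth labels with any evaluation routine. The stdlib-only sketch below computes ROC AUC with the pairwise (Mann-Whitney) formulation, and precision @ rank n assuming n defaults to the number of true outliers. Both function names are hypothetical helpers, not PyOD functions; in practice, sklearn.metrics.roc_auc_score computes the same ROC AUC.

```python
def roc_auc(y_true, scores):
    """Probability that a random outlier outscores a random inlier (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_n(y_true, scores, n=None):
    """Fraction of true outliers among the n highest-scoring samples."""
    if n is None:
        n = sum(y_true)  # assumption: n = number of true outliers
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return sum(y_true[i] for i in top) / n


y_true = [0, 0, 0, 1, 0, 1]
decision_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.3]  # e.g. clf.decision_scores_
print(roc_auc(y_true, decision_scores))         # 0.75
print(precision_at_n(y_true, decision_scores))  # 0.5
```

The pairwise loop is quadratic in the number of samples; it is written this way for clarity, not speed.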

get_params(deep=True)

Get parameters for this estimator.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(X)

Predict if a particular sample is an outlier or not.

Parameters

X (numpy array of shape (n_samples, n_features)) – The input samples.

Returns

outlier_labels – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

Return type

numpy array of shape (n_samples,)

predict_proba(X, method='linear')

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. simply use min-max conversion to linearly transform the outlier scores into the range [0,1]. The model must be fitted first.

  2. use unifying scores, see [BKKSZ11].

Parameters
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.

Returns

outlier_labels – For each observation, the probability that it is an outlier according to the fitted model, ranging in [0,1].

Return type

numpy array of shape (n_samples,)
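The two conversions described above can be sketched as follows. The linear method min-max scales new scores using the minimum and maximum of the training decision_scores_ and clips the result to [0, 1]. For unify, the sketch assumes the Gaussian scaling from [BKKSZ11]: standardize with the training mean and standard deviation, apply the error function, and clip negative values to 0. The function names are hypothetical, and the sketch returns one outlier probability per sample.

```python
import math
import statistics


def proba_linear(train_scores, test_scores):
    """Min-max scale test scores using the training score range, clipped to [0, 1]."""
    lo, hi = min(train_scores), max(train_scores)
    span = (hi - lo) or 1.0  # guard against constant training scores
    return [min(max((s - lo) / span, 0.0), 1.0) for s in test_scores]


def proba_unify(train_scores, test_scores):
    """Gaussian scaling: erf of the standardized score, clipped to [0, 1]."""
    mu = statistics.mean(train_scores)
    sigma = statistics.pstdev(train_scores) or 1.0
    return [
        min(max(math.erf((s - mu) / (sigma * math.sqrt(2))), 0.0), 1.0)
        for s in test_scores
    ]


train = [0.0, 1.0, 2.0, 3.0, 4.0]  # e.g. clf.decision_scores_
test = [-1.0, 2.0, 3.0, 10.0]      # e.g. clf.decision_function(X_test)
print(proba_linear(train, test))   # [0.0, 0.5, 0.75, 1.0]
print(proba_unify(train, test))
```

Under the unify scaling, any score at or below the training mean maps to probability 0, while the linear method only reaches 0 at the training minimum.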

set_params(**params)

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object
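The <component>__<parameter> convention can be illustrated with a stripped-down, hypothetical re-implementation: a key containing a double underscore is split once, and the remainder is forwarded to the nested object's own set_params. This is a simplified sketch of the scikit-learn convention, not the actual library code; ParamsMixin, Detector and Wrapper are hypothetical classes, and the explicit _param_names tuple stands in for the constructor introspection the real implementation uses.

```python
class ParamsMixin:
    """Hypothetical, simplified get_params/set_params in the scikit-learn style."""

    _param_names = ()

    def get_params(self, deep=True):
        params = {name: getattr(self, name) for name in self._param_names}
        if deep:
            for name, value in list(params.items()):
                if hasattr(value, "get_params"):
                    # Expose nested parameters as <component>__<parameter>.
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        params[f"{name}__{sub_name}"] = sub_value
        return params

    def set_params(self, **params):
        nested = {}
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if name not in self._param_names:
                raise ValueError(f"Invalid parameter {name!r} for {type(self).__name__}")
            if sep:
                nested.setdefault(name, {})[sub_key] = value  # route to the component
            else:
                setattr(self, name, value)
        for name, sub_params in nested.items():
            getattr(self, name).set_params(**sub_params)
        return self


class Detector(ParamsMixin):
    _param_names = ("contamination", "n_neighbors")

    def __init__(self, contamination=0.1, n_neighbors=5):
        self.contamination = contamination
        self.n_neighbors = n_neighbors


class Wrapper(ParamsMixin):
    _param_names = ("base", "n_estimators")

    def __init__(self, base, n_estimators=10):
        self.base = base
        self.n_estimators = n_estimators


model = Wrapper(Detector())
model.set_params(n_estimators=20, base__contamination=0.05)
print(model.get_params()["base__contamination"])  # 0.05
```

Plain parameters are set first and nested ones afterwards, so replacing a component and tuning it in the same call targets the new component.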