API CheatSheet
The following APIs are available for all detector models:
pyod.models.base.BaseDetector.fit()
: Fit detector. y is ignored in unsupervised methods.
pyod.models.base.BaseDetector.decision_function()
: Predict raw anomaly score of X using the fitted detector.
pyod.models.base.BaseDetector.predict()
: Predict if a particular sample is an outlier or not using the fitted detector.
pyod.models.base.BaseDetector.predict_proba()
: Predict the probability of a sample being an outlier using the fitted detector.
pyod.models.base.BaseDetector.fit_predict()
: [Deprecated in V0.6.9] Fit detector first and then predict whether a particular sample is an outlier or not.
pyod.models.base.BaseDetector.fit_predict_score()
: [Deprecated in V0.6.9] Fit the detector, predict on samples, and evaluate the model by predefined metrics, e.g., ROC.
Key Attributes of a fitted model:
pyod.models.base.BaseDetector.decision_scores_
: The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores.
pyod.models.base.BaseDetector.labels_
: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.
Note: fit_predict() and fit_predict_score() are deprecated in V0.6.9 due to consistency issues and will be removed in V0.8.0. To get the binary labels of the training data X_train, call clf.fit(X_train) and use pyod.models.base.BaseDetector.labels_ instead of calling clf.predict(X_train).
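The workflow in the note above can be sketched with a toy detector. `ToyDetector` is hypothetical and is not part of pyod: it only mimics the interface described here, scores each 1-D sample by its distance from the training median, and uses an assumed rank-based thresholding rule. Plain Python lists stand in for numpy arrays to keep the sketch dependency-free.

```python
import statistics


class ToyDetector:
    """Hypothetical stand-in mimicking the BaseDetector interface (not pyod code)."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def decision_function(self, X):
        # Higher score = more abnormal: distance from the training median.
        return [abs(x - self.center_) for x in X]

    def fit(self, X, y=None):
        # y is ignored, as in unsupervised detectors.
        self.center_ = statistics.median(X)
        self.decision_scores_ = self.decision_function(X)
        # Assumed rule: flag the n_samples * contamination highest scores.
        n_outliers = max(1, int(len(X) * self.contamination))
        ranked = sorted(self.decision_scores_, reverse=True)
        self.threshold_ = ranked[n_outliers - 1]
        self.labels_ = [int(s >= self.threshold_) for s in self.decision_scores_]
        return self

    def predict(self, X):
        # Apply the threshold learned during fit to new samples.
        return [int(s >= self.threshold_) for s in self.decision_function(X)]


X_train = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.05, 0.95, 1.15, 9.0]
X_test = [1.0, 12.0]

clf = ToyDetector(contamination=0.1).fit(X_train)
train_labels = clf.labels_         # labels of the training data: read the attribute
test_labels = clf.predict(X_test)  # labels of unseen data: call predict()
```

With a real pyod detector the same two lines apply: read `labels_` for the data passed to `fit`, and reserve `predict` for samples the model has not seen.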
See base class definition below:
pyod.models.base module
Base class for all outlier detector models

class pyod.models.base.BaseDetector(contamination=0.1)
Bases: object
Abstract class for all outlier detection algorithms.
pyod will stop supporting Python 2 in the future. Consider moving to Python 3.5+.
 Parameters
contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

decision_scores_
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
 Type
numpy array of shape (n_samples,)

threshold_
The threshold is based on contamination. It is the score that separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest. The threshold is calculated for generating binary outlier labels.
 Type

labels_
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
 Type
int, either 0 or 1

abstract decision_function(X)
Predict raw anomaly scores of X using the fitted detector.
The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.
 Parameters
X (numpy array of shape (n_samples, n_features)) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
 Returns
anomaly_scores – The anomaly score of the input samples.
 Return type
numpy array of shape (n_samples,)

abstract fit(X, y=None)
Fit detector. y is ignored in unsupervised methods.
 Parameters
X (numpy array of shape (n_samples, n_features)) – The input samples.
y (Ignored) – Not used, present for API consistency by convention.
 Returns
self – Fitted estimator.
 Return type

fit_predict(X, y=None)
DEPRECATED. Fit detector first and then predict whether a particular sample is an outlier or not. y is ignored in unsupervised models.
 Parameters
X (numpy array of shape (n_samples, n_features)) – The input samples.
y (Ignored) – Not used, present for API consistency by convention.
 Returns
outlier_labels – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
 Return type
numpy array of shape (n_samples,)
Deprecated since version 0.6.9: fit_predict will be removed in pyod 0.8.0; it will be replaced by calling the fit function first and then accessing the labels_ attribute for consistency.

fit_predict_score(X, y, scoring='roc_auc_score')
DEPRECATED. Fit the detector, predict on samples, and evaluate the model by predefined metrics, e.g., ROC.
 Parameters
X (numpy array of shape (n_samples, n_features)) – The input samples.
y (Ignored) – Not used, present for API consistency by convention.
scoring (str, optional (default='roc_auc_score')) – Evaluation metric:
'roc_auc_score': ROC score
'prc_n_score': Precision @ rank n score
 Returns
score
 Return type
float
Deprecated since version 0.6.9: fit_predict_score will be removed in pyod 0.8.0; it will be replaced by calling the fit function first and then accessing the labels_ attribute for consistency. Scoring could be done by calling an evaluation method, e.g., AUC ROC.
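As a rough sketch of the replacement workflow named in the deprecation note, the training scores can be evaluated against ground-truth labels after fitting. The `roc_auc` helper below is a hand-rolled illustration, not a pyod function, and the score and label lists are made-up stand-ins for `clf.decision_scores_` and for labels you would supply yourself. In practice a library metric such as `sklearn.metrics.roc_auc_score` would usually be called instead.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the chance that a random outlier outscores a random inlier.

    Ties between an outlier and an inlier count as half a win.
    Assumes y_true contains at least one 0 and at least one 1.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Stand-in for clf.decision_scores_ after clf.fit(X_train) (illustrative values).
decision_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
# Ground-truth labels supplied by the user: 0 = inlier, 1 = outlier.
y_true = [0, 0, 1, 1, 0, 1]

auc = roc_auc(y_true, decision_scores)
```

The metric consumes raw scores rather than binary labels, so it does not depend on the contamination setting.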

get_params(deep=True)
Get parameters for this estimator.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
 Parameters
deep (bool, optional (default=True)) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
 Returns
params – Parameter names mapped to their values.
 Return type
mapping of string to any

predict(X)
Predict if a particular sample is an outlier or not.
 Parameters
X (numpy array of shape (n_samples, n_features)) – The input samples.
 Returns
outlier_labels – For each observation, tells whether or not it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
 Return type
numpy array of shape (n_samples,)

predict_proba(X, method='linear')
Predict the probability of a sample being an outlier. Two approaches are possible:
Simply use min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.
Use unifying scores, see [BKKSZ11].
 Parameters
X (numpy array of shape (n_samples, n_features)) – The input samples.
method (str, optional (default='linear')) – probability conversion method. It must be one of ‘linear’ or ‘unify’.
 Returns
outlier_labels – For each observation, the probability of it being an outlier according to the fitted model, ranging in [0,1].
 Return type
numpy array of shape (n_samples,)
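The two conversions can be sketched in plain Python. Both helpers are hypothetical illustrations rather than pyod's implementation. The linear version assumes the min-max range comes from the training scores and that out-of-range results are clipped to [0, 1]. The "unify" version assumes a Gaussian scaling through the error function, which is one common reading of unifying scores; the exact variant pyod uses is not stated in the text above.

```python
import math
import statistics


def linear_proba(train_scores, test_scores):
    """Min-max conversion; assumes the training scores are not all identical."""
    lo, hi = min(train_scores), max(train_scores)
    # Clip so that test scores outside the training range stay within [0, 1].
    return [min(1.0, max(0.0, (s - lo) / (hi - lo))) for s in test_scores]


def unify_proba(train_scores, test_scores):
    """Assumed Gaussian scaling: erf of the standardized score, floored at 0."""
    mu = statistics.mean(train_scores)
    sigma = statistics.pstdev(train_scores)
    return [max(0.0, math.erf((s - mu) / (sigma * math.sqrt(2))))
            for s in test_scores]


train_scores = [0.0, 1.0, 2.0, 3.0, 4.0]
test_scores = [1.0, 4.0, 5.0, -1.0]

p_linear = linear_proba(train_scores, test_scores)
p_unify = unify_proba(train_scores, test_scores)
```

Under the Gaussian scaling, any score at or below the training mean maps to probability 0, whereas the linear conversion spreads probabilities across the whole training range.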

set_params(**params)
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
 Returns
self
 Return type
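The `<component>__<parameter>` naming used by get_params and set_params can be illustrated with a toy class. `ToyEstimator` and its `base` attribute are hypothetical; the sketch follows the scikit-learn convention in spirit only and omits the parameter-name validation a real estimator performs.

```python
class ToyEstimator:
    """Hypothetical estimator illustrating the <component>__<parameter> form."""

    def __init__(self, contamination=0.1, base=None):
        self.contamination = contamination
        self.base = base  # optional nested estimator

    def get_params(self, deep=True):
        params = {"contamination": self.contamination, "base": self.base}
        # With deep=True, also expose the nested estimator's parameters.
        if deep and hasattr(self.base, "get_params"):
            for key, value in self.base.get_params(deep=True).items():
                params["base__" + key] = value
        return params

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep:
                # Delegate "<component>__<parameter>" to the nested component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


inner = ToyEstimator(contamination=0.05)
outer = ToyEstimator(contamination=0.1, base=inner)

# Update the outer estimator and its nested component in one call.
outer.set_params(contamination=0.2, base__contamination=0.01)
params = outer.get_params()
```

Returning `self` from `set_params` is what allows calls to be chained, for example `outer.set_params(contamination=0.2).get_params()`.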