API CheatSheet
The full API Reference is available at PyOD Documentation. Below is a quick cheatsheet for all detectors:
pyod.models.base.BaseDetector.fit()
: Fit the detector. The parameter y is ignored in unsupervised methods.
pyod.models.base.BaseDetector.decision_function()
: Predict raw anomaly scores for X using the fitted detector.
pyod.models.base.BaseDetector.predict()
: Determine whether each sample is an outlier, returned as binary labels, using the fitted detector.
pyod.models.base.BaseDetector.predict_proba()
: Estimate the probability of a sample being an outlier using the fitted detector.
pyod.models.base.BaseDetector.predict_confidence()
: Assess the model's confidence on a per-sample basis (applicable in predict and predict_proba) [BPVD20].
Key Attributes of a fitted model:
pyod.models.base.BaseDetector.decision_scores_
: Outlier scores of the training data. The higher the score, the more abnormal the sample; outliers tend to have higher scores.
pyod.models.base.BaseDetector.labels_
: Binary labels of the training data, where 0 indicates inliers and 1 indicates outliers/anomalies.
See the base class definition below:
pyod.models.base module
Base class for all outlier detector models.
- class pyod.models.base.BaseDetector(contamination=0.1)
Bases: object
Abstract class for all outlier detection algorithms.
Parameters
- contamination : float in (0., 0.5), optional (default=0.1)
The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
Attributes
- decision_scores_ : numpy array of shape (n_samples,)
The outlier scores of the training data. The higher the score, the more abnormal the sample; outliers tend to have higher scores. This value is available once the detector is fitted.
- threshold_ : float
The threshold is based on contamination. It is chosen so that roughly the n_samples * contamination most abnormal samples in decision_scores_ lie above it. The threshold is used to generate the binary outlier labels.
- labels_ : int, either 0 or 1
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. They are generated by applying threshold_ to decision_scores_.
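As a rough sketch of how threshold_ and labels_ relate to decision_scores_ and contamination, the snippet below assumes that the threshold is the (1 - contamination) percentile of the training scores. The helper name threshold_and_labels is hypothetical and is not part of PyOD.

```python
import numpy as np


def threshold_and_labels(decision_scores, contamination=0.1):
    """Derive a threshold and binary labels from training outlier scores.

    Hypothetical helper. It assumes the threshold is the (1 - contamination)
    percentile of the training scores, so that roughly
    n_samples * contamination samples score above it.
    """
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination must be in (0, 0.5)")
    scores = np.asarray(decision_scores, dtype=float).ravel()
    threshold = np.percentile(scores, 100 * (1 - contamination))
    # Scores strictly above the threshold are outliers (1); all others are inliers (0).
    labels = (scores > threshold).astype(int)
    return threshold, labels


# Ten training scores. With contamination=0.2, about 2 samples should be flagged.
scores = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 2.5, 3.0])
threshold_, labels_ = threshold_and_labels(scores, contamination=0.2)
print(threshold_, labels_)
```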
- abstract decision_function(X)
Predict raw anomaly scores of X using the fitted detector.
The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.
Parameters
- X : numpy array of shape (n_samples, n_features)
The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns
- anomaly_scores : numpy array of shape (n_samples,)
The anomaly score of the input samples.
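Following the "higher means more abnormal" convention sometimes means flipping a score. The hypothetical scorer below is not a PyOD detector. It starts from a Gaussian log-density, where rare points get low values, and negates it so that outliers receive the higher anomaly score.

```python
import numpy as np


def gaussian_log_density(X, mean, var):
    """Per-sample log-density under an axis-aligned Gaussian (lower means rarer)."""
    X = np.asarray(X, dtype=float)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)


def decision_function(X, mean, var):
    """Hypothetical anomaly scorer that follows the 'higher = more abnormal' rule.

    The log-density is negated so that rare samples, which have low density,
    receive high anomaly scores.
    """
    return -gaussian_log_density(X, mean, var)


X_train = np.array([[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]])
mean, var = X_train.mean(axis=0), X_train.var(axis=0)
X_test = np.array([[0.1, 0.1], [6.0, -6.0]])
scores = decision_function(X_test, mean, var)
print(scores)  # the far-away second sample gets the larger score
```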
- abstract fit(X, y=None)
Fit the detector. y is ignored in unsupervised methods.
Parameters
- X : numpy array of shape (n_samples, n_features)
The input samples.
- y : Ignored
Not used, present for API consistency by convention.
Returns
- self : object
Fitted estimator.
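To make the two abstract methods concrete, here is a minimal, hypothetical detector that follows the same contract without subclassing PyOD. Its fit ignores y, populates decision_scores_, threshold_, and labels_, and returns self. The distance-to-mean score and the percentile-based threshold are illustrative assumptions, not PyOD's implementation.

```python
import numpy as np


class DistanceToMeanDetector:
    """Hypothetical detector mirroring the BaseDetector contract (not part of PyOD).

    It scores samples by Euclidean distance to the training mean and derives
    threshold_ from the contamination rate.
    """

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X, y=None):
        # y is accepted only for API consistency and is ignored.
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.decision_scores_ = self.decision_function(X)
        self.threshold_ = np.percentile(
            self.decision_scores_, 100 * (1 - self.contamination))
        self.labels_ = (self.decision_scores_ > self.threshold_).astype(int)
        return self  # returning self allows det.fit(X).labels_

    def decision_function(self, X):
        # Higher score = farther from the training mean = more abnormal.
        X = np.asarray(X, dtype=float)
        return np.linalg.norm(X - self.mean_, axis=1)


rng = np.random.default_rng(0)
# 95 inliers around the origin, plus 5 outliers centred at (8, 8).
X = np.vstack([rng.normal(size=(95, 2)), rng.normal(loc=8.0, size=(5, 2))])
det = DistanceToMeanDetector(contamination=0.05).fit(X)
print(det.labels_.sum(), det.labels_[-5:])
```

Because fit returns self, det.fit(X).labels_ is the one-line equivalent of the pattern that the fit_predict deprecation note in this section recommends.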
- fit_predict(X, y=None)
Fit the detector first and then predict whether each sample is an outlier or not. y is ignored in unsupervised models.
Parameters
- X : numpy array of shape (n_samples, n_features)
The input samples.
- y : Ignored
Not used, present for API consistency by convention.
Returns
- outlier_labels : numpy array of shape (n_samples,)
For each observation, tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
Deprecated since version 0.6.9: fit_predict will be removed in pyod 0.8.0. For consistency, it will be replaced by calling the fit function first and then accessing the labels_ attribute.
- fit_predict_score(X, y, scoring='roc_auc_score')
Fit the detector, predict on samples, and evaluate the model with predefined metrics, e.g., ROC.
Parameters
- X : numpy array of shape (n_samples, n_features)
The input samples.
- y : numpy array of shape (n_samples,)
The ground-truth binary labels (0 for inliers, 1 for outliers) against which the scoring metric is computed.
- scoring : str, optional (default='roc_auc_score')
Evaluation metric:
'roc_auc_score': ROC score
'prc_n_score': Precision @ rank n score
Returns
- score : float
Deprecated since version 0.6.9: fit_predict_score will be removed in pyod 0.8.0. For consistency, it will be replaced by calling the fit function first and then accessing the labels_ attribute. Scoring can be done by calling an evaluation method, e.g., ROC AUC.
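Because fit_predict_score is deprecated, the evaluation step can be run separately on decision_scores_ and ground-truth labels. The sketch below computes ROC AUC by counting correctly ordered (outlier, inlier) pairs, with ties counted as one half. It computes precision @ rank n using one common definition: the fraction of true outliers among the n highest scores, where n is the number of true outliers. Both helpers are hypothetical. In practice, sklearn.metrics.roc_auc_score computes the same AUC.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC AUC via the pairwise (Mann-Whitney) formulation.

    Over all (outlier, inlier) pairs, count how often the outlier scores
    higher; ties count as one half.
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def precision_at_n(y_true, scores):
    """Precision among the n highest scores, where n is the number of true outliers."""
    y_true = np.asarray(y_true).astype(int)
    n = int(y_true.sum())
    top_n = np.argsort(scores)[::-1][:n]
    return y_true[top_n].mean()


# y would be the ground truth; s would be the detector's decision_scores_.
y = np.array([0, 0, 0, 0, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9])
print(roc_auc(y, s), precision_at_n(y, s))
```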
- get_params(deep=True)
Get parameters for this estimator.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
Parameters
- deep : bool, optional (default=True)
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : mapping of string to any
Parameter names mapped to their values.
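get_params follows the scikit-learn convention of reading parameter names from the constructor signature. The hypothetical class below sketches that idea with inspect.signature. For brevity it omits the deep=True recursion into nested estimators.

```python
import inspect


class ExampleDetector:
    """Hypothetical estimator used only to illustrate get_params."""

    def __init__(self, contamination=0.1, n_neighbors=5):
        self.contamination = contamination
        self.n_neighbors = n_neighbors

    def get_params(self):
        # Parameter names come from the __init__ signature (the bound method
        # excludes self); their current values come from same-named attributes.
        names = list(inspect.signature(self.__init__).parameters)
        return {name: getattr(self, name) for name in names}


print(ExampleDetector(n_neighbors=10).get_params())
# {'contamination': 0.1, 'n_neighbors': 10}
```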
- predict(X, return_confidence=False)
Predict whether a particular sample is an outlier or not.
Parameters
- X : numpy array of shape (n_samples, n_features)
The input samples.
- return_confidence : boolean, optional (default=False)
If True, also return the confidence of prediction.
Returns
- outlier_labels : numpy array of shape (n_samples,)
For each observation, tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
- confidence : numpy array of shape (n_samples,)
Returned only if return_confidence is set to True.
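The following sketch shows what predict might do with new data, assuming it compares decision_function(X) against the threshold_ learned at fit time. The threshold is derived from the training scores only and then applied to the test scores. All score values are made up for illustration.

```python
import numpy as np

# Hypothetical training scores (decision_scores_) and test scores
# (decision_function output) from some already fitted detector.
train_scores = np.array([0.2, 0.4, 0.3, 0.5, 0.1, 0.6, 0.35, 0.45, 0.25, 4.0])
test_scores = np.array([0.3, 5.0, 0.55])

contamination = 0.1
# The threshold comes from the training scores only, never from the test batch.
threshold = np.percentile(train_scores, 100 * (1 - contamination))
# Test samples scoring above the training threshold are labeled outliers (1).
outlier_labels = (test_scores > threshold).astype(int)
print(threshold, outlier_labels)
```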
- predict_confidence(X)
Predict the model's confidence in making the same prediction under slightly different training sets. See [BPVD20].
Parameters
- X : numpy array of shape (n_samples, n_features)
The input samples.
Returns
- confidence : numpy array of shape (n_samples,)
For each observation, tells how consistently the model would make the same prediction if the training set were perturbed. Returns a probability in the range [0, 1].
- predict_proba(X, method='linear', return_confidence=False)
Predict the probability of a sample being an outlier. Two approaches are possible:
- 'linear': use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.
- 'unify': use unifying scores; see [BKKSZ11].
Parameters
- X : numpy array of shape (n_samples, n_features)
The input samples.
- method : str, optional (default='linear')
Probability conversion method. It must be one of 'linear' or 'unify'.
- return_confidence : boolean, optional (default=False)
If True, also return the confidence of prediction.
Returns
- outlier_probability : numpy array of shape (n_samples, n_classes)
For each observation, the probability, in the range [0, 1], that it should be considered an outlier according to the fitted model. The number of columns equals the number of classes, which is 2 by default: [probability of being normal, probability of being an outlier].
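The two conversions can be sketched as follows, under stated assumptions. For 'linear', test scores are min-max scaled using the minimum and maximum of the training scores, then clipped to [0, 1]. For 'unify', the sketch uses Gaussian scaling, one of the unification approaches in [BKKSZ11]: scores are standardized by the training mean and (population) standard deviation, passed through the error function, and clipped. Neither function is PyOD's code, and both names are hypothetical.

```python
import math

import numpy as np


def proba_linear(train_scores, test_scores):
    """Min-max scale test scores using the training score range, clip to [0, 1]."""
    lo, hi = train_scores.min(), train_scores.max()
    p_out = np.clip((test_scores - lo) / (hi - lo), 0.0, 1.0)
    # Column 0: probability of being normal; column 1: of being an outlier.
    return np.column_stack([1.0 - p_out, p_out])


def proba_unify(train_scores, test_scores):
    """Gaussian scaling: standardize by training mean/std, apply erf, clip to [0, 1]."""
    mu, sigma = train_scores.mean(), train_scores.std()
    z = (test_scores - mu) / (sigma * math.sqrt(2))
    p_out = np.clip([math.erf(v) for v in z], 0.0, 1.0)
    return np.column_stack([1.0 - p_out, p_out])


train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
test = np.array([0.0, 3.0, 5.0, 9.0])
print(proba_linear(train, test))
print(proba_unify(train, test))
```

Both variants return two columns that sum to 1 per row, matching the default two-class layout described above.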
- set_params(**params)
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>, so that it is possible to update each component of a nested object. See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
Returns
- self : object
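The <component>__<parameter> naming can be illustrated with a hypothetical helper that groups keys by component, the way a nested set_params call routes them. Keys without a double underscore belong to the outer object. Everything after the first __ is passed on unchanged, so deeper nesting such as a__b__c is resolved one level at a time.

```python
def split_params(params):
    """Group '<component>__<parameter>' keys by component.

    Hypothetical helper. Keys without '__' apply to the outer estimator and
    are grouped under None; 'clf__n_neighbors' goes to component 'clf'.
    """
    grouped = {}
    for key, value in params.items():
        component, sep, name = key.partition("__")
        if not sep:
            # No '__': the key is a parameter of the outer object itself.
            component, name = None, key
        grouped.setdefault(component, {})[name] = value
    return grouped


print(split_params({"contamination": 0.05, "clf__n_neighbors": 10}))
# {None: {'contamination': 0.05}, 'clf': {'n_neighbors': 10}}
```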