API CheatSheet#
The following APIs are available for all detector models.
pyod.models.base.BaseDetector.fit()
: Fit detector. y is ignored in unsupervised methods.
pyod.models.base.BaseDetector.decision_function()
: Predict raw anomaly score of X using the fitted detector.
pyod.models.base.BaseDetector.predict()
: Predict if a particular sample is an outlier or not using the fitted detector.
pyod.models.base.BaseDetector.predict_proba()
: Predict the probability of a sample being an outlier using the fitted detector.
pyod.models.base.BaseDetector.predict_confidence()
: Predict the model’s sample-wise confidence (also available in predict and predict_proba through return_confidence=True).
Key Attributes of a fitted model:
pyod.models.base.BaseDetector.decision_scores_
: The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores.
pyod.models.base.BaseDetector.labels_
: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.
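The calls and attributes listed above can be illustrated with a toy detector. This is a minimal sketch, not pyod code: `ToyDistanceDetector` is a hypothetical class written only for illustration, and its percentile-based threshold is an assumption about how `contamination` is turned into a cut-off.

```python
import numpy as np


class ToyDistanceDetector:
    """Hypothetical detector: the score is the distance to the training mean."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X, y=None):
        # y is accepted for API consistency and ignored.
        X = np.asarray(X, dtype=float)
        self.center_ = X.mean(axis=0)
        self.decision_scores_ = self.decision_function(X)
        # Assumption: the cut-off is the (1 - contamination) percentile
        # of the training scores.
        self.threshold_ = np.percentile(
            self.decision_scores_, 100 * (1 - self.contamination)
        )
        self.labels_ = (self.decision_scores_ > self.threshold_).astype(int)
        return self

    def decision_function(self, X):
        # Higher score means more abnormal.
        X = np.asarray(X, dtype=float)
        return np.linalg.norm(X - self.center_, axis=1)

    def predict(self, X):
        return (self.decision_function(X) > self.threshold_).astype(int)


# 95 inliers around the origin and 5 outliers around (8, 8).
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(0, 1, size=(95, 2)),
    rng.normal(8, 1, size=(5, 2)),
])

clf = ToyDistanceDetector(contamination=0.05).fit(X_train)
X_test = np.array([[0.0, 0.0], [9.0, 9.0]])
test_scores = clf.decision_function(X_test)  # raw anomaly scores
test_labels = clf.predict(X_test)            # 0 = inlier, 1 = outlier
```

With a real pyod detector the same sequence applies: call fit, read decision_scores_ and labels_ for the training data, then call decision_function or predict on new samples.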
See base class definition below:
pyod.models.base module#
Base class for all outlier detector models
- class pyod.models.base.BaseDetector(contamination=0.1)[source]#
Bases: object
Abstract class for all outlier detection algorithms.
pyod will stop supporting Python 2 in the future. Consider moving to Python 3.5+.
Parameters#
- contamination : float in (0., 0.5), optional (default=0.1)
The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
Attributes#
- decision_scores_ : numpy array of shape (n_samples,)
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- threshold_ : float
The threshold is based on contamination. It is the score cut-off that separates the n_samples * contamination most abnormal samples in decision_scores_ from the rest. The threshold is calculated for generating binary outlier labels.
- labels_ : int, either 0 or 1
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
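As a worked example of the relationship between contamination, threshold_, and labels_, the sketch below uses made-up scores. It assumes the threshold is taken as the (1 - contamination) percentile of the training scores, which is one common way to flag the n_samples * contamination highest scores; the variable names are illustrative only.

```python
import numpy as np

contamination = 0.2
# Made-up training scores for 10 samples; higher means more abnormal.
decision_scores = np.array([0.3, 0.1, 2.5, 0.4, 0.2, 0.6, 0.5, 3.1, 0.7, 0.8])

# Assumption: the cut-off is the (1 - contamination) percentile of the scores.
threshold = np.percentile(decision_scores, 100 * (1 - contamination))

# Samples scoring above the cut-off are labeled 1 (outlier), the rest 0.
labels = (decision_scores > threshold).astype(int)
n_flagged = int(labels.sum())
```

Here n_samples * contamination is 10 * 0.2 = 2, and the two highest scores (2.5 and 3.1) are the ones that end up labeled as outliers.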
- abstract decision_function(X)[source]#
Predict raw anomaly scores of X using the fitted detector.
The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned with higher anomaly scores.
Parameters#
- X : numpy array of shape (n_samples, n_features)
The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns#
- anomaly_scores : numpy array of shape (n_samples,)
The anomaly score of the input samples.
- abstract fit(X, y=None)[source]#
Fit detector. y is ignored in unsupervised methods.
Parameters#
- X : numpy array of shape (n_samples, n_features)
The input samples.
- y : Ignored
Not used, present for API consistency by convention.
Returns#
- self : object
Fitted estimator.
- fit_predict(X, y=None)[source]#
DEPRECATED
Fit detector first and then predict whether a particular sample is an outlier or not. y is ignored in unsupervised models.
Parameters#
- X : numpy array of shape (n_samples, n_features)
The input samples.
- y : Ignored
Not used, present for API consistency by convention.
Returns#
- outlier_labels : numpy array of shape (n_samples,)
For each observation, tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
Deprecated since version 0.6.9: fit_predict will be removed in pyod 0.8.0; it will be replaced by calling the fit function first and then accessing the labels_ attribute for consistency.
- fit_predict_score(X, y, scoring='roc_auc_score')[source]#
DEPRECATED
Fit the detector, predict on samples, and evaluate the model by predefined metrics, e.g., ROC.
Parameters#
- X : numpy array of shape (n_samples, n_features)
The input samples.
- y : numpy array of shape (n_samples,)
The ground truth labels (0 for inliers, 1 for outliers) used to compute the evaluation score.
- scoring : str, optional (default='roc_auc_score')
Evaluation metric:
- 'roc_auc_score': ROC score
- 'prc_n_score': Precision @ rank n score
Returns#
- score : float
Deprecated since version 0.6.9: fit_predict_score will be removed in pyod 0.8.0; it will be replaced by calling the fit function first and then accessing the labels_ attribute for consistency. Scoring can be done by calling an evaluation method, e.g., AUC ROC.
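Since scoring is meant to be done separately after fitting, the two metrics named above can be sketched directly on an array of outlier scores. The helpers `roc_auc` and `precision_at_n` below are hypothetical, dependency-free illustrations and not pyod functions; in practice `sklearn.metrics.roc_auc_score` would be the usual choice for the ROC score, and a library implementation of precision @ rank n may treat ties differently.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (outlier, inlier) pairs ranked correctly."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    # Ties between an outlier and an inlier count as half a correct pair.
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


def precision_at_n(y_true, scores, n=None):
    """Share of true outliers among the n highest-scoring samples."""
    y_true = np.asarray(y_true).astype(int)
    scores = np.asarray(scores, dtype=float)
    if n is None:
        # Default: n is the number of true outliers.
        n = int(y_true.sum())
    top_n = np.argsort(scores)[::-1][:n]
    return float(y_true[top_n].mean())


# Made-up ground truth and outlier scores (e.g. decision_scores_ after fit).
y = np.array([0, 0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.3])

auc = roc_auc(y, scores)
p_at_n = precision_at_n(y, scores)
```

Both helpers take the raw scores rather than the binary labels, so they measure how well the detector ranks outliers independently of the chosen contamination.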
- get_params(deep=True)[source]#
Get parameters for this estimator.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
Parameters#
- deep : bool, optional (default=True)
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns#
- params : mapping of string to any
Parameter names mapped to their values.
- predict(X, return_confidence=False)[source]#
Predict if a particular sample is an outlier or not.
Parameters#
- X : numpy array of shape (n_samples, n_features)
The input samples.
- return_confidence : boolean, optional (default=False)
If True, also return the confidence of prediction.
Returns#
- outlier_labels : numpy array of shape (n_samples,)
For each observation, tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
- confidence : numpy array of shape (n_samples,)
Only if return_confidence is set to True.
- predict_confidence(X)[source]#
Predict the model’s confidence in making the same prediction under slightly different training sets. See [BPVD20].
Parameters#
- X : numpy array of shape (n_samples, n_features)
The input samples.
Returns#
- confidence : numpy array of shape (n_samples,)
For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].
- predict_proba(X, method='linear', return_confidence=False)[source]#
Predict the probability of a sample being an outlier. Two approaches are possible:
1. Use min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.
2. Use unifying scores, see [BKKSZ11].
Parameters#
- X : numpy array of shape (n_samples, n_features)
The input samples.
- method : str, optional (default='linear')
Probability conversion method. It must be one of 'linear' or 'unify'.
- return_confidence : boolean, optional (default=False)
If True, also return the confidence of prediction.
Returns#
- outlier_probability : numpy array of shape (n_samples, n_classes)
For each observation, tells whether or not it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).
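The two conversion approaches can be sketched on plain score arrays. The function `scores_to_proba` below is a hypothetical helper, not part of pyod. It assumes that the 'linear' method rescales with the minimum and maximum of the training scores, that the 'unify' method applies the Gaussian error function to standardized scores, and that both results are clipped to [0,1]; the library's actual implementation may differ in detail.

```python
import math

import numpy as np


def scores_to_proba(train_scores, test_scores, method="linear"):
    """Convert raw outlier scores to [proba of normal, proba of outlier]."""
    train_scores = np.asarray(train_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)

    if method == "linear":
        # Min-max scaling fitted on the training scores
        # (assumes the training scores are not all identical).
        lo, hi = train_scores.min(), train_scores.max()
        outlier = (test_scores - lo) / (hi - lo)
    elif method == "unify":
        # Assumption: standardize with the training mean and standard
        # deviation, then apply the Gaussian error function.
        mu, sigma = train_scores.mean(), train_scores.std()
        z = (test_scores - mu) / (sigma * math.sqrt(2))
        outlier = np.array([math.erf(v) for v in z])
    else:
        raise ValueError("method must be 'linear' or 'unify'")

    # Scores outside the training range are clipped into [0, 1].
    outlier = np.clip(outlier, 0.0, 1.0)
    return np.column_stack([1.0 - outlier, outlier])


train = [0.0, 2.0, 4.0]
test = [1.0, 5.0, -1.0]
linear_proba = scores_to_proba(train, test, method="linear")
unify_proba = scores_to_proba(train, test, method="unify")
```

Each row has two columns that sum to 1, matching the default two-class output shape (n_samples, 2) described above.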
- set_params(**params)[source]#
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
Returns#
- self : object
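The <component>__<parameter> convention can be illustrated with a simplified routing sketch. `Component` and `Wrapper` are hypothetical classes written for this example; they are not pyod or scikit-learn code, and the real set_params also validates parameter names, which this sketch omits.

```python
class Component:
    """Hypothetical inner estimator with a single parameter."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


class Wrapper:
    """Hypothetical outer estimator that holds a nested component."""

    def __init__(self, base, contamination=0.1):
        self.base = base
        self.contamination = contamination

    def set_params(self, **params):
        for key, value in params.items():
            # Split "base__n_neighbors" into ("base", "__", "n_neighbors").
            name, delim, sub_key = key.partition("__")
            if delim:
                # Nested key: forward the remainder to the named component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                # Plain key: set it on this object.
                setattr(self, name, value)
        return self


model = Wrapper(Component())
returned = model.set_params(contamination=0.2, base__n_neighbors=10)
```

Because set_params returns self, the call can be chained with other methods such as fit.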