API CheatSheet¶
The full API Reference is split by modality: Tabular Detectors, Time Series Detectors, Graph Detectors, Text and Image Detectors, Audio Detectors, ADEngine, and Utility Functions. Below is a quick cheatsheet for the shared detector API:
- pyod.models.base.BaseDetector.fit(): Fit the detector. The parameter y is ignored in unsupervised methods.
- pyod.models.base.BaseDetector.decision_function(): Predict raw anomaly scores for X using the fitted detector.
- pyod.models.base.BaseDetector.predict(): Determine whether a sample is an outlier or not as binary labels using the fitted detector.
- pyod.models.base.BaseDetector.predict_proba(): Estimate the probability of a sample being an outlier using the fitted detector.
- pyod.models.base.BaseDetector.predict_confidence(): Assess the model’s confidence on a per-sample basis (applicable in predict and predict_proba) [APVD20].
Key Attributes of a fitted model:
- pyod.models.base.BaseDetector.decision_scores_: Outlier scores of the training data. Higher scores indicate more abnormal behavior, so outliers tend to have higher scores.
- pyod.models.base.BaseDetector.labels_: Binary labels of the training data, where 0 indicates inliers and 1 indicates outliers/anomalies.
See base class definition below:
pyod.models.base module¶
Base class for all outlier detector models
- class pyod.models.base.BaseDetector(contamination=0.1)[source]¶
Bases: BaseEstimator
Abstract class for all outlier detection algorithms.
Parameters¶
- contamination : float in (0., 0.5), optional (default=0.1)
The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.
Attributes¶
- decision_scores_ : numpy array of shape (n_samples,)
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted.
- threshold_ : float
The threshold is based on contamination. It is the score cut-off that marks the n_samples * contamination most abnormal samples in decision_scores_. The threshold is calculated for generating binary outlier labels.
- labels_ : int, either 0 or 1
The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies. It is generated by applying threshold_ on decision_scores_.
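The relationship between contamination, threshold_, and labels_ can be sketched in plain NumPy. This is a minimal illustration that assumes the cut-off is taken as the (1 - contamination) percentile of the training scores and that scores strictly above it are labelled as outliers; the score values are made up for the example.

```python
import numpy as np

# Hypothetical training scores; higher means more abnormal.
decision_scores = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 2.5, 3.0])
contamination = 0.2

# Assumption: the cut-off is the (1 - contamination) percentile of the scores.
threshold = np.percentile(decision_scores, 100 * (1 - contamination))

# Scores strictly above the cut-off are labelled 1 (outlier), the rest 0.
labels = (decision_scores > threshold).astype(int)

print(threshold, labels)
```

With 10 samples and contamination=0.2, the two highest-scoring samples end up labelled 1.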
- compute_rejection_stats(T=32, delta=0.1, c_fp=1, c_fn=1, c_r=-1, verbose=False)[source]¶
Add a reject option to the unsupervised detector.
This comes with guarantees: an estimate of the expected rejection rate (return_rejectrate=True), an upper bound of the rejection rate (return_ub_rejectrate=True), and an upper bound on the cost (return_ub_cost=True).
Parameters¶
- T : int, optional (default=32)
Sets the rejection threshold to 1-2exp(-T). The higher the value of T, the more rejections are made.
- delta: float, optional (default = 0.1)
The upper bound rejection rate holds with probability 1-delta.
- c_fp, c_fn, c_r : floats (positive), optional (default = [1, 1, contamination])
Costs for false positive predictions (c_fp), false negative predictions (c_fn) and rejections (c_r).
- verbose: bool, optional (default = False)
If true, it prints the expected rejection rate, the upper bound rejection rate, and the upper bound of the cost.
Returns¶
- expected_rejection_rate : float
The expected rejection rate.
- upperbound_rejection_rate : float
The upper bound for the rejection rate, satisfied with probability 1-delta.
- upperbound_cost : float
The upper bound for the cost.
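To get a feel for how T controls the rejection threshold 1-2exp(-T), the formula can be evaluated directly. The helper name below is hypothetical and only restates the expression from the parameter description.

```python
import math


def rejection_threshold(T):
    """Evaluate the rejection threshold 1 - 2*exp(-T) for a given T."""
    return 1.0 - 2.0 * math.exp(-T)


# The threshold rises towards 1 as T grows, so more predictions
# fall below it and are rejected.
for T in (1, 2, 8, 32):
    print(T, rejection_threshold(T))
```

At the default T=32 the threshold is just below 1, which is consistent with the statement that larger values of T lead to more rejections.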
- abstractmethod decision_function(X)[source]¶
Predict raw anomaly scores of X using the fitted detector.
The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned higher anomaly scores.
Parameters¶
- X : numpy array of shape (n_samples, n_features)
The input samples. Sparse matrices are accepted only if they are supported by the base estimator.
Returns¶
- anomaly_scores : numpy array of shape (n_samples,)
The anomaly score of the input samples.
- abstractmethod fit(X, y=None)[source]¶
Fit detector. y is ignored in unsupervised methods.
Parameters¶
- X : numpy array of shape (n_samples, n_features)
The input samples.
- y : Ignored
Not used, present for API consistency by convention.
Returns¶
- self : object
Fitted estimator.
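Because fit and decision_function are abstract, each concrete detector has to supply both. The toy class below is a hypothetical stand-in that only imitates this contract in plain NumPy; it does not inherit from BaseDetector, and the distance-to-mean scoring and percentile threshold are assumptions chosen for illustration.

```python
import numpy as np


class ToyDistanceDetector:
    """Hypothetical stand-in mirroring the fit/decision_function contract."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X, y=None):
        # y is accepted for API consistency and ignored.
        X = np.asarray(X, dtype=float)
        self.center_ = X.mean(axis=0)
        # Training scores, threshold, and labels become available after fit.
        self.decision_scores_ = self.decision_function(X)
        self.threshold_ = np.percentile(
            self.decision_scores_, 100 * (1 - self.contamination))
        self.labels_ = (self.decision_scores_ > self.threshold_).astype(int)
        return self

    def decision_function(self, X):
        X = np.asarray(X, dtype=float)
        # Euclidean distance to the training mean: farther means more abnormal.
        return np.linalg.norm(X - self.center_, axis=1)


rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, size=(95, 2)),
                     rng.normal(8, 1, size=(5, 2))])
detector = ToyDistanceDetector(contamination=0.05).fit(X_train)
scores = detector.decision_function(np.array([[0.0, 0.0], [9.0, 9.0]]))
print(scores)
```

The point far from the bulk of the training data receives the higher score, matching the convention that outliers are assigned higher anomaly scores.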
- fit_predict(X, y=None)[source]¶
Fit detector first and then predict whether a particular sample is an outlier or not. y is ignored in unsupervised models.
Parameters¶
- X : numpy array of shape (n_samples, n_features)
The input samples.
- y : Ignored
Not used, present for API consistency by convention.
Returns¶
- outlier_labels : numpy array of shape (n_samples,)
For each observation, tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
Deprecated since version 0.6.9: fit_predict will be removed in pyod 0.8.0; it will be replaced by calling the fit function first and then accessing the labels_ attribute for consistency.
- fit_predict_score(X, y, scoring='roc_auc_score')[source]¶
Fit the detector, predict on samples, and evaluate the model by predefined metrics, e.g., ROC.
Parameters¶
- X : numpy array of shape (n_samples, n_features)
The input samples.
- y : Ignored
Not used, present for API consistency by convention.
- scoring : str, optional (default='roc_auc_score')
Evaluation metric:
- 'roc_auc_score': ROC score
- 'prc_n_score': Precision @ rank n score
Returns¶
score : float
Deprecated since version 0.6.9: fit_predict_score will be removed in pyod 0.8.0; it will be replaced by calling the fit function first and then accessing the labels_ attribute for consistency. Scoring can be done by calling an evaluation method, e.g., AUC ROC.
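Since the deprecation note suggests scoring separately from fitting, a metric such as "Precision @ rank n" can be computed directly from ground-truth labels and outlier scores. The sketch below follows a common definition of that metric (n equals the number of true outliers) and is an assumption, not PyOD's own implementation; the function name and data are hypothetical, and it assumes at least one true outlier is present.

```python
import numpy as np


def precision_at_rank_n(y_true, scores):
    """Fraction of true outliers among the n highest-scoring samples,
    where n is the number of true outliers in y_true (assumed > 0)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n = int(y_true.sum())
    # Indices of the n largest scores (ties broken by sort order).
    top_n = np.argsort(scores)[::-1][:n]
    return float(y_true[top_n].sum()) / n


y_true = np.array([0, 0, 1, 0, 1, 0])
scores = np.array([0.1, 0.9, 0.8, 0.2, 0.3, 0.1])
p_at_n = precision_at_rank_n(y_true, scores)
print(p_at_n)
```

In practice the scores would come from decision_scores_ after fit, or from decision_function on held-out data.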
- get_metadata_routing()¶
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
Returns¶
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]¶
Get parameters for this estimator.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
Parameters¶
- deep : bool, optional (default=True)
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns¶
- params : mapping of string to any
Parameter names mapped to their values.
- predict(X, return_confidence=False)[source]¶
Predict if a particular sample is an outlier or not.
Parameters¶
- X : numpy array of shape (n_samples, n_features)
The input samples.
- return_confidence : boolean, optional (default=False)
If True, also return the confidence of prediction.
Returns¶
- outlier_labels : numpy array of shape (n_samples,)
For each observation, tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.
- confidence : numpy array of shape (n_samples,)
Only if return_confidence is set to True.
- predict_confidence(X)[source]¶
Predict the model’s confidence in making the same prediction under slightly different training sets. See [MPVD20].
Parameters¶
- X : numpy array of shape (n_samples, n_features)
The input samples.
Returns¶
- confidence : numpy array of shape (n_samples,)
For each observation, tells how consistently the model would make the same prediction if the training set was perturbed. Return a probability, ranging in [0,1].
- predict_proba(X, method='linear', return_confidence=False)[source]¶
Predict the probability of a sample being an outlier. Two approaches are possible:
- Simply use min-max conversion to linearly transform the outlier scores into the range of [0,1]. The model must be fitted first.
- Use unifying scores, see [MKKSZ11].
Parameters¶
- X : numpy array of shape (n_samples, n_features)
The input samples.
- method : str, optional (default='linear')
Probability conversion method. It must be one of 'linear' or 'unify'.
- return_confidence : boolean, optional (default=False)
If True, also return the confidence of prediction.
Returns¶
- outlier_probability : numpy array of shape (n_samples, n_classes)
For each observation, tells whether or not it should be considered as an outlier according to the fitted model. Return the outlier probability, ranging in [0,1]. Note it depends on the number of classes, which is by default 2 classes ([proba of normal, proba of outliers]).
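The 'linear' conversion can be illustrated with a short NumPy sketch. It assumes the min-max range is taken from the training scores and that out-of-range values are clipped to [0,1]; the function name and inputs are hypothetical, and the 'unify' method is not shown.

```python
import numpy as np


def linear_outlier_proba(train_scores, test_scores):
    """Min-max scale test scores using the training score range."""
    train_scores = np.asarray(train_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    lo, hi = train_scores.min(), train_scores.max()  # assumes hi > lo
    # Scale with the training range, then clip to [0, 1].
    p_outlier = np.clip((test_scores - lo) / (hi - lo), 0.0, 1.0)
    # Column 0: probability of normal; column 1: probability of outlier.
    return np.column_stack([1.0 - p_outlier, p_outlier])


proba = linear_outlier_proba([1.0, 2.0, 5.0], [0.5, 3.0, 6.0])
print(proba)
```

Each row sums to 1, matching the two-class layout [proba of normal, proba of outliers] described above.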
- predict_with_rejection(X, T=32, return_stats=False, delta=0.1, c_fp=1, c_fn=1, c_r=-1)[source]¶
Predict if a particular sample is an outlier or not, allowing the detector to reject (i.e., output = -2) low confidence predictions.
Parameters¶
- X : numpy array of shape (n_samples, n_features)
The input samples.
- T : int, optional (default=32)
Sets the rejection threshold to 1-2exp(-T). The higher the value of T, the more rejections are made.
- return_stats : bool, optional (default = False)
If True, also returns three additional float values: the estimated rejection rate, the upper bound rejection rate, and the upper bound of the cost.
- delta : float, optional (default = 0.1)
The upper bound rejection rate holds with probability 1-delta.
- c_fp, c_fn, c_r : floats (positive), optional (default = [1, 1, contamination])
Costs for false positive predictions (c_fp), false negative predictions (c_fn) and rejections (c_r).
Returns¶
- outlier_labels : numpy array of shape (n_samples,)
For each observation, it tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers, 1 for outliers and -2 for rejection.
- expected_rejection_rate : float
Returned only if return_stats is True.
- upperbound_rejection_rate : float
Returned only if return_stats is True.
- upperbound_cost : float
Returned only if return_stats is True.
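The effect of rejection on the returned labels can be sketched as follows. The helper is hypothetical: it assumes a per-sample confidence in [0,1] is already available and that a prediction is rejected when its confidence does not exceed 1-2exp(-T). How the library derives that confidence, and whether the comparison is strict, are not covered here.

```python
import math

import numpy as np


def apply_rejection(labels, confidence, T=32):
    """Replace low-confidence predictions with -2 (rejection)."""
    labels = np.asarray(labels).copy()
    confidence = np.asarray(confidence, dtype=float)
    threshold = 1.0 - 2.0 * math.exp(-T)
    # Assumption: reject when confidence is at or below the threshold.
    labels[confidence <= threshold] = -2
    return labels


# A small T is used so that the threshold (about 0.90) is easy to read.
labels = apply_rejection([0, 1, 1, 0], [0.99, 0.60, 0.95, 0.20], T=3)
print(labels)
```

The two predictions with confidence below roughly 0.90 are replaced by -2, while the confident inlier and outlier labels are kept.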
- set_params(**params)[source]¶
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.
Returns¶
self : object
- set_predict_proba_request(*, method: bool | None | str = '$UNCHANGED$', return_confidence: bool | None | str = '$UNCHANGED$') → BaseDetector¶
Configure whether metadata should be requested to be passed to the predict_proba method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Parameters¶
- method : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for method parameter in predict_proba.
- return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for return_confidence parameter in predict_proba.
Returns¶
- self : object
The updated object.
- set_predict_request(*, return_confidence: bool | None | str = '$UNCHANGED$') → BaseDetector¶
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Parameters¶
- return_confidence : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for return_confidence parameter in predict.
Returns¶
- self : object
The updated object.