Welcome to PyOD documentation!
PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. This exciting yet challenging field is commonly referred to as Outlier Detection or Anomaly Detection. Since 2017, PyOD has been successfully used in various academic research projects [AZH18a][AZH18b] and commercial products. PyOD features:
- Unified APIs, detailed documentation, and interactive examples across various algorithms.
- Advanced models, including Neural Networks/Deep Learning and Outlier Ensembles.
- Optimized performance with JIT compilation (via numba) and parallelization when possible.
- Compatible with both Python 2 & 3 (scikit-learn compatible as well).
Important Note: PyOD contains some neural network based models, e.g., AutoEncoders, which are implemented in keras. However, PyOD does NOT install keras and/or tensorflow automatically. This reduces the risk of damaging your local installations. If you want to use the neural network based models, you should install keras and a back-end library such as tensorflow. Installation is fairly easy, and instructions are provided here.
The PyOD toolkit consists of three major groups of functionality: (i) outlier detection algorithms; (ii) outlier ensemble frameworks; and (iii) outlier detection utility functions.
Individual Detection Algorithms:
- Linear Models for Outlier Detection:
- PCA: Principal Component Analysis (use the sum of weighted projected distances to the eigenvector hyperplane as outlier scores) [ASCSC03]:
- MCD: Minimum Covariance Determinant (use the Mahalanobis distances as the outlier scores) [ARD99][AHR04]:
- One-Class Support Vector Machines [AMP03]:
- Proximity-Based Outlier Detection Models:
- LOF: Local Outlier Factor [ABKNS00]:
- CBLOF: Clustering-Based Local Outlier Factor [AHXD03]:
- LOCI: Local Correlation Integral [APKGF03]:
- kNN: k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score) [ARRS00][AAP02]:
- Average kNN (use the average distance to k nearest neighbors as the outlier score):
- Median kNN (use the median distance to k nearest neighbors as the outlier score):
- HBOS: Histogram-based Outlier Score [AGD12]:
- Probabilistic Models for Outlier Detection:
- Outlier Ensembles and Combination Frameworks:
- Neural Networks and Deep Learning Models (implemented in Keras):
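To make the three kNN variants above concrete, here is a minimal brute-force sketch in NumPy. It is not PyOD's implementation: the function name `knn_outlier_scores` and its `method` values are hypothetical, chosen for illustration, and the all-pairs distance matrix only suits small datasets (it also assumes more than `k` samples).

```python
import numpy as np


def knn_outlier_scores(X, k=5, method="largest"):
    """Hypothetical helper: score each row of X by distances to its k nearest neighbors.

    method="largest" -> distance to the k-th nearest neighbor (kNN)
    method="mean"    -> average distance to the k nearest neighbors (Average kNN)
    method="median"  -> median distance to the k nearest neighbors (Median kNN)
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via broadcasting: shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # A point is not its own neighbor, so ignore the zero self-distance.
    np.fill_diagonal(dists, np.inf)
    # Sort each row and keep the k smallest distances.
    nearest = np.sort(dists, axis=1)[:, :k]
    if method == "largest":
        return nearest[:, -1]
    if method == "mean":
        return nearest.mean(axis=1)
    if method == "median":
        return np.median(nearest, axis=1)
    raise ValueError(f"unknown method: {method!r}")


# Four clustered points plus one far-away point at (10, 10).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]], dtype=float)
for m in ("largest", "mean", "median"):
    print(m, knn_outlier_scores(X, k=2, method=m).round(2))
```

In every variant the isolated point receives the highest score. The mean and median variants are less sensitive than the k-th distance alone to exactly where the k-th neighbor happens to sit.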
Outlier Detector/Scores Combination Frameworks:
- Feature Bagging: build various detectors on randomly selected feature subsets [ALK05]:
- Average & Weighted Average: simply combine scores by averaging [AAS15]:
- Maximization: simply combine scores by taking the maximum across all base detectors [AAS15]:
- Average of Maximum (AOM) [AAS15]:
- Maximum of Average (MOA) [AAS15]:
- Threshold Sum (Thresh) [AAS15]:
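A rough NumPy sketch of the averaging, maximization, AOM and MOA schemes above, assuming each column of `scores` holds one base detector's outlier scores for the same samples. Scores are z-normalized first because different detectors produce scores on different scales. All function names are hypothetical, and the random assignment of detectors to buckets in AOM and MOA is an assumption of this sketch rather than a description of PyOD's own implementation. Threshold Sum is omitted.

```python
import numpy as np


def standardize(scores):
    """Z-normalize each detector's scores (one column per detector)."""
    scores = np.asarray(scores, dtype=float)
    # Assumes no column is constant (zero standard deviation).
    return (scores - scores.mean(axis=0)) / scores.std(axis=0)


def combine_average(scores, weights=None):
    # Plain (weights=None) or weighted mean across detectors.
    return np.average(scores, axis=1, weights=weights)


def combine_max(scores):
    # Keep the largest score any detector assigns to each sample.
    return scores.max(axis=1)


def _buckets(n_detectors, n_buckets, rng):
    # Randomly split detector indices into n_buckets non-empty groups.
    if not 1 <= n_buckets <= n_detectors:
        raise ValueError("need 1 <= n_buckets <= number of detectors")
    return np.array_split(rng.permutation(n_detectors), n_buckets)


def aom(scores, n_buckets=2, seed=0):
    # Average of Maximum: max within each bucket, then average the bucket maxima.
    groups = _buckets(scores.shape[1], n_buckets, np.random.default_rng(seed))
    return np.mean([scores[:, g].max(axis=1) for g in groups], axis=0)


def moa(scores, n_buckets=2, seed=0):
    # Maximum of Average: mean within each bucket, then the largest bucket mean.
    groups = _buckets(scores.shape[1], n_buckets, np.random.default_rng(seed))
    return np.max([scores[:, g].mean(axis=1) for g in groups], axis=0)


# Three samples scored by four detectors on very different scales.
raw = np.array([[1.0, 10.0, 0.2, 3.0],
                [2.0, 20.0, 0.4, 1.0],
                [9.0, 90.0, 0.9, 8.0]])
z = standardize(raw)
print(combine_average(z), combine_max(z), aom(z), moa(z))
```

Averaging dampens the influence of any single detector, while maximization favors samples that at least one detector finds highly abnormal. AOM and MOA sit between the two.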
For an interactive comparison of the models in a Jupyter notebook, please navigate to “/notebooks/Compare All Models.ipynb”.
Key APIs & Attributes
The following APIs apply to all detector models.
pyod.models.base.BaseDetector.fit(): Fit detector.
pyod.models.base.BaseDetector.fit_predict(): Fit detector and predict if a particular sample is an outlier or not.
pyod.models.base.BaseDetector.fit_predict_evaluate(): Fit, predict and then evaluate with predefined metrics (ROC and precision @ rank n).
pyod.models.base.BaseDetector.decision_function(): Predict the anomaly scores of X using the fitted detector.
pyod.models.base.BaseDetector.predict(): Predict if a particular sample is an outlier or not. The model must be fitted first.
pyod.models.base.BaseDetector.predict_proba(): Predict the probability of a sample being an outlier. The model must be fitted first.
Key Attributes of a fitted model:
pyod.models.base.BaseDetector.decision_scores_: The outlier scores of the training data. The higher the score, the more abnormal the sample, so outliers tend to have higher scores.
pyod.models.base.BaseDetector.labels_: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.
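The toy class below is a hypothetical sketch of how these methods and attributes fit together. It does not subclass or reproduce `pyod.models.base.BaseDetector`. Its scoring rule (distance to the nearest training point), the percentile threshold derived from `contamination`, and the min-max scaling in `predict_proba` are all assumptions chosen for illustration.

```python
import numpy as np


class ToyDistanceDetector:
    """Hypothetical detector mirroring the fit / decision_function / predict pattern."""

    def __init__(self, contamination=0.1):
        # Assumed fraction of outliers, used to place the decision threshold.
        self.contamination = contamination

    def _nearest_dist(self, X, exclude_self=False):
        # Distance from each row of X to its nearest training point.
        d = np.sqrt(((X[:, None, :] - self._X_train[None, :, :]) ** 2).sum(-1))
        if exclude_self:
            np.fill_diagonal(d, np.inf)  # a training point is not its own neighbor
        return d.min(axis=1)

    def fit(self, X):
        self._X_train = np.asarray(X, dtype=float)
        # Training outlier scores: the higher, the more abnormal.
        self.decision_scores_ = self._nearest_dist(self._X_train, exclude_self=True)
        # Threshold at the (1 - contamination) percentile of the training scores.
        self.threshold_ = np.percentile(self.decision_scores_, 100 * (1 - self.contamination))
        # Binary training labels: 0 = inlier, 1 = outlier.
        self.labels_ = (self.decision_scores_ > self.threshold_).astype(int)
        return self

    def decision_function(self, X):
        return self._nearest_dist(np.asarray(X, dtype=float))

    def predict(self, X):
        return (self.decision_function(X) > self.threshold_).astype(int)

    def predict_proba(self, X):
        # Min-max scale against the training score range, clipped to [0, 1].
        # Assumes the training scores are not all identical.
        lo, hi = self.decision_scores_.min(), self.decision_scores_.max()
        p_out = np.clip((self.decision_function(X) - lo) / (hi - lo), 0.0, 1.0)
        return np.column_stack([1.0 - p_out, p_out])

    def fit_predict(self, X):
        return self.fit(X).labels_


X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]], dtype=float)
det = ToyDistanceDetector(contamination=0.2).fit(X_train)
print(det.labels_)                                     # training labels
print(det.predict(np.array([[0.2, 0.2], [8.0, 8.0]])))  # labels for new samples
```

As with the APIs listed above, `predict` and `predict_proba` only make sense after `fit`, because they reuse the threshold and score range learned from the training data.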