Welcome to PyOD documentation!
PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. This exciting yet challenging field is commonly referred to as Outlier Detection or Anomaly Detection.
PyOD includes more than 30 detection algorithms, from classical LOF (SIGMOD 2000) to the latest COPOD (ICDM 2020). Since 2017, PyOD [AZNL19] has been successfully used in numerous academic research projects and commercial products [AGSW19][ALCJ+19][AWDL+19][AZNHL19]. It is also well acknowledged by the machine learning community with various dedicated posts/tutorials, including Analytics Vidhya, Towards Data Science, KDnuggets, Computer Vision News, and awesome-machine-learning.
PyOD is featured for:
Unified APIs, detailed documentation, and interactive examples across various algorithms.
Advanced models, including classical ones from scikit-learn, latest deep learning methods, and emerging algorithms like COPOD.
Optimized performance with JIT and parallelization when possible, using numba and joblib.
Compatible with both Python 2 & 3.
Note on Python 2.7: The maintenance of Python 2.7 will be stopped by January 1, 2020 (see official announcement). To be consistent with the Python change and PyOD's dependent libraries, e.g., scikit-learn, we will stop supporting Python 2.7 in the near future (dates are still to be decided). We encourage you to use Python 3.5 or newer for the latest functions and bug fixes. More information can be found at Moving to require Python 3.
API Demo:
# X_train and X_test are assumed to be numeric arrays of shape (n_samples, n_features)
# train the COPOD detector
from pyod.models.copod import COPOD
clf = COPOD()
clf.fit(X_train)
# get outlier scores
y_train_scores = clf.decision_scores_ # raw outlier scores
y_test_scores = clf.decision_function(X_test) # outlier scores
Citing PyOD:
The PyOD paper is published in JMLR (machine learning open-source software track). If you use PyOD in a scientific publication, we would appreciate citations to the following paper:
@article{zhao2019pyod,
author = {Zhao, Yue and Nasrullah, Zain and Li, Zheng},
title = {PyOD: A Python Toolbox for Scalable Outlier Detection},
journal = {Journal of Machine Learning Research},
year = {2019},
volume = {20},
number = {96},
pages = {1-7},
url = {http://jmlr.org/papers/v20/19-011.html}
}
or:
Zhao, Y., Nasrullah, Z. and Li, Z., 2019. PyOD: A Python Toolbox for Scalable Outlier Detection. Journal of Machine Learning Research (JMLR), 20(96), pp.1-7.
Implemented Algorithms
PyOD toolkit consists of three major functional groups:
(i) Individual Detection Algorithms :
Linear Models for Outlier Detection:
| Type | Abbr | Algorithm | Year |
| --- | --- | --- | --- |
| Linear Model | PCA | Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes) | 2003 |
| Linear Model | MCD | Minimum Covariance Determinant (use the Mahalanobis distances as the outlier scores) | 1999 |
| Linear Model | OCSVM | One-Class Support Vector Machines | 2001 |
| Linear Model | LMDD | Deviation-based Outlier Detection (LMDD) | 1996 |
| Proximity-Based | LOF | Local Outlier Factor | 2000 |
| Proximity-Based | COF | Connectivity-Based Outlier Factor | 2002 |
| Proximity-Based | CBLOF | Clustering-Based Local Outlier Factor | 2003 |
| Proximity-Based | LOCI | LOCI: Fast outlier detection using the local correlation integral | 2003 |
| Proximity-Based | HBOS | Histogram-based Outlier Score | 2012 |
| Proximity-Based | kNN | k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score) | 2000 |
| Proximity-Based | AvgKNN | Average kNN (use the average distance to k nearest neighbors as the outlier score) | 2002 |
| Proximity-Based | MedKNN | Median kNN (use the median distance to k nearest neighbors as the outlier score) | 2002 |
| Proximity-Based | SOD | Subspace Outlier Detection | 2009 |
| Probabilistic | ABOD | Angle-Based Outlier Detection | 2008 |
| Probabilistic | FastABOD | Fast Angle-Based Outlier Detection using approximation | 2008 |
| Probabilistic | COPOD | COPOD: Copula-Based Outlier Detection | 2020 |
| Probabilistic | MAD | Median Absolute Deviation (MAD) | 1993 |
| Probabilistic | SOS | Stochastic Outlier Selection | 2012 |
| Outlier Ensembles | IForest | Isolation Forest | 2008 |
| Outlier Ensembles | | Feature Bagging | 2005 |
| Outlier Ensembles | LSCP | LSCP: Locally Selective Combination of Parallel Outlier Ensembles | 2019 |
| Outlier Ensembles | XGBOD | Extreme Boosting Based Outlier Detection (Supervised) | 2018 |
| Outlier Ensembles | LODA | Lightweight Online Detector of Anomalies | 2016 |
| Neural Networks | AutoEncoder | Fully connected AutoEncoder (use reconstruction error as the outlier score) | 2015 |
| Neural Networks | VAE | Variational AutoEncoder (use reconstruction error as the outlier score) | 2013 |
| Neural Networks | Beta-VAE | Variational AutoEncoder (with a customized loss term, set by varying gamma and capacity) | 2018 |
| Neural Networks | SO_GAAL | Single-Objective Generative Adversarial Active Learning | 2019 |
| Neural Networks | MO_GAAL | Multiple-Objective Generative Adversarial Active Learning | 2019 |
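To make the three kNN-style rows above concrete, here is a minimal pure-Python sketch of the scoring rules they describe (distance to the kth neighbor, average distance, median distance). The function name `knn_outlier_scores` and its `method` values are chosen for illustration only; this is a brute-force toy, not PyOD's implementation.

```python
import math
import statistics


def knn_outlier_scores(points, k=2, method="largest"):
    """Score each point by its distances to its k nearest neighbors.

    method="largest": distance to the k-th nearest neighbor (kNN row)
    method="mean":    average distance to the k nearest neighbors (AvgKNN row)
    method="median":  median distance to the k nearest neighbors (MedKNN row)
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        if method == "largest":
            scores.append(nearest[-1])
        elif method == "mean":
            scores.append(statistics.mean(nearest))
        elif method == "median":
            scores.append(statistics.median(nearest))
        else:
            raise ValueError(f"unknown method: {method}")
    return scores


# Four points in a tight cluster plus one far-away point (index 4).
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
kth = knn_outlier_scores(X, k=2, method="largest")
avg = knn_outlier_scores(X, k=2, method="mean")
med = knn_outlier_scores(X, k=2, method="median")
```

Under all three rules the isolated point receives the highest score, which matches the convention used throughout this page: the higher the score, the more abnormal the sample.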
(ii) Outlier Ensembles & Outlier Detector Combination Frameworks:
| Type | Abbr | Algorithm | Year |
| --- | --- | --- | --- |
| Outlier Ensembles | | Feature Bagging | 2005 |
| Outlier Ensembles | LSCP | LSCP: Locally Selective Combination of Parallel Outlier Ensembles | 2019 |
| Outlier Ensembles | XGBOD | Extreme Boosting Based Outlier Detection (Supervised) | 2018 |
| Outlier Ensembles | LODA | Lightweight Online Detector of Anomalies | 2016 |
| Combination | Average | Simple combination by averaging the scores | 2015 |
| Combination | Weighted Average | Simple combination by averaging the scores with detector weights | 2015 |
| Combination | Maximization | Simple combination by taking the maximum scores | 2015 |
| Combination | AOM | Average of Maximum | 2015 |
| Combination | MOA | Maximum of Average | 2015 |
| Combination | Median | Simple combination by taking the median of the scores | 2015 |
| Combination | Majority Vote | Simple combination by taking the majority vote of the labels (weights can be used) | 2015 |
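The score-combination rows above can be sketched in a few lines of pure Python. The sketch assumes a score matrix with one row per sample and one column per detector, scores already on a comparable scale, and (for AOM and MOA) fixed contiguous buckets whose count divides the number of detectors evenly. All function names are illustrative and are not PyOD's API.

```python
import statistics


def average(scores, weights=None):
    """Per-sample mean across detectors; optional detector weights."""
    n_detectors = len(scores[0])
    if weights is None:
        weights = [1.0] * n_detectors
    total = sum(weights)
    return [sum(w * s for w, s in zip(weights, row)) / total for row in scores]


def maximization(scores):
    """Per-sample maximum across detectors."""
    return [max(row) for row in scores]


def median(scores):
    """Per-sample median across detectors."""
    return [statistics.median(row) for row in scores]


def _buckets(row, n_buckets):
    """Split one sample's detector scores into contiguous, equal-size buckets."""
    size = len(row) // n_buckets
    return [row[b * size:(b + 1) * size] for b in range(n_buckets)]


def aom(scores, n_buckets):
    """Average of Maximum: max within each bucket, then mean over buckets."""
    return [statistics.mean([max(b) for b in _buckets(row, n_buckets)])
            for row in scores]


def moa(scores, n_buckets):
    """Maximum of Average: mean within each bucket, then max over buckets."""
    return [max(statistics.mean(b) for b in _buckets(row, n_buckets))
            for row in scores]


# Two samples scored by four hypothetical detectors.
scores = [[1.0, 2.0, 3.0, 4.0],
          [8.0, 4.0, 6.0, 2.0]]
combined_avg = average(scores)
combined_wavg = average(scores, weights=[1, 1, 1, 5])
combined_max = maximization(scores)
combined_med = median(scores)
combined_aom = aom(scores, n_buckets=2)
combined_moa = moa(scores, n_buckets=2)
```

AOM and MOA sit between plain averaging and plain maximization: bucketing limits how much a single extreme detector can dominate the combined score.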
(iii) Utility Functions:
| Type | Function |
| --- | --- |
| Data | Synthesized data generation; normal data is generated by a multivariate Gaussian and outliers are generated by a uniform distribution |
| Data | Synthesized data generation in clusters; more complex data patterns can be created with multiple clusters |
| Stat | Calculate the weighted Pearson correlation of two samples |
| Utility | Turn raw outlier scores into binary labels by assigning 1 to the top n outlier scores |
| Utility | Calculate precision @ rank n |
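The two "Utility" rows can be illustrated with a small pure-Python sketch. The names `label_top_n` and `precision_at_n` are hypothetical stand-ins, and the default of using the number of true outliers as n is an assumption made for this example rather than documented behavior.

```python
def label_top_n(scores, n):
    """Return 0/1 labels, with 1 assigned to the n highest scores."""
    # Indices ordered from highest score to lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(order[:n])
    return [1 if i in top else 0 for i in range(len(scores))]


def precision_at_n(y_true, scores, n=None):
    """Fraction of true outliers among the n highest-scored samples.

    If n is omitted, it defaults to the number of true outliers
    (an assumption made for this sketch).
    """
    if n is None:
        n = sum(y_true)
    y_pred = label_top_n(scores, n)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / n


# Six samples, two of which (indices 2 and 4) are true outliers.
y_true = [0, 0, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.9, 0.8, 0.3, 0.2]
labels = label_top_n(scores, n=2)
p_at_n = precision_at_n(y_true, scores)
```

Here the two highest scores belong to samples 2 and 3, so only one of the two flagged samples is a true outlier and precision @ rank 2 is 0.5.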
A comparison of all implemented models is available (figure, compare_all_models.py, interactive Jupyter Notebooks). For Jupyter Notebooks, please navigate to "/notebooks/Compare All Models.ipynb".
Check the latest benchmark. You can replicate this process by running benchmark.py.
API Cheatsheet & Reference
The following APIs are available for all detector models.
pyod.models.base.BaseDetector.fit(): Fit detector. y is ignored in unsupervised methods.
pyod.models.base.BaseDetector.decision_function(): Predict raw anomaly score of X using the fitted detector.
pyod.models.base.BaseDetector.predict(): Predict if a particular sample is an outlier or not using the fitted detector.
pyod.models.base.BaseDetector.predict_proba(): Predict the probability of a sample being outlier using the fitted detector.
pyod.models.base.BaseDetector.fit_predict(): [Deprecated in V0.6.9] Fit detector first and then predict whether a particular sample is an outlier or not.
pyod.models.base.BaseDetector.fit_predict_score(): [Deprecated in V0.6.9] Fit the detector, predict on samples, and evaluate the model by predefined metrics, e.g., ROC.
Key Attributes of a fitted model:
pyod.models.base.BaseDetector.decision_scores_: The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores.
pyod.models.base.BaseDetector.labels_: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.
Note: fit_predict() and fit_predict_score() are deprecated in V0.6.9 due to a consistency issue and will be removed in V0.8.0. To get the binary labels of the training data X_train, one should call clf.fit(X_train) and use pyod.models.base.BaseDetector.labels_, instead of calling clf.predict(X_train).
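The toy class below mimics the interface listed above (fit, decision_function, predict, decision_scores_, labels_) to show one plausible way that labels_ and predict(X_train) can disagree. It is a hypothetical one-dimensional nearest-neighbor detector written for this illustration, not a PyOD class, and its thresholding rule is an assumption; the actual reason for the deprecation may differ.

```python
class ToyNearestNeighborDetector:
    """Toy 1-D detector with a PyOD-like interface (illustration only)."""

    def __init__(self, contamination=0.2):
        self.contamination = contamination

    def fit(self, X, y=None):
        # y is accepted but ignored, as in unsupervised detectors.
        self.X_train_ = list(X)
        # Training scores: distance to the nearest OTHER training point.
        self.decision_scores_ = [
            min(abs(x - other)
                for j, other in enumerate(self.X_train_) if j != i)
            for i, x in enumerate(self.X_train_)
        ]
        # Flag roughly the top `contamination` fraction of training scores.
        n_outliers = max(1, round(self.contamination * len(self.X_train_)))
        self.threshold_ = sorted(self.decision_scores_)[-n_outliers]
        self.labels_ = [int(s >= self.threshold_)
                        for s in self.decision_scores_]
        return self

    def decision_function(self, X):
        # New samples: distance to the nearest training point.
        return [min(abs(x - t) for t in self.X_train_) for x in X]

    def predict(self, X):
        return [int(s >= self.threshold_) for s in self.decision_function(X)]


X_train = [1.0, 1.1, 0.9, 1.2, 5.0]
clf = ToyNearestNeighborDetector(contamination=0.2).fit(X_train)

train_labels = clf.labels_               # labels computed during fit
repredicted = clf.predict(X_train)       # each point now matches itself
new_prediction = clf.predict([9.0])      # a genuinely new, distant sample
```

In this toy, labels_ flags the isolated training point 5.0, while predict(X_train) flags nothing because every training point is at distance zero from itself. Reading labels_ after fit avoids that mismatch, and predict remains appropriate for unseen data.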
References
[AAgg15] Charu C Aggarwal. Outlier analysis. In Data mining, 75–79. Springer, 2015.
[AAS15] Charu C Aggarwal and Saket Sathe. Theoretical foundations and algorithms for outlier ensembles. ACM SIGKDD Explorations Newsletter, 17(1):24–47, 2015.
[AAP02] Fabrizio Angiulli and Clara Pizzuti. Fast outlier detection in high dimensional spaces. In European Conference on Principles of Data Mining and Knowledge Discovery, 15–27. Springer, 2002.
[AAAR96] Andreas Arning, Rakesh Agrawal, and Prabhakar Raghavan. A linear method for deviation detection in large databases. In KDD, volume 1141, 972–981. 1996.
[ABKNS00] Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. Lof: identifying density-based local outliers. In ACM sigmod record, volume 29, 93–104. ACM, 2000.
[ABHP+18] Christopher P Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. Understanding disentangling in beta-vae. arXiv preprint arXiv:1804.03599, 2018.
[AGD12] Markus Goldstein and Andreas Dengel. Histogram-based outlier score (hbos): a fast unsupervised anomaly detection algorithm. KI-2012: Poster and Demo Track, pages 59–63, 2012.
[AGSW19] Parikshit Gopalan, Vatsal Sharan, and Udi Wieder. Pidforest: anomaly detection via partial identification. In Advances in Neural Information Processing Systems, 15783–15793. 2019.
[AHR04] Johanna Hardin and David M Rocke. Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. Computational Statistics & Data Analysis, 44(4):625–638, 2004.
[AHXD03] Zengyou He, Xiaofei Xu, and Shengchun Deng. Discovering cluster-based local outliers. Pattern Recognition Letters, 24(9-10):1641–1650, 2003.
[AIH93] Boris Iglewicz and David Caster Hoaglin. How to detect and handle outliers. Volume 16. Asq Press, 1993.
[AJHuszarPvdH12] JHM Janssens, Ferenc Huszár, EO Postma, and HJ van den Herik. Stochastic outlier selection. Technical Report TiCC TR 2012-001, Tilburg University, Tilburg Center for Cognition and Communication, Tilburg, The Netherlands, 2012.
[AKW13] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
[AKZ+08] Hans-Peter Kriegel, Arthur Zimek, and others. Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 444–452. ACM, 2008.
[ALK05] Aleksandar Lazarevic and Vipin Kumar. Feature bagging for outlier detection. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, 157–166. ACM, 2005.
[ALCJ+19] Dan Li, Dacheng Chen, Baihong Jin, Lei Shi, Jonathan Goh, and See-Kiong Ng. Mad-gan: multivariate anomaly detection for time series data with generative adversarial networks. In International Conference on Artificial Neural Networks, 703–716. Springer, 2019.
[ALZB+20] Zheng Li, Yue Zhao, Nicola Botta, Cezar Ionescu, and Xiyang Hu. COPOD: copula-based outlier detection. In IEEE International Conference on Data Mining (ICDM). IEEE, 2020.
[ALTZ08] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on, 413–422. IEEE, 2008.
[ALTZ12] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 6(1):3, 2012.
[ALLZ+19] Yezheng Liu, Zhe Li, Chong Zhou, Yuanchun Jiang, Jianshan Sun, Meng Wang, and Xiangnan He. Generative adversarial active learning for unsupervised outlier detection. IEEE Transactions on Knowledge and Data Engineering, 2019.
[APKGF03] Spiros Papadimitriou, Hiroyuki Kitagawa, Phillip B Gibbons, and Christos Faloutsos. Loci: fast outlier detection using the local correlation integral. In Data Engineering, 2003. Proceedings. 19th International Conference on, 315–326. IEEE, 2003.
[APevny16] Tomáš Pevný. Loda: lightweight online detector of anomalies. Machine Learning, 102(2):275–304, 2016.
[ARRS00] Sridhar Ramaswamy, Rajeev Rastogi, and Kyuseok Shim. Efficient algorithms for mining outliers from large data sets. In ACM Sigmod Record, volume 29, 427–438. ACM, 2000.
[ARD99] Peter J Rousseeuw and Katrien Van Driessen. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3):212–223, 1999.
[AScholkopfPST+01] Bernhard Schölkopf, John C Platt, John Shawe-Taylor, Alex J Smola, and Robert C Williamson. Estimating the support of a high-dimensional distribution. Neural computation, 13(7):1443–1471, 2001.
[ASCSC03] Mei-Ling Shyu, Shu-Ching Chen, Kanoksri Sarinnapakorn, and LiWu Chang. A novel anomaly detection scheme based on principal component classifier. Technical Report, MIAMI UNIV CORAL GABLES FL DEPT OF ELECTRICAL AND COMPUTER ENGINEERING, 2003.
[ATCFC02] Jian Tang, Zhixiang Chen, Ada Wai-Chee Fu, and David W Cheung. Enhancing effectiveness of outlier detections for low density patterns. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, 535–548. Springer, 2002.
[AWDL+19] Xuhong Wang, Ying Du, Shijie Lin, Ping Cui, Yuntian Shen, and Yupu Yang. Advae: a self-adversarial variational autoencoder with gaussian anomaly prior knowledge for anomaly detection. Knowledge-Based Systems, 2019.
[AZH18] Yue Zhao and Maciej K Hryniewicki. Xgbod: improving supervised outlier detection with unsupervised representation learning. In International Joint Conference on Neural Networks (IJCNN). IEEE, 2018.
[AZNHL19] Yue Zhao, Zain Nasrullah, Maciej K Hryniewicki, and Zheng Li. LSCP: locally selective combination in parallel outlier ensembles. In Proceedings of the 2019 SIAM International Conference on Data Mining, SDM 2019, 585–593. Calgary, Canada, May 2019. SIAM. doi:10.1137/1.9781611975673.66.
[AZNL19] Yue Zhao, Zain Nasrullah, and Zheng Li. PyOD: a python toolbox for scalable outlier detection. Journal of Machine Learning Research, 20(96):1–7, 2019.