Welcome to PyOD documentation!

PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. This exciting yet challenging field is commonly referred to as Outlier Detection or Anomaly Detection. Since 2017, PyOD has been successfully used in various academic research projects and commercial products [AZH18a][AZH18b][AZHNL19]. PyOD features:

  • Unified APIs, detailed documentation, and interactive examples across various algorithms.
  • Advanced models, including Neural Networks/Deep Learning and Outlier Ensembles.
  • Optimized performance with JIT and parallelization when possible, using numba and joblib.
  • Compatible with both Python 2 & 3 (scikit-learn compatible as well).

Important Notes: PyOD contains neural network based models, e.g., AutoEncoders, which are implemented in Keras. However, PyOD does NOT install Keras and/or TensorFlow automatically, which reduces the risk of interfering with your existing local installations. If you want to use neural net based models, install Keras and a back-end library such as TensorFlow manually. Instructions are provided in the neural-net FAQ. Similarly, some models, e.g., XGBOD, depend on xgboost, which is NOT installed by default.
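Because these packages are optional, it can help to check for them before constructing a model that needs them. The following is a minimal sketch using only the Python standard library; the helper name `find_missing` and the package tuple are illustrative assumptions and are not part of PyOD.

```python
import importlib.util

# Optional packages named in the note above; PyOD does not install them.
OPTIONAL_PACKAGES = ("keras", "tensorflow", "xgboost")


def find_missing(packages=OPTIONAL_PACKAGES):
    """Return the names in `packages` that cannot be located for import.

    Hypothetical helper for illustration; it is not a PyOD function.
    """
    missing = []
    for name in packages:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Install manually before using the dependent models:",
              ", ".join(missing))
    else:
        print("All optional back-end packages are available.")
```

The check only locates the packages and does not import them, so it stays cheap even when TensorFlow is present.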

Key Links and Resources:


Quick Introduction

The PyOD toolkit consists of three major groups of functionality:

(i) Individual Detection Algorithms:

| Type | Abbr | Algorithm | Year | Class | Ref |
|---|---|---|---|---|---|
| Linear Model | PCA | Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes) | 2003 | pyod.models.pca.PCA | [ASCSC03] |
| Linear Model | MCD | Minimum Covariance Determinant (use the Mahalanobis distances as the outlier scores) | 1999 | pyod.models.mcd.MCD | [ARD99][AHR04] |
| Linear Model | OCSVM | One-Class Support Vector Machines | 2003 | pyod.models.ocsvm.OCSVM | [AMP03] |
| Proximity-Based | LOF | Local Outlier Factor | 2000 | pyod.models.lof.LOF | [ABKNS00] |
| Proximity-Based | CBLOF | Clustering-Based Local Outlier Factor | 2003 | pyod.models.cblof.CBLOF | [AHXD03] |
| Proximity-Based | LOCI | LOCI: Fast outlier detection using the local correlation integral | 2003 | pyod.models.loci.LOCI | [APKGF03] |
| Proximity-Based | HBOS | Histogram-based Outlier Score | 2012 | pyod.models.hbos.HBOS | [AGD12] |
| Proximity-Based | kNN | k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score) | 2000 | pyod.models.knn.KNN | [ARRS00][AAP02] |
| Proximity-Based | AvgKNN | Average kNN (use the average distance to k nearest neighbors as the outlier score) | 2002 | pyod.models.knn.KNN | [ARRS00][AAP02] |
| Proximity-Based | MedKNN | Median kNN (use the median distance to k nearest neighbors as the outlier score) | 2002 | pyod.models.knn.KNN | [ARRS00][AAP02] |
| Probabilistic | ABOD | Angle-Based Outlier Detection | 2008 | pyod.models.abod.ABOD | [AKZ+08] |
| Probabilistic | FastABOD | Fast Angle-Based Outlier Detection using approximation | 2008 | pyod.models.abod.ABOD | [AKZ+08] |
| Probabilistic | SOS | Stochastic Outlier Selection | 2012 | pyod.models.sos.SOS | [AJHuszarPvdH12] |
| Outlier Ensembles | IForest | Isolation Forest | 2008 | pyod.models.iforest.IForest | [ALTZ08][ALTZ12] |
| Outlier Ensembles | | Feature Bagging | 2005 | pyod.models.feature_bagging.FeatureBagging | [ALK05] |
| Outlier Ensembles | LSCP | LSCP: Locally Selective Combination of Parallel Outlier Ensembles | 2019 | pyod.models.lscp.LSCP | [AZHNL19] |
| Outlier Ensembles | XGBOD | Extreme Boosting Based Outlier Detection (Supervised) | 2018 | pyod.models.xgbod.XGBOD | [AZH18a] |
| Neural Networks | AutoEncoder | Fully connected AutoEncoder (use reconstruction error as the outlier score) | 2015 | pyod.models.auto_encoder.AutoEncoder | [AAgg15] |
| Neural Networks | SO_GAAL | Single-Objective Generative Adversarial Active Learning | 2019 | pyod.models.so_gaal.SO_GAAL | [ALLZ+18] |
| Neural Networks | MO_GAAL | Multiple-Objective Generative Adversarial Active Learning | 2019 | pyod.models.mo_gaal.MO_GAAL | [ALLZ+18] |
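To make the three kNN variants listed above concrete, the sketch below computes the kth-neighbor, average, and median distance scores with a brute-force search in plain Python. It illustrates the scoring rule only and is not PyOD's implementation; the function name `knn_outlier_scores` and its `method` argument are assumptions of this sketch.

```python
import math
import statistics


def knn_outlier_scores(points, k=2, method="largest"):
    """Score each point by its distances to its k nearest neighbors.

    method: "largest" (distance to the kth neighbor), "mean", or "median".
    Illustrative helper only; not part of PyOD.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    reducers = {
        "largest": max,
        "mean": statistics.mean,
        "median": statistics.median,
    }
    reduce_fn = reducers[method]
    scores = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point, sorted ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Keep the k smallest distances and reduce them to one score.
        scores.append(reduce_fn(dists[:k]))
    return scores


# Four clustered points plus one far-away point.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
for method in ("largest", "mean", "median"):
    scores = knn_outlier_scores(data, k=2, method=method)
    print(method, [round(s, 3) for s in scores])
```

Under all three variants the isolated point receives the highest score, because every one of its neighbors is far away. The brute-force loop is quadratic in the number of points, so it suits small illustrations only.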

(ii) Outlier Ensembles & Outlier Detector Combination Frameworks:

| Type | Abbr | Algorithm | Year | Class/Function | Ref |
|---|---|---|---|---|---|
| Outlier Ensembles | | Feature Bagging | 2005 | pyod.models.feature_bagging.FeatureBagging | [ALK05] |
| Outlier Ensembles | LSCP | LSCP: Locally Selective Combination of Parallel Outlier Ensembles | 2019 | pyod.models.lscp.LSCP | [AZHNL19] |
| Combination | Average | Simple combination by averaging the scores | 2015 | pyod.models.combination.average() | [AAS15] |
| Combination | Weighted Average | Simple combination by averaging the scores with detector weights | 2015 | pyod.models.combination.average() | [AAS15] |
| Combination | Maximization | Simple combination by taking the maximum scores | 2015 | pyod.models.combination.maximization() | [AAS15] |
| Combination | AOM | Average of Maximum | 2015 | pyod.models.combination.aom() | [AAS15] |
| Combination | MOA | Maximization of Average | 2015 | pyod.models.combination.moa() | [AAS15] |
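The combination rules above can be sketched in a few lines of plain Python. This is a simplified illustration, not PyOD's code: it assumes the scores from different detectors are already on a comparable scale, and it splits detectors into fixed contiguous buckets, whereas a library implementation may assign detectors to buckets differently. All function names below belong to this sketch.

```python
def average(score_rows, weights=None):
    """Average the detector scores of each sample, optionally weighted.

    score_rows[i][j] is the score detector j assigned to sample i.
    """
    combined = []
    for row in score_rows:
        w = weights if weights is not None else [1.0] * len(row)
        combined.append(sum(s * wi for s, wi in zip(row, w)) / sum(w))
    return combined


def maximization(score_rows):
    """Take the maximum detector score of each sample."""
    return [max(row) for row in score_rows]


def _buckets(row, n_buckets):
    """Split one sample's scores into contiguous groups of detectors."""
    if not 1 <= n_buckets <= len(row):
        raise ValueError("n_buckets must be between 1 and the detector count")
    size = len(row) // n_buckets
    groups = [row[b * size:(b + 1) * size] for b in range(n_buckets - 1)]
    groups.append(row[(n_buckets - 1) * size:])  # last bucket takes the rest
    return groups


def aom(score_rows, n_buckets):
    """Average of Maximum: max within each bucket, then mean across buckets."""
    return [sum(max(g) for g in _buckets(row, n_buckets)) / n_buckets
            for row in score_rows]


def moa(score_rows, n_buckets):
    """Maximization of Average: mean within each bucket, then max across."""
    return [max(sum(g) / len(g) for g in _buckets(row, n_buckets))
            for row in score_rows]


# Two samples scored by four detectors.
rows = [[0.1, 0.2, 0.3, 0.4],
        [0.9, 0.1, 0.5, 0.7]]
print("average     ", average(rows))
print("maximization", maximization(rows))
print("aom         ", aom(rows, n_buckets=2))
print("moa         ", moa(rows, n_buckets=2))
```

AOM and MOA sit between plain averaging and plain maximization: bucketing limits how much a single extreme detector can dominate the combined score while still letting strong signals through.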

(iii) Utility Functions:

| Type | Name | Function |
|---|---|---|
| Data | pyod.utils.data.generate_data() | Synthesized data generation; normal data is generated by a multivariate Gaussian and outliers are generated by a uniform distribution |
| Stat | pyod.utils.stat_models.wpearsonr() | Calculate the weighted Pearson correlation of two samples |
| Utility | pyod.utils.utility.get_label_n() | Turn raw outlier scores into binary labels by assigning 1 to the top n outlier scores |
| Utility | pyod.utils.utility.precision_n_scores() | Calculate precision @ rank n |
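As a rough illustration of what the label, precision, and correlation utilities compute, here is a standard-library sketch. The function names are hypothetical stand-ins, and details such as tie handling or the default choice of n are assumptions of this sketch and may differ from the PyOD functions listed above.

```python
import math


def label_top_n(scores, n):
    """Return 0/1 labels with 1 assigned to the n highest scores."""
    if not 0 <= n <= len(scores):
        raise ValueError("n must be between 0 and len(scores)")
    # Rank sample indices by score, highest first (ties keep input order).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:n])
    return [1 if i in top else 0 for i in range(len(scores))]


def precision_at_n(y_true, scores, n=None):
    """Fraction of the n top-scored samples that are true outliers.

    Assumption of this sketch: when n is None, n is the number of
    true outliers in y_true.
    """
    if n is None:
        n = sum(y_true)
    if n <= 0:
        raise ValueError("n must be positive")
    y_pred = label_top_n(scores, n)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / n


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with per-sample weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


y_true = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.9, 0.2, 0.3]
print(label_top_n(scores, 2))
print(precision_at_n(y_true, scores))
print(weighted_pearson([1, 2, 3], [2, 4, 6], [1, 1, 1]))
```

Precision @ rank n is useful when the number of outliers is known or budgeted in advance, since it evaluates only the n samples a reviewer would actually inspect.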

A comparison of all implemented models is available below (Code, Jupyter Notebooks):

For Jupyter Notebooks, please navigate to “/notebooks/Compare All Models.ipynb”

Comparison of all implemented models

Key APIs & Attributes

The following APIs apply to all detector models, which keeps usage consistent across algorithms.

Key Attributes of a fitted model: