Welcome to PyOD documentation!


PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. This exciting yet challenging field is commonly referred to as Outlier Detection or Anomaly Detection. Since 2017, PyOD [AZNL19] has been successfully used in various academic research projects and commercial products [AZH18a][AZH18b][AZNHL19]. PyOD features:

  • Unified APIs, detailed documentation, and interactive examples across various algorithms.
  • Advanced models, including Neural Networks/Deep Learning and Outlier Ensembles.
  • Optimized performance with JIT and parallelization when possible, using numba and joblib.
  • Compatible with both Python 2 & 3 (scikit-learn compatible as well).

Important Notes: PyOD contains neural-network-based models, e.g., AutoEncoders, which are implemented in Keras. However, PyOD does NOT install Keras and/or TensorFlow automatically. This reduces the risk of interfering with your existing installations. If you want to use neural-network-based models, you should install Keras and a back-end library such as TensorFlow manually. Instructions are provided in the neural-net FAQ. Similarly, some models, e.g., XGBOD, depend on xgboost, which is NOT installed by default.
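Because these back ends are optional, it can help to check for them before using a model that needs one. The following is a minimal sketch using only the Python standard library; the helper names `is_installed` and `missing_backends` are hypothetical and are not part of PyOD.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Optional packages named in the note above; PyOD does not install them itself.
OPTIONAL_BACKENDS = ("keras", "tensorflow", "xgboost")


def missing_backends(names=OPTIONAL_BACKENDS):
    """List the optional packages that cannot be found in this environment."""
    return [name for name in names if not is_installed(name)]


if __name__ == "__main__":
    missing = missing_backends()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All optional back ends are available.")
```

Running the script prints the optional packages that still need to be installed manually, if any.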

Key Links and Resources:


Quick Introduction

The PyOD toolkit consists of three major groups of functionalities:

(i) Individual Detection Algorithms:

| Type | Abbr | Algorithm | Year | Class | Ref |
|---|---|---|---|---|---|
| Linear Model | PCA | Principal Component Analysis (the sum of weighted projected distances to the eigenvector hyperplanes) | 2003 | pyod.models.pca.PCA | [ASCSC03] |
| Linear Model | MCD | Minimum Covariance Determinant (use the Mahalanobis distances as the outlier scores) | 1999 | pyod.models.mcd.MCD | [ARD99][AHR04] |
| Linear Model | OCSVM | One-Class Support Vector Machines | 2003 | pyod.models.ocsvm.OCSVM | [AMP03] |
| Proximity-Based | LOF | Local Outlier Factor | 2000 | pyod.models.lof.LOF | [ABKNS00] |
| Proximity-Based | CBLOF | Clustering-Based Local Outlier Factor | 2003 | pyod.models.cblof.CBLOF | [AHXD03] |
| Proximity-Based | LOCI | LOCI: Fast outlier detection using the local correlation integral | 2003 | pyod.models.loci.LOCI | [APKGF03] |
| Proximity-Based | HBOS | Histogram-based Outlier Score | 2012 | pyod.models.hbos.HBOS | [AGD12] |
| Proximity-Based | kNN | k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score) | 2000 | pyod.models.knn.KNN | [ARRS00][AAP02] |
| Proximity-Based | AvgKNN | Average kNN (use the average distance to k nearest neighbors as the outlier score) | 2002 | pyod.models.knn.KNN | [ARRS00][AAP02] |
| Proximity-Based | MedKNN | Median kNN (use the median distance to k nearest neighbors as the outlier score) | 2002 | pyod.models.knn.KNN | [ARRS00][AAP02] |
| Probabilistic | ABOD | Angle-Based Outlier Detection | 2008 | pyod.models.abod.ABOD | [AKZ+08] |
| Probabilistic | FastABOD | Fast Angle-Based Outlier Detection using approximation | 2008 | pyod.models.abod.ABOD | [AKZ+08] |
| Probabilistic | SOS | Stochastic Outlier Selection | 2012 | pyod.models.sos.SOS | [AJHuszarPvdH12] |
| Outlier Ensembles | IForest | Isolation Forest | 2008 | pyod.models.iforest.IForest | [ALTZ08][ALTZ12] |
| Outlier Ensembles | | Feature Bagging | 2005 | pyod.models.feature_bagging.FeatureBagging | [ALK05] |
| Outlier Ensembles | LSCP | LSCP: Locally Selective Combination of Parallel Outlier Ensembles | 2019 | pyod.models.lscp.LSCP | [AZNHL19] |
| Outlier Ensembles | XGBOD | Extreme Boosting Based Outlier Detection (Supervised) | 2018 | pyod.models.xgbod.XGBOD | [AZH18a] |
| Neural Networks | AutoEncoder | Fully connected AutoEncoder (use reconstruction error as the outlier score) | 2015 | pyod.models.auto_encoder.AutoEncoder | [AAgg15] |
| Neural Networks | SO_GAAL | Single-Objective Generative Adversarial Active Learning | 2019 | pyod.models.so_gaal.SO_GAAL | [ALLZ+18] |
| Neural Networks | MO_GAAL | Multiple-Objective Generative Adversarial Active Learning | 2019 | pyod.models.mo_gaal.MO_GAAL | [ALLZ+18] |
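The three kNN variants listed above differ only in how they summarize the distances to the k nearest neighbors. The following is a minimal, standard-library sketch of that idea. It is not PyOD's implementation: the function `knn_outlier_scores` and its `method` labels are assumptions made for this illustration, and it uses a brute-force search with Euclidean distance.

```python
import math
import statistics


def knn_outlier_scores(points, k=3, method="largest"):
    """Score each point by its distances to its k nearest neighbors.

    method="largest": distance to the kth nearest neighbor (kNN)
    method="mean":    average distance to the k nearest neighbors (AvgKNN)
    method="median":  median distance to the k nearest neighbors (MedKNN)
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        if method == "largest":
            scores.append(nearest[-1])
        elif method == "mean":
            scores.append(statistics.fmean(nearest))
        elif method == "median":
            scores.append(statistics.median(nearest))
        else:
            raise ValueError(f"unknown method: {method!r}")
    return scores


# Four points on a unit square plus one isolated point.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(data, k=2, method="largest")
# Index of the point with the highest outlier score.
most_outlying = max(range(len(data)), key=scores.__getitem__)
```

In this toy data set the isolated point at (10, 10) receives the highest score under all three methods, because every one of its neighbors is far away.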

(ii) Outlier Ensembles & Outlier Detector Combination Frameworks:

| Type | Abbr | Algorithm | Year | Class / Function | Ref |
|---|---|---|---|---|---|
| Outlier Ensembles | | Feature Bagging | 2005 | pyod.models.feature_bagging.FeatureBagging | [ALK05] |
| Outlier Ensembles | LSCP | LSCP: Locally Selective Combination of Parallel Outlier Ensembles | 2019 | pyod.models.lscp.LSCP | [AZNHL19] |
| Combination | Average | Simple combination by averaging the scores | 2015 | pyod.models.combination.average() | [AAS15] |
| Combination | Weighted Average | Simple combination by averaging the scores with detector weights | 2015 | pyod.models.combination.average() | [AAS15] |
| Combination | Maximization | Simple combination by taking the maximum scores | 2015 | pyod.models.combination.maximization() | [AAS15] |
| Combination | AOM | Average of Maximum | 2015 | pyod.models.combination.aom() | [AAS15] |
| Combination | MOA | Maximization of Average | 2015 | pyod.models.combination.moa() | [AAS15] |
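The combination rules above can be sketched in a few lines of plain Python. This is an illustration of the ideas, not the code in pyod.models.combination: the functions below are stand-ins written for this example, the buckets are fixed contiguous groups of detectors (an assumption made here for determinism), and the scores are assumed to be on a comparable scale already, for example after standardization.

```python
from statistics import fmean


def average(scores, weights=None):
    """Per-sample average over detectors, optionally weighted.

    scores[i][j] is the score of sample i from detector j.
    """
    if weights is None:
        return [fmean(row) for row in scores]
    total = sum(weights)
    return [sum(s * w for s, w in zip(row, weights)) / total for row in scores]


def maximization(scores):
    """Per-sample maximum over detectors."""
    return [max(row) for row in scores]


def _buckets(row, n_buckets):
    """Split one sample's detector scores into equal contiguous buckets."""
    if n_buckets < 1 or len(row) % n_buckets != 0:
        raise ValueError("number of detectors must be divisible by n_buckets")
    size = len(row) // n_buckets
    return [row[b * size:(b + 1) * size] for b in range(n_buckets)]


def aom(scores, n_buckets):
    """Average of Maximum: max within each bucket, then average the maxima."""
    return [fmean(max(b) for b in _buckets(row, n_buckets)) for row in scores]


def moa(scores, n_buckets):
    """Maximization of Average: average within each bucket, then take the max."""
    return [max(fmean(b) for b in _buckets(row, n_buckets)) for row in scores]


# Two samples scored by four detectors.
example = [[1.0, 3.0, 2.0, 4.0],
           [8.0, 4.0, 6.0, 2.0]]
combined_avg = average(example)          # [2.5, 5.0]
combined_max = maximization(example)     # [4.0, 8.0]
combined_aom = aom(example, n_buckets=2) # [3.5, 7.0]
combined_moa = moa(example, n_buckets=2) # [3.0, 6.0]
```

AOM and MOA sit between plain averaging and plain maximization: both results fall between the average and the maximum for each sample in this example.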

(iii) Utility Functions:

| Type | Name | Function |
|---|---|---|
| Data | pyod.utils.data.generate_data() | Synthesized data generation; normal data is generated by a multivariate Gaussian and outliers are generated by a uniform distribution |
| Stat | pyod.utils.stat_models.wpearsonr() | Calculate the weighted Pearson correlation of two samples |
| Utility | pyod.utils.utility.get_label_n() | Turn raw outlier scores into binary labels by assigning 1 to the top n outlier scores |
| Utility | pyod.utils.utility.precision_n_scores() | Calculate precision @ rank n |
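To make the last two utilities concrete, the following standard-library sketch shows what "top n labels" and "precision @ rank n" mean. The functions `top_n_labels` and `precision_at_n` are hypothetical names used for illustration; they are not PyOD's functions, and details such as tie handling may differ from the library.

```python
def top_n_labels(scores, n):
    """Assign label 1 to the n highest scores and 0 to all others."""
    if not 0 <= n <= len(scores):
        raise ValueError("n must be between 0 and len(scores)")
    # Indices ordered by descending score; tied scores keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n])
    return [1 if i in flagged else 0 for i in range(len(scores))]


def precision_at_n(y_true, scores, n=None):
    """Fraction of the n highest-scored samples that are true outliers.

    If n is None, it defaults to the number of true outliers in y_true.
    """
    if n is None:
        n = sum(y_true)
    if n == 0:
        raise ValueError("n must be positive")
    predicted = top_n_labels(scores, n)
    hits = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    return hits / n


y_true = [0, 0, 1, 0, 1]              # ground truth: 1 marks an outlier
raw_scores = [0.1, 0.9, 0.8, 0.2, 0.3]  # higher means more outlying
labels = top_n_labels(raw_scores, n=2)       # [0, 1, 1, 0, 0]
p_at_n = precision_at_n(y_true, raw_scores)  # 0.5
```

Here two samples are true outliers, so n defaults to 2. Only one of the two highest-scored samples is a true outlier, giving a precision of 0.5.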

A comparison of all implemented models is available below (Code, Jupyter Notebooks):

For Jupyter Notebooks, please navigate to “/notebooks/Compare All Models.ipynb”

[Figure: Comparison of all implemented models]

Key APIs & Attributes

The following APIs apply to all detector models for ease of use.

Key Attributes of a fitted model: