Welcome to PyOD documentation!

PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. This exciting yet challenging field is commonly referred to as Outlier Detection or Anomaly Detection. Since 2017, PyOD has been used successfully in various academic research projects [AZH18a][AZH18b] and commercial products. Key features of PyOD include:

  • Unified APIs, detailed documentation, and interactive examples across various algorithms.
  • Advanced models, including Neural Networks/Deep Learning and Outlier Ensembles.
  • Optimized performance through JIT compilation (using numba) and parallelization where possible.
  • Compatible with both Python 2 & 3 (scikit-learn compatible as well).

Important Notes: PyOD contains some neural network based models, e.g., AutoEncoders, which are implemented in keras. However, PyOD does NOT install keras and/or tensorflow automatically. This reduces the risk of damaging your local installations. If you want to use these models, you should install keras and a back-end library such as tensorflow yourself. They are fairly easy to install, and instructions are provided here.

Important Functionalities

The PyOD toolkit consists of three major groups of functionalities: (i) outlier detection algorithms; (ii) outlier ensemble frameworks; and (iii) outlier detection utility functions.

Individual Detection Algorithms:

  1. Linear Models for Outlier Detection:
     1. PCA: Principal Component Analysis (uses the sum of weighted projected distances to the eigenvector hyperplane as the outlier score) [ASCSC03]: pyod.models.pca.PCA
     2. MCD: Minimum Covariance Determinant (uses the Mahalanobis distances as the outlier scores) [ARD99][AHR04]: pyod.models.mcd.MCD
     3. OCSVM: One-Class Support Vector Machines [AMP03]: pyod.models.ocsvm.OCSVM
  2. Proximity-Based Outlier Detection Models:
     1. LOF: Local Outlier Factor [ABKNS00]: pyod.models.lof.LOF
     2. CBLOF: Clustering-Based Local Outlier Factor [AHXD03]: pyod.models.cblof.CBLOF
     3. LOCI: Local Correlation Integral [APKGF03]: pyod.models.loci.LOCI
     4. kNN: k Nearest Neighbors (uses the distance to the kth nearest neighbor as the outlier score) [ARRS00][AAP02]: pyod.models.knn.KNN
     5. Average kNN (uses the average distance to the k nearest neighbors as the outlier score): pyod.models.knn.KNN
     6. Median kNN (uses the median distance to the k nearest neighbors as the outlier score): pyod.models.knn.KNN
     7. HBOS: Histogram-based Outlier Score [AGD12]: pyod.models.hbos.HBOS
  3. Probabilistic Models for Outlier Detection:
     1. ABOD: Angle-Based Outlier Detection [AKZ+08]: pyod.models.abod.ABOD
     2. FastABOD: Fast Angle-Based Outlier Detection using approximation [AKZ+08]: pyod.models.abod.ABOD
     3. SOS: Stochastic Outlier Selection [AJHuszarPvdH12]: pyod.models.sos.SOS
  4. Outlier Ensembles and Combination Frameworks
  5. Neural Networks and Deep Learning Models (implemented in Keras):
     1. AutoEncoder with Fully Connected NN [AAgg15]: pyod.models.auto_encoder.AutoEncoder

FAQ regarding AutoEncoder in PyOD and debugging advice: known issues
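
The three kNN entries in the list above differ only in how the distances to the k nearest neighbors are reduced to a single score. The following is a minimal pure-Python sketch of that idea, not PyOD's implementation: the function name `knn_outlier_scores` and the `method` values are hypothetical names chosen for this illustration, and the brute-force Euclidean distance search is an assumption made to keep the example self-contained.

```python
import math
from statistics import mean, median


def knn_outlier_scores(points, k=2, method="largest"):
    """Score each point by its distances to its k nearest neighbors.

    method (hypothetical names for this sketch):
      "largest" -> distance to the kth nearest neighbor
      "mean"    -> average distance to the k nearest neighbors
      "median"  -> median distance to the k nearest neighbors
    """
    reducers = {"largest": max, "mean": mean, "median": median}
    reduce_fn = reducers[method]
    scores = []
    for i, p in enumerate(points):
        # Brute-force Euclidean distances from p to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(reduce_fn(dists[:k]))
    return scores


# Four points on a unit square plus one point far away from them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
for method in ("largest", "mean", "median"):
    scores = knn_outlier_scores(points, k=2, method=method)
    print(method, [round(s, 3) for s in scores])
```

With all three reductions the distant point (5.0, 5.0) receives the highest score, which matches the convention that higher scores indicate more abnormal samples.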

Outlier Detector/Scores Combination Frameworks:

  1. Feature Bagging: builds various detectors on randomly selected features [ALK05]: pyod.models.feature_bagging.FeatureBagging
  2. Average & Weighted Average: simply combine scores by averaging [AAS15]: pyod.models.combination.average()
  3. Maximization: simply combine scores by taking the maximum across all base detectors [AAS15]: pyod.models.combination.maximization()
  4. Average of Maximum (AOM) [AAS15]: pyod.models.combination.aom()
  5. Maximum of Average (MOA) [AAS15]: pyod.models.combination.moa()
  6. Threshold Sum (Thresh) [AAS15]
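
As a rough illustration of the score combination methods listed above, the sketch below implements averaging, weighted averaging, maximization, AOM, and MOA in plain Python. It is not PyOD's code: the function names are hypothetical, the scores are assumed to be already standardized so that detectors are comparable, and the buckets are assumed to be consecutive, near-equal groups of detectors (other bucketing strategies are possible).

```python
from statistics import mean


def combine_average(scores, weights=None):
    """scores: one row per sample, one column per base detector."""
    if weights is None:
        return [mean(row) for row in scores]
    total = sum(weights)
    return [sum(s * w for s, w in zip(row, weights)) / total for row in scores]


def combine_max(scores):
    """Maximization: take the highest score across all base detectors."""
    return [max(row) for row in scores]


def _buckets(row, n_buckets):
    # Split one row of detector scores into consecutive, near-equal groups.
    if not 1 <= n_buckets <= len(row):
        raise ValueError("n_buckets must be between 1 and the number of detectors")
    size, extra = divmod(len(row), n_buckets)
    groups, start = [], 0
    for b in range(n_buckets):
        end = start + size + (1 if b < extra else 0)
        groups.append(row[start:end])
        start = end
    return groups


def combine_aom(scores, n_buckets):
    """Average of Maximum: max within each bucket, then average the maxima."""
    return [mean([max(g) for g in _buckets(row, n_buckets)]) for row in scores]


def combine_moa(scores, n_buckets):
    """Maximum of Average: average within each bucket, then take the max."""
    return [max(mean(g) for g in _buckets(row, n_buckets)) for row in scores]


# Two samples scored by four hypothetical base detectors.
scores = [
    [0.0, 0.5, -0.5, 1.0],  # likely inlier
    [3.0, 0.5, 2.5, 1.0],   # likely outlier
]
print(combine_average(scores))                # [0.25, 1.75]
print(combine_average(scores, [3, 1, 0, 0]))  # [0.125, 2.375]
print(combine_max(scores))                    # [1.0, 3.0]
print(combine_aom(scores, n_buckets=2))       # [0.75, 2.75]
print(combine_moa(scores, n_buckets=2))       # [0.25, 1.75]
```

Averaging dampens the influence of any single detector, while maximization lets the most alarmed detector decide; AOM and MOA sit between the two.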

A comparison of all implemented models is available below (Code, Jupyter Notebooks):

For Jupyter Notebooks, please navigate to “/notebooks/Compare All Models.ipynb”.

[Figure: Comparison of all implemented models]

Key APIs & Attributes

The following APIs apply to all detector models, which makes the models easy to use interchangeably.

Key Attributes of a fitted model:

  • pyod.models.base.BaseDetector.decision_scores_: The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores.
  • pyod.models.base.BaseDetector.labels_: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.
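
To make the relationship between the two attributes concrete, the sketch below derives binary labels from a list of outlier scores by flagging the highest-scoring fraction of samples. This is an illustration only: the helper `labels_from_scores` and its `contamination` parameter (the assumed proportion of outliers) are hypothetical names introduced here, and the exact thresholding rule a given detector applies may differ.

```python
def labels_from_scores(scores, contamination=0.1):
    """Return 1 for the top `contamination` fraction of scores, else 0.

    Assumption for this sketch: higher score means more abnormal, and at
    least one sample is always flagged. Ties at the threshold are all
    flagged, so slightly more samples than requested may be labeled 1.
    """
    n_outliers = max(1, round(len(scores) * contamination))
    threshold = sorted(scores, reverse=True)[n_outliers - 1]
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical outlier scores for ten training samples.
decision_scores = [0.2, 0.1, 0.3, 2.5, 0.15, 0.25, 0.05, 0.22, 0.18, 3.1]
labels = labels_from_scores(decision_scores, contamination=0.2)
print(labels)  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
```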