# Welcome to PyOD documentation!


PyOD is a comprehensive and scalable **Python toolkit** for **detecting outlying objects** in
multivariate data. This exciting yet challenging field is commonly referred to as
Outlier Detection
or Anomaly Detection.
Since 2017, PyOD has been successfully used in various academic research projects [AZH18a][AZH18b] and commercial products.
PyOD is featured for:

- **Unified APIs, detailed documentation, and interactive examples** across various algorithms.
- **Advanced models**, including **Neural Networks/Deep Learning** and **Outlier Ensembles**.
- **Optimized performance with JIT and parallelization** when possible, using numba.
- **Compatible with both Python 2 & 3** (scikit-learn compatible as well).

**Important Notes**:
PyOD contains some neural-network-based models, e.g., AutoEncoders, which are
implemented in Keras. However, PyOD does **NOT** install **keras** and/or **tensorflow** automatically. This
reduces the risk of damaging your local installations.
If you want to use neural-network-based models, you should therefore install Keras and a backend library such as TensorFlow yourself.
This is fairly easy to do, and instructions are provided here.


# Important Functionalities

The PyOD toolkit consists of three major groups of functionalities: (i) outlier detection algorithms; (ii) outlier ensemble frameworks; and (iii) outlier detection utility functions.

**Individual Detection Algorithms**:

- Linear Models for Outlier Detection:

PCA: Principal Component Analysis (use the sum of weighted projected distances to the eigenvector hyperplane as outlier scores) [ASCSC03]: `pyod.models.pca.PCA`

MCD: Minimum Covariance Determinant (use the Mahalanobis distances as the outlier scores) [ARD99][AHR04]: `pyod.models.mcd.MCD`

One-Class Support Vector Machines [AMP03]: `pyod.models.ocsvm.OCSVM`

- Proximity-Based Outlier Detection Models:

LOF: Local Outlier Factor [ABKNS00]: `pyod.models.lof.LOF`

CBLOF: Clustering-Based Local Outlier Factor [AHXD03]: `pyod.models.cblof.CBLOF`

LOCI: Local Correlation Integral [APKGF03]: `pyod.models.loci.LOCI`

kNN: k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score) [ARRS00][AAP02]: `pyod.models.knn.KNN`

Average kNN (use the average distance to k nearest neighbors as the outlier score): `pyod.models.knn.KNN`

Median kNN (use the median distance to k nearest neighbors as the outlier score): `pyod.models.knn.KNN`

HBOS: Histogram-based Outlier Score [AGD12]: `pyod.models.hbos.HBOS`

- Probabilistic Models for Outlier Detection:

ABOD: Angle-Based Outlier Detection [AKZ+08]: `pyod.models.abod.ABOD`

FastABOD: Fast Angle-Based Outlier Detection using approximation [AKZ+08]: `pyod.models.abod.ABOD`

SOS: Stochastic Outlier Selection [AJHuszarPvdH12]: `pyod.models.sos.SOS`

- Outlier Ensembles and Combination Frameworks:

Isolation Forest [ALTZ08][ALTZ12]: `pyod.models.iforest.IForest`

Feature Bagging [ALK05]: `pyod.models.feature_bagging.FeatureBagging`

- Neural Networks and Deep Learning Models (implemented in Keras):

AutoEncoder with Fully Connected NN [AAgg15]: `pyod.models.auto_encoder.AutoEncoder`

FAQ regarding AutoEncoder in PyOD and debugging advice: known issues

**Outlier Detector/Scores Combination Frameworks**:

Feature Bagging: build various detectors on randomly selected features [ALK05]: `pyod.models.feature_bagging.FeatureBagging`

Average & Weighted Average: simply combine scores by averaging [AAS15]: `pyod.models.combination.average()`

Maximization: simply combine scores by taking the maximum across all base detectors [AAS15]: `pyod.models.combination.maximization()`

Average of Maximum (AOM) [AAS15]: `pyod.models.combination.aom()`

Maximum of Average (MOA) [AAS15]: `pyod.models.combination.moa()`

Threshold Sum (Thresh) [AAS15]
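To illustrate what the averaging and maximization rules compute, here is a NumPy sketch over a hypothetical score matrix of shape (n_samples, n_detectors); this mirrors the combination rules only, not the official `pyod.models.combination` implementations:

```python
import numpy as np

# Outlier scores from 3 hypothetical base detectors for 4 samples:
# each column holds one detector's scores (shape: n_samples x n_detectors).
scores = np.array([
    [0.1, 0.3, 0.2],
    [0.9, 0.8, 0.7],
    [0.2, 0.1, 0.4],
    [0.5, 0.6, 0.5],
])

# Average: combine by taking the mean score per sample across detectors.
avg_scores = scores.mean(axis=1)  # sample 1 gets (0.9 + 0.8 + 0.7) / 3 = 0.8

# Maximization: combine by taking the maximum score per sample.
max_scores = scores.max(axis=1)   # sample 1 gets max(0.9, 0.8, 0.7) = 0.9
```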

**A comparison of all implemented models** is made available below
(Code, Jupyter Notebooks):

For Jupyter Notebooks, please navigate to **“/notebooks/Compare All Models.ipynb”**

# Key APIs & Attributes

The following APIs are applicable to all detector models for easy use.

`pyod.models.base.BaseDetector.fit()`

: Fit detector.

`pyod.models.base.BaseDetector.fit_predict()`

: Fit detector and predict whether a particular sample is an outlier or not.

`pyod.models.base.BaseDetector.fit_predict_evaluate()`

: Fit, predict, and then evaluate with predefined metrics (ROC and precision @ rank n).

`pyod.models.base.BaseDetector.decision_function()`

: Predict raw anomaly scores of X using the fitted detector.

`pyod.models.base.BaseDetector.predict()`

: Predict whether a particular sample is an outlier or not. The model must be fitted first.

`pyod.models.base.BaseDetector.predict_proba()`

: Predict the probability of a sample being an outlier. The model must be fitted first.

Key attributes of a fitted model:

`pyod.models.base.BaseDetector.decision_scores_`

: The outlier scores of the training data. The higher the score, the more abnormal the sample.

`pyod.models.base.BaseDetector.labels_`

: The binary labels of the training data. 0 stands for inliers and 1 for outliers/anomalies.


# References

[AAgg15] Charu C Aggarwal. Outlier analysis. In Data Mining, 75–79. Springer, 2015.

[AAS15] Charu C Aggarwal and Saket Sathe. Theoretical foundations and algorithms for outlier ensembles. ACM SIGKDD Explorations Newsletter, 17(1):24–47, 2015.

[AAP02] Fabrizio Angiulli and Clara Pizzuti. Fast outlier detection in high dimensional spaces. In European Conference on Principles of Data Mining and Knowledge Discovery, 15–27. Springer, 2002.

[ABKNS00] Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. LOF: identifying density-based local outliers. In ACM SIGMOD Record, volume 29, 93–104. ACM, 2000.

[AGD12] Markus Goldstein and Andreas Dengel. Histogram-based outlier score (HBOS): a fast unsupervised anomaly detection algorithm. KI-2012: Poster and Demo Track, pages 59–63, 2012.

[AHR04] Johanna Hardin and David M Rocke. Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. Computational Statistics & Data Analysis, 44(4):625–638, 2004.

[AHXD03] Zengyou He, Xiaofei Xu, and Shengchun Deng. Discovering cluster-based local outliers. Pattern Recognition Letters, 24(9-10):1641–1650, 2003.

[AJHuszarPvdH12] JHM Janssens, Ferenc Huszár, EO Postma, and HJ van den Herik. Stochastic outlier selection. Technical Report TiCC TR 2012-001, Tilburg University, Tilburg Center for Cognition and Communication, Tilburg, The Netherlands, 2012.

[AKZ+08] Hans-Peter Kriegel, Arthur Zimek, and others. Angle-based outlier detection in high-dimensional data. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 444–452. ACM, 2008.

[ALK05] Aleksandar Lazarevic and Vipin Kumar. Feature bagging for outlier detection. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, 157–166. ACM, 2005.

[ALTZ08] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on, 413–422. IEEE, 2008.

[ALTZ12] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 6(1):3, 2012.

[AMP03] Junshui Ma and Simon Perkins. Time-series novelty detection using one-class support vector machines. In Neural Networks, 2003. Proceedings of the International Joint Conference on, volume 3, 1741–1745. IEEE, 2003.

[APKGF03] Spiros Papadimitriou, Hiroyuki Kitagawa, Phillip B Gibbons, and Christos Faloutsos. LOCI: fast outlier detection using the local correlation integral. In Data Engineering, 2003. Proceedings. 19th International Conference on, 315–326. IEEE, 2003.

[ARRS00] Sridhar Ramaswamy, Rajeev Rastogi, and Kyuseok Shim. Efficient algorithms for mining outliers from large data sets. In ACM SIGMOD Record, volume 29, 427–438. ACM, 2000.

[ARD99] Peter J Rousseeuw and Katrien Van Driessen. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3):212–223, 1999.

[ASCSC03] Mei-Ling Shyu, Shu-Ching Chen, Kanoksri Sarinnapakorn, and LiWu Chang. A novel anomaly detection scheme based on principal component classifier. Technical Report, University of Miami, Coral Gables, FL, Department of Electrical and Computer Engineering, 2003.

[AZH18a] Yue Zhao and Maciej K Hryniewicki. XGBOD: improving supervised outlier detection with unsupervised representation learning. In Neural Networks, 2018. Proceedings of the International Joint Conference on. IEEE, 2018.

[AZH18b] Yue Zhao and Maciej K Hryniewicki. DCSO: dynamic combination of detector scores for outlier ensembles. In ACM SIGKDD Workshop on Outlier Detection De-constructed (ODD v5.0). ACM, 2018.