Layer 1: Tabular Anomaly Detection
PyOD provides 43 tabular detectors spanning probabilistic, linear, proximity-based, ensemble, and neural network approaches. All of them share the same fit/predict/decision_function API, so switching detectors only changes the import and the constructor call.
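from pyod.models.iforest import IForest

clf = IForest()
clf.fit(X_train)  # X_train: array of shape (n_samples, n_features); no labels needed
y_train_scores = clf.decision_scores_  # outlier scores of the training data (higher = more abnormal)
y_test_scores = clf.decision_function(X_test)  # outlier scores for unseen data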
Recommended Starting Points
Based on ADBench (NeurIPS 2022, 57 datasets, 30 algorithms):
All Tabular Examples
Probabilistic: ECOD, COPOD, ABOD, MAD, SOS, QMCD, KDE, Sampling, GMM
Linear Models: PCA, KPCA, MCD, CD, OCSVM, LMDD
Proximity-Based: LOF, COF, CBLOF, LOCI, HBOS, HDBSCAN, KNN, SOD, ROD
Outlier Ensembles: IForest, INNE, DIF, Feature Bagging, LSCP, XGBOD, LODA, SUOD
Neural Networks: AutoEncoder, VAE, DeepSVDD, SO_GAAL, MO_GAAL, AnoGAN, ALAD, AE1SVM, DevNet
Example Walkthrough
Full example: knn_example.py
Import and generate data:
from pyod.models.knn import KNN
from pyod.utils.data import generate_data, evaluate_print
contamination = 0.1
X_train, X_test, y_train, y_test = generate_data(
    n_train=200, n_test=100, contamination=contamination)
Fit and predict:
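clf = KNN()
clf.fit(X_train)
y_train_pred = clf.labels_  # binary labels on training data (0: inlier, 1: outlier)
y_train_scores = clf.decision_scores_  # raw outlier scores on training data
y_test_pred = clf.predict(X_test)  # binary labels on test data
y_test_scores = clf.decision_function(X_test)  # raw outlier scores on test data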
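The binary labels are obtained by cutting the raw scores at a threshold tied to the contamination rate. The sketch below illustrates that idea with NumPy alone, assuming the threshold is the (1 - contamination) percentile of the training scores. The helper `threshold_scores` and the toy scores are hypothetical and not part of PyOD; check the source of your installed version for the exact rule.

```python
import numpy as np


def threshold_scores(train_scores, contamination=0.1):
    """Hypothetical helper: flag roughly `contamination` of the points as outliers.

    Assumption: the cut-off is the (1 - contamination) percentile of the
    training scores, and points strictly above it are labelled 1.
    """
    train_scores = np.asarray(train_scores, dtype=float)
    threshold = np.percentile(train_scores, 100 * (1 - contamination))
    labels = (train_scores > threshold).astype(int)
    return threshold, labels


# Toy scores: nine ordinary points and one clear outlier.
scores = np.array([0.10, 0.20, 0.15, 0.30, 0.25, 0.12, 0.18, 0.22, 0.28, 5.0])
threshold, labels = threshold_scores(scores, contamination=0.1)
print(threshold, labels)
```

A larger contamination value lowers the threshold and flags more points, which is why the value passed to the detector should reflect the expected outlier fraction.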
Evaluate:
evaluate_print('KNN', y_test, y_test_scores)
# Example output (varies with the randomly generated data):
# KNN ROC:0.9989, precision @ rank n:0.9
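The two numbers printed by evaluate_print are ROC AUC and precision @ rank n. The sketch below recomputes simplified versions of both on a toy example to show what they measure. The function names `roc_auc` and `precision_at_rank_n` are hypothetical, and PyOD's own utilities may handle ties and thresholds differently.

```python
import numpy as np


def roc_auc(y_true, scores):
    """Probability that a random outlier scores higher than a random inlier.

    Ties count as half. This pairwise form is O(n_pos * n_neg), fine for small data.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


def precision_at_rank_n(y_true, scores):
    """Fraction of true outliers among the n highest scores, n = number of true outliers."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n = int(y_true.sum())
    top_n = np.argsort(scores)[::-1][:n]
    return float(y_true[top_n].mean())


# Toy labels (1: outlier) and scores; one inlier is wrongly ranked near the top.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.9, 0.3, 0.8, 0.95]
auc = roc_auc(y_true, scores)
p_at_n = precision_at_rank_n(y_true, scores)
print(auc, p_at_n)
```

ROC AUC rewards a good overall ranking, while precision @ rank n only looks at the top of the list, so a single high-scoring inlier hurts the second metric more.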