Benchmarks

PyOD’s detector catalog is backed by three peer-reviewed benchmark suites: ADBench, TSB-AD, and BOND. A fourth study, NLP-ADBench, informs the text pipeline. The ADEngine routing rules draw their recommendations directly from these studies, so the suggestions users get from Layer 2 and Layer 3 are tied to reproducible evidence.
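To make the link between benchmarks and routing concrete, the sketch below pictures it as a lookup table from data modality to the benchmark and detectors named on this page. It is an illustration only: `BenchmarkEvidence`, `EVIDENCE`, and `evidence_for` are hypothetical names, and this is not ADEngine's actual data structure or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkEvidence:
    """Benchmark backing one data modality, plus the detectors listed on this page."""
    benchmark: str
    detectors: tuple


# Hypothetical lookup table built from the names on this page.
# It is NOT how ADEngine stores its routing rules.
EVIDENCE = {
    # This page does not enumerate the tabular detectors, so the tuple is empty.
    "tabular": BenchmarkEvidence("ADBench", ()),
    "time_series": BenchmarkEvidence(
        "TSB-AD",
        ("TimeSeriesOD", "MatrixProfile", "SpectralResidual", "KShape", "LSTMAD"),
    ),
    "graph": BenchmarkEvidence(
        "BOND",
        ("DOMINANT", "CoLA", "CONAD", "AnomalyDAE",
         "GUIDE", "Radar", "ANOMALOUS", "SCAN"),
    ),
    "text": BenchmarkEvidence("NLP-ADBench", ("EmbeddingOD",)),
}


def evidence_for(modality):
    """Return the benchmark evidence for a modality such as 'Time Series' or 'graph'."""
    key = modality.strip().lower().replace("-", "_").replace(" ", "_")
    if key not in EVIDENCE:
        raise KeyError(f"no benchmark evidence recorded for modality {modality!r}")
    return EVIDENCE[key]


if __name__ == "__main__":
    for name in ("tabular", "time series", "graph", "text"):
        entry = evidence_for(name)
        print(f"{name}: {entry.benchmark} -> {', '.join(entry.detectors) or '(not listed here)'}")
```

Keeping the evidence in one table means each recommendation can be traced back to a single named benchmark, which is the property this section describes.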

ADBench (Tabular)

ADBench [AHHH+22] is a 45-page study that evaluates 30 anomaly detection algorithms on 57 tabular benchmark datasets (NeurIPS 2022). It is the de facto reference for PyOD’s tabular detector routing.

[Figure: ADBench organization]

For a simpler visualization, see the comparison driver compare_all_models.py.

[Figure: Comparison of all tabular detectors]

TSB-AD (Time Series)

TSB-AD [ALP24] is a time-series anomaly detection benchmark that evaluates 40 algorithms on 1,070 time series (NeurIPS 2024). PyOD ships 5 stable time-series detectors that ADEngine routes to based on TSB-AD rankings: TimeSeriesOD, MatrixProfile, SpectralResidual, KShape (ranked #2), and LSTMAD. It also ships 2 experimental implementations, SAND and AnomalyTransformer, which are available via direct class import but are not yet included in routing. See Layer 1: Time Series Anomaly Detection for usage.

BOND (Graph)

BOND [ALDZ+22] benchmarks 14 graph anomaly detection algorithms on 14 datasets (NeurIPS 2022). ADEngine routes PyOD’s graph detectors based on BOND results: DOMINANT (ranked #1 among deep methods), CoLA (ranked #2 among deep methods), CONAD, AnomalyDAE, GUIDE, Radar, ANOMALOUS, and SCAN. See Layer 1: Graph Anomaly Detection for usage.

NLP-ADBench (Text)

NLP-ADBench evaluates 19 methods on 8 text datasets. A key finding is that a two-step approach (foundation-model embeddings followed by an unsupervised detector) outperforms end-to-end NLP anomaly detection. PyOD implements this approach as EmbeddingOD. See Layer 1: Text and Image Anomaly Detection for usage.
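The two-step idea can be sketched with the standard library alone. This is not EmbeddingOD and does not use PyOD. The `toy_embed` function is a hypothetical bag-of-words stand-in for a foundation-model encoder, and `knn_scores` is a simple k-nearest-neighbour distance score standing in for an unsupervised detector. The corpus is made up for illustration.

```python
import math


def build_vocab(docs):
    """Map every distinct lowercase token in the corpus to a vector index."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def toy_embed(text, vocab):
    """Step 1 (stand-in): L2-normalized bag-of-words vector.

    A real pipeline would call a pretrained foundation-model encoder here.
    """
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def knn_scores(vectors, k=2):
    """Step 2: score each vector by the distance to its k-th nearest neighbour.

    Larger scores mean the point is farther from the rest of the data.
    """
    scores = []
    for i, a in enumerate(vectors):
        dists = sorted(math.dist(a, b) for j, b in enumerate(vectors) if j != i)
        scores.append(dists[min(k, len(dists)) - 1])
    return scores


# Illustrative corpus: four similar log-like sentences and one off-topic sentence.
docs = [
    "the server returned an error code",
    "the server returned a timeout error",
    "the server returned an error again",
    "the server returned a warning code",
    "fresh basil pairs nicely with ripe tomatoes",
]

vocab = build_vocab(docs)
embeddings = [toy_embed(d, vocab) for d in docs]   # step 1: text -> vectors
scores = knn_scores(embeddings, k=2)               # step 2: vectors -> anomaly scores
most_anomalous = docs[scores.index(max(scores))]

if __name__ == "__main__":
    for doc, score in zip(docs, scores):
        print(f"{score:.3f}  {doc}")
```

The point of the split is that the detector only ever sees fixed-length vectors, so either step can be swapped without touching the other.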