ADEngine

pyod.utils.ad_engine.ADEngine is PyOD’s anomaly detection lifecycle engine. It provides three layers of capability:

  • Knowledge queries – list detectors, explain detectors, get benchmarks

  • Detection lifecycle – profile, plan, run, analyze, explain, iterate, report

  • Session workflow (V3) – start → plan → run → analyze → iterate → report with typed state

See Layer 2: ADEngine Lifecycle Orchestration for usage examples and Layer 3: Agentic Investigation for the agentic workflow.

pyod.utils.ad_engine module

ADEngine: anomaly detection lifecycle engine.

Handles data profiling, detection planning, detector construction, and knowledge queries. Works as a standalone Python API (no LLM required) or as the backend for MCP/agent interfaces.

class pyod.utils.ad_engine.ADEngine(knowledge_dir: str | None = None, random_state: int | None = None)

Bases: object

Anomaly detection lifecycle engine.

Parameters

knowledge_dir : str or None

Path to the knowledge base directory. If None, uses the directory bundled with PyOD.

random_state : int or None, optional

Random seed forwarded to every detector that declares an explicit random_state parameter when the engine instantiates it from a plan. Detectors without random_state in their signature (e.g., ABOD, KNN, LOF, SOD) are deterministic by construction (distance, angle, or density based, with no internal sampling) and need no seed. With this set, the shallow-detector pipeline is reproducible: a run-to-run audit of the shipped shallow detectors found every one either honors the seed or is deterministic by construction, with no nondeterministic cases. Deep detectors additionally depend on framework-level seeding (e.g., torch.manual_seed). Set this to a fixed integer for byte-identical flagged sets across re-runs on the same input.
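The seed-forwarding rule described above can be sketched with signature inspection. This is a minimal illustration, not the engine's code: the two detector classes and the build_with_seed helper are hypothetical, and leaving an explicitly pinned seed untouched is an assumption.

```python
import inspect


class SeededDetector:
    """Hypothetical detector that declares random_state."""

    def __init__(self, contamination=0.1, random_state=None):
        self.contamination = contamination
        self.random_state = random_state


class DeterministicDetector:
    """Hypothetical detector with no internal sampling, so it takes no seed."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination


def build_with_seed(cls, params, random_state):
    """Instantiate cls, adding random_state only if its signature declares it."""
    kwargs = dict(params)
    declares_seed = "random_state" in inspect.signature(cls.__init__).parameters
    if random_state is not None and declares_seed:
        # Assumption: a seed already present in the plan params is kept as-is.
        kwargs.setdefault("random_state", random_state)
    return cls(**kwargs)


seeded = build_with_seed(SeededDetector, {"contamination": 0.05}, random_state=42)
plain = build_with_seed(DeterministicDetector, {"contamination": 0.05}, random_state=42)
print(seeded.random_state)             # 42
print(hasattr(plain, "random_state"))  # False
```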

analyze(state: InvestigationState) → InvestigationState

Analyze detection results with quality assessment.

Computes per-detector analysis, consensus analysis, quality metrics (separation, agreement, stability), and selects the best detector.

Parameters

state : InvestigationState

Returns

state : InvestigationState

analyze_results(result: dict, X: Any = None, top_k: int = 10) → dict

Analyze detection results.

Parameters

result : dict

Output of run_detection().

X : array-like or None

Original training data for feature-level analysis.

top_k : int

Number of top anomalies to return.

Returns

analysis : dict

build_detector(plan: dict) → Any

Build and return an unfitted detector from a plan.

Parameters

plan : dict (DetectionPlan)

Output of plan_detection().

Returns

detector : BaseDetector

compare_detectors(names: list[str] | None = None, data_type: str | None = None, top_k: int = 3) → list[dict]

Compare detectors.

When names is provided, returns explanations for those detectors in input order.

When names is omitted and data_type has a benchmark-backed ranking in the KB, returns up to top_k detectors ranked by that benchmark, then appends remaining shipped detectors in catalog order until top_k is reached. Two ranking sources are supported: top-level overall_top_5 for benchmarks whose names match PyOD detector names (currently tabular via ADBench); per-detector benchmark_rank metadata when the benchmark lists paper method names (currently time_series via TSB-AD, sorted ascending by the best matching rank key). For modalities without an applicable ranking (graph, text, image, multimodal) or when no data_type is given, falls back to the catalog order from list_detectors.

Parameters

names : list of str or None

Explicit list of detector names to compare.

data_type : str or None

Filter by data type.

top_k : int

Number of detectors to return when not using explicit names.

Returns

comparison : list of dict
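When names is omitted, the selection order described above (benchmark-ranked detectors first, then the remaining catalog entries until top_k is reached) can be sketched as follows. The helper name and both input lists are illustrative assumptions, not values read from the KB.

```python
def select_for_comparison(ranked, catalog, top_k=3):
    """Take benchmark-ranked names first, then fill from the catalog order."""
    chosen = []
    for name in list(ranked) + list(catalog):
        if len(chosen) >= top_k:
            break
        # Assumption: only names present in the shipped catalog are eligible.
        if name in catalog and name not in chosen:
            chosen.append(name)
    return chosen


catalog = ["ABOD", "ECOD", "IForest", "KNN", "LOF"]  # illustrative catalog order
ranked = ["IForest", "ECOD"]                         # illustrative benchmark ranking

print(select_for_comparison(ranked, catalog, top_k=3))  # ['IForest', 'ECOD', 'ABOD']
print(select_for_comparison([], catalog, top_k=2))      # ['ABOD', 'ECOD']
```

With an empty ranking the function degenerates to plain catalog order, which mirrors the documented fallback for modalities without an applicable benchmark.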

contamination_diagnostics(state: InvestigationState, threshold_sweep: list[float] | None = None) → dict

Diagnostic helper for contamination calibration.

Reports the contamination value the run actually used, the actual flagged rate from the consensus, the score-percentile distribution, and (optionally) a threshold sweep showing what fraction would be flagged at each candidate contamination value. The agent can use these numbers to choose a sensible next contamination before iterating.

This helper does NOT estimate contamination automatically and does NOT mutate state. It is purely a read-only diagnostic the agent uses to inform a subsequent engine.iterate(state, {'action': 'adjust_contamination', 'value': <rate>}) call.

Parameters

state : InvestigationState

Must be in the 'analyzed' phase.

threshold_sweep : list of float or None

Optional sequence of candidate contamination values in (0, 1). For each value c, the result includes the corresponding threshold (the (1 - c) quantile of consensus scores) and the resulting flagged rate. Use this to preview how the flagged set would change before deciding to iterate. Values outside (0, 1) are skipped.

Returns

diagnostics : dict

Keys:

  • effective_contamination (float or None): contamination value from the primary plan’s params, or None if the plan has no contamination set.

  • flagged_rate (float): actual fraction flagged by the consensus labels.

  • score_percentiles (dict[int, float]): consensus-score percentiles at the 50th, 75th, 90th, 95th, and 99th.

  • threshold_sweep (list of dict, optional): present only when threshold_sweep was passed; each entry has contamination, threshold, and flagged_rate.
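The threshold sweep can be reproduced by hand to see what each entry means. The sketch below assumes linear interpolation for the quantile and a strict "score greater than threshold" flagging rule; neither detail is stated above, and both helper functions are hypothetical.

```python
def quantile(sorted_scores, q):
    """Linearly interpolated quantile of an ascending list, for 0 <= q <= 1."""
    pos = q * (len(sorted_scores) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_scores) - 1)
    return sorted_scores[lo] + (pos - lo) * (sorted_scores[hi] - sorted_scores[lo])


def sweep(scores, candidates):
    """Return one row per valid candidate contamination value."""
    ordered = sorted(scores)
    rows = []
    for c in candidates:
        if not 0.0 < c < 1.0:
            continue  # values outside (0, 1) are skipped
        threshold = quantile(ordered, 1.0 - c)
        # Assumption: a sample is flagged when its score exceeds the threshold.
        flagged_rate = sum(s > threshold for s in scores) / len(scores)
        rows.append({"contamination": c, "threshold": threshold,
                     "flagged_rate": flagged_rate})
    return rows


scores = [float(i) for i in range(11)]   # toy consensus scores 0.0 .. 10.0
rows = sweep(scores, [0.25, 0.5, 1.5])   # 1.5 is out of range and dropped
for row in rows:
    print(row["contamination"], row["threshold"])
```

With discrete scores the flagged rate need not equal the candidate contamination exactly, which is why the diagnostic reports both numbers.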

detect(X_train: Any, X_test: Any = None, data_type: str | None = None, priority: str = 'balanced') → dict

One-shot anomaly detection: profile -> plan -> run -> analyze.

Parameters

X_train : array-like

Training data.

X_test : array-like or None

Optional test data.

data_type : str or None

Explicit data type override.

priority : str

'speed', 'accuracy', or 'balanced'.

Returns

result : dict

Output of run_detection() enriched with analysis. Compatible with all Tier B methods (analyze_results, explain_findings, suggest_next_step, generate_report).

explain_detector(name: str) → dict

Explain a detector.

Parameters

name : str

Detector short name (e.g. 'ECOD').

Returns

info : dict

explain_findings(result: dict, indices: list[int] | None = None, top_k: int = 5, X: Any = None, feature_names: list[str] | None = None) → list[dict]

Explain why specific samples were flagged as anomalies.

Parameters

result : dict

Output of run_detection().

indices : list of int or None

Specific sample indices. If None, explains top-k.

top_k : int

Number of top anomalies to explain if indices is None.

X : array-like or None

Original data for feature-level explanations.

feature_names : list of str or None

Optional feature labels in column order, threaded through to feature_contributions so each contributing feature has a human-readable name. When omitted, names default to f'feature_{column_index}'.

Returns

explanations : list of dict

Each entry has 'index', 'score', 'percentile', 'label', 'narrative'. When X is provided, also includes 'contributing_features': a list of dicts with 'feature', 'name', 'value', 'mean', 'z_score', and 'direction'.
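To make the contributing_features entries concrete, here is a rough sketch of per-feature z-scores with the documented default names. It assumes 'feature' holds the column index, that z-scores use the population standard deviation, and that 'direction' is 'high' or 'low'; all three are guesses for illustration, and the helper is hypothetical.

```python
import statistics


def contributing_features(X, row_index, feature_names=None, top_k=2):
    """Rank the features of one row by absolute z-score against the column mean."""
    n_cols = len(X[0])
    names = feature_names or [f"feature_{j}" for j in range(n_cols)]
    entries = []
    for j in range(n_cols):
        column = [row[j] for row in X]
        mean = statistics.fmean(column)
        std = statistics.pstdev(column)
        value = X[row_index][j]
        z = (value - mean) / std if std > 0 else 0.0  # constant columns get 0
        entries.append({
            "feature": j, "name": names[j], "value": value, "mean": mean,
            "z_score": z, "direction": "high" if z >= 0 else "low",
        })
    return sorted(entries, key=lambda e: abs(e["z_score"]), reverse=True)[:top_k]


X = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [50.0, 10.0]]
default_named = contributing_features(X, row_index=3, top_k=1)
custom_named = contributing_features(X, 3, feature_names=["amount", "fee"], top_k=1)
print(default_named[0]["name"])  # feature_0
print(custom_named[0]["name"])   # amount
```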

generate_report(result: dict, analysis: dict, format: str = 'text') → str

Generate a summary report.

Parameters

result : dict

Output of run_detection().

analysis : dict

Output of analyze_results().

format : str

'text' (markdown) or 'json'.

Returns

report : str

get_benchmarks(benchmark: str = 'all') → dict

Get benchmark results.

Parameters

benchmark : str

Benchmark name, or 'all' for everything.

Returns

benchmarks : dict

get_kb_for_routing(profile: dict, top_k: int = 3, constraints: dict | None = None) → dict

Return a structured KB snapshot for caller-driven detector selection.

This is the agent-facing companion to plan_detection(). plan_detection consumes the KB through hand-coded rules and returns a single plan; get_kb_for_routing exposes the KB directly so a caller (LLM agent, MCP tool client, …) can reason over each detector’s strengths, weaknesses, complexity, and benchmark rank, then call make_plan() to commit a plan.

Parameters

profile : dict

Output of profile_data(). Must include data_type; n_samples / n_features are passed through unchanged.

top_k : int, default 3

The number of detectors the caller intends to select. The KB snapshot itself is returned in full (filtered + sorted); the field is included in the returned dict so the response-format hint can reference it.

constraints : dict or None, optional

{'exclude_detectors': list[str], 'data_type_strict': bool}. exclude_detectors is a hard filter. data_type_strict (default True) drops detectors whose KB data_types field does not include profile['data_type'].

Returns

dict

{'task_profile': {...}, 'available_detectors': [...], 'top_k_requested': int, 'response_format_hint': str, 'n_available': int}.

Notes

Pure function; no LLM calls, no state mutation.
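The two constraint keys behave like simple filters over the KB entries. The sketch below shows that filtering in isolation; the entry shape ('name' and 'data_types' fields) and the sample entries are assumptions made for illustration.

```python
def filter_candidates(kb_entries, profile, constraints=None):
    """Apply exclude_detectors and data_type_strict to a list of KB entries."""
    constraints = constraints or {}
    excluded = set(constraints.get("exclude_detectors", []))
    strict = constraints.get("data_type_strict", True)  # documented default
    kept = []
    for entry in kb_entries:
        if entry["name"] in excluded:
            continue  # hard filter
        if strict and profile["data_type"] not in entry["data_types"]:
            continue  # detector does not list this data type
        kept.append(entry)
    return kept


kb_entries = [  # hypothetical KB entries
    {"name": "ECOD", "data_types": ["tabular"]},
    {"name": "IForest", "data_types": ["tabular"]},
    {"name": "SomeTextDetector", "data_types": ["text"]},
]
profile = {"data_type": "tabular"}

strict_names = [e["name"] for e in filter_candidates(
    kb_entries, profile, {"exclude_detectors": ["ECOD"]})]
loose_names = [e["name"] for e in filter_candidates(
    kb_entries, profile, {"data_type_strict": False})]
print(strict_names)  # ['IForest']
print(loose_names)   # ['ECOD', 'IForest', 'SomeTextDetector']
```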

investigate(X: Any, data_type: str | None = None, priority: str = 'balanced') → InvestigationState

One-shot investigation: start → plan → run → analyze.

Parameters

X : array-like

Input data.

data_type : str or None

priority : str

Returns

state : InvestigationState

iterate(state: InvestigationState, feedback: str | dict) → InvestigationState

Iterate based on feedback.

Structured dicts execute immediately. NL strings are parsed with confidence; ambiguous feedback triggers 'confirm_with_user'.

Most actions require phase 'analyzed'. The 'recover' action also accepts phase 'detected' so the agent can substitute failed detectors immediately after run() without first calling analyze().

Parameters

state : InvestigationState

feedback : str or dict

Returns

state : InvestigationState
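The phase requirement for iterate() reduces to a small lookup. The helper below is a hypothetical illustration of that rule only; it does not model feedback parsing, and how the engine reports a disallowed phase is not stated above.

```python
def action_allowed(phase, action):
    """Return True if the action may run in the given phase."""
    allowed_phases = {"analyzed"}
    if action == "recover":
        # 'recover' may also run right after run(), before analyze().
        allowed_phases.add("detected")
    return phase in allowed_phases


print(action_allowed("analyzed", "adjust_contamination"))  # True
print(action_allowed("detected", "adjust_contamination"))  # False
print(action_allowed("detected", "recover"))               # True
```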

list_detectors(data_type: str | None = None, status: str = 'shipped') → list[dict]

List available detectors.

Parameters

data_type : str or None

Filter by data type (e.g. 'tabular', 'text').

status : str

Filter by status. Use 'all' to list everything.

Returns

detectors : list of dict

make_plan(detector_choices: list, justifications: list | None = None, params: list | None = None) → dict

Commit a caller-driven detector plan and return a DetectionPlan.

Companion to get_kb_for_routing(). The caller (LLM agent, rule engine, human script) selects len(detector_choices) detectors and this method validates names against the KB, fills per-detector defaults, and packages the result as a pyod.utils._kb_router.make_plan()-shaped dict so existing consumers (build_detector, run, downstream MCP clients) keep working unchanged.

Parameters

detector_choices : list of str

Ordered list of detector class names. detector_choices[0] is the primary; the rest become alternatives in plan order. Length must be >= 1. Names must match KB entries (case-sensitive) with status='shipped'; otherwise ValueError is raised.

justifications : list of str, optional

Parallel to detector_choices. One short sentence per choice. None is accepted and yields autogenerated reasons.

params : list of dict, optional

Parallel to detector_choices. Per-detector constructor kwargs. None -> KB defaults overlaid with the engine’s contamination resolution.

Returns

dict

Closed-schema DetectionPlan: {'detector_name', 'params', 'reason', 'evidence', 'confidence', 'alternatives', 'note'}.

Raises

ValueError

If detector_choices is empty or any name is unknown / not status='shipped' in the KB.
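The validation and packaging steps can be sketched in a few lines. This is an assumption-laden illustration: KB_DEFAULTS stands in for the shipped KB, the autogenerated reason text is invented, and the contents of 'evidence', 'confidence', 'alternatives', and 'note' are placeholders, since only the key names are documented above.

```python
KB_DEFAULTS = {  # hypothetical shipped detectors and default params
    "ECOD": {"contamination": 0.1},
    "IForest": {"contamination": 0.1, "n_estimators": 100},
    "KNN": {"contamination": 0.1, "n_neighbors": 5},
}


def make_plan_sketch(detector_choices, justifications=None, params=None):
    """Validate names, fill defaults, and package primary plus alternatives."""
    if not detector_choices:
        raise ValueError("detector_choices must contain at least one name")
    unknown = [n for n in detector_choices if n not in KB_DEFAULTS]  # case-sensitive
    if unknown:
        raise ValueError(f"unknown or unshipped detectors: {unknown}")
    if justifications is None:
        justifications = [f"{n} selected by caller" for n in detector_choices]
    if params is None:
        params = [dict(KB_DEFAULTS[n]) for n in detector_choices]
    entries = [
        {"detector_name": n, "params": p, "reason": r}
        for n, p, r in zip(detector_choices, params, justifications)
    ]
    primary, alternatives = entries[0], entries[1:]
    return {**primary, "evidence": [], "confidence": None,
            "alternatives": alternatives, "note": ""}


plan = make_plan_sketch(["IForest", "ECOD"])
print(plan["detector_name"])                               # IForest
print([a["detector_name"] for a in plan["alternatives"]])  # ['ECOD']
```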

plan(state: InvestigationState, priority: str = 'balanced', constraints: dict | None = None) → InvestigationState

Plan detection: select top-N detectors.

Wraps plan_detection() and extracts primary + alternatives into state.plans (up to 3 detectors, v1 limit).

Parameters

state : InvestigationState

priority : str

constraints : dict or None

Returns

state : InvestigationState

plan_detection(profile: dict, priority: str = 'balanced', constraints: dict | None = None, *, top_k: int = 3, llm_client=None, llm_strict: bool | None = None) → dict

Plan a detection pipeline.

Parameters

profile : dict

Output of profile_data().

priority : str

'speed', 'accuracy', or 'balanced'.

constraints : dict or None

Optional: {'exclude_detectors': […]}

top_k : int, default 3

Number of detectors in the returned plan (primary + top_k - 1 alternatives). Default 3 preserves the v3.5.2 behaviour (valid[1:3] produced two alternatives plus the primary). Values < 1 are clamped to 1.

llm_client : callable or None, default None

Optional (prompt: str) -> str callable (see pyod.utils._llm.LLMCallable). When provided, routing consults the LLM with the KB context and parses its response into a plan via pyod.utils._llm.parse_routing_response(). If the LLM call or parser raises, falls back to rule routing with a RuntimeWarning (see llm_strict). When None (default), v3.5.2 rule routing is unchanged.

llm_strict : bool or None, default None

Per-call control for LLM-routing failure mode. True re-raises any exception from llm_client or the response parser; False falls back to rule routing with a RuntimeWarning; None defers to the PYOD3_LLM_STRICT environment variable ("1" re-raises, anything else falls back). The explicit kwarg takes precedence so concurrent callers in the same process can choose independently.

Returns

plan : dict (DetectionPlan, closed schema)
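The precedence between llm_strict and PYOD3_LLM_STRICT, and the resulting fallback, can be sketched as follows. Everything except the environment variable name and the RuntimeWarning category is hypothetical here: the helper names, the parse argument, and the failing client are placeholders for illustration.

```python
import os
import warnings


def resolve_llm_strict(llm_strict=None, environ=None):
    """Explicit kwarg wins; None defers to PYOD3_LLM_STRICT ("1" means strict)."""
    if llm_strict is not None:
        return bool(llm_strict)
    environ = os.environ if environ is None else environ
    return environ.get("PYOD3_LLM_STRICT") == "1"


def route(rule_router, llm_client=None, parse=str.strip, llm_strict=None, environ=None):
    """Try LLM routing first; on failure either re-raise or fall back to rules."""
    if llm_client is None:
        return rule_router()
    try:
        return parse(llm_client("choose detectors"))
    except Exception as exc:
        if resolve_llm_strict(llm_strict, environ):
            raise
        warnings.warn(f"LLM routing failed ({exc!r}); using rule routing",
                      RuntimeWarning)
        return rule_router()


def failing_client(prompt):
    raise TimeoutError("no response")  # hypothetical failure


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallback_plan = route(lambda: "rule-plan", failing_client, llm_strict=False)

print(fallback_plan)                # rule-plan
print(caught[0].category.__name__)  # RuntimeWarning
```

Resolving strictness per call, instead of reading the environment once at import time, is what lets concurrent callers in one process choose different failure modes.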

profile_data(X: Any, data_type: str | None = None) → dict

Profile the input data.

Parameters

X : array-like, list, or dict

Input data.

data_type : str or None

Explicit override. One of 'tabular', 'text', 'image', 'audio', 'time_series', 'multimodal', 'graph'.

Returns

profile : dict

report(state: InvestigationState, format: str = 'text') → str | dict

Generate investigation report.

Text format wraps generate_report() for best detector, prepending session-level context. JSON format returns a native dict.

Parameters

state : InvestigationState

format : str

'text' or 'json'.

Returns

report : str or dict

run(state: InvestigationState) → InvestigationState

Run detection with all planned detectors.

Wraps run_detection() per plan. Computes consensus via rank normalization and majority vote. Records errors per detector without stopping.

Parameters

state : InvestigationState

Returns

state : InvestigationState
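One plausible reading of "consensus via rank normalization and majority vote" is sketched below: each detector's scores are mapped to [0, 1] by rank and averaged, and a sample is labeled anomalous when more than half of the detectors flag it. The averaging step, the tie handling, and both helper names are assumptions; the engine's exact formula is not given above.

```python
def rank_normalize(scores):
    """Map scores to [0, 1] by rank (0 = lowest score, 1 = highest)."""
    n = len(scores)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: scores[i])  # ties keep input order
    ranks = [0.0] * n
    for rank, i in enumerate(order):
        ranks[i] = rank / (n - 1)
    return ranks


def consensus(score_lists, label_lists):
    """Average rank-normalized scores and take a strict majority vote on labels."""
    normalized = [rank_normalize(s) for s in score_lists]
    n_detectors = len(score_lists)
    n_samples = len(score_lists[0])
    mean_scores = [sum(col[i] for col in normalized) / n_detectors
                   for i in range(n_samples)]
    votes = [sum(labels[i] for labels in label_lists) for i in range(n_samples)]
    labels = [1 if 2 * v > n_detectors else 0 for v in votes]
    return mean_scores, labels


# Three toy detectors on very different score scales.
score_lists = [[0.1, 0.2, 0.9, 0.3], [5, 1, 9, 2], [10, 20, 40, 30]]
label_lists = [[0, 0, 1, 0], [1, 0, 1, 0], [0, 0, 1, 1]]

mean_scores, labels = consensus(score_lists, label_lists)
print(labels)          # [0, 0, 1, 0]
print(mean_scores[2])  # 1.0
```

Rank normalization is what makes the detectors comparable here: the raw scales differ by orders of magnitude, yet all three agree that the third sample ranks highest.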

run_detection(X_train: Any, plan: dict, X_test: Any = None) → dict

Execute a detection plan.

Parameters

X_train : array-like

Training data.

plan : dict (DetectionPlan)

Output of plan_detection().

X_test : array-like or None

Optional test data.

Returns

result : dict

Keys: 'plan', 'scores_train', 'labels_train', 'threshold', 'n_anomalies', 'anomaly_ratio', 'detector', 'runtime_seconds', 'score_summary'. If X_test: also 'scores_test', 'labels_test'.

start(X: Any, data_type: str | None = None) → InvestigationState

Start an investigation session.

Profiles the data and returns an InvestigationState.

Parameters

X : array-like, Data, list, or dict

Input data (any modality).

data_type : str or None

Explicit type override.

Returns

state : InvestigationState

suggest_next_step(result: dict, analysis: dict, feedback: str | None = None) → dict

Suggest what to try next.

Parameters

result : dict

Output of run_detection().

analysis : dict

Output of analyze_results().

feedback : str or None

User feedback like 'too many false positives'.

Returns

suggestion : dict

Keys: 'action', 'reason', optionally 'new_plan', 'threshold_adjustment'.

validate(state: InvestigationState, y: Any) → dict

Hindsight validation of consensus and per-detector results.

Computes label-based metrics from y against the consensus labels and each successful detector, plus a consensus-vs-best-detector diagnostic so the agent can see whether consensus actually helped.

Pure functional; does not mutate state. Use after analyze when held-out labels become available (e.g., a labeled cohort opened post-hoc for hindsight evaluation). For routine unsupervised detection runs, this method is unnecessary.

Parameters

state : InvestigationState

Must be in the 'analyzed' phase.

y : array-like, shape (n_samples,)

Held-out binary labels (0 = inlier, 1 = anomaly). Length must match the consensus.

Returns

validation : dict

Keys:

  • consensus (dict): label_metrics for the consensus labels and scores.

  • per_detector (dict[str, dict]): label_metrics per successful detector, keyed by detector name.

  • best_detector (dict or None): label_metrics for the detector picked by analyze as best (or None when state.analysis does not name one).

  • consensus_vs_best (dict): comparison summary with keys consensus_f1, best_detector_f1 (or None), and consensus_helped (True if consensus F1 is at least the best-detector F1; None when no best detector).

  • false_positives (list[int]): row indices flagged by consensus but inlier in y.

  • false_negatives (list[int]): row indices not flagged by consensus but anomaly in y.

Raises

ValueError

If state is not in ‘analyzed’ phase, if the consensus is missing (all detectors failed), or if len(y) does not match the consensus length.
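The consensus_vs_best, false_positives, and false_negatives entries follow directly from comparing y with the label vectors. The sketch below recomputes them on toy data; it covers only those three keys, and the helper names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for the anomaly class (label 1); 0.0 when there is nothing to score."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hindsight(y, consensus_labels, best_labels=None):
    """Compare consensus labels (and optionally the best detector) against y."""
    if len(y) != len(consensus_labels):
        raise ValueError("len(y) does not match the consensus length")
    consensus_f1 = f1_score(y, consensus_labels)
    best_f1 = f1_score(y, best_labels) if best_labels is not None else None
    pairs = list(enumerate(zip(y, consensus_labels)))
    return {
        "consensus_vs_best": {
            "consensus_f1": consensus_f1,
            "best_detector_f1": best_f1,
            "consensus_helped": None if best_f1 is None else consensus_f1 >= best_f1,
        },
        "false_positives": [i for i, (t, p) in pairs if p == 1 and t == 0],
        "false_negatives": [i for i, (t, p) in pairs if p == 0 and t == 1],
    }


y = [0, 0, 1, 1, 0]
consensus_labels = [0, 1, 1, 0, 0]
best_labels = [0, 0, 1, 0, 0]

validation = hindsight(y, consensus_labels, best_labels)
print(validation["consensus_vs_best"]["consensus_f1"])      # 0.5
print(validation["consensus_vs_best"]["consensus_helped"])  # False
print(validation["false_positives"])                        # [1]
print(validation["false_negatives"])                        # [3]
```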

pyod.utils.investigation module

Investigation state for ADEngine session workflow.

class pyod.utils.investigation.InvestigationState(phase: str, iteration: int = 0, history: list = <factory>, data: object = None, profile: dict = <factory>, plans: list = <factory>, results: list = <factory>, consensus: dict = None, analysis: dict = None, quality: dict = None, next_action: dict = <factory>)

Bases: object

Typed state object for an ADEngine investigation session.

Tracks the full workflow: profiling, planning, detection, analysis, and iteration. Each session method updates the state and sets next_action to guide the agent.

Attributes

phase : str

One of PHASES: 'profiled', 'planned', 'detected', 'analyzed'.

iteration : int

Current iteration (0 = first run).

history : list

List of HistoryEntry dicts.

data : object

Reference to input data (not copied).

profile : dict

Output of profile_data().

plans : list

List of DetectionPlan dicts (top-N).

results : list

List of DetectorResult dicts.

consensus : dict or None

ConsensusResult dict.

analysis : dict or None

InvestigationAnalysis dict.

quality : dict or None

QualityAssessment dict.

next_action : dict

NextAction dict guiding the agent.

analysis: dict = None
consensus: dict = None
data: object = None
history: list
iteration: int = 0
next_action: dict
phase: str
plans: list
profile: dict
quality: dict = None
results: list
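The <factory> defaults in the signature suggest a dataclass whose list and dict fields use default_factory, so every state object gets its own containers. The reduced sketch below illustrates that behavior under that assumption; StateSketch is a hypothetical stand-in with only a few of the fields, not the real class.

```python
from dataclasses import dataclass, field

PHASES = ("profiled", "planned", "detected", "analyzed")  # order as listed above


@dataclass
class StateSketch:
    """Reduced, hypothetical stand-in for InvestigationState."""
    phase: str
    iteration: int = 0
    history: list = field(default_factory=list)  # shown as <factory> in the signature
    plans: list = field(default_factory=list)
    consensus: dict = None


first = StateSketch(phase="profiled")
second = StateSketch(phase="profiled")
first.history.append({"action": "start"})  # illustrative history entry

print(first.history)            # [{'action': 'start'}]
print(second.history)           # []
print(first.phase in PHASES)    # True
```

A plain mutable default such as history: list = [] would be rejected by dataclass at class-definition time, which is why the factory form appears in the signature.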

pyod.utils.knowledge module

Knowledge base for PyOD’s intelligent agent layer.

Loads structured JSON files containing algorithm metadata, benchmark results, routing rules, and paper citations.

class pyod.utils.knowledge.KnowledgeBase(knowledge_dir=None)

Bases: object

Loader and accessor for PyOD’s structured knowledge base.

Reads JSON files from the knowledge directory and provides query methods for algorithm metadata, benchmarks, and routing.

Parameters

knowledge_dir : str or None

Path to knowledge directory. If None, uses the bundled directory shipped with PyOD.

property algorithms
property benchmarks
get_algorithm(name)

Get algorithm metadata by name. Returns None if not found.

list_by_data_type(data_type, status='shipped')

List algorithms supporting a given data type.

list_by_status(status)

List algorithms with a given status.

property papers
property routing_rules