ADEngine

pyod.utils.ad_engine.ADEngine is PyOD’s anomaly detection lifecycle engine. It provides three layers of capability:

Knowledge queries – list detectors, explain detectors, get benchmarks
Detection lifecycle – profile, plan, run, analyze, explain, iterate, report
Session workflow (V3) – start → plan → run → analyze → iterate → report with typed state

See Layer 2: ADEngine Lifecycle Orchestration for usage examples and Layer 3: Agentic Investigation for the agentic workflow.

pyod.utils.ad_engine module

ADEngine: anomaly detection lifecycle engine.

Handles data profiling, detection planning, detector construction, and knowledge queries. Works as a standalone Python API (no LLM required) or as the backend for MCP/agent interfaces.

class pyod.utils.ad_engine.ADEngine(knowledge_dir: str | None = None, random_state: int | None = None)

Bases: object

Anomaly detection lifecycle engine.

Parameters

knowledge_dir (str or None): Path to the knowledge base directory. If None, the bundled knowledge base is used.
random_state (int or None, optional): Random seed forwarded to every detector that declares an explicit random_state parameter when the engine instantiates it from a plan. Detectors without random_state in their signature (e.g., ABOD, KNN, LOF, SOD) are deterministic by construction (distance, angle, or density based, with no internal sampling) and need no seed. With this set, the shallow-detector pipeline is reproducible: a run-to-run audit of the shipped shallow detectors found that every one either honors the seed or is deterministic by construction, with no nondeterministic cases. Deep detectors additionally depend on framework-level seeding (e.g., torch.manual_seed). Set this to a fixed integer for byte-identical flagged sets across re-runs on the same input.
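The seed-forwarding rule described for random_state can be pictured with a small sketch. This is not PyOD's code: SeededDetector, DistanceDetector, and build_with_seed are hypothetical names, and checking the constructor with inspect.signature is only an assumption about how such forwarding might be done.

```python
import inspect


class SeededDetector:
    """Hypothetical detector whose constructor declares random_state."""

    def __init__(self, contamination=0.1, random_state=None):
        self.contamination = contamination
        self.random_state = random_state


class DistanceDetector:
    """Hypothetical deterministic detector with no random_state parameter."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination


def build_with_seed(cls, params, random_state=None):
    """Instantiate cls, adding the seed only if the constructor accepts it."""
    kwargs = dict(params)
    accepts_seed = "random_state" in inspect.signature(cls.__init__).parameters
    if random_state is not None and accepts_seed:
        kwargs.setdefault("random_state", random_state)
    return cls(**kwargs)


seeded = build_with_seed(SeededDetector, {"contamination": 0.05}, random_state=42)
plain = build_with_seed(DistanceDetector, {"contamination": 0.05}, random_state=42)
print(seeded.random_state)             # 42
print(hasattr(plain, "random_state"))  # False
```

In this sketch the deterministic detector never sees the seed, which matches the behaviour described above for detectors without random_state in their signature.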

analyze(state: InvestigationState) → InvestigationState

Analyze detection results with quality assessment.

Computes per-detector analysis, consensus analysis, quality metrics (separation, agreement, stability), and selects the best detector.

Parameters

state : InvestigationState

Returns

state : InvestigationState

analyze_results(result: dict, X: Any = None, top_k: int = 10) → dict

Analyze detection results.

Parameters

result (dict): Output of run_detection().
X (array-like or None): Original training data for feature-level analysis.
top_k (int): Number of top anomalies to return.

Returns

analysis : dict

build_detector(plan: dict) → Any

Build and return an unfitted detector from a plan.

Parameters

plan (DetectionPlan dict): Output of plan_detection().

Returns

detector : BaseDetector

compare_detectors(names: list[str] | None = None, data_type: str | None = None, top_k: int = 3) → list[dict]

Compare detectors.

When names is provided, returns explanations for those detectors in input order.

When names is omitted and data_type has a benchmark-backed ranking in the KB, returns up to top_k detectors ranked by that benchmark, then appends remaining shipped detectors in catalog order until top_k is reached. Two ranking sources are supported: top-level overall_top_5 for benchmarks whose names match PyOD detector names (currently tabular via ADBench); per-detector benchmark_rank metadata when the benchmark lists paper method names (currently time_series via TSB-AD, sorted ascending by the best matching rank key). For modalities without an applicable ranking (graph, text, image, multimodal) or when no data_type is given, falls back to the catalog order from list_detectors.
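The "ranked first, then fill from the catalog" behaviour can be sketched in a few lines. The function name rank_then_fill, the catalog order, and the benchmark ranking below are all made up for illustration; only the overall rule comes from the description above.

```python
def rank_then_fill(benchmark_ranked, catalog, top_k=3):
    """Take benchmark-ranked names first, then fill from the catalog order."""
    shipped = set(catalog)
    picked = [name for name in benchmark_ranked if name in shipped][:top_k]
    for name in catalog:
        if len(picked) >= top_k:
            break
        if name not in picked:
            picked.append(name)
    return picked


catalog = ["ABOD", "ECOD", "IForest", "KNN", "LOF"]  # made-up catalog order
ranking = ["IForest", "ECOD"]                        # made-up benchmark ranking

print(rank_then_fill(ranking, catalog, top_k=3))  # ['IForest', 'ECOD', 'ABOD']
print(rank_then_fill([], catalog, top_k=2))       # ['ABOD', 'ECOD']
```

With an empty ranking the result is simply the first top_k catalog entries, which corresponds to the fallback for modalities without an applicable benchmark.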

Parameters

names (list of str or None): Explicit list of detector names to compare.
data_type (str or None): Filter by data type.
top_k (int): Number of detectors to return when not using explicit names.

Returns

comparison : list of dict

contamination_diagnostics(state: InvestigationState, threshold_sweep: list[float] | None = None) → dict

Diagnostic helper for contamination calibration.

Reports the contamination value the run actually used, the actual flagged rate from the consensus, the score-percentile distribution, and (optionally) a threshold sweep showing what fraction would be flagged at each candidate contamination value. The agent can use these numbers to choose a sensible next contamination before iterating.

This helper does NOT estimate contamination automatically and does NOT mutate state. It is purely a read-only diagnostic that the agent uses to inform a subsequent engine.iterate(state, {'action': 'adjust_contamination', 'value': <rate>}) call.

Parameters

state (InvestigationState): Must be in the 'analyzed' phase.
threshold_sweep (list of float or None): Optional sequence of candidate contamination values in (0, 1). For each value c, the result includes the corresponding threshold (the (1 - c) quantile of consensus scores) and the resulting flagged rate. Use this to preview how the flagged set would change before deciding to iterate. Values outside (0, 1) are skipped.

Returns

diagnostics : dict

Keys:

effective_contamination (float or None): contamination value from the primary plan’s params, or None if the plan has no contamination set.
flagged_rate (float): actual fraction flagged by the consensus labels.
score_percentiles (dict[int, float]): consensus-score percentiles at the 50th, 75th, 90th, 95th, and 99th.
threshold_sweep (list of dict, optional): present only when threshold_sweep was passed; each entry has contamination, threshold, and flagged_rate.
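To make the threshold sweep concrete, here is a minimal standalone sketch of the arithmetic: for each candidate c, take the (1 - c) quantile of the scores as the threshold and count what fraction exceeds it. The helper names quantile and sweep are hypothetical, and both the linear-interpolation quantile and the strict "greater than" comparison are assumptions; the engine may use different conventions.

```python
def quantile(values, q):
    """Linear-interpolation quantile of values for q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def sweep(scores, candidates):
    """Preview threshold and flagged rate for each candidate contamination."""
    rows = []
    for c in candidates:
        if not 0.0 < c < 1.0:
            continue  # values outside (0, 1) are skipped
        threshold = quantile(scores, 1.0 - c)
        flagged = sum(s > threshold for s in scores)
        rows.append({
            "contamination": c,
            "threshold": threshold,
            "flagged_rate": flagged / len(scores),
        })
    return rows


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  # made-up scores
rows = sweep(scores, [0.1, 0.3, 1.5])
for row in rows:
    print(row)
```

Here the candidate 1.5 is dropped, and the two remaining rows flag roughly 10% and 30% of the ten samples, so the sweep acts as a preview of the flagged set before any iteration is committed.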

detect(X_train: Any, X_test: Any = None, data_type: str | None = None, priority: str = 'balanced') → dict

One-shot anomaly detection: profile → plan → run → analyze.

Parameters

X_train (array-like): Training data.
X_test (array-like or None): Optional test data.
data_type (str or None): Explicit data type override.
priority (str): 'speed', 'accuracy', or 'balanced'.

Returns

result (dict): Output of run_detection() enriched with analysis. Compatible with all Tier B methods (analyze_results, explain_findings, suggest_next_step, generate_report).

explain_detector(name: str) → dict

Explain a detector.

Parameters

name (str): Detector short name (e.g. 'ECOD').

Returns

info : dict

explain_findings(result: dict, indices: list[int] | None = None, top_k: int = 5, X: Any = None, feature_names: list[str] | None = None) → list[dict]

Explain why specific samples were flagged as anomalies.

Parameters

result (dict): Output of run_detection().
indices (list of int or None): Specific sample indices. If None, explains top-k.
top_k (int): Number of top anomalies to explain if indices is None.
X (array-like or None): Original data for feature-level explanations.
feature_names (list of str or None): Optional feature labels in column order, threaded through to feature_contributions so each contributing feature has a human-readable name. When omitted, names default to f'feature_{column_index}'.

Returns

explanations (list of dict): Each entry has 'index', 'score', 'percentile', 'label', 'narrative'. When X is provided, also includes 'contributing_features': a list of dicts with 'feature', 'name', 'value', 'mean', 'z_score', and 'direction'.
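The shape of a contributing_features entry can be illustrated with a standalone z-score sketch. Everything here is an assumption made for illustration: the function name, the use of the population standard deviation, the 'high'/'low' direction strings, and the reading of 'feature' as a column index. Only the key names and the feature_{column_index} default come from the description above.

```python
from statistics import mean, pstdev


def contributing_features(X, row, feature_names=None, top_n=2):
    """Rank the features of one row by absolute z-score against column stats."""
    n_features = len(X[0])
    names = feature_names or [f"feature_{j}" for j in range(n_features)]
    entries = []
    for j in range(n_features):
        column = [r[j] for r in X]
        mu, sigma = mean(column), pstdev(column)
        z = (X[row][j] - mu) / sigma if sigma > 0 else 0.0
        entries.append({
            "feature": j,
            "name": names[j],
            "value": X[row][j],
            "mean": mu,
            "z_score": z,
            "direction": "high" if z > 0 else "low",
        })
    ranked = sorted(entries, key=lambda e: abs(e["z_score"]), reverse=True)
    return ranked[:top_n]


X = [[1.0, 10.0], [2.0, 11.0], [3.0, 9.0], [2.0, 50.0]]  # made-up data
top = contributing_features(X, row=3, feature_names=["age", "amount"], top_n=1)
print(top[0]["name"], top[0]["direction"])  # amount high
```

Calling the same function without feature_names yields the fallback label feature_1 for the second column.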

generate_report(result: dict, analysis: dict, format: str = 'text') → str

Generate a summary report.

Parameters

result (dict): Output of run_detection().
analysis (dict): Output of analyze_results().
format (str): 'text' (markdown) or 'json'.

Returns

report : str

get_benchmarks(benchmark: str = 'all') → dict

Get benchmark results.

Parameters

benchmark (str): Benchmark name, or 'all' for everything.

Returns

benchmarks : dict

get_kb_for_routing(profile: dict, top_k: int = 3, constraints: dict | None = None) → dict

Return a structured KB snapshot for caller-driven detector selection.

This is the agent-facing companion to plan_detection(). plan_detection() consumes the KB through hand-coded rules and returns a single plan; get_kb_for_routing() exposes the KB directly so that a caller (LLM agent, MCP tool client, …) can reason over each detector’s strengths, weaknesses, complexity, and benchmark rank, then call make_plan() to commit a plan.

Parameters

profile (dict): Output of profile_data(). Must include data_type; n_samples / n_features are passed through unchanged.
top_k (int, default 3): The number of detectors the caller intends to select. The KB snapshot itself is returned in full (filtered + sorted); the field is included in the returned dict so the response-format hint can reference it.
constraints (dict or None, optional): {'exclude_detectors': list[str], 'data_type_strict': bool}. exclude_detectors is a hard filter. data_type_strict (default True) drops detectors whose KB data_types field does not include profile['data_type'].

Returns

dict: {'task_profile': {...}, 'available_detectors': [...], 'top_k_requested': int, 'response_format_hint': str, 'n_available': int}.

Notes

Pure function; no LLM calls, no state mutation.
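The two constraints can be pictured as a plain filter over KB entries. This is a sketch under stated assumptions, not the library's implementation: filter_kb is a hypothetical name, the entries are made up (including the HypotheticalTS detector), and only the constraint keys and the data_types field name come from the description above.

```python
def filter_kb(detectors, data_type, constraints=None):
    """Apply exclude_detectors and data_type_strict to a list of KB entries."""
    constraints = constraints or {}
    excluded = set(constraints.get("exclude_detectors", []))
    strict = constraints.get("data_type_strict", True)
    kept = []
    for entry in detectors:
        if entry["name"] in excluded:
            continue  # hard filter, applied regardless of data type
        if strict and data_type not in entry["data_types"]:
            continue  # strict mode drops detectors for other modalities
        kept.append(entry)
    return kept


kb = [  # made-up entries for illustration
    {"name": "ECOD", "data_types": ["tabular"]},
    {"name": "IForest", "data_types": ["tabular"]},
    {"name": "HypotheticalTS", "data_types": ["time_series"]},
]

kept = filter_kb(kb, "tabular", {"exclude_detectors": ["ECOD"]})
print([e["name"] for e in kept])  # ['IForest']
```

Because the function only reads its inputs and builds a new list, it has the same "pure function" character noted for get_kb_for_routing.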

investigate(X: Any, data_type: str | None = None, priority: str = 'balanced') → InvestigationState

One-shot investigation: start → plan → run → analyze.

Parameters

X (array-like): Input data.

data_type : str or None

priority : str

Returns

state : InvestigationState

iterate(state: InvestigationState, feedback: str | dict) → InvestigationState

Iterate based on feedback.

Structured dicts execute immediately. NL strings are parsed with confidence; ambiguous feedback triggers 'confirm_with_user'.

Most actions require phase 'analyzed'. The 'recover' action also accepts phase 'detected' so the agent can substitute failed detectors immediately after run() without first calling analyze().
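The phase rule can be expressed as a small lookup. This sketch is only an illustration: can_iterate and the two constants are hypothetical names, and treating every action other than 'recover' as requiring 'analyzed' is a simplification of "most actions".

```python
# Phases in which each structured action is accepted (illustrative only).
ALLOWED_PHASES = {
    "recover": {"detected", "analyzed"},
}
DEFAULT_PHASES = {"analyzed"}


def can_iterate(phase, feedback):
    """Return True if a structured feedback dict is allowed in this phase."""
    action = feedback["action"]
    return phase in ALLOWED_PHASES.get(action, DEFAULT_PHASES)


adjust = {"action": "adjust_contamination", "value": 0.05}
print(can_iterate("analyzed", adjust))                 # True
print(can_iterate("detected", adjust))                 # False
print(can_iterate("detected", {"action": "recover"}))  # True
```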

Parameters

state : InvestigationState

feedback : str or dict

Returns

state : InvestigationState

list_detectors(data_type: str | None = None, status: str = 'shipped') → list[dict]

List available detectors.

Parameters

data_type (str or None): Filter by data type (e.g. 'tabular', 'text').
status (str): Filter by status. Use 'all' to list everything.

Returns

detectors : list of dict

make_plan(detector_choices: list, justifications: list | None = None, params: list | None = None) → dict

Commit a caller-driven detector plan and return a DetectionPlan.

Companion to get_kb_for_routing(). The caller (LLM agent, rule engine, human script) selects len(detector_choices) detectors; this method validates the names against the KB, fills per-detector defaults, and packages the result as a pyod.utils._kb_router.make_plan()-shaped dict so that existing consumers (build_detector, run, downstream MCP clients) keep working unchanged.

Parameters

detector_choices (list of str): Ordered list of detector class names. detector_choices[0] is the primary; the rest become alternatives in plan order. Length must be >= 1. Names must match KB entries (case-sensitive) with status='shipped'; otherwise ValueError is raised.
justifications (list of str, optional): Parallel to detector_choices. One short sentence per choice. None is accepted and yields autogenerated reasons.
params (list of dict, optional): Parallel to detector_choices. Per-detector constructor kwargs. When None, KB defaults are used, overlaid with the engine’s contamination resolution.

Returns

dict: Closed-schema DetectionPlan: {'detector_name', 'params', 'reason', 'evidence', 'confidence', 'alternatives', 'note'}.

Raises

ValueError: If detector_choices is empty or any name is unknown / not status='shipped' in the KB.
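The validate-then-package flow can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: commit_plan is a hypothetical name, the KB dict layout (including the 'defaults' field) is made up, and the shape of each 'alternatives' entry and the placeholder values for 'evidence', 'confidence', and 'note' are guesses. Only the top-level key names come from the schema above.

```python
def commit_plan(choices, kb, justifications=None, params=None):
    """Validate detector names and package them as a plan-shaped dict."""
    if not choices:
        raise ValueError("detector_choices must contain at least one name")
    for name in choices:
        entry = kb.get(name)  # dict lookup, so matching is case-sensitive
        if entry is None or entry.get("status") != "shipped":
            raise ValueError(f"unknown or unshipped detector: {name!r}")

    def entry_for(i, name):
        reason = justifications[i] if justifications else f"{name} chosen by caller"
        chosen = params[i] if params else dict(kb[name].get("defaults", {}))
        return {"detector_name": name, "params": chosen, "reason": reason}

    entries = [entry_for(i, name) for i, name in enumerate(choices)]
    plan = dict(entries[0])  # the first choice is the primary detector
    plan.update({
        "evidence": [],
        "confidence": None,
        "alternatives": entries[1:],
        "note": "",
    })
    return plan


kb = {  # made-up KB entries
    "ECOD": {"status": "shipped", "defaults": {"contamination": 0.1}},
    "IForest": {"status": "shipped", "defaults": {"contamination": 0.1}},
}
plan = commit_plan(["IForest", "ECOD"], kb)
print(plan["detector_name"], [a["detector_name"] for a in plan["alternatives"]])
# IForest ['ECOD']
```

Passing a lower-case name such as "iforest" would raise ValueError in this sketch, mirroring the case-sensitive matching described above.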

plan(state: InvestigationState, priority: str = 'balanced', constraints: dict | None = None) → InvestigationState

Plan detection: select top-N detectors.

Wraps plan_detection() and extracts primary + alternatives into state.plans (up to 3 detectors, v1 limit).

Parameters

state : InvestigationState

priority : str

constraints : dict or None

Returns

state : InvestigationState

plan_detection(profile: dict, priority: str = 'balanced', constraints: dict | None = None, *, top_k: int = 3, llm_client=None, llm_strict: bool | None = None) → dict

Plan a detection pipeline.

Parameters

profile (dict): Output of profile_data().
priority (str): 'speed', 'accuracy', or 'balanced'.
constraints (dict or None): Optional, e.g. {'exclude_detectors': […]}.
top_k (int, default 3): Number of detectors in the returned plan (the primary plus top_k - 1 alternatives). The default of 3 preserves the v3.5.2 behaviour (valid[1:3] produced two alternatives plus the primary). Values < 1 are clamped to 1.
llm_client (callable or None, default None): Optional (prompt: str) -> str callable (see pyod.utils._llm.LLMCallable). When provided, routing consults the LLM with the KB context and parses its response into a plan via pyod.utils._llm.parse_routing_response(). If the LLM call or the parser raises, routing falls back to rule routing with a RuntimeWarning (see llm_strict). When None (default), v3.5.2 rule routing is unchanged.
llm_strict (bool or None, default None): Per-call control for the LLM-routing failure mode. True re-raises any exception from llm_client or the response parser; False falls back to rule routing with a RuntimeWarning; None defers to the PYOD3_LLM_STRICT environment variable ("1" re-raises, anything else falls back). The explicit kwarg takes precedence so concurrent callers in the same process can choose independently.
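The interaction between llm_client, llm_strict, and top_k can be sketched without PyOD. The functions resolve_strict and route are hypothetical, the prompt string is a placeholder, and the comma-separated parsing is a stand-in for the real response parser; only the precedence rule, the PYOD3_LLM_STRICT variable, the RuntimeWarning fallback, and the clamping of top_k come from the parameter descriptions above.

```python
import os
import warnings


def resolve_strict(llm_strict=None):
    """Explicit kwarg wins; None defers to the PYOD3_LLM_STRICT variable."""
    if llm_strict is not None:
        return bool(llm_strict)
    return os.environ.get("PYOD3_LLM_STRICT") == "1"


def route(rule_plan, llm_client=None, llm_strict=None, top_k=3):
    """Try LLM routing first, falling back to the rule-based plan on failure."""
    top_k = max(1, top_k)  # values < 1 are clamped to 1
    if llm_client is None:
        return rule_plan[:top_k]
    try:
        reply = llm_client("pick detectors")  # placeholder prompt
        names = [n.strip() for n in reply.split(",") if n.strip()]
        if not names:
            raise ValueError("empty LLM response")
        return names[:top_k]
    except Exception as exc:
        if resolve_strict(llm_strict):
            raise
        warnings.warn(f"LLM routing failed ({exc}); using rule routing", RuntimeWarning)
        return rule_plan[:top_k]


def broken_client(prompt):
    """Hypothetical client that always fails."""
    raise TimeoutError("no response")


rules = ["ECOD", "IForest", "KNN"]  # made-up rule-routing result
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = route(rules, llm_client=broken_client, llm_strict=False, top_k=0)
print(result)                        # ['ECOD']
print(caught[0].category.__name__)   # RuntimeWarning
```

Resolving strictness per call, instead of reading the environment variable once at import time, is what lets two callers in the same process pick different failure modes.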

Returns

plan : dict (DetectionPlan, closed schema)

profile_data(X: Any, data_type: str | None = None) → dict

Profile the input data.

Parameters

X (array-like, list, or dict): Input data.
data_type (str or None): Explicit override. One of 'tabular', 'text', 'image', 'audio', 'time_series', 'multimodal', 'graph'.

Returns

profile : dict

report(state: InvestigationState, format: str = 'text') → str | dict

Generate investigation report.

The text format wraps generate_report() for the best detector and prepends session-level context. The JSON format returns a native dict.

Parameters

state : InvestigationState

format (str): 'text' or 'json'.

Returns

report : str or dict

run(state: InvestigationState) → InvestigationState

Run detection with all planned detectors.

Wraps run_detection() per plan. Computes consensus via rank normalization and majority vote. Records errors per detector without stopping.
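A consensus built from rank normalization and a majority vote can be sketched in plain Python. This is one plausible reading, not the engine's actual code: the function names are hypothetical, averaging the normalized ranks across detectors is an assumption, and so are the tie handling and the strict-majority rule.

```python
def rank_normalize(scores):
    """Map scores to (0, 1] by rank; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared / len(scores)
        i = j + 1
    return ranks


def consensus(score_lists, label_lists):
    """Average rank-normalized scores and take a strict majority vote."""
    n = len(score_lists[0])
    normalized = [rank_normalize(s) for s in score_lists]
    mean_rank = [sum(col[i] for col in normalized) / len(normalized) for i in range(n)]
    votes = [sum(labels[i] for labels in label_lists) for i in range(n)]
    majority = [1 if 2 * v > len(label_lists) else 0 for v in votes]
    return mean_rank, majority


# Three made-up detectors scoring the same three samples on different scales.
scores = [[0.1, 0.9, 0.2], [5.0, 40.0, 30.0], [1.0, 3.0, 2.0]]
labels = [[0, 1, 0], [0, 1, 1], [0, 1, 0]]
mean_rank, majority = consensus(scores, labels)
print(majority)  # [0, 1, 0]
```

Rank normalization is what makes the three score scales comparable: each detector contributes only the ordering of its samples, not the raw magnitudes.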

Parameters

state : InvestigationState

Returns

state : InvestigationState

run_detection(X_train: Any, plan: dict, X_test: Any = None) → dict

Execute a detection plan.

Parameters

X_train (array-like): Training data.
plan (DetectionPlan dict): Output of plan_detection().
X_test (array-like or None): Optional test data.

Returns

result (dict): Keys: 'plan', 'scores_train', 'labels_train', 'threshold', 'n_anomalies', 'anomaly_ratio', 'detector', 'runtime_seconds', 'score_summary'. If X_test is given, also 'scores_test' and 'labels_test'.

start(X: Any, data_type: str | None = None) → InvestigationState

Start an investigation session.

Profiles the data and returns an InvestigationState.

Parameters

X (array-like, Data, list, or dict): Input data (any modality).
data_type (str or None): Explicit type override.

Returns

state : InvestigationState

suggest_next_step(result: dict, analysis: dict, feedback: str | None = None) → dict

Suggest what to try next.

Parameters

result (dict): Output of run_detection().
analysis (dict): Output of analyze_results().
feedback (str or None): User feedback like 'too many false positives'.

Returns

suggestion (dict): Keys: 'action', 'reason', optionally 'new_plan', 'threshold_adjustment'.

validate(state: InvestigationState, y: Any) → dict

Hindsight validation of consensus and per-detector results.

Computes label-based metrics from y against the consensus labels and each successful detector, plus a consensus-vs-best-detector diagnostic so the agent can see whether consensus actually helped.

Pure functional; does not mutate state. Use after analyze when held-out labels become available (e.g., a labeled cohort opened post-hoc for hindsight evaluation). For routine unsupervised detection runs, this method is unnecessary.

Parameters

state (InvestigationState): Must be in the 'analyzed' phase.
y (array-like, shape (n_samples,)): Held-out binary labels (0 = inlier, 1 = anomaly). Length must match the consensus.

Returns

validation : dict

Keys:

consensus (dict): label_metrics for the consensus labels and scores.
per_detector (dict[str, dict]): label_metrics per successful detector, keyed by detector name.
best_detector (dict or None): label_metrics for the detector picked by analyze as best (or None when state.analysis does not name one).
consensus_vs_best (dict): comparison summary with keys consensus_f1, best_detector_f1 (or None), and consensus_helped (True if consensus F1 is at least the best-detector F1; None when no best detector).
false_positives (list[int]): row indices flagged by consensus but inlier in y.
false_negatives (list[int]): row indices not flagged by consensus but anomaly in y.

Raises

ValueError: If state is not in the 'analyzed' phase, if the consensus is missing (all detectors failed), or if len(y) does not match the consensus length.
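The consensus-versus-best comparison and the false-positive and false-negative index lists can be reproduced by hand. The sketch below is standalone and illustrative: f1 and hindsight are hypothetical helpers, the labels are made up, and returning 0.0 when F1 is undefined is an assumption. The key names and the "at least the best-detector F1" rule follow the description above.

```python
def f1(y_true, y_pred):
    """F1 score for binary labels (1 = anomaly); 0.0 when there are no hits."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def hindsight(y, consensus_labels, best_labels=None):
    """Compare consensus labels (and optionally the best detector) against y."""
    if len(y) != len(consensus_labels):
        raise ValueError("len(y) does not match the consensus length")
    consensus_f1 = f1(y, consensus_labels)
    best_f1 = f1(y, best_labels) if best_labels is not None else None
    pairs = list(enumerate(zip(y, consensus_labels)))
    return {
        "consensus_vs_best": {
            "consensus_f1": consensus_f1,
            "best_detector_f1": best_f1,
            "consensus_helped": None if best_f1 is None else consensus_f1 >= best_f1,
        },
        "false_positives": [i for i, (t, p) in pairs if t == 0 and p == 1],
        "false_negatives": [i for i, (t, p) in pairs if t == 1 and p == 0],
    }


y = [0, 0, 1, 1, 0]                  # made-up held-out labels
consensus_labels = [0, 1, 1, 0, 0]   # made-up consensus output
best_labels = [0, 1, 1, 1, 0]        # made-up best-detector output
out = hindsight(y, consensus_labels, best_labels)
print(out["false_positives"], out["false_negatives"])  # [1] [3]
print(out["consensus_vs_best"]["consensus_helped"])    # False
```

In this toy case the consensus reaches an F1 of 0.5 against 0.8 for the single best detector, so consensus_helped is False.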

pyod.utils.investigation module

Investigation state for ADEngine session workflow.

class pyod.utils.investigation.InvestigationState(phase: str, iteration: int = 0, history: list = <factory>, data: object = None, profile: dict = <factory>, plans: list = <factory>, results: list = <factory>, consensus: dict = None, analysis: dict = None, quality: dict = None, next_action: dict = <factory>)

Bases: object

Typed state object for an ADEngine investigation session.

Tracks the full workflow: profiling, planning, detection, analysis, and iteration. Each session method updates the state and sets next_action to guide the agent.

Attributes

phase (str): One of PHASES: 'profiled', 'planned', 'detected', 'analyzed'.
iteration (int): Current iteration (0 = first run).
history (list): List of HistoryEntry dicts.
data (object): Reference to input data (not copied).
profile (dict): Output of profile_data().
plans (list): List of DetectionPlan dicts (top-N).
results (list): List of DetectorResult dicts.
consensus (dict or None): ConsensusResult dict.
analysis (dict or None): InvestigationAnalysis dict.
quality (dict or None): QualityAssessment dict.
next_action (dict): NextAction dict guiding the agent.

analysis: dict = None

consensus: dict = None

data: object = None

history: list

iteration: int = 0

next_action: dict

phase: str

plans: list

profile: dict

quality: dict = None

results: list
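The <factory> defaults in the class signature are how a dataclass field declared with default_factory is displayed: each instance gets its own fresh list or dict. A reduced stand-in shows the effect; SessionState is a hypothetical class that mirrors only a few of the fields, and the history entry is made up.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Hypothetical stand-in mirroring a few InvestigationState fields."""

    phase: str
    iteration: int = 0
    history: list = field(default_factory=list)  # fresh list per instance
    plans: list = field(default_factory=list)
    consensus: dict = None


a = SessionState(phase="profiled")
b = SessionState(phase="profiled")
a.history.append({"action": "start"})  # made-up history entry
print(a.history)  # [{'action': 'start'}]
print(b.history)  # []
```

Because the containers are created per instance, two sessions never share history or plans by accident.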

pyod.utils.knowledge module

Knowledge base for PyOD’s intelligent agent layer.

Loads structured JSON files containing algorithm metadata, benchmark results, routing rules, and paper citations.

class pyod.utils.knowledge.KnowledgeBase(knowledge_dir=None)

Bases: object

Loader and accessor for PyOD’s structured knowledge base.

Reads JSON files from the knowledge directory and provides query methods for algorithm metadata, benchmarks, and routing.

Parameters

knowledge_dir (str or None): Path to knowledge directory. If None, uses the bundled directory shipped with PyOD.

property algorithms

property benchmarks

get_algorithm(name): Get algorithm metadata by name. Returns None if not found.

list_by_data_type(data_type, status='shipped'): List algorithms supporting a given data type.

list_by_status(status): List algorithms with a given status.

property papers

property routing_rules