Model Save and Load
===================

PyOD ships a small, versioned wrapper around ``joblib`` that solves two
recurring pain points: cross-sklearn-version compatibility for saved models,
and the absence of any record of *what* a saved model was fit with. The
recommended API lives in :mod:`pyod.utils.persistence`.

Quick Start
-----------

.. code-block:: python

    from pyod.models.iforest import IForest
    from pyod.utils.persistence import save, load

    clf = IForest().fit(X_train)

    # Save with a versioned envelope.
    save(clf, "clf.pyod.joblib", metadata={"dataset": "demo"})

    # Later, in a possibly different environment:
    clf = load("clf.pyod.joblib")

    # Or get the envelope back alongside the model:
    clf, env = load("clf.pyod.joblib", return_metadata=True)
    print(env["sklearn_version"], env["saved_at"])

The complete example in
`examples/save_load_model_example.py `_
also covers the legacy ``joblib.dump`` / ``joblib.load`` flow as a secondary
alternative.

Trust Boundary
--------------

``pickle`` and ``joblib`` deserialize arbitrary Python code. Load only from
sources you trust. This applies equally to raw ``joblib.load``, raw
``pickle.load``, :func:`~pyod.utils.persistence.load`, and
:func:`~pyod.utils.persistence.compat_load`. The new wrapper does not change
this security model; it does not sandbox the unpickling step.

Why a Versioned Wrapper
-----------------------

Saving a fitted detector with plain ``joblib.dump`` writes the model and
nothing else. When a downstream user later calls ``joblib.load``, the running
environment's sklearn, numpy, scipy, joblib, and Python versions may differ
from the save environment in ways that change predictions or break loading
outright. Users on PyOD have reported this exact failure mode (see issue
`#519 `_) when sklearn evolves its internal pickle layout; the error message
is ``ValueError: node array from the pickle has an incompatible dtype``.

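
One way to close that gap is to store the dependency versions next to the
model and compare them at load time. The block below is a minimal
standard-library sketch of that idea, not PyOD's implementation: the helper
names (``installed_versions``, ``save_with_versions``,
``load_with_version_check``), the ``TRACKED`` tuple, and the envelope keys are
all hypothetical, and plain ``pickle`` stands in for ``joblib``.

.. code-block:: python

    import pickle
    import platform
    import warnings
    from importlib import metadata

    # Hypothetical list of distributions to track; not PyOD's actual schema.
    TRACKED = ("scikit-learn", "numpy", "scipy", "joblib")


    def installed_versions(packages=TRACKED):
        """Return {name: version or None} for the running environment."""
        versions = {"python": platform.python_version()}
        for name in packages:
            try:
                versions[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                versions[name] = None  # distribution is not installed here
        return versions


    def save_with_versions(model, path):
        """Pickle the model together with the versions in effect right now."""
        envelope = {"model": model, "versions": installed_versions()}
        with open(path, "wb") as fh:
            pickle.dump(envelope, fh)


    def load_with_version_check(path):
        """Unpickle (trusted files only) and warn if a recorded version drifted."""
        with open(path, "rb") as fh:
            envelope = pickle.load(fh)
        current = installed_versions()
        drifted = {
            name: (saved, current.get(name))
            for name, saved in envelope["versions"].items()
            if current.get(name) != saved
        }
        if drifted:
            warnings.warn(f"dependency drift since save: {drifted}", UserWarning)
        return envelope["model"]

The sketch only detects drift; it does nothing to repair an artifact whose
binary layout has changed.
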
:func:`~pyod.utils.persistence.save` records the dependency versions in effect
at save time alongside the model. :func:`~pyod.utils.persistence.load` reads
that envelope and emits a clear warning when any binary-format dependency
drifts, so the issue surfaces at load time rather than during a later
prediction incident. The schema is documented and stable; future PyOD releases
will read envelopes written by earlier ones.

Loading Legacy Pickles
----------------------

If you already have artifacts saved with raw ``joblib.dump`` and they fail to
load with the dtype-mismatch error,
:func:`~pyod.utils.persistence.compat_load` repairs the most common case:
sklearn introduced a new Tree-node field (``missing_go_to_left`` in 1.3) and
old pickles do not carry it. ``compat_load`` patches joblib's unpickler so the
saved Tree state is realigned to the running sklearn's dtype before sklearn's
own ``__setstate__`` raises.

.. code-block:: python

    from pyod.utils.persistence import compat_load

    clf = compat_load("legacy.joblib")

    # Re-save under the new envelope to avoid repeating the dance:
    from pyod.utils.persistence import save
    save(clf, "legacy_resaved.pyod.joblib")

You usually do not need to call ``compat_load`` directly.
:func:`~pyod.utils.persistence.load` falls through to ``compat_load``
automatically when ``joblib.load`` raises the documented dtype error, and
routes the recovered model through the same envelope or legacy handler:

.. code-block:: python

    from pyod.utils.persistence import load

    clf = load("legacy.joblib")  # transparently recovers from dtype drift

The fall-through emits a ``UserWarning`` so the recovery does not go
unnoticed. Re-save with :func:`~pyod.utils.persistence.save` (or re-fit on the
current sklearn) to remove the dependency on the compat path.

Decision Tree
~~~~~~~~~~~~~

::

    Saving a new model?
        -> use save(clf, path)

    Loading a model and load(path) works without warnings?
        -> done

    Loading a model and load(path) succeeds with a "recovered" warning?
        -> the artifact was repaired via compat_load; re-save with save()

    Loading a model and load(path) raises?
        -> if the error is about Tree-node dtype, try compat_load directly
           and check whether the warning recommends re-fit. If it cannot
           recover, re-fit on the current sklearn.

Cross-Sklearn-Version Compatibility
-----------------------------------

The most common cross-version failure is the sklearn Tree node dtype evolving
across minor releases. sklearn 1.3 added a ``missing_go_to_left`` field to its
Tree node struct; older pickles omit that field, and loading them on 1.3 or
later raises
``ValueError: node array from the pickle has an incompatible dtype``.

:func:`~pyod.utils.persistence.compat_load` is the supported escape hatch for
this case. It is allowlist-driven and conservative:

* Missing fields in the saved dtype that PyOD has documented a safe default
  for (currently only ``missing_go_to_left = 0``, the pre-1.3 "do not route on
  missingness" behavior) are zero-filled.
* Missing fields without a documented default raise ``ValueError`` rather than
  silently inventing a value.
* Field-level dtype changes beyond byte order (kind, signedness, itemsize,
  shape) raise ``ValueError`` rather than silently casting.
* Byte-order-only differences are realigned safely.

Two caveats apply. First, ``compat_load`` is best-effort: predictions on
inputs that contain missing values may differ from what the original training
would have produced, because zero-filled defaults for fields like
``missing_go_to_left`` need not match what the original training would have
implied. The durable fix is to re-fit on the current sklearn.

Second, ``compat_load`` only repairs the Tree node dtype. Other cross-version
sklearn changes (newly required private cached state, newly added class
attributes) are out of scope. If ``compat_load`` succeeds but predictions
still fail with a different sklearn error, re-fit on the current sklearn.

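
The four allowlist rules above can be read as a per-field decision
procedure. The following pure-Python sketch illustrates that policy only; it
is not how ``compat_load`` is implemented. The ``Field`` descriptor, the
``plan_realignment`` helper, and the action strings are hypothetical, and the
field descriptors are simplified stand-ins for real numpy dtypes.

.. code-block:: python

    from typing import Dict, NamedTuple, Tuple


    class Field(NamedTuple):
        """Simplified, hypothetical description of one struct field."""
        kind: str                     # 'i' signed int, 'u' unsigned int, 'f' float
        itemsize: int                 # bytes per element
        byteorder: str                # '<' little-endian, '>' big-endian
        shape: Tuple[int, ...] = ()   # sub-array shape, () for scalars


    # Fields that may be absent from an old artifact, with the value to fill in.
    # Only this single entry is documented in the text above.
    SAFE_DEFAULTS = {"missing_go_to_left": 0}


    def plan_realignment(saved: Dict[str, Field],
                         expected: Dict[str, Field]) -> Dict[str, str]:
        """Map each expected field to 'copy', 'byteswap', or 'fill=<default>'."""
        plan = {}
        for name, want in expected.items():
            have = saved.get(name)
            if have is None:
                if name not in SAFE_DEFAULTS:
                    raise ValueError(f"no documented default for missing field {name!r}")
                plan[name] = f"fill={SAFE_DEFAULTS[name]}"
            elif have._replace(byteorder=want.byteorder) != want:
                # Kind, itemsize, or shape differs: refuse to cast silently.
                raise ValueError(f"incompatible dtype change in field {name!r}")
            elif have.byteorder != want.byteorder:
                plan[name] = "byteswap"
            else:
                plan[name] = "copy"
        return plan


    # Usage: an old layout that lacks the field added in sklearn 1.3.
    old = {"threshold": Field("f", 8, "<"), "n_node_samples": Field("i", 8, "<")}
    new = dict(old, missing_go_to_left=Field("u", 1, "<"))
    print(plan_realignment(old, new))

Raising on anything outside the allowlist keeps the failure at load time,
where it is easy to attribute, instead of letting a silent cast change
predictions later.
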
Troubleshooting
~~~~~~~~~~~~~~~

==================================================================  ==================================================================
Error text starts with                                              Recommended action
==================================================================  ==================================================================
``node array from the pickle has an incompatible dtype``            Try :func:`~pyod.utils.persistence.compat_load`. If it succeeds, re-save with :func:`~pyod.utils.persistence.save`. If it raises, re-fit.
``InconsistentVersionWarning`` (only a warning, not an error)       Safe to ignore; sklearn is reminding you the save and run versions differ. Re-save or re-fit when convenient.
Other sklearn unpickling errors                                     The artifact is incompatible beyond what ``compat_load`` repairs. Re-fit on the current sklearn.
==================================================================  ==================================================================

Strict Mode
-----------

For version-pinned production environments, pass ``strict=True`` to
:func:`~pyod.utils.persistence.load`:

.. code-block:: python

    from pyod.utils.persistence import load

    clf = load("prod.pyod.joblib", strict=True)

Under strict mode, any drift in sklearn, joblib, numpy, or scipy raises
``ValueError`` rather than emitting a warning. Drift in the Python version
does not raise because it is informational only. Strict mode also rejects raw
legacy artifacts (no envelope to compare against) and refuses to return a
model that required a ``compat_load`` repair: strict callers must either
re-save under the current environment or re-fit.

Reading Envelope Metadata
-------------------------

``load(path, return_metadata=True)`` returns a ``(model, envelope)`` tuple
where ``envelope`` is the full envelope dict minus the ``model`` field:

.. code-block:: python

    from pyod.utils.persistence import load

    clf, env = load("clf.pyod.joblib", return_metadata=True)
    print(env["pyod_version"], env["sklearn_version"])
    print(env["saved_at"], env["model_class"])
    print(env["metadata"])  # whatever you passed to save(... metadata=...)

A future PyOD release plans a true header-only ``inspect_artifact`` (reading
metadata without unpickling the model), paired with a ``.pyod`` zip container
that separates metadata from the model payload. Until that ships,
``load(..., return_metadata=True)`` is the supported way to introspect a saved
artifact, and it does unpickle the model.

Neural Network Models
---------------------

Saving deep-learning detectors that wrap ``torch.nn.Module`` (e.g.,
``AutoEncoder``, ``DeepSVDD``, ``VAE``) has separate constraints that this
module does not yet address; see issues `#88 `_ and `#328 `_ for the current
workaround.