Experiment tracking

This guide covers logging a Repository revision as a tracked artifact with provenance, for Weights & Biases and MLflow, and the reproducibility that follows from the revision id.

log_revision records the revision itself, not a copy of the data. A logged run pins the exact commit (or tag) and the vendored lexicon manifest hash the records were generated against, so the dataset behind a run can always be rebuilt. The backend libraries are imported lazily, so importing lairs.integrations.tracking never pulls in wandb or mlflow.

Logging a revision

from lairs.integrations.tracking import log_revision
from lairs.store.repository import Repository

repo = Repository.open("/path/to/repo")
artifact = log_revision(repo.inner, "v1", backend="wandb")
print(artifact)                          # "lairs-revision:v1"

log_revision takes the underlying didactic repository handle, which the lairs Repository wrapper exposes as repo.inner. backend is a keyword-only argument and is "wandb" or "mlflow". The function assembles a ProvenanceBundle and returns the tracked artifact identifier as "<name>:<revision>".

An unrecognized backend raises ValueError. A backend whose optional dependency is missing raises ImportError directing the caller to install lairs[tracking]. The probe runs before any logging, so the error is raised only when that backend is used.

What the provenance pins

The ProvenanceBundle carries:

The two manifest fields are read from the vendored manifest packaged with lairs, so a logged run always records the schema version its records were generated against, independent of the caller.

Reproducibility from the revision id

The revision id is the reproducibility anchor. A tag pins an exact, byte-reproducible set of record values in the Repository, so rebuilding the dataset behind a run is: open the same Repository, resolve the logged revision, and materialize. Pairing the revision with the lexicon manifest hash also pins the schema, so the generated models match the records.

from lairs.store.repository import Repository

repo = Repository.open("/path/to/repo")
repo.resolve("v1")                       # the logged revision resolves to a commit

Requirements

log_revision needs the lairs[tracking] extra (wandb, mlflow) for the backend used. Installing only one backend's library is enough to log to that backend, and the other is probed only when selected.

See also