Store¶
The store holds records and their derived views: an in-memory pool with cross-reference resolution, a schematic version-control repository, the rebuildable Arrow views, and a content-addressed blob cache.
Pool¶
An in-memory pool keyed by AT-URI, resolving cross-refs and back-refs over the loaded record set.
lairs.store.pool ¶
In-memory model pool with cross-reference resolution.
Maps AT-URIs to model instances and resolves cross-refs and back-refs, built
on top of didactic.api.ModelPool and didactic's back-ref machinery.
The pool is the default surface for "load a corpus and work with it now". It
keeps every loaded record indexed by its AT-URI, resolves the AT-URI strings
that records carry as cross-references back to model instances, and answers
back-reference queries (which records point at a given target). Resolution
degrades gracefully: when a referenced AT-URI is not present in the pool the
string is preserved and resolve reports the absence rather than raising on
a missing target.
ModelPool ¶
An in-memory pool of records keyed by AT-URI.
Wraps a didactic.api.ModelPool (which indexes instances by their
concrete class) with an AT-URI index, so records can be looked up by their
canonical address and their cross-references resolved.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| inner | The underlying class-indexed pool that backs |
uris ¶
Return the AT-URIs of every record in the pool.
| RETURNS | DESCRIPTION |
|---|---|
| list of str | The AT-URIs, in insertion order. |
models ¶
Return every model instance held in the pool.
| RETURNS | DESCRIPTION |
|---|---|
| list of didactic.api.Model | The models, in insertion order. |
add ¶
Add a model to the pool under its AT-URI.
Adding the same AT-URI twice replaces the earlier instance in the AT-URI index. The model is also registered with the underlying class-indexed didactic pool.
| PARAMETER | DESCRIPTION |
|---|---|
| uri | The AT-URI of the record. |
| model | The decoded model instance. |
get ¶
Return the model stored under uri, or None if absent.
| PARAMETER | DESCRIPTION |
|---|---|
| uri | The AT-URI to look up. |

| RETURNS | DESCRIPTION |
|---|---|
| Model or None | The model, or None if absent. |
resolve ¶
Resolve a reference to its target model.
A reference is an AT-URI string (the same form held by dx.Ref fields
and free AT-URI fields). Resolution degrades gracefully: when the target
is not loaded, None is returned and the caller may keep using the
AT-URI string.
| PARAMETER | DESCRIPTION |
|---|---|
| ref | The AT-URI to resolve. |

| RETURNS | DESCRIPTION |
|---|---|
| Model or None | The target model, or None when the target is not loaded. |
refs_of ¶
Return the AT-URIs that the record at uri points at.
Walks the dumped record generically, collecting every AT-URI-shaped string nested anywhere in its fields (including embedded objects, arrays, and union members). Duplicate targets are reported once, in first-seen order, and a record never lists itself.
| PARAMETER | DESCRIPTION |
|---|---|
| uri | The AT-URI of the referring record. |

| RETURNS | DESCRIPTION |
|---|---|
| list of str | The distinct AT-URIs referenced by the record, or an empty list when the record is absent. |
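The generic walk that refs_of describes can be sketched without didactic. This is a minimal stand-in, not the lairs implementation: it assumes records are already dumped to plain dicts and lists, and that "AT-URI-shaped" means a string starting with at://. The names iter_at_uris, the module-level refs_of function, and the example URIs are hypothetical.

```python
from collections.abc import Iterator


def iter_at_uris(value: object) -> Iterator[str]:
    """Yield every AT-URI-shaped string nested anywhere in a JSON-like value."""
    if isinstance(value, str):
        # Assumption: an AT-URI is recognised purely by its at:// prefix.
        if value.startswith("at://"):
            yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_at_uris(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_at_uris(item)


def refs_of(records: dict[str, object], uri: str) -> list[str]:
    """Return the distinct targets of the record at uri, in first-seen order."""
    record = records.get(uri)
    if record is None:
        return []  # an absent record has no outgoing references
    seen: dict[str, None] = {}  # a dict keeps insertion order and drops duplicates
    for target in iter_at_uris(record):
        if target != uri:  # a record never lists itself
            seen.setdefault(target, None)
    return list(seen)


# Hypothetical dumped record with a duplicate target and a self-reference.
example_pool = {
    "at://did:plc:a/x.layer/1": {
        "expression": "at://did:plc:a/x.expr/1",
        "items": [
            {"target": "at://did:plc:a/x.expr/2"},
            {"target": "at://did:plc:a/x.expr/1"},
        ],
        "self": "at://did:plc:a/x.layer/1",
        "note": "plain text is ignored",
    }
}
example_refs = refs_of(example_pool, "at://did:plc:a/x.layer/1")
```

Keeping the walk generic means a new record type needs no per-field registration, at the cost of treating any at:// string as a reference.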
backrefs ¶
List the models that reference a target.
Scans every record in the pool and returns those that hold the target AT-URI anywhere in their fields. A target that is itself absent from the pool still resolves its back-references, so inbound links to a not-yet-loaded record are discoverable.
| PARAMETER | DESCRIPTION |
|---|---|
| target | The AT-URI of the referenced record. |

| RETURNS | DESCRIPTION |
|---|---|
| list of didactic.api.Model | The referring models, in insertion order. |
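The scan can be illustrated with a small sketch over plain dicts. It is an assumption-laden stand-in rather than the lairs code: records are taken to be already dumped to JSON-like containers, the helper names mentions and scan_backrefs are hypothetical, and skipping a record that mentions its own AT-URI is an assumption chosen to mirror refs_of.

```python
def mentions(value: object, target: str) -> bool:
    """Return True if target appears as a string anywhere inside value."""
    if isinstance(value, str):
        return value == target
    if isinstance(value, dict):
        return any(mentions(item, target) for item in value.values())
    if isinstance(value, (list, tuple)):
        return any(mentions(item, target) for item in value)
    return False


def scan_backrefs(records: dict[str, object], target: str) -> list[str]:
    """List the AT-URIs of records that hold target, in insertion order."""
    return [
        uri
        for uri, record in records.items()
        # Assumption: a record pointing at itself is not its own back-reference.
        if uri != target and mentions(record, target)
    ]


# The target at://x/c/9 is deliberately not loaded in this hypothetical pool.
example_records = {
    "at://x/c/1": {"ref": "at://x/c/9"},
    "at://x/c/2": {"other": "at://x/c/3"},
    "at://x/c/3": {"nested": [{"ref": "at://x/c/9"}]},
}
inbound = scan_backrefs(example_records, "at://x/c/9")
```

Because the scan only inspects the referring records, it never needs the target itself to be present, which is what makes inbound links to a not-yet-loaded record discoverable.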
backref_uris ¶
List the AT-URIs of the records that reference a target.
| PARAMETER | DESCRIPTION |
|---|---|
| target | The AT-URI of the referenced record. |

| RETURNS | DESCRIPTION |
|---|---|
| list of str | The AT-URIs of the referring records, in insertion order. |
Repository¶
A wrapper over didactic.api.Repository for Layers records: a corpus
snapshot is a commit and a named dataset version is a tag.
lairs.store.repository ¶
didactic Repository wrapper: the on-disk schematic version control.
Wraps didactic.api.Repository (panproto-backed, content-addressed,
git-like) so a corpus snapshot is a commit and a named dataset version is a tag,
giving reproducibility, provenance, and cheap diffing.
add stages a Model class (or a panproto.Schema) and records the
structural schema, while add_data stages a record's value as committed data
keyed by AT-URI and associated with that schema. lairs writes each record's
value as JSON under records/, stages it as committed data under its AT-URI,
and stages the record type's Model schema alongside it, so one commit captures
both. A corpus snapshot is a single commit; the values committed at a revision
are read back through data_at under their AT-URI keys, so a tag pins an
exact, byte-reproducible set of values, and a revision-to-revision diff compares
the values folded over each revision's commit ancestry.
Removal is representable: forget stages a tombstone under a record's AT-URI,
which the ancestry fold drops, so a record present at a base revision can be
absent at a descendant head and a diff reports it in removed.
didactic 0.9.0 exposes tag creation (create_tag and friends), the
committed-data write (add_data), and the committed-data read (data_at)
on the public Repository surface, all of which this wrapper uses directly.
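The ancestry fold described above can be sketched in a few lines. This is an illustration under stated assumptions, not the wrapper's code: each commit is modelled as a plain dict of the data it staged, the ancestry is a linear list ordered oldest first, and TOMBSTONE is a hypothetical in-memory sentinel standing in for whatever tombstone value the repository actually commits.

```python
TOMBSTONE = object()  # hypothetical marker for "this record was forgotten"


def state_at(ancestry: list[dict[str, object]]) -> dict[str, object]:
    """Fold per-commit data, oldest first, into the value set at the last commit."""
    state: dict[str, object] = {}
    for committed in ancestry:
        for uri, value in committed.items():
            if value is TOMBSTONE:
                state.pop(uri, None)  # the fold drops tombstoned records
            else:
                state[uri] = value  # a later commit overrides an earlier value
    return state


# Two hypothetical commits: the second forgets "at://x/c/b" and adds "at://x/c/c".
history = [
    {"at://x/c/a": '{"v": 1}', "at://x/c/b": '{"v": 2}'},
    {"at://x/c/b": TOMBSTONE, "at://x/c/c": '{"v": 3}'},
]
base_state = state_at(history[:1])
head_state = state_at(history)
```

Here the record at://x/c/b is present in base_state and absent from head_state, which is the situation a revision-to-revision diff reports under removed.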
RecordDiff ¶
Bases: Model
A structural diff of the record set between two revisions.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| added | AT-URIs present at the head revision but not the base revision. |
| removed | AT-URIs present at the base revision but not the head revision. |
| changed | AT-URIs present in both revisions whose stored value differs. |
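The three attributes amount to a set comparison over two AT-URI-keyed value sets. The sketch below assumes each revision has already been reconstructed into a dict of raw bytes; RecordDiffSketch and diff_states are hypothetical names, and the sorted output order is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordDiffSketch:
    """Hypothetical stand-in for RecordDiff, holding plain lists of AT-URIs."""

    added: list[str]
    removed: list[str]
    changed: list[str]


def diff_states(base: dict[str, bytes], head: dict[str, bytes]) -> RecordDiffSketch:
    """Compare two AT-URI-keyed value sets by content."""
    shared = base.keys() & head.keys()
    return RecordDiffSketch(
        added=sorted(head.keys() - base.keys()),
        removed=sorted(base.keys() - head.keys()),
        changed=sorted(uri for uri in shared if base[uri] != head[uri]),
    )


example_diff = diff_states(
    base={"at://x/c/a": b"1", "at://x/c/b": b"2", "at://x/c/c": b"3"},
    head={"at://x/c/a": b"1", "at://x/c/b": b"20", "at://x/c/d": b"4"},
)
```

In this example at://x/c/d is added, at://x/c/c is removed, at://x/c/b is changed, and the unchanged at://x/c/a appears in none of the three lists.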
Repository ¶
A wrapper over a didactic Repository for Layers records.
The repository is created on disk with init or reopened with
open; the constructor takes an already-open inner handle and is
mostly for internal use.
| PARAMETER | DESCRIPTION |
|---|---|
| inner | An already-open didactic repository handle. |

| ATTRIBUTE | DESCRIPTION |
|---|---|
| inner | The wrapped didactic repository. |
| path | The repository working directory. |
init classmethod ¶
init(path: Path) -> Repository
Initialise a new repository at path.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Directory in which to create the repository. |

| RETURNS | DESCRIPTION |
|---|---|
| Repository | A handle to the newly initialised repository. |
open classmethod ¶
open(path: Path) -> Repository
Open an existing repository at path.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Directory containing an existing repository. |

| RETURNS | DESCRIPTION |
|---|---|
| Repository | A handle to the existing repository. |
save ¶
Stage a record value and its schema for the next commit.
The record value is written as JSON into the working tree, indexed by its AT-URI, and staged as committed data so a revision pins the exact value. The record type's Model schema is staged alongside it, so the commit captures both the data and its structure.
| PARAMETER | DESCRIPTION |
|---|---|
| uri | The AT-URI of the record. |
| model | The record value to persist. |
forget ¶
Stage a record's removal for the next commit.
Removes the AT-URI from the working-tree index and its JSON file, and
stages a tombstone as committed data under the same key. The tombstone
is folded by _state_at, so the record is absent from the
reconstructed value set at the committed revision and a
revision-to-revision diff reports it in removed.
Staging a tombstone requires a schema to bind to, so the repository must already have at least one commit (the record's type schema was committed when it was first saved).
| PARAMETER | DESCRIPTION |
|---|---|
| uri | The AT-URI of the record to remove. |

| RAISES | DESCRIPTION |
|---|---|
| KeyError | If the AT-URI is not present in the working tree. |
staged_uris ¶
Return the AT-URIs currently present in the working tree.
| RETURNS | DESCRIPTION |
|---|---|
| list of str | The AT-URIs of records written to the working tree, sorted. |
load ¶
Load a record value from the working tree by AT-URI.
| PARAMETER | DESCRIPTION |
|---|---|
| uri | The AT-URI of the record to load. |
| model_cls | The Model class to validate the stored JSON against. |

| RETURNS | DESCRIPTION |
|---|---|
| Model or None | The validated model, or None. |
load_raw ¶
load_raw(uri: str) -> JsonValue | None
Load a stored record value as raw JSON by AT-URI.
| PARAMETER | DESCRIPTION |
|---|---|
| uri | The AT-URI of the record to load. |

| RETURNS | DESCRIPTION |
|---|---|
| JsonValue or None | The decoded JSON value, or None. |
commit ¶
Commit the staged records as a corpus snapshot.
| PARAMETER | DESCRIPTION |
|---|---|
| message | The commit message. |
| author | The commit author, in conventional |

| RETURNS | DESCRIPTION |
|---|---|
| str | The new revision identifier. |
head ¶
Return the current head revision, or None for an empty repository.
| RETURNS | DESCRIPTION |
|---|---|
| str or None | The head commit id, or None for an empty repository. |
log ¶
log() -> list[dict[str, JsonValue]]
Return the commit log, newest first.
| RETURNS | DESCRIPTION |
|---|---|
| list of dict | One commit-record dict per commit, newest first. |
tag ¶
Tag a revision as a named dataset version.
A tag pins the exact record values committed at the revision, giving a reproducible named version.
| PARAMETER | DESCRIPTION |
|---|---|
| name | The tag name. |
| revision | The revision to tag; defaults to the current head. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If no revision is given and the repository has no commits. |
tags ¶
Return the list of tags.
| RETURNS | DESCRIPTION |
|---|---|
| list of (str, str) | One |
resolve ¶
Resolve a ref expression to a commit id.
| PARAMETER | DESCRIPTION |
|---|---|
| ref | A branch name, tag name, or commit-id prefix. |

| RETURNS | DESCRIPTION |
|---|---|
| str | The full commit id. |
diff ¶
diff(base: str, head: str) -> RecordDiff
Diff the committed record values between two revisions.
The value set committed at each revision is reconstructed from the
committed data read with data_at, keyed by AT-URI, and the two sets
are compared by content.
| PARAMETER | DESCRIPTION |
|---|---|
| base | The base revision (ref expression). |
| head | The head revision (ref expression). |

| RETURNS | DESCRIPTION |
|---|---|
| RecordDiff | The added, removed, and changed AT-URIs between the revisions. |
content_at ¶
content_at(ref: str) -> dict[str, JsonValue]
Return the committed record values at a revision, keyed by AT-URI.
Each value is the JSON-decoded record content folded over the
revision's commit ancestry, so a record removed by a tombstone at or
before ref is absent. This is the decoded counterpart to the raw
byte state behind diff, suitable for a field-level comparison
of a record across two revisions.
| PARAMETER | DESCRIPTION |
|---|---|
| ref | The revision to read (ref expression). |

| RETURNS | DESCRIPTION |
|---|---|
| dict of str to JsonValue | The decoded record values at the revision, keyed by AT-URI. |
schema_diff ¶
schema_diff(
old: type[Model], new: type[Model]
) -> dict[str, JsonValue]
Compute a structural diff between two record-type schemas.
This wraps didactic.api.diff, which compares two Model classes
(for example two generated record types across a Layers version bump),
not two revisions of the same record's values.
| PARAMETER | DESCRIPTION |
|---|---|
| old | The base schema class. |
| new | The head schema class. |

| RETURNS | DESCRIPTION |
|---|---|
| dict | The structural schema diff. |
Workspace ¶
Workspace(repository: Repository)
A record-type-aware grouping over a Repository.
Indexes the AT-URIs in a repository by their collection NSID so that per-record-type listing and history are cheap, mirroring the way a corpus is a graph of many record types.
| PARAMETER | DESCRIPTION |
|---|---|
| repository | The repository to index. |

| ATTRIBUTE | DESCRIPTION |
|---|---|
| repository | The wrapped repository. |
by_nsid ¶
Group the working-tree AT-URIs by collection NSID.
| RETURNS | DESCRIPTION |
|---|---|
| dict of str to list of str | A mapping from collection NSID to the sorted AT-URIs of that type. |
nsids ¶
Return the collection NSIDs present in the workspace.
| RETURNS | DESCRIPTION |
|---|---|
| list of str | The distinct collection NSIDs, sorted. |
uris_of ¶
Return the AT-URIs of records of a given collection NSID.
| PARAMETER | DESCRIPTION |
|---|---|
| nsid | The collection NSID to select. |

| RETURNS | DESCRIPTION |
|---|---|
| list of str | The sorted AT-URIs of that record type. |
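Grouping by collection NSID relies on the AT-URI layout at://authority/collection/rkey, where the second path segment is the collection NSID. The sketch below shows one way the by_nsid, nsids, and uris_of results could be derived from a list of AT-URIs; it is not the Workspace implementation, the function names are hypothetical, and the pub.example.* NSIDs are made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, Optional


def collection_nsid(uri: str) -> Optional[str]:
    """Return the collection segment of an at://authority/collection/rkey URI."""
    if not uri.startswith("at://"):
        return None
    parts = uri[len("at://"):].split("/")
    return parts[1] if len(parts) >= 2 and parts[1] else None


def group_by_nsid(uris: Iterable[str]) -> dict[str, list[str]]:
    """Map each collection NSID to the sorted AT-URIs of that type."""
    groups: dict[str, list[str]] = defaultdict(list)
    for uri in uris:
        nsid = collection_nsid(uri)
        if nsid is not None:  # strings without a collection segment are skipped
            groups[nsid].append(uri)
    return {nsid: sorted(groups[nsid]) for nsid in sorted(groups)}


# Hypothetical working-tree AT-URIs.
grouped = group_by_nsid(
    [
        "at://did:plc:a/pub.example.layer/2",
        "at://did:plc:a/pub.example.expression/1",
        "at://did:plc:a/pub.example.layer/1",
    ]
)
nsid_list = list(grouped)  # the distinct NSIDs, sorted
layer_uris = grouped.get("pub.example.layer", [])  # the uris_of lookup
```

Building the mapping once makes both the NSID listing and the per-type lookup simple dictionary operations.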
Arrow views¶
Derived, rebuildable columnar views with anchors flattened into typed
columns. These are never the source of truth and can be regenerated from
the record store with materialize.
lairs.store.arrow ¶
Arrow/Parquet materialized views over the record store.
Derived, rebuildable columnar views for ML-speed access, with anchors flattened
into typed columns. These views are never the source of truth: they are computed
from the record store and can always be regenerated with materialize.
The flattening is driven by generic model field access (model_dump), so it
works against the abstract didactic.api.Model interface today and
against the real generated record models once they land. Polymorphic anchors are
resolved into a fixed set of typed columns (anchor_kind, byte_start,
byte_end, token_id, token_index, token_indexes, t_start_ms,
t_end_ms, bbox_x, bbox_y, bbox_w, bbox_h, page,
ext_source) so a consumer can filter and scan without re-dispatching the
union per row.
Every one of the seven generated lairs.records._generated.defs.Anchor
variants (textSpan, tokenRef, tokenRefSequence, temporalSpan,
spatioTemporalAnchor, pageAnchor, externalTarget) projects to a
distinguishable anchor_kind so no variant collapses into an anchorless row.
The view set mirrors the appview's normalization: an expressions table (one
row per expression), an annotations table (one row per
(layer_uri, annotation_index) produced by exploding each layer's
annotations array), plus segmentations, media, and edges tables.
Only top-level scalar fields and the flattened anchor columns reach these views;
non-anchor nested arrays and objects (for example features or per-keyframe
data) are intentionally dropped by the flatten-to-typed-columns boundary (see
_scalar_columns).
RecordLike ¶
Bases: Protocol
The minimal record shape the Arrow views consume.
Any didactic.api.Model satisfies this protocol; the views never
depend on a concrete generated type, only on the ability to dump a record to
a JSON string. The JSON form is used (rather than the shallow model_dump)
so nested models, tuples, and union members all normalise to plain JSON
containers that the flattening can descend into uniformly.
flatten_anchor ¶
Flatten a polymorphic anchor mapping into typed columns.
Recognises every lairs.records._generated.defs.Anchor variant by
the fields it carries and projects each into the fixed ANCHOR_COLUMNS,
leaving unrelated columns unset. Unknown or absent anchors yield an
all-None row with an anchor_kind of None, so the resulting column
set is uniform across rows regardless of which anchor variant each record
uses.
The seven recognised variants map to these anchor_kind values:
- textSpan -> "span" (byte_start, byte_end)
- tokenRef -> "tokenRef" (token_id, token_index)
- tokenRefSequence -> "tokenRefSequence" (token_id, token_indexes)
- temporalSpan -> "temporalSpan" (t_start_ms, t_end_ms)
- spatioTemporalAnchor -> "spatioTemporalAnchor" (t_start_ms, t_end_ms)
- pageAnchor -> "pageAnchor" (page, nested byte_start/byte_end and bbox_*)
- externalTarget -> "externalTarget" (ext_source)
| PARAMETER | DESCRIPTION |
|---|---|
| anchor | A dumped anchor value, or |

| RETURNS | DESCRIPTION |
|---|---|
| dict | A mapping over :data: |
records_to_table ¶
records_to_table(records: Iterable[RecordLike]) -> Table
Flatten a sequence of records into an Arrow table.
Each record's scalar fields become columns, and any anchor field is
expanded into the typed ANCHOR_COLUMNS. The column union across all
records is used, with missing values filled as None, so heterogeneous
records share one schema.
| PARAMETER | DESCRIPTION |
|---|---|
| records | The records to flatten; anchors become typed columns. |

| RETURNS | DESCRIPTION |
|---|---|
| Table | The flattened columnar view. |
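The column-union step can be shown in isolation with plain Python. This sketch assumes each record has already been flattened to a dict of scalar values; column_union is a hypothetical helper, and the anchor expansion and the Arrow conversion are left out.

```python
def column_union(rows: list[dict[str, object]]) -> dict[str, list[object]]:
    """Turn heterogeneous row dicts into equal-length columns, filling gaps with None."""
    names: dict[str, None] = {}  # ordered set of column names, first-seen order
    for row in rows:
        for name in row:
            names.setdefault(name, None)
    # Every column has one entry per row; a row lacking the field contributes None.
    return {name: [row.get(name) for row in rows] for name in names}


# Two hypothetical flattened rows that carry different anchor columns.
columns = column_union(
    [
        {"uri": "at://x/c/1", "anchor_kind": "span", "byte_start": 0, "byte_end": 5},
        {"uri": "at://x/c/2", "anchor_kind": "pageAnchor", "page": 3},
    ]
)
```

A mapping of equal-length columns like this is the shape that pyarrow.Table.from_pydict accepts, so heterogeneous records end up sharing one schema with nulls where a field does not apply.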
expressions_table ¶
expressions_table(records: Iterable[RecordLike]) -> Table
Build the expressions view: one row per expression record.
| PARAMETER | DESCRIPTION |
|---|---|
| records | The expression records. |

| RETURNS | DESCRIPTION |
|---|---|
| Table | One row per expression, anchors flattened into typed columns. |
annotations_table ¶
annotations_table(
layers: Iterable[tuple[str, RecordLike]],
) -> Table
Build the annotations view by exploding each layer's annotations array.
Produces one row per (layer_uri, annotation_index), mirroring the
appview's PG normalization. Each annotation's scalar fields become columns,
its anchor is flattened into the typed columns, and layer_uri plus
annotation_index identify the source.
| PARAMETER | DESCRIPTION |
|---|---|
| layers | Pairs of layer AT-URI and the layer record. The record is expected to carry an annotations array. |

| RETURNS | DESCRIPTION |
|---|---|
| Table | One row per exploded annotation. |
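The explode step can be sketched over dumped layer dicts. This is a simplified illustration, not the annotations_table implementation: it assumes each layer is a plain dict with an annotations list, keeps only scalar annotation fields, and omits the anchor flattening. The function name explode_annotations and the sample field names are hypothetical.

```python
from typing import Iterable


def explode_annotations(
    layers: Iterable[tuple[str, dict[str, object]]],
) -> list[dict[str, object]]:
    """Produce one row per (layer_uri, annotation_index)."""
    rows: list[dict[str, object]] = []
    for layer_uri, layer in layers:
        annotations = layer.get("annotations") or []
        for index, annotation in enumerate(annotations):
            # Keep scalar fields only; nested arrays and objects are dropped here.
            row = {
                key: value
                for key, value in annotation.items()
                if not isinstance(value, (dict, list))
            }
            # Set the identifying columns last so they cannot be overwritten.
            row["layer_uri"] = layer_uri
            row["annotation_index"] = index
            rows.append(row)
    return rows


# Hypothetical dumped layers; the second has no annotations and yields no rows.
annotation_rows = explode_annotations(
    [
        (
            "at://x/layer/1",
            {"annotations": [{"label": "NOUN", "features": {"k": "v"}}, {"label": "VERB"}]},
        ),
        ("at://x/layer/2", {"annotations": []}),
    ]
)
```

The pair of layer_uri and annotation_index is enough to locate each row's source annotation in the original layer record.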
materialize ¶
materialize(
repo: Repository,
out_dir: Path,
*,
views: Mapping[str, Table] | None = None,
) -> list[Path]
Materialize named Arrow views into Parquet files.
The views are derived, rebuildable outputs and never the source of truth.
When views is omitted the repository's record store is read and grouped
by collection NSID, with each NSID written as its own Parquet file; callers
that have already built the normalized expressions / annotations /
segmentations / media / edges tables can pass them explicitly.
Each view is written as a single <name>.parquet file with pyarrow's
default write options. Compression, row-group sizing, and partitioning are
not parameterized; a caller needing those should write the tables
with pyarrow.parquet.write_table directly.
| PARAMETER | DESCRIPTION |
|---|---|
| repo | The repository whose record store is materialized when views is omitted. |
| out_dir | The output directory for the Parquet views; created if absent. |
| views | Pre-built named views to write. When omitted, the repository's record store is read and grouped by collection NSID. |

| RETURNS | DESCRIPTION |
|---|---|
| list of pathlib.Path | The written Parquet files, in name order. |
Blob cache¶
A content-addressed on-disk cache of blob bytes, keyed by CID.
lairs.store.blobcache ¶
Content-addressed blob cache.
Caches blob bytes on disk under blobs/<cid>, populated lazily by the media
layer and shared across corpora. The cache is content-addressed: a blob's CID is
its file name, so identical content stored under the same CID is deduplicated and
put is idempotent. When the key is a decodable CID, BlobCache.put
verifies that it actually addresses the bytes (the multihash digest matches), so
a wrong CID can never poison the cache. A key that is not a CID (for example an
external URI used by the media layer to cache fetched bytes) is treated as an
opaque, trusted cache key and is stored without a digest check. Writes are atomic
so a crash mid-write cannot leave a truncated blob that BlobCache.exists
would report as present.
BlobCacheError ¶
Bases: ValueError
Raised when a blob operation violates the cache's content-addressing.
Raised by BlobCache.put when the key contains path separators that
would escape the blobs directory, or when a key that is a decodable CID has a
multihash digest that does not match the supplied bytes.
BlobCache ¶
A content-addressed on-disk cache of blob bytes.
| PARAMETER | DESCRIPTION |
|---|---|
| root | The cache root directory. Blobs are stored under |

| ATTRIBUTE | DESCRIPTION |
|---|---|
| root | The cache root directory. |
path_for ¶
Return the on-disk path a blob would occupy for a CID.
The path is returned whether or not the blob is present, so the media layer can stream bytes straight to it.
| PARAMETER | DESCRIPTION |
|---|---|
| cid | The content identifier. |

| RETURNS | DESCRIPTION |
|---|---|
| Path | The path |

| RAISES | DESCRIPTION |
|---|---|
| BlobCacheError | If the CID is not a safe single path component. |
exists ¶
Return True if a blob for cid is cached.
| PARAMETER | DESCRIPTION |
|---|---|
| cid | The content identifier. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | Whether the blob is present on disk. |
get ¶
Return cached bytes for a CID, or None if absent.
| PARAMETER | DESCRIPTION |
|---|---|
| cid | The content identifier. |

| RETURNS | DESCRIPTION |
|---|---|
| bytes or None | The cached bytes, or None if absent. |
put ¶
Store bytes under a cache key.
Storing the same key again overwrites the existing file with identical
content, so put is idempotent for content-addressed input. When the
key is a decodable CID it is verified against the bytes (its multihash
digest must match) so a wrong CID cannot poison the cache; a non-CID key
(for example an external URI) is stored as an opaque trusted key. The
write is atomic (a temp file in the same directory is renamed onto the
final path) so a crash mid-write cannot leave a truncated blob.
| PARAMETER | DESCRIPTION |
|---|---|
| cid | The cache key: a content identifier, or an opaque trusted key. |
| data | The bytes to cache. |
| verify | When |

| RETURNS | DESCRIPTION |
|---|---|
| Path | The path the bytes were written to. |

| RAISES | DESCRIPTION |
|---|---|
| BlobCacheError | If the key is not a safe path component, or (when |
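The atomic-write and path-safety behaviour can be sketched with the standard library alone. This is a minimal illustration, not the BlobCache code: UnsafeKeyError is a hypothetical stand-in for BlobCacheError, blob_path and atomic_put are hypothetical names, the exact safety rules are assumed, and the CID digest check is omitted because it needs a multihash decoder.

```python
import contextlib
import os
import tempfile
from pathlib import Path


class UnsafeKeyError(ValueError):
    """Hypothetical stand-in for BlobCacheError."""


def blob_path(root: Path, key: str) -> Path:
    """Return root/blobs/<key>, rejecting keys that are not one path component."""
    if key in {"", ".", ".."} or "/" in key or "\\" in key:
        raise UnsafeKeyError(f"unsafe cache key: {key!r}")
    return root / "blobs" / key


def atomic_put(root: Path, key: str, data: bytes) -> Path:
    """Write data under key so readers see either the whole blob or nothing."""
    final = blob_path(root, key)
    final.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the destination directory so the rename stays
    # on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=final.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, final)  # rename onto the final path in one step
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)  # do not leave a stray temp file behind
        raise
    return final
```

A call such as atomic_put(Path("cache"), "some-key", b"bytes") can be repeated with the same arguments and leaves a single complete file, which is what makes a content-addressed put idempotent.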