Authoring and publishing

This chapter rebuilds the running example from scratch: a part-of-speech layer over an expression, anchored by byte span. You stage the records into a local store, commit them as a snapshot, and compute the plan to publish that snapshot to a PDS, without sending anything.

Two facts govern the write path. Writes target only the authenticated user's own repository, so lairs never writes to another account's records. A dry-run publish returns the full plan for inspection before any write leaves the machine.

Anchors

An anchor says how an annotation attaches to the source data. The builders in lairs.author.builders construct the correct anchor sub-model and validate their arguments against the lexicon constraints at construction time, raising BuildError rather than deferring to a PDS rejection.

For text, the byte-span builder takes UTF-8 byte offsets:

from lairs.author.builders import span

anchor = span(0, 3)
anchor.textSpan.byteStart    # 0
anchor.textSpan.byteEnd      # 3

The argument order and ranges are checked: a negative offset, or an end before the start, raises BuildError. The other builders follow the same pattern: token_ref for a token reference, temporal for a millisecond time span, and bbox / keyframe / spatio_temporal for spatial and spatio-temporal anchors.

An annotation layer

LayerBuilder assembles an annotation layer over one expression. It takes the expression's AT-URI, the layer kind, and a creation timestamp. add appends an annotation, minting a UUID for each one that lacks it, and build finalizes the layer:

from datetime import datetime, timezone

from lairs.author.builders import LayerBuilder, span

expr_uri = "at://did:plc:zf3l5xq2example/pub.layers.expression.expression/abc123"

builder = LayerBuilder(
    expr_uri,
    "token-tag",
    datetime(2026, 1, 1, tzinfo=timezone.utc),
    subkind="pos",
)
builder.add(anchor=span(0, 3), label="DET", token_index=0)
builder.add(anchor=span(4, 7), label="NOUN", token_index=1)
layer = builder.build()

layer.kind                      # 'token-tag'
len(layer.annotations)          # 2
layer.annotations[0].label      # 'DET'

kind and subkind are validated against the generated model's open vocabulary. The vocabulary is open, so a community value outside the published set is accepted. Only an empty string is rejected. A layer must hold at least one annotation, so calling build with none raises BuildError.

Staging into a store

The store is a Repository: a content-addressed, git-like store where a corpus snapshot is a commit. Initialize one on disk, stage each record under its AT-URI with save, and commit:

import json
from pathlib import Path

from lairs.records._generated.expression import Expression
from lairs.store.repository import Repository

expression = Expression.model_validate_json(
    json.dumps(
        {
            "id": "doc-0001",
            "kind": "sentence",
            "createdAt": "2026-01-01T00:00:00Z",
            "text": "The cat sat on the mat.",
        },
    ),
)

layer_uri = "at://did:plc:zf3l5xq2example/pub.layers.annotation.annotationLayer/lay123"

repo = Repository.init(Path("store"))
repo.save(expr_uri, expression)
repo.save(layer_uri, layer)
revision = repo.commit("author a cats corpus")

repo.staged_uris()
# ['at://.../pub.layers.annotation.annotationLayer/lay123',
#  'at://.../pub.layers.expression.expression/abc123']

commit returns a revision identifier. That revision pins the exact record values, so it is reproducible: you can read them back, tag the revision as a named dataset version with repo.tag(...), or diff two revisions.

Note that the expression record is constructed through model_validate_json. The generated models coerce formatted scalars such as the createdAt datetime from their JSON string form on that path, which the keyword constructor does not do.

Planning the publish

publish maps a local revision to the minimal set of writes that would make a PDS match it, by diffing the revision against what is already on the PDS by AT-URI and content. With dry_run=True it computes and returns that plan and sends nothing. With no endpoint, the PDS is treated as empty, so every record in the revision becomes a create:

from lairs.author.publish import publish

plan = publish(
    repo,
    revision,
    to="did:plc:zf3l5xq2example",
    dry_run=True,
)

plan.repo                   # 'did:plc:zf3l5xq2example'
plan.revision == revision   # True
plan.is_empty()             # False
len(plan.creates)           # 2
len(plan.updates)           # 0
len(plan.deletes)           # 0

The to argument is the target repository DID: the one authenticated repository the writes would target. The plan separates creates, updates, and deletes, and orders the whole write set so a referenced record always commits before its referrer. Inspect that order with ordered_writes:

for op in plan.ordered_writes():
    print(op.action, op.collection, op.rkey)
# create pub.layers.expression.expression abc123
# create pub.layers.annotation.annotationLayer lay123

The expression is created before the annotation layer that references it, because the publisher ranks collections by cross-reference dependency. Each operation carries the collection, the record key, the target AT-URI, and the record value, so the dry-run plan is exactly what a live publish would send.

From plan to live publish

A live publish drops dry_run and supplies the PDS endpoint and an authenticated httpx client carrying the session's write scopes. lairs does not implement OAuth: the authenticated client is injected, and every write is scoped to the one repository named by to. With an endpoint set, the plan is diffed against the live PDS first, so a re-publish of unchanged records is a no-op and a re-publish of changed records upserts on a deterministic record key rather than duplicating. That path is covered in the authoring and publishing guide. For the tutorial, the dry run is the safe stopping point.

What you have

You built anchors and an annotation layer with the authoring builders, staged the records into a committed store snapshot, and computed a dependency-ordered publish plan with a dry run that sent nothing. That closes the loop: read a corpus, materialize its views, author new records, and plan their publication.

From here, the Guides cover each subsystem in depth, and the API reference gives the per-symbol detail.