Generated models¶
No pub.layers.* model in lairs is written by hand. Every record type,
every nested object, and the one formal union are generated from the
vendored Layers lexicons and committed to the repository. This page
explains why that constraint exists, the path a lexicon takes to become
a dx.Model, why a deliberately lossy shortcut is avoided, and the gate
that keeps the committed models honest.
Why generated, never authored¶
The Layers lexicons are the single source of truth for the schema. There is no second description of it anywhere in lairs. A hand-written model would be a second description, and the moment one exists it can drift from the lexicon it claims to mirror, silently, in either direction. Generation removes the possibility: the committed models are a pure function of the vendored lexicons, and updating to a new Layers version is a mechanical sequence rather than a model-by-model edit. Re-vendor the lexicons, regenerate, run the drift check.
This is a hard rule, not a preference. Behavior over the generated
models (builders, view helpers, anchor dispatch) is ordinary code and
lives outside the generated tree. Anything that mirrors the schema is
generated. Anything that is behavior over the schema is not. The
anchor_kind helper and the explode_layer helper in lairs.records
are behavior. The Anchor and AnnotationLayer classes are generated.
The path from lexicon to model¶
A lexicon document becomes committed Python in four stages.
lexicon JSON
-> panproto.parse_atproto_lexicon(doc) parse to a Schema
-> Schema + document -> spec models lairs._codegen.schema_to_spec
-> emitted module text lairs._codegen.emit
-> lairs/records/_generated/<ns>.py committed, ruff-canonicalized
panproto parses each lexicon into a Schema under its built-in
atproto protocol. The Schema is the parsed, structured form: it
retains the union discriminators, the refined value types, the
reference-versus-containment edge distinction, and the integer ranges.
lairs then walks the lexicon into a sequence of spec models: the
FieldSpec, VariantSpec, and ModelSpec value types, which are
themselves dx.Models, because the codegen intermediate representation
is data like everything else in lairs. One spec is produced per record,
per nested object, and per formal union. The spec carries the
description, the optionality (whether a property is in the lexicon's
required set), the refined type, the integer range, the knownValues
of an open string enum, and, for a union, its discriminator and members.
An important detail of the actual implementation: the spec mapping reads
its structure from the lexicon document, not from the parsed Schema.
The document retains the required sets and the field descriptions that
the Schema graph does not surface, and it preserves definition order. The
Schema is parsed and accepted (which asserts that the document parses
cleanly under the atproto protocol) but the field-by-field walk is
driven by the JSON. The two sources are complementary: the parse is the
correctness check, the document is the data.
The emitter renders each spec to module text, the pipeline injects the
cross-namespace imports a module needs (for example annotation
embedding defs#anchor), and a ruff format then ruff check --fix
then ruff format pass converges the output to a stable, lint-clean
form. That stability is what lets a fresh generation be compared
byte-for-byte against the committed modules.
Why not the lossy theory path¶
panproto can also induce a categorical theory from a Schema, and didactic can synthesize models from a theory. That route is shorter, and it is not used. The induced theory is lossy by design: it cannot express refined value types, per-field defaults and descriptions, or the reference-versus-containment distinction, and it drops union discriminators, so a model rebuilt from a theory cannot reconstruct a tagged union. didactic's own spec-dict synthesizer is closer but still discards descriptions, defaults, optionality, refined types, and the embed-versus-ref distinction.
Because lairs needs every one of those properties in the committed output, it does not route through either. It walks the rich lexicon into rich spec models and the emitter renders them directly. The substantive codegen work is exactly this mapping. The theory path is fine for a quick structural check but is not the generation path.
The one place this matters most is the union. The Layers lexicons
contain a single formal union: the selector of defs#externalTarget,
over the three W3C selector types. It generates a dx.TaggedUnion
(ExternalTargetSelector) with a kind discriminator and one member
class per reference. Had codegen gone through the theory, the
discriminator would have been lost and the union could not have been
rebuilt. A focused codegen test asserts that the lexicon union
round-trips to a tagged union with its discriminator intact.
Note what is not a tagged union. The polymorphic anchor and the
universal objectRef are lexicon objects with several optional fields,
where a consumer dispatches on which fields are populated, not formal
unions over refs. They generate as ordinary dx.Models with optional
fields, faithfully to the lexicon. The
anchors-and-modality page explains why the
lexicons model anchors this way and how lairs dispatches on them.
The drift gate¶
The generated modules are committed, not generated at install time. This buys import speed, IDE and type-checker support, and reviewable diffs when a Layers version is bumped. It also creates the obligation that the committed modules stay faithful to the vendored lexicons.
The drift gate discharges it. lairs gen --check regenerates the modules
into a temporary directory off the vendored lexicons and compares them
byte-for-byte against the committed ones. Any difference fails. Each
generated module carries a header recording the lexicon-tree hash it was
produced from, and the same hash lives in the manifest, so a stale
generation is visible at a glance and caught in CI. The canonicalization
pass exists precisely so this comparison is byte-exact rather than
merely semantically equivalent.
For the operational steps (vendoring a lexicon tree, regenerating, and running the check) see the codegen guide.