Code generation

The lairs._codegen package drives vendored lexicon JSON through panproto parsing, a schema-to-spec mapping, and module emission, writing one committed module per pub.layers.* namespace into lairs.records._generated. The generated modules are never hand-edited. Edit this pipeline instead. See generated models for the rationale.

Pipeline

The top-level driver: generate writes the modules, and check powers the lairs gen --check drift gate.

lairs._codegen.pipeline

Top-level codegen pipeline.

Drives lexicon JSON through panproto parsing, the Schema-to-spec mapping, and module emission, writing one committed module per pub.layers.* namespace into lairs.records._generated. Cross-namespace embeds (for example, annotation embedding defs#anchor) are resolved to imports so each module type-checks in isolation. The check entry point powers the lairs gen --check drift gate.

generate

generate(lexicon_root: Path, out_root: Path) -> list[Path]

Generate model modules from a vendored lexicon tree.

PARAMETER DESCRIPTION
lexicon_root

The root of the vendored lexicon tree (the directory that contains the pub package and MANIFEST.toml).

TYPE: Path

out_root

The output directory for emitted modules, normally lairs/records/_generated.

TYPE: Path

RETURNS DESCRIPTION
list of pathlib.Path

The paths of the generated module files, in stable order.

check

check(lexicon_root: Path, out_root: Path) -> bool

Check whether committed modules match a fresh generation.

PARAMETER DESCRIPTION
lexicon_root

The root of the vendored lexicon tree.

TYPE: Path

out_root

The directory holding the committed generated modules.

TYPE: Path

RETURNS DESCRIPTION
bool

True if the committed *.py set is exactly the freshly generated set and every committed module (and the package __init__) is byte-identical to its fresh counterpart. An orphan committed module with no fresh counterpart (left behind after a lexicon namespace was removed upstream) counts as drift and returns False.
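The drift rule described above can be sketched with the standard library alone. This is a minimal illustration, not the lairs implementation: the helper name matches_fresh and the idea of passing the fresh generation in as a dict of file name to bytes are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def matches_fresh(fresh: dict[str, bytes], out_root: Path) -> bool:
    """Return True only if out_root holds exactly the fresh *.py files, byte for byte."""
    committed = {p.name: p.read_bytes() for p in out_root.glob("*.py")}
    # Comparing the two dicts checks the file-name sets and the contents at once,
    # so an orphan committed module or a missing one both count as drift.
    return committed == fresh


with tempfile.TemporaryDirectory() as tmp:
    out_root = Path(tmp)
    fresh = {"__init__.py": b"", "defs.py": b"class Span: ...\n"}
    for name, data in fresh.items():
        (out_root / name).write_bytes(data)
    clean = matches_fresh(fresh, out_root)

    # An orphan module left behind by a removed namespace is drift.
    (out_root / "orphan.py").write_bytes(b"")
    orphaned = matches_fresh(fresh, out_root)
    (out_root / "orphan.py").unlink()

    # A single changed byte in a committed module is drift too.
    (out_root / "defs.py").write_bytes(b"class Span: ... \n")
    edited = matches_fresh(fresh, out_root)

print(clean, orphaned, edited)
```

Comparing whole mappings rather than iterating over the fresh set is what makes the orphan case fail: a check that only walked the fresh files would never look at the leftover module.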

namespace_specs

namespace_specs(
    lexicon_root: Path,
) -> dict[str, list[ModelSpec]]

Return the codegen specs for each namespace, for inspection and tests.

PARAMETER DESCRIPTION
lexicon_root

The root of the vendored lexicon tree.

TYPE: Path

RETURNS DESCRIPTION
dict of str to list of lairs._codegen.schema_to_spec.ModelSpec

A mapping of namespace name to its specs.

Schema to spec

Fuses a parsed panproto Schema with its lexicon document into the codegen intermediate representation: ModelSpec, FieldSpec, and VariantSpec value models, one per record, object, or formal union.

lairs._codegen.schema_to_spec

Walk a panproto Schema into lairs codegen spec models.

The parsed Schema retains union discriminators, refined value types, the reference-versus-containment edge distinction, and integer ranges, all of which the lossy theory_of path would drop. The lexicon JSON document retains the field descriptions and the required set that panproto does not surface. This module fuses both sources into a sequence of ModelSpec value models, one per record or object definition and one per formal union definition, which the emitter renders to committed Python module text.

Notes

Every spec type here is a dx.Model; the codegen intermediate representation is data, like everything else in lairs. The synthesised-model round-trip path (didactic.models_from_specs) is intentionally not used for emission because it discards descriptions, defaults, optionality, refined value types, and the reference-versus-embed distinction. The emitter renders the rich spec directly.

FieldSpec

Bases: Model

A single field of a generated model.

ATTRIBUTE DESCRIPTION
name

The lexicon property name, used verbatim as the Python attribute name.

TYPE: str

type_kind

The resolved field shape, one of "str", "int", "bool", "datetime", "bytes", "blob", "embed", "union", or "array". There is no "unknown" kind: an unrecognised lexicon construct (cid-link, the empty type, or any future construct) maps to "str" so the generated record validates rather than failing.

TYPE: str

target

For "embed" and "union" kinds, the name of the referenced model or union class. None for scalar kinds.

TYPE: (str or None, optional)

item

For the "array" kind, the spec of the element type. None otherwise.

TYPE: (FieldSpec or None, optional)

required

Whether the lexicon lists this property in its required set.

TYPE: (bool, optional)

description

The lexicon description for the property, recorded as field metadata.

TYPE: (str or None, optional)

string_format

The lexicon format of a string field (for example "at-uri" or "did"), recorded as field metadata.

TYPE: (str or None, optional)

known_values

The lexicon knownValues of an open string enum, recorded as field metadata; never a hard enum.

TYPE: (tuple of str, optional)

minimum

The lexicon minimum of an integer field, recorded as field metadata.

TYPE: (int or None, optional)

maximum

The lexicon maximum of an integer field, recorded as field metadata.

TYPE: (int or None, optional)

min_length

The lexicon minLength of a string field, recorded as field metadata.

TYPE: (int or None, optional)

max_length

The lexicon maxLength of a string field, recorded as field metadata.

TYPE: (int or None, optional)
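To make the type_kind fallback and the nested item shape concrete, here is a small sketch. It assumes a plain dataclass stand-in (FieldSketch) rather than the real dx.Model, and the lookup table, the resolve_kind helper, and the rule that a string with format "datetime" becomes the "datetime" kind are all hypothetical; only the kind names and the fall-back-to-"str" behaviour come from the description above.

```python
from dataclasses import dataclass

# Hypothetical lookup from a lexicon "type" string to a type_kind value.
# The values are the kinds listed above; the keys and pairings are assumptions
# (in particular, treating every "ref" as an embed is a simplification).
_KIND_BY_LEXICON_TYPE = {
    "string": "str",
    "integer": "int",
    "boolean": "bool",
    "bytes": "bytes",
    "blob": "blob",
    "ref": "embed",
    "union": "union",
    "array": "array",
}


@dataclass(frozen=True)
class FieldSketch:
    """Plain-dataclass stand-in for FieldSpec (the real class is a dx.Model)."""

    name: str
    type_kind: str
    target: "str | None" = None
    item: "FieldSketch | None" = None
    required: bool = False


def resolve_kind(lexicon_type: str, string_format: "str | None" = None) -> str:
    """Resolve a kind, falling back to "str" for anything unrecognised."""
    if lexicon_type == "string" and string_format == "datetime":
        return "datetime"
    # There is no "unknown" kind: unrecognised constructs degrade to "str".
    return _KIND_BY_LEXICON_TYPE.get(lexicon_type, "str")


# An array field carries the spec of its element type in `item`.
anchors = FieldSketch(
    name="anchors",
    type_kind=resolve_kind("array"),
    item=FieldSketch(name="item", type_kind=resolve_kind("ref"), target="Anchor"),
    required=True,
)

print(resolve_kind("cid-link"), anchors.item.type_kind, anchors.item.target)
```

The field name "anchors" and the target "Anchor" are illustrative values, not taken from a real lexicon.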

VariantSpec

Bases: Model

A single member of a formal union definition.

ATTRIBUTE DESCRIPTION
discriminator_value

The value the union discriminator takes for this variant, derived from the member reference shortname.

TYPE: str

class_name

The Python class name of the variant model.

TYPE: str

target

The name of the embedded member model the variant wraps.

TYPE: str
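As a rough illustration of how a discriminator value and a class name could be derived from a member reference, consider the sketch below. Both helper names are hypothetical, and the exact rules lairs applies are not stated here; the sketch only assumes that the shortname is the fragment after "#" and that capitalising means upper-casing the first character.

```python
def shortname(ref: str) -> str:
    """Return the definition shortname of a lexicon reference."""
    # "pub.layers.defs#anchor" -> "anchor"; "#span" -> "span".
    # A bare NSID with no fragment refers to that lexicon's "main" definition.
    _, sep, fragment = ref.partition("#")
    return fragment if sep else "main"


def class_name_for(def_name: str) -> str:
    """Capitalise a shortname without disturbing the rest of the identifier."""
    # str.capitalize() would lower-case the tail ("tokenRef" -> "Tokenref"),
    # so only the first character is upper-cased here.
    return def_name[:1].upper() + def_name[1:]


ref = "pub.layers.defs#anchor"
print(shortname(ref), class_name_for(shortname(ref)))
```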

ModelSpec

Bases: Model

A generated model or union, ready for emission.

ATTRIBUTE DESCRIPTION
name

The Python class name (the capitalised lexicon definition shortname).

TYPE: str

nsid

The source lexicon namespace identifier (for example "pub.layers.defs").

TYPE: str

def_name

The lexicon definition shortname (for example "span" or "main").

TYPE: str

is_record

Whether the definition is a top-level record (def main) rather than a nested object.

TYPE: (bool, optional)

is_union

Whether the definition is a formal union rendered as a dx.TaggedUnion.

TYPE: (bool, optional)

discriminator

For unions, the discriminator field name.

TYPE: (str or None, optional)

description

The lexicon description of the definition, used as the class docstring summary.

TYPE: (str or None, optional)

fields

The model fields, for non-union models.

TYPE: (tuple of lairs._codegen.schema_to_spec.FieldSpec, optional)

variants

The union members, for unions.

TYPE: (tuple of lairs._codegen.schema_to_spec.VariantSpec, optional)

schema_to_specs

schema_to_specs(
    schema: Schema, document: dict[str, JsonValue]
) -> Sequence[ModelSpec]

Map a parsed Schema plus its lexicon document to codegen spec models.

PARAMETER DESCRIPTION
schema

A Schema parsed from a lexicon document under the atproto protocol.

TYPE: Schema

document

The raw lexicon JSON document the Schema was parsed from. It supplies the required sets and field descriptions that the Schema graph does not surface.

TYPE: dict of str to JsonValue

RETURNS DESCRIPTION
collections.abc.Sequence of lairs._codegen.schema_to_spec.ModelSpec

One spec per record, object, and formal union definition in the lexicon, with descriptions, optionality, refined types, integer ranges, knownValues, and union discriminators preserved. Method definitions (query, procedure, subscription) are skipped.
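The part the raw document contributes (required sets and descriptions) can be read straight from the lexicon JSON. The sketch below is an assumption-laden illustration, not the schema_to_specs code: the helper name, the sample documents, and the NSIDs pub.layers.example and pub.layers.lookup are invented for the example, and it covers only record and object definitions, leaving unions aside.

```python
import json
from typing import Any

METHOD_TYPES = {"query", "procedure", "subscription"}


def required_and_descriptions(
    document: dict[str, Any],
) -> dict[str, dict[str, tuple[bool, "str | None"]]]:
    """Map each def to {property: (is_required, description)}."""
    out: dict[str, dict[str, tuple[bool, "str | None"]]] = {}
    for def_name, definition in document.get("defs", {}).items():
        kind = definition.get("type")
        if kind in METHOD_TYPES:
            continue  # method definitions produce no model
        # A record wraps its object schema under the "record" key.
        body = definition.get("record", {}) if kind == "record" else definition
        if body.get("type") != "object":
            continue
        required = set(body.get("required", []))
        out[def_name] = {
            prop: (prop in required, schema.get("description"))
            for prop, schema in body.get("properties", {}).items()
        }
    return out


record_doc = json.loads("""
{
  "lexicon": 1,
  "id": "pub.layers.example",
  "defs": {
    "main": {
      "type": "record",
      "record": {
        "type": "object",
        "required": ["text"],
        "properties": {
          "text": {"type": "string", "description": "The annotated text."},
          "note": {"type": "string"}
        }
      }
    },
    "span": {
      "type": "object",
      "required": ["start"],
      "properties": {"start": {"type": "integer"}, "end": {"type": "integer"}}
    }
  }
}
""")
query_doc = {"lexicon": 1, "id": "pub.layers.lookup", "defs": {"main": {"type": "query"}}}

result = required_and_descriptions(record_doc)
print(result["main"]["text"], result["span"]["end"])
```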

Emit

Renders the spec models into deterministic, committed module source text with a generated-by header and the source manifest hash.

lairs._codegen.emit

Emit Python module text for generated models.

Renders ModelSpec value models into committed module source text with a generated-by header and the source manifest hash. Emission is deterministic (stable class ordering and stable field ordering) so the lairs gen --check drift gate is meaningful. The emitted modules are the import surface of lairs.records; they are rich didactic models carrying descriptions, optionality, refined value types, integer ranges, knownValues, and union discriminators, which the lossy spec-synthesis path could not reconstruct.
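Why determinism matters for the drift gate can be seen in a toy renderer. The sketch below is not the lairs emitter: render_module, the header wording, and the empty class bodies are all assumptions. It only shows that when the output is a pure function of the inputs, two runs over the same input are byte-identical, and the hash in the header ties the text to a lexicon revision.

```python
def render_module(class_names: list[str], *, manifest_hash: str) -> str:
    """Render a toy module: header, docstring, classes in the given order, __all__."""
    lines = [
        "# Generated by a codegen sketch. Do not edit.",
        f"# manifest hash: {manifest_hash}",
        '"""Generated models (illustrative)."""',
        "",
    ]
    # Classes are emitted in the caller's order; no timestamps, no set iteration,
    # nothing that could differ between two runs on the same input.
    for name in class_names:
        lines += [f"class {name}:", "    ...", ""]
    exported = ", ".join(repr(name) for name in sorted(class_names))
    lines.append(f"__all__ = [{exported}]")
    return "\n".join(lines) + "\n"


first = render_module(["Span", "Anchor"], manifest_hash="abc123")
second = render_module(["Span", "Anchor"], manifest_hash="abc123")
rehashed = render_module(["Span", "Anchor"], manifest_hash="def456")

print(first == second, first == rehashed)
```

The hash values "abc123" and "def456" are placeholders.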

emit_module

emit_module(
    specs: Sequence[ModelSpec], *, manifest_hash: str
) -> str

Render record and union specs to Python module source text.

PARAMETER DESCRIPTION
specs

The specs for one namespace, already ordered so embed targets precede the models that embed them.

TYPE: collections.abc.Sequence of lairs._codegen.schema_to_spec.ModelSpec

manifest_hash

The content hash of the source lexicon tree, recorded in the header so the committed module records the lexicon revision it was generated from.

TYPE: str

RETURNS DESCRIPTION
str

The module source text, with a generated-by header, the manifest hash, a module docstring, imports, the emitted classes, and an __all__.
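emit_module expects its specs already ordered so that embed targets come first. One way a caller might produce such an order is a depth-first walk with sorted tie-breaking, sketched below. The function name, the dict-of-sets input shape, and the choice to raise on cycles are assumptions for illustration; how lairs orders specs, or handles recursive definitions, is not described here.

```python
def order_embeds_first(embeds: dict[str, set[str]]) -> list[str]:
    """Order model names so every embed target precedes the models that embed it."""
    ordered: list[str] = []
    done: set[str] = set()

    def visit(name: str, stack: tuple[str, ...]) -> None:
        if name in done:
            return
        if name in stack:
            raise ValueError(f"embed cycle through {name}")
        # Sorting makes the result independent of set iteration order.
        for target in sorted(embeds[name]):
            # Targets outside this namespace are imports, not local classes.
            if target in embeds:
                visit(target, stack + (name,))
        done.add(name)
        ordered.append(name)

    for name in sorted(embeds):
        visit(name, ())
    return ordered


embeds = {
    "Annotation": {"Span", "Anchor", "ExternalThing"},
    "Anchor": {"Span"},
    "Span": set(),
}
order = order_embeds_first(embeds)
print(order)
```

In this example Span has no dependencies, Anchor embeds Span, and Annotation embeds both, so the walk yields Span, then Anchor, then Annotation; ExternalThing is skipped because it is not defined in this namespace.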

Manifest

The vendoring manifest model and loader. The lexicon_tree_hash is stamped into every emitted module.

lairs._codegen.manifest

Vendoring manifest model and loader.

lairs/lexicons/MANIFEST.toml records the provenance of the vendored lexicon tree: the upstream Layers revision, the vendoring date, and a content hash of the tree. The runtime representation is a Manifest didactic model; this module loads the TOML form into that model. The lexicon_tree_hash is stamped into every emitted module so a generated file records the lexicon revision it came from.

Manifest

Bases: Model

Provenance of the vendored lexicon tree.

ATTRIBUTE DESCRIPTION
layers_git_sha

The upstream Layers git revision the tree was vendored from.

TYPE: str

layers_version

The upstream Layers release version.

TYPE: str

vendored_at

The ISO 8601 date on which the tree was vendored.

TYPE: str

lexicon_tree_hash

A content hash of the vendored lexicon tree, stamped into emitted modules so a generated file records its source revision.

TYPE: str

lexicon_files

The number of vendored lexicon JSON files.

TYPE: int

record_types

The number of record definitions across the tree.

TYPE: int

load_manifest

load_manifest(path: Path) -> Manifest

Load a vendoring manifest from its TOML file.

PARAMETER DESCRIPTION
path

The path to MANIFEST.toml.

TYPE: Path

RETURNS DESCRIPTION
Manifest

The parsed manifest model.