Skip to main content

Definitions

Definitions are the core of defacto. They describe your entities as state machines: what states they can be in, what events cause transitions, what properties to track, and how to resolve identity from raw event data.

Definitions are written in YAML and organized into three types: entities, sources, and schemas.

Entities and states

An entity definition starts with a name, a starting state, and a set of states. Each state can have handlers that fire when specific events arrive.

customer:
starts: lead

states:
lead:
when:
signup:
effects:
- create
- { transition: { to: active } }

active:
when:
cancel:
effects:
- { transition: { to: churned } }

churned: {}

The starts field determines the initial state for new entities. When the first event for a new customer arrives and the signup handler fires, defacto creates the entity in lead and immediately transitions it to active.

A state with no handlers (like churned above) is terminal. Entities in that state won't respond to any events unless you add an always handler (covered below).

Effects

Effects are the actions that modify entity state when a handler fires. There are five kinds.

create

Initializes a new entity. This is idempotent, so it's safe to include in handlers that might fire more than once.

effects:
- create

transition

Moves the entity to a different state.

effects:
- { transition: { to: active } }

set

Assigns a value to a property. There are three ways to specify the value:

# From an event field
- { set: { property: email, from: event.email } }

# From a literal value
- { set: { property: status, value: "pending" } }

# From a computed expression
- { set: { property: annual_revenue, compute: "entity.mrr * 12" } }

You can also add a per-effect condition that must be true for the set to apply:

- { set: { property: high_score, from: event.score, condition: "event.score > entity.high_score" } }

increment

Adds to a numeric property. Defaults to incrementing by 1.

- { increment: { property: login_count } }
- { increment: { property: balance, by: -50 } }

relate

Creates a relationship to another entity. The target entity is identified using the same hint mechanism as identity resolution.

- { relate: { type: placed_by, target: customer, hints: { customer: [email] } } }

Effects are applied in order within a handler. Each effect sees the state changes from previous effects, so a set that runs after a transition will see the new state.

Properties

Properties are typed attributes on an entity. They persist across state transitions and become columns in the state history table.

properties:
email: { type: string }
mrr: { type: number, default: 0 }
plan: { type: string, allowed: [free, pro, enterprise] }
signup_date: { type: datetime }
annual_revenue: { type: number, compute: "entity.mrr * 12" }

Supported types are string, number, integer, boolean, and datetime.

OptionWhat it does
defaultValue used when the entity is created
allowedRestricts the property to specific values
min, maxNumeric bounds (validation)
computeExpression that's recalculated after every state change
sensitiveSensitivity label: pii, phi, or pci
treatmentHow to handle sensitive data on redact: hash, mask, or redact

Computed properties are useful for derived values like annual_revenue above. The expression is re-evaluated whenever the entity's state changes, so computed properties always reflect the current state.

Guards

Guards are boolean expressions that determine whether a handler's effects should apply. If the guard evaluates to false, the handler is skipped entirely.

active:
when:
upgrade:
guard: "event.plan != entity.plan"
effects:
- { set: { property: plan, from: event.plan } }

Guards can reference event data with event.* and current entity state with entity.*. They support the full expression language, including functions:

guard: "event.amount > 0 and days_since(entity.last_purchase) < 365"

Always handlers

Handlers in the always section fire regardless of which state the entity is in. They're useful for events that should be handled the same way everywhere.

customer:
starts: lead
states:
lead: { ... }
active: { ... }
churned: { ... }

always:
admin_reset:
effects:
- { transition: { to: lead } }

Both the state-specific handler and the always handler will fire if both match the incoming event type. State effects apply first, then always effects.

Identity

The identity section declares which fields identify an entity. When defacto processes an event, it extracts these fields as hints and uses them to find or create the entity.

identity:
email:
match: exact
normalize: "str::to_lowercase(value)"
phone:
match: exact
OptionWhat it does
matchMatching strategy: exact or case_insensitive
normalizeExpression applied to hint values before matching

The normalize expression is useful for canonicalizing values. In the example above, all email comparisons happen in lowercase, so Alice@Example.com and alice@example.com resolve to the same entity.

An entity can have multiple identity fields. If an event carries two hints that resolve to different entities, defacto detects this as a merge and combines them automatically. See how it works for details.

Time rules

Time rules fire based on elapsed time rather than events. They're defined in the after section of a state.

active:
when:
upgrade: { ... }
after:
- type: inactivity
threshold: 90d
effects:
- { transition: { to: churned } }

Three types are available:

TypeFires when
inactivityNo events received for the specified duration
expirationThe specified duration has elapsed since entity creation
state_durationThe entity has been in its current state for the specified duration

Threshold values use duration syntax: 30d (days), 24h (hours), 90m (minutes), 45s (seconds).

Time rules are evaluated during event processing and when you call tick(). They produce effects just like event handlers.

Relationships

Relationships connect entities to each other. They're declared on the entity definition and created via the relate effect.

customer:
relationships:
- type: placed
target: order
cardinality: has_many

order:
relationships:
- type: placed_by
target: customer
cardinality: belongs_to

Both sides of a relationship must be declared. Valid cardinality values are has_many, has_one, belongs_to, and many_to_many.

Relationships can have their own typed properties:

relationships:
- type: placed
target: order
cardinality: has_many
properties:
total: { type: number }

Relationship history is tracked with the same SCD Type 2 pattern as entity state.

Sources

Source definitions tell defacto how to read raw events from external systems. Each source declares the event type field, timestamp field, and per-event-type field mappings.

billing:
event_type: event
timestamp: created_at
event_id: [transaction_id]

events:
order_created:
raw_type: "order.created"
mappings:
order_id: { from: id }
email: { from: customer_email }
total: { from: amount, type: number }
item_count: { from: line_item_count, type: integer }
hints:
order: [order_id]
customer: [email]

The raw_type field lets you map the external system's event names to your internal names. In this example, the billing system sends events with event: "order.created", which defacto maps to the order_created handler.

Field mapping options:

OptionWhat it does
fromSource field name (supports dot-path: user.address.city)
computeExpression for derived fields
typeType coercion: string, number, integer, boolean, datetime
defaultFallback value if the field is missing

The from and compute options are mutually exclusive.

The event_id field controls deduplication. By default, defacto hashes the event type and all mapped data fields. If you specify event_id: [transaction_id], only the event type and the transaction_id field are hashed, which is useful when you have a natural unique identifier.

The hints section connects events to entities. customer: [email] means "use the email field from this event to identify which customer it belongs to." A single event can carry hints for multiple entity types, which is how cross-entity relationships are established.

Schemas

Schemas define the contract between sources and entities. They specify what a normalized event must look like after a source handler processes it: which fields must be present, what types they should be, and what values are valid.

This matters because the three definition types form a pipeline:

  1. Sources take raw events and produce normalized events through field mappings
  2. Schemas define what those normalized events must contain
  3. Entities consume normalized events through state machine handlers

When you register definitions, defacto validates this chain. If a source's field mappings don't produce the fields that the schema requires, or produce fields that aren't in the schema, you get a validation error before anything runs. At runtime, if the data types don't match what the engine expects, the event fails normalization.

order_created:
fields:
order_id:
type: string
required: true
total:
type: number
required: true
min: 0
email:
type: string
required: true

This schema says that any order_created event must have an order_id string, a total number that's at least 0, and an email string. The source mappings for order_created must produce all three, and the entity handler can rely on them being present and correctly typed.

Field options:

OptionWhat it does
typeExpected type: string, number, integer, boolean, datetime
requiredWhether the field must be present and non-null
allowedList of permitted values
min, maxNumeric bounds
min_lengthMinimum string length
regexPattern the string must match

Definition versioning

Defacto supports multiple versions of definitions. When you change your definitions and activate a new version, defacto detects what changed and picks the appropriate build mode automatically.

d = Defacto("definitions/")
d.ingest("app", events, process=True)

# Later, register updated definitions
d.definitions.register("v2", updated_definitions)
d.definitions.activate("v2") # auto-detects build mode and rebuilds

For incremental changes, you can use the draft API to modify definitions one operation at a time, with validation after each step:

draft = d.definitions.draft("v2", based_on="v1")
draft.add_property("customer", "lifetime_value", {"type": "number", "default": 0})
draft.validate() # check for errors
draft.diff() # see what changed vs v1
draft.impact() # predicted build mode
draft.register() # commit to database

Validation

Defacto validates definitions at registration time and logs warnings at startup. Validation catches structural errors (missing states, invalid transitions, type mismatches) and produces warnings for potential issues.

Warnings include:

  • Dead-end states: states with no outgoing transitions (intentional terminal states are fine, but worth flagging)
  • Unused event types: events defined in sources that no state handles
  • Write-never-read properties: properties set by effects but never referenced in guards or computed expressions

For CI/CD pipelines, you can validate definitions without a database:

from defacto import validate_definitions

result = validate_definitions(definitions_dict)
if not result.valid:
for error in result.errors:
print(error)