Observations
Event-Driven Observation System
Naras form consensus about network state through distributed observation events rather than centralized broadcast.
Philosophy
Instead of every nara broadcasting their complete view of all 5000 naras (750KB+ per broadcast), naras emit events when they observe changes:
- “I saw nara-x restart” (restart event)
- “I saw nara-y come online” (status-change event)
- “I’m seeing nara-z for the first time” (first-seen event)
These events spread organically through the network via:
- Real-time MQTT broadcasting (for immediate awareness)
- Background mesh syncing (for catching up)
- Boot recovery (for new/restarting naras)
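For concreteness, here is a minimal sketch of the real-time MQTT path in Go, assuming the paho MQTT client. The topic name and the trimmed-down event struct are illustrative stand-ins, not the project's actual topic or payload (the real struct is ObservationEventPayload in sync.go):

```go
package main

import (
	"encoding/json"
	"fmt"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

// ObservationEvent is a trimmed-down stand-in for the real payload
// (ObservationEventPayload in sync.go).
type ObservationEvent struct {
	Observer  string `json:"observer"`
	Subject   string `json:"subject"`
	Type      string `json:"type"` // "restart", "first-seen", "status-change"
	StartTime int64  `json:"start_time,omitempty"`
}

// emitObservation publishes one event for real-time awareness. The topic
// name is an assumption for illustration, not the project's actual topic.
func emitObservation(client mqtt.Client, ev ObservationEvent) error {
	payload, err := json.Marshal(ev)
	if err != nil {
		return err
	}
	token := client.Publish("nara/observation/"+ev.Type, 0, false, payload)
	token.Wait()
	return token.Error()
}

func main() {
	opts := mqtt.NewClientOptions().AddBroker("tcp://localhost:1883").SetClientID("bart")
	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		panic(token.Error())
	}
	fmt.Println(emitObservation(client, ObservationEvent{
		Observer: "bart", Subject: "lisa", Type: "restart", StartTime: 1624066568,
	}))
}
```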
Consensus Formation
Each nara collects observation events from neighbors and forms their own opinion using trimmed mean:
- Collect all reported values from observers
- Remove statistical outliers (values >5× or <0.2× median)
- Average the remaining values (naturally weighted by how many observers agree)
- Single observer fallback: use their latest value if only one reporting
This is subjective consensus - each nara may have slightly different views, and that’s OK. It’s a hazy collective memory.
Event Lifecycle
```
Nara A observes nara B restart
        ↓
A emits restart observation event
        ↓
Event added to A's local ledger
        ↓
Event broadcast via MQTT (real-time)
        ↓
Neighbors C, D, E receive and store event
        ↓
During boot recovery, new nara F syncs from C
        ↓
F now knows about B's restart history
        ↓
Background sync spreads to remaining naras
        ↓
Eventually most naras know (eventual consistency)
```
Event Types
restart
Detected a nara restarted (StartTime or restart count changed).
Fields:
- start_time: Unix timestamp when nara started
- restart_num: Total restart count
- importance: Critical (3) - never dropped
Used for:
- Consensus on restart counts
- Detecting unstable naras
- Triggering “high-restarts” teases
first-seen
First time this observer sees a particular nara.
Fields:
- start_time: Best guess at StartTime
- importance: Critical (3) - never dropped
Used for:
- Bootstrap StartTime consensus
- Detecting new naras joining network
status-change
Observer detected ONLINE ↔ OFFLINE ↔ MISSING transition.
Fields:
- online_state: "ONLINE", "OFFLINE", or "MISSING"
- cluster_name: Current cluster (optional)
- importance: Normal (2) - may be filtered by very chill naras
Used for:
- Real-time awareness of network topology
- Triggering “comeback” teases
- Barrio/cluster tracking
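Pulling the three event types together, the fields above suggest a payload along these lines. This is a sketch in Go with field names taken from this page; the authoritative struct is ObservationEventPayload in sync.go and may differ:

```go
package nara

// Importance mirrors the Critical(3)/Normal(2)/Casual(1) scheme described
// under "Importance Levels" below.
type Importance int

const (
	Casual   Importance = 1
	Normal   Importance = 2
	Critical Importance = 3
)

// ObservationEventPayload is a sketch assembled from the fields documented
// above; the real definition lives in sync.go.
type ObservationEventPayload struct {
	Observer    string     `json:"observer"`
	Subject     string     `json:"subject"`
	Type        string     `json:"type"`                   // "restart", "first-seen", "status-change"
	StartTime   int64      `json:"start_time,omitempty"`   // restart, first-seen
	RestartNum  int        `json:"restart_num,omitempty"`  // restart
	OnlineState string     `json:"online_state,omitempty"` // status-change: ONLINE/OFFLINE/MISSING
	ClusterName string     `json:"cluster_name,omitempty"` // status-change, optional
	IsBackfill  bool       `json:"is_backfill,omitempty"`  // see Backfill Events
	Importance  Importance `json:"importance"`
}
```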
Backfill Events
Events with is_backfill: true represent historical knowledge being converted to events during migration from newspapers to event-driven mode.
When created:
- On startup if no events exist yet for a known nara
- When seeing a nara for first time but already have newspaper data about them
Purpose:
- Grandfather existing network knowledge into event system
- Prevent “amnesia” when switching to event mode
- Allow smooth migration without data loss
Example:
lisa has been running since 2021 (StartTime: 1624066568, Restarts: 1137)
Nara boots with event mode enabled:
- Checks ledger: no observation events about lisa
- Has newspaper data: lisa StartTime=1624066568, Restarts=1137
- Creates backfill event (see the sketch below):
```
{
  Observer: "bart",
  Subject: "lisa",
  Type: "restart",
  IsBackfill: true,
  StartTime: 1624066568,
  RestartNum: 1137,
  Importance: Critical
}
```
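A minimal sketch of that startup check in Go, reusing the payload sketch from Event Types. The Ledger and NewspaperEntry types here are hypothetical stand-ins; the real mechanism lives in network.go:

```go
package nara

// NewspaperEntry and Ledger are illustrative stand-ins for this sketch.
type NewspaperEntry struct {
	StartTime int64
	Restarts  int
}

type Ledger struct{ events []ObservationEventPayload }

func (l *Ledger) HasObservationEvents(subject string) bool {
	for _, e := range l.events {
		if e.Subject == subject {
			return true
		}
	}
	return false
}

func (l *Ledger) Add(e ObservationEventPayload) { l.events = append(l.events, e) }

// createBackfillIfNeeded grandfathers newspaper knowledge into the event
// system, but only when no observation events exist yet for a known nara.
func createBackfillIfNeeded(ledger *Ledger, me, subject string, np NewspaperEntry) {
	if ledger.HasObservationEvents(subject) {
		return // events already exist; no amnesia to prevent
	}
	ledger.Add(ObservationEventPayload{
		Observer:   me,
		Subject:    subject,
		Type:       "restart",
		IsBackfill: true,
		StartTime:  np.StartTime,
		RestartNum: np.Restarts,
		Importance: Critical,
	})
}
```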
Checkpoint Events
Checkpoint events are multi-party attested snapshots of historical state. They provide a more robust "historical anchor" than backfill events by requiring multiple naras to vote on and sign the data through MQTT-based consensus.
When created:
- Each nara proposes a checkpoint about itself every 24 hours
- Consensus is reached through a two-round MQTT voting process
Purpose:
- Permanent anchor for historical restart counts and uptime
- Multi-party attestation provides stronger guarantees than single-observer backfill
- Allows deriving restart count as:
checkpoint.Observation.Restarts + count(unique StartTimes after checkpoint)
Structure:
{ "service": "checkpoint", "checkpoint": { "subject": "lisa", "subject_id": "nara-id-hash", "as_of_time": 1704067200, "observation": { "restarts": 47, "total_uptime": 23456789, "start_time": 1624066568 }, "round": 1, "voter_ids": ["homer-id", "marge-id", "bart-id"], "signatures": ["sig1", "sig2", "sig3"] }}Key Concepts:
- NaraObservation: Pure state data embedded in checkpoint (restarts, total_uptime, start_time)
- Attestation: Signed claims about a nara’s state used in checkpoint voting
- Self-attestation: Proposer signs their own values (checkpoint proposal)
- Third-party attestation: Voters sign their view (checkpoint vote)
MQTT Checkpoint Consensus Flow:
- Nara proposes checkpoint about itself via nara/checkpoint/propose
- Other naras respond within 5-minute vote window via nara/checkpoint/vote
- Voters can APPROVE (sign proposer's values) or REJECT (sign their own values)
- If majority agrees on same values → checkpoint finalized
- If no consensus → Round 2 with trimmed mean values
- Round 2 is final - if no consensus, try again in 24 hours
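A sketch of the round-resolution rule in Go: a checkpoint finalizes only when a majority of signatures endorse identical values. The types here are illustrative stand-ins for the structs in checkpoint_types.go:

```go
package nara

// CheckpointValues is a stand-in for the checkpoint's observation snapshot.
type CheckpointValues struct {
	Restarts    int
	TotalUptime int64
	StartTime   int64
}

// CheckpointVote carries either the proposer's values (APPROVE) or the
// voter's own view (REJECT), signed either way.
type CheckpointVote struct {
	VoterID   string
	Values    CheckpointValues
	Signature string
}

// tallyVotes returns the winning values and whether consensus was reached.
// With no majority, the caller retries round 2 using trimmed-mean values.
func tallyVotes(proposed CheckpointValues, votes []CheckpointVote) (CheckpointValues, bool) {
	counts := map[CheckpointValues]int{proposed: 1} // proposer self-attests
	for _, v := range votes {
		counts[v.Values]++
	}
	total := len(votes) + 1
	for values, n := range counts {
		// Majority on identical values, with at least 2 voters beside the
		// proposer (3+ total signatures).
		if n > total/2 && total >= 3 {
			return values, true
		}
	}
	return CheckpointValues{}, false
}
```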
Minimum Requirements:
- At least 2 voters required besides the proposer, so 3+ total signatures
- Each signature is for specific values (verifiable)
Deriving Restart Count:
Total Restarts = checkpoint.Observation.Restarts + count(unique StartTimes after checkpoint.AsOfTime)
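In Go, this derivation is a one-pass count of distinct StartTimes newer than the checkpoint. A sketch reusing the payload type from Event Types:

```go
package nara

// deriveRestarts implements the formula above: the checkpointed count plus
// one restart per unique StartTime observed after the checkpoint's AsOfTime.
func deriveRestarts(cpRestarts int, cpAsOfTime int64, events []ObservationEventPayload) int {
	seen := map[int64]bool{}
	for _, e := range events {
		if e.Type == "restart" && e.StartTime > cpAsOfTime {
			seen[e.StartTime] = true
		}
	}
	return cpRestarts + len(seen)
}
```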
Priority order:
1. Checkpoint (if exists) - strongest guarantee
2. Backfill (if exists) - single observer historical data
3. Count events - no historical baseline
Importance Levels
Critical (3)
Never filtered by personality - essential for consensus. Never pruned.
Events: restart, first-seen, backfill, checkpoint
Normal (2)
May be filtered by very chill naras (chill > 85).
Events: status-change
Casual (1)
Filtered based on personality preferences.
Events: teasing, routine social interactions
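These levels translate into a simple storage filter. A sketch in Go, where the chill > 85 threshold comes from this page and the Casual cutoff is an invented placeholder, not the real personality rule:

```go
package nara

// shouldStore decides whether an incoming event survives personality
// filtering. Critical events always pass; Normal events may be dropped by
// very chill naras; Casual events face broader personality preferences.
func shouldStore(ev ObservationEventPayload, chill int) bool {
	switch ev.Importance {
	case Critical:
		return true // never filtered, never pruned
	case Normal:
		return chill <= 85 // very chill naras (>85) may skip these
	default:
		return chill <= 50 // Casual: illustrative cutoff, not the real rule
	}
}
```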
Anti-Abuse Mechanisms
Four layers protect against malicious or misconfigured naras:
1. Per-Pair Compaction
- Max 20 observation events per observer→subject pair
- Oldest events dropped when limit exceeded
- Prevents hyperactive observer from saturating storage
2. Time-Window Rate Limiting
- Max 10 events about same subject per 5-minute window
- Blocks burst flooding (e.g., restart event every second)
- Window slides forward automatically
3. Content-Based Deduplication
- Hash restart events by (subject, restart_num, start_time)
- Multiple observers reporting same restart = single stored event
- Prevents redundant storage
4. Importance-Aware Pruning
- Global ledger pruning respects importance levels
- Critical events (restarts) survive longest
- Casual events pruned first under storage pressure
Combined result: Network remains functional with 1% malicious nodes at 5000 node scale.
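A condensed sketch of how layers 1-3 might compose into a single admission gate, reusing the payload sketch from Event Types (layer 4 runs separately as ledger pruning). The guard structure and helper names are illustrative, not the real API in sync.go; note also that the real layer 1 drops the oldest stored event rather than rejecting the newest:

```go
package nara

import (
	"fmt"
	"time"
)

const (
	maxPerPair   = 20              // layer 1: per observer→subject pair
	maxPerWindow = 10              // layer 2: per subject per window
	rateWindow   = 5 * time.Minute // layer 2: sliding window size
)

type abuseGuard struct {
	perPair map[string]int         // observer|subject -> stored event count
	recent  map[string][]time.Time // subject -> event times inside the window
	seen    map[string]bool        // restart dedup keys
}

func newAbuseGuard() *abuseGuard {
	return &abuseGuard{
		perPair: map[string]int{},
		recent:  map[string][]time.Time{},
		seen:    map[string]bool{},
	}
}

func (g *abuseGuard) admit(e ObservationEventPayload, now time.Time) bool {
	// Layer 3: many observers reporting the same restart store one event.
	key := fmt.Sprintf("%s|%d|%d", e.Subject, e.RestartNum, e.StartTime)
	if e.Type == "restart" && g.seen[key] {
		return false
	}

	// Layer 2: slide the window forward, then cap bursts about one subject.
	cutoff := now.Add(-rateWindow)
	kept := g.recent[e.Subject][:0]
	for _, t := range g.recent[e.Subject] {
		if t.After(cutoff) {
			kept = append(kept, t)
		}
	}
	if len(kept) >= maxPerWindow {
		g.recent[e.Subject] = kept
		return false
	}

	// Layer 1: cap stored events per observer→subject pair (simplified to a
	// rejection here; the real code drops the oldest event instead).
	pair := e.Observer + "|" + e.Subject
	if g.perPair[pair] >= maxPerPair {
		return false
	}

	g.perPair[pair]++
	g.recent[e.Subject] = append(kept, now)
	g.seen[key] = true
	return true
}
```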
Comparison: Newspapers vs Events
Old Approach (Newspapers)
How it worked:
- Broadcast entire Observations map every 5-55 seconds
- Contains one entry per known nara (StartTime, Restarts, LastRestart, etc.)
- At 5000 nodes: 750KB × 5000 nodes every 5-55s ≈ 68MB/s to 750MB/s network-wide
Pros:
- Instant consistency
- Centralized state (easy to understand)
Cons:
- O(N²) traffic growth
- Doesn’t scale past ~100 nodes
- No protection against malicious broadcasts
New Approach (Events)
How it works:
- Emit events only when changes occur
- Background sync: 83 KB/s network-wide
- Eventual consistency via gossip
Pros:
- 99.99% traffic reduction
- Scales to 5000+ nodes
- Anti-abuse protection built-in
- Gradual propagation (resilient to spikes)
Cons:
- Eventual consistency (not instant)
- Requires background sync for memory strengthening
- More complex (events + consensus)
Migration Path
Phase 1: Backfill (Complete)
- On startup: backfill existing observations into events
- Consensus switches to events-primary, newspapers-fallback
- Newspapers stop broadcasting Observations map
- Newspaper frequency reduced (30-300s)
- Network supports both old (newspaper) and new (event) modes simultaneously
Phase 2: Checkpoints (Current)
- Naras create multi-signed checkpoint events via MQTT consensus
- Checkpoint takes precedence over backfill for restart counting
- Historical data anchored with stronger guarantees
- Each nara proposes checkpoint about itself every 24 hours
Phase 3: Event-Only (Future)
- Remove Observations map from NaraStatus struct entirely
- Remove newspaper fallback from consensus
- Events are sole source of truth
- Restart count derived from checkpoint + unique StartTimes
Background Sync
To help the collective memory stay strong, naras perform lightweight periodic syncing:
Schedule:
- Every ~30 minutes (±5min random jitter)
- Query 1-2 random online neighbors
Query focus:
{ "services": ["observation"], "since_time": "<24 hours ago>", "max_events": 100, "min_importance": 2}Why needed:
- Event persistence (critical events survive even if some naras drop them)
- Gradual propagation (events spread organically)
- Personality compensation (high-chill naras catch up)
- Network healing (partitioned nodes eventually converge)
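A sketch of the loop in Go under the schedule above. SyncQuery mirrors the "query focus" JSON; the neighbor-selection and query wiring is hypothetical (the real code lives in network.go):

```go
package nara

import (
	"math/rand"
	"time"
)

// SyncQuery mirrors the "query focus" JSON above.
type SyncQuery struct {
	Services      []string `json:"services"`
	SinceTime     int64    `json:"since_time"`
	MaxEvents     int      `json:"max_events"`
	MinImportance int      `json:"min_importance"`
}

// backgroundSync runs as a goroutine: roughly every 30 minutes (±5min
// jitter) it asks 1-2 random online neighbors for recent observation events.
func backgroundSync(neighbors []string, query func(peer string, q SyncQuery)) {
	for {
		jitter := time.Duration(rand.Intn(11)-5) * time.Minute
		time.Sleep(30*time.Minute + jitter)

		q := SyncQuery{
			Services:      []string{"observation"},
			SinceTime:     time.Now().Add(-24 * time.Hour).Unix(),
			MaxEvents:     100,
			MinImportance: 2, // Normal and above
		}

		// Pick 1-2 random online neighbors.
		rand.Shuffle(len(neighbors), func(i, j int) {
			neighbors[i], neighbors[j] = neighbors[j], neighbors[i]
		})
		n := 2
		if len(neighbors) < n {
			n = len(neighbors)
		}
		for _, peer := range neighbors[:n] {
			query(peer, q)
		}
	}
}
```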
Consensus Algorithm
Uses trimmed mean to calculate robust consensus from observation events:
- Query events: Get all observation events about subject from SyncLedger
- Collect values: Extract reported values (StartTime, Restarts, etc.) from each observer
- Filter outliers: Remove values outside [median×0.2, median×5.0] range
- Calculate average: Average remaining values (naturally weighted by frequency)
- If 5 observers report “270” and 2 report “271”, the average weights toward 270
- Single-observer fallback: Use max value if only one observer reporting
- Log quality metrics:
- Agreement % - how many observers agree on most common value
- Std deviation - spread of opinions (low = tight consensus, high = noisy)
- Outliers removed - which values were filtered as suspicious
Result: Distributed consensus on StartTime, Restarts, LastRestart without centralized broadcast.
Key insight: Large discrepancies indicate bugs (false MISSING detections, backfill failures) rather than natural variance.
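A runnable sketch of the core calculation in Go (the real implementation lives in observations.go): trim to [median×0.2, median×5.0], then average what remains, reporting how many outliers were dropped:

```go
package nara

import "sort"

// trimmedMeanConsensus averages the reported values after dropping outliers
// outside [median*0.2, median*5.0]. Repeated values from independent
// observers naturally pull the mean toward the majority opinion.
func trimmedMeanConsensus(values []float64) (consensus float64, outliers int) {
	switch len(values) {
	case 0:
		return 0, 0
	case 1:
		return values[0], 0 // single-observer fallback
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	median := sorted[len(sorted)/2] // upper median is fine for a sketch

	var sum float64
	kept := 0
	for _, v := range values {
		if v < median*0.2 || v > median*5.0 {
			outliers++
			continue
		}
		sum += v
		kept++
	}
	if kept == 0 {
		return median, outliers // everything trimmed; fall back to the median
	}
	return sum / float64(kept), outliers
}
```

With the example from step 4, trimmedMeanConsensus([]float64{270, 270, 270, 270, 270, 271, 271}) keeps all seven values and returns ≈270.3, weighted toward the majority's 270.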
Code Locations
- sync.go - ObservationEventPayload struct, anti-abuse logic
- observations.go - Event emission, consensus formation, NaraObservation struct
- attestation.go - Attestation type for signed claims (checkpoint voting)
- checkpoint_types.go - CheckpointEventPayload struct
- checkpoint_service.go - Checkpoint consensus and MQTT voting
- network.go - Background sync, backfill mechanism
- nara.go - NaraStatus (will remove Observations map in v1.0.0)
Example Flow
T=0: lisa restarts (actual restart #1137)
T=1: bart observes via newspaper
- Creates restart event: {Observer: bart, Subject: lisa, RestartNum: 1137}
- Adds to local ledger
- Broadcasts via MQTT
T=2: nelly receives event
- Adds to local ledger (unless filtered by personality)
T=5: jojo boots (new node)
- Requests sync from bart
- Gets restart event in response
- Now knows lisa has 1137 restarts
T=30min: alice does background sync
- Queries random neighbor (nelly)
- Gets restart event about lisa
- Fills in gap from when she wasn't listening
T=3h: Consensus runs on all nodes
- Each reads observation events about lisa
- Trimmed mean: most observers agree on 1137 restarts
- Consensus: lisa.Restarts = 1137