Version: 1.0
choir is an LLM agent orchestration system for personalized
agents, built on infrastructure-grade architecture.
It follows a philosophy of deterministic outer control plane +
stochastic inner cognition: the LLM proposes, the control plane
decides.
See repo here: choir.
Table of Contents
- 1. Deployment Targets
- 2. Scope and Non-Goals
- 3. Core Design Principles
- 4. Design Invariants
- 5. Architecture Overview
- 6. System Components
- 7. Agent Execution Model
- 8. Tool System
- 9. Resource Lock Model
- 10. Skill System
- 11. Memory Architecture
- 12. Communication Protocol
- 13. Crash Recovery and Replication
- 14. Secrets and Credentials
- 15. Container Architecture
- 16. Dynamic Configuration and Secret Apply
- 17. User Commands
- 18. Networking and Web Access
- 19. Gateway (Telegram)
- 20. Self-Evolution Workflow
- 21. Inference Provider
- 22. Observability and Audit
- 23. Choird Data Model (Postgres)
- 24. Implementation Phases
1. Deployment Targets
- Single Linux host. No distributed consensus, no multi-node coordination.
- Orchestrator-agent communication via Unix domain sockets (UDS) for local deployment, with TCP/HTTP as a built-in second transport for future EKS-style deployments.
- OpenAI-compatible chat completions API for text generation (via OpenRouter or any compatible endpoint); ElevenLabs API for text-to-speech (configurable voice profiles). LLM models must support structured tool calling.
- Postgres + pgvector for long-term memory and archival storage.
2. Scope and Non-Goals
2.1 In Scope (v1)
- Single Linux host deployment.
- One control plane daemon on the host (choird).
- One user control client (choirctl).
- One unified runtime daemon in the container (choir-agent).
- Two execution lanes inside the same logical agent:
- Edge lane (fast, user-facing).
- Core lane (deeper async reasoning).
- Skill-based multi-phase orchestration with per-state tool allowlists.
- Tool-mediated workspace and side-effect access.
- Tool locking with FREE | S | X states and atomic lockset semantics.
- Host-owned durable state and crash recovery.
- Postgres + pgvector memory subsystem integrated into choird.
- Manual approval path for capability-changing actions.
- Dual transport support: UDS (primary) and TCP/HTTP (scaffolding for future multi-node / EKS-style deployments).
- Gateway: Telegram via multiple bot instances. Each agent is bound to a named DM. Admin DMs have full choirctl-equivalent command access; regular DMs can only affect their bound agent.
- Git-managed source repos: tools, skills, identity files (USER.md, SOUL.md, SOUL-CORE.md), and agent Dockerfiles are versioned in git repositories (global + per-agent). Each agent receives a git identity.
2.2 Out of Scope (v1)
- Multi-host orchestration (but transport layer scaffolds for it).
- Kubernetes/EKS-native deployment (but TCP/HTTP transport is implemented to ease future migration).
- Multi-tenant isolation model.
- Hot mutation of structural runtime identity (/choir contents, tool binaries, skill definitions).
- Unlimited multi-agent graph orchestration (fixed to edge/core dual-lane model).
- Multi-channel gateway (Slack, Discord, web UI, etc.). Telegram only in v1.
3. Core Design Principles
- LLMs are proposal engines, not authorities.
- All world mutations happen through tools.
- choird is the only host authority.
- Container runtime is disposable; host state is authoritative.
- Structural mutations require approval + rebuild + redeploy.
- Dynamic reload only applies to data that is safe to reload.
4. Design Invariants
- Core lifetime is a strict subset of Edge lifetime – mechanically enforced, not advisory. Each core job’s lifetime is a subset of the edge session.
- Mutual exclusion on each resource – multiple core jobs may hold exclusive locks on different resources concurrently. The same resource cannot be exclusively locked by more than one owner.
- Injection is append-only – never mutates history, never resets budgets.
- True log is immutable and authoritative – the arbiter’s in-memory log is the runtime authority; Postgres session_events is the durable replica. Compacted memory is a cache/view.
- No hidden recursion – explicit states, explicit transitions, no graph DSL.
- Core never sends raw chain-of-thought – only workflow summaries.
- Agent identity is unified – Edge and Core are the same agent to the user.
- Container is disposable – restart is cheap, workspace is non-authoritative, secrets are ephemeral.
- LLM proposes, control plane decides – all side effects serialized through the arbiter.
- Keep it boring – single host, single DB, minimal components, strong outer boundary.
- At most one active instance per agent – choird rejects agent start if the agent is already running. No concurrent sessions for the same agent configuration.
5. Architecture Overview
User(s)
| (multiple Telegram bots, multiple DMs)
v
choird (host)
|-- gateway (multi-bot Telegram, DM routing, admin/regular permissions)
|-- control plane (lifecycle, policy, approvals)
|-- memory module -> Postgres(+pgvector, per-agent schema)
|-- embedding client -> OpenRouter embedding API
|-- browser worker (Playwright, host-side, per-agent contexts)
|-- search client (Brave API)
|-- log manager -> .choir.d/logs/
|
|-- [transport: UDS ~/.choir.d/socks/choird.sock OR HTTP :9400]
|
v
Docker container
|-- choir-agent (single Go binary, single process)
|-- Edge lane (goroutine, fast model, user-facing)
|-- Core lane (goroutine, flagship model, deep reasoning)
|-- Arbiter (serializes all committed side effects)
|-- Lock manager
|-- Secret store (in-memory only)
|-- Tool executor
|-- choird RPC client (UDS or HTTP, selected at startup)
|
|-- /choir (read-only, image-baked identity + tools)
|-- /workspace (writable, bind-mounted persistent directory)
| +-- .choirtmp/send/ (agent -> choird file staging)
| +-- .choirtmp/recv/ (choird -> agent file staging)
5.1 Trust Boundaries
Three real trust boundaries:
- Host <-> Container (strong): namespaced root, no --privileged, no Docker socket, no host PID namespace.
- Choird <-> Agent RPC (policy): lease-scoped authentication, rate limits, payload size caps.
- Tool lock system (serialization): resource-level mutual exclusion.
Core security invariant: even a malicious, root-level container cannot cause host-level side effects without explicit human approval.
6. System Components
Four components:
| Component | Location | Role |
|---|---|---|
| choird | Host daemon | Control plane: Telegram gateway (multi-bot, multi-DM routing), container lifecycle, tool/skill registry authority, policy enforcement, session leasing, approval pipeline, memory/embeddings module, Postgres connection pooling. |
| choirctl | Host CLI | Stateless admin interface to choird. Approve proposals, manage sessions, inspect logs, add tools/skills, control policies. |
| choir-agent | Docker container (single Go binary) | Unified cognition + syscall runtime. Manages state machines, edge/core lanes, skills, LLM requests, tool execution, lock manager, secret store, choird RPC client. |
| .choir.d/ | Host filesystem | Choird state directory. Contains config.json (static resource config), secrets.json (authoritative secrets), local clone cache (repos/), logs, and user-local sockets (socks/). |
6.1 Why a Single Binary in the Container
The agent loop lifecycle is fully contained within the container lifecycle: it starts after init and ends before shutdown. The container boundary is the sandbox; internal process-level separation adds complexity without meaningful security gain. Root + bash are allowed inside the container. This is a single-tenant disposable cognitive appliance.
Internal Go structure (logical separation preserved):
type Agent struct {
arbiter *Arbiter // serializes all committed side effects
llm *LLMEngine
skills *SkillEngine
tools *ToolExecutor
locks *LockManager
secrets *SecretStore
rpc *ChoirdClient
}
6.2 choird Internal Module Structure
choird/
control_plane/
agent_lifecycle/
memory/
embeddings/
approvals/
gateway/
Memory is folded into choird (not a separate service) because on a single host with a single operator, an extra service adds complexity without adding meaningful isolation. Memory is an extension of control-plane state – it interacts with agent lifecycle, heartbeat, secrets, workspace identity, and approval flow.
choird startup sequence (verbose logging at each step):
1. Read and validate .choir.d/config.json.
2. Read and validate .choir.d/secrets.json.
3. Connect to Postgres using credentials resolved from the configured secret reference.
4. Initialize per-agent schemas and roles if they do not exist.
5. Initialize connection pool.
6. Detect orphaned Docker containers from a previous run (by label choir.managed=true). Remove them.
7. Clean up stale leases in Postgres (sessions marked active but whose containers no longer exist). Mark as crashed, release resource leases.
8. Start all gateway bot instances (connect to Telegram Bot API).
9. Begin accepting choirctl and gateway commands.
10. Log “choird ready” with config version and summary of loaded resources.
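Step 6 of the startup sequence amounts to a filter over container metadata by the choir.managed=true label. The sketch below is illustrative only: the Container type and the orphanedContainers helper are assumptions, and the real choird would obtain this metadata from the Docker API before removing the matching containers.

```go
package main

import "fmt"

// Container is a minimal stand-in for the metadata choird would
// receive from the Docker API; only the fields used here are modeled.
type Container struct {
	ID     string
	Labels map[string]string
}

// orphanedContainers selects containers carrying the choir.managed=true
// label. On a fresh choird start, every such container is by definition
// a leftover from a previous run and is safe to remove.
func orphanedContainers(all []Container) []string {
	var ids []string
	for _, c := range all {
		if c.Labels["choir.managed"] == "true" {
			ids = append(ids, c.ID)
		}
	}
	return ids
}

func main() {
	containers := []Container{
		{ID: "abc123", Labels: map[string]string{"choir.managed": "true"}},
		{ID: "def456", Labels: map[string]string{}},
	}
	// Only abc123 carries the managed label.
	fmt.Println(orphanedContainers(containers))
}
```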
First-time setup (choirctl init):
1. Creates .choir.d/ directory structure.
2. Generates a skeleton config.json with placeholder values.
3. Generates a skeleton secrets.json (authoritative secret store, mode 0600).
4. Creates user-local runtime directories (including .choir.d/socks/).
5. Prints instructions: configure Postgres, create the database, set secrets, configure at least one gateway bot, one DM, and one agent.
6. Does NOT create Postgres schemas (the user must ensure the database exists; choird creates per-agent schemas on first startup).
Database migrations are deferred in v1. The user manages schema
changes manually. choirctl should support migration
commands in a future version.
6.3 .choir.d/ Configuration Directory
.choir.d/
config.json # static resource configuration (see below)
secrets.json # authoritative secret values (strict JSON, mode 0600)
repos/ # choird-managed local clone cache
global/ # clone of global repo
agents/
<agent-id>/ # clone of per-agent repo
socks/ # user-local unix sockets
choird.sock # default choird UDS socket
logs/ # structured log files
choird.log # active choird log
<agent-id>.log # active per-agent log
archive/ # compressed archived logs (preserving structure)
Tools, skills, identity files, and Dockerfiles are not stored as
loose files in .choir.d/. They live in git repositories,
whose local clones are cached under .choir.d/repos/ (see
section 6.5). This keeps all choir host-side configuration and cached
state in one directory.
config.json configures all choird-managed resources:
- Global repo: URL and ref for the shared tools/skills/identity repo.
- Agent definitions: agent ID, per-agent repo URL/ref, resource defaults (references to named workspaces, models, git identities, etc.).
- Workspaces: named workspace definitions with explicit host paths.
- Models: named LLM model/provider pairs with endpoints, request templates, default parameters (temperature, reasoning_effort), and secret references. LLM models must support OpenAI-compatible tool calling (see section 21).
- TTS providers: named text-to-speech provider configurations with endpoint, model ID, and secret reference.
- Voice profiles: named voice configurations with voice ID, output format, and voice settings (stability, similarity, style, speed).
- Git identities: named identities with name, email, and secret reference.
- Notion integrations: named, with secret reference.
- Email accounts: named, with SMTP/IMAP host and port, sharing mode, and secret reference.
- Search providers: named, with secret reference.
- Embedding model: provider, model name, dimensions, secret reference.
- Gateways: named Telegram bot instances with secret references.
- DMs: named DM bindings (gateway + user ID, admin flag). The set of configured DMs for a bot forms that bot’s allowlist.
- Postgres: connection host, port, admin credentials secret reference.
- Logging: archive thresholds, log retention settings.
- Tunable defaults: budgets, timeouts, heartbeat interval, feature flags.
Secret values are never stored in config.json. Each
resource that needs a secret contains a "secret" field
referencing a named secret managed via choirctl secret set
/ choirctl secret delete, then synchronized to running
choird/agents via choirctl secret apply.
config.json is the source of truth for static resource
configuration. secrets.json is the source of truth for
secret values (v1). choird reads both at startup. Config changes are
applied via the two-phase choirctl config load /
choirctl config apply workflow (see section 17.6); secret
changes are applied via choirctl secret apply. Runtime
state (sessions, events, memory) lives in Postgres.
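The split between references in config.json and values in secrets.json can be illustrated with a small sketch. The ModelConfig shape and the resolveSecret helper below are hypothetical; the spec only fixes the pattern (a "secret" field naming an entry in the authoritative store), not the resolution code.

```go
package main

import "fmt"

// ModelConfig mirrors the pattern used by config.json resources:
// the Secret field holds the *name* of a secret, never the value.
type ModelConfig struct {
	Provider string
	Endpoint string
	Secret   string
}

// resolveSecret looks up a named secret in the authoritative store
// (loaded from secrets.json at startup in the real system).
func resolveSecret(secrets map[string]string, name string) (string, error) {
	v, ok := secrets[name]
	if !ok {
		return "", fmt.Errorf("unknown secret reference %q", name)
	}
	return v, nil
}

func main() {
	secrets := map[string]string{"openrouter-key": "sk-example"}
	m := ModelConfig{Provider: "openrouter", Secret: "openrouter-key"}
	key, err := resolveSecret(secrets, m.Secret)
	fmt.Println(key, err)
}
```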
Example config.json:
{
"global_repo": {
"url": "git@github.com:org/choir-global.git",
"ref": "main"
},
"workspaces": {
"main-ws": { "path": "/var/lib/choir/workspaces/main" },
"scratch": { "path": "/var/lib/choir/workspaces/scratch" }
},
"models": {
"sonnet": {
"provider": "openrouter",
"model": "anthropic/claude-sonnet-4-20250514",
"endpoint": "https://openrouter.ai/api/v1",
"temperature": 0.7,
"reasoning_effort": null,
"secret": "openrouter-key"
},
"gpt4o": {
"provider": "openrouter",
"model": "openai/gpt-4o",
"endpoint": "https://openrouter.ai/api/v1",
"temperature": 0.5,
"reasoning_effort": null,
"secret": "openrouter-key"
},
"o3": {
"provider": "openrouter",
"model": "openai/o3",
"endpoint": "https://openrouter.ai/api/v1",
"temperature": null,
"reasoning_effort": "medium",
"secret": "openrouter-key"
}
},
"tts": {
"eleven": {
"provider": "elevenlabs",
"model_id": "eleven_multilingual_v2",
"endpoint": "https://api.elevenlabs.io/v1",
"secret": "elevenlabs-key"
}
},
"voice_profiles": {
"default-en": {
"tts": "eleven",
"voice_id": "...",
"output_format": "opus_48000_128",
"voice_settings": {
"stability": 0.5,
"similarity_boost": 0.75,
"style": 0.0,
"use_speaker_boost": true,
"speed": 1.0
}
},
"narrator": {
"tts": "eleven",
"voice_id": "...",
"output_format": "mp3_44100_128",
"voice_settings": {
"stability": 0.8,
"similarity_boost": 0.9,
"style": 0.3,
"use_speaker_boost": false,
"speed": 0.9
}
}
},
"git_identities": {
"dev-identity": {
"name": "Dev Agent",
"email": "dev@choir.local",
"secret": "git-dev-token"
},
"ops-identity": {
"name": "Ops Agent",
"email": "ops@choir.local",
"secret": "git-ops-token"
}
},
"notion_integrations": {
"personal-wiki": {
"secret": "notion-personal-key"
},
"work-wiki": {
"secret": "notion-work-key"
}
},
"email_accounts": {
"primary-email": {
"smtp_host": "smtp.example.com",
"smtp_port": 587,
"imap_host": "imap.example.com",
"imap_port": 993,
"sharing": "exclusive",
"secret": "email-primary-creds"
},
"notifications": {
"smtp_host": "smtp.example.com",
"smtp_port": 587,
"imap_host": "imap.example.com",
"imap_port": 993,
"sharing": "shared",
"secret": "email-notify-creds"
}
},
"search": {
"brave": {
"secret": "brave-api-key"
}
},
"embedding": {
"provider": "openrouter",
"model": "text-embedding-3-small",
"dimensions": 1536,
"endpoint": "https://openrouter.ai/api/v1",
"secret": "openrouter-key"
},
"gateways": {
"bot-main": {
"type": "telegram",
"secret": "tg-bot-main-token"
},
"bot-work": {
"type": "telegram",
"secret": "tg-bot-work-token"
}
},
"dms": {
"admin-dm": { "gateway": "bot-main", "user_id": "123456789", "admin": true },
"user-dm": { "gateway": "bot-main", "user_id": "987654321", "admin": false },
"work-dm": { "gateway": "bot-work", "user_id": "123456789", "admin": true }
},
"postgres": {
"host": "localhost",
"port": 5432,
"database": "choir",
"secret": "postgres-admin-creds"
},
"agents": {
"agent-1": {
"repo": {
"url": "git@github.com:org/choir-agent-1.git",
"ref": "main"
},
"defaults": {
"workspace": "main-ws",
"llm": "sonnet",
"voice_profile": "default-en",
"git_identity": "dev-identity",
"notion": "personal-wiki",
"email": "primary-email",
"dm": "admin-dm"
}
}
},
"heartbeat_interval_ms": 5000,
"crash_detection_threshold_ms": 10000,
"rate_limit_retry_ms": 1000,
"log_archive_threshold_lines": 100000
}
All resources are named. Names are stable identifiers used throughout
the system (commands, gateway, config); underlying paths, endpoints,
credentials, and providers can change without affecting references.
Every named secret follows the same pattern: a "secret"
field references a named secret managed via
choirctl secret set / choirctl secret delete,
with runtime refresh triggered by
choirctl secret apply.
6.4 Resource Allocation Model
Resources are classified by access mode:
Shared resources (any number of agents may use concurrently):
- Models (LLM): API tokens are referenced by name in model definitions. Multiple agents can use the same model concurrently.
- TTS providers: API tokens are referenced by name. Multiple agents can use the same provider concurrently.
- Voice profiles: named voice configurations. Multiple agents can use the same voice profile concurrently.
- Search (Brave): shared API key, stateless.
- Email accounts (when "sharing": "shared"): multiple agents may send from the same account.

Exclusive resources (leased to one agent at a time):
- Workspaces: host directories; see section 15.4.
- Git identities: name, email, and auth credentials. Leased so commits are unambiguously attributable.
- Notion integrations: per-agent scoped. One agent per integration at a time.
- Email accounts (when "sharing": "exclusive"): one agent at a time.
- DMs: each agent gets exclusive access to its bound DM. One agent per DM at a time. See section 19.
- Browser contexts: each running agent gets an isolated Playwright browser context with exclusive r/w access to its tabs. Managed by the host-side browser worker, not configured in config.json (automatically created per agent).
Leasing rules (applies to all exclusive resources):
1. choird grants leases at agent start. If any requested exclusive resource is already leased to another running agent, the start is rejected.
2. Leases are released when the agent’s session ends (stop, crash, or terminate).
3. Every exclusive resource has a default defined in the agent’s defaults block. All defaults are overridable at start time via choirctl agent start flags or /start arguments.

Singleton constraint: at most one instance of a given agent configuration may be active at any time (Design Invariant 11). choird enforces this at agent start.
6.5 Git-Managed Source Repos
All tools, skills, identity files (USER.md,
SOUL.md, SOUL-CORE.md), and agent Dockerfiles
are versioned in git repositories. This provides change tracking,
reproducible builds, and a clean self-evolution workflow.
Global repo (one per choir installation):
choir-global/
tools/
<tool-name>.json # tool manifest
<tool-name>/ # tool source directory (if compiled)
skills/
<skill-name>.json # skill spec
identity/
USER.md # shared user identity
SOUL.md # default edge lane personality
SOUL-CORE.md # default core lane personality
Dockerfile.base # base image: OS packages, choir-agent binary
Per-agent repo (one per agent):
choir-agent-<id>/
tools/ # agent-specific tools (additive or override)
<tool-name>.json
<tool-name>/
skills/ # agent-specific skills (additive or override)
<skill-name>.json
identity/
SOUL.md # agent-specific edge personality (overrides global)
SOUL-CORE.md # agent-specific core personality (overrides global)
Dockerfile # FROM choir-base; agent-specific system deps + tool builds
Identity merge rule: USER.md comes from
the global repo only (shared user identity across all agents).
SOUL.md and SOUL-CORE.md each come from the
per-agent repo if present, otherwise fall back to global.
Tool/skill merge rule: Agent-specific tools and skills are merged onto global ones. If an agent tool has the same name as a global tool, the agent version overrides. This is resolved at build time, not runtime.
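The build-time merge rule amounts to a map overlay: start from the global set, then let same-named agent entries win. A minimal sketch (mapping names to source locations purely for illustration):

```go
package main

import "fmt"

// mergeRegistry applies the build-time merge rule: copy the global
// tool/skill set, then overlay agent-specific entries so that a
// same-named agent version overrides the global one.
func mergeRegistry(global, agent map[string]string) map[string]string {
	out := make(map[string]string, len(global)+len(agent))
	for name, src := range global {
		out[name] = src
	}
	for name, src := range agent {
		out[name] = src // agent version wins on name collision
	}
	return out
}

func main() {
	global := map[string]string{"git_commit": "global/v1", "fs_read": "global/v1"}
	agent := map[string]string{"git_commit": "agent/v2"}
	fmt.Println(mergeRegistry(global, agent))
}
```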
Dockerfile roles:
- Dockerfile.base (global repo): defines the base image – OS packages (git, openssh-client, python3-minimal, etc.), the choir-agent binary, and shared infrastructure. Rebuilt only when the global repo changes.
- Dockerfile (per-agent repo): FROM choir-base:<version>. Installs agent-specific system dependencies and compiles tool binaries from source directories. Does not copy tools/skills/identity (the build pipeline handles that separately before the Docker build).
Per-agent git identity: Each agent leases a named
git identity from the git_identities pool in
config.json (see section 6.4). The identity provides
user.name, user.email, and auth credentials.
Set in the container’s git config at startup. See section 14.4 for auth
details.
Local clone cache: choird maintains clones of all
repos under .choir.d/repos/. These are fetched during
config load and used during agent build. The
operator never edits these clones directly – they are managed by
choird.
6.6 Optional Host Workers
choird may manage host-side workers for heavyweight
integrations (e.g. Playwright browser worker), while keeping policy and
authority in choird.
7. Agent Execution Model
7.1 Two Cognitive Lanes
The system is NOT two separate agents. It is a single logical agent with two concurrent cognitive lanes sharing identity, tools, skills, and long-term memory:
- Edge Lane: Small, cheap, fast model. Acts as “mouth and ears” – handles user interaction, routing decisions, real-time responsiveness. Edge receives USER.md + SOUL.md in its system prompt. Edge decides to offload complex tasks to core when it judges the task is complex or when the user explicitly requests it. When spawning core, edge curates a clean task briefing: strips user chattiness, extracts the actionable request, and bundles only the context core needs. Edge can spawn multiple concurrent core jobs, each with a user-friendly name. While cores run, edge remains conversational but is restricted to read-only tools and can still spawn additional cores.
- Core Jobs: Flagship reasoning model. Each core job acts as “deep brain controlling limbs” – handles multi-step planning, precision tool orchestration, deep reasoning. Core receives SOUL-CORE.md (not USER.md or SOUL.md). Never sees raw user messages; only the curated CoreJobStart from edge. Each core job has a name for easy reference (e.g., “refactor-auth”, “write-tests”). Multiple core jobs can run concurrently on different files without conflict; they are serialized only when accessing the same resource (e.g., both running choir.exec).
The async model should feel like “being able to chat with or interrupt a worker while it’s working.”
User message queue: Incoming user messages are
queued by the arbiter. Edge drains the queue only when it returns to
IDLE state. Messages that arrive while edge is in
REASONING, WAITING_TOOL,
WAITING_CORE, or FINALIZING wait in the queue.
They are not batched – each queued message triggers a separate IDLE
-> CONTEXT_READY -> … cycle. The
/inject <message> gateway command provides an
alternative: it injects a message into the edge lane’s context at the
next safe point, similar to how edge injects instructions into core (see
12.8).
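The queue discipline above can be sketched in a few lines. The EdgeQueue type and its methods are illustrative; in the real system the arbiter owns the queue and the state names come from section 7.2.

```go
package main

import "fmt"

// EdgeQueue sketches the arbiter-owned user message queue. Messages
// buffer while edge is busy and drain one at a time at IDLE, each
// triggering its own IDLE -> CONTEXT_READY cycle (no batching).
type EdgeQueue struct {
	state string
	queue []string
}

func (q *EdgeQueue) Enqueue(msg string) { q.queue = append(q.queue, msg) }

// Next pops a single message, but only when edge has returned to IDLE;
// popping moves edge to CONTEXT_READY for that one message.
func (q *EdgeQueue) Next() (string, bool) {
	if q.state != "EDGE_IDLE" || len(q.queue) == 0 {
		return "", false
	}
	msg := q.queue[0]
	q.queue = q.queue[1:]
	q.state = "EDGE_CONTEXT_READY"
	return msg, true
}

func main() {
	q := &EdgeQueue{state: "EDGE_REASONING"}
	q.Enqueue("first")
	q.Enqueue("second")
	_, ok := q.Next()
	fmt.Println(ok) // false: edge is mid-reasoning, message waits
	q.state = "EDGE_IDLE"
	msg, ok := q.Next()
	fmt.Println(msg, ok) // first true: one message per cycle
}
```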
Core completion report: When core returns to idle (CORE_COMPLETED or CORE_TERMINATED), edge must present a summary report of the core job’s outcome to the user. This is a mandatory edge behavior, not optional.
7.2 Edge Lane State Machine
States:
EDGE_IDLE
EDGE_CONTEXT_READY
EDGE_REASONING
EDGE_WAITING_TOOL
(EDGE_WAITING_CORE removed -- edge stays responsive while cores run)
EDGE_FINALIZING
EDGE_TERMINATED
Transitions:
IDLE -> CONTEXT_READY on UserMessage
CONTEXT_READY -> REASONING (inject memory context, call LLM)
REASONING -> FINALIZING on Finish (emit response, persist memory)
REASONING -> WAITING_TOOL on ToolProposal
REASONING -> REASONING on choir.core.spawn (core starts in background)
WAITING_TOOL -> REASONING on ToolResult
FINALIZING -> IDLE after result delivered
REASONING -> FINALIZING on LLM error (emit error message to user, then IDLE)
Any -> TERMINATED on Cancel, BudgetExceeded, fatal error
While any core job is running, edge remains conversational but is
restricted to read-only tools (choir.fs.read,
choir.fs.search, choir.web.search,
choir.memory.query) and choir.core.spawn (to
start additional cores). Edge can inject instructions to a specific core
by name, cancel a specific core by name, and list active cores. Edge is
forbidden from side-effect tools (choir.fs.write,
choir.exec) while any core is running.
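The edge state machine above can be encoded as an explicit transition table, matching the "explicit states, explicit transitions" invariant. Event names such as ContextDone, CoreSpawn, LLMError, and Delivered are assumptions for illustration; the spec names only the triggering conditions.

```go
package main

import "fmt"

// edgeTransitions encodes the edge lane state machine as a lookup
// table: (state, event) -> next state. Unlisted pairs are rejected.
var edgeTransitions = map[[2]string]string{
	{"EDGE_IDLE", "UserMessage"}:          "EDGE_CONTEXT_READY",
	{"EDGE_CONTEXT_READY", "ContextDone"}: "EDGE_REASONING",
	{"EDGE_REASONING", "Finish"}:          "EDGE_FINALIZING",
	{"EDGE_REASONING", "ToolProposal"}:    "EDGE_WAITING_TOOL",
	{"EDGE_REASONING", "CoreSpawn"}:       "EDGE_REASONING",
	{"EDGE_REASONING", "LLMError"}:        "EDGE_FINALIZING",
	{"EDGE_WAITING_TOOL", "ToolResult"}:   "EDGE_REASONING",
	{"EDGE_FINALIZING", "Delivered"}:      "EDGE_IDLE",
}

// next validates a transition; Cancel and BudgetExceeded terminate
// from any state, mirroring the "Any -> TERMINATED" rule.
func next(state, event string) (string, error) {
	if event == "Cancel" || event == "BudgetExceeded" {
		return "EDGE_TERMINATED", nil
	}
	to, ok := edgeTransitions[[2]string{state, event}]
	if !ok {
		return "", fmt.Errorf("illegal transition: %s on %s", state, event)
	}
	return to, nil
}

func main() {
	s, _ := next("EDGE_IDLE", "UserMessage")
	fmt.Println(s)
	_, err := next("EDGE_IDLE", "ToolResult")
	fmt.Println(err != nil) // true: rejected, not silently accepted
}
```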
7.3 Core Job State Machine (per-job)
States:
CORE_IDLE
CORE_CREATED
CORE_INITIALIZING
CORE_REASONING
CORE_WAITING_TOOL
CORE_COMPLETED
CORE_TERMINATED
Transitions:
IDLE -> CREATED on CoreJobStart from edge
CREATED -> INITIALIZING (load task, context, set budgets)
INITIALIZING -> REASONING (LLM call)
REASONING -> COMPLETED on FinalResult
REASONING -> WAITING_TOOL on ToolProposal
REASONING -> REASONING on InstructionInjected
REASONING -> TERMINATED on LLM error (error surfaced to edge via CoreEvent)
WAITING_TOOL -> REASONING on ToolResult
Any -> TERMINATED on Cancel, Error, BudgetExceeded
Each core job is a separate goroutine with its own state machine. The
job is identified by a user-friendly name (assigned by edge or the
user). Lock ownership uses core:<job-name> so
multiple cores can hold locks on different resources concurrently. When
a core job completes or terminates, edge presents a summary to the user
(mandatory).
7.4 Arbiter (Event-Sourced Commit Log)
The arbiter is a dedicated goroutine within the choir-agent process.
It owns the single authoritative commit log. Edge and core lanes submit
proposals to the arbiter via Go channels; the arbiter validates,
acquires locks, executes tools (or delegates to choird via
EXECUTE_HOST_TOOL), commits results, and releases locks. No
lane can commit a side effect without going through the arbiter.
Authoritative event types:
UserMsg
ModelOutput(edge/core)
ToolCallRequested
ToolCallCommitted
ToolResultCommitted
MemoryWriteCommitted
CoreStarted / CoreStopped
InjectedInstruction
Cancelled
Only the arbiter appends “Committed” events. Agents only emit proposals.
Commit model: the arbiter commits events to a local in-memory append
log (the primary authority during runtime). Heartbeat replication
asynchronously sends committed events to choird, which persists them to
session_events in Postgres. On crash, unreplicated events
are lost; recovery resumes from the last host-acknowledged revision.
This trades durability for latency – no database round-trip per tool
execution.
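The arbiter pattern above can be sketched as a single goroutine draining one channel. The Proposal and Commit shapes are illustrative, not the real message types; what matters is that sequence numbers are assigned and the log appended in exactly one place.

```go
package main

import "fmt"

// Proposal is a lane's request for a committed side effect; Commit is
// the arbiter's reply. Shapes are illustrative, not the real protocol.
type Proposal struct {
	Lane string
	Tool string
	Done chan Commit
}

type Commit struct {
	Seq    int
	Result string
}

// runArbiter is a minimal sketch of the arbiter goroutine: it drains
// a single channel, assigns monotonically increasing sequence numbers,
// and appends to the in-memory log, so every side effect is serialized
// through one place regardless of which lane proposed it.
func runArbiter(proposals <-chan Proposal, log *[]Commit) {
	seq := 0
	for p := range proposals {
		seq++
		c := Commit{Seq: seq, Result: "ok:" + p.Tool}
		*log = append(*log, c) // local in-memory append log
		p.Done <- c            // the lane resumes only after commit
	}
}

func main() {
	proposals := make(chan Proposal)
	var log []Commit
	go runArbiter(proposals, &log)

	done := make(chan Commit)
	proposals <- Proposal{Lane: "edge", Tool: "choir.fs.read", Done: done}
	fmt.Println(<-done)
	proposals <- Proposal{Lane: "core:refactor-auth", Tool: "choir.fs.write", Done: done}
	fmt.Println(<-done)
	close(proposals)
}
```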
Two-phase tool execution:
1. Propose (agent emits structured tool call)
2. Commit (arbiter validates, acquires locks)
3. Execute (tool runs)
4. Commit result (arbiter records result to local log, releases locks)
v1 tradeoff: the arbiter is fully serial. If a host-delegated tool
(EXECUTE_HOST_TOOL) takes seconds (e.g. Playwright
browsing), the other lane’s tool proposals queue behind it. This is
accepted for v1 simplicity; async pipelining of host tools is a future
optimization.
7.5 Lifetime Containment
Core lifetime is a strict subset of Edge lifetime – mechanically enforced:
- Process tree: Core is spawned as child of Edge. If Edge dies, Core dies.
- Lease TTL: choird only accepts requests with valid edge session lease.
- Job token binding: Core job tokens are derived from edge lease and expire sooner.
7.6 Budget Enforcement
Edge enforces (session-level):
- max_core_jobs
- total_session_tokens
- max_tool_calls_per_session

Core enforces (job-level):
- per_job_max_steps
- per_job_max_tool_calls
- per_job_wall_time
Budgets are never reset by injection.
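The session-level checks can be sketched as simple counters guarded before each action. The SessionBudget struct and method names are assumptions; field names mirror the config keys above.

```go
package main

import (
	"errors"
	"fmt"
)

// SessionBudget tracks the edge-enforced session-level limits.
// Field names mirror the config keys; the struct is illustrative.
type SessionBudget struct {
	MaxCoreJobs  int
	MaxToolCalls int
	MaxTokens    int

	coreJobs  int
	toolCalls int
	tokens    int
}

var ErrBudget = errors.New("budget exceeded")

// SpawnCore checks max_core_jobs before incrementing; injection never
// touches these counters, so budgets are never reset by injection.
func (b *SessionBudget) SpawnCore() error {
	if b.coreJobs+1 > b.MaxCoreJobs {
		return fmt.Errorf("%w: max_core_jobs", ErrBudget)
	}
	b.coreJobs++
	return nil
}

// RecordToolCall checks max_tool_calls_per_session.
func (b *SessionBudget) RecordToolCall() error {
	if b.toolCalls+1 > b.MaxToolCalls {
		return fmt.Errorf("%w: max_tool_calls_per_session", ErrBudget)
	}
	b.toolCalls++
	return nil
}

func main() {
	b := &SessionBudget{MaxCoreJobs: 1, MaxToolCalls: 100, MaxTokens: 1 << 20}
	fmt.Println(b.SpawnCore())        // <nil>
	fmt.Println(b.SpawnCore() != nil) // true: second core rejected
}
```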
7.7 LLM Call Error Handling
When an LLM API call fails (rate limit, 5xx, timeout, malformed response, content filter rejection):
1. The error is surfaced as a message to the user via the bound DM.
2. The session continues. Edge handles the error by transitioning through FINALIZING (emit error message) back to IDLE, ready for the next user message. Core terminates on LLM error (REASONING -> TERMINATED), surfacing the error to edge via CoreEvent; edge then presents the error to the user.
3. No automatic retry of the failed LLM call. The user can re-trigger by sending a new message (edge) or by spawning a new core job.
Rate limit handling (provider 429 responses):
1. choir-agent parses the Retry-After header from the provider response.
2. On the first rate-limited call, choir-agent immediately sends an informational message to the user’s DM (via choird) and logs the event.
3. choir-agent waits for the Retry-After duration (or the configured rate_limit_retry_ms default if no header, default: 1000ms), then retries the call.
4. If the retry also fails, the error is surfaced to the user per the standard error handling above.
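Step 3's delay computation can be sketched as follows. This sketch only handles the delay-seconds form of Retry-After; the HTTP-date form, which the header also permits, is left out, and the function name is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// retryDelay honors the provider's Retry-After header (delay-seconds
// form) and falls back to the configured rate_limit_retry_ms default
// when the header is absent or unparseable.
func retryDelay(retryAfter string, fallback time.Duration) time.Duration {
	if secs, err := strconv.Atoi(retryAfter); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	return fallback
}

func main() {
	def := 1000 * time.Millisecond // rate_limit_retry_ms default
	fmt.Println(retryDelay("30", def)) // 30s
	fmt.Println(retryDelay("", def))   // 1s
}
```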
7.8 Safe Points
Committed transitions happen only at safe points:
1. Completed LLM call parse/validate.
2. Completed tool result commit.
3. Skill state transition commit.
8. Tool System
8.1 Design Principle
The only way a model can affect or observe the workspace is through tools.
No implicit context, no hidden mounts, no backdoors.
OS analogy:
| OS Concept | Choir Concept |
|---|---|
| User process | LLM lane |
| Kernel | choir-agent runtime |
| Syscalls | Tools |
| Filesystem | Workspace |
| Scheduler | Arbiter |
| Process memory | Compacted working memory |
8.2 Tool Invocation Contract
- Model must use structured tool calling (out-of-band tool channel).
- No regex/text parsing for tool calls.
- Tool names are canonical and namespaced (e.g. choir.fs.read).
- All tool inputs validated against schema before execution.
8.3 Tool Definition (Dual View)
Every tool has two representations:
LLM view (what the model sees):
{
"name": "choir.fs.write",
"description": "Write content to a file in the workspace.",
"parameters": {
"type": "object",
"properties": {
"path": { "type": "string" },
"content": { "type": "string" },
"mode": { "type": "string", "enum": ["overwrite", "append"] }
},
"required": ["path", "content"]
}
}
Runtime view (internal metadata):
{
"llm": { "name": "...", "description": "...", "input_schema": "..." },
"runtime": {
"exec_path": "/choir/tools/global/git_commit",
"timeout_ms": 15000,
"locks": [
{ "resource": "workspace", "mode": "X" }
],
"network": false,
"secret_resources": ["git_identity"],
"side_effect": "write",
"idempotent": false,
"version": "1.0"
}
}
The LLM never sees lock semantics, security policies, resource keys,
or internal implementation. Only the llm section is sent to
model APIs.
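The dual-view rule reduces to a projection over the registry before any request is assembled. The ToolDef shape and llmSchemas helper below are a sketch, assuming the two views are stored side by side as in the JSON above; only the llm half ever leaves the process.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolDef holds both representations of a tool; only the LLM view is
// ever sent toward a model API. Field shapes are illustrative.
type ToolDef struct {
	LLM     json.RawMessage `json:"llm"`
	Runtime json.RawMessage `json:"runtime"`
}

// llmSchemas projects the registry down to the model-facing view,
// dropping lock, secret, and execution metadata entirely.
func llmSchemas(registry []ToolDef) []json.RawMessage {
	out := make([]json.RawMessage, 0, len(registry))
	for _, t := range registry {
		out = append(out, t.LLM)
	}
	return out
}

func main() {
	raw := `{"llm":{"name":"choir.fs.write"},"runtime":{"locks":[{"resource":"workspace","mode":"X"}]}}`
	var def ToolDef
	if err := json.Unmarshal([]byte(raw), &def); err != nil {
		panic(err)
	}
	// Only the llm section survives the projection.
	fmt.Println(string(llmSchemas([]ToolDef{def})[0]))
}
```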
8.4 Tool Taxonomy
Read-only: choir.fs.read,
choir.memory.search, choir.repo.status – safe
to parallelize, require S locks.
Write/Mutating: choir.fs.write,
choir.repo.commit, choir.memory.upsert –
require X locks.
External Side Effects: choir.http.post,
choir.email.send, choir.web.browse – require X
locks + audit.
8.5 Tool Output Handling
Return structured JSON, not raw logs:
{
"status": "success",
"summary": "...",
"artifact_ref": "hash123"
}
Per-tool output tagging controls visibility:
- exposure: edge | core | both | none
- prompt_mask: true/false
8.6 Tool Surface Minimization
Group logically rather than exposing many granular tools. Example:
instead of choir.memory.get,
choir.memory.get_by_hash, choir.memory.scan,
expose:
{
"name": "choir.memory.query",
"parameters": {
"mode": ["semantic", "hash", "range"]
}
}
8.7 Built-In Tools (v1)
| # | Name | ID | Lock | Host? | Notes |
|---|---|---|---|---|---|
| 1 | Shell | choir.exec | workspace:X | no | Arbitrary commands; secret_resources: [] |
| 2 | Read File | choir.fs.read | file:<path>:S | no | Supports chunking via head/tail |
| 3 | Edit File | choir.fs.write | file:<path>:X | no | Patch-based (replace range, append, insert at line) |
| 4 | Ripgrep | choir.fs.search | file:<path>:S | no | rg binary shipped in image; locks target file/dir |
| 5 | TTS | choir.tts.speak | choirtmp:X | no | TTS via agent’s voice profile; writes audio to .choirtmp/send/ |
| 6 | Brave Search | choir.web.search | none | no | Brave Search API (structured results) |
| 7 | Browse | choir.web.browse | browser_tab:X | yes | Playwright via host EXECUTE_HOST_TOOL |
| 8 | Notion | choir.notion.query | none | no | Notion API integration |
| 9 | Email Send | choir.email.send | none | no | SMTP email send |
| 10 | Email Receive | choir.email.receive | none | no | IMAP fetch; returns message list/content |
| 11 | Email Check | choir.email.check | none | no | IMAP check for new messages; returns count/summary |
| 12 | Memory Query | choir.memory.query | none | yes | Search working/session/knowledge via host EXECUTE_HOST_TOOL (see 11.7) |
| 13 | Memory Write | choir.memory.upsert | none | yes | Write to knowledge store only via host EXECUTE_HOST_TOOL (see 11.7) |
| 14 | Memory Compact | choir.memory.compact | none | no | Force reference summary update for calling lane (see 11.3) |
8.8 Tool Registration
Registry sources:
1. Built-in tools (runtime-backed Go implementations).
2. External tools defined by manifest + executable in /choir/tools/....
On-disk structure for external tools:
/choir/tools/
global/ # baked into image, shared
foo.json # tool manifest
foo # executable
agent/ # agent-specific tools
bar.json
bar
No dynamic runtime installs. Tool and skill registries are immutable at runtime.
Tool loading startup sequence:
1. Load built-in tools (Go functions).
2. Scan /choir/tools/global.
3. Scan /choir/tools/agent.
4. Validate JSON schemas.
5. Verify executables exist.
6. Build registry.
7. Register LLM-facing schemas (stripped of runtime metadata).
8. If any JSON is invalid, fail startup (fail fast).
Skill loading startup sequence:
1. Scan /choir/skills/.
2. Parse each .json file against the SkillSpec schema.
3. Validate state machines: all transitions reference valid states, terminal states exist, no unreachable states.
4. Validate allowed_tools references against the tool registry.
5. Build skill registry.
6. If any JSON is invalid or a state machine is ill-formed, fail startup.
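The state-machine checks in step 3 can be sketched with a breadth-first reachability pass. The types below are an illustrative cut of the SkillSpec schema, not the real structs, and the allowed_tools-against-registry check is omitted:

```go
package main

import "fmt"

// State and SkillSpec are illustrative cuts of the real SkillSpec schema.
type State struct {
	Terminal    bool
	Transitions []string // target state names
}

type SkillSpec struct {
	Initial string
	States  map[string]State
}

// ValidateSkill performs the step-3 startup checks: the initial state exists,
// every transition targets a defined state, at least one terminal state
// exists, and every state is reachable from the initial one.
func ValidateSkill(s SkillSpec) error {
	if _, ok := s.States[s.Initial]; !ok {
		return fmt.Errorf("initial state %q undefined", s.Initial)
	}
	hasTerminal := false
	for name, st := range s.States {
		if st.Terminal {
			hasTerminal = true
		}
		for _, to := range st.Transitions {
			if _, ok := s.States[to]; !ok {
				return fmt.Errorf("state %q transitions to undefined state %q", name, to)
			}
		}
	}
	if !hasTerminal {
		return fmt.Errorf("no terminal state defined")
	}
	// Breadth-first reachability from the initial state.
	seen := map[string]bool{s.Initial: true}
	queue := []string{s.Initial}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, to := range s.States[cur].Transitions {
			if !seen[to] {
				seen[to] = true
				queue = append(queue, to)
			}
		}
	}
	for name := range s.States {
		if !seen[name] {
			return fmt.Errorf("state %q unreachable from %q", name, s.Initial)
		}
	}
	return nil
}
```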
Built-in and external tools share a unified Go interface:
type Tool interface {
Name() string
Schema() JSONSchema
Execute(ctx context.Context, input json.RawMessage) (json.RawMessage, error)
}
8.9 Tool Execution Pipeline
- Send request (system prompt + messages + tool schemas) to LLM.
- Model responds with text OR structured tool call(s).
- Arbiter receives tool calls; for each:
- Validate schema.
- Enforce path constraints, lease, resource locks.
- Acquire locks.
- Execute tool.
- Log event.
- Release locks.
- Send tool result back to model.
- Call model again.
8.10 Meta-Tools and Approvals
Capability-changing requests (tool addition/change, skill change,
image update) are proposal-only and require manual approval via
choirctl or gateway command path.
Two classes of tools:
Runtime tools: Execute immediately (governed by locks).
Control plane tools (meta):
choir.propose.tool, choir.propose.skill,
choir.propose.config_change – never execute immediately,
never hold locks, always require manual approval.
Proposal pipeline:
LLM -> propose_tool_change
-> choir-agent logs proposal
-> choird stores pending proposal
-> human approval (via choirctl or gateway)
-> choird mutates registry
Two-phase tool registration:
1. LLM proposes metadata (name, description, schema, intended behavior).
2. Human writes the implementation, reviews the schema, and registers it via choirctl.
The project ships with tool-builder and
skill-builder skills (see section 10.3) that guide the
agent through the proposal process as a structured multi-phase
workflow.
Hard prohibitions: LLM never injects executable code. No auto-apply.
9. Resource Lock Model
9.1 Lock States
Per resource key:
1. FREE
2. S_LOCKED(count) – shared/read, multiple holders
3. X_LOCKED(owner) – exclusive/write, single holder
Locks apply to tools only, not skills.
Transition table:
| From | To | Condition |
|---|---|---|
| FREE | S_LOCKED | acquire S |
| FREE | X_LOCKED | acquire X |
| S_LOCKED | S_LOCKED | acquire S (additional reader) |
| S_LOCKED | FREE | last S released |
| X_LOCKED | FREE | X released |
No S-to-X upgrades in v1.
9.2 Resource Key Namespaces
workspace -- global per leased workspace (coarse-grained)
file:<path> -- per-file lock (fine-grained, under workspace)
mem:<workspace> -- memory namespace (reserved, no v1 users)
repo:<workspace> -- repo operations (reserved, no v1 users)
browser_tab -- browser context
choirtmp -- .choirtmp/ staging area (independent of workspace lock)
net:<service> -- network services (reserved, no v1 users)
ext:<provider> -- external side effects (reserved, no v1 users)
Lock hierarchy: workspace is a
coarse-grained lock that dominates all file:<path>
locks. When workspace:X is held (e.g. by
choir.exec), no file:<path> lock can be
acquired – reads and writes to individual files are blocked. When only
file:<path> locks are held, workspace:X
must wait for all file locks to be released.
.choirtmp/ is exempt from the workspace lock
hierarchy. It has its own independent choirtmp
lock. This allows gateway file transfers (inbound uploads, outbound
multimedia) to proceed without conflicting with workspace-level tool
execution. Tools that write to .choirtmp/send/ acquire
choirtmp:X; choird reads from .choirtmp/send/
and writes to .choirtmp/recv/ outside the agent’s lock
manager (choird operates on the bind mount directly).
workspace, file:<path>,
browser_tab, and choirtmp are actively used by
v1 tools. The remaining namespaces are retained for future tool
additions.
9.3 Atomic Lockset Requirement
Tool invocation must acquire all required locks atomically (all-or-nothing). Release is atomic for the full lockset.
9.4 Concurrency Model
The lock manager supports edge + N concurrent core jobs using:
1. One mutex.
2. One condition variable.
3. Canonicalized lockset checks under the same critical section.
Lock ownership uses the lane identifier: edge or
core:<job-name>. Multiple core jobs can hold
exclusive locks on different resources simultaneously (e.g.,
core:refactor-auth holds file:src/auth.go:X while
core:write-tests holds file:src/auth_test.go:X).
No partial lock holding while waiting.
Properties:
- Deadlock-free: atomic acquisition means no partial holding.
- Dirty-read exception: edge may acquire file:<path>:S while any core job holds workspace:X. This allows edge to read individual files (and answer user questions) during core execution. Edge may observe partially-modified state.
- Writer preference: if any waiter requests X, block new S grants (except the dirty-read exception above).
- Release on all exits: normal completion, error, timeout, cancellation, panic (via defer).
- Crash safe: if choir-agent crashes, locks vanish with the process.
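A minimal sketch of the one-mutex, one-condvar design with atomic all-or-nothing grants and full-lockset release. Writer preference and the dirty-read exception are omitted for brevity, and the type names are illustrative:

```go
package main

import "sync"

// LockReq names a resource key and a mode ("S" or "X").
type LockReq struct {
	Resource string
	Mode     string
}

// lockState tracks holders per resource key.
type lockState struct {
	readers int
	writer  string // lane id holding X; "" if none
}

// LockManager: one mutex, one condition variable, lockset checks under the
// same critical section. A lockset is granted only when every requested
// resource is grantable (all-or-nothing); waiters hold nothing while waiting.
type LockManager struct {
	mu    sync.Mutex
	cond  *sync.Cond
	state map[string]*lockState
}

func NewLockManager() *LockManager {
	lm := &LockManager{state: map[string]*lockState{}}
	lm.cond = sync.NewCond(&lm.mu)
	return lm
}

// grantable must be called with lm.mu held.
func (lm *LockManager) grantable(lane string, reqs []LockReq) bool {
	for _, r := range reqs {
		s := lm.state[r.Resource]
		if s == nil {
			continue
		}
		if s.writer != "" && s.writer != lane {
			return false // another lane holds X
		}
		if r.Mode == "X" && (s.readers > 0 || s.writer != "") {
			return false // X needs the resource fully free
		}
	}
	return true
}

// AcquireAll blocks until the entire lockset can be granted atomically.
func (lm *LockManager) AcquireAll(lane string, reqs []LockReq) {
	lm.mu.Lock()
	defer lm.mu.Unlock()
	for !lm.grantable(lane, reqs) {
		lm.cond.Wait() // no partial holding while waiting
	}
	for _, r := range reqs {
		s := lm.state[r.Resource]
		if s == nil {
			s = &lockState{}
			lm.state[r.Resource] = s
		}
		if r.Mode == "X" {
			s.writer = lane
		} else {
			s.readers++
		}
	}
}

// ReleaseAll releases the full lockset and wakes all waiters.
func (lm *LockManager) ReleaseAll(lane string, reqs []LockReq) {
	lm.mu.Lock()
	defer lm.mu.Unlock()
	for _, r := range reqs {
		s := lm.state[r.Resource]
		if r.Mode == "X" && s.writer == lane {
			s.writer = ""
		} else if r.Mode == "S" && s.readers > 0 {
			s.readers--
		}
	}
	lm.cond.Broadcast()
}
```

Because grant checks and state mutation happen under one mutex, two lanes can never each grab half of overlapping locksets, which is what makes the design deadlock-free.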
9.5 Default Policy
- choir.exec always acquires workspace:X. No command parsing, no read-only override. This blocks all concurrent file-level operations.
- choir.fs.read and choir.fs.search acquire file:<path>:S (a per-file shared lock). Multiple concurrent reads of different files are allowed.
- choir.fs.write acquires file:<path>:X (a per-file exclusive lock). This blocks other reads/writes to the same file but not to different files.
- Tools with no lock requirement (e.g. choir.web.search, choir.email.send) acquire no locks and never block.
10. Skill System
10.1 Skill Definition
Skills are deterministic orchestration state machines. They do NOT own locks, commit tools, or mutate workspace directly – they only propose.
Formal definition:
Skill = (States, Transitions, Guards, Policies)
SkillSpec schema:
{
"name": "build_feature",
"description": "Implement a feature end-to-end.",
"initial_state": "understand",
"input_schema": "<JSONSchema>",
"output_schema": "<JSONSchema>",
"states": {
"understand": {
"objective": "Clarify and restate the requirement.",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "plan" }
]
},
"plan": {
"objective": "Produce an implementation plan.",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "modify" },
{ "on": "revise", "to": "understand" }
]
},
"modify": {
"objective": "Apply changes to files.",
"allowed_tools": ["choir.fs.read", "choir.fs.write"],
"transitions": [
{ "on": "complete", "to": "validate" }
]
},
"validate": {
"objective": "Verify correctness.",
"allowed_tools": ["choir.fs.read"],
"transitions": [
{ "on": "complete", "to": "done" },
{ "on": "fail", "to": "modify" }
]
},
"done": { "terminal": true }
},
"max_steps": 20,
"interruptible": true
}
10.2 Skill Execution
Each LLM call receives: global context + compacted memory + current skill step objective + step-specific context. Not the whole skill, not all steps – just one phase.
The model proposes a tool call, a transition event, or a finish. The arbiter validates the tool against the allowlist, validates the transition, applies the state change, and logs the event.
Key constraints:
- One active skill per lane at a time.
- Skills must not spawn other skills recursively.
- Step context must be structured (not free-text accumulation).
- The LLM cannot jump to arbitrary states, call forbidden tools, or skip transitions.
Constraint violation handling: when the LLM proposes a tool not in
allowed_tools or an invalid transition, the arbiter rejects
the proposal and returns a structured error to the LLM with the list of
allowed tools and valid transitions for the current state. The LLM
retries with corrected output. A per-step retry budget (default: 2
retries) prevents infinite correction loops; exceeding it triggers a
skill-level error transition.
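The retry budget can be sketched as a small guard the arbiter consults before executing a proposal. The names and return shape are illustrative:

```go
package main

import "fmt"

// StepPolicy is an illustrative per-step constraint set with a retry budget.
type StepPolicy struct {
	AllowedTools []string
	RetriesLeft  int // default budget: 2 retries
}

// CheckToolProposal validates an LLM tool proposal against the current
// state's allowlist. On rejection it returns structured, actionable feedback
// and decrements the budget; an exhausted budget escalates to a skill-level
// error transition instead of looping forever.
func CheckToolProposal(tool string, pol *StepPolicy) (ok bool, feedback string, escalate bool) {
	for _, t := range pol.AllowedTools {
		if t == tool {
			return true, "", false
		}
	}
	if pol.RetriesLeft <= 0 {
		return false, "", true // trigger skill-level error transition
	}
	pol.RetriesLeft--
	return false, fmt.Sprintf("tool %q not allowed in this state; allowed: %v", tool, pol.AllowedTools), false
}
```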
10.3 Built-In Skills
The project ships with two built-in skills for self-evolution:
tool-builder: Guides the agent through
proposing a new tool.
{
"name": "tool-builder",
"description": "Design and propose a new tool for the agent.",
"initial_state": "understand",
"states": {
"understand": {
"objective": "Clarify what the tool should do, its inputs/outputs, and side effects.",
"allowed_tools": ["choir.memory.query", "choir.fs.read"],
"transitions": [{ "on": "complete", "to": "design" }]
},
"design": {
"objective": "Draft the tool manifest: LLM view (name, description, input schema) and runtime view (lock requirements, secret_resources, timeout, side_effect classification).",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "validate" },
{ "on": "revise", "to": "understand" }
]
},
"validate": {
"objective": "Review the manifest for correctness, check for conflicts with existing tools, and verify the schema is well-formed.",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "propose" },
{ "on": "fail", "to": "design" }
]
},
"propose": {
"objective": "Submit the tool proposal via choir.propose.tool for human approval.",
"allowed_tools": ["choir.propose.tool"],
"transitions": [{ "on": "complete", "to": "done" }]
},
"done": { "terminal": true }
},
"max_steps": 15,
"interruptible": true
}
skill-builder: Guides the agent through proposing a new skill.
{
"name": "skill-builder",
"description": "Design and propose a new skill state machine.",
"initial_state": "understand",
"states": {
"understand": {
"objective": "Clarify the workflow the skill should orchestrate, its phases, and expected outcomes.",
"allowed_tools": ["choir.memory.query", "choir.fs.read"],
"transitions": [{ "on": "complete", "to": "design" }]
},
"design": {
"objective": "Draft the SkillSpec: states, transitions, per-state objectives, allowed tools, input/output schemas, and max_steps.",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "validate" },
{ "on": "revise", "to": "understand" }
]
},
"validate": {
"objective": "Verify the state machine is well-formed: all transitions reference valid states, terminal states exist, allowed tools are registered, and no unreachable states.",
"allowed_tools": ["choir.memory.query"],
"transitions": [
{ "on": "complete", "to": "propose" },
{ "on": "fail", "to": "design" }
]
},
"propose": {
"objective": "Submit the skill proposal via choir.propose.skill for human approval.",
"allowed_tools": ["choir.propose.skill"],
"transitions": [{ "on": "complete", "to": "done" }]
},
"done": { "terminal": true }
},
"max_steps": 15,
"interruptible": true
}
Both skills produce proposals that go through the approval pipeline (see section 8.10). The human reviews and either approves (choird registers the new tool/skill and rebuilds the agent image) or rejects. The agent never directly installs tools or skills.
10.4 Hierarchical State Machine Composition
Session SM
+-- Lane SM (edge/core)
+-- Skill SM
+-- LLM step (stochastic proposal)
11. Memory Architecture
Each agent has four memory stores. Session-derived memory (tiers 1-3) is automated by choird. Knowledge (tier 4) is explicitly managed by the agent or operator.
11.1 Memory Overview
| Tier | Name | Location | Lifecycle | Vectorized |
|---|---|---|---|---|
| 1 | Working memory | choir-agent process (choird-snapshotted) | Current session | No (in-memory) |
| 2 | Mid-term memory | Postgres | Last N sessions (default 10) | Yes |
| 3 | Long-term memory | Postgres | Older sessions | Summaries only |
| 4 | Knowledge | Postgres | Persistent, agent-managed | Yes |
11.2 Access Control
| Operation | Scope |
|---|---|
| Write working memory | Own session only (automatic via arbiter) |
| Read working memory | Own session only |
| Write mid-term / long-term | Own agent only (choird-automated) |
| Read mid-term / long-term | Any agent’s |
| Write knowledge | Own agent only |
| Read knowledge | Any agent’s |
Enforced in choird’s EXECUTE_HOST_TOOL handler: writes
check agent_id == caller, reads allow any
target_agent.
11.3 Tier 1: Working Memory
Lives in choir-agent process memory. Per-lane (edge and core each maintain their own view of the same session events).
A. Event window – the last N committed events (full payloads with hash references). Injected directly into the LLM prompt. N is configurable (default ~50 events). When the window fills, oldest events roll off.
B. Per-lane reference summary – a mutable structured document summarizing everything that has rolled off the event window. Updated via LLM-generated structured deltas. Edge and core maintain separate summaries; they see the same events but summarize from their role’s perspective.
Reference summary schema:
{
"summary": "...",
"facts": [
{ "key": "...", "value": "...", "source_event": "ev-hash-123" }
],
"referenced_sessions": ["session-abc"],
"referenced_events": ["ev-hash-001", "ev-hash-002"]
}
Memory delta schema (for updating the reference summary):
{
"memory_delta": {
"mode": "append | overwrite",
"summary_update": "...",
"add_references": ["hash"],
"remove_references": ["hash"],
"add_fact": { "..." },
"remove_fact": { "..." }
}
}
Structured deltas prevent total corruption via free-form overwrite.
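Applying such a delta might look like the following sketch, using an illustrative cut of the reference-summary fields:

```go
package main

// RefSummary is an illustrative cut of the reference-summary schema.
type RefSummary struct {
	Summary          string
	ReferencedEvents []string
}

// ApplyDelta applies a structured memory_delta: append vs overwrite for the
// summary text, plus explicit reference add/remove. Because the shape is
// constrained, a bad delta can lose some detail but cannot corrupt the whole
// document the way an unconstrained free-form overwrite could.
func ApplyDelta(s RefSummary, mode, summaryUpdate string, add, remove []string) RefSummary {
	switch mode {
	case "append":
		if summaryUpdate != "" {
			s.Summary += "\n" + summaryUpdate
		}
	case "overwrite":
		s.Summary = summaryUpdate
	}
	drop := map[string]bool{}
	for _, r := range remove {
		drop[r] = true
	}
	kept := []string{}
	for _, r := range s.ReferencedEvents {
		if !drop[r] {
			kept = append(kept, r)
		}
	}
	s.ReferencedEvents = append(kept, add...)
	return s
}
```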
LLM prompt structure – differs by lane:
Edge lane:
[system prompt] -- edge instructions + skill summaries
[USER.md] -- user identity (NEVER compacted)
[SOUL.md] -- edge personality (NEVER compacted)
[lane_reference_summary] -- materialized summary of compacted events
[event window] -- last N events (compactable content only)
[current user message] -- NEVER compacted
Core lane:
[system prompt] -- core instructions + skill summaries
[SOUL-CORE.md] -- core personality (NEVER compacted)
[lane_reference_summary] -- materialized summary of compacted events
[event window] -- last N events (compactable content only)
[CoreJobStart] -- curated task briefing from edge
The system prompt for each lane contains:
1. Lane-specific behavioral instructions (edge: user interaction, routing, injection; core: deep execution, tool orchestration).
2. A summary list of available skills (name + description for each).
Tool schemas are NOT included in the system prompt – they are passed via the OpenAI-compatible tools parameter in the API request.
Compactable content (eligible for rolling off the
event window into the reference summary): UserMsg events
(except the most recent), LLM outputs (ModelOutput), and
tool outputs (ToolResultCommitted). These are the bulk of
context growth.
Never compacted (always present in full, excluded from the event window token budget):
- System prompt (lane instructions + skill summaries).
- Identity files (USER.md + SOUL.md for edge; SOUL-CORE.md for core).
- The most recent UserMsg event (the message currently being responded to). Older user messages ARE compactable.
- CoreJobStart (core lane only: the task briefing from edge).
For older in-session context beyond the event window, the agent uses
choir.memory.query with store: "working" to
search the full in-memory event log by keyword or hash reference.
Compaction triggers (reference summary update):
- Automatic: when the compactable content in the event window exceeds a configurable token/byte threshold (set in .choir.d/config.json, default: 80% of the model's context window minus the non-compactable content). The runtime measures after each LLM response or tool result.
- Manual: via choirctl session compact <session-id>, /compact (gateway), or the choir.memory.compact tool (agent-invoked).
When compaction fires, the oldest compactable events in the event window are folded into the reference summary via LLM-generated structured deltas. The event window shrinks; the reference summary grows. Non-compactable content is untouched. Compaction runs asynchronously within the lane – the lane continues processing while the compaction LLM call is in flight. The updated reference summary is swapped in atomically when the compaction call completes. If the compaction call fails, the event window retains its current contents and compaction retries on the next trigger.
Persistence: choird snapshots both lanes’ working
memory (reference summaries + event window boundaries) via heartbeat
replication. On crash recovery, reference summaries are restored from
snapshot; the event window is rebuilt from the
session_events tail.
11.4 Tier 2: Mid-Term Memory
Stored in Postgres. Contains the last N sessions (N configurable, default 10) with full events chunked and vectorized.
When a session ends (graceful stop), choird:
1. Chunks the session events (already in session_events) into logical blocks (by tool sequence, skill phase, or fixed size).
2. Embeds each chunk via the embedding pipeline.
3. Stores the chunks in memory_documents with tier = 'mid_term'.
Raw session_events rows are retained indefinitely
alongside the memory_documents chunks (redundant but
preserves full granularity for replay and audit).
Searchable via choir.memory.query with
store: "session", mode semantic or
text. Returns chunks from any of the agent’s last N
sessions. Cross-agent queries also hit this tier.
11.5 Tier 3: Long-Term Memory
Stored in Postgres. Contains sessions that have aged out of mid-term. Only summaries are vectorized; full event detail is retained but not indexed for vector search.
When a session ages out of mid-term (session count exceeds N):
1. The agent generates a session summary as the last step of graceful shutdown (LLM call, structured output, included in the final heartbeat). This keeps choird inference-free.
2. choird chunks the summary into logical partitions (per-skill, per-topic, or per-time-block).
3. choird embeds the summary chunks and stores them as tier = 'long_term_summary' (vectorized, searchable).
4. The session's mid-term event chunks are marked tier = 'long_term_detail' – kept, but their vectors are dropped from the HNSW index (saves index size).
5. Raw session_events rows remain in Postgres (unchanged).
Default semantic search (via store: "session") hits
mid-term chunks + long-term summaries. To drill into a specific
long-term session’s full events, the agent must reference a
session_id explicitly via choir.memory.query
with mode session_detail, which triggers text search
against long_term_detail chunks for that session only.
11.6 Tier 4: Knowledge
Stored in Postgres. Separate from session-derived memory. Not
automated by choird. Explicitly managed by the agent via
choir.memory.upsert or by the operator via
choirctl.
Stores persistent facts, user preferences, domain notes, reference material, project context – anything not tied to a specific session’s event stream.
Supports insert, update-by-key (optional dedup key), and delete.
Vectorized and searchable via choir.memory.query with
store: "knowledge".
11.7 Memory Tool Surface
| Tool | store | Modes | Notes |
|---|---|---|---|
| choir.memory.query | working | keyword, hash reference | Current session in-memory log. Own agent only. |
| choir.memory.query | session | semantic, text, session_detail | Mid-term + long-term summaries. session_detail requires session_id for long-term drill-down. target_agent defaults to self, can be any agent. |
| choir.memory.query | knowledge | semantic, text | Knowledge base. target_agent defaults to self, can be any agent. |
| choir.memory.upsert | knowledge | insert, update-by-key, delete | Own agent's knowledge store only. |
| choir.memory.compact | working | (triggers compaction) | Forces reference summary update for the calling lane. |
11.8 Session Shutdown Memory Pipeline
During graceful stop (choirctl agent stop / agent update):
1. Agent completes the current safe point.
2. Agent generates a session summary (LLM call, structured output). Timeout: if the LLM call does not complete within 30 seconds (configurable), the agent skips the summary and proceeds with shutdown. A missing summary means the session will not have a long-term summary when it ages out of mid-term (mid-term chunks are still generated from raw events).
3. Agent includes the summary in the final heartbeat payload (if generated).
4. Agent flushes all unreplicated events to choird.
5. choird persists everything to Postgres.
6. choird chunks and embeds session events into memory_documents (mid-term).
7. If the mid-term session count exceeds N, the oldest session is promoted: summary chunks become long_term_summary (vectorized), event chunks become long_term_detail (vectors dropped).
11.9 Embedding Backend
Integrated into choird (not a separate service in v1):
1. Postgres for durable text and metadata.
2. pgvector for embedding search.
3. Embedding calls via a configurable provider endpoint.
Embedding configuration: The embedding model is
configured in .choir.d/config.json, not hardcoded. Default:
text-embedding-3-small (1536 dimensions) via OpenRouter.
The vector column dimension is derived from config
(vector(N) where N matches the configured model’s output).
Swapping models requires a re-embedding migration but no code
change.
Embedding pipeline:
- Batch: up to 32 chunks per API call.
- Queue: embeddings from choir.memory.upsert and session-close chunking are queued and flushed on heartbeat tick or when the queue reaches batch size.
- Retry: exponential backoff with jitter, max 3 retries (1s/2s/4s base).
- Graceful degradation: on final failure, store the chunk without a vector and log a warning. Never block tool execution on embedding failure. Chunks without vectors remain searchable via full-text search (tsvector).
Design rule: keep memory logic modular inside choird
with a clean MemoryStore interface for future extraction if
needed.
12. Communication Protocol
12.1 Dual Transport Model
Both transports are implemented in v1. UDS is the default for single-host deployment; TCP/HTTP is scaffolding for future EKS-style multi-node deployments.
| Transport | When | Auth | Notes |
|---|---|---|---|
| UDS (default) | Single-host, container on same machine | Lease token + file permissions (chmod 600) | Lowest latency, no port conflicts, impossible to expose externally |
| TCP/HTTP | Future multi-node, or when container runs on a remote host | Lease token + mTLS (or signed request headers / JWT) | Required for EKS pods talking to a control-plane service |
Both transports carry the same logical messages with the same semantics. The choice is a deployment-time configuration decision, not an application-logic decision.
UDS details: Socket path
~/.choir.d/socks/choird.sock on host, bind-mounted as
/run/choir.sock inside container. choird listens,
choir-agent connects. UDS is bidirectional: choird can also write to the
socket to push signals (cancel, injection, config change notifications)
to the agent.
TCP/HTTP details: choird exposes an HTTP endpoint
(e.g. https://choird.internal:9400). choir-agent connects
with lease token in Authorization header. For EKS, this
becomes a Kubernetes Service endpoint. TLS is required when traversing a
network boundary; plaintext is acceptable only over loopback.
HTTP wire format: JSON-over-HTTP. No gRPC or protobuf in v1.
- Request/response: POST /rpc/<verb> with Content-Type: application/json. Common envelope fields (request_id, session_id, lease_token) in HTTP headers; verb-specific payload in the body.
- Server-push streaming: SSE (GET /events?session_id=..., text/event-stream). The agent opens the stream after INIT_HELLO. choird pushes config change notifications, injection signals from the gateway, and cancel signals. The agent sends heartbeats as regular POST requests.
- No bidirectional streaming needed: agent-to-choird is request/response; choird-to-agent is SSE.
12.2 Transport Abstraction
All protocol logic is transport-agnostic. Transport selection happens once at startup; business logic never branches on transport type.
Agent-side interface:
type ControlPlane interface {
Heartbeat(ctx context.Context, req HeartbeatReq) (HeartbeatResp, error)
RequestApproval(ctx context.Context, req ApprovalReq) (ApprovalResp, error)
GetSecrets(ctx context.Context, req SecretReq) (SecretResp, error)
Terminate(ctx context.Context, req TerminateReq) error
}
Server-side interface (choird):
type ControlPlaneHandler interface {
HandleHeartbeat(ctx context.Context, req HeartbeatReq) (HeartbeatResp, error)
HandleApproval(ctx context.Context, req ApprovalReq) (ApprovalResp, error)
HandleSecrets(ctx context.Context, req SecretReq) (SecretResp, error)
HandleTerminate(ctx context.Context, req TerminateReq) error
}
Mountable on a UDS listener, HTTP server, or (future) gRPC server.
Runtime toggle:
# Single-host (default)
choir-agent --control-plane=uds --socket=/run/choir.sock
choird --transport=uds
# Multi-node / EKS scaffolding
choir-agent --control-plane=http --endpoint=https://choird.internal:9400
choird --transport=http --listen=:9400 --tls-cert=... --tls-key=...
12.3 Protocol Design Rules
- Include request ID, session ID, and auth token in logical headers on every call, regardless of transport.
- Messages are stateless; never rely on connection state.
- Use a structured schema (protobuf or strict JSON schema) that is transport-independent. The same schema definition generates both UDS and HTTP serialization.
- Transport branching only in client factory / server bootstrap code, never in business logic.
12.4 Authentication Per Transport
| Transport | v1 Auth | Future Auth |
|---|---|---|
| UDS | Lease token (env var at container start) + socket file permissions | Same |
| HTTP | Lease token in Authorization header + server-only TLS
(self-signed OK for single-host) |
mTLS (self-signed CA or cert-manager), ServiceAccount identity, JWT |
The lease token is generated by choird at container creation, passed as an env var, and required in every RPC call. It is scoped to a single session. The token’s lifetime matches the agent’s lifecycle: it is valid while the agent is active and revoked when the session ends (graceful stop or crash). There is no time-based expiry – token validity is tied to session liveness, not a clock.
No mTLS in v1: v1 is single-host only, so the HTTP
transport runs over loopback or a local Docker network. Lease token +
server-only TLS is sufficient. mTLS (mutual TLS with per-container
client certificates) is deferred to future EKS work, where containers
run on remote nodes and stronger identity verification is needed. The
ControlPlane interface does not change; only the TLS config
factory needs updating.
12.5 Minimum RPC Verbs (choir-agent -> choird)
- INIT_HELLO
- GET_SECRETS
- HEARTBEAT
- REQUEST_APPROVAL
- REPORT_STATUS
- TERMINATE_SELF
- FETCH_DYNAMIC_CONFIG
- EXECUTE_HOST_TOOL
No RPC verb can directly perform host-destructive operations. The verb set is identical across both transports.
Schema source of truth: Go struct definitions with JSON tags. No protobuf in v1. All requests carry a common envelope:
{
"request_id": "uuid",
"session_id": "...",
"lease_token": "...",
"verb": "HEARTBEAT",
"payload": { ... }
}
Per-verb request/response schemas:
| Verb | Request Payload | Response Payload |
|---|---|---|
| INIT_HELLO | { agent_id, session_id, image_version, tool_manifest_hash, skill_manifest_hash } | { status, lease_token, resource_bindings, snapshot?, tail_events[]?, config_version } |
| GET_SECRETS | { resources[] } | { secrets: map[string]string } |
| HEARTBEAT | { base_rev, new_rev, patches[], config_version, hash_prev, hash_new, timestamp } | { ack_rev, config_version_latest } |
| REQUEST_APPROVAL | { request_type, payload (jsonb) } | { approval_id, status (pending\|approved\|rejected) } |
| REPORT_STATUS | { lane, state, skill?, step?, budget_remaining } | { ack } |
| TERMINATE_SELF | { reason } | { ack } |
| FETCH_DYNAMIC_CONFIG | { current_config_version } | { config_version, config (jsonb)? } |
| EXECUTE_HOST_TOOL | { tool_name, call_id, input (jsonb) } | { call_id, status (success\|error), output (jsonb) } |
EXECUTE_HOST_TOOL is the generic dispatch verb for tools
that require host-side execution. choir-agent sends:
{
"tool_name": "choir.web.browse",
"call_id": "...",
"input": { "url": "...", "mode": "text" }
}
choird dispatches internally to the appropriate host worker (Playwright for choir.web.browse, Postgres for choir.memory.query, etc.) and returns the tool result. This covers:
- Browser rendering (Playwright worker)
- Long-term memory operations (Postgres + pgvector)
- Any future host-side tool that cannot run in-container
12.6 Edge <-> Core Communication
Edge and core are goroutines within the same choir-agent process. They communicate via Go channels, not IPC. The arbiter goroutine mediates all committed side effects between them.
Core -> Edge channel: CoreEvent stream (progress,
plan, result, error).
Edge -> Core channel: ToolResult,
Injection, Cancel.
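A toy sketch of the channel wiring, with an illustrative CoreEvent shape — both lanes are goroutines in the same process, so this is ordinary Go channel plumbing, not IPC:

```go
package main

// CoreEvent is an illustrative cut of the core->edge event stream.
type CoreEvent struct {
	Type    string // progress, plan, partial_result, final_result, error, ...
	Payload string
}

// runCoreJob sketches a core job: it streams CoreEvents to edge and drains
// any pending injection before finishing. Injection is append-only — it adds
// context without resetting history or budgets.
func runCoreJob(events chan<- CoreEvent, inject <-chan string) {
	events <- CoreEvent{Type: "progress", Payload: "starting"}
	select {
	case msg := <-inject:
		events <- CoreEvent{Type: "progress", Payload: "applying injection: " + msg}
	default:
	}
	events <- CoreEvent{Type: "final_result", Payload: "done"}
	close(events)
}
```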
12.7 Message Contracts
Edge to Core (CoreJobStart):
Core never sees raw user messages. Edge curates a
CoreJobStart that removes user chattiness, distills the
request into a clear task, and includes only the context core needs.
This is a key responsibility of the edge lane – it acts as a filter
between the conversational user interface and the precision execution
engine.
{
"job_id": "...",
"job_name": "descriptive-name (e.g. refactor-auth, write-tests)",
"task_spec": "clear, actionable instruction (written by edge, not the user's raw message)",
"context_bundle": {
"facts": ["relevant facts extracted by edge"],
"excerpts": ["file excerpts, prior results, or memory references"],
"constraints": ["user-stated requirements or preferences"]
},
"tool_constraints": { "allowed_tools": [], "time_budget_ms": 0 },
"output_schema": "<JSONSchema>",
"verbosity": "normal"
}
Core to Edge (CoreEvent):
Event types:
- progress (phase, percent, current focus)
- plan (structured steps)
- thought_summary (short reasoning summary, never raw CoT)
- tool_proposal (tool, args, why, expected output)
- need_info (what's missing)
- partial_result (intermediate artifact)
- final_result (structured deliverable)
- error (structured failure)
12.8 Injection Protocol
Core injection (edge -> core): Edge injects
instructions to a specific core job during CORE_REASONING
or CORE_WAITING_TOOL. The user never injects directly into
a core job — they tell edge what they want, and edge decides whether and
how to relay to the appropriate core.
{
"type": "core_injection",
"job_name": "refactor-auth",
"content": "New instructions..."
}
Injection is append-only – never mutates history, never resets budgets. Max injection count enforced. Injection cannot spawn new cores.
Edge injection (user -> edge): The
/inject <message> gateway command injects a message
into the edge lane’s context at the next safe point. This is the
mechanism for the user to provide additional instructions without
waiting for edge to drain its message queue (which only happens at
IDLE). The injected message is appended to the edge context, not queued.
There is no user-facing command to inject directly into a core job — all
core communication flows through edge.
13. Crash Recovery and Replication
13.1 Heartbeat Replication
choir-agent sends heartbeats at a configurable interval
(default: 5000ms, set via heartbeat_interval_ms in
config.json).
Crash detection: choird considers an agent crashed
if no heartbeat is received within
crash_detection_threshold_ms (default: 10000ms). On crash
detection, choird performs cleanup:
1. Marks the session as crashed in Postgres.
2. Releases all resource leases (workspace, git identity, notion, email, DM, browser context).
3. Removes the orphaned container.
4. Logs the crash event.
5. The session is recoverable on next agent start via the recovery handshake.
Both heartbeat_interval_ms and
crash_detection_threshold_ms are configurable in
config.json. The crash threshold should be at least 2x the
heartbeat interval to avoid false positives.
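The threshold check above reduces to a single comparison; a minimal sketch (function and variable names are illustrative, not part of the spec):

```python
HEARTBEAT_INTERVAL_MS = 5000        # heartbeat_interval_ms default
CRASH_THRESHOLD_MS = 10000          # crash_detection_threshold_ms default (>= 2x interval)

def check_crashed(last_heartbeat_ms: int, now_ms: int) -> bool:
    """An agent is considered crashed if no heartbeat arrived within the threshold."""
    return now_ms - last_heartbeat_ms > CRASH_THRESHOLD_MS

# A heartbeat 12s ago exceeds the 10s threshold; one 4s ago does not.
now = 100_000
crashed_late = check_crashed(now - 12_000, now)
crashed_recent = check_crashed(now - 4_000, now)
```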
Each heartbeat carries committed deltas to choird:
{
"session_id": "...",
"agent_id": "...",
"base_rev": "<last host-acknowledged revision>",
"new_rev": "<agent's latest committed revision>",
"patches": ["<ordered, contiguous>"],
"config_version": 41,
"hash_prev": "<chain hash>",
"hash_new": "<chain hash>",
"timestamp": "<for ops>"
}
Committed event types:
- LLM_CALL_COMMITTED
- TOOL_CALL_COMMITTED
- TOOL_RESULT_COMMITTED
- SKILL_TRANSITION_COMMITTED
Host acknowledges accepted revision (ack_rev). Host
reconstructs state by folding events. Periodic snapshots bound replay
cost.
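A sketch of how choird might fold a heartbeat delta while verifying the hash chain and acknowledging revisions (the exact hashing scheme is an assumption; the spec only requires a chain hash and idempotent replay):

```python
import hashlib
import json

def apply_delta(acked_rev: int, chain_head: str, delta: dict) -> tuple[int, str]:
    """Fold one heartbeat delta into host state.

    Returns (new_acked_rev, new_chain_head). Resent deltas for already
    acknowledged revisions are no-ops (see 13.3); a chain mismatch rejects
    the delta instead of silently diverging.
    """
    if delta["new_rev"] <= acked_rev:
        return acked_rev, chain_head                  # already folded: no-op
    if delta["hash_prev"] != chain_head:
        raise ValueError("hash chain mismatch: reject delta")
    h = chain_head
    for patch in delta["patches"]:                    # ordered, contiguous
        h = hashlib.sha256((h + json.dumps(patch, sort_keys=True)).encode()).hexdigest()
    if h != delta["hash_new"]:
        raise ValueError("hash_new does not match recomputed chain head")
    return delta["new_rev"], h                        # new_rev becomes ack_rev
```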
13.2 Recovery Handshake
On startup:
1. Agent sends hello with identity/session context.
2. Host responds with canonical snapshot + tail events (or start fresh).
3. Agent rehydrates runtime state to the host-committed revision.
Recovered state is the last host-acknowledged revision, not mid-step transient state. Events committed locally but not yet replicated via heartbeat are lost on crash. This is the tradeoff of local-first commit (see section 7.4).
13.3 Idempotency Requirements
- Resent deltas for already committed revisions are no-ops.
- Tool side effects must be deduplicable with invocation IDs.
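Side-effect deduplication can be sketched as a result cache keyed by invocation ID (the class name and shape are illustrative; the spec only requires that replayed invocations produce no new side effects):

```python
class ToolInvocationLog:
    """Deduplicate tool side effects by invocation ID (sketch)."""

    def __init__(self):
        self._results = {}  # invocation_id -> cached result

    def run(self, invocation_id: str, tool, *args):
        if invocation_id in self._results:        # replayed invocation:
            return self._results[invocation_id]   # return cached result, no new side effect
        result = tool(*args)
        self._results[invocation_id] = result
        return result
```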
13.4 Working Memory Structure (Snapshotted)
reference_summary_edge (per-lane structured summary, see 11.3)
reference_summary_core (per-lane structured summary, see 11.3)
event_window_boundary (rev of oldest event in window)
skill_state (skill name, node, local ctx)
lane_state (edge/core status + budgets)
open_jobs (core job status, tool in-flight)
14. Secrets and Credentials
14.1 Identity vs Secret Injection
- Non-secret IDs (e.g. AGENT_ID) may be env-injected at launch.
- Secrets are never provided via env vars. Root + bash means env or /proc/<pid>/environ would reveal them.
14.2 Secret Handshake
- choir-agent boots, connects to choird over the configured transport (UDS or HTTP).
- Sends: {"type": "INIT", "agent_id": "...", "session_id": "...", "image_version": "..."}
- Choird validates agent ID, session, policy. Resolves the agent’s resource bindings (leased workspace, git identity, notion integration, email account, models, voice profile, DM) from the session’s start-time overrides or agent defaults.
- Agent sends: {"type": "REQUEST_SECRETS", "resources": ["git-dev-token", "openrouter-key", "notion-personal-key"]} (Secret names derived from the "secret" fields of the agent’s bound resources.)
- Choird replies: {"type": "SECRETS", "data": {"git-dev-token": "...", "openrouter-key": "...", ...}}
- Agent stores secrets in-memory only; never writes to disk; never logs.
14.3 Secret Handling Rules
- Secrets only in process memory. Atomic swap on update; never mutate in place.
- Never persisted to /workspace or logs.
- Secret access scoped per tool. A tool’s secret field in its runtime manifest references the named secret from the agent’s bound resources (e.g. choir.notion.query can access the agent’s bound Notion secret; choir.exec cannot access anything).
- Explicit secret refresh supported without restart (choirctl secret apply).
- On crash, secrets are lost. On restart, handshake repeats.
14.4 Git Identity and Auth
Each agent leases a named git identity (see section 6.4):
1. user.name and user.email come from the leased git_identities entry, configured in the container’s git config at startup (not env vars).
2. Git auth credentials are referenced by the identity’s secret field. Each identity gets its own credential set so commits are attributable. Identities are exclusive: at most one agent may lease a given identity at a time.
3. Git auth goes through a custom credential helper (the choir-agent binary detects argv[0] == "git-cred-helper" and acts as the helper). The helper validates the remote host before returning a token.
4. SSH private keys are injected ephemerally into ssh-agent memory via ssh-add - on stdin. Allowed hosts are enforced via SSH config.
5. No persistent key files in the workspace, env vars, or .git/config.
6. Arbitrary git commands are allowed.
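The credential-helper exchange (step 3 above) can be sketched as follows. git passes key=value attributes on stdin during `credential fill`; the allowlist and the `x-access-token` username are assumptions for illustration (the latter is a GitHub convention), not spec requirements:

```python
ALLOWED_HOSTS = {"github.com"}  # assumed allowlist; the spec only says the
                                # helper validates the remote host

def credential_helper(attr_lines: list[str], token: str) -> str:
    """Answer a `git credential fill` request: return credentials only
    when the requested host is allowed; otherwise return nothing."""
    attrs = dict(line.split("=", 1) for line in attr_lines if "=" in line)
    if attrs.get("host") not in ALLOWED_HOSTS:
        return ""  # refuse: no token leaves the process
    return f"username=x-access-token\npassword={token}\n"

# The choir-agent binary would dispatch on its invocation name, roughly:
#   if os.path.basename(sys.argv[0]) == "git-cred-helper": run the helper
```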
15. Container Architecture
15.1 Filesystem Layout
/choir (read-only, baked into image)
/tools/
/global/ (shared tool executables + definitions)
/agent/ (agent-specific tools)
/skills/ (skill definitions)
/bin/git-cred-helper (credential helper binary/symlink)
USER.md (user identity -- edge lane only)
SOUL.md (edge lane personality)
SOUL-CORE.md (core lane personality)
version.json (version metadata)
/workspace (writable, bind-mounted, long-lived)
.choirtmp/ (staging area for gateway file transfer)
send/ (agent -> choird: tool outputs, multimedia)
recv/ (choird -> agent: user uploads, inbound files)
Version metadata (/choir/version.json):
{
"agent_id": "...",
"image_version": "abc1234",
"global_repo_commit": "def5678",
"agent_repo_commit": "ghi9012",
"tool_manifest_hash": "...",
"skill_manifest_hash": "..."
}
choird verifies this at container startup to prevent silent drift.
15.2 Container Properties
- Lean image (Debian slim + git, openssh-client, python3-minimal), restartable, no hot self-update.
- Root and shell access inside the container are allowed by design.
- Host escalation is prevented at container boundary and RPC surface.
15.3 Container Security Profile
Container constraints:
- --cap-drop=ALL
- --security-opt no-new-privileges
- --read-only (root filesystem)
- --tmpfs /tmp (and /run)
- cgroup limits: CPU/mem/pids
- Default Docker seccomp profile
Prohibited:
- --privileged
- Docker socket mount
- --pid=host
- --network=host
- Device passthrough
- Host mounts (except /workspace and, when using UDS transport, the control socket)
15.4 Workspace Model
Workspaces are named resources defined in config.json
with explicit host paths. They are leased to agents – only one agent may
hold a lease on a given workspace at a time.
Leasing rules:
1. When an agent starts a session, choird grants a workspace lease. The default workspace is defined in the agent’s config; an alternative can be specified via the --workspace flag on agent start or /start.
2. If the requested workspace is already leased to another running agent, the start is rejected with an error. The operator must stop the other agent or choose a different workspace.
3. The lease is released when the agent’s session ends (stop, crash, or terminate).
4. Workspace-to-agent binding is per-session, not permanent. The same agent can use different workspaces across sessions.
Workspaces are plain directories on the host filesystem. They are not necessarily git repositories – a workspace may contain a git repo, loose files, or any mix. choird does not manage workspace contents; the agent and operator do.
WS-1: /workspace is non-authoritative
and may be wiped at any time. Durable truth lives in external sources
(remote git origins, databases, etc.) or host control-plane state.
First-class reset:
choirctl workspace reset <workspace-name> – deletes
the backing directory contents. If an agent holds a lease on the
workspace, the agent is stopped first.
15.5 Restart vs. Hot Reload
Requires restart (changes what the agent can do):
- Tool registry / executables
- Skill definitions
- Lock policies
- Binary runtime logic
- /choir content (USER.md, SOUL.md, SOUL-CORE.md)
- Dockerfiles (base or per-agent)
Hot-reloadable (changes how the agent does it):
- Model list, provider endpoints, inference parameters (temperature, reasoning_effort), request templates, TTS provider settings, voice profiles
- Feature flags
- Secrets (revocable/refreshable without restart)
16. Dynamic Configuration and Secret Apply
16.1 Dynamically Reloadable
- Secrets.
- Model/provider list (LLM and TTS).
- Request templates and voice profiles.
- Tunable defaults and flags.
16.2 Not Dynamically Reloadable (Restart Required)
- Tool registry and executables.
- Skill definitions.
- Runtime binary.
- /choir identity content (USER.md, SOUL.md, SOUL-CORE.md).
- Dockerfiles (base or per-agent).
16.3 Sync Mechanism
choird reads .choir.d/config.json and .choir.d/secrets.json at startup. Subsequent updates are explicit:
1. choirctl config load – read and validate .choir.d/, stage in choird.
2. choirctl config apply – bump config_version, publish hot-reloadable changes to agents.
3. choirctl secret apply – reload .choir.d/secrets.json in choird and publish secret refresh to running agents.
Agents sync via heartbeat:
1. Agent reports current config_version.
2. Host replies with latest version.
3. Agent fetches the full new config via FETCH_DYNAMIC_CONFIG on mismatch.
4. Agent atomically swaps in-memory config and secret snapshots.
Restart-required changes (tools, skills, identity, Dockerfiles – all
sourced from git repos) are applied via
choirctl agent update or
choirctl agent update-all (see sections 17.4, 17.5).
Reload discipline:
- Reload does NOT retroactively affect in-flight tasks.
- Snapshot config at start of tool execution.
- Snapshot model settings at start of LLM call.
- Never partially patch config: version-based full replacement.
17. User Commands
Gateway (Telegram) commands are a subset of choirctl
commands. Config updates are two-phase: load stages into
choird, apply publishes hot-reloadable changes to agents.
Restart-required changes use the agent lifecycle commands
(build, update).
17.1 choirctl Command Reference
System setup:
choirctl init # first-time setup: create .choir.d/, skeleton config.json, print instructions (see 5.2)
Agent lifecycle:
choirctl agent init <agent-id> # scaffold new agent with placeholder config (see 17.8)
choirctl agent list # list all agents and status
choirctl agent start <agent-id> [flags] # start agent container (see below)
choirctl agent stop <agent-id> # graceful termination (see 17.2)
choirctl agent restart <agent-id> # stop + start (same image, same resource bindings)
choirctl agent status <agent-id> # detailed status (lane states, budgets, skill, resource bindings, uptime)
choirctl agent build <agent-id> # build new/updated agent image (see 17.3)
choirctl agent update <agent-id> # build (if needed) + stop + start with new image (see 17.4)
choirctl agent update-all # build + redeploy all agents (see 17.5)
agent start flags – every agent default (section 6.4) is
overridable:
--workspace=<name> # override default workspace
--llm=<name> # override default LLM model
--voice-profile=<name> # override default voice profile
--git-identity=<name> # override default git identity
--notion=<name> # override default Notion integration
--email=<name> # override default email account
--dm=<name> # override default DM binding (required for choirctl start)
Omitted flags use the agent’s defaults from
config.json. When starting via choirctl,
--dm is required (choirctl has no implicit DM context).
When starting via gateway /start, the DM defaults to the
one that sent the command (if in the bot’s allowlist). choird validates
that all named resources exist and that exclusive resources are not
already leased before starting the agent.
Session:
choirctl session list [agent-id] # list active/recent sessions
choirctl session events <session-id> # stream session event log
choirctl session cores <session-id> # list active core jobs (name, state, step)
choirctl session cancel <session-id> # cancel active core job (by name)
choirctl session compact <session-id> # trigger working memory compaction
Model switching (per-agent, hot-reloadable):
choirctl model list # list named LLM models and TTS providers from config
choirctl model get <agent-id> # show current LLM model and voice profile for agent
choirctl model set <agent-id> --llm=<name> # switch text generation model for agent
choirctl voice list # list named voice profiles
choirctl voice get <agent-id> # show current voice profile for agent
choirctl voice set <agent-id> <name> # switch voice profile for agent
Model and voice profile changes are hot-reloadable – the agent picks
up the new settings on its next LLM call or TTS invocation via the
heartbeat config sync (no restart required). <name>
refers to a named model or voice profile defined in
config.json. Overrides via model set or
voice set are transient: on agent restart, the agent
reverts to its defaults (or start-time
--llm/--voice-profile overrides).
Configuration (two-phase):
choirctl config load # read .choir.d/ into choird, validate, stage
choirctl config diff # show diff: staged vs running (see 17.6)
choirctl config apply # push hot-reloadable changes to running agents
choirctl config show # show current running config
Approvals:
choirctl approval list # list pending approvals
choirctl approval show <id> # show approval detail (what, who, when)
choirctl approval approve <id> # approve
choirctl approval reject <id> # reject
Workspace:
choirctl workspace list # list workspaces, paths, and current lease holder
choirctl workspace reset <workspace-name> # delete contents; stops leasing agent if running
Secrets:
choirctl secret list # list secret names (never values)
choirctl secret set <name> [secret] # set/update a secret value (arg or stdin)
choirctl secret delete <name> # delete a secret
choirctl secret apply # reload secrets.json and push to running agents
Secret values are never stored in config.json. Resources
in config.json reference secrets by name via
"secret" fields (e.g.
"secret": "openrouter-key"). Values live in
~/.choir.d/secrets.json (authoritative in v1), and running
agents are refreshed via choirctl secret apply.
choirctl config apply does not reload secret values.
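An illustrative config.json fragment showing a name-based secret reference (the surrounding schema here is an assumption; only the "secret" field convention is specified above):

```json
{
  "models": {
    "default-llm": {
      "provider": "openrouter",
      "secret": "openrouter-key"
    }
  }
}
```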
Observability:
choirctl logs <agent-id> # snapshot current log + tail -f stream (see 22.2)
choirctl status # system overview: choird health, Postgres, agents
17.2 agent stop (Graceful Termination)
- choird sends a shutdown signal to the agent.
- Agent completes its current safe point (tool result commit, LLM call finish, or skill transition).
- Agent generates a session summary (LLM call, structured output).
- Agent flushes all unreplicated events and the session summary from its local commit log to choird via a final heartbeat.
- choird persists the session state (events, working memory snapshots, lane states, skill state, budget counters) to Postgres.
- choird chunks and embeds session events into memory_documents (mid-term). If the mid-term session count exceeds N, the oldest session is promoted to long-term (see section 11.8).
- Agent calls TERMINATE_SELF and exits.
- Session is recoverable: a subsequent agent start can resume from the persisted session via the recovery handshake (INIT_HELLO -> snapshot + tail events).
17.3
agent build
- Reads the agent definition from .choir.d/config.json.
- Pulls/fetches the global repo into .choir.d/repos/global/ (checks out the configured ref).
- Pulls/fetches the per-agent repo into .choir.d/repos/agents/<agent-id>/ (checks out the configured ref).
- Builds the base image from Dockerfile.base in the global repo (cached; only rebuilt if the global repo HEAD changed since the last build).
- Merges artifacts: global tools + agent tools (agent overrides by name), global skills + agent skills (same), USER.md from global, SOUL.md from the agent repo if present else global, SOUL-CORE.md from the agent repo if present else global.
- Builds the agent image from the per-agent Dockerfile (FROM choir-base): installs agent-specific system packages, compiles tool binaries from source directories under tools/.
- Bakes merged tools into /choir/tools/global/ and /choir/tools/agent/, merged skills into /choir/skills/, identity into /choir/USER.md, /choir/SOUL.md, /choir/SOUL-CORE.md, and /choir/version.json (includes both repo commit SHAs) into the image.
- Tags the image: choir-agent-<id>:<git-short-hash>.
- Stores in the local Docker image cache only. Remote image registries are out of scope for v1: the per-agent git repos are the source of truth, and images are rebuilt locally as needed.
- Does NOT start or restart the agent. The image is ready for use by agent start or agent update.
17.4
agent update
- Builds a new image if the staged config or repo commits differ from the running image (compares version.json commit SHAs; skips the build if already up to date).
- Gracefully stops the running agent (same as agent stop; session persisted to Postgres).
- Starts the agent with the new image.
- The new container picks up both the new image contents (tools, skills, identity) AND the latest hot-reloadable config via INIT_HELLO.
- Session resumes from persisted state if applicable.
17.5
agent update-all
- Builds new images for all agents whose staged config or repo commits differ from their current image (skips if already up to date).
- For each running agent: graceful stop (session persisted), start with new image (session resumed via recovery handshake). Agents are updated sequentially, not in parallel.
- For each stopped agent: the image is built but the agent is not started. The new image is ready for the next agent start.
- Reports per-agent results: built, redeployed, skipped, or failed.
17.6 config load / config diff /
config apply
config load reads and validates the entire .choir.d/ directory atomically:
- Parses config.json.
- Fetches latest commits from the global repo and all per-agent repos into .choir.d/repos/. Reports any fetch failures (unreachable remote, auth failure) as warnings (stale local clones are usable but flagged).
- Scans repo contents: validates tool/skill JSON schemas, checks that Dockerfiles parse, verifies source directories exist for compiled tools.
- Reports errors. If any validation fails, nothing is staged.
- On success, choird holds the staged config (including repo commit SHAs) in memory alongside the current running config.
config diff compares staged vs running and categorizes each change:
- Hot-reloadable: secrets, model/provider list, request templates, tunable defaults, feature flags.
- Restart-required: tool additions/removals/changes, skill additions/removals/changes, /choir identity content.
Example output:
~ global:tools/web_browse.json [restart-required] → use: choirctl agent update <agent-id>
+ agent-1:skills/code_review.json [restart-required] → use: choirctl agent build <agent-id>
~ agent-1:identity/SOUL.md [restart-required] → use: choirctl agent update <agent-id>
~ config.json: models.default [hot-reload] → included in: choirctl config apply
global repo: abc1234 -> def5678 [restart-required]
agent-1 repo: 111aaa -> 222bbb [restart-required]
config apply only publishes
hot-reloadable changes. It bumps config_version; agents
pick it up via heartbeat. Restart-required changes are surfaced by
config diff and applied via agent build /
agent update / agent update-all at the
operator’s discretion.
17.7 Atomicity Guarantees
Load atomicity: config load reads the
entire .choir.d/ directory as one snapshot. Either all
files validate and the snapshot is staged, or none of it is staged. No
partial staging.
Apply atomicity (per-agent): Each agent receives the
full config as a single versioned blob via
FETCH_DYNAMIC_CONFIG. The agent swaps its entire in-memory
config atomically (pointer swap behind a mutex). There is no state where
an agent runs with half-old, half-new config. The agent snapshots config
at the start of each tool execution and LLM call, so an in-flight
operation completes with the config it started with.
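The pointer-swap-plus-snapshot discipline above can be sketched in a few lines (class and method names are illustrative):

```python
import threading

class DynamicConfig:
    """Version-based full replacement: writers swap the whole config as a
    unit, readers snapshot it at operation start (sketch of 17.7)."""

    def __init__(self, config: dict, version: int):
        self._lock = threading.Lock()
        self._current = (version, config)   # replaced as a unit, never patched

    def snapshot(self) -> tuple[int, dict]:
        """Taken at the start of each tool execution / LLM call; the
        in-flight operation keeps this snapshot even if apply() runs."""
        with self._lock:
            return self._current

    def apply(self, config: dict, version: int) -> None:
        with self._lock:
            self._current = (version, config)
```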
Apply atomicity (cross-agent): NOT atomic across multiple agents. Different agents poll at different heartbeat intervals, so they may run different config versions briefly. In v1 this is moot (single agent), but cross-agent consistency is eventual, not transactional.
Stop atomicity: agent stop guarantees
that all locally committed events are replicated to Postgres before the
process exits. If the agent crashes during shutdown before the final
flush, recovery resumes from the last acknowledged revision
(unreplicated tail is lost).
Update atomicity: agent update is
stop-then-start, not a rolling swap. There is a window where the agent
is down. The session is fully persisted before the old container exits
and fully restored after the new container starts. No events are
processed during the gap.
Failure during apply: If an agent fails to fetch the
new config (RPC error, timeout), it retries on the next heartbeat and
continues running with its current config. choird logs the discrepancy.
choirctl agent status shows each agent’s current
config_version so the operator can see who is behind.
17.8
agent init
- Creates a new agent entry in config.json with placeholder values:
  - repo.url: "" (must be filled in before build).
  - repo.ref: "main".
  - defaults.workspace: "" (must reference an existing workspace).
  - defaults.llm: "" (must reference an existing model).
  - defaults.voice_profile: "" (optional, references a named voice profile).
  - defaults.git_identity: "" (must reference an existing identity).
  - defaults.notion: "" (optional).
  - defaults.email: "" (optional).
  - defaults.dm: "" (must reference an existing DM).
- Initializes a new per-agent git repo (bare) if a local repo path is configured, or prints instructions for creating the remote repo.
- Scaffolds the per-agent repo with skeleton files:
  - Dockerfile (minimal FROM choir-base).
  - identity/SOUL.md (edge personality placeholder).
  - identity/SOUL-CORE.md (core personality placeholder).
  - tools/ and skills/ (empty directories with .gitkeep).
- Does NOT build an image or start the agent. The operator must fill in the placeholders, then run agent build and agent start.
18. Networking and Web Access
18.1 Network Policy
Arbitrary outbound web requests from container are allowed in v1 by design tradeoff (single-user, user-managed risk).
18.2 Browser Handling
To avoid shipping a headless browser in the container image:
1. choir.web.browse is routed through a host-side Playwright worker via an EXECUTE_HOST_TOOL RPC to choird.
2. Each running agent gets an isolated Playwright browser context, automatically created at agent start and destroyed at agent stop.
3. The agent has exclusive read/write access to its own browser tabs. No agent can access another agent’s browser context.
4. Browser contexts are not configurable in config.json: they are ephemeral, agent-scoped resources managed by the browser worker.
18.3 Direct API Integrations
API tools like Notion/TTS/Search/Email run in-container with
memory-only scoped secrets. TTS uses the agent’s bound voice profile
(voice ID, output format, voice settings) resolved from
config.json at runtime. Email uses SMTP for sending and
IMAP for receiving (v1 only supports these protocols; no API-based email
providers).
19. Gateway (Telegram)
19.1 Architecture
The gateway supports multiple Telegram bot instances, each with multiple DM conversations. choird owns all bot tokens (managed as secrets, never exposed to containers). The gateway module routes messages between Telegram DMs and the appropriate agent’s edge lane.
Named resources:
- Gateways: Named bot instances in config.json, each with a secret reference for its bot token.
- DMs: Named DM bindings in config.json, each referencing a gateway (bot) and a Telegram user ID. The set of configured DMs for a bot implicitly forms that bot’s allowlist: messages from unconfigured user IDs are silently ignored.
- Admin DMs: DMs with "admin": true have full choirctl-equivalent command access across all agents and system resources.
- Regular DMs: DMs with "admin": false can only issue commands affecting their bound agent.
Each agent gets exclusive access to its bound DM. One agent per DM at
a time. DM binding is established at agent start (see
section 6.4).
Channel capability requirements: Any gateway channel (current or future) must support:
1. Text messages: Send and receive plain text.
2. File transfer: Send and receive files (documents, archives, etc.).
3. Multimedia: Send voice messages, images, and videos.
Telegram satisfies all three natively.
19.2 Message Flow
Telegram -> Bot API (long-poll) -> choird gateway module
-> identify bot instance + user ID -> look up bound DM + agent
-> route to edge lane of bound agent (queued; see 7.1)
-> edge response -> choird gateway -> Telegram reply to DM
choird translates between Telegram message format and the internal
UserMsg event type. Messages from unauthorized users (no
matching DM config) are dropped.
No streaming in v1: All outbound messages are sent as complete blocks. One user message may trigger the sending of multiple response messages (e.g. a text response followed by a file upload), but each message is complete before sending. No incremental token-by-token delivery.
19.3 Gateway Commands
Regular DM commands (available to all configured DMs, scoped to bound agent):
| Command | Equivalent choirctl |
Notes |
|---|---|---|
/status |
choirctl agent status <bound-agent> |
Shows lane states, budgets, skill, uptime |
/stop |
choirctl agent stop <bound-agent> |
Graceful termination (see 17.2) |
/restart |
choirctl agent restart <bound-agent> |
Stop + start, same image |
/cores |
choirctl session cores <active-session> |
Lists active core jobs with name, state, step |
/cancel <name> |
choirctl session cancel <active-session> |
Cancels a core job by name |
/compact |
choirctl session compact <active-session> |
Trigger working memory compaction |
/events |
choirctl session events <active-session> |
Last N events |
/model |
choirctl model get <bound-agent> |
Show current LLM model |
/model llm <name> |
choirctl model set <bound-agent> --llm=<name> |
Switch text generation model |
/voice |
choirctl voice get <bound-agent> |
Show current voice profile |
/voice <name> |
choirctl voice set <bound-agent> <name> |
Switch voice profile |
/inject <message> |
(no choirctl equivalent) | Inject message into edge context at next safe point |
/approvals |
choirctl approval list |
Lists pending approvals for bound agent |
/approve [id] |
choirctl approval approve <id> |
Approve; scoped to bound agent’s approvals |
/reject [id] |
choirctl approval reject <id> |
Reject; scoped to bound agent’s approvals |
Regular DM commands always target the bound agent. No
[agent-id] argument is accepted.
Admin DM commands (in addition to all regular commands):
| Command | Equivalent choirctl |
Notes |
|---|---|---|
/start <agent-id> [key=value ...] |
choirctl agent start <agent-id> [flags] |
Override defaults; DM binding is the triggering DM if in allowlist |
/stop <agent-id> |
choirctl agent stop <agent-id> |
Stop any agent |
/restart <agent-id> |
choirctl agent restart <agent-id> |
Restart any agent |
/update [agent-id] |
choirctl agent update <agent-id> |
Build + stop + start with new image |
/update-all |
choirctl agent update-all |
Build + redeploy all agents |
/config load |
choirctl config load |
Stage config changes |
/config diff |
choirctl config diff |
Show staged vs running diff |
/config apply |
choirctl config apply |
Apply hot-reloadable changes |
/workspace list |
choirctl workspace list |
List workspaces and lease holders |
/workspace reset <name> |
choirctl workspace reset <name> |
Reset workspace |
/secret list |
choirctl secret list |
List secret names |
/agent list |
choirctl agent list |
List all agents and status |
/agent build <agent-id> |
choirctl agent build <agent-id> |
Build agent image |
/approvals |
choirctl approval list |
Lists ALL pending approvals |
/approve [id] |
choirctl approval approve <id> |
Approve any agent’s approval |
/reject [id] |
choirctl approval reject <id> |
Reject any agent’s approval |
Admin DMs accept <agent-id> arguments to target
any agent. When /approve or /reject is sent
without an ID and there is exactly one pending approval, it targets that
approval. If multiple are pending, the bot replies with a numbered list
and waits for selection.
19.4 DM Binding
DM-to-agent binding is established at agent start:
1. Via gateway /start: The DM that sends the command is the binding target, provided it is in the bot’s allowlist (configured in dms). If the DM is already bound to another agent, the start is rejected.
2. Via choirctl agent start: --dm=<name> is required. References a named DM from config.json.
3. Default: Each agent has a default DM in its defaults block, used when no explicit override is provided via choirctl.
The binding is exclusive: one agent per DM at a time. Released when the agent’s session ends (stop, crash, terminate).
19.5 Supported Message Types
Inbound (user -> agent):
1. Text messages: Forwarded to the edge lane as UserMsg (queued; drained when edge returns to IDLE).
2. File/image uploads: choird stores the file to /workspace/.choirtmp/recv/, then forwards a UserMsg to edge with the .choirtmp/recv/<filename> path as an attachment reference.
3. Audio messages: Optionally transcribed (future), otherwise stored as a file attachment in .choirtmp/recv/.
Outbound (agent -> user):
1. Text messages: Agent text responses sent as Telegram messages (chunked at 4096 chars). Each message is a complete block.
2. Files: Agent writes the file to .choirtmp/send/; the tool result contains the path reference. choird fetches from .choirtmp/send/ and sends it as a Telegram document upload.
3. Voice messages: The TTS tool writes audio to .choirtmp/send/. choird sends it as a Telegram voice message (OGG Opus format).
4. Images: Agent writes to .choirtmp/send/. choird sends as a Telegram photo message.
5. Videos: Agent writes to .choirtmp/send/. choird sends as a Telegram video message.
choird cleans up files from .choirtmp/send/ after
successful delivery and from .choirtmp/recv/ after the
agent acknowledges receipt.
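The 4096-character chunking mentioned above can be sketched as a one-liner (a naive split; a real splitter would prefer breaking at newlines or sentence boundaries):

```python
def chunk_message(text: str, limit: int = 4096) -> list[str]:
    """Split an outbound message into complete blocks under Telegram's
    4096-character message limit. Each chunk is sent as its own message."""
    return [text[i:i + limit] for i in range(0, len(text), limit)] or [""]
```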
19.6 Approval UX
When REQUEST_APPROVAL arrives from the agent:
1. choird sends a Telegram message to the agent’s bound DM with the proposal summary and an inline keyboard: [Approve] [Reject]. Callback data encodes the approval_id.
2. On approve: choird sends “Approved: [summary]. Executing.” and notifies the agent with status: approved.
3. On reject: choird sends “Rejected: [summary].” and notifies the agent with status: rejected.
4. Unanswered approvals time out after 30 minutes (configurable in .choir.d/config.json). On timeout, choird sends “Approval timed out: [summary]. Rejected automatically.” and notifies the agent with status: rejected.
5. Multiple pending approvals each get their own message. No batching: each is independently approvable/rejectable.
6. /approve and /reject commands work as a fallback (reply to the approval message) for cases where inline buttons don’t render.
7. Admin DMs can approve/reject any agent’s approvals. Regular DMs can only approve/reject their bound agent’s approvals.
19.7 Design Constraints
- One agent per DM. No multiplexing.
- choird rate-limits outbound messages to respect Telegram API limits.
- Long responses are chunked (Telegram 4096-char message limit).
- The gateway is a thin adapter; all routing logic lives in choird’s control plane, not in the gateway module.
- For /update-all (admin only), the operator gets a progress summary as each agent is processed.
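The outbound rate limiting noted above can be sketched as a token bucket (the rate and burst numbers below are assumptions for illustration, not Telegram's documented limits):

```python
class TokenBucket:
    """Token-bucket limiter for outbound gateway sends (sketch)."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate                     # tokens refilled per second
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Caller passes a monotonic timestamp (e.g. time.monotonic())."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                         # caller queues and retries later
```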
20. Self-Evolution Workflow
Agent can propose changes to its own per-agent repo, but cannot apply directly. The workspace does not contain a checkout of the agent’s repo by default – repo interaction only happens when the user explicitly prompts for it.
User-initiated self-evolution flow:
1. User asks the agent to modify its own tools/skills/identity.
2. Agent clones its per-agent repo into /workspace (on demand).
3. Agent makes edits, commits to a proposal branch (proposal/<session-id>/<description>) using its git identity.
4. Agent submits the proposal to choird via REQUEST_APPROVAL with request_type: "repo_change" and the branch ref.
5. On approval: choird merges the proposal branch into the per-agent repo’s configured ref (e.g. main), then triggers agent build + agent update.
6. On rejection: the proposal branch is left for manual inspection (not auto-deleted).
7. The workspace clone is ephemeral – cleaned up after the proposal is submitted or on workspace reset.
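The branch name in step 3 can be derived mechanically. A sketch; the spec only fixes the proposal/<session-id>/<description> shape, so the slugification rules here are an assumption:

```python
import re

def proposal_branch(session_id: str, description: str) -> str:
    """Build a proposal/<session-id>/<description> branch name.

    Slugifies the free-text description into a git-ref-safe form
    (lowercase, hyphen-separated; slug rules are illustrative).
    """
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"proposal/{session_id}/{slug}"

print(proposal_branch("sess-01", "Add weather tool"))
# → proposal/sess-01/add-weather-tool
```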
Approval-only proposal flow (no repo clone needed):
1. Agent proposes a change via REQUEST_APPROVAL with a description and diff/spec only (e.g. via choir.propose.tool or choir.propose.skill).
2. On approval, choird clones the per-agent repo, applies the change, commits with the agent’s git identity, pushes, then triggers build + update.
3. The agent never directly touches the repo in this path.
No hot self-update of current running image. No Docker socket mount.
21. Inference Provider
21.1 API Compatibility
v1 targets the OpenAI-compatible chat completions API only. All LLM inference goes through endpoints that implement the OpenAI /v1/chat/completions schema. This includes:
- Native OpenAI API.
- OpenRouter (first-class supported routing layer).
- Any self-hosted or third-party endpoint exposing the same schema (vLLM, Ollama, Together, etc.).
Tool calling requirement: Every LLM model used by
choir must support the OpenAI tool calling interface (tools
parameter in the chat completions request, tool_calls in
the assistant response). Models that do not support structured tool
calling cannot be used – choir does not fall back to text-based tool
parsing (see section 8.2).
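A minimal sketch of the required request/response shape. Field names follow the OpenAI chat completions schema; the `calculator` tool itself is a made-up example:

```python
# Request body: the `tools` parameter declares callable tools.
request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What's 2+2?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "calculator",  # hypothetical example tool
            "description": "Evaluate an arithmetic expression",
            "parameters": {
                "type": "object",
                "properties": {"expression": {"type": "string"}},
                "required": ["expression"],
            },
        },
    }],
}

# Response: a conforming model returns structured tool_calls,
# never free-text tool syntax that would need parsing.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "calculator", "arguments": '{"expression": "2+2"}'},
    }],
}
```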
21.2 Model Configuration
Each named model in config.json supports the following inference parameters:
- temperature (number | null): Sampling temperature. Set to null for reasoning models that do not accept temperature (e.g. o1, o3). Default: 0.7 if omitted.
- reasoning_effort (string | null): Reasoning effort level for models that support it (e.g. "low", "medium", "high" for OpenAI o-series). Set to null or omit for models that do not support reasoning effort. Passed as-is to the API.
These are per-model defaults. They are included in the hot-reloadable config – changes take effect on the next LLM call without restart.
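A possible config.json fragment combining these parameters. The model names and the surrounding "models" key are illustrative; only the temperature and reasoning_effort semantics are specified above:

```json
{
  "models": {
    "edge-fast": {
      "model": "openai/gpt-4o-mini",
      "temperature": 0.7
    },
    "core-reasoning": {
      "model": "openai/o3",
      "temperature": null,
      "reasoning_effort": "high"
    }
  }
}
```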
21.3 TTS Configuration
TTS is configured separately from LLM models, split into two layers:
TTS providers ("tts" in config.json): Named provider configurations specifying the API endpoint, TTS model ID, and secret reference. v1 supports ElevenLabs only.
{
  "provider": "elevenlabs",
  "model_id": "eleven_multilingual_v2",
  "endpoint": "https://api.elevenlabs.io/v1",
  "secret": "elevenlabs-key"
}
Voice profiles ("voice_profiles" in config.json): Named voice configurations referencing a TTS provider and specifying voice-specific settings:
| Field | Type | Description |
|---|---|---|
| tts | string | Reference to a named TTS provider |
| voice_id | string | ElevenLabs voice ID (from the Get Voices endpoint) |
| output_format | string | Audio format: mp3_44100_128 (default), opus_48000_128, pcm_16000, etc. |
| voice_settings.stability | number | Voice stability (0.0-1.0). Lower = more expressive, higher = more consistent |
| voice_settings.similarity_boost | number | Voice similarity (0.0-1.0). Higher = closer to the original voice |
| voice_settings.style | number | Style exaggeration (0.0-1.0). 0 = minimal latency |
| voice_settings.use_speaker_boost | boolean | Boost speaker similarity. Increases latency |
| voice_settings.speed | number | Playback speed (0.5-2.0). 1.0 = normal |
Each agent references a voice profile via its defaults.voice_profile. Voice profiles are shared resources – multiple agents can use the same profile concurrently. Profile changes are hot-reloadable (they take effect on the next choir.tts.speak call). When choir.tts.speak executes, it resolves the agent’s current voice profile, uses the referenced TTS provider’s endpoint and credentials, and passes the voice settings to the API.
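An illustrative voice profile entry combining the fields above (the profile name and voice_id are placeholders):

```json
{
  "voice_profiles": {
    "narrator": {
      "tts": "elevenlabs-default",
      "voice_id": "21m00Tcm4TlvDq8ikWAM",
      "output_format": "opus_48000_128",
      "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0.0,
        "use_speaker_boost": true,
        "speed": 1.0
      }
    }
  }
}
```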
For the Telegram gateway, the output format should produce audio compatible with Telegram voice messages (OGG Opus). The opus_48000_* formats are recommended.
21.4 OpenRouter Implications
OpenRouter as a routing layer introduces specific constraints:
- No reliable system-prefix KV caching across requests (OpenRouter is a routing layer, not a model provider).
- Design for full recompute cost per call.
- Keep the system prompt small, stable, and immutable.
- Put all dynamic content after the system prompt.
- Enforce behavior via state machine constraints, not verbose prompt prose.
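The "stable system prefix, dynamic content after" rule can be sketched as follows. Message assembly only; the prompt text and context plumbing are illustrative assumptions:

```python
# Immutable across all calls: byte-identical system prompt (illustrative text).
SYSTEM_PROMPT = "You are choir-agent. Follow the active skill's state machine."

def build_messages(dynamic_context: str, history: list[dict],
                   user_msg: str) -> list[dict]:
    """Assemble a chat completions message list.

    The system prompt never changes between calls; all per-call context
    (skill phase, retrieved memory, etc.) goes into later messages, so a
    full-recompute provider always sees the same stable prefix.
    """
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + [{"role": "user", "content": dynamic_context}]
        + history
        + [{"role": "user", "content": user_msg}]
    )
```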
22. Observability and Audit
22.1 Telemetry
Minimum required telemetry:
1. Session lifecycle events: session start/stop, lane transitions, container start/restart/kill.
2. LLM call boundaries: model, token counts, latency, lane origin.
3. Tool call traces: proposals, lock acquisition/release, execution results, durations.
4. Approval requests and outcomes: who requested, what was requested, approved/rejected, by whom.
5. Heartbeat revisions: replication progress, recovery operations, snapshot creation.
6. Config/secrets version changes: metadata only (version numbers, timestamps), never secret values.
All telemetry must include session ID and lane ID for correlation.
22.2 Logging
Logs are written to files in .choir.d/logs/, organized by source:
- choird.log: Global choird logs (startup, gateway, control plane, Postgres operations, config changes).
- <agent-id>.log: Per-agent logs (agent lifecycle, LLM calls, tool execution, heartbeat, errors).
Log format: Structured JSON lines. Each line includes timestamp, level, source (choird / agent-id), session ID (if applicable), lane (if applicable), and message.
Archiving:
- Agent logs: Archived when a session ends (graceful stop). The active log file is compressed and moved to .choir.d/logs/archive/<agent-id>/<timestamp>.log.gz.
- choird logs: Archived when the active log file exceeds a configurable line threshold (log_archive_threshold_lines in config.json, default: 100000). Compressed and moved to .choir.d/logs/archive/choird/<timestamp>.log.gz.
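The compress-and-move step can be sketched as follows (paths follow the layout above; the timestamp format and the truncate-in-place behavior are assumptions):

```python
import gzip
import shutil
import time
from pathlib import Path

def archive_log(active: Path, archive_dir: Path) -> Path:
    """Gzip an active log file into the archive directory, then truncate it."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = archive_dir / f"{stamp}.log.gz"
    with open(active, "rb") as src, gzip.open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    active.write_text("")  # start a fresh active log
    return dest
```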
choirctl log access:
choirctl logs <agent-id> # snapshot current log + tail -f stream
choirctl logs choird # snapshot choird log + tail -f stream
choirctl logs prints the current log file contents
(snapshot), then continues streaming new lines as they are written (like
tail -f). Ctrl+C stops streaming. No filtering or search in
v1 – use standard tools (grep, jq) on the log
files directly.
23. Choird Data Model (Postgres)
23.0 Schema Isolation
Each agent gets its own Postgres schema and role within a single centralized database (configured in config.json under "postgres"). choird manages these schemas and roles:
1. On first startup (or agent init), choird creates a schema choir_<agent_id> and a role choir_<agent_id> with access limited to that schema.
2. Agents never connect to Postgres directly. All memory operations go through choird via EXECUTE_HOST_TOOL (see section 8.7). choird uses the per-agent role when executing queries on an agent’s behalf, ensuring schema-level isolation without giving the agent credentials.
3. Cross-agent memory reads (section 11.2) go through choird’s EXECUTE_HOST_TOOL handler, which uses the admin connection to query across schemas. Agents never directly access another agent’s schema.
4. Control plane tables (sessions, events, snapshots, approvals) live in a shared choir_control schema owned by choird.
choird maintains a connection pool to Postgres. Per-agent roles are used for agent-initiated queries (via host tool delegation); the pool’s admin connection is used for control plane operations.
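The provisioning in step 1 could be realized with statements like the following sketch. A NOLOGIN role that choird assumes via SET ROLE is one way to satisfy "choird uses the per-agent role"; the exact GRANT/REVOKE set is an assumption:

```python
def provision_statements(agent_id: str) -> list[str]:
    """SQL to create a per-agent schema and a role confined to it (sketch)."""
    schema = role = f"choir_{agent_id}"
    return [
        # choird holds the only connection; the role is a SET ROLE target.
        f"CREATE ROLE {role} NOLOGIN;",
        f"CREATE SCHEMA {schema} AUTHORIZATION {role};",
        f"ALTER ROLE {role} SET search_path = {schema};",
        f"REVOKE ALL ON SCHEMA public FROM {role};",
    ]

for stmt in provision_statements("alpha"):
    print(stmt)
```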
Database migrations are deferred in v1. choirctl should
support migration commands (choirctl db migrate,
choirctl db backup) in a future version.
23.1 Control Plane Tables (schema: choir_control)
agents: Agent definitions and metadata.
agent_id, name, image_version, created_at, config (jsonb)
sessions: Active and historical sessions.
session_id, agent_id, lease_id, status, started_at, ended_at,
resource_bindings (jsonb) -- { workspace, llm, voice_profile, git_identity, notion, email, dm }
session_events: Append-only true execution log (see section 11.3). Serves as both the authoritative event history and the crash recovery source.
id, session_id, rev, event_type, lane (edge/core),
payload (jsonb, includes hash refs), created_at
session_snapshots: Periodic state snapshots for bounded replay.
session_id, rev, snapshot (jsonb), created_at
pending_approvals: Queued approval requests.
id, session_id, agent_id, request_type, payload (jsonb),
status (pending/approved/rejected), created_at, resolved_at
Note: resource configuration lives in
.choir.d/config.json on the host filesystem, not in
Postgres. Postgres stores only runtime state (sessions, events, memory,
approvals).
23.2 Session-Derived Memory Tables (schema:
choir_<agent_id>, Tiers 2-3)
memory_documents: Chunked session events and long-term summaries.
id uuid PRIMARY KEY
session_id text NOT NULL
tier text NOT NULL -- 'mid_term', 'long_term_summary', 'long_term_detail'
chunk_index int NOT NULL
text text NOT NULL
tsv tsvector -- GIN indexed
created_at timestamptz NOT NULL
metadata jsonb -- skill, phase, topic tags
-- agent_id is implicit from schema name (choir_<agent_id>)
memory_embeddings: Vector embeddings for session memory search.
document_id uuid REFERENCES memory_documents(id)
embedding vector(N) -- HNSW indexed; N from embedding model config (default 1536)
-- exists for mid_term and long_term_summary only (not long_term_detail)
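Chunking event text into memory_documents rows can be sketched as follows (the chunk size and row shape are assumptions; only the chunk_index ordering is implied by the schema):

```python
def chunk_event_text(session_id: str, tier: str, text: str,
                     max_chars: int = 2000) -> list[dict]:
    """Split event text into ordered memory_documents rows (sketch).

    Each row carries session_id, tier, and a monotonically increasing
    chunk_index so the original text can be reassembled in order.
    """
    return [
        {
            "session_id": session_id,
            "tier": tier,
            "chunk_index": i,
            "text": text[start:start + max_chars],
        }
        for i, start in enumerate(range(0, len(text), max_chars))
    ]
```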
23.3 Knowledge Tables (schema: choir_<agent_id>, Tier
4)
knowledge_documents: Agent-managed persistent knowledge.
id uuid PRIMARY KEY
key text -- optional dedup key (e.g. 'user.preference.theme')
text text NOT NULL
tsv tsvector -- GIN indexed
created_at timestamptz NOT NULL
updated_at timestamptz NOT NULL
metadata jsonb -- tags, source, category
-- agent_id is implicit from schema name (choir_<agent_id>)
knowledge_embeddings: Vector embeddings for knowledge search.
document_id uuid REFERENCES knowledge_documents(id)
embedding vector(N) -- HNSW indexed
23.4 Query Patterns
All queries are executed by choird using schema-qualified table names
(e.g. choir_<agent_id>.memory_documents). For
own-agent queries, choird uses the per-agent role; for cross-agent
reads, choird uses the admin connection.
Default semantic search (mid-term + long-term summaries):
SELECT d.*, e.embedding
FROM choir_<target_agent>.memory_documents d
JOIN choir_<target_agent>.memory_embeddings e ON d.id = e.document_id
WHERE d.tier IN ('mid_term', 'long_term_summary')
ORDER BY (1 - (e.embedding <=> $query_vec)) DESC
LIMIT 10;
Long-term drill-down (full session detail, requires session_id):
SELECT d.*
FROM choir_<target_agent>.memory_documents d
WHERE d.session_id = $session_id
AND d.tier = 'long_term_detail'
ORDER BY d.chunk_index;
Knowledge search:
SELECT d.*, e.embedding
FROM choir_<target_agent>.knowledge_documents d
JOIN choir_<target_agent>.knowledge_embeddings e ON d.id = e.document_id
ORDER BY (1 - (e.embedding <=> $query_vec)) DESC
LIMIT 10;
All queries support hybrid scoring (vector + tsvector) when both a semantic query and a text query are provided.
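The hybrid score can be sketched as a weighted combination. The weighting and normalization are assumptions; in SQL this would combine `1 - (embedding <=> $query_vec)` with `ts_rank(tsv, query)`:

```python
def hybrid_score(vector_sim: float, ts_rank: float, alpha: float = 0.7) -> float:
    """Combine cosine similarity (0-1) with a normalized ts_rank (0-1).

    alpha weights the semantic signal; (1 - alpha) weights full-text match.
    The 0.7 default is an illustrative assumption, not a spec value.
    """
    return alpha * vector_sim + (1.0 - alpha) * ts_rank

def rank(candidates: list[tuple[str, float, float]]) -> list[str]:
    """Order (doc_id, vector_sim, ts_rank) tuples by combined score, best first."""
    return [doc for doc, _, _ in sorted(
        candidates, key=lambda c: hybrid_score(c[1], c[2]), reverse=True)]
```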
24. Implementation Phases
- Phase 1 – Control Plane Protocol: Transport abstraction layer (ControlPlane interface + ControlPlaneHandler interface), UDS implementation, container lifecycle, init/secret/config handshake, heartbeat. Includes choirctl init, choirctl agent init.
- Phase 2 – Single-Lane Tool Loop: Structured tool calling, basic lock manager, tool execution pipeline, skill engine, built-in tools (fs, exec, search). Get the deterministic base working.
- Phase 3 – Gateway & User Interface: Telegram gateway (multi-bot, multi-DM, admin/regular permissions), DM binding, gateway commands, .choirtmp/ file transfer, TTS tool, email tools (SMTP + IMAP).
- Phase 4 – TCP/HTTP Transport: HTTP implementation of the same ControlPlane interface, TLS support, lease-token-in-header auth. Validate that all RPC verbs work identically over both transports.
- Phase 5 – Core Lane Async: Add core lane execution, injection/cancel workflow, event streaming. Test heavily.
- Phase 6 – Crash Recovery: Heartbeat replication, ack protocol, snapshot creation, recovery handshake.
- Phase 7 – Memory Integration: Postgres/pgvector schema, per-agent schema isolation, embedding pipeline, hybrid search, memory compaction.
- Phase 8 – Self-Evolution & Hardening: Approval workflows, self-evolution pipeline (tool-builder/skill-builder skills, repo proposal flow), structured logging and archiving, observability instrumentation, security hardening.
