DIYClaw is a procedural prompting framework for building and deploying your own agentic systems. Define behavior through composable prompt contracts — not application code. If you want to know what your agents are actually doing, this is the only way to get there.
DIYClaw is not a codebase. It is a procedural prompting framework for building and deploying your own agentic systems. Composable, versioned prompt contracts define identity, execution logic, tool use, safety, failure handling, and self-extension. We recommend building with Claude Code — but the system you build should talk to any provider you choose: OpenAI, Anthropic, Ollama, and beyond. That's your call.
If you want to know what your agents are actually doing — not hope, not guess, know — procedural prompt contracts are the only way to get there. Code changes underneath you. Models change underneath you. Contracts are the invariant.
This site is a development tool. Configure prompt templates, assemble agent definitions, and download prompt packs ready to deploy on any runtime.
Every DIYClaw agent prompt is built from exactly four composable blocks. They stack in order. Each block has a single job. Together they form a complete behavior contract.
Identity, role, mission, and hard constraints. This is who the agent is and what it will never do regardless of what anyone asks. Hard constraints are non-negotiable — no fabricating results, no leaking secrets, no executing outside declared scope. This block never changes between runs.
Ships as: base_system.txt
The action loop. Each turn the agent picks exactly one action: respond with text or invoke tools. Never both. After every tool result it re-evaluates: did the tool succeed? Is the objective met? Should I stop? Mandatory stop criteria (step limit, time budget, token budget, no-progress detector) are checked every turn. On stop, it emits a structured stop reason — not a vague apology.
Ships as: execution.txt
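A minimal sketch of how a runtime might implement those per-turn checks; the Budget fields and stop-reason shape are illustrative, not part of the shipped contract:

from dataclasses import dataclass

@dataclass
class Budget:
    max_steps: int = 20
    max_seconds: float = 300.0
    max_tokens: int = 100_000

def check_stop(step, elapsed_s, tokens_used, recent_outputs, budget: Budget):
    """Return a structured stop reason, or None to keep running."""
    if step >= budget.max_steps:
        return {"stop": "step_limit", "detail": f"{step}/{budget.max_steps} steps"}
    if elapsed_s >= budget.max_seconds:
        return {"stop": "time_budget", "detail": f"{elapsed_s:.0f}s elapsed"}
    if tokens_used >= budget.max_tokens:
        return {"stop": "token_budget", "detail": f"{tokens_used} tokens used"}
    # No-progress detector: the last three tool results are identical
    if len(recent_outputs) >= 3 and len(set(recent_outputs[-3:])) == 1:
        return {"stop": "no_progress", "detail": "last 3 tool results identical"}
    return None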
Machine-readable runtime state. Provider, model, API version, budget numbers, allowed/denied tool lists, workspace scope, sandbox flags, config paths. No prose — just key-value pairs the agent reads to know what it's working with. This block changes per deployment but not per task.
Ships as: environment.txt
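A sketch of what that file might contain; the keys and values below are illustrative, not a fixed schema:

provider=anthropic
model=claude-sonnet-4-5
api_version=2023-06-01
budget_max_steps=20
budget_max_tokens=100000
tools_allowed=read_file,write_file,container-run
tools_denied=send-message
workspace=/srv/agents/gonff
sandbox=true
config_path=/etc/diyclaw/gonff.env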
The concrete objective for this run. What to do, what "done" looks like (acceptance criteria), and any run-specific context. This is the only block that changes every invocation. Everything above it is stable infrastructure — the task block is the variable.
Injected at runtime per invocation
The blocks are designed to be independently versionable. Update your execution logic without touching identity. Swap providers without rewriting stop criteria. Add new safety rules without breaking task injection. This is what makes prompt contracts composable — and what makes them auditable.
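A minimal sketch of that assembly in Python, using the file names from the "Ships as" labels above (the helper itself is illustrative):

from pathlib import Path

BLOCK_ORDER = ["base_system.txt", "execution.txt", "environment.txt"]  # stable layers, versioned independently

def compile_prompt(pack_dir: str, task: str) -> str:
    """Stack the three stable blocks in order, then inject the per-run task block last."""
    blocks = [Path(pack_dir, name).read_text() for name in BLOCK_ORDER]
    blocks.append("TASK:\n" + task)  # the only block that changes every invocation
    return "\n\n".join(blocks)

# Swapping providers means editing environment.txt; identity and execution logic stay untouched.
prompt = compile_prompt("prompt_pack", "Summarize yesterday's error logs. Done = a ranked list of root causes.")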
The agentic coding space is moving fast. Here's where things stand and where DIYClaw fits in.
Anthropic's agentic coding CLI. Terminal-native, tool-using, reads and writes your codebase directly. The recommended builder for DIYClaw prompt packs — it understands contract structure and can assemble agents from specs. Also ships the Claude Agent SDK for building custom agents in code.
OpenAI's cloud-sandboxed coding agent. Runs tasks asynchronously in isolated containers. Strong at multi-file refactors. Pairs with ChatGPT for conversational coding. Different philosophy: cloud-first, sandboxed execution.
Google's open-source agentic CLI. Terminal-native like Claude Code, powered by Gemini 2.5 Pro. Integrates with Google ecosystem (Vertex AI, Cloud). Free tier with generous limits. Growing MCP tool support.
GitHub Copilot — IDE-embedded, agent mode expanding.
Cursor / Windsurf — AI-native editors with inline agents.
Aider — open-source, git-native, multi-model.
Ollama — run models locally, zero cloud dependency.
Every month there are more.
DIYClaw is not competing with any of these. It sits underneath them. These tools are runtimes — they execute agent behavior. DIYClaw defines what that behavior is. A DIYClaw prompt pack works whether your runtime is Claude Code, Codex, a custom Python script, or a Rust daemon. The contracts are portable. The runtime is your choice.
The operator. Manages infrastructure, deploys services, runs scheduled jobs, monitors health, enforces budgets. If something needs to be started, stopped, or kept alive — Gonff owns it.
Example skills: Docker control · cron scheduling · health checks · deploy pipelines · resource limits
The communicator. Handles external channels, messaging, notifications, and real-time data flow. Anything that involves talking to people, services, or streams — Munca owns it.
Example skills: Slack · email · webhooks · RSS · WebSocket feeds · notification routing
The integrator. Connects to APIs, normalizes data across providers, manages authentication flows, and keeps tool contracts consistent. External services — Apodemus owns it.
Example skills: HubSpot · Stripe · GitHub · database queries · OAuth flows · data transforms
These are starting points. Go to Prompts to see each agent's full contract, edit it, and download your pack.
Describe what your agent system needs to do. This assistant will ask about your requirements — data sources, tools, memory needs, deployment target — and build a customized prompt pack for you.
Examples: "I need agents that read PDFs and remember key facts" · "DevOps system that manages Docker and runs cron jobs" · "SaaS hub that syncs HubSpot, Stripe, and Slack"
I'll help you build a custom DIYClaw prompt pack. Let's start with the basics.
What kind of system are you building? Tell me what your agents need to do — the data they work with, the services they connect to, and what "done" looks like for a typical task.
This spec is written for agents to read. Not for humans to skim — for agents to execute. Each section is a standalone behavior contract. Adopt them incrementally or all at once. The agent doesn't care either way. It follows what's written.
Four design principles govern every contract:
Agents need memory. The Memory spec defines the full thermodynamic keystone memory contract. When you're ready, go to Prompts to configure each block, edit the agent roles, and download your complete prompt pack.
In the early era of Generative AI, "prompt engineering" was largely an artistic endeavor. Users handcrafted static blocks of text, hoping to cajole a model into the correct behavior. This approach is brittle — a prompt that works for one task fails when the context shifts slightly.
Procedural Prompting rejects the static string. A prompt is a compiled artifact — the output of a function that accepts state, logic, and data as arguments. We do not write prompts. We write the code that generates prompts.
The key insight: a procedural prompt is a recursive, self-referencing object. The same structure — typed slots, composable blocks, runtime injection — applies whether you're defining identity, execution logic, tool contracts, or memory policy. The object refers to itself at every level. Change the base and every layer above it recomputes. This is what makes it a control system, not a chat template.
At its most fundamental level, procedural prompting separates Structure (invariant constraints) from State (variable context). A procedural prompt is never stored as a raw string. It is stored as a template with typed slots.
"You are an expert {{ROLE}}. Your security clearance is {{LEVEL}}. Analyze the following log data: {{DATA}}. If you detect {{THREAT_TYPE}}, output the alert code {{CODE}}."
ROLE: Network Security Analyst
LEVEL: Top Secret
DATA: [Stream of raw firewall logs]
THREAT_TYPE: SQL Injection
CODE: CRITICAL_DB_BREACH
By treating the prompt as a template, we decouple the instruction from the data. This prevents instruction drift — where the model confuses the data it is processing with the commands it is supposed to follow.
If the template model explains the structure, mathematical functions explain the deterministic nature. A procedural prompt P is not text — it is a function of specific variables:
P = f( S, E, T, C )
Where:
S (System Identity) — immutable core personality and constraints
E (Environment) — machine-readable state of the world
T (Tools) — capabilities available for this run
C (Context/Task) — immediate input data or user query
Therefore:
Output = Model( f( S, E, T, C ) )
This equation reveals why static prompting fails. If E (Environment) changes — the server time shifts, an API key expires, a budget is consumed — but the static prompt stays the same, the output will be incorrect. Procedural prompting dynamically recalculates P every time any variable changes, keeping the model grounded in current reality.
The function f is self-referencing. The output of one prompt becomes the context variable C for the next. The environment E updates after each tool call. The tool set T can expand via self-build connectors. The system S can version itself. At every level, the same structure applies — slots, injection, recomputation. This recursive property is what makes procedural prompting a control system rather than a conversation.
Just as a programming function has a signature defining what it returns (int, string, void), a procedural prompt defines an output contract. We don't ask the model to "reply nicely." We program a constraint:
Constraint: Respond in valid JSON.
Schema: {
  "thought_process": string,
  "action": "retry" | "abort" | "success",
  "payload": object
}
This turns the probabilistic output of the LLM into a deterministic data structure that downstream code can parse without error.
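A minimal sketch of enforcing that contract on the runtime side; the schema mirrors the one above, and the helper name is illustrative:

import json

ALLOWED_ACTIONS = {"retry", "abort", "success"}

def parse_output(raw: str) -> dict:
    """Reject anything that is not valid JSON matching the declared output contract."""
    data = json.loads(raw)  # raises an exception on malformed JSON
    if not isinstance(data.get("thought_process"), str):
        raise ValueError("missing or non-string thought_process")
    if data.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"action must be one of {sorted(ALLOWED_ACTIONS)}")
    if not isinstance(data.get("payload"), dict):
        raise ValueError("payload must be an object")
    return data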
Procedural prompts implement control flow inside the generation context:
The execution loop:
Generate P₁ → Model → Output O₁
Check O₁ against acceptance criteria
If incomplete: append O₁ to history, update Context (C), generate P₂
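A sketch of that loop as code, assuming hypothetical call_model, compile_prompt, and acceptance_check callables supplied by your runtime:

def run(task, call_model, compile_prompt, acceptance_check, max_steps=10):
    """Generate, check against acceptance criteria, fold the output back into context, repeat."""
    history = []
    for step in range(max_steps):
        context = task if not history else task + "\n\nPrevious outputs:\n" + "\n".join(history)
        output = call_model(compile_prompt(context))   # P_n -> Model -> O_n
        if acceptance_check(output):                   # check O_n against acceptance criteria
            return {"stop": "success", "output": output, "steps": step + 1}
        history.append(output)                         # incomplete: O_n becomes part of Context (C)
    return {"stop": "step_limit", "steps": max_steps}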
Just as a Car class inherits from Vehicle, procedural prompts use inheritance:
Patch a security hole in the base class and it instantly propagates to all agents without rewriting their individual prompts. This is the same principle that makes object-oriented software maintainable — and it's why prompt contracts scale where static prompts collapse.
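A sketch of that inheritance pattern; the class and constraint names are illustrative:

class BaseAgentContract:
    """Shared identity and hard constraints. Patch here and every child inherits the fix."""
    hard_constraints = [
        "Never fabricate results.",
        "Never leak secrets.",
        "Never execute outside declared scope.",
    ]
    role = "You are a DIYClaw agent."

    def render(self) -> str:
        return "\n".join(["HARD CONSTRAINTS:", *self.hard_constraints, "ROLE:", self.role])

class OperatorContract(BaseAgentContract):
    role = "You are the operator. You manage infrastructure, deployments, schedules, and budgets."

class CommunicatorContract(BaseAgentContract):
    role = "You are the communicator. You handle channels, messaging, and notifications."

# Adding a rule to BaseAgentContract.hard_constraints propagates to every agent at once.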
Each turn: Respond(text) or ToolCalls([{id,name,args}]), bounded by max_steps.
Respond(text) — final answer
ToolCalls([{id,name,args}]) — invocations
Never both. Runtime rejects mixed outputs.
Tools are tiered read / write / auth. This is where DIYClaw's security actually lives. Every agent declares which tools it can call in its prompt contract. The runtime — not the agent — enforces the boundary. When you build your runtime, you build this gate:
Gonff gets container-run, Munca gets send-message. Neither can call the other's tools.
read tools are safer than write tools. auth tools are gated further. The runtime can enforce tiered permissions: an agent might read freely but need explicit approval for writes.
The agent says what it wants to do. The runtime decides if it's allowed. This separation is the entire security model. If you skip building this gate, your prompt contracts are just suggestions.
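A minimal sketch of such a gate, assuming the runtime loads each agent's allow-list from its contract (tool names and tiers are illustrative):

def run_tool(name: str, args: dict):
    ...  # placeholder: hand off to the real tool executor

ALLOWED_TOOLS = {
    "gonff":    {"container-run", "cron-schedule", "health-check"},
    "munca":    {"send-message", "post-webhook"},
    "apodemus": {"http-get", "oauth-refresh"},
}
WRITE_TIER = {"container-run", "send-message", "post-webhook"}  # requires explicit approval

def dispatch(agent_id: str, call: dict, approved: bool = False):
    """The agent proposes a tool call; the runtime decides whether it actually runs."""
    name = call["name"]
    if name not in ALLOWED_TOOLS.get(agent_id, set()):
        return {"error": f"{agent_id} is not permitted to call {name}"}
    if name in WRITE_TIER and not approved:
        return {"error": f"{name} is write-tier and requires explicit approval"}
    return run_tool(name, call.get("args", {}))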
Concise. Low tool use. Strict brevity.
Tool-heavy. Artifact-centric. Strict verification.
Diagnostics-first. Explicit risk + stop reasons.
prompt_pack=v1.0.0
DIYClaw gives you four prompt layers: base system, execution, environment, and task. OpenClaw — the production heavyweight — adds layers on top for hostile real-world channels. If you're deploying agents that face untrusted users on Telegram, Discord, WhatsApp, or Slack, you'll eventually need these.
A final self-check that runs before every output or stop. The agent re-examines its own response: does it comply with policy? Do claims match tool evidence? Did it stay within budget? If any check fails, the output is blocked and rewritten — never sent raw.
DIYClaw's security contract covers the rules. The verifier is what enforces them at the gate.
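A rough sketch of such a pre-send check; the individual rules here are stand-ins for whatever your policy requires:

def verify_before_send(draft: str, tool_evidence: list, tokens_used: int, token_budget: int) -> dict:
    """Final self-check: block and rewrite the draft instead of sending it raw."""
    failures = []
    if any(marker in draft.lower() for marker in ("api_key", "password", "secret=")):
        failures.append("possible secret leak")
    if "I ran" in draft and not tool_evidence:
        failures.append("claims tool activity with no supporting tool evidence")
    if tokens_used > token_budget:
        failures.append("over token budget")
    return {"ok": not failures, "failures": failures}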
DIYClaw treats all inbound as data. OpenClaw gates what the agent can do based on who's talking.
Same agent, same prompt stack — different output shape depending on who's reading.
Each messaging channel (Telegram, Discord, WhatsApp, email) has its own formatting constraints, rate limits, and trust model. OpenClaw adapts response length, format, and tool permissions per channel without rewriting the system prompt.
If your agents only talk to you or your team, you don't need this. If they face the public, you do.
398 real MCP tools from a production Evolve deployment across 65 servers. Every tool is available to the system. No single agent gets all of them. Each agent's prompt contract declares its subset, and the runtime enforces it (see §4-5 above). Click any server to see its tools, parameters, and documentation.
This is the memory contract your agents will use. It ships as memory.md in your prompt pack zip — the spec your agent reads to know how to remember, forget, and recall. It is also the blueprint for building the memory service itself, whether you run it locally or in a container.
Some memories become keystones — anchors that the rest of memory organizes around. A memory is promoted to keystone when it is frequently accessed, high-impact, or structurally central in the graph. Keystones are exempt from routine decay. They keep rich summaries and full provenance chains.
Promotion criteria: high recall frequency over a sliding window, high graph centrality, or explicit operator pin.
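A sketch of that promotion check; the thresholds are illustrative placeholders, not values from the spec:

def should_promote_to_keystone(recalls_in_window: int, graph_centrality: float, operator_pinned: bool,
                               recall_threshold: int = 25, centrality_threshold: float = 0.8) -> bool:
    """Any single criterion is enough: frequent recall, structural centrality, or an explicit pin."""
    return (operator_pinned
            or recalls_in_window >= recall_threshold
            or graph_centrality >= centrality_threshold)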
memory_id — unique identifier
owner_agent_id — which agent owns this memory
created_at, last_accessed_at — timestamps
fidelity — 0..1, decays over time
importance — 0..1, set at creation
decay_alpha — bounded decay rate
access_count — total recalls
consolidation_depth — merge count
state — ACTIVE | FORGIVEN | ARCHIVED
embedding_ref — pointer to vector
graph_refs[] — linked graph nodes
keystone — boolean
source_trace_id — provenance link
artifact_refs[] — linked files/outputs
quality_score — embedding quality
privacy_scope — access control
Decay reduces fidelity using decay_alpha. Consolidation increments consolidation_depth for merged outcomes.
One-way transitions only. ACTIVE → FORGIVEN → ARCHIVED. Never backwards. This is thermodynamically correct — entropy does not reverse.
score = w1*relevance + w2*fidelity + w3*importance
      + w4*recency + w5*graph_centrality + w6*keystone_bonus
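A direct sketch of that scoring function; the weights are illustrative defaults, not values fixed by the spec:

def score_memory(m: dict, relevance: float, weights=(0.35, 0.15, 0.15, 0.15, 0.10, 0.10)) -> float:
    """Rank a candidate memory for recall using the weighted sum above."""
    w1, w2, w3, w4, w5, w6 = weights
    return (w1 * relevance
            + w2 * m["fidelity"]
            + w3 * m["importance"]
            + w4 * m["recency"]
            + w5 * m["graph_centrality"]
            + w6 * (1.0 if m["keystone"] else 0.0))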
Five operations. This is the interface your memory service exposes to agents.
remember(event, owner_agent_id, role, tags, quality_hint) — store a memory
recall(query, owner_agent_id, scope, k, recency_bias) — retrieve ranked memories
offer(memory_id, from_agent, to_agent) — share memory between agents
dream(owner_agent_id, budget) — active consolidation: decay + compress + merge + prune
status(owner_agent_id) — health, counts, fidelity distribution
Configuration:
max_active_memories — bound the hot set
decay_tick_interval — how often decay runs
rehydration_budget_tokens — max tokens for rehydrated bundles
keystone_promotion_threshold — access/centrality gate
archival_pressure — how aggressively to move stale → archived
This spec is designed to be built by an AI agent. Hand this page (or the memory.md from your prompt pack) to your builder agent and tell it to implement. The spec is self-contained enough that a capable agent can scaffold the entire service.
Simplest path. Good for single-agent, single-machine setups.
# Agent builds from memory.md spec
mkdir memex && cd memex
# Implement: server.py, engine.py, decay.py, schema.sql
# Embedding: ONNX bge-small-en-v1.5 (384d, ~50MB)
python server.py
# → http://localhost:8765
# API: /remember, /recall, /offer, /dream, /status
Production path. Multi-agent, multi-service, persistent storage.
services:
  memex:
    build: ./memex
    ports: ["8765:8765"]
    volumes: ["memex-data:/data"]
    environment:
      EMBED_MODEL: bge-small-en-v1.5
      DECAY_INTERVAL: 300
      MAX_ACTIVE: 10000
  agent:
    build: ./agent
    environment:
      MEMEX_URL: http://memex:8765
volumes:
  memex-data:
The graph layer is optional for v1. Start with vectors + decay + consolidation. Add graph edges when you need rehydration across long time horizons.
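On the agent side, the service is plain HTTP. A sketch using requests, assuming the endpoints listed above and the default port from the local build (payload field names follow the operation signatures but are otherwise illustrative):

import requests

MEMEX_URL = "http://localhost:8765"

def remember(event: str, owner: str, tags=None):
    """Store a memory for the owning agent."""
    payload = {"event": event, "owner_agent_id": owner, "role": "agent", "tags": tags or []}
    return requests.post(f"{MEMEX_URL}/remember", json=payload).json()

def recall(query: str, owner: str, k: int = 5):
    """Retrieve the k highest-scoring memories for a query."""
    payload = {"query": query, "owner_agent_id": owner, "scope": "own", "k": k}
    return requests.post(f"{MEMEX_URL}/recall", json=payload).json()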
Kord Campbell
@CampbellKord
It's not every day you say to yourself, this changes everything. But today, this changes everything.
I just watched a project I'm working on update itself, then write a changelog, then commit its own code to Github, after starting a Docker container running its code.
July 2024
As LLMs transition from chat interfaces to agentic infrastructure, the method of interaction must evolve from "prompt engineering" — often characterized by superstition and vibes — to Procedural Prompting: a software engineering discipline where prompts are treated as dynamic functions rather than static text.
We do not write prompts. We write the code that generates prompts.
At its most fundamental level, procedural prompting works like Mad Libs. The narrative structure is fixed, but specific semantic slots are left open to fill. This formalizes the separation of Structure (invariant constraints) from State (variable context).
"In Mad Libs, the wrong word is funny. In programming, the wrong word is a bug.
In prompting, the wrong context is a hallucination."
your prompt, deconstructed
You are an expert {{ROLE}}. Your current security clearance is {{LEVEL}}. You must analyze the following log data: {{DATA}}. If you detect {{THREAT_TYPE}}, output the alert code {{CODE}}.
ROLE: Network Security Analyst
LEVEL: Top Secret
DATA: [Stream of raw firewall logs]
THREAT_TYPE: SQL Injection
CODE: CRITICAL_DB_BREACH
By decoupling instruction from data, we prevent instruction drift — where the model confuses the data it's processing with the commands it's supposed to follow. Each key term reduces the model's probability space from infinite possibility to specific output. Every slot is typed. Every injection is validated.
P = f( S, E, T, C ) → the prompt is a function, not a string
Output = Model( f( S, E, T, C ) )
S = System Identity (who the agent is — immutable)
E = Environment (runtime state — changes per deployment)
T = Tools (capabilities — scoped per agent)
C = Context/Task (the job — changes every run)
If E changes — server time shifts, an API key expires, a budget is consumed — but the static prompt stays the same, the output is wrong. Procedural prompting recalculates P every time any variable changes.
We don't ask the model to "reply nicely." We program a constraint:
Schema: { "action": "retry" | "abort" | "success", "payload": object }
Probabilistic output becomes a deterministic data structure.
Patch a security hole in the base class — it propagates to all agents instantly.
Treat Prompts Like Software With Procedural Prompting
~4 minutes. The full argument for why prompts are control systems.
AI architect. Three decades building at the edge of search, infrastructure, and AI. Created Grub — the first open-source distributed web crawler (2000, acquired by LookSmart 2004). Co-founded Loggly (2009, acquired by SolarWinds 2018). Built community and developer platforms at Splunk, Rackspace, Lucidworks, and FeatureBase. Built SlothAI at FeatureBase starting May 2023 — an agentic deterministic pipeline for async AI inferences, open-sourced and later run as MittaAI (now retired). Predated the mainstream agentic framework wave by 8–10 months. Currently ahead of the curve by 7–8 months. The gap is closing though.
Now building multiple products, including AI development tools. DIYClaw is the prompting control layer — the contracts that make agents deterministic, regardless of the runtime underneath. Output is increasing.