
> Cage the agent. Ship the code.


You paste a task into Claude Code. Ten minutes later you have 400 lines across six files, three of them in folders that do not match your convention, two imports using relative paths you killed off last quarter, and a new util that duplicates one you already have. The code runs. The review hurts.

This is not a model problem. It is a constraints problem.

An agent will happily respect your architecture if you tell it the rules in a way the machine can verify. Telling it in a prompt is the weakest version of that. Telling it through tools that block bad code is the strongest. Most teams stop at the prompt and wonder why output drifts.

Here is the layered setup I run. It works with any agent (Claude Code, Codex, Cursor, Aider), and most of it is not language specific.

The layers

Think of it as a funnel. Each layer catches a different class of mistake before it reaches your branch.

Layer               What it does
Type system         Shapes and contracts.
Linter              Patterns and complexity.
Formatter           Style.
Pre-commit hooks    The gate before history.
Specs and codegen   Structure the agent cannot invent around.
Feedback loop       The agent fixes its own output before you see it.

A prompt rule says "please do X". These layers say "you literally cannot commit if you do not do X". Different category of enforcement.

Layer 1: the type system is a contract

Strict types are the cheapest guardrail you will ever add. They pay for themselves in the first wrong call the agent makes.

TypeScript, tsconfig.json:

{
  "compilerOptions": {
    "strict": true,
    "noImplicitAny": true,
    "noUncheckedIndexedAccess": true,
    "exactOptionalPropertyTypes": true,
    "baseUrl": ".",
    "paths": {
      "@components/*": ["src/components/*"],
      "@lib/*": ["src/lib/*"]
    }
  }
}

Python, pyproject.toml with mypy:

[tool.mypy]
strict = true
disallow_untyped_defs = true
disallow_any_generics = true
warn_unused_ignores = true

Go gives you this for free. Rust gives you more than you asked for. Same idea either way: the compiler rejects sloppy output so the agent has to tighten it.

Path aliases matter more than people think. If the agent sees @lib/db once in an example, it stops writing ../../../lib/db. One config line, hundreds of cleaner imports. One caveat: tsconfig paths only affect the type checker, so make sure your bundler or runtime resolves the same aliases.

Layer 2: the linter enforces patterns

A type system catches shape errors. A linter catches pattern errors. This is where you encode the architectural rules the agent keeps forgetting.

ESLint, the rules that actually move the needle for AI output:

// eslint.config.js, flat config (plugin registration omitted for brevity)
export default [{
  rules: {
    "max-lines-per-function": ["error", 60],
    "max-lines": ["error", 250],
    "complexity": ["error", 10],
    "no-restricted-imports": ["error", {
      "patterns": [{
        "group": ["../*"],
        "message": "Use @lib, @components, @hooks aliases."
      }]
    }],
    "react/function-component-definition": ["error", {
      "namedComponents": "arrow-function"
    }]
  }
}];

Python, ruff.toml:

line-length = 100

[lint]
select = ["E", "F", "I", "N", "UP", "B", "SIM", "C90"]

[lint.mccabe]
max-complexity = 10

[lint.pylint]
max-args = 5
max-statements = 40

Go, .golangci.yml:

linters:
  enable:
    - gocyclo
    - funlen
    - gocognit
    - revive
    - errcheck
linters-settings:
  funlen:
    lines: 80
    statements: 40
  gocyclo:
    min-complexity: 10

Notice the shape. Complexity cap, length cap, import rules. These three alone stop the "megafunction that does everything" output that agents love to produce when they get nervous.
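
The caps do not just reject the megafunction, they dictate the decomposition. A minimal sketch of the shape they force, in Python with hypothetical names: a thin orchestrator plus small single-purpose helpers, each comfortably under a 10-branch, 60-line budget.

```python
def validate_order(order: dict) -> None:
    # One concern: input shape. Complexity stays low.
    if not order.get("items"):
        raise ValueError("order has no items")
    if any(item["qty"] <= 0 for item in order["items"]):
        raise ValueError("quantities must be positive")

def order_total(order: dict) -> int:
    # One concern: arithmetic.
    return sum(item["price"] * item["qty"] for item in order["items"])

def process_order(order: dict) -> int:
    # The orchestrator is trivially simple; the linter never complains.
    validate_order(order)
    return order_total(order)
```

When the agent inlines all three into one function and the complexity check fails, the error message is the refactor instruction.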

Layer 3: formatter, non-negotiable

Prettier, gofmt, rustfmt, ruff format, black. Pick one per language and wire it up. The agent stops spending tokens on indentation and quote style, you stop reading style noise in diffs. There is no debate here.

npx prettier --write .
cargo fmt
gofmt -w .
ruff format .

Layer 4: pre-commit hooks, the gate

This is where most teams leave money on the table. You have a linter and types. You trust humans (and agents) to run them. They do not.

Husky for JavaScript and TypeScript projects:

npm install -D husky lint-staged
npx husky init

Then .husky/pre-commit:

npx lint-staged
npm run typecheck

And package.json:

{
  "lint-staged": {
    "*.{ts,tsx}": ["eslint --max-warnings=0", "prettier --write"],
    "*.{js,jsx,json,md}": ["prettier --write"]
  },
  "scripts": {
    "typecheck": "tsc --noEmit"
  }
}

Now the agent cannot land a commit that fails typecheck or lint. Not "should not". Cannot.

For polyglot repos or non-JS stacks, use the pre-commit framework. It is a Python tool, but it runs hooks for any language.

.pre-commit-config.yaml:

repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.0
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0
    hooks:
      - id: mypy
        args: [--strict]
  - repo: https://github.com/golangci/golangci-lint
    rev: v1.59.0
    hooks:
      - id: golangci-lint
  - repo: local
    hooks:
      - id: cargo-clippy
        name: cargo clippy
        entry: cargo clippy -- -D warnings
        language: system
        types: [rust]
        pass_filenames: false

Install once with pre-commit install. The hook runs on every commit across every language in the repo.

Add a commit-msg hook while you are here. Agents write commit messages too, and they drift toward "update stuff" if you let them.

# .husky/commit-msg
npx --no -- commitlint --edit "$1"

Or a short shell check for conventional commits (it tests only the subject line, so a commit body cannot sneak past it):

# .husky/commit-msg
pattern='^(feat|fix|chore|docs|refactor|test|perf)(\(.+\))?: .{1,72}$'
head -n1 "$1" | grep -qE "$pattern" || {
  echo "Commit must be conventional (feat:, fix:, chore:, ...)."
  exit 1
}
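
If you want to sanity-check the pattern itself before wiring it into a hook, the same regex behaves identically in Python (the expression uses only syntax that grep's ERE and Python's re dialect share):

```python
import re

# The conventional-commit pattern from the hook above.
PATTERN = re.compile(r"^(feat|fix|chore|docs|refactor|test|perf)(\(.+\))?: .{1,72}$")

def is_conventional(subject: str) -> bool:
    # True when the commit subject line matches type(scope)?: description.
    return PATTERN.match(subject) is not None
```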

A pre-push hook is where the expensive checks go. Unit tests, build, contract tests. Slower, but it runs way less often.

# .husky/pre-push
npm run build
npm test

Side effect the agent will notice: when a hook fails, it gets the error output and fixes the code on the next turn. You are not just blocking bad commits, you are teaching the agent what "done" looks like.

Layer 5: specs, the structure the agent cannot invent around

This is the one most teams skip, and it is the one that pays the most.

Write the spec first. Generate code from it. The agent fills in behavior, not structure.

API contracts, OpenAPI:

# openapi.yaml
paths:
  /users/{id}:
    get:
      operationId: getUser
      parameters:
        - name: id
          in: path
          required: true
          schema: { type: string, format: uuid }
      responses:
        "200":
          content:
            application/json:
              schema: { $ref: "#/components/schemas/User" }
components:
  schemas:
    User:
      type: object
      required: [id, email, createdAt]
      properties:
        id: { type: string, format: uuid }
        email: { type: string, format: email }
        createdAt: { type: string, format: date-time }

Then generate:

# TypeScript client + types
npx openapi-typescript openapi.yaml -o src/api/schema.ts
 
# Python server stubs
openapi-generator-cli generate -i openapi.yaml -g python-fastapi -o server/
 
# Go client
oapi-codegen -package api openapi.yaml > api/client.go

Now the agent writes the handler body. It does not invent the URL, the status codes, the payload shape, or the field names. Those came from the spec.

Data contracts, JSON Schema or Protobuf:

// order.proto
message Order {
  string id = 1;
  string customer_id = 2;
  repeated LineItem items = 3;
  Money total = 4;
  OrderStatus status = 5;
}

Run protoc or buf generate, get types in every language you ship. The agent cannot rename a field and get away with it, the generator will fight back.

Runtime validation at the edges, even in typed languages:

import { z } from "zod";
 
const CreateOrder = z.object({
  customerId: z.string().uuid(),
  items: z.array(z.object({
    sku: z.string(),
    qty: z.number().int().positive(),
  })).min(1),
});
 
export async function POST(req: Request) {
  const body = CreateOrder.parse(await req.json()); // throws on drift
  // ...
}

Python with pydantic, Go with go-playground/validator, Rust with validator or serde plus assertions. Pattern is the same. The schema is source of truth, the handler is a thin wrapper, the agent cannot get creative with shapes.
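
To see the pattern without reaching for a library, here is the zod example's hand-rolled Python cousin; pydantic gives you the same thing with far less code, but the shape is identical: parse at the edge, throw on drift.

```python
from dataclasses import dataclass
from uuid import UUID

@dataclass(frozen=True)
class LineItem:
    sku: str
    qty: int

@dataclass(frozen=True)
class CreateOrder:
    customer_id: UUID
    items: list[LineItem]

def parse_create_order(raw: dict) -> CreateOrder:
    # Throws on drift, exactly like CreateOrder.parse in the zod example.
    items = [LineItem(sku=str(i["sku"]), qty=int(i["qty"])) for i in raw["items"]]
    if not items:
        raise ValueError("items must be non-empty")
    if any(item.qty <= 0 for item in items):
        raise ValueError("qty must be a positive integer")
    return CreateOrder(customer_id=UUID(raw["customerId"]), items=items)
```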

Frontend component contracts, props schemas:

type ButtonProps = {
  variant: "primary" | "secondary" | "ghost";
  size: "sm" | "md" | "lg";
  disabled?: boolean;
  onClick: () => void;
  children: React.ReactNode;
};
 
export const Button = (props: ButtonProps) => {
  // ...
};

Closed unions, not strings. The agent cannot pass variant="fancy" and have it compile.
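
The same closed-union idea exists in Python. A sketch mirroring the hypothetical button props above: Literal gives you the rejection under mypy, and an Enum adds a runtime error on top.

```python
from enum import Enum
from typing import Literal

# Literal: mypy rejects variant="fancy" at check time.
Variant = Literal["primary", "secondary", "ghost"]

# Enum: invalid values also fail at runtime, not just under the type checker.
class Size(Enum):
    SM = "sm"
    MD = "md"
    LG = "lg"

def button_classes(variant: Variant, size: Size) -> str:
    return f"btn btn-{variant} btn-{size.value}"
```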

Layer 6: the feedback loop

This is the multiplier.

Most agent harnesses already run typecheck and tests, but the loop only works if the signals are loud and fast. Three rules:

  1. Fail fast. tsc --incremental, cargo check instead of cargo build, mypy with a warm incremental cache. Anything over 20 seconds and the agent stops iterating.
  2. Fail specifically. One error with a file and line beats a summary. Agents read errors well when the shape is consistent.
  3. Fail early. Format before lint, lint before typecheck, typecheck before test. Cheapest check first.

A Makefile or justfile as the single entry point:

.PHONY: check
check: fmt lint typecheck test
 
fmt:
	prettier --check .
	cargo fmt -- --check
	ruff format --check .
 
lint:
	eslint .
	cargo clippy -- -D warnings
	ruff check .
 
typecheck:
	tsc --noEmit
	mypy src
 
test:
	npm test -s
	cargo test --quiet
	pytest -q

Now make check is the contract. The agent runs it, reads failures, fixes, repeats. You never had to describe "what clean code means in this repo". The tools describe it.

Putting it together: what happens on a real task

You tell Claude Code "add a /orders/:id/cancel endpoint".

  1. It reads openapi.yaml and finds no cancel op. It adds one, with the right response shapes, because the existing ones showed the pattern.
  2. It runs npx openapi-typescript and the types regenerate.
  3. It writes the handler. tsc --noEmit fails once because it returned the wrong discriminated union variant. It fixes it.
  4. It writes a test. eslint rejects the file for exceeding complexity, it splits into two helpers.
  5. It commits. lint-staged formats. Husky runs typecheck. Commitlint rejects the message, it rewrites to feat(orders): add cancel endpoint. Commit lands.
  6. It pushes. Pre-push runs the full test suite. Green.

At no step did you tell the agent "follow our patterns". The tools did.

Honest tradeoffs

What you gain

  • Drift stops. The first failing commit teaches the agent the rule.
  • Review is about intent, not style. Everything mechanical is already enforced.
  • New contributors (human or AI) onboard via make check instead of a 40 page wiki.
  • The spec becomes the primary artifact. Code is downstream.

What it costs

  • Up front setup. A day or two to wire linters, hooks, specs, and codegen.
  • Slower first commit on a new repo. Hooks run, generators run, types build.
  • Every rule you add is a rule you have to maintain. Do not over-constrain; the agent will thrash against bad rules.
  • Spec first development is a mindset shift. Some teammates will fight it.

The rule I have landed on

If a mistake is mechanical, a tool should catch it. If a tool can catch it, a hook should block it. If a hook blocks it, the agent will learn it.

Prompts are the weakest constraint. Types, linters, hooks, and specs are the strong ones. Stack them, and the agent starts looking a lot more like a disciplined junior and a lot less like a clever intern with commit access.

The model is not the bottleneck. Your guardrails are.