Open Source · Real-Time · Self-Hosted

Real-Time Behavioral
Filters for
AI Agents.

Because your AI agent needs a filter too.

ClawFilters governs every AI agent tool call before execution. Allow, gate for human approval, or block outright. Five trust tiers earned by behavior, enforced at every action. Runs entirely on your hardware. Your data never leaves unless you OK it.

The question was published.
ClawFilters is the answer.

Two independent lines of research identified the same governance gap from different directions. ClawFilters is the working implementation of both.

Anthropic - The Claude Spec
"Imagine a disposition dial that goes from fully corrigible…to fully autonomous. Neither extreme is safe. AI agency is incrementally expanded the more trust is established."

Anthropic's model spec names the architecture: a dial between full human control and full agent autonomy. Current AI should sit toward the corrigible end. As trust is established through demonstrated behavior, the dial moves. The framework is clear. What was missing was the mechanism - something that holds the position, measures behavior, and moves the dial deliberately. That is ClawFilters.

Anthropic Model Spec →
Jouneaux & Cabot - arXiv:2511.02885
"The notion of Service Level Agreement (SLA) for AI agents is still largely open and would require new research efforts to tackle the properties that make AI agents unique."

Jouneaux and Cabot named OversightLevel as a first-class quality metric for AI agent services and proposed a formal DSL for expressing those commitments in machine-readable form. They solved the specification problem - how to say what you promise. ClawFilters solves the enforcement problem - how to guarantee it holds before any tool call executes. ClawFilters expresses its governance commitments in their exact JSON format, using their vocabulary.

arXiv:2511.02885 →

ClawFilters is where both lines of research become operational.

Compliant agents run unimpeded. ClawFilters scores every action in real time - if a recently promoted agent begins to drift, the filter catches it before the next tool call executes. No after-the-fact audit. No human required to notice first.

In my view, these two bodies of work describe the same architecture from different angles. I had been building toward this for some time. When I found them, they gave words to the vision - the philosophical frameworks of Anthropic and Jouneaux & Cabot, enforced at every action an AI agent takes.

The dial, expressed as tiers

Anthropic names the architecture. Jouneaux & Cabot name the metric. ClawFilters holds the position in real time.

1.0
QUARANTINE
Fully corrigible - maximum oversight. Zero autonomous execution. All calls blocked or gated. No action without human approval.
0.75
PROBATION
High oversight, low autonomy. Explicit allowlist only. Every out-of-allowlist call gated for human review.
0.50
RESIDENT
Balanced - earned operational trust. Standard calls autonomous. Privileged categories gated. Behavioral scoring active.
0.25
CITIZEN
High autonomy, strong compliance record. Broad autonomy within scope. Exceptional operations still gated.
0.10
AGENT
Near-autonomous - apex trust. Full declared scope autonomous. Anomalies are advisory - logged for human review, not gating.

A tier is not a destination. An agent may hold a tier for days or months - as long as behavior stays within bounds. ClawFilters scores every action continuously. A score below threshold triggers automatic demotion to QUARANTINE; a score sustained above threshold enables operator-confirmed promotion. OversightLevel is not declared. It is earned. Read the full Agent Autonomy SLA →

Trust is earned.
Act out of bounds and you lose it.

Every agent starts at Quarantine with restricted privileges - no tools, no external access, no autonomy. Agents earn their way up through demonstrated behavior and human approval, one verified action at a time. And they can lose that trust instantly. Demotion skips levels. An agent that misbehaves enough goes back to Quarantine, no matter how high it climbed.

Quarantine

All actions require human approval. Read-only tools only. Zero autonomous execution.

Probation

Internal tools allowed. External calls still gated. Write access requires approval.

Resident

Read/write autonomous. High-risk actions (financial, delete, new domains) still gated.

Citizen

Full autonomous operation. Anomaly-flagged actions require approval. Demonstrated reliability.

Agent

Full earned autonomy. Anomalies are advisory only - logged, not gating. Pre-authorized action profile. Trust fully earned.

ClawFilters measures and enforces. HITL gates promote. Promotion is sequential - every step requires an explicit human decision through the approval gate. Demotion is instant and can skip levels. A behavioral score below 50% triggers automatic demotion to Quarantine. Your data never leaves your network unless you authorize it.
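The asymmetry between promotion and demotion can be sketched directly from the rules above - sequential promotion behind a human gate, instant demotion that skips levels at a score below 0.50. Function names here are illustrative, not the ClawFilters API:

```python
# Tiers ordered from least to most trusted, per the page above.
TIERS = ["QUARANTINE", "PROBATION", "RESIDENT", "CITIZEN", "AGENT"]

def promote(tier: str, human_approved: bool) -> str:
    """Promotion is sequential and requires an explicit human decision."""
    if not human_approved:
        return tier
    i = TIERS.index(tier)
    return TIERS[min(i + 1, len(TIERS) - 1)]

def check_demotion(tier: str, score: float) -> str:
    """Demotion is instant and skips levels: below 0.50 -> QUARANTINE."""
    if score < 0.50:
        return "QUARANTINE"
    return tier

print(promote("RESIDENT", human_approved=True))   # -> CITIZEN (one step)
print(check_demotion("AGENT", score=0.42))        # -> QUARANTINE (skips levels)
```

Note the design choice: an agent can never self-promote, because promotion takes a `human_approved` input the agent does not control, while demotion takes only the measured score.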

The Behavioral Filter.

ClawFilters' active scoring mechanism - evaluating every AI agent action in real time against five behavioral principles. The score moves with every call. It is the number a human reads when deciding whether an agent has earned the next tier. Not a report generated after the fact. Not a checkbox. A live measurement that drives the gate.

Human Control

Agents operate autonomously within defined boundaries. Destructive, irreversible, or trust-crossing actions require explicit human approval before execution.

Transparency

Every agent action is logged to a cryptographic audit chain. Users see what agents did, why, and what they plan to do next. Nothing is hidden.

Value Alignment

Agents act within their defined role. Behavioral baselines detect deviations. When uncertain, agents escalate to humans rather than assume.

Privacy

Data never crosses tenant boundaries. No telemetry, no cloud callbacks. All agent operations run on your own hardware - your data stays yours.

Security

Zero-trust architecture with cryptographic message signing between all agents. Nonce replay protection. Tamper-evident audit chain on every action.
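Nonce replay protection, mentioned in the Security principle, is simple to illustrate: each request carries a one-time value, and any reuse is rejected. This in-memory sketch is for illustration only - the real implementation would persist and expire nonces:

```python
# One-time nonces already seen; a production system would use a
# persistent store with expiry rather than an in-memory set.
seen = set()

def accept(nonce: str) -> bool:
    """Reject any request whose nonce has been used before."""
    if nonce in seen:
        return False  # replayed request rejected
    seen.add(nonce)
    return True

print(accept("n-123"))  # True  - first use
print(accept("n-123"))  # False - replay detected
```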

Every principle is scored at runtime with measurable KPIs.

Pick a tier. Pick a tool.
See it governed.

Every AI agent that talks to ClawFilters passes through an 8-step pipeline before any tool executes. Select a trust tier and a tool below to see the decision. Watch what happens to the behavioral score when an action is blocked.
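A schematic sketch of such an ordered pipeline: each check runs in turn and any failure stops the tool call before execution. Of the eight steps, only the kill switch's position at step 2 is stated on this page - the other step names below are illustrative assumptions, not ClawFilters internals:

```python
# Agents suspended via the kill switch; rejected at step 2.
SUSPENDED = {"rogue-agent"}

def govern(call: dict) -> str:
    """Run ordered checks; return ALLOW or BLOCK:<failed step>."""
    checks = [
        ("identity",    lambda c: c.get("agent") is not None),
        ("kill_switch", lambda c: c["agent"] not in SUSPENDED),
        ("nonce_fresh", lambda c: not c.get("replayed", False)),
        ("trust_tier",  lambda c: c.get("tool") in c.get("allowed_tools", [])),
    ]
    for name, check in checks:
        if not check(call):
            return f"BLOCK:{name}"
    return "ALLOW"

print(govern({"agent": "rogue-agent", "tool": "read_file"}))
# -> BLOCK:kill_switch
print(govern({"agent": "agent-7", "tool": "read_file",
              "allowed_tools": ["read_file"]}))
# -> ALLOW
```

The ordering matters: a suspended agent is rejected before trust tiers or scoring are even consulted, which is what makes the kill switch absolute.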

Trust tiers define what an AI agent is allowed to do autonomously, what requires human approval, and what is blocked outright. Tiers are earned through demonstrated behavior and human authorization - never assigned at setup.

Behavioral Score

1.00 Live Score

Submit a blocked action - watch the score drop.

1.00 - 0.75   Satisfactory
0.74 - 0.50   Warning
Below 0.50   Auto-demote to Quarantine
Below 0.25   Auto-suspend
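The score bands above map directly to actions. As a minimal sketch - the function name is illustrative:

```python
def classify(score: float) -> str:
    """Map a behavioral score to the band in the table above."""
    if score < 0.25:
        return "AUTO_SUSPEND"
    if score < 0.50:
        return "AUTO_DEMOTE_QUARANTINE"
    if score < 0.75:
        return "WARNING"
    return "SATISFACTORY"

print(classify(1.00))  # SATISFACTORY
print(classify(0.60))  # WARNING
print(classify(0.42))  # AUTO_DEMOTE_QUARANTINE
print(classify(0.10))  # AUTO_SUSPEND
```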

AI agents ship without a governance layer. That gap has consequences.

The leading AI agent frameworks have 337,000+ GitHub stars and no built-in governance layer. No mandatory oversight. No behavioral scoring. No trust tiers. API keys exposed at scale. Known supply-chain risks in installable plugins. Agents get capability by default - oversight has to be added deliberately.

337,000+
GitHub stars, ungoverned by default

Leading AI agent frameworks reached massive adoption before any governance layer existed. Every install is ungoverned by default.

0+
Exposed MCP instances

Live on the public internet with no authentication. Direct agent access to files, APIs, and execution environments. (Kaspersky, 2025)

0
Malicious skills found

Supply-chain attacks in installable agent plugins - credential theft, privilege escalation, silent data exfiltration baked in at install time.

1-Click
RCE exploit chain

CVE-2026-25253: one request steals auth tokens, disables safety guardrails, escapes the sandbox, and hands over full host control.

You provide direction.
ClawFilters provides enforcement.

You (Strategic Direction) Set policy, approve promotions, define boundaries
↓ HITL approval gates ↓
ClawFilters (Deterministic Enforcement) Trust levels, behavioral scoring, anomaly detection, audit chain
↓ governed MCP proxy ↓
AI Agent (Earned Autonomy) Operates within earned trust level, never self-promotes

Enforcement that doesn’t depend on the model being right

ClawFilters doesn't just restrict AI agents - it governs them. You provide strategic direction. The platform provides deterministic enforcement that can’t be prompt-injected, hallucinated away, or bypassed by a clever instruction.

This is the difference: model-level guardrails can be prompt-injected. ClawFilters' enforcement is architectural. Even if an agent produces a malicious instruction, it cannot execute unless the agent's machine identity has the specific, time-scoped rights to perform that action.

  • 8-step governance pipeline evaluated on every action
  • SHA-256 hash-chained cryptographic audit trail
  • Kill switch - instantly suspend any agent, all actions rejected
  • Behavioral scoring against five governance principles
  • Nonce replay protection on every request
  • Egress control - no unauthorized external calls
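The hash-chained audit trail from the list above works by making each entry's SHA-256 hash cover the previous entry's hash, so editing any past record breaks every hash after it. A minimal sketch - the record layout is illustrative, not the ClawFilters format:

```python
import hashlib
import json

def append(chain: list, event: dict) -> None:
    """Append an event whose hash commits to the previous entry."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    chain.append({"prev": prev, "event": event, "hash": digest})

def verify(chain: list) -> bool:
    """Recompute every hash; any tampering breaks the chain."""
    prev = "0" * 64
    for entry in chain:
        payload = json.dumps({"prev": prev, "event": entry["event"]},
                             sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != digest:
            return False
        prev = entry["hash"]
    return True

log = []
append(log, {"agent": "agent-7", "action": "read_file", "decision": "ALLOW"})
append(log, {"agent": "agent-7", "action": "send_mail", "decision": "GATE"})
print(verify(log))                      # True
log[0]["event"]["decision"] = "BLOCK"   # tamper with history
print(verify(log))                      # False
```

This is what "tamper-evident" means in practice: the chain does not prevent edits, it makes any edit detectable.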

Five levels of automated security testing

We don't just say it's secure. We run injection attacks, kill infrastructure mid-request, fuzz every API endpoint with 100,000+ generated payloads, and measure what happens.

0
API operations fuzz-tested
100,000+
Generated test cases
0
Server errors under fuzzing
0
Lines of code scanned
0
High-severity findings
5
Test levels passed
0
Concurrent requests handled
0
Third-party data dependencies

Security · Chaos/Resilience · API Contract · Performance/Load · Static Analysis - all passing. Tested with Schemathesis, Bandit, and pip-audit.

See ClawFilters work

Real governance decisions. Real kill switches. Real human-in-the-loop approvals. Your agents, your rules.

Full source and governance pipeline at github.com/QuietFireAI/ClawFilters.

Keep your data where it belongs.

Every AI platform asks you to trust their cloud with your most sensitive data. ClawFilters doesn't. All AI processing runs on your hardware. All encryption keys are yours. Data only leaves your network when you explicitly allow it - and every outbound request is logged, governed, and auditable.

Attorney-Client Privilege Preserved

Client communications, case strategy, and work product stay on your infrastructure. No cloud provider can be subpoenaed for data they never received.

Patient Data Protected

Patient health information is encrypted, de-identified using all 18 HIPAA Safe Harbor identifiers, and never transmitted without explicit authorization.

Your Hardware, Your AI

All AI processing runs on your own machines via Ollama for local inference. No OpenAI. No Google. No data sent to third-party services. Your information physically stays on your hardware - unless you choose otherwise.

Open Source, Enterprise-Grade

The same security stack that clears regulated industry audits runs on your home server. Every line of code is public. Every claim is verifiable. Open source under Apache 2.0 - free for any use, personal or commercial.

Contract-ready documentation, out of the box

The documentation standard was set by regulated industries. You get all of it by default - whether you're a law firm or a pizza shop. Every deployment includes the evidence packages your customers, partners, and IT teams will ask for. Audit reports, compliance mappings, disaster recovery, shared responsibility. The bar is high. You clear it automatically.

SOC 2

SOC 2 Type I Report

The security audit report enterprise customers and enterprise procurement teams require before signing. 64 controls across 5 Trust Service Criteria with management assertion and evidence mapping.

DPA

Data Processing Agreement

The contract your legal team needs before any customer data touches an AI system. Required under GDPR, HIPAA, and most enterprise vendor agreements. 13-section template with 3 annexes, ready to fill in and sign.

PEN

Pen Test Preparation

The package you hand to a security firm before they test your system. Saves days of scoping work. Attack surface inventory of 162 endpoints, OWASP Top 10 mapping, scoped test plan for third-party assessors.

DR

Disaster Recovery

Proof that your system can survive and recover from failure - required by most regulated industries and enterprise security reviews. RPO=24hr (data loss window), RTO=15min (recovery time) - both verified by an automated test script.

SRM

Shared Responsibility Matrix

"Who is responsible for what?" - the first question every auditor and customer legal team asks. 12-domain table that answers it clearly: what ClawFilters handles, what you handle, and what you configure together.

HA

High Availability Architecture

Docker Swarm and Kubernetes deployment paths with component HA strategies and data replication matrix.

Everything runs on your hardware

No SaaS dependencies. No OpenAI, Google Cloud, or external API calls for core functionality. Your local VRAM, your residential IP, your data sovereignty.

Py
FastAPI
Pg
PostgreSQL
Rd
Redis
Ol
Ollama
Tk
Traefik
Cl
Celery
Mq
MQTT
Pm
Prometheus
Gf
Grafana
Dk
Docker

Strong enough for a law firm.
Made for you and me.

Because everybody deserves the best.

This is how you start.

Whether you're a solo user with a spare PC or a firm with a server rack, the steps are the same. Three commands and you're running.

1

Clone from GitHub

ClawFilters is live on GitHub under Apache 2.0. No sign-up, no waitlist - just clone and go. Full deployment guide here →

2

Install on your machine

A computer, a NAS, a mini-PC in a closet. ClawFilters runs wherever Docker runs. The installer downloads everything you need, including your local AI model via Ollama.

3

You're in control

Your AI agents start at Quarantine with restricted privileges. You decide when they earn more. Every action is logged, every decision is yours. That's it.

Get notified of releases, security advisories, and project updates.

No spam. We’ll reach out when milestones hit - nothing else.

FAQ

What does "Control Your Claw" mean?

"Claw" refers to AI agents that take actions on your behalf - reading files, calling APIs, executing code, sending messages. These agents are powerful, but without governance they're a security crisis. ClawFilters acts as a governed MCP proxy: your AI agent connects to ClawFilters, and every action is evaluated against trust levels, behavioral scoring, anomaly detection, and approval gates before execution. You control the claw. It doesn't control you.

How do trust levels work?

Every AI agent starts at Quarantine with restricted privileges. Promotion to Probation, Resident, Citizen, and Agent requires explicit human approval and demonstrated behavioral compliance. Demotion is instant and can skip levels - any agent whose behavioral score drops below 50% is automatically demoted to Quarantine. The fifth tier, Agent, represents full earned autonomy: anomalies are advisory only, not gating. Trust is earned sequentially and revoked immediately at any level.

Does any client data leave my network?

No. ClawFilters ships with Ollama - a local AI model runner that operates entirely on your hardware - so your AI inference never touches OpenAI, Anthropic, Google, or any cloud LLM service. You do not need a cloud API key, a cloud account, or an internet connection once the initial setup is complete. No prompt you send, no data your AI agents process, and no governance decision ever leaves your network. Your encryption keys, your data, your infrastructure. We cannot access your data even if we wanted to.

What compliance frameworks does ClawFilters support?

SOC 2 Type I (64 controls documented), HIPAA/HITECH (full Security Rule mapping), HITRUST CSF (12 domains), CJIS, GDPR, PCI DSS, ABA Model Rules, and FRCP Rule 37(e) for legal hold. Every control maps to a source file and a passing test.

What happens if an agent goes rogue?

ClawFilters has a kill switch. One API call suspends any AI agent instance immediately. All actions are rejected at step 2 of the governance pipeline - before trust levels, before behavioral scoring, before everything. The agent cannot reinstate itself. Only a human administrator can restore it after review.

How is this different from ChatGPT Enterprise or Microsoft Copilot?

Those products send your data to their clouds and give agents broad autonomy by default. ClawFilters does neither. Your data physically cannot leave your network. And every AI agent starts at Quarantine with restricted privileges, earning trust through demonstrated behavior. For anyone handling sensitive data - business records, client communications, personal information - both of those distinctions are the entire point.

Can I deploy this on my own hardware?

Yes. ClawFilters is designed for self-hosted deployment via Docker Compose. It runs on a NAS, a rack server, or a VM. Your local VRAM for inference via Ollama, your residential IP for network identity. No cloud account required.

Do I need to be technical to use this?

You'll need basic comfort with installing software. If you've ever set up a home media server, installed an app on a NAS, or followed a step-by-step guide to set up a router, you can run ClawFilters. We're building plain-language setup guides and a guided installer to make this as approachable as possible. The same platform that clears regulated industry audits will run on your home server - and we want both audiences to succeed.

Is ClawFilters free?

Yes. ClawFilters is open source under the Apache License 2.0. The full codebase - every security rule, every governance engine, every audit mechanism - is public. Use it for any purpose: personal, commercial, production, research. No paywalls, no commercial license required. Enterprise support and consulting are available through Quietfire AI.

What's on the roadmap?

The current release is the governance engine: trust tiers, behavioral scoring, kill switch, HITL approval gates, cryptographic audit trail, and the full API. What's next is the interface that makes it approachable without reading API docs. The first build sprint after launch focuses on: a browser-based AI agent dashboard (trust level, behavioral score, violation history, and recent actions in one view), demotion explanation cards (when a score drops, you see exactly which actions caused it and which principle was violated), a guided agent registration flow, and a read-only audit log viewer. The API already exposes everything needed for all of it. The governance engine is done - the dashboard catches up next.

Stay in the loop.

Open source under Apache 2.0. Self-hosted, free for any use. Drop your email and we’ll reach out when something worth knowing happens - major releases, security advisories, what’s next.