AIR Blackbox - The Flight Recorder for Autonomous AI Agents

Q: What is AIR Blackbox?

AIR Blackbox is an open-source EU AI Act compliance scanner for Python AI agents. It runs 51 automated checks across 6 EU AI Act articles (9, 10, 11, 12, 14, 15) plus GDPR and bias/fairness scanning. Version 1.12.0 adds the A2A Transaction Layer for signed, tamper-evident agent-to-agent compliance auditing, plus ML-DSA-65 quantum-safe signing, self-verifying evidence bundles, and hybrid static + runtime analysis. It supports 7 frameworks: LangChain, CrewAI, OpenAI Agents SDK, Google ADK, Claude Agent SDK, AutoGen, and Haystack.

Q: What EU AI Act articles does it check?

AIR Blackbox checks 6 articles: Article 9 (Risk Management), Article 10 (Data Governance), Article 11 (Technical Documentation), Article 12 (Record-Keeping), Article 14 (Human Oversight), and Article 15 (Accuracy & Security). Each check is classified as static (verifiable from source code) or runtime (requires gateway/trust layer).

Q: How does two-tier scoring work?

Two-tier scoring separates the 51 checks into two categories: 44 Static checks analyze code patterns, documentation, and configuration that can be verified from source code alone. 7 Runtime checks require a running gateway or trust layer to verify. This gives teams a realistic compliance score even without the full gateway deployed.

Q: When is the EU AI Act deadline?

Under the Digital Omnibus political agreement of May 2026, high-risk obligations for standalone Annex III systems, which include employment and recruiting AI, are deferred to December 2, 2027, and AI embedded in regulated products under Annex I to August 2, 2028. These dates take legal effect once the Omnibus is formally adopted and published, expected before August 2, 2026. Article 50 transparency obligations remain on the original schedule and still apply from August 2, 2026. Prohibited AI practices have been enforced since February 2, 2025, and GPAI model obligations since August 2, 2025. Penalties can reach 35 million euros or 7% of global annual turnover.

Q: What frameworks does AIR Blackbox support?

AIR Blackbox has trust layer integrations for 7 frameworks: LangChain, CrewAI, OpenAI Agents SDK, Google ADK, Claude Agent SDK, AutoGen, and Haystack - plus a standalone air-openai-trust SDK. 4 PyPI packages: air-blackbox, air-trust, air-openai-trust, and air-blackbox-mcp. The compliance scanner works on any Python AI code regardless of framework. There's also an MCP server for Claude Desktop and Cursor integration.

Q: How does it compare to Credo AI, Holistic AI, or OneTrust?

Enterprise AI governance platforms typically cost $50,000+/year and require sending code to their cloud. AIR Blackbox is free, open source (Apache 2.0), and runs 100% locally. It focuses specifically on EU AI Act technical requirements for Python AI agents.

Q: Is AIR Blackbox free?

The core scanner and all PyPI packages are 100% free and open source under the Apache 2.0 license. For teams that need managed infrastructure, we offer a Pro tier ($299/mo managed VPS) and an Enterprise tier (custom pricing, air-gapped deployment) with the same gap analysis engine plus dedicated infrastructure, fine-tuned models, and support.

Q: What is the HMAC-SHA256 audit chain?

Every AI action logged through the AIR gateway or trust layers is written as a tamper-evident .air.json record. Each record is linked to the previous one via HMAC-SHA256 cryptographic hashes - creating a blockchain-style chain without the blockchain. If anyone modifies a record after the fact, the hash chain breaks and the tampering is detectable.

The Real Problem

Nobody remembers why the AI did that.

No audit trail. No replay. No evidence. When something goes wrong with an AI agent - and it will - most teams discover they have nothing to show a regulator, a client, or a board.

🧠

"Why did the AI recommend that?" - and nobody knows

Teams make faster decisions with weaker memory. No one remembers what the AI suggested vs. what the human chose, or what assumptions were true at the time. AIR Blackbox captures the logic path behind every AI-assisted decision.

⚡

The AI handled something it should have handed off

AI automates support, operations, and decisions - but failures happen when something that should have been escalated stays automated. AIR trust layers detect when AI output requires human judgment and route accordingly.

📉

Your AI codebase diverged from policy. Somewhere.

As teams use AI to move faster, they slowly accumulate undocumented process changes, inconsistent standards, and broken assumptions. AIR scans your codebase on every commit to detect drift before it becomes an incident.

✅

Your auditor wants proof a human actually reviewed this

When everything can be AI-generated, the premium shifts to verified human review. AIR trust layers create cryptographic proof that a human reviewed, approved, and signed off on AI-assisted output.

🛡️

Your AI made a decision. Who's legally on the hook?

People use AI for taxes, contracts, decisions, and healthcare without understanding what they're personally on the hook for. AIR compliance reports tell you exactly where your AI creates real-world legal exposure.

🔍

The rough draft became the policy. Nobody caught it.

AI flattens context aggressively - a rough draft becomes a policy, an internal brainstorm becomes customer-facing copy. AIR audit chains preserve the boundary between draft and final, speculative and approved.

Why AIR Blackbox

Security tools filter threats. We build the record of everything.

Arthur AI, Lasso, and Lakera protect the input. When regulators audit the output - who said what, who approved it, whether anyone tampered with the log - they have nothing. AIR Blackbox does.

VERIFY

Cryptographic Proof

Every AI call generates an HMAC-SHA256 tamper-evident record, signed by default with Ed25519 and optionally with ML-DSA-65 (FIPS 204) post-quantum signatures. Modify any record and the chain breaks.

FILTER

PII & Injection Scanning

Automatically detect personal data leaking into prompts and prompt injection attempts - before they reach the model. Real-time, inside the call.

STABILIZE

Drift Detection in CI/CD

51 compliance checks run on every commit. Catch when your AI codebase drifts from EU AI Act, GDPR, ISO 42001, or your own policies - before it ships.

PROTECT

Human Oversight Attestation

Art. 14 delegation logging proves a human authorized AI-assisted actions. Decision lineage that shows who approved what, when, and why.

The Four Layers: Verify. Filter. Stabilize. Protect.
Every piece of AIR Blackbox maps to one of these four functions. Together they form a complete audit infrastructure for AI systems.

How It Works

One line change. Complete coverage.

Point your app at the gateway instead of the provider. That's it. Everything else is automatic.

Your App Sends a Request

Change your base URL from api.openai.com to localhost:8080. Same SDK. Same code. Same everything.

Gateway Authenticates & Proxies

The gateway checks your gateway key, forwards the request to the upstream provider, and streams the response back in real-time. Sub-millisecond overhead.

AIR Record Created (Background)

A tamper-evident .air.json record is written asynchronously. Contains: request, response, model, tokens, timestamp, run ID. Never blocks your response.

Secrets Vault-Encrypted

API keys and auth headers are stripped from the AIR record and encrypted separately. Even if someone gets the audit file, they can't extract credentials.

Features

Built for production. Not a weekend hack.

Security

Gateway-Level Auth

X-Gateway-Key header authentication. Your upstream API keys never leave the server. Developers hit the gateway, not the provider.

Security

Encrypted Vault

Secrets are AES-encrypted and stored in a separate vault (local or S3-compatible). AIR records contain zero plaintext credentials.

Performance

Non-Blocking Writes

Audit records write in background goroutines. Vault writes are async. Your response latency is the provider's latency. Period.

Performance

SSE Streaming

Full support for streaming responses (Server-Sent Events). Tokens stream to your app in real-time while the gateway records the complete response.

Compliance

Tamper-Evident Records

Each AIR record includes cryptographic hashes. If anyone modifies a record after the fact, the hash breaks. Provable integrity.

Compliance

Replay & Diff

Replay recorded requests against current models. Compare outputs. Detect when a model update changed behavior. Regression testing for AI.

Trust Layer

Cryptographic Audit Chain

Every AIR record is linked via HMAC-SHA256 into a tamper-proof chain. Modify any record and the chain breaks. Blockchain-grade integrity without the blockchain.

Trust Layer

ML-DSA-65 Quantum-Safe Signing

Evidence is signed with FIPS 204 ML-DSA-65 (Dilithium3) - a post-quantum digital signature algorithm. Signatures remain secure even against future quantum computers. Keys are generated locally and never leave your machine.

Trust Layer

Self-Verifying Evidence Bundles

One command generates a .air-evidence ZIP containing the audit chain, scan results, a SHA-256 manifest, and a standalone verify.py script. The auditor extracts it and runs verify.py with the signing key on plain Python, no install needed, and gets pass or fail.

Developer Experience

Docker Compose Ready

One command to run. No MinIO dependency required. Works with local filesystem or S3-compatible storage. Your choice.

Developer Experience

Provider Agnostic

Works with any OpenAI-compatible API. OpenAI, Anthropic (via proxy), Azure OpenAI, local models, custom endpoints. Same format.

Security

Prompt Injection Detection

20 weighted patterns across 5 attack categories: role override, delimiter injection, privilege escalation, data exfiltration, and jailbreak. Configurable sensitivity and auto-blocking.

Compliance

GDPR Scanner

8 automated checks: consent management, data minimization, right to erasure, retention policies, cross-border transfer, DPIA patterns, processing records, and breach notification.

Compliance

Bias & Fairness Scanner

6 checks for fairness metrics, bias detection libraries, protected attribute handling, dataset balance, model card bias documentation, and output bias monitoring.

Compliance

ISO 42001 + NIST AI RMF + Colorado SB 24-205

Maps every scan result to EU AI Act, ISO/IEC 42001:2023, NIST AI RMF, and Colorado SB 24-205. One scan, four compliance frameworks. Export as markdown or JSON.

Trust Layer

A2A Compliance Protocol

Agent-to-Agent verification: compliance cards, peer verification gates, and HMAC-signed handshakes. Agents prove their compliance posture before communicating.

Developer Experience

Pre-Commit Hooks

Block non-compliant code before it merges. Four configs: basic, strict, GDPR, and full. Integrates with the pre-commit framework in one line of YAML.

Developer Experience

Feedback Loop

Correct false positives and they flow into training data for the fine-tuned model. The scanner gets smarter with every correction your team makes.

Article 12 Compliance Layer

One chain. One signer. One bundle. Provable traceability for the EU AI Act.

Article 12 requires high-risk AI systems to automatically record events over the system's lifetime, with traceability appropriate to their purpose, and Article 19 requires keeping those logs for at least six months. Tamper-evidence is not named in the text, it is how you prove that traceability and exceed the regulatory floor. For Annex III high-risk systems like hiring and lending AI, the logs must also record who verified each result, which is exactly the human-oversight record AIR Blackbox captures.

HMAC-SHA256 Audit Chain

Every agent action gets a chained hash. Each record links to the previous one. Tamper with one record and every record after it breaks. Formally specified in audit-chain-v1.md (RFC 2119).

Ed25519 or ML-DSA-65 Signing

The chain head is signed for third-party verification, with Ed25519 by default and FIPS 204 ML-DSA-65 post-quantum signing available as an option. Keys are generated locally and never leave your machine. Proves who signed, when, and that nothing was altered.

Evidence Bundle

Everything gets packaged into a self-verifying .air-evidence ZIP. Auditor extracts it, runs python verify.py - gets PASS/FAIL in 2 seconds. No pip install needed.

Static + Runtime Scanner

The Article 12 scanner detects logging infrastructure, tamper-evident patterns, and retention config in your codebase. Combines static analysis with runtime chain verification for hybrid coverage.

Who It's For

Any team where AI makes decisions that matter.

Healthcare

Your AI recommended a treatment. The chart says the AI did it. Your log says nothing.

When your AI suggests a diagnosis, regulators want the decision lineage: what was asked, what was returned, who reviewed it, and what was overridden. AIR captures that entire chain.

Financial Services

The model flagged a trade. Three people acted on it. None of that is in the audit trail.

Trading desks and advisory platforms need to prove what the model said, who approved it, and whether it should have been escalated to a human. AIR provides decision traceability and escalation intelligence.

Legal

Your associate used AI to draft the brief. Can you prove a human reviewed every line?

Law firms using AI for contract review and brief drafting need to prove a human actually reviewed the output - not just rubber-stamped it. AIR trust layers create cryptographic human oversight attestation.

Enterprise AI Teams

Your AI agents touched 40 workflows last quarter. How many drifted from policy?

AIR scans on every commit, detects where AI usage diverges from your standards, and blocks violations before they ship. Compliance drift is invisible until it becomes an incident.

Independent Validation

The market is confirming this category.

Three independent signals - academic research, analyst coverage, and market data - all point to the same conclusion.

RESEARCH

AEGIS (arXiv, March 2026)

Academic researchers independently published the same interception-layer architecture for AI agent governance - pre-execution firewalls with tamper-evident audit chains. When academia converges on your approach, it validates the thesis. Read the paper

ANALYST

McKinsey: State of AI Trust in 2026

McKinsey's 2026 report identifies trust infrastructure as critical for the agentic AI era. The shift from model capabilities to operational trust systems is now a named category. McKinsey report

DATA

28% of US Firms Have Zero AI Trust

AnalyticsWeek reports that 28% of US organizations have "zero confidence" in the data quality feeding their LLMs. They call it the "Truth Layer Crisis." That crisis is what AIR Blackbox solves. Read the report

Capability	Security Players (Arthur, Lasso, Lakera)	AIR Blackbox
Filter - PII, injection, toxicity	✅	✅
Verify - tamper-evident decision traceability	❌	✅
Stabilize - compliance drift detection in CI/CD	❌	✅
Protect - human oversight attestation	❌	✅

Pricing

Open source core. Scale when you're ready.

Same gap analysis engine at every tier. Enterprise gets air-gapped isolation - zero data leaves your network.

1,012

PyPI Downloads (30d)

Gap Analysis Checks

EU AI Act Articles

PyPI Packages

Free

forever

✓ CLI scanner -- 51 gap analysis checks

✓ AI-powered deep scan (local Ollama)

✓ PDF gap analysis reports

✓ AI-BOM generation (CycloneDX)

✓ HMAC-SHA256 audit chain

✓ All framework trust layers

↗ Anonymized telemetry improves model

pip install air-blackbox

Popular

Pro

$299/mo

hosted VPS - we manage it

✓ Everything in Free

★ Managed VPS deployment

★ Fine-tuned compliance model

★ Private telemetry (your data stays)

★ Benchmarking dashboard

★ Priority model updates

★ Jaeger trace dashboard

Get Started →

Enterprise

Custom

air-gapped - your infrastructure

✓ Everything in Pro

🔒 Air-gapped VPS - zero data leaves

🔒 On-prem / private cloud deploy

🔒 Model baked into Docker image

🔒 SOC 2 + ISO 27001 mapping

🔒 Dedicated support + SLA

🔒 Custom compliance frameworks

Contact Sales →

Deployment

How it gets into your infrastructure.

Free runs on your laptop. Pro and Enterprise run on dedicated servers - yours or ours.

Free - Your Laptop

$ pip install air-blackbox $ air-blackbox comply --scan . 51 checks across 6 EU AI Act articles + GDPR $ air-blackbox export --format pdf ✅ Report saved

One pip install. Runs locally with Ollama. Anonymized scan metadata helps improve the compliance model for everyone.

Pro - We Manage It

# We deploy a VPS for your team $ export AIR_GATEWAY=https://your-co.airblackbox.ai $ air-blackbox comply --scan . Fine-tuned model · Private telemetry ✅ Jaeger dashboard: your-co.airblackbox.ai:16686

We set up a dedicated VPS with the fine-tuned compliance model, Jaeger traces, and benchmarking dashboard. Your team just points the CLI at it. We handle updates.

Enterprise - Your Infrastructure

# Your DevOps runs one script $ bash deploy-enterprise.sh → Ollama + model baked in (no download) → Gateway + collector + Jaeger + MinIO → All ports bound to 127.0.0.1 ✅ Air-gapped. Zero external connections.

Everything ships inside Docker - including the fine-tuned LLM. Deploy on-prem, in your VPC, or on an air-gapped server. No code or data ever leaves your network.

Docker containers
gateway · ollama · collector · jaeger · minio

<5min

deploy time
one script, fresh Ubuntu VPS

external connections
enterprise air-gapped mode

gap analysis checks
same engine at every tier

FAQ

Everything you need to know about AIR Blackbox.

What is AIR Blackbox?

AIR Blackbox is an open-source EU AI Act compliance scanner for Python AI agents. It runs 51 automated checks across 6 EU AI Act articles (9, 10, 11, 12, 14, 15) plus GDPR and bias/fairness scanning to identify compliance gaps in your source code and runtime configuration. It does not certify or guarantee compliance. v1.12.0 adds the A2A Transaction Layer for signed, tamper-evident agent-to-agent compliance auditing with bilateral proof, ML-DSA-65 (FIPS 204) quantum-safe digital signatures, self-verifying .air-evidence bundles, and trust layers for 7 frameworks: LangChain, CrewAI, OpenAI Agents SDK, Google ADK, Claude Agent SDK, AutoGen, and Haystack. Install with pip install air-blackbox and scan with air-blackbox comply --scan . -v. The entire tool runs locally.

What EU AI Act articles does it check?

AIR Blackbox checks 6 articles: Article 9 (Risk Management), Article 10 (Data Governance), Article 11 (Technical Documentation), Article 12 (Record-Keeping), Article 14 (Human Oversight), and Article 15 (Accuracy & Security). Each check is classified as static (verifiable from source code) or runtime (requires gateway/trust layer).

How does two-tier scoring work?

Two-tier scoring separates the 51 checks into two categories: 44 Static checks analyze code patterns, documentation, and configuration that can be verified from source code alone. 7 Runtime checks require a running gateway or trust layer to verify. This gives teams a realistic compliance score even without the full gateway deployed -- you can pass all static checks immediately and work toward runtime compliance incrementally.

When is the EU AI Act deadline?

Under the Digital Omnibus political agreement of May 2026, high-risk obligations for standalone Annex III systems, which include employment and recruiting AI, are deferred to December 2, 2027, and AI embedded in regulated products under Annex I to August 2, 2028. These dates take legal effect once the Omnibus is formally adopted and published, expected before August 2, 2026. Article 50 transparency obligations remain on the original schedule and still apply from August 2, 2026. Prohibited AI practices have been enforced since February 2, 2025, and GPAI obligations since August 2, 2025. Penalties can reach €35 million or 7% of global annual turnover.

What frameworks does AIR Blackbox support?

AIR Blackbox has trust layer integrations for 7 frameworks: LangChain, CrewAI, OpenAI Agents SDK, Google ADK, Claude Agent SDK, AutoGen, and Haystack - plus a standalone air-openai-trust SDK. 4 PyPI packages: air-blackbox, air-trust, air-openai-trust, and air-blackbox-mcp. The compliance scanner works on any Python AI code regardless of framework. There's also an MCP server for Claude Desktop and Cursor integration.

How does it compare to Credo AI, Holistic AI, or OneTrust?

Enterprise AI governance platforms typically cost $50,000+/year and require sending code to their cloud. AIR Blackbox is free, open source (Apache 2.0), and runs 100% locally. It focuses specifically on EU AI Act technical requirements for Python AI agents. The developer experience is fundamentally different: pip install and scan in 10 seconds, versus weeks of procurement and enterprise deployment.

Is AIR Blackbox free?

The core scanner and all PyPI packages are 100% free and open source under the Apache 2.0 license. For teams that need managed infrastructure, we offer a Pro tier ($299/mo managed VPS) and an Enterprise tier (custom pricing, air-gapped deployment) with the same gap analysis engine plus dedicated infrastructure, fine-tuned models, and support.

What is the HMAC-SHA256 audit chain?

Every AI action logged through the AIR gateway or trust layers is written as a tamper-evident .air.json record. Each record is linked to the previous one via HMAC-SHA256 cryptographic hashes - creating a blockchain-style chain without the blockchain. If anyone modifies a record after the fact, the hash chain breaks and the tampering is detectable.

The flight recorderfor autonomous AI agents.

Nobody remembers why the AI did that.

"Why did the AI recommend that?" - and nobody knows

The AI handled something it should have handed off

Your AI codebase diverged from policy. Somewhere.

Your auditor wants proof a human actually reviewed this

Your AI made a decision. Who's legally on the hook?

The rough draft became the policy. Nobody caught it.

Security tools filter threats. We build the record of everything.

Cryptographic Proof

PII & Injection Scanning

Drift Detection in CI/CD

Human Oversight Attestation

One line change. Complete coverage.

Your App Sends a Request

Gateway Authenticates & Proxies

AIR Record Created (Background)

Secrets Vault-Encrypted

Built for production. Not a weekend hack.

Gateway-Level Auth

Encrypted Vault

Non-Blocking Writes

SSE Streaming

Tamper-Evident Records

Replay & Diff

Cryptographic Audit Chain

ML-DSA-65 Quantum-Safe Signing

Self-Verifying Evidence Bundles

Docker Compose Ready

Provider Agnostic

Prompt Injection Detection

GDPR Scanner

Bias & Fairness Scanner

ISO 42001 + NIST AI RMF + Colorado SB 24-205

A2A Compliance Protocol

Pre-Commit Hooks

Feedback Loop

One chain. One signer. One bundle. Provable traceability for the EU AI Act.

HMAC-SHA256 Audit Chain

Ed25519 or ML-DSA-65 Signing

Evidence Bundle

Static + Runtime Scanner

Simple by design.

Your App

AIR Gateway

LLM Provider

AIR Records

Secret Vault

Trust Layer

Any team where AI makes decisions that matter.

Your AI recommended a treatment. The chart says the AI did it. Your log says nothing.

The model flagged a trade. Three people acted on it. None of that is in the audit trail.

Your associate used AI to draft the brief. Can you prove a human reviewed every line?

Your AI agents touched 40 workflows last quarter. How many drifted from policy?

The market is confirming this category.

AEGIS (arXiv, March 2026)

McKinsey: State of AI Trust in 2026

28% of US Firms Have Zero AI Trust

They do one thing. We do four.

Change one line. That's the integration.

Lightweight. Auditable. Open.

Four commands. Full coverage.

Open source core. Scale when you're ready.

How it gets into your infrastructure.

Everything you need to know about AIR Blackbox.

The high-risk deadline moved. The preparation did not.

The flight recorder
for autonomous AI agents.