Record. Replay. Enforce. Audit. Open-source, runs locally, ten seconds to your first scan. The tamper-evident record of every AI decision - before regulators, clients, or your board ever ask.
No audit trail. No replay. No evidence. When something goes wrong with an AI agent - and it will - most teams discover they have nothing to show a regulator, a client, or a board.
Teams make faster decisions with weaker memory. No one remembers what the AI suggested vs. what the human chose, or what assumptions were true at the time. AIR Blackbox captures the logic path behind every AI-assisted decision.
AI automates support, operations, and decisions - but failures happen when something that should have been escalated stays automated. AIR trust layers detect when AI output requires human judgment and route accordingly.
As teams use AI to move faster, they slowly accumulate undocumented process changes, inconsistent standards, and broken assumptions. AIR scans your codebase on every commit to detect drift before it becomes an incident.
When everything can be AI-generated, the premium shifts to verified human review. AIR trust layers create cryptographic proof that a human reviewed, approved, and signed off on AI-assisted output.
People use AI for taxes, contracts, decisions, and healthcare without understanding what they're personally on the hook for. AIR compliance reports tell you exactly where your AI creates real-world legal exposure.
AI flattens context aggressively - a rough draft becomes a policy, an internal brainstorm becomes customer-facing copy. AIR audit chains preserve the boundary between draft and final, speculative and approved.
Arthur AI, Lasso, and Lakera protect the input. When regulators audit the output - who said what, who approved it, whether anyone tampered with the log - they have nothing. AIR Blackbox does.
Every AI call generates an HMAC-SHA256 tamper-evident record, signed by default with Ed25519 and optionally with ML-DSA-65 (FIPS 204) post-quantum signatures. Modify any record and the chain breaks.
Automatically detect personal data leaking into prompts and prompt injection attempts - before they reach the model. Real-time, inside the call.
51 compliance checks run on every commit. Catch when your AI codebase drifts from EU AI Act, GDPR, ISO 42001, or your own policies - before it ships.
Art. 14 delegation logging proves a human authorized AI-assisted actions. Decision lineage that shows who approved what, when, and why.
The Four Layers: Verify. Filter. Stabilize. Protect.
Every piece of AIR Blackbox maps to one of these four functions. Together they form a complete audit infrastructure for AI systems.
Point your app at the gateway instead of the provider. That's it. Everything else is automatic.
Change your base URL from api.openai.com to localhost:8080. Same SDK. Same code. Same everything.
The gateway checks your gateway key, forwards the request to the upstream provider, and streams the response back in real-time. Sub-millisecond overhead.
A tamper-evident .air.json record is written asynchronously. Contains: request, response, model, tokens, timestamp, run ID. Never blocks your response.
API keys and auth headers are stripped from the AIR record and encrypted separately. Even if someone gets the audit file, they can't extract credentials.
X-Gateway-Key header authentication. Your upstream API keys never leave the server. Developers hit the gateway, not the provider.
Secrets are AES-encrypted and stored in a separate vault (local or S3-compatible). AIR records contain zero plaintext credentials.
Audit records write in background goroutines. Vault writes are async. Your response latency is the provider's latency. Period.
Full support for streaming responses (Server-Sent Events). Tokens stream to your app in real-time while the gateway records the complete response.
Each AIR record includes cryptographic hashes. If anyone modifies a record after the fact, the hash breaks. Provable integrity.
Replay recorded requests against current models. Compare outputs. Detect when a model update changed behavior. Regression testing for AI.
Every AIR record is linked via HMAC-SHA256 into a tamper-proof chain. Modify any record and the chain breaks. Blockchain-grade integrity without the blockchain.
Evidence is signed with FIPS 204 ML-DSA-65 (Dilithium3) - a post-quantum digital signature algorithm. Signatures remain secure even against future quantum computers. Keys are generated locally and never leave your machine.
One command generates a .air-evidence ZIP containing the audit chain, scan results, a SHA-256 manifest, and a standalone verify.py script. The auditor extracts it and runs verify.py with the signing key on plain Python, no install needed, and gets pass or fail.
One command to run. No MinIO dependency required. Works with local filesystem or S3-compatible storage. Your choice.
Works with any OpenAI-compatible API. OpenAI, Anthropic (via proxy), Azure OpenAI, local models, custom endpoints. Same format.
20 weighted patterns across 5 attack categories: role override, delimiter injection, privilege escalation, data exfiltration, and jailbreak. Configurable sensitivity and auto-blocking.
8 automated checks: consent management, data minimization, right to erasure, retention policies, cross-border transfer, DPIA patterns, processing records, and breach notification.
6 checks for fairness metrics, bias detection libraries, protected attribute handling, dataset balance, model card bias documentation, and output bias monitoring.
Maps every scan result to EU AI Act, ISO/IEC 42001:2023, NIST AI RMF, and Colorado SB 24-205. One scan, four compliance frameworks. Export as markdown or JSON.
Agent-to-Agent verification: compliance cards, peer verification gates, and HMAC-signed handshakes. Agents prove their compliance posture before communicating.
Block non-compliant code before it merges. Four configs: basic, strict, GDPR, and full. Integrates with the pre-commit framework in one line of YAML.
Correct false positives and they flow into training data for the fine-tuned model. The scanner gets smarter with every correction your team makes.
Article 12 requires high-risk AI systems to automatically record events over the system's lifetime, with traceability appropriate to their purpose, and Article 19 requires keeping those logs for at least six months. Tamper-evidence is not named in the text, it is how you prove that traceability and exceed the regulatory floor. For Annex III high-risk systems like hiring and lending AI, the logs must also record who verified each result, which is exactly the human-oversight record AIR Blackbox captures.
Every agent action gets a chained hash. Each record links to the previous one. Tamper with one record and every record after it breaks. Formally specified in audit-chain-v1.md (RFC 2119).
The chain head is signed for third-party verification, with Ed25519 by default and FIPS 204 ML-DSA-65 post-quantum signing available as an option. Keys are generated locally and never leave your machine. Proves who signed, when, and that nothing was altered.
Everything gets packaged into a self-verifying .air-evidence ZIP. Auditor extracts it, runs python verify.py - gets PASS/FAIL in 2 seconds. No pip install needed.
The Article 12 scanner detects logging infrastructure, tamper-evident patterns, and retention config in your codebase. Combines static analysis with runtime chain verification for hybrid coverage.
SDK / HTTP client
Auth + Record + Proxy
OpenAI / Anthropic / etc
Tamper-evident JSON
AES-encrypted keys
HMAC chain + compliance
When your AI suggests a diagnosis, regulators want the decision lineage: what was asked, what was returned, who reviewed it, and what was overridden. AIR captures that entire chain.
Trading desks and advisory platforms need to prove what the model said, who approved it, and whether it should have been escalated to a human. AIR provides decision traceability and escalation intelligence.
Law firms using AI for contract review and brief drafting need to prove a human actually reviewed the output - not just rubber-stamped it. AIR trust layers create cryptographic human oversight attestation.
AIR scans on every commit, detects where AI usage diverges from your standards, and blocks violations before they ship. Compliance drift is invisible until it becomes an incident.
Three independent signals - academic research, analyst coverage, and market data - all point to the same conclusion.
Academic researchers independently published the same interception-layer architecture for AI agent governance - pre-execution firewalls with tamper-evident audit chains. When academia converges on your approach, it validates the thesis. Read the paper
McKinsey's 2026 report identifies trust infrastructure as critical for the agentic AI era. The shift from model capabilities to operational trust systems is now a named category. McKinsey report
AnalyticsWeek reports that 28% of US organizations have "zero confidence" in the data quality feeding their LLMs. They call it the "Truth Layer Crisis." That crisis is what AIR Blackbox solves. Read the report
Arthur AI ($60M raised), Lasso Security, and Lakera Guard are AI security platforms. They filter threats. When a regulator asks for the audit trail, they have nothing.
They do one thing. We do four. They're a firewall. We're infrastructure.
Run this on any Python AI project and get a gap analysis report, shadow AI scan, replayable audit trail, and signed evidence package - in under 60 seconds.
Same gap analysis engine at every tier. Enterprise gets air-gapped isolation - zero data leaves your network.
Free runs on your laptop. Pro and Enterprise run on dedicated servers - yours or ours.
One pip install. Runs locally with Ollama. Anonymized scan metadata helps improve the compliance model for everyone.
We set up a dedicated VPS with the fine-tuned compliance model, Jaeger traces, and benchmarking dashboard. Your team just points the CLI at it. We handle updates.
Everything ships inside Docker - including the fine-tuned LLM. Deploy on-prem, in your VPC, or on an air-gapped server. No code or data ever leaves your network.
.air-evidence bundles, and trust layers for 7 frameworks: LangChain, CrewAI, OpenAI Agents SDK, Google ADK, Claude Agent SDK, AutoGen, and Haystack. Install with pip install air-blackbox and scan with air-blackbox comply --scan . -v. The entire tool runs locally.pip install and scan in 10 seconds, versus weeks of procurement and enterprise deployment..air.json record. Each record is linked to the previous one via HMAC-SHA256 cryptographic hashes - creating a blockchain-style chain without the blockchain. If anyone modifies a record after the fact, the hash chain breaks and the tampering is detectable.51 gap analysis checks. 6 EU AI Act articles + GDPR. ML-DSA-65 signing. Self-verifying evidence bundles. 7 framework trust layers. GDPR + bias scanning. One pip install. Find out where your Python AI agents stand today.
AIR Blackbox identifies potential compliance gaps. It does not certify or guarantee regulatory compliance. Terms of Service