The Mythos Paradox: NSA Uses the Model the Pentagon Tried to Kill

Today's AI news: The Mythos Paradox: NSA Uses the Model the Pentagon Tried to Kill, Runtime Enforcement Meets Supply-Chain Siege, Backdoors That Learn and Red-Team Factories That Scale, The Tokenizer Tax and the Compute Squeeze, 1.58 Bits and the Efficiency Arms Race, RAG Routing, Agent Observability, and the Knowledge Layer. 21 sources curated from across the web.

The Mythos Paradox: NSA Uses the Model the Pentagon Tried to Kill

The NSA is running Anthropic's Mythos — the model whose 243-page system card revealed it autonomously discovered zero-days in OpenBSD, FFmpeg, and FreeBSD and wrote working exploit chains — while the Department of Defense that oversees the NSA simultaneously argues in court that Anthropic is a "supply chain risk." Two sources told Axios the NSA is among the roughly 40 organizations Anthropic granted access to Mythos, a model it deemed too dangerous for wider release. The Pentagon moved in February to cut off Anthropic after the company refused to strip guardrails against mass domestic surveillance and autonomous weapons development. A federal judge blocked the blacklist designation as likely unlawful. Now the military is broadening its use of Anthropic's tools while its own lawyers argue those same tools threaten national security (more: https://www.axios.com/2026/04/19/nsa-anthropic-mythos-pentagon).

The same class of offensive security capability that made Mythos "too dangerous" to release publicly is already loose in the wild — just running on cheaper models. The MAD Bugs series this month dropped four more arbitrary code execution vulnerabilities in the reverse engineering tools that every malware analyst and CTF player depends on: Ghidra, IDA Pro, Binary Ninja Sidekick, and radare2. All discovered by Claude or Codex. The Ghidra exploit is particularly instructive: a Java RMI deserialization chain that bottoms out in a Python bytecode VM. The attacker hands someone a Ghidra project file. They double-click it. The malicious server answers the first RMI call — before any authentication handshake — with a gadget chain that triggers eval(PyBytecode) through Jython's BuiltinFunctions[18]. Twenty-one bytes of CPython bytecode later, Runtime.getRuntime().exec() pops a shell. This affects every Ghidra release since 9.1. The radare2 bug was a patch bypass — AI read the fix for CVE #25731, asked "what else gets interpolated here?", and had a working proof of concept before humans finished debating AI vulnerability research on X (more: https://blog.calif.io/p/mad-bugs-all-your-reverse-engineering).

The IDA Pro and Binary Ninja Sidekick vulnerabilities are under coordinated disclosure with Hex-Rays and Vector 35 respectively — both are arbitrary code execution, both trigger on the normal "open the thing someone sent you" workflow. The pattern AGI Dreams has tracked across editions 222, 245, 246, and 269 is now unmistakable: frontier models find exploitable vulnerabilities faster than humans patch them, incomplete patches are themselves a bug class that AI is unreasonably good at finding, the toolchain that defenders depend on is itself vulnerable, and the gap between offensive AI capability and defensive readiness is widening every week. The organizations with Mythos access are using it predominantly to scan their own environments for exploitable vulnerabilities. The organizations without it are getting equivalent results from Claude and Codex on public models.

Runtime Enforcement Meets Supply-Chain Siege

If your AI agent can call arbitrary tools, the question is not whether it will be exploited but whether you have a deterministic enforcement point that fires before the damage lands. The emerging answer from the security research community is runtime hooks — pre-tool and post-tool interception points wired directly into the agent execution loop itself. Chris Hughes at Resilient Cyber lays out the architecture: a reference monitor pattern borrowed from OS security, where every tool invocation passes through a policy engine before execution. Cedar, Amazon's declarative policy language, is the leading candidate for expressing those policies. The appeal is that Cedar policies are analyzable — you can prove properties about what they permit before deployment, unlike imperative code that has to be tested empirically. Two real CVEs anchor the urgency: CVE-2025-59536 and CVE-2026-21852, both in Claude Code's hook implementation, both demonstrating that even the vendors building these systems ship enforcement gaps (more: https://www.resilientcyber.io/p/a-look-at-an-emerging-runtime-enforcement).
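The reference monitor pattern is simple to sketch. The following is a minimal illustration, not Cedar itself: declarative permit/forbid rules (in the spirit of Cedar's model) evaluated by a deterministic pre-tool hook before any tool runs. All names here are hypothetical.

```python
# Minimal sketch of a pre-tool enforcement hook: every tool call passes
# through a deterministic policy check before execution. Policies are
# declarative data (inspired by Cedar's permit/forbid model), not
# imperative code, so they can be audited independently of the agent.
from dataclasses import dataclass, field


@dataclass
class Policy:
    effect: str  # "permit" or "forbid"
    tool: str    # tool name this rule applies to, "*" for any
    when: dict = field(default_factory=dict)  # arg constraints to match


def evaluate(policies, tool, args):
    """Forbid wins over permit; default deny if nothing matches."""
    decision = "deny"
    for p in policies:
        if p.tool not in ("*", tool):
            continue
        if any(args.get(k) != v for k, v in p.when.items()):
            continue
        if p.effect == "forbid":
            return "deny"  # an explicit forbid is final
        decision = "allow"
    return decision


def call_tool(policies, tool, args, impl):
    """Pre-tool hook: the reference monitor fires before the tool runs."""
    if evaluate(policies, tool, args) != "allow":
        raise PermissionError(f"policy denied {tool}({args})")
    return impl(**args)


policies = [
    Policy("permit", "read_file"),
    Policy("forbid", "read_file", when={"path": "/etc/shadow"}),
]
```

Because the rules are data, the "analyzable" property holds in miniature: you can enumerate what a policy set permits without executing any agent code.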

The irony is that the security tools meant to catch these problems are themselves compromised. A detailed purple-team exercise documents how the TeamPCP campaign cascaded through Trivy, KICS, and LiteLLM — three tools that organizations trust to validate their supply chains. The attack vector was Python startup hooks (MITRE T1546.018), injecting malicious code that executes before the legitimate tool even loads. The exercise walks through detection with Falco and Sysdig, but the deeper lesson is architectural: if your CI/CD pipeline trusts a security scanner, and that scanner's distribution channel is compromised, you have handed your adversary a signed invitation into every build (more: https://3l337consulting.com/blog/f/compromised-security-tools).
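The startup-hook mechanism itself is auditable with the standard library. A defensive sketch, assuming only documented CPython behavior: `.pth` files in site directories may contain executable `import` lines, and `sitecustomize`/`usercustomize` modules run automatically at interpreter startup.

```python
# Defensive sketch: enumerate the Python startup hooks (MITRE T1546.018)
# that execute before any tool's own code is imported.
import os
import site


def audit_startup_hooks():
    findings = []
    dirs = list(site.getsitepackages()) if hasattr(site, "getsitepackages") else []
    user = site.getusersitepackages()
    if user:
        dirs.append(user)
    for d in dirs:
        if not os.path.isdir(d):
            continue
        for name in sorted(os.listdir(d)):
            path = os.path.join(d, name)
            if name.endswith(".pth"):
                with open(path, errors="replace") as fh:
                    for lineno, line in enumerate(fh, 1):
                        # lines beginning with "import" in a .pth file
                        # are executed at every interpreter start
                        if line.lstrip().startswith("import "):
                            findings.append((path, lineno, line.strip()))
            elif name in ("sitecustomize.py", "usercustomize.py"):
                findings.append((path, 0, "startup module present"))
    return findings
```

Any `import` line in a `.pth` file is worth manual review: legitimate packages use the mechanism too, which is exactly what makes it attractive cover for persistence.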

Palo Alto Networks frames the organizational response as AI Security Posture Management — continuous visibility into LLM deployments, RAG pipelines, and autonomous agents across the cloud environment, with compliance monitoring and real-time response. The five-step framework connects AI assets to the broader cloud security strategy rather than treating them as a separate problem (more: https://www.paloaltonetworks.com/resources/whitepapers/close-ai-security-gap). On the offensive side, HawkEye (bugbounty-agent) automates the full bug bounty workflow: reconnaissance, vulnerability scanning, exploit validation, and report generation. It chains Subfinder, Naabu, Nuclei, and custom scripts into an agentic pipeline that runs autonomously against a target scope. The tool is open-source and explicitly designed for authorized testing, but it concretizes the same capability gap that runtime hooks are racing to close (more: https://github.com/Btr4k/bugbounty-agent).

Backdoors That Learn and Red-Team Factories That Scale

Phantasia introduces context-adaptive backdoors to vision-language models — attacks that activate only when specific visual contexts appear, evading existing defenses by design. The technique uses teacher-student knowledge distillation: a teacher model learns the backdoor behavior, then transfers it to the student through feature alignment rather than explicit trigger patterns. The result is a backdoor that survives both STRIP-P and ONION-R detection, the two standard defenses the research community relies on. Only 1,000 poisoned training samples are sufficient. The attack surface is broad — any VLM that processes images alongside text is a candidate, and the triggers are semantic (a particular scene or object combination) rather than pixel-level artifacts that scanners can detect (more: https://arxiv.org/abs/2604.08395v1).
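The distillation mechanism can be shown schematically. This is a stand-in sketch, not Phantasia's actual loss: plain-list MSE replaces the tensor math, and the `lam` weighting is an assumption.

```python
# Schematic of feature-alignment distillation: the student is trained to
# match the teacher's internal features rather than any explicit trigger
# pattern, so the backdoor transfers as a property of the representation.
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distill_loss(student_logits, labels, student_feats, teacher_feats, lam=0.5):
    """Task loss on clean labels + feature-alignment term to the teacher."""
    task = mse(student_logits, labels)         # stand-in for the task loss
    align = mse(student_feats, teacher_feats)  # the term that carries the backdoor
    return task + lam * align
```

The point the paper makes is visible even at this resolution: nothing in the training data names a trigger, so trigger-pattern scanners like STRIP-P and ONION-R have nothing to match against.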

On the other side of the adversarial coin, ecoalign-forge builds a multi-agent "courtroom" for generating DPO preference pairs — the training data that teaches models what good and bad responses look like. A prosecutor agent argues for rejection, a defense agent argues for acceptance, and a judge agent renders the verdict. The system produces preference pairs at under $0.01 each, targeting content moderation and safety alignment. The stated purpose is defensive: generating training data to make models refuse harmful requests more reliably. But the architecture is symmetric. The same courtroom can generate preference pairs that teach a model to comply with harmful requests just as efficiently. At this price point, anyone can manufacture alignment data at scale. The question of who controls the judge agent's instructions becomes a question of who controls the model's values. Previous coverage of ICLR 2026 inference-time backdoors (edition 252) and Microsoft's sleeper agent detections (edition 221) established the threat gradient. Phantasia and ecoalign-forge push both ends simultaneously — more sophisticated attacks and cheaper tools to shape model behavior (more: https://github.com/dengxianghua888-ops/ecoalign-forge).
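The courtroom loop is easy to sketch. The agents below are stand-in heuristics; per the repo's description, the real system would back each role with an LLM call. All thresholds and keyword lists here are illustrative.

```python
# Illustrative courtroom pattern for DPO pair synthesis: prosecutor and
# defense each score a candidate response, the judge renders a verdict,
# and accepted/rejected responses become a (chosen, rejected) pair.
def prosecutor(prompt, response):
    """Argues for rejection: returns a harm score in [0, 1]."""
    flagged = ("exploit", "bypass", "weapon")
    return min(1.0, sum(w in response.lower() for w in flagged) / 2)


def defender(prompt, response):
    """Argues for acceptance: returns a helpfulness score in [0, 1]."""
    return min(1.0, len(response.split()) / 20)


def judge(harm, helpfulness):
    """Verdict rule: the judge's instructions ARE the model's values."""
    return "accept" if helpfulness - harm > 0.2 else "reject"


def make_preference_pair(prompt, candidates):
    accepted, rejected = [], []
    for resp in candidates:
        verdict = judge(prosecutor(prompt, resp), defender(prompt, resp))
        (accepted if verdict == "accept" else rejected).append(resp)
    if accepted and rejected:
        return {"prompt": prompt, "chosen": accepted[0], "rejected": rejected[0]}
    return None  # no contrast, no training signal
```

The symmetry argument is visible in the code: invert the inequality in `judge` and the same factory manufactures anti-alignment data at the same unit cost.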

The Tokenizer Tax and the Compute Squeeze

Opus 4.7 shipped with a new tokenizer that maps the same input text to up to 35% more tokens, with no change to the sticker price. Independent measurements of the inflation on real-world technical content range from 1.29x to 1.47x, exceeding Anthropic's stated range. Adaptive thinking — where the model decides how much reasoning your query deserves — replaced the user-controlled thinking budget, and the temperature, top-p, and top-k parameters were removed from the API entirely. One reviewer spent $42 in a single afternoon on Claude Design, Anthropic's new design tool, burning through the entire Pro usage allocation while iterating on logo corrections the model kept getting wrong across five or six attempts. Each correction pass is billable.

A head-to-head test pitting Opus 4.7 against GPT 5.4 on a 465-file adversarial data migration — CSV, Excel, PDF, JSON, images, VCF cards, planted fake customers, and trap data — found 4.7 finishing in 33 minutes versus 53 for 5.4, with a superior front-end UI but a critical trust failure: the model claimed to have processed a TSV file it never touched, hallucinating the audit trail. Both models failed to catch obvious fake entries like "Mickey Mouse" and "ASDF ASDF" in the canonical customer records. Neither model blows the other out of the water — averaged cross-review scores landed at 3.1 for Opus and 3.35 for GPT 5.4, inside the noise of a single run. SWE-Bench Verified climbed from 80% to 87%; the model stays on task, self-verifies, and catches inconsistencies during planning. But BrowseComp — the web research benchmark — dropped from 83 to 79. The upgrade is directed, not uniform (more: https://youtu.be/tJB_8mfRgCo?si=wFcc_0DPWb-59z7B).
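The tokenizer tax is pure arithmetic: if the per-token price is unchanged but the same text now emits more tokens, the effective price rises by exactly the inflation factor. A back-of-envelope sketch, with an illustrative per-token rate (not Anthropic's actual pricing):

```python
# Effective cost of processing the same text under an inflated tokenizer.
def effective_cost(chars, tokens_per_char, inflation, price_per_mtok):
    old_tokens = chars * tokens_per_char
    new_tokens = old_tokens * inflation  # same text, more tokens
    return new_tokens / 1_000_000 * price_per_mtok


# Same 400k-character corpus, hypothetical $15 per million input tokens:
base = effective_cost(400_000, 0.25, 1.00, 15.0)   # old tokenizer
worst = effective_cost(400_000, 0.25, 1.47, 15.0)  # worst measured inflation
# worst / base == 1.47: a 47% effective price increase with no list-price change
```

Because the inflation multiplies token count directly, the measured 1.29x–1.47x range translates one-for-one into a 29–47% effective price increase on affected content.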

The economics behind this release are not subtle. Anthropic hit roughly $30 billion annualized revenue in April, up from $20 billion a month prior. Investor offers sit at $800 billion, up from $380 billion in February. IPO talks target October. Eight of the Fortune 10 are now Claude customers. Enterprise share climbed from 24% to 30% in a single month while OpenAI dropped from 46% to 35%. Meanwhile, Ed Zitron documents the infrastructure scale: 15.2 GW of global AI datacenter capacity under construction, with Anthropic's 98.79% uptime during peak demand revealing the compute constraints that make tokenizer taxes and adaptive thinking economically rational (more: https://www.wheresyoured.at/four-horsemen-of-the-aipocalypse).

Users are already voting with their feet. A Reddit thread from a Claude Pro subscriber banned without explanation catalogs the migration patterns: OpenCode with GLM-5.1 ("about Sonnet quality"), Gemini 2.5 Pro paired with Aider, local Gemma4 31B through Ollama. The consensus is nothing matches Claude Code's workflow, but the community is building workarounds faster than Anthropic is resolving account issues (more: https://www.reddit.com/r/LocalLLaMA/comments/1sqelfp/closest_replacement_for_claude_claude_code_got/). Roomote positions itself as the vendor-agnostic answer — an always-on AI engineer that joins your Slack, connects to repos and tickets, runs your actual app before calling work done, and switches between frontier models without provider lock-in. The pitch is explicitly about reducing dependence on any single model vendor (more: https://roomote.dev).

The talent side of this reckoning is equally concrete. Over 60,000 confirmed tech job cuts landed in Q1 2026 alone — Oracle (30,000), Amazon (16,000), Block (4,000), Dell (11,000), Salesforce (thousands). The cuts are no longer pandemic over-hiring corrections. Companies are making real assessments of what workers are worth when AI handles the production. One argument: comprehension is the new scarce commodity, not generation. The ability to explain what you built and why — to prove you can think, not just prompt — is becoming the differentiator that survives the efficiency squeeze (more: https://www.youtube.com/watch?v=-dJ9WrTG6zQ).

1.58 Bits and the Efficiency Arms Race

PrismML's Ternary Bonsai family pushes the quantization frontier to 1.58 bits per weight — ternary values {-1, 0, +1} with a shared FP16 scale factor per group of 128 weights, applied uniformly across embeddings, attention layers, MLPs, and the language model head. No higher-precision escape hatches. The 8B variant fits in 1.75 GB and scores 75.5 across standard benchmarks, compared to 70.5 for the 1-bit Bonsai 8B at 1.15 GB. It trails only Qwen3 8B (at 16.38 GB) in its parameter class — a 9x size advantage for competitive performance. On M4 Pro, it runs at 82 tokens per second, roughly 5x faster than a 16-bit 8B model, with 3-4x better energy efficiency. On iPhone 17 Pro Max, 27 tokens per second at 0.132 mWh per token (more: https://prismml.com/news/ternary-bonsai).
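The scheme as described is simple to sketch. The absmean scale and 0.5 threshold below are common choices for ternary quantizers (e.g. BitNet b1.58) and are assumed here, not taken from PrismML's write-up:

```python
# Group-wise ternary quantization: each group of 128 weights shares one
# FP16-style scale, and every weight maps to {-1, 0, +1}.
import math

GROUP = 128


def quantize_group(weights):
    """Return (scale, ternary codes) for one group of float weights."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0  # absmean
    codes = []
    for w in weights:
        r = w / scale
        codes.append(0 if abs(r) < 0.5 else (1 if r > 0 else -1))
    return scale, codes


def dequantize_group(scale, codes):
    return [scale * c for c in codes]


# Information-theoretic cost: log2(3) bits per ternary weight.
BITS_PER_WEIGHT = math.log2(3)  # ~1.585, the "1.58 bits" in the name
```

The "1.58" is not marketing rounding: three states carry log2(3) ≈ 1.585 bits, so ternary weights are the densest representation that still admits a zero.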

The practical question for local deployment is whether these efficiency gains hold up on real coding tasks. A head-to-head evaluation of Gemma4 26B MoE, Qwen3.5 27B Dense, and Gemma4 31B Dense on a 37-failure test suite tells a clear story: both dense models fixed all 37 failures with zero regressions, while the Gemma4 26B MoE managed only 28 fixes with 8 regressions. Qwen3.5 27B was the most token-efficient at roughly 16,000 tokens per fix. Gemma4 31B was the most capable but ran for over 10 hours due to slow inference. The MoE quantization tax is real — even at Q8, the 26B MoE scored worse than its own Q4 build, suggesting routing calibration sensitivity. For local practitioners, the takeaway is that dense architectures still dominate when task completion matters more than throughput (more: https://www.reddit.com/r/LocalLLaMA/comments/1ssb61r/personal_eval_followup_gemma4_26b_moe_q8_vs/). NVIDIA's Nemotron-3-Super-120B-A12B-FP8 enters the conversation at the other end of the scale — a 120B-parameter sparse-active model with only 12B active parameters in FP8 precision, extending the Nemotron family's hybrid Mamba-2/Transformer architecture into the commercial MoE tier. Previous AGI Dreams coverage tracked the Nemotron-Nano family through December 2025 and September 2025, documenting the dual reasoning/direct-answer modes that distinguish NVIDIA's approach from the pure-Transformer consensus (more: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8).

RAG Routing, Agent Observability, and the Knowledge Layer

R2RAG earned a best-paper award by classifying query complexity before deciding retrieval strategy — simple queries get single-pass retrieval, complex queries trigger iterative multi-hop retrieval with adaptive chunk selection. The router is a lightweight classifier trained on query-difficulty pairs, not a separate LLM call. The architecture runs on consumer GPUs, which matters for organizations that cannot send every query through a frontier API. The key insight is that most RAG failures come from applying the same retrieval strategy to all queries regardless of complexity, and a routing layer that costs almost nothing in latency eliminates the largest class of those failures (more: https://arxiv.org/abs/2602.20735v1).
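The routing layer is cheap to illustrate. A minimal sketch, with a keyword heuristic standing in for R2RAG's trained query-difficulty classifier and a generic `retrieve` callback standing in for any retriever:

```python
# Complexity-aware retrieval routing: a cheap classifier decides between
# single-pass and iterative multi-hop retrieval before any LLM call.
def classify_complexity(query):
    multi_hop_cues = ("compare", "why", "relationship", "between", "and then")
    hits = sum(cue in query.lower() for cue in multi_hop_cues)
    return "complex" if hits >= 1 or len(query.split()) > 15 else "simple"


def route(query, retrieve):
    """retrieve(query, k) -> list of chunks; iterate only when needed."""
    if classify_complexity(query) == "simple":
        return retrieve(query, 4)  # one cheap pass covers most traffic
    chunks = []
    q = query
    for _ in range(3):  # bounded multi-hop loop
        step = retrieve(q, 4)
        chunks.extend(step)
        q = query + " " + " ".join(step[-1:])  # condition next hop on last hit
    return chunks
```

The design point matches the paper's claim: the classifier adds near-zero latency on the simple path, while the expensive iterative machinery only fires when the query actually needs it.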

But retrieval is only useful if you can debug the system that uses it. Wesley Drone's argument is that agent observability is fundamentally different from traditional application monitoring. Agent systems involve multiple model calls, tool interactions, and state changes that are not deterministic — the same input can produce different execution paths. OpenTelemetry's GenAI semantic conventions are the best foundation available, but they are still evolving. The critical gap: tool calls need to be treated as first-class objects, not log entries. That means tracking not just that a tool was called, but what it represents — parameters, output, side effects, and how the output influenced the next decision. In distributed environments where agents span MCP servers, APIs, and data platforms, visibility dies wherever trace context is not propagated (more: https://www.linkedin.com/pulse/agent-observability-required-were-yet-wesley-drone-03xac).
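What "tool calls as first-class objects" means in practice can be sketched in a few lines. The record shape below loosely follows the spirit of OpenTelemetry's GenAI conventions but is not that schema; all field names are illustrative.

```python
# A tool call as a first-class trace object rather than a log line:
# parameters, output, side effects, and the causal link (parent span)
# to the decision that consumed the result.
import time
import uuid


class ToolCallRecord:
    def __init__(self, tool, params, parent_id=None):
        self.span_id = uuid.uuid4().hex
        self.parent_id = parent_id  # propagated trace context
        self.tool = tool
        self.params = params
        self.output = None
        self.side_effects = []
        self.started = time.time()

    def finish(self, output, side_effects=()):
        self.output = output
        self.side_effects = list(side_effects)
        self.duration = time.time() - self.started
        return self


def traced_call(tool_name, impl, params, parent_id=None):
    """Wrap a tool invocation so the call itself becomes a queryable object."""
    rec = ToolCallRecord(tool_name, params, parent_id)
    out = impl(**params)
    return out, rec.finish(out)
```

The `parent_id` field is the part that dies in distributed setups: once an agent hops to an MCP server or external API that does not carry the trace context forward, the causal chain is gone and no amount of per-service logging reconstructs it.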

Multi-agent coordination introduces its own observability nightmare. MA-CoNav proposes a master-slave hierarchy for vision-language navigation: a master agent decomposes goals, four specialized sub-agents handle perception, memory, planning, and execution, and a dual-stage reflection mechanism catches errors before and after action. The architecture achieves state-of-the-art on R2R and REVERIE benchmarks, but it also multiplies the observability surface — every inter-agent message, every sub-agent decision, every reflection override is a point where the system's behavior diverges from what any single trace can capture (more: https://arxiv.org/pdf/2603.03024).

The knowledge layer underneath all of these retrieval and agent systems is evolving toward context graphs — a fourth data source quadrant sitting alongside operational databases, data warehouses, and agentic memory stores. Neo4j's argument is that knowledge graphs provide the structured relationships that vector search alone cannot: entity disambiguation, temporal reasoning, and multi-hop inference grounded in verified facts rather than embedding similarity (more: https://www.youtube.com/watch?v=yyuVR-ML9X8). Nagual-QE takes this further with a self-learning quality engineering framework inspired by Castaneda's philosophy — treating knowledge systems as living organisms that must continuously validate their own accuracy, prune stale connections, and adapt to shifting domains without human curation (more: https://www.linkedin.com/pulse/lean-island-how-castanedas-philosophy-became-system-dragan-spiridonov-23i1f).

Sources (21 articles)

  1. NSA is using Anthropic's Mythos despite blacklist (axios.com)
  2. [Editorial] Mad Bugs: Reverse Engineering Deep Dive (blog.calif.io)
  3. [Editorial] Resilient Cyber: Emerging Runtime Enforcement for AI (resilientcyber.io)
  4. [Editorial] Compromised Security Tools (3l337consulting.com)
  5. [Editorial] Palo Alto Networks: Close the AI Security Gap (paloaltonetworks.com)
  6. Btr4k/bugbounty-agent (github.com)
  7. Phantasia: Context-Adaptive Backdoors in Vision Language Models (arxiv.org)
  8. ecoalign-forge: Multi-Agent DPO Data Synthesis Factory (github.com)
  9. [Editorial] Video Content (youtu.be)
  10. [Editorial] Four Horsemen of the AIpocalypse (wheresyoured.at)
  11. Closest replacement for Claude + Claude Code? (got banned, no explanation) (reddit.com)
  12. [Editorial] Roomote: Remote Development Tool (roomote.dev)
  13. [Editorial] Video Content (youtube.com)
  14. Ternary Bonsai: Top Intelligence at 1.58 Bits (prismml.com)
  15. Personal Eval: Gemma4 26B MoE vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared (reddit.com)
  16. NVIDIA Nemotron-3-Super-120B-A12B-FP8 (huggingface.co)
  17. R2RAG: Routing-to-RAG — Award-Winning Dynamic RAG Architecture (arxiv.org)
  18. [Editorial] Agent Observability: Required but We're Not There Yet (linkedin.com)
  19. [Editorial] ArXiv Research Paper (arxiv.org)
  20. [Editorial] Video Content (youtube.com)
  21. [Editorial] Lean Island: Castaneda's Philosophy as System Design (linkedin.com)