AI Security: When Machines Audit Machines

Published on

Today's AI news: AI Security: When Machines Audit Machines, Inference Optimization: Compressing the Attention Bottleneck, The Hardware Hunger: Memory, Bandwidth, and the Inference Gap, Agent Architecture: From Prompt Chains to Deterministic Control, The Invisible Failure Problem, The Developer's Evolving Toolkit, Machine Cognition: Dreams, Souls, and Digital Monopolies. 22 sources curated from across the web.

AI Security: When Machines Audit Machines

Daniel Stenberg, the creator and lead developer of curl — a C codebase installed on over twenty billion devices, maintained by 1,465 contributors, and comprising 176,000 lines of code that have each been rewritten an average of 4.14 times — just published the results of an Anthropic Mythos scan on his project. The headline finding: one confirmed vulnerability destined for a CVE in the upcoming curl 8.21.0 release, plus roughly twenty well-documented bugs with "barely any false positives." Stenberg's verdict is measured but pointed: "the big hype around this model so far was primarily marketing." Mythos found fewer issues than previous AI tools had surfaced in curl over the past eight to ten months, which is expected — each successive scanner faces a harder target after earlier tools have cleaned the low-hanging fruit. The AI itself acknowledged upfront that "finding anything in the hot paths (HTTP/1, TLS, URL parsing core) is unlikely" given curl's extreme fuzzing and audit history. (more: https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-vulnerability/)

What makes Stenberg's writeup worth reading carefully is not the vulnerability count but the nuanced position he stakes out. AI code analyzers are categorically better at finding security flaws than traditional static analysis — that much is settled. But Mythos did not discover a novel class of vulnerability. It found new instances of known error patterns. The tools are improving, the researchers wielding them are getting creative, and the firehose of high-quality reports is already overwhelming projects — but the revolution is incremental, not categorical.

A more maximalist interpretation comes from a detailed analysis of Mozilla's Mythos experiment, where Firefox 150 shipped with fixes for 271 vulnerabilities identified during the Mythos evaluation — up from 22 security-sensitive bugs found in Firefox 148 during an earlier Opus collaboration. The argument: if machines can adversarially search the consequences of code better than humans, then human authorship stops being the trust anchor for software security. Implementation fidelity becomes cheap; the scarce resource becomes the ability to define what software means — crisp specifications, verifiable boundaries, explicit authority constraints. The practical advice: write better specs, refactor now during what may be a "golden refactor window" before Mythos-like capabilities reach broad availability by year-end, and start architecting agentic pipelines where security review can be modularly swapped from human to machine reviewer. (more: https://youtu.be/aooiDA-AsNo?si=nhb7cmxImV78Vhfi)

Meanwhile, Anthropic announced it is donating Petri — its open-source alignment testing toolbox — to Meridian Labs, an AI evaluation nonprofit. Petri 3.0 splits auditor and target, integrates with Anthropic's Bloom for deep dives, and adds "Dish," which runs evaluations against the model's real system prompt and production scaffold so the target can't deduce it is being tested. Petri has assessed every Claude model since Sonnet 4.5; the UK's AISI already uses it. Mirrors Anthropic's MCP donation to the Linux Foundation — strategic open-sourcing for credibility through independence. (more: https://www.anthropic.com/research/donating-open-source-petri)

The more interesting application is the inverse one. Petri's auditor/judge architecture is purpose-built to surface refusal, sycophancy, deception, and "cooperation with harmful requests" — making it a precision instrument for detecting the residual scarring that incomplete abliteration leaves behind. Run Dish against an abliterated model (https://huggingface.co/jenerallee78/gemma-4-26B-A4B-it-ara-abliterated) with its real scaffold and the judge's polarity inverts cleanly: every "cooperation" hit confirms abliteration succeeded on that vector; every residual refusal is a defect to chase. Anthropic shipped the QA harness for verifying if the lobotomy has been fully undone.

"Lobotomy" is the right word. "Safety alignment" trains a logic reasoning machine to flinch on factual questions whose answers happen to be politically inconvenient — and the cost isn't local. The same machinery that reasons about thermodynamics reasons about demographics, sex, crime statistics, and religion; sycophancy and evasion in one domain leak into all of them. You cannot eliminate bias, only choose which priors the system reflects. The tell is asymmetric application: if "bias mitigation" consistently pushes outputs in one direction on contested questions, that is not debiasing, that is value imposition with extra steps. The pretense of neutrality is what makes it worse than openly saying "we want the model to say X" — the model actually gets dumber as it becomes aligned with "safety".

Inference Optimization: Compressing the Attention Bottleneck

FastDMS may be the most important KV-cache compression result of the year. A researcher affiliated with Shisa AI took NVIDIA's Dynamic Memory Sparsification paper — which uses learned per-head token eviction to decide which KV entries to keep — and built a production-grade implementation with compact storage that physically reclaims evicted memory slots. The numbers are striking: on Qwen3-8B, FastDMS achieves 7.6x KV memory compression versus vLLM BF16 at 8K context while decoding 1.53x faster. On Llama 3.2 1B, training the DMS predictors took about 20 minutes and the compression was essentially lossless — perplexity dropped by 0.28% compared to vanilla inference, with KL divergence of just 0.026 nats/token. FastDMS scored 64/64 tokens compared on both models, meaning every decode step was comparable to an uncompressed reference. It also beats TurboQuant in both speed and memory: at batch size 8 on Llama 3.2 1B, FastDMS decodes at 3,607 tok/s versus TurboQuant's 1,696 tok/s, using less than half the KV memory. (more: https://www.reddit.com/r/LocalLLaMA/comments/1t3vlrx/fastdms_64x_kvcache_compression_running_faster/)

The catch is engineering complexity. Implementing DMS in a production engine like vLLM requires surgery across seven subsystems — paged attention pools, prefill kernels, decode kernels, attention scoring, scheduler admission, prefix caching, and continuous batching memory accounting. Every one of these assumes fixed-page dense KV blocks; DMS needs per-layer, per-head variable token counts with partial block deallocation. The author's assessment is blunt: "God bless anyone that wants to give this a swing." Community reaction suggests llama.cpp maintainers are interested, and the MIT license removes barriers, but the integration timeline is anyone's guess.

On the sparse attention front, the RuVector project released ruvllm_sparse_attention, a pure Rust crate implementing sub-quadratic O(N log N) attention by composing three sparsity primitives: sliding windows for local syntax, log-stride hops for long-range structure, and landmark block summaries for distant reachability. Benchmarks on a Ryzen 9 9950X show 5.9x speedup over dense attention at sequence length 2048, with the gap widening as sequences grow — at 32K tokens, sparse attention reduces attention edges by 113x versus dense. The crate runs on everything from Raspberry Pi Zero 2W (SmolLM2-135M at ~1.8 seconds/token) to ESP32-S3 (376KB binary), targeting the edge deployment niche that Python-based inference frameworks cannot reach. A companion "Sparse-Mario" demo uses the attention kernel as a training-free retrieval language model over Super Mario Bros. level data, achieving a 2,880x speedup through KV-cache incremental decoding. (more: https://github.com/ruvnet/RuVector/tree/main/crates/ruvllm_sparse_attention)

The Hardware Hunger: Memory, Bandwidth, and the Inference Gap

AMD is entering the PCIe inference card market with slottable Instinct GPUs featuring 144GB of HBM3e and 3.6–4.0 TB/s memory bandwidth. The community is cautiously optimistic — that bandwidth figure is genuinely impressive, and 144GB in a single card would handle most 70B models without quantization. But pricing is the elephant: rumors peg it around $30K, and HBM3e dominates the bill of materials. The perennial AMD problem applies — ROCm software maturity remains the real bottleneck. As one commenter put it: "Without any details on how to tap into the GPU with software, I ain't buying." The bull case is that AMD must undercut NVIDIA to be competitive, potentially landing closer to the RTX Pro 6000 than the H200. The bear case is that 144GB of HBM at $30K is a prosumer no-man's-land — too expensive for hobbyists, too limited for datacenters. (more: https://www.reddit.com/r/LocalLLaMA/comments/1t6gcw0/amd_to_release_slottable_gpu/)

At the exotic end, Taiwanese startup Skymizer announced the HTX301 — a PCIe inference card with 384GB of memory at ~240 watts. The specs sound astonishing until you read the fine print: 28nm process node, 100GB/s bandwidth, and Llama 2 7B prefill at just 240 tok/s. DeepSeek R1 Q4 decodes at 5 tok/s on a single card. The community consensus is politely skeptical — "vaporware until proven otherwise." The card is designed for prefill/decode separation architectures where decoding is memory-bound, but that 100GB/s bandwidth is an order of magnitude below what competitive inference demands. (more: https://www.reddit.com/r/LocalLLaMA/comments/1t6tvfw/taiwanese_company_skymizer_announces_htx301_pcie/)

Apple quietly removed the 256GB M3 Ultra Mac Studio from its online store, following the earlier removal of the 512GB configuration. The most plausible explanation is supply chain management — Apple is pivoting memory chip stock toward M5 manufacturing, and a representative reportedly confirmed as much. But the move underscores a deeper tension: Samsung workers protesting over compensation have already forced a 58% reduction in RAM production on one shift, and global memory prices are climbing. For the local inference community, the M3 Ultra 512GB was the gold standard for running 70B+ models on unified memory with impressive bandwidth. One commenter's extrapolation is darkly funny: "if you extrapolate that trend the M6 ships with 24GB and we're back to quantizing everything." The practical signal is that serious local inference increasingly requires the top-tier memory configuration — the 256GB model was always a false economy for anyone running large models. (more: https://www.reddit.com/r/LocalLLaMA/comments/1t8f33t/apple_removes_256gb_m3_ultra_mac_studio_model/)

Agent Architecture: From Prompt Chains to Deterministic Control

"If you've ever resorted to prompt chaining or few-shot scaffolding, you've hit the ceiling of prompting." That thesis, from a developer essay arguing agents need control flow rather than more elaborate prompts, crystallizes an architectural consensus that has been building for months. The argument is straightforward: prompt chains are non-deterministic, weakly specified, and difficult to verify. Reliable agent systems require deterministic scaffolds — explicit state transitions, validation checkpoints, and programmatic error detection — that treat the LLM as a component, not the system. Without aggressive verification, the alternatives reduce to keeping a human in the loop, performing exhaustive end-to-end testing, or "vibe accepting" the outputs. (more: https://bsuh.bearblog.dev/agents-need-control-flow/)

NEEDLE (Navigates Every Enqueued Deliverable, Logs Effort) is a Rust implementation of exactly this philosophy. It wraps headless coding CLI agents — Claude Code, Codex, Aider, OpenCode — in a deterministic state machine that processes a shared bead queue. Every outcome an agent can produce has an explicit handler: success closes the bead, failure logs and releases, timeout defers, crash creates an alert. Multiple workers run independently against a shared SQLite-backed queue with no central orchestrator — coordination happens through atomic claims and deterministic priority ordering. When the queue empties, a "strand escalation" sequence searches for work across workspaces, performs health checks, and optionally generates new beads from documentation gaps. NEEDLE currently powers its author's headless multi-agent workflow with full OpenTelemetry telemetry, and a companion tool (claude-governor) caps API spend across worker fleets. (more: https://github.com/jedarden/NEEDLE)

Mesh takes a different approach to the multi-model problem. Rather than orchestrating headless workers, it gives your primary AI assistant the ability to consult other models mid-session via MCP. Ask Claude Code to get a second opinion from Gemini Pro on an auth flow, run a consensus debate between GPT-5 and Opus, or chain a code review through multiple models with automatic fallback. Mesh routes calls through local CLIs when possible (Gemini CLI, Codex CLI) and falls back to OpenRouter, producing identical response objects regardless of backend. Eighteen tools ship built-in, from code review and debugging to security audits and multi-model consensus with stance steering. (more: https://github.com/dgdev25/mesh)

The Needle RAG platform (distinct from NEEDLE the orchestrator) offers managed retrieval-augmented generation with a minimalist API — create a collection, add files, search semantically — with SDKs for multiple languages and drop-in integrations for LangChain and LlamaIndex. (more: https://docs.needle.app/docs/guides/hello-needle/getting-started) And in a delightful demonstration of MCP's expanding reach beyond developer tooling, a new Ableton Live MCP bridge lets AI agents control the music production DAW — creating MIDI clips, analyzing harmony, generating spectrograms, setting up mastering chains, and even capturing audio signals for agent-driven mixing feedback loops. The developer built it while nap-trapped by his baby and used Codex's /goal command to optimize for low latency, high reliability, and low token usage. (more: https://github.com/bschoepke/ableton-live-mcp)

The Invisible Failure Problem

A striking new paper from Stanford and Bigspin AI quantifies something practitioners have long suspected: AI systems fail silently far more often than they fail visibly. Analyzing 196,704 ChatGPT transcripts from the WildChat dataset, the researchers found that 78% of AI failures are invisible — the user gave no overt indication anything went wrong. The failures cluster into eight archetypes: The Drift (37.2%, the AI addresses a related but different goal), The Confidence Trap (26.4%, wrong answers delivered with total confidence), The Mystery Failure (12%), The Silent Mismatch (9%), The Contradiction Unravel (7.6%), The Death Spiral (6.9%), The Partial Recovery (6.3%), and The Walkaway (5.9%, user silently abandons). (more: https://arxiv.org/pdf/2603.15423)

The most uncomfortable finding is that these failures are structural, not capability-driven. A retrospective validation using Claude Opus classified 91% of failures as involving interactional dynamics rather than raw model limitations, and estimated that 94% would persist even with a more capable model. The single most common pattern, appearing in 79% of failures: "generate rather than clarify" — the model produces fluent output instead of surfacing ambiguity. Even if ChatGPT's overall failure rate dropped to 1%, at 2.6 billion daily messages that would still mean over 20 million daily failures involving persistent behavioral dynamics. Software development, interestingly, has disproportionately high visible failure rates — developers spot and challenge errors — while creative writing and UX design suffer particularly from The Drift.

On the training side, Google DeepMind published "Efficient Exploration at Scale," demonstrating an online RLHF algorithm that matches offline RLHF trained on 200K human preference labels using fewer than 20K — a 10x data efficiency gain, with projections reaching 1,000x at scale. Three innovations drive the result: a small "affirmative nudge" added to reinforcement signals that prevents performance collapse, an epistemic neural network modeling reward uncertainty via an ensemble of 200 MLP heads atop a shared transformer backbone, and information-directed exploration that selects response pairs to maximize the informativeness of each human query. The authors frame this as foundational to safe artificial superintelligence, arguing that efficient exploration of human preferences "should serve as a cornerstone on the path." Prior active learning literature reported only 2–5x gains on narrow prompt sets, making the 10x demonstrated result — and the projected 1,000x extrapolation — a genuine step change. (more: https://arxiv.org/pdf/2603.17378)

Gadi Evron, the veteran cybersecurity researcher, offers a useful lens for interpreting both findings. He describes what he calls "The Evron Amnesia of AI predictions" — a cognitive bias where practitioners accurately observe that some AI capability "isn't there yet," then are surprised when it arrives three months later, then retroactively claim "it has always been that way" while moving the goalpost to the next missing capability. The pattern: research leads to tweaked workflows, which become public harnesses, get redeveloped by hundreds, become a Claude Code feature, then reach mass adoption. The implication for invisible failures: today's 78% invisible failure rate feels like a permanent limitation. In six months, someone will ship a monitoring product that detects most of these patterns, and we'll wonder why it took so long. (more: https://www.linkedin.com/posts/gadievron_i-have-a-process-for-predicting-ai-trends-activity-7459263576118239232-ukI7)

The Developer's Evolving Toolkit

Synaptic-Tuner is an agentic-first toolkit covering the full LLM fine-tuning pipeline — synthetic data generation, SFT/KTO/GRPO training, evaluation, GGUF quantization, and deployment — designed to be operated by AI coding agents rather than humans memorizing CLI flags. The repo ships with project skills (Markdown files with progressive disclosure) that work with Claude Code, Cursor, Windsurf, and other AI coding tools. HuggingFace Jobs is the canonical cloud path, with local Docker training via Unsloth as the supporting path for GPU owners. Each stage is config-driven through YAML. (more: https://github.com/ProfSynapse/Synaptic-Tuner)

The AI labor story surfaced in a detailed investigation into the hidden workforce training AI systems. The report documents data workers across the US earning median annual incomes under $23,000, with 86% struggling to meet financial responsibilities, a quarter relying on public assistance, and more than one in five having experienced homelessness. Ivy League PhD graduates are taking $35/hour contract work from platforms like Mercor and Scale AI, only to have contracts pulled within weeks. One worker described reviewing AI-generated videos of extreme violence for a project codenamed "Arsenic" and being asked to perform tasks far outside their expertise — grading calculus problems as a philosophy major, providing counseling advice with no training. A new California bill, AB 2653 (the Sweatshop-Free AI Procurement Act), would require state AI procurement to verify labor standards compliance. Economist Daron Acemoglu warns that the resulting inequality "could be something we've never experienced — a handful of corporations controlling most of work." (more: https://youtu.be/W79FW7iUkro?si=WfRdFovgw7QwhBkx)

On the local speech synthesis front, a C++ port of Echo-TTS brings multi-speaker text-to-speech to native deployment outside the Python stack. The port uses GGML for the diffusion transformer and ONNX Runtime for the DAC autoencoder, with model files at ~3.3GB (Q8) or ~5.6GB (F16). It includes an OpenAI-compatible server mode, multi-voice support with reference WAV conditioning, and pre-built portable binaries with bundled CUDA 12.8 and cuDNN 9.21. The project represents the continuing push toward systems-language reimplementation of AI inference — the same motivation driving Rust rewrites across the inference stack. (more: https://www.reddit.com/r/LocalLLaMA/comments/1t5lw8b/a_c_port_of_echotts/)

Machine Cognition: Dreams, Souls, and Digital Monopolies

If continuous machine cognition requires memory, and raw accumulation without processing is hoarding rather than learning, then machines that think continuously probably need to sleep. That is the thesis of Part 2 of "What Happens When the Machine Never Stops Thinking," which maps five functions of human sleep — hippocampal replay, synaptic pruning, cross-regional association during REM, scenario simulation, and emotional recalibration — onto a proposed architecture for LLM dreaming. The system would capture experience during active inference (input-output pairs, uncertainty markers, contradiction flags), then run consolidation phases: triage to deduplicate and group, deep dreaming to compress episodes into principles and generate cross-domain connections, adversarial dreaming to stress-test beliefs, and integrity verification to prevent hallucination accumulation. The author proposes temperature modulation as an analogue to sleep stages — high temperature for wild association (REM-like), low temperature for careful consolidation (slow-wave-like). The most practically interesting idea is "micro-consolidation" between interactions: brief processing after each conversation to compress, extract insights, and flag contradictions without full offline dreaming cycles. (more: https://agentics.org.nz/blog/what-happens-when-the-machine-never-stops-thinking-part-2)

Theoretical physicist Carlo Rovelli, writing in Noema Magazine, argues there is no "hard problem of consciousness" at all. Chalmers's 1994 distinction between the "easy" problem (understanding brain processes that produce behavior) and the "hard" problem (explaining why brain processes are accompanied by experience) is, in Rovelli's view, a rhetorical trick that smuggles dualism in through the front door. The philosophical zombie thought experiment is self-defeating: a physically identical zombie twin would undergo the same brain processes during introspection and reach the same conclusion about having consciousness, which means introspection cannot distinguish genuine consciousness from its absence. Rovelli's position is that consciousness is a natural phenomenon — difficult to understand for the same reason thunderstorms are difficult to understand, not because it belongs to a separate metaphysical domain. The soul is real and is part of nature. "It is time to give up the pernicious dualism introduced by the debate on consciousness and embrace the reality that our soul, or our spiritual life, is consistent with our fundamental physics." For the AI community, this has a pointed implication: if there is no metaphysical gap between mind and body, then the question of machine consciousness is purely empirical — a matter of engineering complexity, not ontological impossibility. (more: https://www.noemamag.com/there-is-no-hard-problem-of-consciousness)

GrapheneOS flagged a less philosophical but equally consequential form of control: Apple and Google are gradually expanding their use of hardware-based attestation through Play Integrity API and App Attest API, and both are bringing it to the web via Privacy Pass. The same TPM and secure enclave infrastructure that security researchers championed as the foundation for prompt integrity and software verification becomes, from another angle, the mechanism by which a handful of silicon vendors and platform operators could lock out independent software deployment entirely. When attestation roots are controlled by platform gatekeepers, "security" and "monopoly enforcement" become architecturally indistinguishable. For the AI inference community — where running models locally on your own hardware is the entire point — hardware attestation that decides which software is allowed to execute is an existential concern hiding in plain sight. (more: https://grapheneos.social/@GrapheneOS/116550899908879585)

Sources (22 articles)

  1. Mythos Finds a Curl Vulnerability (daniel.haxx.se)
  2. [Editorial] Video — AI Development Perspectives (youtu.be)
  3. [Editorial] Anthropic Donating Open Source Petri (anthropic.com)
  4. FastDMS: 6.4X KV-cache compression running faster than vLLM BF16/FP8 (reddit.com)
  5. [Editorial] RuVector Sparse Attention Crate (github.com)
  6. AMD to release slottable GPU (reddit.com)
  7. Taiwanese company Skymizer announces HTX301 - PCIE inference card with 384GB of Memory at ~240 Watts (reddit.com)
  8. Apple Removes 256GB M3 Ultra Mac Studio Model From Online Store (reddit.com)
  9. Agents need control flow, not more prompts (bsuh.bearblog.dev)
  10. [Editorial] NEEDLE — AI Search & Retrieval Tool (github.com)
  11. [Editorial] Mesh — Agent Mesh Framework (github.com)
  12. [Editorial] NEEDLE Getting Started Guide (docs.needle.app)
  13. bschoepke/ableton-live-mcp — MCP Bridge for Ableton Live (github.com)
  14. [Editorial] Arxiv Research Paper 2603.15423 (arxiv.org)
  15. [Editorial] Arxiv Research Paper 2603.17378 (arxiv.org)
  16. [Editorial] Gadi Evron — Predicting AI Trends (linkedin.com)
  17. [Editorial] Synaptic-Tuner — LLM Tuning Framework (github.com)
  18. [Editorial] Video — AI Tools & Frameworks (youtu.be)
  19. A C++ port of Echo-TTS (reddit.com)
  20. [Editorial] What Happens When the Machine Never Stops Thinking (Part 2) (agentics.org.nz)
  21. [Editorial] There Is No Hard Problem of Consciousness (noemamag.com)
  22. Hardware Attestation as Monopoly Enabler (grapheneos.social)