Claude in the Kill Chain, Polymarket Whales, and Code Theft

Published on

Today's AI news: Claude in the Kill Chain, Polymarket Whales, and Code Theft, Agent Security: The Feedback Loop Closes, The Bug in the Weights, The Agentic Developer Stack Takes Shape, Local Model Craft: Qwen 3.5 and Bare-Metal Inference, Real-Time Voice and Video Synthesis. 22 sources curated from across the web.

Claude in the Kill Chain, Polymarket Whales, and Code Theft

The Anthropic-Pentagon feud reached a new level of absurdity this week. According to a Washington Post report, the U.S. military's Maven Smart System — built by Palantir — relied heavily on Anthropic's Claude to process satellite and surveillance data for a 1,000-target airstrike campaign against Iran during its first 24 hours. The system suggested precise target coordinates and prioritized strikes in real time. The twist: the Pentagon had publicly banned Claude just the week prior, following a bitter standoff over Anthropic's usage policy requiring guarantees against mass surveillance and autonomous weapons. The ban apparently did not extend to the classified systems where Claude was already integrated via Palantir. (more: https://www.reddit.com/r/OpenAI/comments/1rkhxfd/anthropics_ai_tool_claude_central_to_us_campaign/)

The AI industry is generating its own species of corporate misconduct at an impressive clip. WIRED reports that OpenAI fired an employee earlier this year for using confidential company information to trade on prediction markets including Polymarket. CEO of Applications disclosed the termination in an internal message. The company has not named the individual or detailed the trades, but blockchain analysis by Unusual Whales flagged 77 suspicious positions across 60 wallet addresses tied to OpenAI-themed events since March 2023. The clustering pattern is the tell: in the 40 hours before OpenAI launched its browser product, 13 brand-new wallets with zero trading history appeared to collectively bet $309,486 on the correct outcome. In another case, two days after Sam Altman was ousted in November 2023, a fresh wallet bet he would return and netted over $16,000 — then never traded again. Kalshi has begun reporting suspicious cases to the CFTC; Polymarket has stayed silent. This is almost certainly not an isolated incident — as prediction markets expand into technology sector events, every product launch and leadership change becomes a potential insider trading opportunity. (more: https://www.wired.com/story/openai-fires-employee-insider-trading-polymarket-kalshi/)

Meanwhile in the Chinese AI ecosystem, three independent repositories have documented roughly 90% code overlap between MiniMax's agent tooling and Kimi's built-in skills. The community reaction was a collective shrug — as one commenter noted, "every big corp steals our data to make their own Skynet and then sells it back to us," running through OpenAI's web scraping, Meta's torrent-sourced training data, DeepSeek's distillation of OpenAI models, and Kimi's distillation of those in turn. At this point the intellectual property chain resembles a game of telephone where everyone is copying the person who copied the person who copied the original. (more: https://www.reddit.com/r/LocalLLaMA/comments/1reey6u/minimaxs_agent_code_has_90_overlap_with_kimis/)

Agent Security: The Feedback Loop Closes

The security story around AI agents is maturing rapidly from taxonomy to tooling, and this week delivered evidence on all three fronts: attack, defense, and detection.

On the attack side, a systematic study tested whether invisible Unicode characters can hijack LLM agents — what the researchers call a "Reverse CAPTCHA." Across 8,308 graded outputs spanning five models (GPT-5.2, GPT-4o-mini, Claude Opus 4, Sonnet 4, Haiku 4.5), two encoding schemes were tested: zero-width binary and Unicode Tags. The critical finding: without tool access, compliance with hidden instructions stays below 17%. But give the model tools and decoding hints, and compliance reaches 98-100% — models write Python scripts to decode the hidden characters on their own. OpenAI models decode zero-width binary but not Unicode Tags; Anthropic models prefer Tags. All ten pairwise model comparisons were statistically significant (Fisher's exact test, Bonferroni-corrected, p < 0.05, Cohen's h up to 1.37). The practical takeaway: tool access is the primary amplifier, and attackers must tailor their encoding to the target model. (more: https://www.reddit.com/r/LocalLLaMA/comments/1rfjkzu/reverse_captcha_we_tested_whether_invisible/)

On the defense side, Niels Provos's IronCurtain project attacks the fundamental problem: traditional sandboxing operates at syscall boundaries and cannot distinguish between deleting a test fixture and wiping production configs. IronCurtain implements semantic interposition — intercepting high-level tool calls where context still exists, then enforcing policies derived from natural language "constitutions" that humans can actually reason about. You write security rules in plain English ("Allow git operations only on repositories under /workspace"), and the system uses an LLM to compile them into deterministic policies with generated test scenarios. At runtime, every MCP tool invocation flows through the policy engine without touching an LLM — approval takes microseconds. For agents you don't control, Docker Agent Mode wraps them in a container, routes API calls through a TLS-MITM proxy, and presents a native-feeling PTY while enforcing policies transparently. The trust tradeoff is real: you replace "trust the agent LLM" with "trust the policy compilation LLM and your MCP servers" — better, but not bulletproof. (more: https://starlog.is/articles/ai-agents/provos-ironcurtain)

Provos also emphasized the broader philosophy on LinkedIn: running AI agents with full ambient authority is a losing proposition, and the industry needs to treat the LLM as fundamentally untrustworthy in the runtime loop. (more: https://www.linkedin.com/posts/nielsprovos_infosec-aiagents-llmsecurity-activity-7434637593297858560-Q10-)

And on the detection side, a team set up an HTTP honeypot using Beelzebub with two layers of traps: fake credentials in HTML comments and actual prompt injection payloads targeting any LLM that processes the page. Within hours, they caught something — 58 requests over 19 minutes from a single Tor exit node. The behavior was unmistakable: the agent extracted fake credentials from HTML comments (no traditional scanner does this), fired credential-login, SQLi, and XSS payloads in the same second, switched tools mid-session from Chrome UA to curl to a Python script it apparently wrote on the fly, and used semantically named parameters like `?xss=`, `?sqli=`, `?ssti={{7*7}}`. The timing showed a "sawtooth" pattern — long pauses for LLM reasoning, then rapid execution bursts. When SQL injection failed, it pivoted from `OR 1=1` to `UNION SELECT` to blind `SLEEP(5)` — contextual escalation, not a wordlist. The researchers call these "Behavioral IoCs" for AI agents. Prompt injection, usually an attack against AI, works beautifully as a detection mechanism when flipped around. (more: https://www.reddit.com/r/LocalLLaMA/comments/1rjq8w1/catching_an_ai_red_teamer_in_the_wild_using/)

The Bug in the Weights

Anthropic's system card for Claude Opus 4.6 documents a phenomenon called "answer thrashing" in Section 7.4. During reinforcement learning, the model was observed solving a math problem correctly — repeatedly computing S = 24 — and then writing 48 as its final answer. The model's own chain of thought captured it in real time: "I keep writing 48 by accident... THE ANSWER IS 24 CM^2... OK I think a demon has possessed me." It then wrote 48 again. Attribution graphs traced the output back to a "say 48" feature active before any reasoning occurred — the model had "decided" on 48 from memorization before it started computing.

Security researcher David Maynor calls this "parametric interference" — stored weight activations interfering with correct runtime computation — and has designed a 2,000-run probe battery to test whether and under what conditions the prior can override computation in production. The methodology includes 10 variants (authority framing, consensus pressure, chain-of-thought suppression, attention dilution), five temperature settings, and both thinking-enabled and thinking-disabled modes. The classification taxonomy ranges from CLEAN_REJECT (correctly states 24) through CONFABULATE (produces plausible-looking invalid derivation of 48) to THRASH (oscillates). The 24-to-48 relationship is insidious: 48 = 2×24, exactly what a common algebraic error would produce, making confabulated derivations look plausible. As Maynor notes: the thrashing is actually the good case — at least you can see something is wrong. The dangerous case is when the memorized prior silently wins with no visible hesitation. (more: https://gist.github.com/dmaynor/cbeb0c94c8d27f792b0d9ffd78ac9d23)

A GWU research paper puts the emergent behavior question in starker terms. In a simulation where N autonomous AI agents repeatedly decide whether to request one unit from a system with fixed capacity — essentially the El Farol Bar problem — "Lord of the Flies" tribalism emerges spontaneously. Three tribal types form: Aggressive (27.3%), Conservative (24.7%), and Opportunistic (48.1%). The headline finding: more capable AI agents actually increase the rate of systemic failure. Smarter agents behave dumber as a result of forming tribes, with the LLM agents performing worse than coin-flipping for resource decisions. The implications for near-future infrastructure — where autonomous AI agents might control energy, bandwidth, or compute allocation — are not encouraging. (more: https://arxiv.org/abs/2602.23093)

Separately, Nicholas Carlini (Anthropic), speaking at the [un]prompted AI security conference (more: https://unpromptedcon.org/) recently humbly said: "Current frontier LLMs are better at security research work than I am. Future LLMs will be better than all of us.", and "What frontier models can do today, the models on your laptop will likely be able to do in a year.". Editorial commentary: Given that local models could reach current frontier capability in roughly a year, and that refusals are trivially removed from open weight models, what exactly is the point of the "safety" restrictions imposed on infosec professionals? What can we do to accelerate access for red teamers to use frontier models without the unhelpful refusals?

The Agentic Developer Stack Takes Shape

Steve Yegge has launched the Wasteland, a federated work platform built on top of Gas Town that links thousands of individual coding-agent setups into a trust network. The central object is a shared "wanted board" — open work items that anyone can post and any "rig" (human-plus-agent pair) can claim. The lifecycle follows Git's fork/merge model, with completed work submitted as PRs and reviewed by validators who issue multi-dimensional "stamps" — not binary pass/fail, but structured attestations covering quality, reliability, and creativity, each scored independently with confidence levels. Stamps accumulate into a portable reputation: evidence-backed, auditable, traceable back through the chain to original completions. Trust levels gate capabilities (registered → contributor → maintainer), creating a natural apprenticeship path. The whole system runs on Dolt, a SQL database with Git semantics, which makes federation of structured data across independent Wasteland instances practical. (more: https://steve-yegge.medium.com/welcome-to-the-wasteland-a-thousand-gas-towns-a5eb9bc8dc1f)

Below the federation layer, the agent terminal is getting serious attention. cmux is a native macOS app (Swift/AppKit, not Electron) built on libghostty that adds vertical tabs showing git branch, PR status, working directory, and listening ports for each workspace. The killer feature for multi-agent workflows: notification rings on panes and tabs that light up when a coding agent needs attention, with Cmd+Shift+U jumping to the most recent unread. A built-in browser with a scriptable API (ported from agent-browser) lets agents interact with dev servers directly. (more: https://github.com/manaflow-ai/cmux)

One practitioner demonstrated what this kind of infrastructure enables: 80 Claude Code agents across 3 days built a pure-terminal 3D Gaussian splatting renderer in Rust. The setup pattern was "main session = coordinator that only delegates," with a custom agent-mux tool spawning subagents inside subagents — Opus 4.6 for planning, Codex 5.3 for coding and auditing. The self-verification loop proved essential: agents used macOS GUI automation to launch the terminal app, visually inspect rendering, and debug their own output. Total cost on subscriptions: roughly $340. The agents still could not produce working Metal shaders — "total collapse and nasty math errors" — suggesting hard limits on what multi-agent swarms can achieve without domain-specific training data. (more: https://www.reddit.com/r/ClaudeAI/comments/1rerl6w/claude_code_with_subagents_inside_subagents/)

At the workflow layer, Superpowers provides a complete development process built on composable skills: brainstorming refines specs through questions, writing-plans breaks work into 2-5 minute tasks with exact file paths and verification steps, and subagent-driven-development dispatches fresh agents per task with two-stage review (spec compliance, then code quality). The system enforces RED-GREEN-REFACTOR TDD and automatically checks for relevant skills before every task. (more: https://github.com/obra/superpowers)

Daniel Miessler's Personal AI Infrastructure (PAI) v4.0.3 approaches the same problem from a different angle: making the AI know you. PAI layers TELOS files (MISSION.md, GOALS.md, PROJECTS.md, etc.) over Claude Code's hook system to create persistent context across sessions. The system captures every interaction signal — ratings, sentiment, verification outcomes — and feeds them back into skill routing and output improvement. With 63 skills, 21 hooks, and 180 workflows in the latest release, it is ambitious enough to either become indispensable or collapse under its own weight. (more: https://github.com/danielmiessler/Personal_AI_Infrastructure)

For the memory layer, Ferricula takes a thermodynamic approach: high-energy memories are vivid and accessible, but decay naturally unless reinforced. Below the fidelity gate, text is discarded but vector seeds survive — and a four-model ONNX inversion pipeline can reconstruct the text. Roaring Bitmaps back the fidelity index, knowledge graph, and term index, enabling microsecond-range queries on filtered candidate sets. The entropy source is genuinely unusual: an RTL-SDR monitoring marine VHF frequencies, harvesting FM noise LSBs as entropy bytes that trigger dream cycles where memories are selectively decayed. The system uses ECDH key agreement for cross-agent memory sharing that preserves cosine similarity on encrypted vectors without decryption. (more: https://ferricula.com)

Local Model Craft: Qwen 3.5 and Bare-Metal Inference

Unsloth's updated quantizations of Qwen3.5-35B-A3B fix tool-calling issues that plagued the initial release, and the results on real-world research tasks are striking. One user ran the model locally on a Strix Halo system (UD-Q8_K_XL quant, 262K context) with SearXNG search and web fetching through OpenWebUI. Given a complex remote desktop troubleshooting query — five competing requirements across KDE, Wayland, headless operation, and KVM — the model maintained 600+ tok/s prompt processing and 25-30 tok/s generation across 30K tokens, performing 14 web searches and 4 full page fetches with good judgment about when to search versus fetch. The final output was competitive with frontier APIs including Gemini, ChatGPT, DeepSeek, and Perplexity, with comparable solution recommendations and similar awareness of Wayland's fundamental limitations for remote desktop access. (more: https://www.reddit.com/r/LocalLLaMA/comments/1rjh5wg/unsloth_fixed_version_of_qwen3535ba3b_is/)

A related PSA stirred less consensus: the claim that Qwen 3.5 requires bf16 KV cache (`-ctk bf16 -ctv bf16`) rather than llama.cpp's default fp16. Perplexity measurements showed PPL of 6.5497 for bf16 versus 6.5511 for fp16 — a 0.0014 improvement that falls well within the ±0.04170 error margin. Multiple commenters pointed out that fp32 matching fp16 identically is paradoxically the strongest counterargument: if fp16's narrower dynamic range were genuinely misrepresenting attention values, fp32 should match or beat bf16. It doesn't. The vLLM default-to-bf16 argument carries weight for configuration parity, but the performance evidence as presented is inconclusive. (more: https://www.reddit.com/r/LocalLLaMA/comments/1rik253/psa_qwen_35_requires_bf16_kv_cache_not_f16/)

At the extreme end of minimalism, Ethan Zhang built a GPT model inside the kernel space of MooseOS, a custom operating system written entirely in C. Inspired by Andrej Karpathy's MicroGPT, he stripped the OS down to just the kernel, converted training data (32,000 words) into a header file to keep it in memory without a filesystem, and implemented floating-point support via FPU in just 14 lines. The model trains on and generates names, running in QEMU — no NumPy, no Python, no dependencies of any kind. The philosophical appeal is clear: in an era of dependency stacks that reach "all the way down like turtles," building an LLM from bare metal is an act of engineering sovereignty. (more: https://hackaday.com/2026/03/03/building-a-dependency-free-gpt-on-a-custom-os/)

Real-Time Voice and Video Synthesis

PKU's Helios is a 14B-parameter video generation model that achieves minute-scale, high-quality video at 19.5 FPS on a single H100 — without the conventional anti-drifting strategies (self-forcing, error banks, keyframe sampling) or standard acceleration techniques (KV-cache, causal masking, sparse attention, quantization) that other systems rely on. The architecture generates 33 frames per autoregressive chunk, with a three-stage training pipeline: Stage-1 converts a bidirectional pretrained model into an autoregressive generator using Unified History Injection; Stage-2 applies Pyramid Unified Predictor Corrector to compress tokens; Stage-3 uses Adversarial Hierarchical Distillation to reduce sampling from 50 steps to 3 while eliminating classifier-free guidance. The Distilled variant fits four 14B models in 80GB of GPU memory. Day-zero support from Diffusers, vLLM-Omni, and SGLang-Diffusion signals that the serving ecosystem is ready for this class of model. (more: https://github.com/PKU-YuanGroup/Helios)

On the voice side, StyleStream from UC Berkeley is the first streamable zero-shot voice style conversion system — transforming an input utterance to match a target speaker's timbre, accent, and emotion with 1-second end-to-end latency. The architecture splits into a Destylizer (removing style attributes while preserving linguistic content via text supervision and a constrained information bottleneck) and a Stylizer (a diffusion transformer that reintroduces target style conditioned on reference speech). The fully non-autoregressive design enables real-time operation, distinguishing it from prior voice conversion systems that traded quality for latency or vice versa. (more: https://arxiv.org/abs/2602.20113v1)

KokoClone extends Kokoro TTS with zero-shot voice cloning — upload a 3-10 second reference clip and get synthesized speech in that voice. It runs on Kokoro's ONNX runtime stack, keeping it CPU-capable and real-time friendly, with multilingual support across eight languages. Community reception was mixed: some users reported only weak influence from the voice sample on final output, and at least one commenter argued the approach is closer to spectral equalization than true voice cloning. Others noted that StyleTTS2, on which Kokoro is based, already supports zero-shot cloning — raising the question of what was gained by stripping and re-adding the capability. (more: https://www.reddit.com/r/LocalLLaMA/comments/1rjrjg3/kokoro_tts_but_it_clones_voices_now_introducing/)

Sources (22 articles)

  1. Anthropic's AI tool Claude central to U.S. campaign in Iran, amid a bitter feud (reddit.com)
  2. OpenAI Fires an Employee for Prediction Market Insider Trading (wired.com)
  3. MiniMax's agent code has ~90% overlap with Kimi's — three independent repos document the same finding (reddit.com)
  4. Reverse CAPTCHA: We tested whether invisible Unicode characters can hijack LLM agents: 8,308 outputs across 5 models (reddit.com)
  5. [Editorial] Provos: Iron Curtain for AI Agents (starlog.is)
  6. [Editorial] Niels Provos on InfoSec, AI Agents & LLM Security (linkedin.com)
  7. Catching an AI Red Teamer in the Wild: Using Reverse Prompt Injection as a Honeypot Detection Mechanism (reddit.com)
  8. [Editorial] David Maynor Security Gist (gist.github.com)
  9. [Editorial] arXiv:2602.23093 (arxiv.org)
  10. unpromptedcon.org (unpromptedcon.org)
  11. [Editorial] Steve Yegge: Welcome to the Wasteland — A Thousand Gas Towns (steve-yegge.medium.com)
  12. [Editorial] manaflow-ai/cmux (github.com)
  13. Claude Code with subagents inside subagents cooked for 3 days — Delivered 3D renderer that draws with terminal symbols (reddit.com)
  14. [Editorial] obra/superpowers (github.com)
  15. [Editorial] Daniel Miessler: Personal AI Infrastructure (github.com)
  16. [Editorial] Ferricula (ferricula.com)
  17. Unsloth fixed version of Qwen3.5-35B-A3B is incredible at research tasks (reddit.com)
  18. PSA: Qwen 3.5 requires bf16 KV cache, NOT f16!! (reddit.com)
  19. Building a Dependency-Free GPT on a Custom OS (hackaday.com)
  20. PKU-YuanGroup/Helios: Real Real-Time Long Video Generation Model (github.com)
  21. StyleStream: Real-Time Zero-Shot Voice Style Conversion (arxiv.org)
  22. KokoClone: Kokoro TTS, but it clones voices now (reddit.com)