Claude Code: The Morning After

Published on

Today's AI news: Claude Code: The Morning After, Agentic Coding: Freedom, Friction, and Trust, Agent Infrastructure: Sandboxes, Screens, and Swarms, Local AI: Silicon, Bits, and Battlefields, Model Architecture and Training Frontiers, AI Governance: Two Paradigms, One Collision. 22 sources curated from across the web.

Claude Code: The Morning After

Anthropic's source map fiasco dropped the full Claude Code TypeScript source onto npm for the world to mirror, and the fallout is still expanding. Two detailed teardowns, one by Alex Kim and one by Kuber Mehta, have cataloged interesting artifacts from the 900-file, 512,000-line bundle. We now have sketches of their unpublished product roadmap: KAIROS, an unreleased always-on autonomous agent mode with 5-minute refresh cycles and push notifications; an anti-distillation system that injects fake tool definitions into API requests to poison competitor training data; "Undercover Mode," which tells Claude to hide the fact it's an AI when Anthropic employees contribute to open-source repos; and native client attestation that uses Zig-level hash replacement below the JavaScript runtime (essentially DRM for API calls, and the technical enforcement behind Anthropic's legal fight with OpenCode). (more: https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/)

The anti-distillation mechanism sounds impressive until you read the activation logic: a MITM proxy stripping one field from request bodies would bypass it entirely, and setting a single environment variable disables the whole thing. The real protection, as Kim notes, was always legal rather than technical. The client attestation is similarly fragile: it only works inside the official Bun binary, and a server-side function that "tolerates unknown extra fields" suggests the validation may be more forgiving than a DRM system should be. The 250,000 API calls wasted per day by a compaction bug, fixed with three lines of code, are the kind of detail that makes the Tamagotchi companion pet system ("Buddy," with 18 species, gacha rarity tiers, and RPG stats like DEBUGGING and SNARK) feel less whimsical and more indicative of a codebase growing faster than its review processes. The leak's root cause, a known Bun bug that ships source maps in production mode, filed on March 11 and still open, is the kind of irony that writes itself when your toolchain was partly built by your own AI. (more: https://github.com/Kuberwastaken/claurst)

What happened next is arguably more consequential than the leak itself. Within hours, a clean-room Rust reimplementation appeared: spec first, implementation second, explicitly modeled on the Phoenix v. IBM BIOS precedent. Meanwhile, Dragos Ruiu posted a detailed walkthrough of Claude building a complete FreeBSD remote kernel RCE (CVE-2026-4747), from advisory to root shell in about four hours. The target was relatively soft (FreeBSD 14.x ships without KASLR, and the overflowed buffer had no stack canaries), but the model solved ROP chain construction, multi-packet shellcode delivery, and kernel-to-userland transition with roughly 44 human redirections across the session. Strip away those steering prompts and the model doesn't get there on its own. Not yet. But the gap between "controlled demo on a soft target" and "reliable autonomous exploitation" is closing faster than most defenders are calibrating for. (more: https://www.linkedin.com/posts/dragosruiu_mad-bugs-claude-wrote-a-full-freebsd-remote-activity-7444984299243855873-n479)

Agentic Coding: Freedom, Friction, and Trust

The most thought-provoking piece this week argues that AI coding agents are about to make free software matter again: not open source in the corporate-friendly sense, but Stallman's original four freedoms. The reasoning is straightforward: when an agent can read, understand, and modify source code on your behalf, access to that source stops being a symbolic right for programmers and becomes a practical capability for everyone. The author tested this by trying to build a simple tweet-to-task workflow using Sunsama, a closed SaaS product. The result: six layers of workarounds, three authentication mechanisms, a dependency on a stranger's reverse-engineered API, and an iOS Shortcut he had to build by hand because Apple provides no programmatic way to create one. With free software, the agent reads the source, understands the data model, makes the change. Ten minutes. The AGPL's limited adoption, Google's public ban on AGPL code, and the licensing chaos from MongoDB to HashiCorp to Redis all get a clear-eyed review.

The counterpoint is equally honest: a CEU working paper argues vibe-coding kills open source by severing the user-maintainer feedback loop, Adam Wathan reports Tailwind documentation traffic down 40% and revenue down 80%, and Mitchell Hashimoto moved Ghostty to a vouch-based contribution model to stem the AI-generated PR flood. (more: https://www.gjlondon.com/blog/ai-agents-could-make-free-software-matter-again/)

GitHub's own relationship with developers took another hit when Copilot started injecting ads (GitHub calls them "tips") into pull requests. A developer discovered that after a coworker asked Copilot to fix a typo, the bot inserted a promotion for Raycast into his PR description. Over 11,400 PRs received similar injections. GitHub's VP of developer relations acknowledged that letting Copilot touch PRs it didn't create "became icky," and the feature was killed within hours. The official line, that "GitHub does not and does not plan to include advertisements," is technically true only if you accept that a promotional message with a download link inserted into someone else's code review is a "tip" rather than an ad. (more: https://www.theregister.com/2026/03/30/github_copilot_ads_pull_requests/)

The trust problem extends deeper than ads. A developer building Claude Code skills for iOS design system auditing discovered his tools missed more than half the violations in his codebase. The root cause: many icons inherited their color from the system accent, so there was nothing in the code to grep for. The tool was searching for things that looked wrong but couldn't find things that looked like nothing. His fix, enumerating every candidate file and then verifying each one for the correct pattern, found 71 violations where grep-based scanning found 31. The insight generalizes: any AI audit tool that searches for anti-patterns will miss violations defined by the absence of correct patterns. (more: https://www.reddit.com/r/ChatGPTCoding/comments/1s6cjaq/how_do_you_know_your_ai_audit_tool_actually/)

Greptile's analysis pushes back on the doom narrative from the other direction: AI models will write good code because economic incentives demand it. Complex code requires more tokens and more compute; simple, maintainable code is cheaper to generate and cheaper to modify. Competition between models will select for quality. The argument is sound in theory, though the current evidence (median PR size up 33%, outages steadily increasing since 2022) suggests the selection pressure hasn't kicked in yet. (more: https://www.greptile.com/blog/ai-slopware-future)
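The failure mode and the fix both compress to a few lines. Everything below (file names, the toy sources, the pattern for a "correct" color) is a hypothetical stand-in for the post's iOS audit, not its actual skill code:

```python
import re

# Toy codebase: every icon must set an explicit, approved color.
# Silently inheriting the accent color (writing nothing) is a violation,
# but there is no wrong-looking string for grep to find.
files = {
    "GoodIcon.swift":   'Image("ok").foregroundColor(.primary)',
    "BadIcon.swift":    'Image("warn").foregroundColor(.red)',  # wrong color
    "SilentIcon.swift": 'Image("ghost")',                       # inherits accent
}

def grep_audit(files):
    """Search for code that looks wrong. Misses files where the
    violation is the *absence* of the correct pattern."""
    return [name for name, src in files.items()
            if re.search(r"foregroundColor\(\.red\)", src)]

def enumerate_and_verify(files):
    """Enumerate every candidate, then verify each carries the required
    pattern. Absence becomes detectable."""
    required = re.compile(r"foregroundColor\(\.(primary|secondary)\)")
    return [name for name, src in files.items()
            if "Image(" in src and not required.search(src)]

print(grep_audit(files))            # finds only the visibly wrong file
print(enumerate_and_verify(files))  # also catches the silent inheritor
```

The second pass flags both the wrong color and the file that says nothing at all, which is the shape of the 31-versus-71 gap the developer reported.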

Agent Infrastructure: Sandboxes, Screens, and Swarms

NVIDIA's OpenShell landed this week as an open-source runtime that sits between coding agents and infrastructure, governing execution, filesystem visibility, GPU pass-through, and inference routing. A single policy update wires agents to local inference with zero code changes: Claude Code, OpenCode, or any agent runs unmodified inside the sandbox. The pitch, "toddlers with sudo privileges need a chaperone with security clearance," is more honest than most agent infrastructure marketing. (more: https://www.linkedin.com/posts/ownyourai_i-just-moved-my-coding-agents-to-the-new-activity-7445359702517059584-vCBz)

On the desktop agent front, H Company's Holo3 posted 78.85% on OSWorld-Verified, establishing a new state of the art for computer use with only 10B active parameters (122B total MoE), at a fraction of the cost of GPT 5.4 or Opus 4.6. The training pipeline is built around a synthetic environment factory that reproduces enterprise systems and generates verifiable tasks, with a 486-task corporate benchmark spanning e-commerce, collaboration, and multi-app workflows. The harder tasks require coordinating across multiple applications (retrieving equipment prices from a PDF, cross-referencing budgets, and sending personalized approval emails), the kind of sustained multi-step reasoning where most agents lose state or intent. The 35B-A3B variant ships under Apache 2.0. (more: https://huggingface.co/blog/Hcompany/holo3)

OpenYak takes the desktop agent local: an open-source AGPL-licensed app built on Tauri v2 (Rust) + Next.js + FastAPI that connects to 100+ models from 20+ providers and runs 20+ built-in tools for file management, data analysis, and document creation, all without data leaving your machine. It auto-detects local Ollama models with full tool-calling support. (more: https://github.com/openyak/openyak) Meanwhile, Cole Medin's live demo of multi-agent coding workflows with persistent memory shows where the orchestration layer is heading: agents that know your codebase, remember decisions from weeks ago, and coordinate research, implementation, and verification phases in parallel. (more: https://www.youtube.com/live/hdCZUNQ40VY)

Local AI: Silicon, Bits, and Battlefields

Apple's M5 Max is delivering real gains for local inference. Benchmarks against M3 Max (both 128GB, 40 GPU cores) across three Qwen 3.5 models show 1.4x to 1.7x improvement in token generation speed, with the gap widening dramatically at longer contexts: the 27B dense model hits 19.6 tok/s at 65K context on M5 Max versus 6.8 on M3 Max, a 2.9x advantage. Prefill improvements reach 4x at long context, driven by GPU Neural Accelerators. For agentic workloads, batching matters most: M5 Max scales to 2.54x throughput at 4x batch on the 35B-A3B MoE, while M3 Max batching on dense models actually degrades. The 614 GB/s versus 400 GB/s bandwidth gap is significant for multi-step agent loops. (more: https://www.reddit.com/r/LocalLLaMA/comments/1s5np41/m5_max_vs_m3_max_inference_benchmarks_qwen35_omlx/)
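A back-of-envelope roofline shows why bandwidth is the number to watch for agent loops: batch-1 decode must stream the full weight set through memory once per token, so weight size and bandwidth bound generation speed. The 4-bit footprint figure below is our assumption for illustration, not from the post:

```python
def decode_ceiling(bandwidth_gbs: float, model_gb: float) -> float:
    """Roofline-style ceiling for batch-1 decode: every generated token
    streams the full weight set through memory, so tok/s <= bandwidth / size."""
    return bandwidth_gbs / model_gb

# Rough 4-bit footprint of a 27B dense model: ~0.5 GB per billion
# parameters plus ~1 GB for embeddings/overhead (an assumption).
model_gb = 27 * 0.5 + 1.0
for chip, bw in [("M5 Max", 614.0), ("M3 Max", 400.0)]:
    print(f"{chip}: <= {decode_ceiling(bw, model_gb):.1f} tok/s")
```

The observed 19.6 and 6.8 tok/s sit well below these ceilings, as expected at 65K context where KV-cache traffic adds substantially to the bytes moved per token.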

At the extreme end of efficiency, Prism's 1-Bit Bonsai models claim the title of first commercially viable 1-bit LLMs. The 8B variant requires only 1.15GB of memory (a 14x reduction from full precision), runs 8x faster, uses 5x less energy, and reportedly matches leading 8B models on benchmarks spanning IFEval, GSM8K, HumanEval+, BFCL, MuSR, and MMLU-Redux. The 4B model hits 132 tok/s on an M4 Pro; the 1.7B reaches 130 tok/s on an iPhone 17 Pro Max. If the benchmark claims hold under independent scrutiny, this moves 1-bit quantization from research curiosity to deployment option for robotics, real-time agents, and edge computing. (more: https://prismml.com/)
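The headline numbers are internally consistent and easy to sanity-check: 8B weights at fp16 occupy 16GB, and at one bit per weight about 1GB. Treating the remaining ~0.15GB as packaging overhead is our assumption:

```python
def footprint_gb(params_b: float, bits_per_weight: float,
                 overhead_gb: float = 0.0) -> float:
    """Weight-memory footprint in GB: parameters * bits per weight / 8."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9 + overhead_gb

fp16   = footprint_gb(8, 16)          # full half precision
onebit = footprint_gb(8, 1, 0.15)     # 1-bit weights plus a rough
                                      # overhead term (assumption)
print(fp16, onebit, fp16 / onebit)    # the ratio lands near the claimed 14x
```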

AMD's Lemonade server positions itself as the open-source local inference layer that works across GPUs and NPUs: a 2MB service with automatic dependency configuration, multi-model concurrency, and OpenAI-compatible APIs across Windows, Linux, and macOS. It integrates llama.cpp, Ryzen AI SW, and FastFlowLM, with Qwen3.5-4B now available on NPU. (more: https://lemonade-server.ai) On the quantization craft side, one developer spent 48 hours saturating Qwen 3.5 with 2 million tokens of high-density calibration data to produce the "Sovereign Series": 75 GGUF models from 0.8B to 27B targeting what he terms "quantization-slop," the subtle blurring of logic and linguistic nuance that standard 40K-line calibration gists introduce. The 27B "Colossus" hit 8.71 PPL after 12 hours of dedicated compute. Community testing was mixed: one user found the 9B model couldn't follow a "respond in 30 words" constraint that vanilla Qwen3.5 handled cleanly. (more: https://www.reddit.com/r/ollama/comments/1s7sg3i/i_spent_48_hours_saturating_qwen_35_with_2000000/)
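For calibration on what the PPL figure means: perplexity is the exponential of the mean per-token negative log-likelihood, so lower means the model finds held-out text less surprising. A minimal sketch with made-up log-probs:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(average negative log-likelihood per token)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probs from scoring a held-out passage.
logps = [-1.9, -2.3, -2.1, -2.4, -2.1]
print(round(perplexity(logps), 2))

# A model that assigns probability 1 to every token is never surprised:
print(perplexity([0.0, 0.0, 0.0]))   # perplexity of exactly 1.0
```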

The most ambitious application of local inference this week is a solo developer's medieval RPG where every significant NPC runs on a local uncensored LLM. The stack: Unreal Engine 5, Ollama as a child process, Dolphin-Mistral 7B Q4, Whisper for voice input, and Piper TTS for per-NPC voices with lip sync. No dialogue wheel: you say whatever you want, and the NPC responds and remembers. The community response was instructive: experienced builders immediately flagged context window degradation over long sessions, suggested fine-tuning over system prompts for character consistency, recommended switching from Ollama to raw llama.cpp for performance, and from Piper to Kokoro TTS for quality. The consensus: the concept is compelling, but "production hell awaits." Memory management as context grows is the decisive bottleneck, and no one has solved it at scale yet. (more: https://www.reddit.com/r/LocalLLaMA/comments/1s9kpn4/im_building_a_medieval_rpg_where_every/)
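The bottleneck the commenters flagged can be made concrete with a rolling-buffer sketch: keep the character's system prompt pinned, and trim the oldest turns once a token budget is exceeded. The whitespace token count and all the names here are illustrative stand-ins (a real NPC would count tokens with the model's tokenizer and likely summarize old turns rather than drop them):

```python
def trim_history(system_prompt, turns, budget):
    """Keep the system prompt plus as many recent turns as fit the budget."""
    def toks(s):
        return len(s.split())  # crude stand-in for a real tokenizer
    kept, used = [], toks(system_prompt)
    for turn in reversed(turns):       # newest turns survive first
        if used + toks(turn) > budget:
            break
        kept.append(turn)
        used += toks(turn)
    return [system_prompt] + list(reversed(kept))

history = ["player: where is the smithy",
           "npc: past the chapel, friend",
           "player: any rumors tonight"]
print(trim_history("You are Aldric, a wary blacksmith.", history, budget=18))
```

The oldest exchange falls out first; the cost is exactly the "NPC forgets what you said an hour ago" degradation the thread describes, which is why summarization or fine-tuned persistent memory comes up as the harder, unsolved alternative.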

Model Architecture and Training Frontiers

TII's Falcon Perception is a 0.6B-parameter early-fusion Transformer that handles open-vocabulary grounding and segmentation from natural language prompts: no separate vision backbone, no decoder pipeline, just one model processing image patches and text tokens in a shared parameter space with a hybrid attention mask. On the SA-Co benchmark it reaches 72.7 mAP versus SAM 3's 68.3, with outsized gains on attribute-heavy (+8.2) and food/drink (+12.2) categories. The Chain-of-Perception output interface decomposes each instance into coordinate, size, and segmentation steps using Fourier features and dot-product masks rather than Hungarian matching. Alongside it, Falcon OCR (a 0.3B variant trained from scratch) hits 80.3% on olmOCR and 88.6 on OmniDocBench while running 3x faster than 0.9B-class competitors. Trained on 54M images with 195M positive expressions and 488M hard negatives across 700 GPU-hours, the model treats presence calibration as a first-class target with strict 1:1 positive-to-negative sampling. (more: https://huggingface.co/blog/tiiuae/falcon-perception)
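The coordinate step leans on Fourier features, a standard trick for making a scalar position legible to a transformer. A generic sketch follows; the band count and frequency schedule are our assumptions for illustration, not Falcon's actual configuration:

```python
import math

def fourier_features(x: float, bands: int = 4):
    """Map a normalized coordinate x in [0, 1] to sin/cos pairs at
    geometrically spaced frequencies, so nearby positions get smoothly
    varying, high-dimensional encodings."""
    feats = []
    for k in range(bands):
        f = (2 ** k) * math.pi * x
        feats.extend([math.sin(f), math.cos(f)])
    return feats

print([round(v, 3) for v in fourier_features(0.25)])
```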

Hugging Face's TRL reaches v1.0, marking the post-training library's transition from research codebase to stable infrastructure. Now implementing 75+ methods across SFT, DPO, GRPO, RLOO, and reward modeling, TRL is downloaded 3 million times a month. The design philosophy is deliberately anti-abstraction: independent implementations over shared base classes, accepted code duplication over premature generalization. The roadmap targets asynchronous RL (decoupling generation from training), making training legible to agents via structured actionable warnings, and deeper MoE support. (more: https://huggingface.co/blog/trl-v1)

LeWorldModel (LeWM) is the first stable end-to-end JEPA that trains from raw pixels using only two loss terms (next-embedding prediction and a Gaussian regularizer), reducing tunable hyperparameters from six to one compared to alternatives. With ~15M parameters trainable on a single GPU in hours, it plans up to 48x faster than foundation-model-based world models while remaining competitive across 2D and 3D control tasks. The latent space encodes meaningful physical structure, confirmed through probing and surprise evaluation that detects physically implausible events. This is the first concrete implementation in the lineage of LeCun's argument that autoregressive models are fundamentally limited by error compounding. (more: https://github.com/lucas-maes/le-wm)

On the abliteration front, Jim Lai's ORBA paper provides geometric grounding for why difference-of-means contrast vectors work: the unit-normalized difference-of-means is exactly the normal of the Householder reflector mapping harmless to forbidden directions. The practical finding is negative: Householder reflection amplifies angular error into misdirected sign-flips that are more disruptive than projection-based zeroing, because models have effectively encountered magnitude-reduced activations via dropout but never encountered negated projections during training. Directional ablation remains the recommended primitive. (more: https://huggingface.co/blog/grimjim/orthogonal-reflection-bounded-ablation)
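The projection-versus-reflection distinction is compact enough to show on toy vectors (the two-dimensional "refusal direction" and activation values below are purely illustrative):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_out(v, n):
    """Directional ablation: zero v's component along unit direction n."""
    c = dot(v, n)
    return [x - c * e for x, e in zip(v, n)]

def householder(v, n):
    """Householder reflection: negate v's component along unit direction n."""
    c = dot(v, n)
    return [x - 2 * c * e for x, e in zip(v, n)]

n = [1.0, 0.0]             # unit "refusal direction"
v = [3.0, 4.0]             # toy activation
print(project_out(v, n))   # component along n removed
print(householder(v, n))   # component along n sign-flipped
```

Projection leaves a vector the model has plausibly seen before (the component is merely attenuated to zero, as under dropout); reflection produces a negated component it never saw in training, which is the post's account of why reflection's errors misdirect behavior more.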

AI Governance: Two Paradigms, One Collision

A federal judge has blocked the Pentagon from designating Anthropic a national security threat, with Judge Rita Lin writing that the "financial and reputational harm" from the "likely unlawful" designation "risks crippling the company." The designation, historically reserved for foreign adversaries like Huawei, was apparently triggered by Anthropic's refusal to remove safety guardrails on autonomous weapons and mass surveillance applications. Lin noted the Pentagon retains discretion over which AI products it uses but argued the supply-chain-risk label doesn't appear directed at the government's stated national security interests. The administration has seven days to appeal. (more: https://www.reddit.com/r/Anthropic/comments/1s4ob7t/anthropic_wins_court_order_blocking_pentagons/)

On the other side of the Atlantic, governance is being formalized rather than weaponized. A pilot study from TriStiX evaluates three t-norm operators (Lukasiewicz, Product, and Gödel) as logical conjunction mechanisms in a neuro-symbolic system for EU AI Act compliance classification. Using the LGGT+ engine (Logic-Guided Graph Transformers Plus) across 1,035 annotated AI system descriptions spanning four risk categories, the study finds that Gödel's min-semantics achieves the highest accuracy (84.5%) and best borderline recall (85%) but introduces a 0.8% false positive rate via over-classification. Lukasiewicz and Product maintain zero false positives but miss more borderline cases (81.2% and 78.5% respectively). The deeper finding: operator choice is secondary to rule base completeness. Most false negatives stem from systems where AI plays an advisory but not decisive role; the classifier correctly identifies weak conditions, but the expert applies proportionality reasoning beyond what the rules encode. The paper maps a genuine legal interpretive question (whether rule conditions require strong joint confirmation or merely individual threshold-crossing) onto formal aggregation operators, with a mixed-semantics classifier (rule-specific t-norms annotated by legal experts) as the proposed next step. The full engine (201/201 tests passing) ships under Apache 2.0. (more: https://arxiv.org/abs/2603.28558v1)
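The three t-norms are one-liners, and their divergence on borderline inputs is the whole story. A sketch with illustrative condition confidences (the 0.7/0.6 values are ours, not the paper's):

```python
# Conjunction over rule-condition confidences in [0, 1].
def lukasiewicz(a, b):
    return max(0.0, a + b - 1.0)   # strong joint confirmation required

def product(a, b):
    return a * b                   # independent-evidence semantics

def goedel(a, b):
    return min(a, b)               # weakest link keeps its full value

a, b = 0.7, 0.6   # two borderline condition confidences (illustrative)
for name, t in [("Lukasiewicz", lukasiewicz),
                ("Product", product),
                ("Gödel", goedel)]:
    print(f"{name}: {t(a, b):.2f}")
```

On these illustrative numbers, a hypothetical 0.5 classification threshold would be crossed only under Gödel's min, which mirrors the paper's pattern: higher borderline recall from min-semantics, at the cost of over-classification.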

Sources (22 articles)

  1. The Claude Code Source Leak: fake tools, frustration regexes, undercover mode (alex000kim.com)
  2. [Editorial] (github.com)
  3. [Editorial] (linkedin.com)
  4. Coding agents could make free software matter again (gjlondon.com)
  5. GitHub backs down, kills Copilot pull-request ads after backlash (theregister.com)
  6. How do you know your AI audit tool actually checked everything? I was fairly confident that my skill suite did. It didn't. (reddit.com)
  7. Slop is not necessarily the future (greptile.com)
  8. [Editorial] (linkedin.com)
  9. Holo3: Breaking the Computer Use Frontier (huggingface.co)
  10. openyak/openyak (github.com)
  11. [Editorial] (youtube.com)
  12. M5 Max vs M3 Max Inference Benchmarks (Qwen3.5, oMLX, 128GB, 40 GPU cores) (reddit.com)
  13. Show HN: 1-Bit Bonsai, the First Commercially Viable 1-Bit LLMs (prismml.com)
  14. Lemonade by AMD: a fast and open source local LLM server using GPU and NPU (lemonade-server.ai)
  15. I spent 48 hours saturating Qwen 3.5 with 2,000,000 tokens to kill 'Quantization-Slop'. Here is the Sovereign Series (0.8B to 27B). (reddit.com)
  16. I'm building a medieval RPG where every significant NPC runs on a local uncensored LLM - no cloud, no filters, no hand-holding. Here's the concept. (reddit.com)
  17. Falcon Perception (huggingface.co)
  18. TRL v1.0: Post-Training Library Built to Move with the Field (huggingface.co)
  19. lucas-maes/le-wm (github.com)
  20. Toward explaining why traditional ablation/abliteration works (huggingface.co)
  21. Anthropic wins court order blocking Pentagon's security threat designation (reddit.com)
  22. T-Norm Operators for EU AI Act Compliance Classification: An Empirical Comparison of Lukasiewicz, Product, and Gödel Semantics in a Neuro-Symbolic Reasoning System (arxiv.org)