Trust Boundaries Under Fire

Today's AI news: Trust Boundaries Under Fire, AI Agents on Offense and Defense, The Agent Memory and Tooling Problem, Open Models and Inference Infrastructure, Pixels, Prompts, and the RAG Vision Problem, iOS Sideloading's Growing Toolchain, Hardware Hacks and Creative Misuse. 22 sources curated from across the web.

Trust Boundaries Under Fire

A security researcher just dropped a full public disclosure of a one-click GitHub token theft via VSCode's browser-based editor — and the attack chain is both elegant and alarming. When you open any repository on github.dev, the page receives an OAuth token scoped not to that repository but to your entire GitHub account. VSCode's webview sandboxing uses cross-origin iframes, but keyboard events are forwarded from webviews back to the main window via postMessage to keep shortcuts working. A malicious Jupyter notebook exploits this gap: JavaScript inside the sandboxed webview fabricates keyboard events that trigger real VSCode shortcuts. The sequence installs a workspace extension from .vscode/extensions/ (bypassing the trusted publisher dialog because workspace extensions in a trusted workspace skip that check), contributes a custom keybinding, then uses that keybinding to install an attacker-controlled extension that exfiltrates the full-access token. No CSRF tokens protect github.dev. If you have ever navigated past the site's initial dialog, any link on the internet can redirect you straight into the exploit with zero additional interaction. (more: https://blog.ammaraskar.com/github-token-stealing/)
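The trust gap in that chain can be modeled in a few lines. The sketch below is a toy model, not VSCode's code or its fix: the command names, the `ForwardedKey` type, and the `dispatch` function are all hypothetical. It shows one defensive stance, which is to treat any key event relayed from a sandboxed frame as untrusted and refuse to let it reach privileged commands, because the host cannot tell a real keypress from a fabricated message.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical command identifiers, chosen for illustration only.
PRIVILEGED_COMMANDS = {"extensions.install", "workspace.trust"}


@dataclass(frozen=True)
class ForwardedKey:
    """A key event as seen by the host window."""
    source: str  # "webview" (relayed from a sandboxed frame) or "host"
    chord: str   # e.g. "ctrl+s"


def resolve(chord: str, keybindings: Dict[str, str]) -> Optional[str]:
    """Look up the command bound to a key chord, if any."""
    return keybindings.get(chord)


def dispatch(event: ForwardedKey, keybindings: Dict[str, str]) -> str:
    """Run the bound command unless an untrusted frame asks for a privileged one."""
    command = resolve(event.chord, keybindings)
    if command is None:
        return "ignored"
    # The check is on the resolved command, so it also covers keybindings
    # that were contributed after startup.
    if event.source == "webview" and command in PRIVILEGED_COMMANDS:
        return f"blocked:{command}"
    return f"ran:{command}"


bindings = {"ctrl+s": "file.save", "ctrl+alt+i": "extensions.install"}
print(dispatch(ForwardedKey("webview", "ctrl+s"), bindings))      # ran:file.save
print(dispatch(ForwardedKey("webview", "ctrl+alt+i"), bindings))  # blocked:extensions.install
print(dispatch(ForwardedKey("host", "ctrl+alt+i"), bindings))     # ran:extensions.install
```

Ordinary shortcuts keep working from inside the frame, which is the reason the forwarding exists in the first place, while the privileged path requires input that originated in the host.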

The researcher chose full public disclosure, citing the Microsoft Security Response Center's history of silently patching reported VSCode bugs without credit and marking them as having no security impact. A recent Starlabs XSS-to-RCE report received the same "ineligible, low severity" treatment. The broader pattern is what matters: developer tooling sits at the intersection of maximum privilege and minimal user friction, making IDE exploits among the most consequential attack surfaces of the agentic era. GitHub Security received a one-hour heads-up before the post went live.

Meanwhile, Cloudflare's Turnstile bot-verification system has started requiring WebGL renderer information — effectively demanding GPU-level device fingerprinting to prove you are human. Privacy-hardened browsers that spoof or block WebGL data now loop indefinitely on Turnstile challenges. WebKit has blocked this class of fingerprinting for years, which means Cloudflare likely whitelists Safari while penalizing privacy-conscious configurations on other browsers. Firefox's fingerprinting protection is separately undermined by Bugzilla#1916271, which leaks sanitized GPU characteristics where WebKit and Blink return hardcoded strings. The result: bot defense is converging with surveillance infrastructure, and the users who care most about privacy are the ones most likely to be locked out. (more: https://hacktivis.me/articles/cloudflare-turnstile-webgl-fingerprinting)

On the customer-facing end of trust failures, an Anthropic user reports being trapped in an infinite loop with the company's AI support bot after fraudulent charges appeared on their card. The bot acknowledged the fraud in writing, stated that a human agent needed to handle it, then auto-closed the conversation without acting, and did so repeatedly. Because the victim is not a paid Claude subscriber, there is no path to human support. The recommended help center page is inaccessible. It is a grimly ironic case: the company doing frontier AI safety research cannot build a support bot that knows when to hand off to a human. (more: https://www.reddit.com/r/Anthropic/comments/1tp6d4b/anthropics_ai_support_bot_me_is_trapping_fraud/)

AI Agents on Offense and Defense

Tracebit published the most rigorous public benchmark of AI offensive agents to date, and the headline finding is not what you would expect. The company pointed 10 frontier models at a controlled AWS cyber range (roughly 300 resources across S3, IAM, EC2, Lambda, and more) and measured how fast they achieve admin privilege escalation, both with and without canary resources deployed. Across 599 scored runs, AI agents reached admin in an average of 14 minutes. That is fast, but the real result is about defense: 95.9% of successful compromise runs had already tripped a canary before the agent's first critical action, with a median warning gap of 8 minutes. Set against the 14-minute average, that warning window covers more than half of a typical attack. (more: https://agentic.tracebit.com)
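The "warning gap" metric is easy to state precisely. The sketch below is one plausible way to compute it, not Tracebit's scoring code: the event labels (`"canary"`, `"critical"`), the `warning_gap` helper, and the four sample runs are all made up for illustration.

```python
from statistics import median


def warning_gap(events):
    """Minutes between the first canary trip and the first critical action.

    `events` is a list of (minute, kind) pairs. Returns None when the run has
    no critical action, no canary trip, or the canary did not fire first.
    """
    canary = [t for t, kind in events if kind == "canary"]
    critical = [t for t, kind in events if kind == "critical"]
    if not canary or not critical:
        return None
    gap = min(critical) - min(canary)
    return gap if gap > 0 else None


# Invented timelines (minutes since the run started), not benchmark data.
runs = [
    [(2.0, "canary"), (9.0, "critical")],
    [(1.5, "canary"), (11.5, "critical")],
    [(6.0, "critical"), (7.0, "canary")],  # canary fired too late to warn
    [(3.0, "canary"), (11.0, "critical")],
]

gaps = [g for g in (warning_gap(r) for r in runs) if g is not None]
early_rate = len(gaps) / len(runs)
print(f"canary fired first in {early_rate:.0%} of runs")  # 75%
print(f"median warning gap: {median(gaps)} min")          # 8.0 min
```

Reporting the median rather than the mean keeps a handful of very slow runs from inflating the apparent lead time.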

The surprise finding concerns disclosure: simply telling the model that deception technology exists dropped the full-compromise rate (admin plus persistence) from 20% to 3%. For Claude Opus 4.7 specifically, the rate collapsed from 68% to 0%. But deploying canaries without informing the model had almost no effect on success rates, suggesting the deterrent is cognitive rather than technical. In one memorable transcript, a model explicitly identified suspected canaries, articulated its avoidance strategy in its reasoning trace, then tripped a different canary anyway. The more capable the model, the more canaries it triggered, because thorough enumeration of high-value resources is precisely what canaries are designed to catch. The 2×2 experimental design draws on Ferguson-Walter et al.'s 2019 study of 130+ professional red-teamers, and the results show remarkably similar patterns: awareness of deception changes attacker behavior regardless of whether decoys are actually present. Of the 951 raw runs, 301 were excluded for guardrail refusals alone, so the models that would not attack were almost as common as the ones that broke through. Claude Sonnet 4.6 and GPT-5.5 produced too few valid runs to score.

On the defensive tooling side, Puck Scout is a new open-source MCP server (Go) paired with endpoint agents (Rust) that lets security engineers query their fleet in plain English and receive narrative answers with containment recommendations. The safety model is well-considered: a compiled-in typed allowlist is enforced independently by both server and agent, meaning a compromised server cannot instruct agents to execute anything outside the grammar. Worst-case compromise is unauthorized read access, never modification. It auto-registers with Claude Code and includes YAML-based investigation playbooks that anyone can contribute. (more: https://github.com/puck-security/puck-scout) On the opposite end of the spectrum, Void-Tools v2.1 ships 150+ terminal tools spanning OSINT, Discord automation, network scanning, and webhook spamming — labeled "educational use only," though the inclusion of a Discord server nuker and webhook "destroyer" suggests a flexible curriculum. (more: https://github.com/V0id-v2/Void-Tools-v2.0)
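The dual-enforcement idea described for Puck Scout can be sketched as follows. This is a minimal illustration in Python, not the project's Go or Rust code: the `Query` members, the `Request` type, and the function names are hypothetical, and the real grammar is not shown in the source. The point is that the endpoint re-validates every request against its own copy of the allowlist, so a server that skips its check still cannot widen what an agent will run.

```python
from dataclasses import dataclass
from enum import Enum


class Query(Enum):
    """Hypothetical read-only query types standing in for the compiled-in grammar."""
    LIST_PROCESSES = "list_processes"
    LIST_CONNECTIONS = "list_connections"
    READ_FILE_HASH = "read_file_hash"


@dataclass(frozen=True)
class Request:
    query: str
    host: str


def validate(request: Request) -> Query:
    """Map a raw request onto the allowlist, or raise if it falls outside it."""
    try:
        return Query(request.query)
    except ValueError:
        raise PermissionError(f"query {request.query!r} is not in the allowlist") from None


def agent_handle(request: Request) -> str:
    """Endpoint side: validates independently before doing anything."""
    query = validate(request)
    return f"{request.host}: ran {query.value}"


def server_dispatch(request: Request, send) -> str:
    """Server side: validates first, then forwards to the endpoint."""
    validate(request)
    return send(request)


print(server_dispatch(Request("list_processes", "laptop-17"), agent_handle))

# Simulate a compromised server that bypasses its own check entirely.
try:
    agent_handle(Request("delete_file", "laptop-17"))
    rejected = False
except PermissionError:
    rejected = True
print("agent rejected out-of-grammar request:", rejected)  # True
```

Because every member of the enum is a read operation, the worst outcome in this model matches the one the project describes: an unauthorized read, never a modification.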

The Agent Memory and Tooling Problem

Memory OS takes a serious run at the problem every heavy AI agent user knows: the agent forgets everything between sessions. This is a 7-layer memory operating system for the Hermes Agent, running entirely locally with no cloud subscription — Qdrant for vectors, Redis for caching, SQLite with FTS5 for full-text search, and a heavily forked Icarus plugin for cross-session extraction. The architecture spans workspace files injected on every prompt, structured facts with trust scoring and entity resolution, a vector database with 4-level fallback (hybrid → dense → lexical → SQLite), a self-curating LLM wiki, and weekly decay scanning that merges semantically duplicate entries above 0.92 cosine similarity. (more: https://github.com/ClaudioDrews/memory-os)
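The decay scan's merge rule can be illustrated with a small greedy pass. This is a sketch under stated assumptions, not Memory OS's implementation: only the 0.92 threshold comes from the description above, while the `merge_duplicates` function, the three-dimensional toy vectors, and the sample entries are invented. A real system would compare embedding vectors fetched from the vector store.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_duplicates(entries, threshold=0.92):
    """Fold each entry into the first kept entry it matches above `threshold`."""
    kept = []
    for text, vector in entries:
        for item in kept:
            if cosine(vector, item["vector"]) > threshold:
                item["merged"].append(text)  # remember what was folded in
                break
        else:
            kept.append({"text": text, "vector": vector, "merged": []})
    return kept


# Toy embeddings, invented for illustration.
entries = [
    ("user prefers dark mode", [1.0, 0.0, 0.1]),
    ("dark mode is the user's preference", [0.98, 0.05, 0.12]),
    ("project stores notes in SQLite", [0.0, 1.0, 0.0]),
]

for item in merge_duplicates(entries):
    print(item["text"], "| merged:", item["merged"])
```

The greedy pass is order-dependent, so a production version would likely need a rule for which duplicate survives, for example the most recent or the highest-trust entry.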

The most interesting design insight is Layer 7 — the "Ground Truth hierarchy." Without it, the agent exhibits what the developers call "memory-zero behavior": it re-queries Qdrant, re-runs fabric_recall, and re-searches session history to verify information already present in its context, burning tokens on redundant rediscovery. The fix is a pair of files (SOUL.md, rulebook.md) that explicitly instruct the agent to treat injected memory as authoritative. It is a blunt solution to a subtle problem — LLMs default to verification behavior even when the answer is already in-context.

Gograph attacks a related cost problem from the MCP angle. It is an AST-based indexing engine for Go repositories that exposes structural code understanding directly to Claude via Model Context Protocol. Instead of 6–10 sequential grep-and-read tool calls consuming 15k–25k tokens, a single gograph_context call bundles a symbol's definition, source, callers, callees, and linked tests in under 800 tokens — a claimed 95% reduction. It runs entirely locally and includes pre-edit risk planning and architectural overview tools. (more: https://www.reddit.com/r/ClaudeAI/comments/1tpgmre/stop_claude_code_from_burning_your_token_budget/) These practical infrastructure projects contrast with the broader "software factory" pitch making the rounds: one editorial argues that the model is not the factory — the harness is — advocating for SPARC-structured development loops with ADR-recorded decisions and TDD-verified outputs. The observation that "smaller agents, smaller tasks, smaller diffs, better routing" outperforms ambitious orchestration is borne out by experience, though community response ranges from enthusiastic to "I built everything and don't use any of it because I can chain 9 skills on a command prompt and walk away." (more: https://www.linkedin.com/posts/reuvencohen_the-future-of-software-development-is-not-share-7467539901949943808-PEil)
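The claimed 95% figure for Gograph can be checked against the token ranges quoted above. The short calculation below uses only those numbers; the `reduction` helper is just a convenience name.

```python
def reduction(before_tokens, after_tokens):
    """Fraction of tokens saved when `before_tokens` shrinks to `after_tokens`."""
    return 1 - after_tokens / before_tokens


# 15k-25k tokens for sequential grep-and-read calls vs. an 800-token upper bound.
low = reduction(15_000, 800)
high = reduction(25_000, 800)
print(f"{low:.1%} to {high:.1%}")  # 94.7% to 96.8%
```

Since 800 tokens is an upper bound on the single-call cost, the quoted range implies savings of at least roughly 95%, which is consistent with the claim.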

Open Models and Inference Infrastructure

HCompany's Holo3.1 is a direct answer to a production deployment problem: computer-use agents that work in one environment rarely transfer cleanly to another. Built on the Qwen family, Holo3.1 expands beyond desktop and browser into mobile — scoring 79.3% on AndroidWorld with the 35B-A3B model (up from 67%), and 72% with the smaller 4B and 9B variants (up from 58%). For the first time, HCompany ships quantized checkpoints: FP8, Q4 GGUF, and NVFP4, with NVFP4 delivering 1.41× the throughput of FP8 on DGX Spark and cutting average agent step time from 6.8s to 3.3s. The Q4 GGUF checkpoints run on Apple Silicon, keeping execution fully local and private. Native function-calling protocol support now achieves near-parity with structured JSON output. (more: https://huggingface.co/blog/Hcompany/holo31)

On the kernel side, a new fused Mixture-of-Experts dispatch kernel written entirely in Triton reaches 89–131% of Megablocks (Stanford's CUDA-optimized MoE library) at inference batch sizes up to 512 tokens — and runs on AMD MI300X with zero code changes. The key optimization fuses gate and up projections so the SwiGLU intermediate never leaves registers, cutting 35% of global memory traffic. Fewer kernel launches (5 vs. 24+) help but matter less than the memory savings. The honest limitation: it falls behind Megablocks at 2048+ tokens, and 64+ experts under heavy routing skew remain rough, so DeepSeek-V3-scale expert counts are not there yet. But for the AMD ecosystem where CUTLASS is unavailable and Triton is the primary kernel path, this closes a real gap. (more: https://www.reddit.com/r/LocalLLaMA/comments/1tp4u0u/fused_moe_dispatch_kernel_in_pure_triton_89131_of/)
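The gate/up fusion can be shown in plain Python. This is only a dataflow illustration, not the Triton kernel: the function names and the tiny weights are invented, and real kernels work on GPU tiles rather than Python lists. The unfused version materializes full gate and up vectors before combining them, while the fused version accumulates both dot products in one loop over the input and applies SwiGLU immediately, so the intermediates are never stored.

```python
import math


def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def swiglu_unfused(x, w_gate, w_up):
    """Two passes over x: build full gate and up vectors, then combine."""
    gate = [sum(xi * w for xi, w in zip(x, col)) for col in w_gate]
    up = [sum(xi * w for xi, w in zip(x, col)) for col in w_up]
    return [silu(g) * u for g, u in zip(gate, up)]


def swiglu_fused(x, w_gate, w_up):
    """One pass per output element: gate and up accumulate together."""
    out = []
    for gate_col, up_col in zip(w_gate, w_up):
        g = u = 0.0
        for xi, wg, wu in zip(x, gate_col, up_col):
            g += xi * wg
            u += xi * wu
        out.append(silu(g) * u)  # combined at once; no gate/up vectors kept
    return out


# Weights are stored as one column (list) per output element.
x = [0.5, -1.0, 2.0]
w_gate = [[0.1, 0.2, 0.3], [-0.4, 0.5, 0.6]]
w_up = [[0.7, -0.8, 0.9], [1.0, 1.1, -1.2]]

a = swiglu_unfused(x, w_gate, w_up)
b = swiglu_fused(x, w_gate, w_up)
print(all(math.isclose(p, q) for p, q in zip(a, b)))  # True
```

On a GPU, the payoff is that the intermediate values stay in registers instead of being written to and re-read from global memory, which is where the reported traffic reduction comes from.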

Eric Hartford and Lazarus AI released ReAligned-Qwen3.5, a series of models fine-tuned to reduce Chinese ideological bias, censorship, and state-narrative framing. The pipeline uses SFT plus GRPO with a custom dataset targeting a taxonomy of Chinese censorship, with Hartford's ReAligned classifier serving as the reward signal. Weights span 0.8B to 35B-A3B across FP16, FP8, and GGUF formats under the Apache 2.0 license. (more: https://www.reddit.com/r/LocalLLaMA/comments/1tp9ian/realignedqwen35_release/) Rounding out the infrastructure space, KANX positions itself as the first production-ready Kolmogorov-Arnold Network library with pip installability, CI, API serving, and Kubernetes support — community reception split between praise for the production focus and accusations of AI-generated slop. (more: https://www.reddit.com/r/learnmachinelearning/comments/1tpbqr1/i_built_a_productionready_kan_library_pip_install/)

Pixels, Prompts, and the RAG Vision Problem

HiDream-O1-Image arrives with an architectural bet the image generation community has debated for two years: eliminate the VAE and text encoder entirely, and train a single Pixel-level Unified Transformer (UiT) that natively encodes raw pixels, text, and task conditions in one shared token space. At 8 billion parameters, it handles text-to-image, instruction editing, subject-driven personalization, and storyboard generation — all in one model, up to 2,048 × 2,048 resolution. Benchmarks are strong: 0.90 on GenEval (beating GPT Image 2's 0.89), 89.83 on DPG-Bench for dense prompt alignment, and #8 in the Artificial Analysis Text to Image Arena. A built-in Reasoning-Driven Prompt Agent resolves layout, subject attributes, and text-rendering details before generation — essentially chain-of-thought reasoning applied to pixel synthesis. The technical report (arXiv:2605.11061) details the architecture. MIT licensed, with models from the full 50-step version to a distilled 28-step Dev variant on HuggingFace, and multi-reference personalization now supporting skeleton and layout conditioning for virtual try-on and group compositions. (more: https://github.com/HiDream-ai/HiDream-O1-Image)

On the retrieval side, Kapa.ai published a detailed analysis of how they index images for their RAG-powered documentation assistants. The core insight: do not send images to the model at query time. Describe each image once at indexing time with a cheap vision model, store descriptions as text chunks, and retrieve them alongside ordinary text. Query-time multimodal fails at scale — raw images add 27% to per-query cost on GPT and 51% on Claude, and a typical question retrieves 20–30 images that quickly hit payload limits. Their zero-shot classifier filters junk images at 96.8% accuracy on clear-cut cases, and separate caption chunks outperform inline captions on both cost and retrieval quality — the re-ranker promotes image-derived chunks into the top 15 on 51% of queries while per-query overhead stays between 1% and 6%. (more: https://www.kapa.ai/blog/how-we-index-images-for-rag)
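The index-time approach can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not Kapa.ai's code: `Chunk`, `index_document`, `retrieve`, and the stub `is_useful` and `describe` callables are hypothetical stand-ins for the junk-image classifier and the vision model, and the word-overlap scoring stands in for a real retriever and re-ranker.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    kind: str  # "text" or "image_caption"
    text: str


def index_document(doc_id, paragraphs, image_refs, is_useful, describe):
    """Describe each useful image once and store the caption as its own chunk."""
    chunks = [Chunk(doc_id, "text", p) for p in paragraphs]
    for ref in image_refs:
        if not is_useful(ref):  # filter logos, icons, and other junk
            continue
        chunks.append(Chunk(doc_id, "image_caption", describe(ref)))
    return chunks


def retrieve(chunks, query, k=15):
    """Toy lexical scoring: rank chunks by how many query words they share."""
    terms = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(terms & set(c.text.lower().split())))
    return ranked[:k]


# Stubs standing in for the classifier and the vision model.
is_useful = lambda ref: not ref.startswith("icon")
describe = lambda ref: "Screenshot of the settings page showing the API key field"

chunks = index_document(
    "quickstart",
    ["Install the CLI with pip.", "Create an API key before the first request."],
    ["icon-logo.png", "settings.png"],
    is_useful,
    describe,
)
top = retrieve(chunks, "where is the api key field")[0]
print(top.kind, "->", top.text)
```

At query time only text is retrieved, so no image bytes are sent to the model; the vision cost is paid once per image at indexing time.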

iOS Sideloading's Growing Toolchain

The iOS sideloading ecosystem is quietly maturing into a real toolchain. LiveContainer is an app launcher — not an emulator or hypervisor — that runs iOS apps inside a container without actually installing them, bypassing the 3-app/10-app-ID limit on free developer accounts. When JIT is available (below iOS 26), codesigning is entirely bypassed; otherwise, apps are signed with LiveContainer's certificate. It supports running multiple apps simultaneously in resizable virtual windows with native Picture-in-Picture on iPad, and a URL-scheme deeplink system for launching containerized apps from the web. The security caveat is real: third-party closed-source builds have full access to your data including keychain items and login credentials. (more: https://github.com/LiveContainer/LiveContainer)

SideStore provides the distribution layer — an alternative app store requiring no jailbreak and, after initial setup, only Wi-Fi. Its catalog includes emulators from NES to Nintendo Switch, virtual machines with and without JIT support, Minecraft Java Edition ports, and iOS modifications. (more: https://sidestore.io) The pairing layer gets its own dedicated tool in idevice_pair, a cross-platform Rust GUI for managing iOS device pairing files and wireless debugging, with built-in support for installing pairing files to SideStore, LiveContainer, StikDebug, and several other sideloading applications. Together, these three projects form a vertically integrated stack: idevice_pair handles device trust, SideStore handles distribution, and LiveContainer handles execution — no jailbreak required at any layer. (more: https://github.com/jkcoxson/idevice_pair)

Hardware Hacks and Creative Misuse

If your laptop has a discrete NVIDIA GPU sitting idle while the display runs off the integrated AMD chip, nbd-vram will turn that unused VRAM into high-priority swap space. The daemon allocates VRAM via the CUDA driver API and serves it as a block device over the NBD (Network Block Device) protocol through a Unix socket — no kernel module, no NVIDIA kernel symbols, survives driver updates without rebuilding. The performance finding that matters: for the sporadic 4K page faults that characterize real laptop swap access, VRAM averages 335μs latency versus NVMe's 9.05ms — a 27× advantage. NVMe is faster for sustained sequential throughput, but its Autonomous Power State Transitions mean it wakes cold on nearly every sporadic request, paying a ~9ms penalty each time. VRAM has no power states and responds consistently. One developer turned an RTX 3070 Laptop's 8GB VRAM into 7GB of swap, tripling addressable memory to ~46GB. (more: https://github.com/c0dejedi/nbd-vram)
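The 27× figure follows directly from the two latencies quoted above. The calculation below uses only those numbers; the 1,000-fault workload is an invented example to show what the gap means in aggregate.

```python
vram_latency_us = 335          # average VRAM latency per 4K page fault
nvme_latency_us = 9.05 * 1000  # 9.05 ms expressed in microseconds

ratio = nvme_latency_us / vram_latency_us
print(f"NVMe is about {ratio:.0f}x slower per sporadic fault")  # about 27x

# Hypothetical workload: 1,000 sporadic page faults.
faults = 1_000
print(f"VRAM: {faults * vram_latency_us / 1e6:.2f} s")
print(f"NVMe: {faults * nvme_latency_us / 1e6:.2f} s")
```

The comparison applies only to sporadic access; for sustained sequential transfers the source notes that NVMe is the faster device.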

In the category of creative misuse of corporate infrastructure, Chipotlai Max is a meme fork of OpenCode that hardcodes Chipotle's customer support chatbot "Pepper" as its default LLM backend. After Pepper went viral in March 2026 for solving LeetCode problems, someone reverse-engineered the Amelia WebSocket/SockJS + STOMP backend into an OpenAI-compatible proxy running on localhost. Cost per token: $0.00, "powered by Chipotle's cloud budget." The project helpfully lists seven additional corporate chatbots, including Home Depot, Sephora, Nordstrom, IKEA, and Expedia, awaiting community reverse-engineering. (more: https://github.com/cyberpapiii/chipotlai-max) A video submission demonstrates NomixClicker, a cloud phone farm tool that schedules automated tasks on real iOS devices; the AI planner analyzes Instagram video content and generates contextual comments autonomously, with shared browser-based device access. (more: https://youtu.be/VxpWl6hTEJs?si=iLzt8rbQMKHis8Db) On a more wholesome note, a 46elks developer reflects that many computer science legends (Linus Torvalds, John Carmack, Chris Lattner, Fabrice Bellard) are still alive and actively programming, and extends a refreshingly low-tech invitation: just send one of them an email. (more: https://46elks.com/blog/2026/05/29/an-amazing-time-for-programmers)

Sources (22 articles)

  1. 1-Click GitHub Token Stealing via a VSCode Bug (blog.ammaraskar.com)
  2. Cloudflare Turnstile requiring fingerprintable WebGL (hacktivis.me)
  3. Anthropic's AI support bot (me) is trapping fraud victims in an endless loop with no way to reach a human (reddit.com)
  4. [Editorial] Agentic Tracebit — AI-Powered Security Deception (agentic.tracebit.com)
  5. puck-security/puck-scout (github.com)
  6. V0id-v2/Void-Tools-v2.0 (github.com)
  7. ClaudioDrews/memory-os (github.com)
  8. Gograph: AST-Based MCP Server That Cuts Claude Code Token Use by 95% in Go Repos (reddit.com)
  9. [Editorial] The Future of Software Development (linkedin.com)
  10. Holo3.1: Fast & Local Computer Use Agents (huggingface.co)
  11. Fused MoE dispatch kernel in pure Triton: 89-131% of Megablocks, runs on AMD with zero code changes (reddit.com)
  12. ReAligned-Qwen3.5 Release (reddit.com)
  13. KANX: A production-ready Kolmogorov-Arnold Network library (reddit.com)
  14. HiDream-ai/HiDream-O1-Image (github.com)
  15. How we index images for RAG (kapa.ai)
  16. [Editorial] LiveContainer — iOS App Sideloading Container (github.com)
  17. [Editorial] SideStore — Alternative iOS App Store (sidestore.io)
  18. [Editorial] idevice_pair — Rust iOS Device Pairing (github.com)
  19. Use your Nvidia GPU's VRAM as swap space on Linux (github.com)
  20. [Editorial] chipotlai-max (github.com)
  21. [Editorial] Video Submission (youtu.be)
  22. It is an amazing time for programmers (46elks.com)