Hardware Trust and the Attack Surface You Forgot About

Today's AI news: Hardware Trust and the Attack Surface You Forgot About, Self-Organizing Agent Teams and the Agentic OS, Who Pays for the Token Tsunami?, Agentic Retrieval and Formal Synthesis, Seeing at Every Scale, When AI Comes for Mathematics. 22 sources curated from across the web.

Hardware Trust and the Attack Surface You Forgot About

A security researcher wanted to write a Linux tool for his Creative Sound Blaster Katana V2X soundbar. What he found instead was a chain of vulnerabilities that lets any attacker within Bluetooth range (roughly 15 meters) silently upload custom firmware to the speaker, turn it into a covert listening device, and inject keystrokes into the connected PC as if someone had plugged in a USB Rubber Ducky. No pairing required. No authentication. No physical contact with the device whatsoever. The attack is possible because Creative bridged its internal CTP command protocol to Bluetooth Low Energy without requiring the same challenge-response authentication enforced over USB. Anyone can connect and start issuing commands, including firmware updates. The firmware container uses a SHA-256 checksum but no cryptographic signature verification, so patched images are accepted without complaint. The researcher's proof-of-concept replaces a dormant diagnostic task in the FreeRTOS-based firmware with 102 bytes of hand-written ARM/Thumb assembly that types arbitrary commands into the host PC's terminal after boot, all delivered wirelessly over BLE in about ten minutes. Bluetooth is always on, even in sleep mode, with no apparent way to disable it. Creative, contacted via SingCERT after two ignored support tickets, responded that they "do not consider this to be a vulnerability, as it does not present a cybersecurity risk." The researcher released a community patch that blocks CTP-over-Bluetooth entirely (more: https://blog.nns.ee/2026/06/03/katana-badusb/).
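To see why a checksum alone does not stop a patched image, here is a minimal Python sketch. It does not reproduce Creative's container format: the image bytes, the `VENDOR_KEY`, and the function names are all hypothetical, and a keyed HMAC from the standard library stands in for the asymmetric signature a real device would more likely verify with a public key.

```python
import hashlib
import hmac

def checksum_ok(image: bytes, stored_digest: bytes) -> bool:
    """Integrity-only check: anyone can recompute this digest."""
    return hmac.compare_digest(hashlib.sha256(image).digest(), stored_digest)

def tag_ok(image: bytes, stored_tag: bytes, key: bytes) -> bool:
    """Keyed check: producing a valid tag requires the secret key."""
    expected = hmac.new(key, image, hashlib.sha256).digest()
    return hmac.compare_digest(expected, stored_tag)

# Hypothetical vendor secret, for illustration only.
VENDOR_KEY = b"hypothetical-vendor-secret"

original = b"firmware v1: diagnostic task idle"
vendor_tag = hmac.new(VENDOR_KEY, original, hashlib.sha256).digest()

# The attacker patches the image and simply recomputes the checksum.
patched = b"firmware v1: diagnostic task types keystrokes"
forged_digest = hashlib.sha256(patched).digest()
checksum_accepts_patch = checksum_ok(patched, forged_digest)

# Without the key, the attacker can only reuse the tag of the original image.
tag_accepts_patch = tag_ok(patched, vendor_tag, VENDOR_KEY)
tag_accepts_original = tag_ok(original, vendor_tag, VENDOR_KEY)
```

The checksum path accepts the patched image because the attacker controls both the image and the digest stored next to it; the keyed path rejects it because the tag cannot be recomputed without the secret.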

This is not an isolated class of neglect. The Berkeley Vulnerability Initiative, an open record of CVEs discovered by agentic systems, paints a structural picture across the past 90 days. Total CVEs grew 59.2% year-over-year, yet traditional web vulnerability classes (XSS, SQLi, CSRF) dropped by 320 filings. Agentic CVEs skew high or critical more often (58.2% versus 51.3% for the rest of 2026), but severity has become a weaker predictor of actual exploit risk: the Spearman correlation between CVSS and EPSS scores essentially collapses for agentic findings. Memory-safety CVEs found by agents cross the threshold for likely in-the-wild exploitation 4× less often than in 2025. The takeaway is that AI-driven discovery is changing the shape of the vulnerability landscape, not just its volume (more: https://vuln.cs.berkeley.edu).
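For readers who want to see what a collapsing Spearman correlation looks like, here is a small standard-library sketch. The CVSS and EPSS values are invented for illustration and are not taken from the Berkeley dataset; the function names are likewise hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Invented scores: severity and exploit likelihood rise together.
aligned = spearman([4.3, 5.5, 7.5, 9.8], [0.01, 0.02, 0.10, 0.60])

# Invented scores: the most severe findings are not the likeliest to be exploited.
decoupled = spearman([9.8, 9.1, 7.5, 5.3, 4.0], [0.02, 0.40, 0.01, 0.30, 0.05])
```

In the first toy set the rankings agree perfectly, so rho is 1; in the second the rankings are nearly unrelated and rho sits close to zero, which is the pattern the report describes for agentic findings.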

Meanwhile, a practitioner spent $1,500 pitting LLMs against a deliberately vulnerable React Native app with a wide-open Firebase backend, the exact class of broken access control that plagues real production apps. GPT o3-mini solved it 7 out of 10 times; DeepSeek V4 Pro managed 3 out of 10; both Claude models hit 2 out of 10, with Opus hampered by late-firing safety guardrails. Gemini 3.1 Pro refused immediately every time. The Chinese models were notably more comfortable directly attacking the database, while Western models occasionally hesitated about affecting "live" data. At $10 max and two hours per run, the cost-per-success ranged from $4.29 (GPT o3-mini) to never (several models). The experiment is small and unscientific, as the author readily admits, but it adds a useful data point to the growing body of evidence that AI offensive capability is real, uneven, and heavily model-dependent (more: https://kasra.blog/blog/i-spent-1500-seeing-if-llms-could-hack-my-app/).

Self-Organizing Agent Teams and the Agentic OS

Harvard's MIMS lab has released AutoScientists, a decentralized team of AI agents built on Claude Code subagents that self-organize around promising hypotheses for long-running computational experiments. Unlike prior multi-agent systems that follow a single research trajectory or coordinate through a central planner, AutoScientists agents form teams around hypotheses, critique each other's proposals before spending compute, and share successes and failures to avoid redundant exploration. The system coordinates through a local ClawInstitute server handling workshops, workspaces, and message-board posts, while the orchestrator never trains anything itself; it only launches agents and harvests results. On BioML-Bench (24 biomedical ML tasks spanning drug discovery, protein engineering, and single-cell omics), AutoScientists achieved 74.4% mean leaderboard percentile, outperforming the strongest prior AI agent by 8.33 percentage points. On nanoGPT training optimization it reached the target validation metric 1.9× faster, and on ProteinGym's ACE2-Spike binding assay it gained 12.5% over the baseline (more: https://github.com/mims-harvard/AutoScientists).

The agentic tooling ecosystem around these workflows is consolidating fast. RuvOS packages the entire orchestration layer (semantic memory with HNSW/RaBitQ search, resumable signed sessions, GOAP-planned multi-agent pipelines, cross-terminal coordination, and a signed audit log) into a single static Rust binary with 24 MCP tools. No Node.js, no SQLite, no background service. It consolidates ideas from several earlier open-source projects (Ruflo, RuVector, SONA) into one coherent package. The design philosophy is that your coding assistant should remember decisions across days, resume where it left off, and spin up specialist agents, all stored on your own disk with tamper-evident provenance (more: https://github.com/dgdev25/ruvos).

A different approach to the orchestration problem comes from a mixed-provider workflow that routes each phase of a full-stack web build to the model that earns its tokens at that step: Sonnet explores the repo, Opus plans the architecture and page copy, Gemini 3.5 Flash designs the UI, Opus wires integrations, and Sonnet validates and smoke-tests. Each step runs as a separate coding agent session, communicating via a single markdown handoff document on disk. The rationale is twofold: you cannot switch providers mid-conversation, and LLMs perform better when each session focuses on one task. The workflow handles everything from exploration through deployment and browser-based smoke testing (more: https://github.com/coleam00/frontend-mix). A detailed video walkthrough demonstrates the approach end-to-end, showing that Gemini 3.5 Flash consistently produces better-looking UIs than Claude while Opus avoids the hallucinated page copy that plagues Gemini on factual content (more: https://www.youtube.com/watch?v=iICZTWcryac). The broader vision, that daily workflows should be codified into skills and then skills into automations, is gaining traction among power users who see Claude Code not as an assistant but as the kernel of an agentic operating system (more: https://www.youtube.com/shorts/-9-kS6TxoWs).

Who Pays for the Token Tsunami?

Uber blew its 2026 AI budget in four months and responded by capping every employee at $1,500 per month per AI coding tool. As Simon Willison points out, this is actually a rational signal about what these tools are worth. Assuming two actively used tools per engineer, the cap works out to $36,000 per year, roughly 11% of the median $330,000 total compensation package for Uber software engineers. Willison notes his own token usage runs about $1,000 per month per provider, which he currently pays just $100 for thanks to subsidized individual plans that companies like Uber cannot access. The implication: even at list prices, a heavy user stays under Uber's cap with $500 per month to spare. The spending blowout was not about individual profligacy but about thousands of engineers simultaneously discovering token-burning agentic workflows that nobody had budgeted for (more: https://simonwillison.net/2026/Jun/3/uber-caps-usage/).
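The arithmetic behind those figures is short enough to check directly. The sketch below uses only the numbers quoted above; the two-tools-per-engineer figure is the article's own assumption, and the variable names are illustrative.

```python
CAP_PER_TOOL_PER_MONTH = 1_500   # Uber's cap, in dollars
TOOLS_PER_ENGINEER = 2           # assumption stated in the article
MEDIAN_TOTAL_COMP = 330_000      # median Uber software engineer package
LIST_PRICE_USAGE = 1_000         # Willison's monthly usage per provider at list price

annual_cap = CAP_PER_TOOL_PER_MONTH * TOOLS_PER_ENGINEER * 12
share_of_comp = annual_cap / MEDIAN_TOTAL_COMP
monthly_headroom = CAP_PER_TOOL_PER_MONTH - LIST_PRICE_USAGE

print(f"annual cap: ${annual_cap:,}")
print(f"share of compensation: {share_of_comp:.1%}")
print(f"headroom per tool: ${monthly_headroom}/month")
```

The share comes out just under 11%, which matches the rounded figure in the text.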

The engineering response to runaway AI bills is arriving in the form of billing circuit breakers. Loopers is a Go-based reverse proxy that intercepts AI API requests with atomic Redis Lua transactions, enforcing budget limits across five granular time windows (minute, hourly, daily, weekly, monthly). It claims zero budget leakage under a 1,000-request flood test versus 215% leakage for LiteLLM, 25× higher throughput, 190× lower P99 latency, and a 23× smaller memory footprint. The fail-closed design blocks all requests if Redis goes down, and mid-stream SSE cutoffs sever streaming connections the moment a token budget is exceeded. The architecture is pass-through: API keys are kept only in memory during request lifecycles, never persisted to disk. It supports ten providers and ships with Python and TypeScript SDKs (more: https://github.com/CURSED-ME/loopers-oss).
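The two ideas doing the work here, an all-or-nothing budget check across several windows and failing closed when the store is unreachable, can be sketched in a few lines of Python. This is not Loopers' code or API: the class names are hypothetical, an in-process lock stands in for the atomic Redis Lua script, the windows are simple fixed buckets, and a month is assumed to be 30 days.

```python
import threading

WINDOW_SECONDS = {"minute": 60, "hourly": 3_600, "daily": 86_400,
                  "weekly": 604_800, "monthly": 2_592_000}  # 30-day month (assumption)

class StoreDown(Exception):
    """Raised when the budget store cannot be reached."""

class InMemoryStore:
    """Stand-in for Redis; the lock plays the role of the atomic Lua script."""
    def __init__(self):
        self._lock = threading.Lock()
        self._spent = {}
        self.available = True

    def reserve(self, buckets, cost, limits):
        """Charge `cost` to every bucket, or to none if any limit would be exceeded."""
        if not self.available:
            raise StoreDown()
        with self._lock:
            for name, bucket in buckets.items():
                if self._spent.get(bucket, 0) + cost > limits[name]:
                    return False
            for bucket in buckets.values():
                self._spent[bucket] = self._spent.get(bucket, 0) + cost
            return True

class BudgetBreaker:
    def __init__(self, store, limits_cents):
        self.store = store
        self.limits = limits_cents

    def allow(self, user, cost_cents, now):
        # One fixed bucket per window, keyed by the window's index in time.
        buckets = {name: (user, name, int(now // WINDOW_SECONDS[name]))
                   for name in self.limits}
        try:
            return self.store.reserve(buckets, cost_cents, self.limits)
        except StoreDown:
            return False  # fail closed: no store, no spend

store = InMemoryStore()
breaker = BudgetBreaker(store, {"minute": 100, "daily": 500})

same_minute = [breaker.allow("alice", 40, now=0) for _ in range(3)]
next_minute = breaker.allow("alice", 40, now=60)
store.available = False
during_outage = breaker.allow("alice", 1, now=120)
```

Checking every window before charging any of them is what prevents partial spends; doing both steps under one lock is what prevents two concurrent requests from each seeing room that only one of them can use.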

Underneath the token economics sits the hardware cost question. NVIDIA's new Grace, Vera, and RTX Spark processors are not simply faster; they are architectures optimized for persistent context, memory locality, and minimized data movement. Massive unified memory, tight CPU-GPU integration, and reduced data shuttling reflect a thesis that intelligence becomes more useful when it operates continuously on local context rather than constantly moving information between disconnected systems. The observation that most AI applications still shoehorn models designed for large CUDA clusters into architectures built for something fundamentally different suggests the hardware is evolving toward contextual intelligence while much of the software ecosystem remains stuck on batch inference and token generation (more: https://www.linkedin.com/posts/reuvencohen_the-most-interesting-thing-about-nvidias-share-7467809828715802624-6W7n).

Agentic Retrieval and Formal Synthesis

A team from Shanghai Jiao Tong University and the Syft news platform introduces DynaTree, a two-stage framework that decouples the expensive semantic expansion work in agentic RAG from the real-time retrieval that downstream systems actually need. In Stage I, coordinated agents (a planner, retriever, augmenter, and reflector) collaboratively build a reusable retrieval tree that maps the semantic space of a query topic. Each branch captures a distinct subtopic or refinement trajectory, and a Shapley-value analysis confirms that the Planning agent contributes most to recall while Reflection contributes most to ranking quality. In Stage II, a lightweight daily subtree selection mechanism adapts to evolving news distributions without re-running any agentic reasoning, using only LLM-based weak labeling over a compact evaluation proxy of about 200 examples. Deployed in Syft's production pipeline and evaluated through online A/B testing from January 28 to February 6, 2026, DynaTree's dynamically adapted variant consistently outperformed all existing production recallers on every evaluation day, with survival rate improvements of up to 8% over a fixed offline subtree. The paper is accepted at KDD 2026 (more: https://arxiv.org/abs/2605.31377v1).
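A Shapley-value analysis attributes a pipeline's score to its components by averaging each component's marginal contribution over every order in which the components could be added. The sketch below computes exact Shapley values for four agents. The `toy_recall` function and all of its numbers are invented for illustration; they carry no information about DynaTree's measurements, and the paper's own estimation procedure may differ.

```python
from itertools import permutations

AGENTS = ("planner", "retriever", "augmenter", "reflector")

def toy_recall(coalition):
    """Hypothetical recall of a pipeline that contains only these agents."""
    score = 0.0
    if "planner" in coalition:
        score += 0.40
    if "retriever" in coalition:
        score += 0.20
    if "augmenter" in coalition:
        score += 0.10
    if "reflector" in coalition:
        score += 0.04
    if "planner" in coalition and "reflector" in coalition:
        score += 0.06  # invented synergy between planning and reflection
    return score

def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}

phi = shapley_values(AGENTS, toy_recall)
for agent, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{agent:10s} {contribution:.3f}")
```

Two properties make the method attractive for this kind of ablation: the contributions sum to the full pipeline's score, and a synergy between two agents is split evenly between them. Exact enumeration is only practical for a handful of components, since the number of orderings grows factorially.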

A separate line of research tackles a fundamental weakness of LLM-based program synthesis: most approaches use simple numeric scores to rank candidates, which tell you nothing about why a program failed. Researchers at UFRGS, Oxford, and Linköping propose property-guided synthesis with counterexample-driven repair. Given a PDDL planning domain, they ask the LLM to synthesize a heuristic function in Python, then validate a formally defined property: every state reachable by strictly improving transitions must have a strictly improving successor (making hill climbing go directly to a goal). When the property is violated, the concrete failing state and its successor values are fed back to the LLM. Across ten IPC 2023 domains, this approach generates 7× fewer programs than the best prior method, reduces evaluation cost by roughly 1000×, and solves more test tasks, all via hill climbing without any combinatorial search. The synthesized heuristics remain effectively direct on virtually all out-of-distribution test tasks, a striking generalization result (more: https://arxiv.org/abs/2605.16142v1).
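The property check and the counterexample it produces can be illustrated on a toy state space. The sketch below is not the authors' validator: the function name, the six-state line domain, and both heuristics are hypothetical, lower heuristic values are assumed to be better, and goal states are assumed to be exempt from needing an improving successor.

```python
def find_counterexample(initial, successors, h, is_goal):
    """Explore states reachable through strictly improving moves (h decreases).

    Returns None when every such non-goal state has a strictly improving
    successor; otherwise returns (state, h(state), {successor: h(successor)}).
    """
    stack, seen = [initial], {initial}
    while stack:
        state = stack.pop()
        if is_goal(state):
            continue
        improving = [t for t in successors(state) if h(t) < h(state)]
        if not improving:
            return state, h(state), {t: h(t) for t in successors(state)}
        for t in improving:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return None

# Toy domain: positions 0..5 on a line, moves of one step, goal at 5.
def successors(s):
    return [t for t in (s - 1, s + 1) if 0 <= t <= 5]

def is_goal(s):
    return s == 5

def good_h(s):
    return 5 - s  # distance to goal: every step right improves

bad_table = {0: 5, 1: 4, 2: 2, 3: 3, 4: 1, 5: 0}

def bad_h(s):
    return bad_table[s]  # state 2 is a local minimum

good_result = find_counterexample(0, successors, good_h, is_goal)
bad_result = find_counterexample(0, successors, bad_h, is_goal)
print(bad_result)
```

For the flawed heuristic the check returns state 2 together with the values of its neighbours, which is the kind of concrete evidence that can be pasted back into a repair prompt, where a bare score would only say that the candidate did poorly.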

In a completely different application domain, researchers at TU Wien introduce Conversational Demand Response (CDR), where aggregators and energy prosumers coordinate through bidirectional natural language powered by agentic AI. A two-tier multi-agent system pairs an aggregator agent with a prosumer Home Energy Management System (HEMS) that calls a MILP optimizer as a tool. When a grid operator needs 3 kW of flexibility for an evening window, the aggregator dispatches the request; the HEMS evaluates feasibility via dual-solve optimization (baseline versus DR-committed), translates the cost-benefit tradeoff into plain language, and presents it to the homeowner before any commitment. Downstream interactions complete in under 12 seconds. The architecture is fully open source (more: https://arxiv.org/abs/2603.06217v1).

Seeing at Every Scale

Creating images from noise is generation; reconstructing fine details from coarse inputs is super-resolution. A team at MIT argues these are the same problem, reversing information loss across scales, and introduces SKILD, a scale-invariant frequency-space diffusion model that unifies both tasks within a single unconditional framework. The forward process attenuates image content from fine to coarse scales while injecting spectrum-matched Gaussian noise (noise that carries the statistical profile of the dataset itself), making scale an explicit coordinate of the diffusion dynamics. The same trained reverse process handles generation and continuous super-resolution by varying only the starting timestep: no conditioning branch, no classifier-free guidance, no retraining per scale factor. SKILD reaches FID 1.73 and Inception Score 10.22 on unconditional CIFAR-10, performs 2×–4× super-resolution on ImageNet from a single checkpoint while outperforming conditional baselines on perceptual metrics, and reconstructs critical Ising model fields whose connected four-point correlations closely track the ground truth, a stronger test of scale invariance than any perceptual quality metric (more: https://arxiv.org/abs/2605.26032v1).

At the rendering end of the visual pipeline, Gaussian Point Splatting offers a stochastic method for rendering Gaussian splats that scales to hundreds of millions of Gaussians in real time. The core idea is to sample pixel-sized opaque points from the Gaussians and splat them to a framebuffer using 64-bit atomics, achieving even workload distribution across millions of threads. The authors formalize and solve the nontrivial problem of determining how many points to splat per Gaussian and how to distribute them to achieve the desired opacity, keeping renders faithful to the original Gaussian splatting. The work is presented at SIGGRAPH 2026 (more: https://momentsingraphics.de/Siggraph2026.html).

DaVinci Resolve 21 brings AI tools to a new domain: still photography. The new Photo page integrates Hollywood's node-based color grading workflow with a dedicated image editing environment, while AI IntelliSearch lets users find specific objects, scenes, or individual faces across entire projects using plain language. AI text-to-speech generates unique voices from 10-second clips, AI CineFocus simulates adjustable depth of field, AI Face Age Transformer changes an actor's apparent age, and AI UltraSharpen recovers detail from upscaled or slightly out-of-focus footage. The release also adds native OGraf HTML graphics and Lottie animation support, Krokodove's 70+ Fusion compositing tools, MultiMaster HDR/SDR trim passes, and expanded immersive/VR workflows including foveated rendering for Apple Immersive (more: https://www.blackmagicdesign.com/products/davinciresolve/whatsnew).

For those building the models that power these tools, Hugging Face has published a thorough beginner's guide to torch.profiler that walks through reading profiler tables and traces from a single matrix multiplication, identifying overhead-bound versus compute-bound regimes, understanding the CPU-to-GPU dispatch chain (including why cudaOccupancyMaxActiveBlocksPerMultiprocessor fires before matmul but not add), and what torch.compile actually changes under the hood: the fusion is at the dispatcher level, not the kernel level, with the bias add folded into a GEMM epilogue prefixed by a Device-to-Device memcpy (more: https://huggingface.co/blog/torch-profiler). On the model release front, Microsoft has published Harrier OSS v1, a 270M-parameter model (more: https://huggingface.co/microsoft/harrier-oss-v1-270m), and Cohere Labs has released tiny-aya-global, a small multilingual model (more: https://huggingface.co/CohereLabs/tiny-aya-global).

When AI Comes for Mathematics

Sixteen mathematics specialists have turned the discipline's growing unease with AI into a public declaration. The Leiden Declaration on Artificial Intelligence and Mathematics, endorsed by the International Mathematical Union, argues that mathematics is more than a machine for producing correct answers: it is a deeply human endeavor built on creativity, understanding, collaboration, and pursuit of knowledge for its own sake. Those values clash directly with the incentives driving AI development. The authors warn that AI-generated papers could overwhelm peer review with low-quality work, make it difficult to assign proper credit for discoveries, and disadvantage researchers who choose not to use AI tools. They also raise concerns about mathematical work being used to train AI systems for military and surveillance purposes. The declaration will be discussed at next month's International Congress of Mathematicians in Philadelphia. The timing is pointed: OpenAI recently reported that one of its newest reasoning models independently solved an 80-year-old conjecture, the latest in a series of breakthroughs pushing AI to the frontiers of the subject (more: https://www.science.org/content/article/mathematicians-issue-warning-ai-rapidly-gains-ground).

The deeper question underneath the mathematicians' alarm is one that a commentary on the "solipsistic superintelligence" paper articulates well: AI does not just enter an operating environment; it changes the operating environment. Humans respond. Institutions respond. Markets respond. Other AI systems respond. Performance is not just a property of the model; it becomes a property of the coupled system the model enters. The practical implication for boards and executives is to stop asking only "does the system perform?" and start asking "what behaviors, incentives, dependencies, skills, workarounds, and second-order adaptations will this system create once people start responding to it?" That is where the real risk sits, but also where the real value lies. Not in treating AI as intelligence dropped into a passive, frozen world, but in understanding the new operating environment that forms around it (more: https://www.linkedin.com/posts/stuart-winter-tear_solipsistic-superintelligence-unlikely-to-ugcPost-7467901637643177984-V-r3).

Sources (22 articles)

  1. Hacking your PC using your speaker without ever touching it (blog.nns.ee)
  2. [Editorial] (vuln.cs.berkeley.edu)
  3. I built a vulnerable app and spent $1,500 seeing if LLMs could hack it (kasra.blog)
  4. mims-harvard/AutoScientists (github.com)
  5. [Editorial] (github.com)
  6. [Editorial] (github.com)
  7. [Editorial] (youtube.com)
  8. You Don't Understand the Power of a Claude Code Agentic OS (youtube.com)
  9. Uber's $1,500/month AI limit is a useful signal for AI tool pricing (simonwillison.net)
  10. CURSED-ME/loopers-oss (github.com)
  11. [Editorial] (linkedin.com)
  12. DynaTree: Dynamic Agentic Retrieval Tree for Time-Sensitive News Retrieval (arxiv.org)
  13. Property-Guided LLM Program Synthesis for Planning (arxiv.org)
  14. Conversational Demand Response: Bidirectional Aggregator-Prosumer Coordination through Agentic AI (arxiv.org)
  15. Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution (arxiv.org)
  16. Gaussian Point Splatting (momentsingraphics.de)
  17. DaVinci Resolve 21 (blackmagicdesign.com)
  18. Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler (huggingface.co)
  19. microsoft/harrier-oss-v1-270m (huggingface.co)
  20. CohereLabs/tiny-aya-global (huggingface.co)
  21. Mathematicians issue warning as AI rapidly gains ground (science.org)
  22. [Editorial] (linkedin.com)