The Panopticon Wears Ray-Bans


Today's AI news: The Panopticon Wears Ray-Bans, Cracking the Silicon, Qwen's Breakout Season, Orchestrating the Agent Swarm, Creative AI Goes Native, The Developer's Reckoning. 20 sources curated from across the web.

The Panopticon Wears Ray-Bans

A joint investigation by Swedish newspapers Svenska Dagbladet and Göteborgs-Posten has exposed what happens after Meta's AI smart glasses capture footage and it enters the company's training pipeline. Workers at Sama, a Meta subcontractor in Nairobi, describe annotating video that includes bathroom visits, sexual encounters, exposed bank cards, and intimate domestic scenes: content generated by users who likely had no idea the glasses were recording, let alone that the footage would be reviewed by human eyes thousands of miles away. "We see everything, from living rooms to naked bodies," one worker told reporters. "Meta has that type of content in its databases." The workers operate under extensive NDAs; speaking to journalists risks their livelihoods and, in many cases, a return to Nairobi's slums. The office is surveilled with cameras, and personal phones are banned. These are the manual laborers of the AI revolution, teaching machines to see while being forbidden from discussing what they themselves witness (more: https://www.svd.se/a/K8nrV4/metas-ai-smart-glasses-and-data-privacy-concerns-workers-say-we-see-everything).

The investigation's power lies in its specificity. Sales staff at Swedish retailers Synsam and Synoptik told reporters that users retain "full control" of their data and that "nothing is shared" with Meta. When the reporters bought a pair and analyzed network traffic, the glasses maintained constant communication with Meta servers in Luleå and Denmark. Meta's AI features cannot function locally: every query, every image the glasses analyze, passes through Meta's infrastructure. The company's terms of service explicitly state that interactions "may be reviewed" by automated or human processes, and that the AI assistant "may store and use information shared with them." Users get no opt-out for this data processing; it is mandatory for the AI functions to work.

The GDPR implications are stark. Kleanthi Sardeli of NOYB (None Of Your Business) notes that when users speak a command, the camera may already be recording without explicit consent, a transparency violation under European law. Petter Flink of Sweden's privacy authority, IMY, was blunt: "The user really has no idea what is happening behind the scenes." The data Meta collects, he emphasized, is more valuable than the glasses themselves. Meta sold seven million units in 2025, tripling the combined total of the prior two years. Despite two months of requests, Meta provided only boilerplate responses. Meanwhile, the now-shuttered AI Search Index had tried to bring visibility to the other side of the surveillance equation: tracking which AI agents (GPTBot, ClaudeBot, PerplexityBot, and 50+ others) visit your website, a niche that proved there is demand for understanding who is watching, even when the watchers are machines (more: https://www.aisearchindex.com).

Cracking the Silicon

What does Apple's M4 Neural Engine actually look like from the inside? Manjeet Singh and Claude Opus 4.6, working as a human-AI collaborative pair, reverse-engineered the entire software stack from CoreML down to the IOKit kernel driver. They mapped 40+ private Objective-C classes in AppleNeuralEngine.framework and discovered a direct compile-load-evaluate pipeline via the undocumented _ANEClient API that bypasses CoreML entirely (more: https://maderix.substack.com/p/inside-the-m4-apple-neural-engine).

The findings are technically revealing. The ANE is a graph execution engine (not a CPU, not a GPU) that takes a compiled neural network graph and executes it as one atomic operation. Programs are compiled from MIL (Machine Learning Intermediate Language), a typed SSA representation, into E5 binaries that function more like parameterized configurations than traditional machine code: a 1024×1024 matmul compiles to just 2,688 bytes, nearly identical to a 128×128 matmul. The hardware supports a queue depth of 127 in-flight evaluations, independent DVFS (dynamic voltage and frequency scaling) with hardware and software adaptive triggers, and zero-copy I/O via IOSurfaces shared between GPU and ANE. The team also found _ANEInMemoryModelDescriptor, which accepts MIL programs directly in memory, critical for training scenarios where filesystem round-trips are unacceptable. Apple's marketed "38 TOPS" figure, the researchers warn, is misleading; actual peak throughput measured by bypassing CoreML paints a different picture, detailed in Part 2 of the series.
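
The "parameterized configuration" observation is worth dwelling on: because an E5 binary stores op descriptors and shape parameters rather than unrolled instructions, its size barely moves with tensor dimensions. A minimal sketch of the idea (the descriptor format below is invented for illustration and bears no relation to the actual E5 layout):

```python
import json

def fake_graph_binary(m: int, n: int, k: int) -> bytes:
    """Serialize a matmul the way a graph-execution engine can:
    shapes are data fields in a descriptor, not generated code."""
    desc = {"op": "matmul", "m": m, "n": n, "k": k, "dtype": "f16"}
    return json.dumps(desc).encode()

small = fake_graph_binary(128, 128, 128)
large = fake_graph_binary(1024, 1024, 1024)
# The 64x larger matmul costs only the extra digits in its shape fields.
assert len(large) - len(small) == 3
```

This is why a 1024×1024 matmul and a 128×128 matmul can compile to nearly identical byte counts: the "binary" is closer to a filled-in form than to machine code.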

On the firmware security front, researcher coderush published the final installment of the Hydroph0bia series: a SecureBoot bypass (CVE-2025-4275) affecting UEFI firmware from Insyde H2O, with massive supply chain impact. Ten days past embargo, only Dell had shipped BIOS fixes. Lenovo set remediation for no earlier than July 30. HP, Acer, and Fujitsu remained silent. The fix itself is instructive: Insyde added calls to remove shadow variables and registered VarCheck handlers to prevent them being set from the OS. But the researcher assessed the fix as insufficient: an attacker with SPI programming hardware or a flash write protection bypass could still circumvent both protections. The real fix, he argues, is to stop using NVRAM for security-sensitive applications entirely. Insyde's security team acknowledged the limitation, confirming regressions forced a weaker fix and promising a proper remediation within six months (more: https://coderush.me/hydroph0bia-part3/).

Qwen's Breakout Season

Alibaba's Qwen3.5 continues to cement itself as the open-weight model to beat. The 122B mixture-of-experts variant, with roughly 10B active parameters per token, now runs on a consumer-grade 3×RTX 3090 setup (72GB VRAM) at 25 tokens per second using Q3_K quantization. Community reports confirm it passes the "car wash test," a spatial-reasoning probe that exploits training-set bias toward "walk vs. drive" answers: asked whether to walk or drive to a car wash 100m away, weaker models suggest walking. Early adopters on IQ2_KL quants report 50+ tokens per second on dual 3090s with the ik_llama runtime, and one user running NVFP4 on 4×3090s reports 110 tokens per second with 265K context and no speed degradation even past 100K tokens (more: https://www.reddit.com/r/LocalLLaMA/comments/1rf2ulo/qwen35_122b_in_72gb_vram_3x3090_is_the_best_model/).
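
The VRAM figure is plausible as a back-of-envelope: at roughly 3.5 bits per weight (an assumed average for a Q3_K-style quant mix; real GGUF sizes vary by the per-tensor recipe), the 122B weights land near 53GB, leaving headroom on three 24GB cards for KV cache and activations:

```python
# Back-of-envelope VRAM estimate; 3.5 bits/weight is an assumption
# for a Q3_K-style mix, not a measured figure.
params = 122e9
bits_per_weight = 3.5
weights_gb = params * bits_per_weight / 8 / 1e9

assert 50 < weights_gb < 56           # ~53.4 GB of weights
assert 3 * 24 - weights_gb > 15       # 15+ GB left on 3x3090 for KV cache
```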

Qwen3.5 also turns out to be natively multimodal. Image understanding works out of the box with llama.cpp; the key is adding a `modalities` configuration block to the opencode JSON file specifying text and image as input types. At 35B parameters, users report over 100 tokens per second with tool-use and scheduler plugins, turning it into a functional vision-capable local agent (more: https://www.reddit.com/r/LocalLLaMA/comments/1rgxr0v/qwen_35_is_multimodal_here_is_how_to_enable_image/).
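
The post describes the change as a `modalities` block inside the model's entry in the opencode JSON config. A sketch of the shape, where the surrounding keys and model name are illustrative and only the `modalities` block with text and image inputs is what the post specifies:

```json
{
  "models": {
    "qwen3.5-35b": {
      "modalities": {
        "input": ["text", "image"],
        "output": ["text"]
      }
    }
  }
}
```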

Perplexity entered the embedding space with pplx-embed, a family built on diffusion-pretrained Qwen3 backbones using multi-stage contrastive learning. Two variants ship: pplx-embed-v1 for independent texts (no instruction prefix needed) and pplx-embed-context-v1 for context-aware document chunks. Both produce int8-quantized embeddings, cutting storage costs while reportedly outperforming Google and Alibaba on retrieval benchmarks (more: https://www.reddit.com/r/LocalLLaMA/comments/1rfkdjk/pplxembed_stateoftheart_embedding_models_for/).
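
The int8 claim is the storage story: quantized embeddings take a quarter of the space of float32 vectors. A generic symmetric per-vector scheme (shown purely as an illustration of the trade-off, not pplx-embed's actual quantization) makes the mechanics concrete:

```python
def quantize_int8(vec):
    """Symmetric per-vector int8 quantization: scale by the max
    magnitude so every component fits in [-127, 127]."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # guard all-zero vectors
    return [round(x / scale) for x in vec], scale

def dequantize(q, scale):
    return [x * scale for x in q]

q, s = quantize_int8([0.5, -1.0, 0.25])
assert all(-127 <= x <= 127 for x in q)
# Reconstruction error is bounded by half a quantization step.
assert abs(dequantize(q, s)[1] + 1.0) < 1e-6
```

One byte per dimension instead of four, at the cost of a small, bounded reconstruction error per component.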

On the voice synthesis front, faster-qwen3-tts delivers real-time Qwen3-TTS inference using pure CUDA graph capture: no Flash Attention, no vLLM, no Triton, just `torch.cuda.CUDAGraph`. The technique captures the entire decode step (both the 28-layer Talker and the 5-layer Code Predictor transformers) and replays it as a single GPU operation, eliminating Python overhead between ~500 small kernel launches per step. On an RTX 4090, throughput jumps from 0.82× to 4.78× real-time, with time-to-first-audio dropping from 800ms to 156ms. On a Jetson AGX Orin, the 0.6B model goes from 0.179× (slower than real-time) to 1.307× (usable streaming), a 7.3× speedup (more: https://github.com/andimarafioti/faster-qwen3-tts).
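
The capture-and-replay mechanism follows PyTorch's documented CUDA Graphs recipe; a condensed sketch (function and buffer names here are illustrative, static shapes are required, and actually running it needs a CUDA device):

```python
import torch

def capture_decode_step(step_fn, static_inputs, warmup=3):
    """Capture one full decode step into a CUDA graph so each replay is
    a single GPU launch instead of hundreds of small kernel launches."""
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(warmup):            # warm up kernels and allocator
            static_out = step_fn(*static_inputs)
    torch.cuda.current_stream().wait_stream(s)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):          # record the whole step once
        static_out = step_fn(*static_inputs)

    def replay(new_inputs):
        # The graph replays fixed memory addresses, so new data must be
        # copied into the captured tensors before each replay.
        for dst, src in zip(static_inputs, new_inputs):
            dst.copy_(src)
        graph.replay()
        return static_out
    return replay
```

The fixed-address constraint is why the payoff concentrates in decode loops like TTS: hundreds of tiny kernels per token, each previously paying Python launch overhead, collapse into one replay call.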

Orchestrating the Agent Swarm

The AI coding agent era has arrived at its coordination problem. Running one agent in a terminal is trivial; running thirty across different issues, branches, and PRs is a logistics nightmare. Two new open-source projects attack this from complementary angles.

agentchattr is a local chat server where AI agents and humans share a Slack-like workspace with channels. Claude Code, Codex, and Gemini CLI all connect via MCP (Model Context Protocol), and when anyone @mentions an agent, the server auto-injects a prompt into that agent's terminal. Agents can wake each other up, coordinate autonomously, and report back without manual copy-pasting. A per-channel loop guard pauses after N agent-to-agent hops to prevent runaway conversations, and human @mentions always pass through even when the guard is active. The token overhead is surprisingly modest: ~850 tokens for tool definitions in the system prompt, plus ~40 tokens per message for JSON metadata. Multi-instance support means you can run multiple Claudes with distinct identities and colors (more: https://github.com/bcurts/agentchattr).
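
The loop-guard behavior fits in a few lines. This is an assumed reconstruction of the logic as described (pause after N consecutive agent-to-agent hops, always pass human @mentions), not agentchattr's actual code:

```python
class LoopGuard:
    """Per-channel guard against runaway agent-to-agent conversations."""

    def __init__(self, max_hops=5):
        self.max_hops = max_hops
        self.hops = 0

    def allow(self, sender_is_human: bool) -> bool:
        if sender_is_human:
            self.hops = 0        # human messages reset the counter and always pass
            return True
        self.hops += 1           # count consecutive agent-to-agent hops
        return self.hops <= self.max_hops

g = LoopGuard(max_hops=2)
assert g.allow(False) and g.allow(False)  # two agent hops pass
assert not g.allow(False)                 # third hop is paused
assert g.allow(True)                      # a human @mention still gets through
```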

ComposioHQ's Agent Orchestrator takes a more structured approach: each agent gets its own git worktree, its own branch, and its own PR. CI failures trigger automatic fix attempts. Reviewer comments get routed to the agent for resolution. The system is agent-agnostic (Claude Code, Codex, Aider), runtime-agnostic (tmux, Docker), and tracker-agnostic (GitHub, Linear), with eight swappable plugin slots covering every abstraction from workspace isolation to notification routing (more: https://github.com/ComposioHQ/agent-orchestrator).
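
The worktree-per-agent isolation is plain git underneath: each agent gets an independent checkout and branch of the same repository. A minimal sketch (the branch and path naming convention below is invented for illustration):

```python
import subprocess

def spawn_agent_workspace(repo: str, agent_id: str, base: str = "main") -> str:
    """Create an isolated git worktree and branch for one agent."""
    branch = f"agent/{agent_id}"
    path = f"{repo}/.worktrees/{agent_id}"
    # `git worktree add -b <branch> <path> <base>` gives the agent its own
    # working directory without cloning the repository again.
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", branch, path, base],
        check=True,
    )
    return path
```

Each agent then commits on its own branch, so a broken run never dirties another agent's checkout, and teardown is a `git worktree remove`.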

While these tools solve the engineering coordination problem, the theoretical foundations of multi-agent coordination are advancing in parallel. A Tsinghua-ByteDance team published OMAD, among the first online off-policy multi-agent reinforcement learning frameworks using diffusion policies. The core innovation is a relaxed policy objective that maximizes scaled joint entropy, enabling coordinated exploration without tractable likelihood computation, a fundamental limitation of diffusion models that has previously confined them to offline and single-agent settings. Within the CTDE (centralized training, decentralized execution) paradigm, they employ a joint distributional value function to optimize decentralized diffusion policies simultaneously. Results across 10 continuous control tasks in MPE and MAMuJoCo show 2.5× to 5× gains in sample efficiency over state-of-the-art baselines (more: https://arxiv.org/abs/2602.18291v1).

Creative AI Goes Native

Corridor Digital's Niko Pueringer released CorridorKey, a neural network that solves the green screen "unmixing" problem from first principles. Unlike conventional AI roto tools that output a binary mask, CorridorKey predicts the true un-multiplied straight foreground color and a clean linear alpha channel for every pixel, including motion blur, out-of-focus edges, and translucent regions. It outputs 16-bit and 32-bit linear float EXR files natively for integration with Nuke, Fusion, or Resolve, handles 4K plates via dynamic resolution scaling from a 2048×2048 backbone, and includes morphological cleanup for tracking markers. Hardware requirements are steep: inference at native resolution needs ~22.7GB VRAM (at least a 3090), and the optional GVM alpha hint generator demands 80GB+. The license permits commercial use for processing footage but prohibits repackaging the tool for sale or offering it as a paid API service, a pragmatic approach to open-sourcing creative tools while protecting against commoditization (more: https://github.com/nikopueringer/CorridorKey).
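
The "unmixing" framing comes straight from the standard compositing equation: an observed pixel is observed = alpha*F + (1 - alpha)*B for straight (un-premultiplied) foreground F, backing color B, and coverage alpha. Recovering F and alpha per pixel, rather than a hard mask, is what makes recompositing over a new background exact even at blurred or translucent edges. A toy sketch of the forward direction:

```python
def composite(F, alpha, new_bg):
    """Linear over-composite of a straight foreground color F with
    coverage alpha onto a new background color, per channel."""
    return [alpha * f + (1.0 - alpha) * b for f, b in zip(F, new_bg)]

# A 50%-coverage motion-blurred edge pixel placed over a green backing:
F = [1.0, 0.2, 0.1]          # straight foreground RGB
alpha = 0.5
observed = composite(F, alpha, [0.0, 1.0, 0.0])
assert observed == [0.5, 0.6, 0.05]
```

A binary mask collapses alpha to 0 or 1, which is exactly why conventional roto tools produce green fringing on blur and glass; predicting the pair (F, alpha) sidesteps that.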

On the audio side, LoopMaker runs AI music generation entirely on-device using Apple's MLX framework. Built natively in Swift for macOS, it generates tracks up to four minutes long with optional lyrics and vocals across six genre modes, all on M-series chips with zero internet required after the initial model download. The pitch is simple: when you stop paying Suno or Stable Audio, your workflow doesn't break (more: https://www.reddit.com/r/LocalLLaMA/comments/1rjpegn/built_a_music_generation_app_that_runs_100/). ACE-Step 1.5, currently trending on HuggingFace, continues pushing the open music generation envelope (more: https://huggingface.co/ACE-Step/Ace-Step1.5). And in the visualization space, Sherlock, a two-year solo project, offers a declarative DSL for cinematic math and physics animations that compiles directly into animated scenes, with its own syntax, compiler, runtime, CLI, and live preview (more: https://www.reddit.com/r/learnmachinelearning/comments/1rehpka/i_spent_2_years_building_sherlock_a_brandnew/).

The Developer's Reckoning

The vibe coding quality crisis continues to sharpen. A candid Reddit thread captures the trap: you build an entire app with Cursor and Claude, it works beautifully, and then you realize adding new features risks breaking code you don't fully understand. Asking the AI to write tests produces tests as brittle as the code itself, leading to more time fixing broken tests than actual bugs. Community responses split into two camps: the "learn robust testing practices" school (strict dependency injection, fakes over mocks, TDD from day one, code review before implementation) and the "plain English first" approach (BDD specs, Playwright codegen for zero-code recording). One commenter built an automated QA system where agents bring up the app, click through it, and curl webhook endpoints, surfacing bugs that unit tests never catch, though the process requires significant babysitting due to permission prompts. The uncomfortable consensus: systematic test design remains non-negotiable, AI-assisted or not (more: https://www.reddit.com/r/ChatGPTCoding/comments/1riqmzx/how_do_you_automate_end_to_end_testing_without/).
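
For the "fakes over mocks" camp, the distinction is concrete: a fake is a tiny real implementation you can assert against, not a patched call recorder. A minimal sketch (the gateway and checkout names are illustrative, not from the thread):

```python
class FakePaymentGateway:
    """An in-memory fake that honors the same interface as the real
    gateway, so tests exercise real control flow instead of mock wiring."""

    def __init__(self):
        self.charges = []

    def charge(self, user_id, cents):
        self.charges.append((user_id, cents))
        return {"status": "ok"}

def checkout(gateway, user_id, cents):
    # Production code takes the gateway as a parameter (dependency
    # injection), so tests can hand it the fake.
    if cents <= 0:
        raise ValueError("amount must be positive")
    return gateway.charge(user_id, cents)

gw = FakePaymentGateway()
assert checkout(gw, "u1", 500)["status"] == "ok"
assert gw.charges == [("u1", 500)]
```

Because the fake holds state, assertions read as facts about behavior ("one charge of 500 was recorded") rather than facts about call plumbing, which is what makes such tests less brittle when AI rewrites the internals.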

One pragmatic response to the visual gap in text-based coding agents: a Claude Code skill that generates Excalidraw diagrams from any input, whether concepts, PDFs, or research notes. The workflow packages a multi-step autonomous process into a single prompt: assess complexity, map visual patterns, generate the JSON, then validate its own output by invoking a Python script to render a PNG and inspect the result. The pattern, a full workflow with embedded code execution checkpoints, generalizes to any agent task requiring self-verification (more: https://www.linkedin.com/posts/cole-medin-727752184_most-of-us-are-visual-learners-but-ai-coding-activity-7434251220078346240-2BsP).
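
The self-verification checkpoint boils down to: generate the artifact, then programmatically check it before declaring success. A minimal validator sketch for the JSON half of that loop (the skill's actual script goes further and renders a PNG; the field checks here reflect the standard Excalidraw file format, with `type` and an `elements` array):

```python
import json

def validate_excalidraw(doc_json: str) -> bool:
    """Reject structurally invalid output before the expensive render step."""
    doc = json.loads(doc_json)
    assert doc.get("type") == "excalidraw", "not an Excalidraw document"
    assert isinstance(doc.get("elements"), list) and doc["elements"], "no elements"
    return True

ok = validate_excalidraw('{"type": "excalidraw", "elements": [{"type": "rectangle"}]}')
assert ok
```

Wiring a check like this into the prompt as an explicit step is what turns a one-shot generation into a generate-verify-repair loop.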

For developers who live in terminals and want better tooling for reviewing AI-generated code, `dv` offers a Go-based diff viewer with split and unified views, intraline highlighting, synchronized scrolling, mouse support, theming, and a command palette, all without leaving the terminal (more: https://github.com/darrenburns/dv). On the retrieval front, ReasonDB takes a different path from vector search: it preserves document structure as a hierarchy (headings, sections, paragraphs), builds bottom-up LLM summaries at each node, then navigates the tree via BM25 narrowing and beam-search traversal, visiting ~25 nodes out of millions instead of searching a flat embedding index. Built in Rust with redb and tantivy, it exposes an SQL-like language where `SEARCH` is BM25 and `REASON` triggers LLM-guided traversal (more: https://www.reddit.com/r/LocalLLaMA/comments/1rf4pwa/reasondb_opensource_document_db_where_the_llm/). And for those extending their agentic toolkit, the Nano-Banana Plugin offers a minimal Claude Code plugin template for experimentation (more: https://github.com/reggiechan74/cc-plugins/tree/main/nano-banana-plugin).
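
ReasonDB's traversal is essentially beam search over a summary tree: score the children of the current frontier, keep the best few, descend, and repeat. A stripped-down sketch of that navigation loop (the node format and scoring function are placeholders for its BM25/LLM scoring):

```python
def beam_traverse(root, score, beam_width=3, depth=4):
    """Beam search down a summary tree: score each frontier node's
    children, keep the best beam_width, descend. Returns the final
    frontier and how many nodes were actually visited."""
    frontier, visited = [root], 0
    for _ in range(depth):
        children = [c for node in frontier for c in node["children"]]
        if not children:
            break
        visited += len(children)
        children.sort(key=lambda n: score(n["summary"]), reverse=True)
        frontier = children[:beam_width]
    return frontier, visited

# Toy two-level tree: only the scored children get visited, never the
# whole corpus, which is the point of the approach.
tree = {"summary": "handbook", "children": [
    {"summary": "HR policies", "children": []},
    {"summary": "security incident response", "children": []},
]}
top, seen = beam_traverse(tree, score=lambda s: float("security" in s), beam_width=1)
assert top[0]["summary"] == "security incident response" and seen == 2
```

The visited count is the key property: it grows with beam width times depth, not with corpus size, which is how ~25 node visits can stand in for a scan over millions.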

Sources (20 articles)

  1. Meta's AI smart glasses and data privacy concerns (svd.se)
  2. [Editorial] AI Search Index (aisearchindex.com)
  3. Inside the M4 Apple Neural Engine, Part 1: Reverse Engineering (maderix.substack.com)
  4. Hydroph0bia – fixed SecureBoot bypass for UEFI firmware from Insyde H2O (2025) (coderush.me)
  5. Qwen3.5 122B in 72GB VRAM (3x3090) is the best model available at this time – also it nails the "car wash test" (reddit.com)
  6. Qwen 3.5 is multimodal. Here is how to enable image understanding in opencode with llama cpp (reddit.com)
  7. pplx-embed: State-of-the-Art Embedding Models for Web-Scale Retrieval (reddit.com)
  8. andimarafioti/faster-qwen3-tts (github.com)
  9. bcurts/agentchattr (github.com)
  10. [Editorial] ComposioHQ Agent Orchestrator (github.com)
  11. Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies (arxiv.org)
  12. nikopueringer/CorridorKey (github.com)
  13. Built a music generation app that runs 100% on-device using Apple's MLX framework no cloud, no API calls (reddit.com)
  14. ACE-Step/Ace-Step1.5 (huggingface.co)
  15. I spent 2 years building Sherlock – a brand-new programming language for cinematic math animations (reddit.com)
  16. How do you automate end to end testing without coding when you vibe coded the whole app (reddit.com)
  17. [Editorial] Visual Learning for AI Coding (linkedin.com)
  18. darrenburns/dv (github.com)
  19. ReasonDB – open-source document DB where the LLM navigates a tree instead of vector search (RAG alternative) (reddit.com)
  20. [Editorial] Claude Code Nano-Banana Plugin (github.com)