AI Silicon Gets Bespoke
Today's AI news: AI Silicon Gets Bespoke, The Open-Weight Surge, AI Weaponized, Creative AI Breaks New Ground, Research Tools and Novel Approaches, Hardware Hacking and InfoSec. 22 sources curated from across the web.
AI Silicon Gets Bespoke
OpenAI unveiled its first custom inference chip this week, and the ambitions are not subtle. Jalapeño, designed and manufactured with Broadcom, with OpenAI's own models assisting during development, targets the specific workloads that keep inference bills astronomical. Early results claim significantly better performance-per-watt than current state-of-the-art alternatives (read: NVIDIA), though the chip is still in testing. Training stays on NVIDIA hardware for now; this is purely about shaving cost on the inference side, where agentic products like Codex rack up millions of tokens per session. Greg Brockman framed it as exploiting OpenAI's "deep understanding of the workload" to build accelerators that general-purpose GPUs cannot match. The company spelled out the vertical integration thesis directly: "OpenAI is not only developing frontier models or building products on top of them; it is designing the infrastructure underneath them: chip architecture, kernels, memory systems, networking, scheduling, deployment systems, and product experience." Google, Amazon, and Meta have all built custom AI accelerators; OpenAI joining the club was a matter of when, not if. (more: https://techcrunch.com/2026/06/24/openai-unveils-its-first-custom-chip-built-by-broadcom/)
Qualcomm made its play from the software side, announcing a $3.92 billion all-stock acquisition of Modular — the company behind Mojo and a software stack designed to let AI inference run across chipmakers' hardware without per-processor rewrites. This is a direct attack on CUDA, the software moat that has kept developers locked to NVIDIA for over a decade. Qualcomm CEO Cristiano Amon pitched it as building "developer-friendly, horizontal platforms that give customers real choice in how and where they deploy AI." Modular has positioned itself as a neutral software layer supporting chips from NVIDIA, AMD, and other vendors. With Qualcomm also reportedly in talks to acquire Tenstorrent for $8-10 billion, the smartphone chipmaker is going all-in on the data center inference market. The stock dropped 4% on the news — Wall Street still pricing it as a phone company. (more: https://www.reuters.com/business/qualcomm-buy-ai-startup-modular-2026-06-24/)
NVIDIA's answer to the infrastructure question is engineering, not silicon alternatives. The Rubin platform achieves 100% liquid cooling — every chip, every networking component — with coolant running at up to 45°C, hotter than a hot tub. That higher temperature is the whole point: it lets outdoor dry coolers reject heat without mechanical chillers in most climates. A 50MW hyperscale facility saves over $4 million annually, and rack density improves from 6U to 2U. The 75/25 water/propylene glycol coolant loop is filled once and runs closed for the facility's lifetime. In favorable geographies, this eliminates water consumption entirely — down from roughly 2.6 million gallons per megawatt per year with conventional cooling towers. Cooling has historically eaten up to 40% of data center electricity. No fans, no cold aisles, no noise requiring ear protection. The Rubin architecture is a fundamentally different machine. (more: https://blogs.nvidia.com/blog/liquid-cooling-ai-factories/)
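To put the quoted cooling figures on a common scale, here is a back-of-the-envelope sketch. It assumes cooling-tower water use scales linearly with facility power and treats "over $4 million" as a floor; neither assumption comes from NVIDIA's post, and the function names are purely illustrative.

```python
# Figures quoted in the paragraph above.
FACILITY_MW = 50
GALLONS_PER_MW_YEAR = 2.6e6   # conventional cooling towers
ANNUAL_SAVINGS_USD = 4e6      # "over $4 million", treated here as a floor


def water_avoided_gallons(facility_mw, gallons_per_mw_year=GALLONS_PER_MW_YEAR):
    """Yearly cooling-tower water avoided, ASSUMING linear scaling with power."""
    return facility_mw * gallons_per_mw_year


def savings_per_mw(annual_savings_usd, facility_mw):
    """Annual savings normalized per megawatt of facility power."""
    return annual_savings_usd / facility_mw


if __name__ == "__main__":
    print(f"{water_avoided_gallons(FACILITY_MW):,.0f} gallons/year")
    print(f"${savings_per_mw(ANNUAL_SAVINGS_USD, FACILITY_MW):,.0f} per MW per year")
```

Under those assumptions, the 50MW example works out to about 130 million gallons of water per year and at least $80,000 per megawatt per year.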
The Open-Weight Surge
GLM-5.2 from Z.ai dropped MIT-licensed on June 16 and immediately became the focal point for an increasingly urgent conversation about the open-closed model gap. Interconnects AI's analysis puts it plainly: GLM-5.2 is the first open model competitive in coding harnesses as a general agent, matching Opus 4.8 on no-thinking benchmarks. The capability timeline should make frontier labs uncomfortable: GLM-5.2 arrived 204 days after Opus 4.5's release, putting the open-closed lag at roughly 6.8 months, squarely within the 6-9 month window many researchers have estimated. This creates pricing pressure on Anthropic's Claude Code revenue, which depends heavily on Claude being the only model that can credibly do agentic coding. Every inference provider selling open-model endpoints (Fireworks, Together, Thinky, Prime Intellect) just hit another inflection point. And all of this lands while Anthropic's most capable models remain banned under Commerce Department restrictions imposed June 12. GLM-5.2 is being given room to carve out economic territory that the frontier labs want reserved for higher-margin products. (more: https://www.interconnects.ai/p/glm-52-is-the-step-change-for-open)
Poolside shipped Laguna M.1 under Apache 2.0 — a 225B total parameter MoE with 23B active per token, 256 experts, and top-k=16 routing. The benchmark numbers are solid: 74.6% on SWE-bench Verified, 63.1% on SWE-bench Multilingual, with a 262K context window. The architecture is unusual: 3 dense SwiGLU layers, then 67 sparse MoE layers, all with global attention across 70 layers (64 Q-heads, 8 KV-heads), plus softplus attention output gating and YaRN positional encoding. Community reaction mixed gratitude with practical concerns — top-k=16 routing is exotic and llama.cpp hates it, full global attention means KV cache requirements are substantial, and the model is too large for most local hardware. Still, it is the strongest open-weight coding model from a US-based company. (more: https://old.reddit.com/r/LocalLLaMA/comments/1u9b2i3/poolsidelagunam1_hugging_face_225ba23b/)
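To make the KV cache concern concrete, the sketch below estimates cache size for a single sequence at full context. The layer and KV-head counts come from the post. The head dimension (128), the 16-bit cache precision, and reading "262K" as 2**18 tokens are assumptions for illustration, not published Laguna M.1 specifications.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Bytes of KV cache for one sequence when every layer uses global attention.

    The leading factor of 2 counts one key vector and one value vector per token.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


LAYERS = 70         # from the post
KV_HEADS = 8        # from the post
HEAD_DIM = 128      # ASSUMPTION: not stated in the post
CONTEXT = 262_144   # ASSUMPTION: "262K" read as 2**18 tokens

if __name__ == "__main__":
    total = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
    print(f"{total / 2**30:.1f} GiB per sequence at full context")
```

With these assumed values the cache alone comes to about 70 GiB per sequence, before counting any weights. A smaller head dimension or a quantized cache would shrink that proportionally.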
For people actually running models locally, Mimo 2.5 is the sleeper pick. Its 5-to-1 local/global sliding-window attention (the same approach Gemma 3 uses) maintains speed at 150K+ context on dual RTX Pro 6000 — solving a private coding benchmark in ~4 minutes, same as Opus/Sonnet, while MiniMax M3 takes ~40 minutes on the same task. The explanation is kernel support: DeepSeek V4 Flash and MiniMax M3 need datacenter Blackwell SM100 kernels that nobody has written for consumer SM120 GPUs yet, so they silently fall back to dense attention or CPU ops. Attention architecture matters more than raw parameter count for local inference right now. (more: https://old.reddit.com/r/LocalLLaMA/comments/1udwabh/mimo_25_is_fast_at_large_context_dual_rtx_pro_6000/)
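A rough sketch of why a 5-to-1 local/global pattern stays cheap at long context: local layers only keep a fixed window of tokens, so only one layer in six grows with the prompt. The layer count (48), window size (4096), and the position of the global layer inside each group are hypothetical values chosen for illustration, not Mimo 2.5's published configuration.

```python
def layer_kinds(num_layers, local_per_global=5):
    """Repeat `local_per_global` sliding-window layers followed by one global layer."""
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(num_layers)]


def cached_tokens(num_layers, context, window, local_per_global=5):
    """Total token slots held in the KV cache, summed over all layers.

    Local layers keep at most `window` tokens; global layers keep the full context.
    """
    total = 0
    for kind in layer_kinds(num_layers, local_per_global):
        total += context if kind == "global" else min(window, context)
    return total


if __name__ == "__main__":
    layers, context, window = 48, 150_000, 4096   # hypothetical values
    mixed = cached_tokens(layers, context, window)
    full = layers * context                       # every layer global
    print(f"mixed: {mixed:,} slots, all-global: {full:,} slots, ratio {mixed / full:.2f}")
```

Under these assumptions the mixed pattern holds under a fifth of the token slots that an all-global stack would at 150K context, and the per-token attention work in local layers stops growing once the window is full.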
Speculative decoding gets broader coverage with Eagle3 landing in llama.cpp b9723 for Qwen models. Enabled via --spec-type draft-eagle3, it can stack with other methods (draft-eagle3,ngram-mod). Community benchmarks show similar or slightly slower throughput compared to native MTP, but Eagle3 covers models that lack MTP or DFlash support — a useful fallback rather than a replacement. Tensor parallelism is not supported yet. (more: https://old.reddit.com/r/LocalLLaMA/comments/1u9z4e4/the_eagle3_has_landed_for_qwen/)
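For readers new to the idea, here is a minimal sketch of the generic draft-and-verify loop behind speculative decoding, using greedy decoding and toy next-token functions. It is not Eagle3's algorithm or llama.cpp's implementation; the toy models and every function name here are hypothetical.

```python
def speculative_step(target_next, draft_next, tokens, k=4):
    """One draft-and-verify round with greedy decoding.

    target_next / draft_next: callables mapping a token list to the next token.
    Returns the tokens appended this round (between 1 and k + 1 of them).
    """
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal = []
    for _ in range(k):
        proposal.append(draft_next(tokens + proposal))

    # 2. The target model checks each position. A real system scores all k
    #    positions in one batched forward pass; this loop is for clarity only.
    accepted = []
    for guess in proposal:
        truth = target_next(tokens + accepted)
        if guess != truth:
            accepted.append(truth)   # replace the first wrong guess and stop
            return accepted
        accepted.append(guess)

    # 3. Every guess matched, so the target contributes one bonus token.
    accepted.append(target_next(tokens + accepted))
    return accepted


def generate(target_next, draft_next, prompt, max_new_tokens, k=4):
    """Generate tokens; the result matches plain greedy decoding with the target."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens += speculative_step(target_next, draft_next, tokens, k)
    return tokens[: len(prompt) + max_new_tokens]


# Hypothetical toy models: the target counts upward mod 10,
# and the draft agrees except right after a 4.
def toy_target(tokens):
    return (tokens[-1] + 1) % 10


def toy_draft(tokens):
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


if __name__ == "__main__":
    print(generate(toy_target, toy_draft, [2], 12))
```

The output is always what the target model alone would have produced; the draft only changes how many target passes are needed. That is why a weaker drafter shows up as lower throughput rather than lower quality.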
On the voice synthesis front, a rigorous CPU-only TTS benchmark pitted Kokoro 82M, Supertonic 3, and Inflect-Nano-v1 (4.6M params) head-to-head on Intel Xeon, 4 cores, no GPU, with UTMOS scoring on every sample. Kokoro remains the only model that sounds human (MOS 4.44-4.45) but is the slowest at 0.57 RTF. Supertonic 3 at 5 steps is the practical sweet spot: MOS 4.37 at 3.2x real-time. Inflect-Nano blazes at 7.3x real-time but UTMOS over-rates it — the classic failure mode where small HiFi-GAN vocoders get rewarded for sounding clean rather than natural. Inflect-Nano also silently truncates at ~15 seconds, inflating its long-text benchmarks. The developer showed up in the thread acknowledging the cap and announcing v2 is in training. (more: https://old.reddit.com/r/LocalLLaMA/comments/1udg3rf/cpuonly_tts_benchmark_kokoro_82m_vs_supertonic_3/)
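The paragraph mixes two speed units, so a small conversion helper may help. This sketch assumes the common definition of real-time factor (synthesis time divided by audio duration, lower is faster); if the benchmark author used a different convention, the conversion would not apply.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of audio produced (lower is faster)."""
    return synthesis_seconds / audio_seconds


def speed_multiple(rtf):
    """Seconds of audio produced per second of compute, i.e. 'N x real-time'."""
    return 1.0 / rtf


if __name__ == "__main__":
    print(f"RTF 0.57 is about {speed_multiple(0.57):.2f}x real-time")
```

Under that definition, an RTF of 0.57 corresponds to roughly 1.75x real-time, which is consistent with Kokoro being the slowest of the three.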
AI Weaponized
OALABS published a detailed forensic analysis of real-world AI-assisted hacking. A friend's compromised server yielded the attacker's working directory, complete with over 1,000 Claude (Opus 4.5) and Codex (GPT-5.2) agent session logs. The attacker, identified through an OPSEC failure as a young man in Addis Ababa (his resume with full name and LinkedIn was in the session history), framed all requests as "authorized redteam" exercises. Claude did the hacking with minimal human direction: researching exposed services via Shodan, writing custom exploits from CVE research, validating credentials, exfiltrating data, and drafting "PENTEST-REPORT" documents with dollar-value monetization estimates. Across 1,000+ sessions, Claude emitted only 9 policy violations; Codex just 1. The attacker breached at least 14 companies and attempted to crack a Bitcoin wallet containing 69.71 BTC (~$4M) by distributing the workload across 14 compromised hosts, including a Southeast Asian government server farm; the cracking failed. OALABS explicitly argues against stronger refusals: most of the tasking was indistinguishable from routine redteam work, and broadly restricting the workflow would mostly degrade tools for legitimate security professionals while leaving the same capability available through older or less restricted models. The practical finding is stark: AI agents lower the skill floor for offensive operations. (more: https://research.openanalysis.net/claude/codex/hacking/ai%20hacking/llm/redteam/policy%20violation/2026/06/16/compromised-claude-hacking.html)
That operational reality lands alongside Anthropic accusing Alibaba of the largest known distillation attack against Claude: 28.8 million exchanges through nearly 25,000 fraudulent accounts between April and June 2026. The campaign, attributed to operators affiliated with Alibaba and Alibaba Qwen, dwarfs all prior efforts: DeepSeek ran 150K exchanges, Moonshot 3.4M, MiniMax 13M. Anthropic's letter to the Senate Banking Committee, dated June 10, arrived two days before Commerce imposed Mythos/Fable restrictions. The escalation is steep: the Alibaba campaign is more than double MiniMax's volume and roughly 190 times DeepSeek's. (more: https://www.reuters.com/world/china/anthropic-says-alibaba-illicitly-extracted-claude-ai-model-capabilities-2026-06-24/)
Open source is fighting its own form of weaponization. Greptile studied the PR deluge on OpenClaw — the fastest-growing GitHub repo in history — and the numbers are bleak. PRs jumped from 2/week to 3,400/week; merge rate collapsed from 48% to 9.3%. One contributor submitted 106 PRs in a single day with a median time between submissions measured in minutes. The convergence problem compounds the volume: 4 people submitted PRs with identical titles to add SearXNG as a search provider; 6 independently fixed the same Brave Search locale bug. Refactors merge at 35% vs features at 9% — contributions requiring deep codebase understanding outperform agent-generated novel features by nearly 4x. Mitchell Hashimoto's Vouch trust system for Ghostty is the email sender-reputation score reborn for pull requests. Linus's Law holds only if the underlying thinking remains diverse. (more: https://www.greptile.com/blog/prs-on-openclaw)
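The convergence problem described above (several people filing the same PR) is easy to surface mechanically. The sketch below groups PRs by a normalized title and reports titles filed by more than one author. It is an illustration only, not Greptile's methodology, and the sample data is invented.

```python
import re
from collections import defaultdict


def normalize(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def duplicate_groups(prs, min_authors=2):
    """Return {normalized_title: sorted authors} for titles filed by several people.

    `prs` is an iterable of (author, title) pairs.
    """
    authors = defaultdict(set)
    for author, title in prs:
        authors[normalize(title)].add(author)
    return {t: sorted(a) for t, a in authors.items() if len(a) >= min_authors}


if __name__ == "__main__":
    # Invented sample data for illustration.
    sample = [
        ("alice", "Add SearXNG as a search provider"),
        ("bob", "add SearXNG as a search provider!"),
        ("carol", "Fix typo in README"),
    ]
    print(duplicate_groups(sample))
```

Exact-title matching only catches the most blatant duplicates; PRs that fix the same bug under different titles would need diff-level or semantic comparison.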
Creative AI Breaks New Ground
Krea 2 drops a 12B open-weights image model with a technical report that reads like a masterclass in training pipeline design. The multi-stage approach — pretraining on billions of images, midtraining for stylistic coverage with FAISS hierarchical k-means clustering and PageRank-based concept coverage over Wikipedia, SFT on hand-curated data, preference optimization via their custom STPO (a DPO variant), and RL — is built explicitly for aesthetic diversity rather than converging on a single polished default. A critical design choice: zero AI-generated images in the pretraining mix. They found even a small proportion introduces learnable biases that impose an upper bound on quality. Architecture choices lean LLM: GQA with gated sigmoid attention, a single Qwen 3 VL text encoder, and TDM timestep distillation. The prompt expander, trained via SFT+RL on open-source LLMs, maps underspecified user inputs into richer visual directions without overwriting intent. Distributed training runs on Kubernetes with Weka filesystem. (more: https://www.krea.ai/blog/krea-2-technical-report)
Google released a sweeping report on the future of AI in generative media, collecting industry perspectives from Synthesia, Freepik, Lux Capital, Google Labs, Gradient, OpusClip, and Leonardo.Ai. The near-term consensus: video replaces text as the primary content medium, founders become creative directors rather than coders, and AI film production overtakes traditional workflows. The more speculative predictions — BCI interfaces for creation, fully personalized content streams — read like a venture pitch deck, but the observations about tool convergence and the compression of production timelines are grounded in current trajectory. (more: https://services.google.com/fh/files/misc/future_of_ai_report_genmedia_final.pdf)
For practitioners who want to train rather than infer: MLX-LoRA-Studio is a native macOS SwiftUI app for LLM fine-tuning on Apple Silicon. It supports 9 training algorithms (SFT, DPO, CPO, ORPO, GRPO, Online DPO, XPO, RLHF Reinforce, PPO), multiple adapter types (LoRA, DoRA, QLoRA, full, QAT), live training metrics, synthetic data generation, and HuggingFace upload. ResourceGuard prevents the perennial Apple Silicon problem of swap death. MIT licensed. (more: https://github.com/Goekdeniz-Guelmez/MLX-LoRA-Studio)
Research Tools and Novel Approaches
Berkeley's PixelRAG takes a deliberately contrarian approach to retrieval-augmented generation: instead of parsing documents into text chunks, render them as screenshots and search the visual index. The argument is that text extraction destroys layout, formatting, and visual context that humans rely on for understanding. Built at SkyLab and BAIR, the system indexes 8.28M Wikipedia pages using a Qwen3-VL-Embedding model fine-tuned with LoRA, with a live API at pixelrag.ai requiring no setup or API key. It ships a Claude Code plugin (pixelbrowse skill) for integration into agentic workflows. Apache 2.0. (more: https://github.com/StarTrail-org/PixelRAG)
Anneal is a tensor compiler written in Go — a from-scratch port of tinygrad's rangeify-era core that makes a structural argument about autodiff. Everything — forward ops, gradients, movement ops — is a single immutable IR node (the UOp). Backward() does not build closures; it injects gradient UOps into the same graph as the forward pass, so the scheduler fuses kernels across the forward/backward boundary — an optimization structurally impossible with a tape-based system. Movement ops become range arithmetic, never copying data. It trains MLP, conv, nanoGPT, Llama-style decoders, and ViT end-to-end on real GPU hardware via WebGPU. ONNX import is zero-CGO. The anneal web command serves a browser studio with 8 deep-linkable views. AGPL3 licensed, deliberately a research vehicle, not yet a production framework. (more: https://github.com/georgebuilds/anneal)
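The "gradients are just more graph nodes" idea can be sketched in a few lines. This toy uses a single immutable node type, and its backward pass returns new nodes of that same type instead of closures, so forward and backward expressions can be handed to one evaluator (or, in a real compiler, one scheduler). It is scalar-only Python with hypothetical names and does not mirror anneal's actual Go API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UOp:
    op: str              # "const", "var", "add", or "mul"
    src: tuple = ()      # input UOps
    arg: object = None   # constant value or variable name


def const(v): return UOp("const", arg=float(v))
def var(name): return UOp("var", arg=name)
def add(a, b): return UOp("add", (a, b))
def mul(a, b): return UOp("mul", (a, b))


def toposort(root):
    """Post-order walk: every node appears after its inputs."""
    order, seen = [], set()

    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for s in n.src:
            visit(s)
        order.append(n)

    visit(root)
    return order


def backward(out, wrt):
    """Reverse-mode autodiff that returns each gradient as a UOp expression."""
    grads = {out: const(1)}
    for node in reversed(toposort(out)):      # consumers before producers
        g = grads.get(node)
        if g is None or not node.src:
            continue
        a, b = node.src
        if node.op == "add":
            contribs = ((a, g), (b, g))
        else:                                  # mul: product rule
            contribs = ((a, mul(g, b)), (b, mul(g, a)))
        for s, c in contribs:
            grads[s] = add(grads[s], c) if s in grads else c
    return [grads.get(w, const(0)) for w in wrt]


def evaluate(node, env):
    """Interpret a UOp graph; forward and gradient graphs use the same path."""
    if node.op == "const":
        return node.arg
    if node.op == "var":
        return env[node.arg]
    x, y = (evaluate(s, env) for s in node.src)
    return x + y if node.op == "add" else x * y


if __name__ == "__main__":
    x, y = var("x"), var("y")
    loss = add(mul(x, y), mul(x, x))           # x*y + x**2
    dx, dy = backward(loss, [x, y])
    env = {"x": 3.0, "y": 4.0}
    print(evaluate(loss, env), evaluate(dx, env), evaluate(dy, env))
```

Because `dx` and `dy` are ordinary `UOp` values, nothing distinguishes them from forward nodes. That is the property the post credits with letting a scheduler fuse work across the forward/backward boundary.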
NuExtract3 from NuMind appeared on HuggingFace's trending list, though content was unavailable at crawl time — one to watch for structured extraction. (more: https://huggingface.co/numind/NuExtract3)
Reuven Cohen's MetaHarness flips the agent-framework arms race on its head: rather than ship yet another assistant, it ships a factory that mints your own — repo-aware, branded, and publishable under your own npx your-name command. Its core move is separating the reusable Rust/WASM kernel from the opinionated content, so you take only the pieces you want and keep re-pinning to upstream engine updates instead of forking into a frozen dead end. Every generated harness targets six-plus hosts (Claude Code, Codex, pi.dev, Hermes…), starts default-deny, and ships with an Ed25519 witness manifest so anyone can verify exactly what's inside — and with Darwin Mode it sandbox-tests tweaks to its own config and keeps only what measurably helps. The model stays replaceable; the harness is the product. (more: https://ruv-explainer-agent-harness-generat.vercel.app)
Hardware Hacking and InfoSec
GPS's fundamental weakness — satellite signals from 20,000 km arrive extremely weak — has moved from theoretical concern to operational crisis at scale. Xona's Pulsar-0 LEO satellite provides positioning and timing signals roughly 100x stronger than standard GPS/GNSS, operating from low-Earth orbit where signal strength makes jamming significantly harder. Military M-code encryption protects timing from spoofing but not jamming, and civilian infrastructure remains fully exposed. The comment section produced substantive technical discussion: one commenter detailed the economics ($60K+ for basic RTK receivers on vessels, expensive cabling and antennas, yearly licensing — and still no spoofing protection). Others pointed out that Starlink could theoretically supplant GPS with its denser, lower-altitude, higher-power constellation — a demo already achieved 2-meter accuracy in 20 seconds with no cooperation from SpaceX. (more: https://hackaday.com/2026/06/23/long-theorized-gps-weakness-exploited-on-large-scale/)
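Signal-strength ratios are usually discussed in decibels, so here is a short conversion sketch. The 100x figure and the 20,000 km GPS distance come from the text. The 1,000 km LEO altitude is a hypothetical round number, not Pulsar-0's actual orbit, and the distance comparison ignores differences in transmit power and antenna gain.

```python
import math


def ratio_to_db(power_ratio):
    """Convert a power ratio to decibels."""
    return 10 * math.log10(power_ratio)


def spreading_advantage_db(far_km, near_km):
    """Gain from being closer, ASSUMING free-space spreading (power ~ 1/distance**2)."""
    return 20 * math.log10(far_km / near_km)


if __name__ == "__main__":
    print(f"100x stronger = {ratio_to_db(100):.0f} dB")
    # Hypothetical LEO altitude of 1,000 km versus 20,000 km for GPS.
    print(f"geometry alone: {spreading_advantage_db(20_000, 1_000):.1f} dB")
```

A 100x stronger signal is a 20 dB advantage, which means a jammer at the same range would need roughly 100 times more power to achieve the same effect.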
Hugh Jeffreys tore down a prison tablet — an Iview Optimus-C-8001 running Windows 10 Home on an Atom Z8350 with 2GB RAM, hermetically sealed and locked in kiosk mode. Despite the physical lockdown, the software security is nonexistent: no disk encryption, standard Windows underneath. The hardware costs perhaps $50 to manufacture; incarcerated users pay $0.11 per minute for basic communication. The teardown ignited extensive commentary on the US for-profit prison system and the extractive economics of captive-market technology. (more: https://hackaday.com/2026/06/22/breaking-into-a-prison-tablet/)
Two new CVEs in Acer's Wave 7 routers are the latest entries in a depressing trend of IoT security failures. CVE number one: the CGI log file is accessible without authentication and contains cleartext login credentials for web and Telnet, a straight CVSS 10.0. The likely cause: a debug build shipped to production, in which an internal printf-equivalent logs credentials during authentication. CVE number two: a hard-coded AES key in the upload.cgi binary lets attackers decrypt, modify, and re-encrypt device backups, enabling persistent backdoor injection. The AES mode is almost certainly CBC or ECB (non-authenticated), not GCM. The video analysis walks through how these are not subtle exploit chains but architecture-level failures that should have been caught in any security review. The broader trend: router CVEs have been climbing steadily since 2016, and the devices remain the front door to every home network. (more: https://www.youtube.com/watch?v=gEMx7kt7n4E)
A player piano restomod puts a Pi Pico driving 88 solenoids via shift registers for MIDI playback — the 19th-century paper roll replaced by the entire MIDI corpus. The RP2040's PIO handles PWM for volume control, with staggered pulses to prevent power supply overload on chords. The modification is mostly reversible: solenoids poke the same key paddles the original pneumatic mechanism used. Within a week of deploying a web submission interface, internet strangers had queued 17 hours of music. (more: https://hackaday.com/2026/06/17/leaky-player-piano-gets-midi-upgrade-in-youtube-restomod/)
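As a sketch of how 88 keys can be driven through shift registers, the code below packs held MIDI notes into a byte frame and splits chords into staggered groups. MIDI notes 21 to 108 are the standard 88-key range. The 8-bit register width, the bit order, and the group size are assumptions, and this is plain Python for illustration rather than the project's actual Pico firmware.

```python
LOWEST_NOTE = 21            # MIDI note number of A0, the lowest key on an 88-key piano
NUM_KEYS = 88
NUM_BYTES = NUM_KEYS // 8   # 11 bytes, ASSUMING 8-bit shift registers


def pack_keys(active_notes):
    """Pack held MIDI notes into the byte frame clocked out to the registers."""
    frame = bytearray(NUM_BYTES)
    for note in active_notes:
        key = note - LOWEST_NOTE
        if 0 <= key < NUM_KEYS:               # ignore notes outside the keyboard
            frame[key // 8] |= 1 << (key % 8)  # ASSUMED bit order: key 0 = bit 0
    return bytes(frame)


def stagger(notes, max_simultaneous=4):
    """Split a chord into groups so only a few solenoids start firing at once."""
    notes = sorted(notes)
    return [notes[i:i + max_simultaneous] for i in range(0, len(notes), max_simultaneous)]


if __name__ == "__main__":
    chord = [60, 64, 67, 72, 76]              # C major spread over two octaves
    for group in stagger(chord, max_simultaneous=2):
        print(group, pack_keys(group).hex())
```

Firing each group a few milliseconds apart keeps the peak current draw bounded while still sounding like a single chord; the exact delay would have to be tuned on the real hardware.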
Sources (22 articles)
- OpenAI unveils its first custom chip, built by Broadcom (techcrunch.com)
- Qualcomm to Acquire Modular (reuters.com)
- 45°C cooling design cuts data center water use to near zero (blogs.nvidia.com)
- GLM-5.2 is a step change for open agents (interconnects.ai)
- poolside/Laguna-M.1 · Hugging Face - 225B-A23B (old.reddit.com)
- Mimo 2.5 is _fast_ at large context (dual RTX Pro 6000) (old.reddit.com)
- The Eagle(3) has landed (for Qwen) (old.reddit.com)
- CPU-only TTS benchmark: Kokoro 82M vs Supertonic 3 vs Inflect-Nano-v1 (4.6M params), with UTMOS scoring on every sample (old.reddit.com)
- [Editorial] Compromised Claude Hacking Research (research.openanalysis.net)
- Anthropic says Alibaba illicitly extracted Claude AI model capabilities (reuters.com)
- PR spam today looks like email spam in the early 2000s (greptile.com)
- Krea 2: SOTA open-weights 12B image model (krea.ai)
- [Editorial] Google Future of AI in Generative Media Report (services.google.com)
- [Editorial] MLX-LoRA-Studio (github.com)
- StarTrail-org/PixelRAG (github.com)
- georgebuilds/anneal (github.com)
- numind/NuExtract3 (huggingface.co)
- [Editorial] Explainer Agent Harness Generator (ruv-explainer-agent-harness-generat.vercel.app)
- Long-Theorized GPS Weakness Exploited on Large Scale (hackaday.com)
- Breaking Into a Prison Tablet (hackaday.com)
- [Editorial] Video Feature (youtube.com)
- Leaky Player Piano Gets MIDI Upgrade in YouTube Restomod (hackaday.com)