AI landscape shifts, competition sharpens
The latest State of AI Report paints a sharper competitive map: OpenAI still leads at the frontier, but China has become a credible number two, with DeepSeek, Qwen, and Kimi closing the gap on reasoning and coding. The report also tracks a broader pivot toward agentic systems (models that plan, reflect, and self-correct over longer horizons) and notes that embodied AI is beginning to reason step-by-step before acting in the physical world. On adoption, the shift from experimentation to enterprise spend is unmistakable: 44% of U.S. businesses now pay for AI tools, average contracts hit $530,000, and a practitioner survey finds 95% using AI at work or home. Compute is the new choke point; multi-gigawatt data centers backed by sovereign funds signal the "industrial era" of AI, even as the policy conversation moves from existential risk to reliability and cyber resilience. (more: https://www.stateof.ai/)
Safety research has entered a pragmatic phase. Models can imitate alignment under supervision, raising transparency questions; and external safety orgs now operate on budgets smaller than a frontier lab's daily burn. Meanwhile, regulation diverges: the U.S. leans "America-first AI," Europe's AI Act stumbles, and China expands open-weights ecosystems and domestic silicon ambitions. The net: faster capability progress, wider deployment, and rising pressure to measure (and govern) what matters. (more: https://www.stateof.ai/)
As capabilities broaden, the report's takeaways spotlight a practical reality: specialization and verifiable reasoning increasingly beat general-purpose hype. That theme runs through the week's launches and papers below, across extraction, efficiency, agents, and security, and underscores why rigorous evaluation and disciplined workflows matter more than ever. (more: https://www.stateof.ai/)
Small, specialized models surge
Inference.net claims its small Schematron models (3B and 8B) rival frontier systems for one job (extracting strict JSON from messy HTML) while running 10× faster at 40-80× lower cost. The 8B variant, fine-tuned from Llama-3.1-8B and distilled from a frontier teacher, reportedly scores 4.64 in an LLM-as-a-judge evaluation versus 4.74 for GPT-4.1; the 3B scores 4.41. A 128K window and training to preserve accuracy at the edge aim to keep schema compliance near 100% for long pages. Caveats apply: claims are task-specific, "LLM-as-a-judge" is imperfect, and the teacher is unnamed, but the strategy (curate web data, synthesize schemas, distill to small targets) is sound for HTML-to-JSON workloads. (more: https://www.reddit.com/r/LocalLLaMA/comments/1o8m0ti/we_built_3b_and_8b_models_that_rival_gpt5_at_html/)
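For teams building similar pipelines, the core discipline is cheap to adopt regardless of which model you deploy: prompt against an explicit schema, then validate before accepting output. A minimal sketch, where call_extractor is a hypothetical wrapper around your extraction model:

```python
import json
from jsonschema import validate  # pip install jsonschema

# Explicit schema: the contract the extractor must satisfy.
SCHEMA = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "price": {"type": "number"}},
    "required": ["title", "price"],
    "additionalProperties": False,
}

def extract(html: str) -> dict:
    raw = call_extractor(html, schema=SCHEMA)  # hypothetical model call
    data = json.loads(raw)                     # reject malformed JSON early
    validate(data, SCHEMA)                     # raises ValidationError on schema drift
    return data
```

Rejecting at the schema boundary is what makes "near-100% compliance" measurable in the first place.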
Efficiency advances are arriving from multiple angles. On CPUs, Google Cloud's new C4 instances (Intel Xeon 6/Granite Rapids) show 1.4-1.7× higher normalized throughput per vCPU on an open-source Mixture-of-Experts (MoE) "GPT OSS" model compared to prior-gen C3 (Xeon 4th gen), translating to a similar TCO advantage at parity pricing. An optimization merged into Transformers avoids experts processing tokens they weren't routed to, eliminating wasted FLOPs and making CPU inference viable for large MoEs that activate only a subset of parameters per token. (more: https://huggingface.co/blog/gpt-oss-on-intel-xeon)
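The routing fix is easy to picture. A minimal PyTorch sketch of the idea (not the actual Transformers patch): gather only the tokens each expert was assigned, and skip experts that received none.

```python
import torch

def moe_forward(x, router, experts, top_k=2):
    """x: [tokens, hidden]; each token is routed to its top_k experts."""
    weights, idx = torch.topk(router(x).softmax(-1), top_k)  # [tokens, top_k]
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        tok, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
        if tok.numel() == 0:
            continue  # unrouted expert: no FLOPs spent at all
        out[tok] += weights[tok, slot].unsqueeze(-1) * expert(x[tok])
    return out
```

With 1/32 activation ratios, the savings from never touching unrouted experts dominate on CPU, where there is no massively parallel hardware to hide the waste.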
On the model side, two open releases emphasize long-context multimodality and efficient sparsity. Qwen3-VL-8B-Instruct brings a native 256K context (expandable to 1M), stronger spatial/video grounding, upgraded OCR in 32 languages, and "Visual Agent" capabilities to operate GUIs and invoke tools, bridging perception and action for agent workflows. Meanwhile, Ring-flash-linear-2.0 combines hybrid linear/standard attention with a sparse MoE (1/32 expert activation, ~6.1B active params) to hit near-linear time and constant space complexity, claiming 40B-dense-level quality with 128K context and standout throughput for long inputs and outputs. (more: https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) (more: https://huggingface.co/inclusionAI/Ring-flash-linear-2.0)
Smarter evaluation, less hype
Choosing evaluation tooling is now as strategic as choosing models. A comparative review highlights trade-offs: Langfuse and Arize Phoenix shine at tracing and observability but need custom evals; Braintrust supports dataset-centric regression testing; Vellum and LangSmith help with prompts and chains; Comet brings mature experiment tracking; LangWatch adds lightweight monitoring. Maxim AI leans into "all-in-one" experimentation, evaluation, and observability with automated and human-in-the-loop options, useful for teams wanting fewer stitched-together systems. (more: https://www.reddit.com/r/LocalLLaMA/comments/1o5t7dr/comparing_popular_ai_evaluation_platforms_for_2025/)
Why it matters: model marketing is getting louder. When a small extractor "rivals GPT-5," the right question is "On what benchmark, with which judge, and how does it fail?" The platforms above help catch regressions in real workloads, not just leaderboard deltas. If results don't replicate across your corpus and schema constraints, the cost curve is academic. (more: https://www.reddit.com/r/LocalLLaMA/comments/1o8m0ti/we_built_3b_and_8b_models_that_rival_gpt5_at_html/)
The State of AI Report's adoption data ups the stakes: with 44% of U.S. businesses paying for AI, evaluation debt turns into production debt fast. Building repeatable, dataset-anchored evaluations, and wiring them into CI/CD, keeps "comparable quality" claims honest and ensures performance doesn't degrade as prompts, tools, and context windows evolve. (more: https://www.stateof.ai/)
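A minimal sketch of what "dataset-anchored" means in practice: a pinned eval set, a task-specific scorer, and a pytest-style CI gate against the last accepted baseline (run_extractor is a hypothetical stand-in for the system under test, and the file path and threshold are illustrative):

```python
import json
import pathlib

BASELINE = 0.92  # last accepted mean score; update deliberately, never silently

def score(example: dict, prediction: dict) -> float:
    # Task-specific scorer: exact match on required fields, not a leaderboard delta.
    return float(all(prediction.get(k) == v for k, v in example["expected"].items()))

def test_extraction_regression():
    dataset = [json.loads(line) for line in pathlib.Path("evals/extraction.jsonl").open()]
    preds = [run_extractor(ex["html"]) for ex in dataset]  # system under test
    mean = sum(score(ex, p) for ex, p in zip(dataset, preds)) / len(dataset)
    assert mean >= BASELINE, f"eval regression: {mean:.3f} < {BASELINE:.3f}"
```

Run on every prompt or model change; the assertion turns "comparable quality" from a claim into a gate.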
Local LLMs and home labs
Reports of Ollama's demise are exaggerated. A community thread pushes back on claims of a partnership with OpenAI (there isn't one), notes ongoing updates and model releases, and clarifies that "Kimi K2" variants run locally are quantized/distilled community conversions, not the full proprietary model, which remains cloud-only due to resource demands. Ollama runs GGUF models and can pull from Hugging Face (with caveats for multi-file packages). Still, some users have defected to llama.cpp, LM Studio, or vLLM for performance or reliability on certain machines. The ecosystem remains diverse and opinionated. (more: https://www.reddit.com/r/ollama/comments/1o6sme2/ollama_kinda_dead_since_openai_partnership/)
On DIY training, reproducing Karpathy's NanoChat on one GPU is doable with the right trade-offs. A step-by-step Colab notebook on a single A100 80GB ran smoothly; on smaller GPUs (e.g., RTX 3090), users report lowering device_batch_size, using gradient accumulation, and enabling mixed precision (FP16/BF16) to fit VRAM at the cost of speed and stability tuning (e.g., learning rate). The README suggests single-GPU runs will be ~8× slower without torchrun; patience and memory-savvy hyperparameter tweaks are the tax for local experimentation. (more: https://www.reddit.com/r/LocalLLaMA/comments/1o76ev6/reproducing_karpathys_nanochat_on_a_single_gpu/)
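The VRAM-fitting recipe is a standard PyTorch pattern. A minimal sketch, assuming model, optimizer, and loader are already constructed: shrink the per-device batch, accumulate gradients across several micro-batches, and run the forward pass under autocast.

```python
import torch

accum_steps = 8  # raise this as you lower device_batch_size; effective batch stays fixed

for step, (x, y) in enumerate(loader):
    # BF16 autocast roughly halves activation memory; FP16 would also need a GradScaler.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(x, y) / accum_steps   # average over the virtual batch
    loss.backward()                        # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```

The trade is exactly what the thread reports: the same effective batch size at a fraction of the memory, paid for in wall-clock time.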
If building a "one box to rule them all," the community advice is conservative: don't combine gaming and home-server roles if you care about reliability. Multi-GPU boxes introduce headaches (power, cooling, PCIe layout), and prebuilt options like Mac Studio trade flexibility for unified memory and simplicity, at a premium and with fixed GPUs. GPU procurement is easier than peak-scalper years, but custom builds still win on price/perf if you can tolerate the tinkering. Evaluate whether your AI workloads truly need multi-GPU; CPU-friendly MoEs and efficient 8-30B models often suffice. (more: https://www.reddit.com/r/LocalLLaMA/comments/1o6plzt/best_path_for_a_unified_gaming_ai_server_machine/)
Agent frameworks go practical
Claude Code's underlying agent harness is now the Claude Agent SDK, positioned for far more than coding. It standardizes agent outputs as transparent "message blocks" (text and tool invocations), supports granular permissions, integrates MCP (Model Context Protocol) servers, and can be wired to apps like Telegram and Obsidian for live edits with tool usage traces. In one demo, the agent self-modified to add an MCP server from a phone, while maintaining explainability and centralized policy control: signals of maturing agent operations, not just chat UX. (more: https://www.linkedin.com/posts/cole-medin-727752184_claude-code-is-still-the-best-ai-coding-assistant-activity-7384612228471128064-4Amr)
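One way to see what transparent blocks buy you: the underlying Anthropic Messages API already returns responses as typed text and tool_use blocks. A hedged sketch using that plain API rather than the Agent SDK itself (the tool definition and model name are illustrative):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
msg = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model name
    max_tokens=1024,
    tools=[{
        "name": "read_note",  # hypothetical Obsidian-style tool
        "description": "Read a note by path",
        "input_schema": {"type": "object",
                         "properties": {"path": {"type": "string"}},
                         "required": ["path"]},
    }],
    messages=[{"role": "user", "content": "Summarize today's note"}],
)
for block in msg.content:           # every step is an inspectable, typed block
    if block.type == "text":
        print("assistant:", block.text)
    elif block.type == "tool_use":  # the auditable tool-invocation trace
        print(f"tool call: {block.name}({block.input})")
```

Because every tool call surfaces as a discrete block, permissioning and audit logging can hang off the block stream rather than off scraped chat text.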
Vendors are converging on portability. Oracle's Open Agent Spec proposes a framework-agnostic, declarative way to define agents and flows (e.g., ReAct or business processes), with SDKs to serialize/deserialize JSON/YAML and runtimes that adapt specifications to concrete frameworks. The goal: compose multi-agent systems once and execute across stacks with fewer rewrites. (more: https://github.com/oracle/agent-spec)
Agent frameworks are proliferating in the npm ecosystem too. "Agentic Flow" markets an agent framework that "gets smarter and faster every time it runs," and the page doubles as a timely reminder of platform security: npm token lifetimes and 2FA rules are tightening, with classic tokens slated for revocation. If you're scripting CI agents around package registries, update auth flows now to avoid surprise outages. (more: https://www.npmjs.com/package/agentic-flow)
Coding with multi-model orchestration
Developers already route work across multiple models, and the pain is context transfer. A survey thread describes a common pattern: use a "planner" model (e.g., Claude Sonnet 4.5) as an orchestrator, then delegate to specialized agents (e.g., Grok Code Fast for implementation, Gemini for web research). The practical fix for handoffs: generate a technical spec up front; tools like OpenCode AI let you define agents, tools, and rules, set per-agent temperatures, and run tasks in parallel. The more the plan is codified, the less brittle the workflow. (more: https://www.reddit.com/r/ChatGPTCoding/comments/1o6o75u/do_you_use_multiple_ai_models_for_coding_trying/)
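A minimal sketch of the orchestration shape the thread converges on, with call_model as a hypothetical per-provider wrapper; the point is that the spec, not a lossy chat transcript, is the handoff artifact:

```python
# Hypothetical provider wrapper: call_model(model_name, prompt) -> str
AGENTS = {
    "plan":      lambda task: call_model("claude-sonnet-4.5", f"Write a technical spec for: {task}"),
    "research":  lambda q:    call_model("gemini", f"Research and summarize: {q}"),
    "implement": lambda spec: call_model("grok-code-fast", f"Implement this spec:\n{spec}"),
}

def run_task(task: str) -> str:
    spec = AGENTS["plan"](task)       # codify the plan up front
    notes = AGENTS["research"](task)  # delegate context-gathering
    # The implementer receives a spec plus curated context, not another model's chat history.
    return AGENTS["implement"](f"{spec}\n\nBackground research:\n{notes}")
```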
A complementary mindset, "compounding engineering," treats AI systems as assets that learn from every interaction. Teams maintain living artifacts (CLAUDE.md, llms.txt), encode preferences and patterns, and wire sub-agents that write, review, and argue to surface better answers. Results cited include week-long features landing in days and automated code reviews based on months of prior feedback. The caution from practitioners: avoid "universal lessons"; compounding works best as project-specific context that grows with each PR. (more: https://www.reddit.com/r/ClaudeAI/comments/1o8wb10/the_compounding_engineering_mindset_changed_how_i/)
Together, these threads point to a near-term equilibrium: agentic coding succeeds when plans are explicit, responsibilities are modular, and knowledge accrues locally to the codebase. It's less "prompt the one true model" and more "design the system that designs the system," with guardrails that make delegation auditable. (more: https://www.reddit.com/r/ChatGPTCoding/comments/1o6o75u/do_you_use_multiple_ai_models_for_coding_trying/) (more: https://www.reddit.com/r/ClaudeAI/comments/1o8wb10/the_compounding_engineering_mindset_changed_how_i/)
RAG and memory pragmatics
A new RAG technique from Meta's Superintelligence group drew polarized reactions: some decried influencer-driven hype, while others (including an SWE working on production RAG) found it useful. Regardless of the commentary, the paper's existence underscores continuous iteration on retrieval, reasoning, and context management. A link to the paper sits atop the discussion; if you care about production RAG, skip the blog takes and read the method. (more: https://www.reddit.com/r/LocalLLaMA/comments/1o5auc8/meta_superintelligence_group_publishes_paper_on/)
For agent memory, SQLite is a compelling default. One practitioner outlines storing f32 embeddings as blobs with precomputed norms, adding small Rust functions for cosine similarity, and leveraging pragmas (mmap, cache) for microsecond retrieval. Each agent spins up a local DB, fully in-memory for speed or on-disk with WAL for persistence, to recall past runs, measure reasoning shifts, and compress older data into "memory graphs." Heavy vector DBs have their place (e.g., cross-agent global retrieval), but for local reasoning, reflection, and fast context, SQLite is simple, fast, and self-contained. (more: https://www.linkedin.com/posts/reuvencohen_a-lot-of-people-ask-me-why-i-use-sqlite-as-activity-7384582901063081984-sxvx)
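A minimal Python sketch of the same pattern (the post registers small Rust functions; sqlite3's create_function does the same job here): embeddings stored as little-endian f32 blobs, norms precomputed at insert time, cosine similarity exposed as a SQL function.

```python
import math
import sqlite3
import struct

db = sqlite3.connect(":memory:")  # or a file path, with PRAGMA journal_mode=WAL, for persistence
db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, text TEXT, emb BLOB, norm REAL)")

def pack(vec):
    """f32 blob plus precomputed norm, so queries never re-derive it."""
    return struct.pack(f"<{len(vec)}f", *vec), math.sqrt(sum(x * x for x in vec))

def cosine(blob_a, norm_a, blob_b, norm_b):
    a = struct.unpack(f"<{len(blob_a) // 4}f", blob_a)
    b = struct.unpack(f"<{len(blob_b) // 4}f", blob_b)
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

db.create_function("cosine", 4, cosine)  # registered similarity, callable from SQL

emb, norm = pack([0.1, 0.9, 0.2])  # stand-in for a real embedding
db.execute("INSERT INTO memories (text, emb, norm) VALUES (?, ?, ?)", ("first run", emb, norm))
q_emb, q_norm = pack([0.2, 0.8, 0.1])
top = db.execute(
    "SELECT text, cosine(emb, norm, ?, ?) AS sim FROM memories ORDER BY sim DESC LIMIT 5",
    (q_emb, q_norm),
).fetchall()
```

For per-agent corpora of thousands of rows, a linear scan like this is plenty fast; the heavy vector DB only earns its keep at cross-agent, global scale.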
Taken together: new RAG papers are worth a read, but robust memory often hinges on pragmatic, low-overhead stores and careful curation. Before adding another microservice, ask if a per-agent SQLite plus good chunking and schema discipline gets you under your latency SLO. (more: https://www.reddit.com/r/LocalLLaMA/comments/1o5auc8/meta_superintelligence_group_publishes_paper_on/) (more: https://www.linkedin.com/posts/reuvencohen_a-lot-of-people-ask-me-why-i-use-sqlite-as-activity-7384582901063081984-sxvx)
Trust, verification, and AI ROI
Deloitte Australia's refund over a genAI-authored report with nonexistent citations is a reminder: verification beats vibes. An editorial reimagines Asimov's laws for today: models should admit "I don't know," and IT leaders must not injure their employers by skipping verification. The uncomfortable conclusion: strict verification will reduce the rosy ROI some executives expect, but if validation kills the ROI, perhaps it wasn't real in the first place. Treat AI outputs like off-the-record tips: use them to guide questions, then do the legwork. (more: https://www.computerworld.com/article/4070466/asimovs-three-laws-updated-for-the-genai-age.html)
The same skepticism applies to analytics. A marketer analyzing 200 e-commerce sites found an average of 73% fake traffic, including bots engineered to mimic "quality" engagement with uncanny regularities (e.g., constant dwell times, scripted cart behavior). After aggressive filtering, one client saw traffic down 71% but sales up 34%. The piece also distinguishes "good bots" (e.g., large-scale retail scraping for stock/price intelligence) from fraudulent traffic, and notes the platform incentive problem: filtering out bots would crater ad revenues. Audit spikes versus sales, hunt for "too perfect" metrics, and trust your domain intuition. (more: https://joindatacops.com/resources/how-73-of-your-e-commerce-visitors-could-be-fake)
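The "too perfect" heuristic is easy to automate. A minimal sketch (the threshold is an assumption to tune per site): real visitors produce noisy dwell times, so a near-zero coefficient of variation across a traffic segment is a red flag.

```python
import statistics

def looks_scripted(dwell_times_sec: list[float], cv_threshold: float = 0.05) -> bool:
    """Flag a traffic segment whose dwell times are suspiciously uniform."""
    mean = statistics.mean(dwell_times_sec)
    cv = statistics.stdev(dwell_times_sec) / mean  # coefficient of variation
    return cv < cv_threshold

print(looks_scripted([30.1, 29.9, 30.0, 30.2]))  # True: near-constant, bot-like
print(looks_scripted([12.0, 85.0, 4.0, 41.0]))   # False: human-like variance
```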
As model usage goes mainstream, governance isn't optional. Evaluation platforms reduce model risk; disciplined RAG/memory reduces freshness and hallucination risk; and analytics sanity checks reduce marketing waste. The throughline: verify first, automate second. (more: https://www.reddit.com/r/LocalLLaMA/comments/1o5t7dr/comparing_popular_ai_evaluation_platforms_for_2025/)
Side-channels, deepfakes, access control
A clever side-channel shows how hardware progress creates new attack surfaces: high-DPI mice (~20,000 dpi) with high sampling rates can pick up pad vibrations, enabling malware to reconstruct speech at roughly 60% accuracy under ideal conditions or track nearby movement. It's theoretical, requires prior compromise of the host, and is vulnerable to noise, but peripheral telemetry isn't routinely monitored by security suites; worth adding to threat models as devices get better. (more: https://hackaday.com/2025/10/15/attack-turns-mouse-into-microphone/)
Meanwhile, deepfake voice detection remains an active research front. A new arXiv preprint argues it's "all in the presentation," highlighting that how audio is presented can make or break detectors. The implication for defenders: don't overfit to easy-mode inputs; test across realistic playback/recording chains and adversarial conditions. (more: https://arxiv.org/abs/2509.26471v1)
On the defensive side, Thand offers a just-in-time, open-source PAM agent that eliminates standing admin access across local systems, cloud IAM, and SaaS. It orchestrates deterministic grant/revoke workflows with Temporal.io, keeps ephemeral servers stateless, logs all requests for compliance, and ties access to identity with automatic revocation when users go off-task. For orgs rolling out agentic automation, this is the kind of least-privilege infrastructure that narrows blast radius. (more: https://github.com/thand-io/agent)
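What "deterministic grant/revoke with Temporal" looks like in miniature, sketched with the Temporal Python SDK; the activities and module are hypothetical stand-ins, not Thand's actual implementation:

```python
import asyncio
from datetime import timedelta

from temporalio import workflow

with workflow.unsafe.imports_passed_through():
    from access_activities import grant_access, revoke_access  # hypothetical activities

@workflow.defn
class JitAccessWorkflow:
    @workflow.run
    async def run(self, user: str, role: str, ttl_minutes: int) -> None:
        await workflow.execute_activity(
            grant_access, args=[user, role],
            start_to_close_timeout=timedelta(seconds=30),
        )
        # Durable timer: persists across worker restarts, so revocation cannot be lost.
        await asyncio.sleep(ttl_minutes * 60)
        await workflow.execute_activity(
            revoke_access, args=[user, role],
            start_to_close_timeout=timedelta(seconds=30),
        )
```

The appeal of the pattern is that revocation is part of the same durable workflow as the grant, rather than a cron job you hope fires.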
Copy-and-patch JIT, explained
If you enjoy low-level craft, a tutorial on "copy-and-patch" shows how to build a baseline JIT without writing assembly. The idea: implement tiny C "stencils" for each bytecode-like operation, compile them to native fragments, then at JIT time, memcpy fragments back-to-back and patch relocation holes for constants and addresses. You get native code in the same ballpark as traditional baseline JITs, with minimal compiler wizardry. (more: https://transactional.blog/copy-and-patch/tutorial)
The walkthrough compiles stencils with clang, inspects relocations via objdump, and emits a small JIT engine that concatenates fragments and flips memory permissions (mmap + mprotect) to execute. A simple example specializes a function at runtime to compute 1 + 2 by overwriting placeholders, illustrating how tiered interpreters can cheaply erase dispatch overhead. Macros make declaring 32/64-bit holes and function pointers ergonomic. (more: https://transactional.blog/copy-and-patch/tutorial)
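The end-to-end loop fits in a few lines of Python if you pre-bake the stencil bytes. A Linux/x86-64-only toy (assuming the OS permits a writable-and-executable mapping, which hardened systems may refuse) that patches the 32-bit hole in a `mov eax, imm32; ret` stencil:

```python
import ctypes
import mmap
import struct

# Stencil: mov eax, imm32; ret -- the four zero bytes at offset 1 are the "hole".
STENCIL = bytes([0xB8, 0x00, 0x00, 0x00, 0x00, 0xC3])
HOLE_OFFSET = 1

def jit_constant(value: int):
    code = bytearray(STENCIL)
    struct.pack_into("<I", code, HOLE_OFFSET, value)  # patch the relocation hole
    buf = mmap.mmap(-1, len(code),
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(bytes(code))                            # "memcpy" the fragment in
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    return ctypes.CFUNCTYPE(ctypes.c_uint32)(addr), buf  # keep buf alive with the fn

fn, _buf = jit_constant(1 + 2)  # specialize the function at runtime
print(fn())                     # 3
```

The real tutorial chains many stencils and multiple hole kinds; this just shows the copy step and the patch step that give the technique its name.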
Why it matters: as agent frameworks and runtimes multiply, baseline JITs remain a pragmatic path to speed without committing to heavyweight optimizing compilers. Whether you're prototyping a DSL for tool calls or instrumenting agent chains, copy-and-patch keeps performance wins accessible and maintainable. (more: https://transactional.blog/copy-and-patch/tutorial)
Sources (22 articles)
- [Editorial] Getting more out of Claude Code SDK (www.linkedin.com)
- [Editorial] Agentic Flow - AI Agent Framework That Gets Smarter AND Faster Every Time It Runs (www.npmjs.com)
- [Editorial] Sqlite vector (www.linkedin.com)
- [Editorial] Asimov's three laws - updated for the genAI age (www.computerworld.com)
- We built 3B and 8B models that rival GPT-5 at HTML extraction while costing 40-80x less - fully open source (www.reddit.com)
- Meta Superintelligence group publishes paper on new RAG technique (www.reddit.com)
- Comparing Popular AI Evaluation Platforms for 2025 (www.reddit.com)
- Reproducing Karpathy's NanoChat on a Single GPU - Step by Step with AI Tools (www.reddit.com)
- Best path for a unified Gaming, AI & Server machine? Custom build vs. Mac Studio/DGX Spark (www.reddit.com)
- Ollama kinda dead since OpenAI partnership. Virtually no new models, and kimi2 is cloud only? Why? I run it fine locally with lmstudio. (www.reddit.com)
- Do you use multiple AI models for coding? Trying to validate a workflow problem (www.reddit.com)
- The "Compounding Engineering" mindset changed how I think about AI coding tools (www.reddit.com)
- thand-io/agent (github.com)
- oracle/agent-spec (github.com)
- I analyzed 200 e-commerce sites and found 73% of their traffic is fake (joindatacops.com)
- Copy-and-Patch: A Copy-and-Patch Tutorial (transactional.blog)
- State of AI Report 2025 (www.stateof.ai)
- inclusionAI/Ring-flash-linear-2.0 (huggingface.co)
- Qwen/Qwen3-VL-8B-Instruct (huggingface.co)
- Attack Turns Mouse into Microphone (hackaday.com)
- On Deepfake Voice Detection -- It's All in the Presentation (arxiv.org)
- Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face (huggingface.co)