LLMs
Model releases, capabilities, comparisons, architectures
260 articles across 69 editions
Articles
- [Editorial] LLM Processing Internals -- 2026-02-18
- Krites: Asynchronous Verified Semantic Caching for Tiered LLM Architectures -- 2026-02-18
- The Strix Halo feels like an amazing super power [Activation Guide] -- 2026-02-18
- [Editorial] https://windley.com/archives/2026/02/a_policy-aware_agent_loop_with_cedar_and_openclaw.shtml -- 2026-02-12
- Open Source Kreuzberg benchmarks and new release -- 2026-02-12
- [NVIDIA Nemotron] How can I assess general knowledge on a benchmaxxed model? -- 2026-02-12
- I built a rough .gguf LLM visualizer -- 2026-02-12
- Local-First Fork of OpenClaw for using open source models--LocalClaw -- 2026-02-12
- Measuring output stability across LLM runs (JSON drift problem) -- 2026-02-09
- BalatroBench - Benchmark LLMs' strategic performance in Balatro -- 2026-02-09
- llama.cpp performance breakthrough for multi-GPU setups -- 2026-01-06
- Llama 3.2 3B fMRI LOAD BEARING DIMS FOUND -- 2026-01-06
- Hyperbolic Math w Mac GPU acceleration -- 2026-01-06
- I built a local voice assistant that learns new abilities via auto-discovered n8n workflows exposed as tools via MCP (LiveKit + Ollama + n8n) -- 2025-12-29
- exllamav3 adds support for GLM 4.7 (and 4.6V, + Ministral & OLMO 3) -- 2025-12-29
- Tencent just released WeDLM 8B Instruct on Hugging Face -- 2025-12-29
- Gen 3D with local llm -- 2025-12-29
- Offline vector DB experiment — anyone want to test on their local setup? -- 2025-12-29
- Roo Code 3.37 | GLM 4.7 | MM 2.1 | Custom tools | MORE!!! -- 2025-12-29
- mini-SGLang released: Learn how LLM inference actually works (5K lines, weekend-readable) -- 2025-12-18
- Run Mistral Devstral 2 locally Guide + Fixes! (25GB RAM) - Unsloth -- 2025-12-18
- I vibe coded (I hope) useful tool for local LLMs inference -- 2025-12-18
- I built a local Python agent that catches stderr and self-heals using Ollama. No cloud APIs involved. (Demo) -- 2025-12-18
- mistralai/Devstral-Small-2-24B-Instruct-2512 -- 2025-12-18
- running Deepseek v32 on consumer hardware llama.cpp/Sglang/vLLm -- 2025-12-15
- Found a REAP variant of Qwen3-coder that I can use for 100K tokens in Roo Code on my macbook -- 2025-12-15
- Understanding the new router mode in llama cpp server -- 2025-12-15
- Letting a local Ollama model judge my AI agents and it’s surprisingly usable -- 2025-12-15
- [Editorial] https://www.linkedin.com/posts/stuart-winter-tear_assessing-llms-for-serendipity-discovery-activity-7396596796938153984-JY9u -- 2025-12-12
- Confused and unsure -- 2025-12-10
- The "Confident Idiot" Problem: Why LLM-as-a-Judge fails in production. -- 2025-12-10
- We gave 5 LLMs $100K to trade stocks for 8 months -- 2025-12-08
- An explainer blog on attention, KV-caching, continuous batching -- 2025-11-28
- Hardcore function calling benchmark in backend coding agent. -- 2025-11-26
- Python script to stress-test LangChain agents against infinite loops (Open Logic) -- 2025-11-26
- Browser extension Powered by Ollama for Code Reviews on Gitlab and Azure DO -- 2025-11-26
- Mimir - Oauth and GDPR++ compliance + vscode plugin update -- 2025-11-26
- alpkeskin/gotoon -- 2025-11-26
- Cross-GPU prefix KV reuse with RDMA / NVLink - early experimental results -- 2025-11-13
- When does RTX 6000 Pro make sense over a 5090? -- 2025-11-13
- My (open-source) continuation (FlexAttention, RoPE, BlockMasks, Muon, etc.) to Karpathy's NanoGPT -- 2025-11-13
- Anyone running code model in cpu only VPS? -- 2025-11-13
- Hardware recommendations for Ollama for homelab -- 2025-11-10
- [Editorial] https://www.evokesecurity.com/blogs/prompt-injection-is-for-everyone -- 2025-11-04
- Found a remote file inclusion vulnerability in an AI-generated app before launch -- 2025-11-04
- Launch HN: Propolis (YC X25) – Browser agents that QA your web app autonomously -- 2025-11-04
- Attacking macOS XPC Helpers: Protocol Reverse Engineering and Interface Analysis -- 2025-11-04
- [Research] Unvalidated Trust: Cross-Stage Failure Modes in LLM/agent pipelines arXiv -- 2025-11-04
- AI Agents Reasoning Collapse Imminent (CMU, Berkeley) -- 2025-11-02
- Natural Language Programming: Run Natural Language as Script -- 2025-11-02
- Claude Code is a Beast – Tips from 6 Months of Hardcore Use -- 2025-11-02
- It has been 4 hrs since the release of nanochat from Karpathy and no sign of it here! A new full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable, dependency-lite codebase -- 2025-10-19
- Reproducing Karpathy’s NanoChat on a Single GPU — Step by Step with AI Tools -- 2025-10-17
- Best path for a unified Gaming, AI & Server machine? Custom build vs. Mac Studio/DGX Spark -- 2025-10-17
- Ollama kinda dead since OpenAI partnership. Virtually no new models, and kimi2 is cloud only? Why? I run it fine locally with lmstudio. -- 2025-10-17
- [Editorial] The best ChatGPT that $100 can buy. -- 2025-10-16
- [Editorial] Train your own LLM -- 2025-10-16
- Writing an LLM from scratch, part 22 – training our LLM -- 2025-10-16
- Kwaipilot/KAT-Dev-72B-Exp -- 2025-10-16
- What's the best local LLM for coding I can run on MacBook Pro M4 32Gb? -- 2025-10-12
- How do you benchmark the cognitive performance of local LLM models? -- 2025-10-12
- Granite 4.0 Micro (3.4B) running 100% locally in your browser w/ WebGPU acceleration -- 2025-10-08
- ibm-granite/granite-docling-258M -- 2025-10-08
- What are the best models for legal work in Oct 2025? -- 2025-10-07
- [Update] FamilyBench: New models tested - Claude Sonnet 4.5 takes 2nd place, Qwen 3 Next breaks 70%, new Kimi weirdly below the old version, same for GLM 4.6 -- 2025-10-07
- princeton-pli/RLMT -- 2025-10-07
- TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning -- 2025-10-07
- I created a simple tool to manage your llama.cpp settings & installation -- 2025-10-03
- Looking for a web-based open-source Claude agent/orchestration framework (not for coding, just orchestration) -- 2025-10-03
- Codexia GUI for Codex CLI new features -- 2025-10-03
- ArchGW 🚀 - Use Ollama-based LLMs with Anthropic client (release 0.3.13) -- 2025-10-03
- How am I supposed to know which third party provider can be trusted not to completely lobotomize a model? -- 2025-09-28
- A step by step guide on how to build a LLM from scratch -- 2025-09-28
- YannQi/R-4B -- 2025-09-28
- AI and licensing (commercial use) -- 2025-09-26
- I trained an LLM from scratch AMA! -- 2025-09-26
- pengzhangzhi/Open-dLLM -- 2025-09-26
- Seeking Local LLM Recommendations for AST Generation (by Function Calling) -- 2025-09-24
- Uncensored LLM -- 2025-09-23
- Building RAG Systems at Enterprise Scale: Our Lessons and Challenges -- 2025-09-23
- Can you use a Claude Max account with Cascade? -- 2025-09-22
- Permanently alter context history from function -- 2025-09-22
- How do you use agents.md in codex cli or vs code extension? -- 2025-09-22
- Engineer's Guide to Local LLMs with LLaMA.cpp and QwenCode on Linux -- 2025-09-21
- gpt-oss-20b TTFT very slow with llama.cpp? -- 2025-09-21
- Answer Matching Outperforms Multiple Choice for Language Model Evaluation -- 2025-09-21
- Built an OpenWebUI Mobile Companion (Conduit): Alternative to Commercial Chat Apps -- 2025-09-16
- Local LLM suite on iOS powered by llama cpp - with web search and RAG -- 2025-09-16
- What are the local TTS models with voice cloning? -- 2025-09-16
- [OSS] Beelzebub — “Canary tools” for AI Agents via MCP -- 2025-09-12
- Defeating Nondeterminism in LLM Inference -- 2025-09-12
- This Week in Security: NPM, Kerbroasting, and The Rest of the Story -- 2025-09-12
- Is the "cost of inference" going up or down? -- 2025-09-08
- Show HN: Entropy-Guided Loop – How to make small models reason -- 2025-09-06
- Kwaipilot/KAT-V1-40B -- 2025-09-06
- Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning -- 2025-09-06
- I locally benchmarked 41 open-source LLMs across 19 tasks and ranked them -- 2025-09-05
- Good setup for coder LLM under 12GB VRam and 64GB DDR5? -- 2025-09-04
- openai/gpt-oss-120b -- 2025-09-04
- Some thoughts on LLMs and software development -- 2025-08-30
- LLM speedup breakthrough? 53x faster generation and 6x prefilling from NVIDIA -- 2025-08-29
- Intel Granite Rapids CPU on sale at Newegg up to 65% off MSRP -- 2025-08-29
- unsloth/DeepSeek-V3.1-GGUF -- 2025-08-29
- baichuan-inc/Baichuan-M2-32B -- 2025-08-28
- [Editorial] AI, cve, auto exploitation -- 2025-08-26
- [Editorial] Promptware Attacks Against LLM-Powered Assistants -- 2025-08-26
- [Editorial] AI portscan -- 2025-08-26
- Prompt Obfuscation -- 2025-08-26
- synacktiv/GroupPolicyBackdoor -- 2025-08-26
- DavidBuchanan314/anubis_offload -- 2025-08-26
- Some legend finally posted working quants of GLM-4.5 Air for Ollama -- 2025-08-24
- Had some beginner questions regarding how to use Ollama? -- 2025-08-24
- AI Mode in Search gets new agentic features and expands globally -- 2025-08-24
- Practical approach for streaming UI from LLMs -- 2025-08-24
- New Tool for Finding Why Your LLM Inference is Slow -- 2025-08-14
- I ran OpenAI’s GPT-OSS 20B locally on a 16GB Mac with Ollama — setup, gotchas, and mini demo -- 2025-08-14
- GLM 4.5 Air - Optimizing - Vulkan vs. CUDA? -- 2025-08-14
- google/gemma-3n-E4B -- 2025-08-07
- IntervitensInc/pangu-pro-moe-model -- 2025-08-07
- Welcome GPT OSS, the new open-source model family from OpenAI! -- 2025-08-07
- PSA: zai/glm-4.5 is absolutely crushing it for coding - way better than Claude’s recent performance -- 2025-08-03
- Wan-AI/Wan2.2-TI2V-5B -- 2025-08-03
- moonshotai/Kimi-K2-Instruct -- 2025-08-03
- realtime-ai/blastoff-llm -- 2025-08-02
- On the Predictive Power of Representation Dispersion in Language Models -- 2025-08-01
- [Editorial] The Anatomy of a Modern LLM -- 2025-07-31
- PowerInfer/SmallThinker-21BA3B-Instruct -- 2025-07-31
- Building a quiet LLM machine for 24/7 use, is this setup overkill or smart? -- 2025-07-30
- haykgrigo3/TimeCapsuleLLM -- 2025-07-30
- unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF -- 2025-07-30
- PhysicsWallahAI/Aryabhata-1.0 -- 2025-07-30
- How are people extracting system prompts? -- 2025-07-29
- cherrydra/mcpurl -- 2025-07-29
- LLMs are bad at returning code in JSON -- 2025-07-29
- Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems -- 2025-07-27
- Shanghai AI Lab Just Released a Massive 97-Page Safety Evaluation of Frontier AI Models - Here Are the Most Concerning Findings -- 2025-07-27
- Chain-GPT/Solidity-LLM -- 2025-07-25
- Anyone interested in adding their fine-tuned / open source models to this benchmark? -- 2025-07-25
- Qwen Code: A command-line AI workflow tool, optimized for Qwen3-Coder models -- 2025-07-23
- Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization -- 2025-07-23
- Ignoring instructions? Or am I dumb? (claude.md) -- 2025-07-18
- Open source and free iOS app to chat with your LLMs when you are away from home. -- 2025-07-16
- Requirements and architecture for a good enough model with scientific papers RAG -- 2025-07-16
- Excited to share updates to Open WebUI Starter! New docs, Docker support, and templates for everyone -- 2025-07-16
- (Kramer UI for Ollama) I was tired of dealing with Docker, so I built a simple, portable Windows UI for Ollama. -- 2025-07-11
- BastionChat: Finally got Qwen3 + Gemma3 (thinking models) running locally on iPhone/iPad with full RAG and voice mode -- 2025-07-11
- Which is the best small local LLM models for tasks like doing research and generating insights -- 2025-07-10
- Looking for practical advice with my MSc thesis “On-Premise Orchestration of SLMs” (OpenWebUI + SLM v LLM benchmarking on multiple GPUs) -- 2025-07-10
- High Precision -- 2025-07-09
- skt/A.X-4.0 -- 2025-07-09
- Medical language model - for STT and summarize things -- 2025-07-09
- Upskill your LLMs with Gradio MCP Servers -- 2025-07-09
- Are non-autoregressive models really faster than autoregressive ones after all the denoising steps? -- 2025-07-06
- Yuan-ManX/ComfyUI-OmniGen2 -- 2025-07-06
- Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 -- 2025-07-06
- Helping someone build a local continuity LLM for writing and memory—does this setup make sense? -- 2025-07-06
- Is running a local LLM useful? How? -- 2025-07-06
- Any local models that has less restraints? -- 2025-07-06
- baidu/ERNIE-4.5-VL-424B-A47B-PT -- 2025-07-06
- baidu/ERNIE-4.5-300B-A47B-PT -- 2025-07-06
- skt/A.X-4.0-Light -- 2025-07-04
- ChatDOC/OCRFlux-3B -- 2025-07-04
- Self-hosting LLaMA: What are your biggest pain points? -- 2025-07-02
- Built memX: a shared memory backend for LLM agents (demo + open-source code) -- 2025-06-29
- Automatically Evaluating AI Coding Assistants with Each Git Commit (Open Source) -- 2025-06-29
- Secure Minions: private collaboration between Ollama and frontier models -- 2025-06-29
- Privacy implications of sending data to OpenRouter -- 2025-06-29
- Exploring Practical Uses for Small Language Models (e.g., Microsoft Phi) -- 2025-06-29
- LLM with OCR capabilities -- 2025-06-29
- How to create a speech recognition model from scratch -- 2025-06-29
- Arch 0.3.0 is out - I added support for the Claude family of LLMs in the proxy server framework for agents 🚀 -- 2025-06-29
- Gemini Cli MCP Agent just released ! -- 2025-06-29
- Freeplane xml mind maps locally: only Qwen3 and Phi4 Reasoning Plus can create them in one shot? -- 2025-06-29
- automated debugging using Ollama -- 2025-06-27
- Knowledge Database Advise needed/ Local RAG for IT Asset Discovery - Best approach for varied data? -- 2025-06-27
- Need feedback for a RAG using Ollama as background. -- 2025-06-27
- Best tutorial for installing a local llm with GUI setup? -- 2025-06-27
- Create 2 and 3-bit GPTQ quantization for Qwen3-235B-A22B? -- 2025-06-27
- Ollama Frontend/GUI -- 2025-06-27
- Newtonian Formulation of Attention: Treating Tokens as Interacting Masses? -- 2025-06-27
- Gemini CLI: Open-source AI agent. Write code, debug, and automate tasks with Gemini 2.5 Pro with industry-leading high usage limits at no cost. -- 2025-06-27
- A Plan for SIMD -- 2025-06-27
- Squiggle: A simple programming language for intuitive probabilistic estimation -- 2025-06-27
- I Built a Symbolic Cognitive System to Fix AI Drift — It’s Now Public (SCS 2.0) -- 2025-06-27
- Menlo/Jan-nano-128k -- 2025-06-25
- Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) -- 2025-06-25
- Is it possible to give Gemma 3 or any other model on-device screen awareness? -- 2025-06-25
- Chonkie update. -- 2025-06-25
- Memory Layer Compatible with Local Llama -- 2025-06-25
- Ollama/AnythingLLM on Windows 11 with AMD RX 6600: GPU Not Utilized for LLM Inference - Help! -- 2025-06-25
- How can synthetic data improve a model if the model was the thing that generated that data? -- 2025-06-25
- After reading OpenAI's GPT-4.1 prompt engineering cookbook, I created this comprehensive Python coding template -- 2025-06-25
- The 55% Regret Club: How AI-First Companies Are Learning Lessons the Hard Way -- 2025-06-25
- Microsoft-backed Builder.ai enters insolvency proceedings -- 2025-06-25
- Skywork/Skywork-SWE-32B -- 2025-06-25
- moonshotai/Kimi-VL-A3B-Thinking-2506 -- 2025-06-25
- POLARIS-Project/Polaris-4B-Preview -- 2025-06-25
- XiaomiMiMo/MiMo -- 2025-06-22
- nvidia/AceReason-Nemotron-1.1-7B -- 2025-06-22
- Menlo/Jan-nano -- 2025-06-22
- Quartet - a new algorithm for training LLMs in native FP4 on 5090s -- 2025-06-22
- Open Discussion: Improving HTML-to-Markdown Extraction Using Local LLMs (7B/8B, llama.cpp) – Seeking Feedback on My Approach! -- 2025-06-22
- Update: My agent model now supports OpenAI function calling format! (mirau-agent-base) -- 2025-06-22
- 🧙‍♂️ I Built a Local AI Dungeon Master – Meet Dungeo_ai (Open Source & Powered by your local LLM ) -- 2025-06-22
- Got an LLM to write a fully standards-compliant HTTP 2.0 server via a code-compile-test loop -- 2025-06-22
- Use offline voice controlled agents to search and browse the internet with a contextually aware LLM in the next version of AI Runner -- 2025-06-22
- ReMind: AI-Powered Study Companion that Transforms how You Retain Knowledge! -- 2025-06-22
- My AI coding workflow that's actually working (not just hype) -- 2025-06-22
- Double-Entry Ledgers: The Missing Primitive in Modern Software -- 2025-06-22
- gpt_agents.py -- 2025-06-22
- Demo Video of AutoBE, Backend Vibe Coding Agent Achieving 100% Compilation Success (Open Source) -- 2025-06-19
- How to set up local llms on a 6700 xt -- 2025-06-19
- Jetson Orin AGX 32gb -- 2025-06-19
- AMD GPU support -- 2025-06-19
- Much lower performance for Mistral-Small 24B on RTX 3090 and from deepinfra API -- 2025-06-19
- Extract Website Information -- 2025-06-19
- Looking for a verified copy of big-lama.ckpt (181MB) used in the original LaMa inpainting model trained on Places2. -- 2025-06-19
- Is it true that all tools like Cline/Copilot Agent/Roo Code/Windsurf/Claude Code/Cursor are roughly the same thing? -- 2025-06-19
- SkyRoof: New Ham Satellite Tracking and SDR Receiver Software -- 2025-06-19
- 100% Elimination of Hallucinations on RAGTruth for GPT-4 and GPT-3.5 Turbo -- 2025-06-17
- lyeslabs/mcpgen -- 2025-06-17
- binary-husky/gpt_academic -- 2025-06-16
- [Update] Rensa: added full CMinHash + OptDensMinHash support (fast MinHash in Rust for dataset deduplication / LLM fine-tuning) -- 2025-06-15
- Open Source Unsiloed AI Chunker (EF2024) -- 2025-06-15
- ether0 - Mistral 24B with RL on several molecular design tasks in chemistry -- 2025-06-15
- Need selfhosted AI to generate better bash scripts and ansible playbooks -- 2025-06-15
- How do I finetune Devstral with vision support? -- 2025-06-15
- What's the best approach for including niche dependency source files and associated documentation reference material in context? -- 2025-06-15
- Airlines Don't Want You to Know They Sold Your Flight Data to DHS -- 2025-06-15
- John Deere Must Face Second Right to Repair Lawsuit -- 2025-06-15
- What vector database and embeddings are y'all using -- 2025-06-15
- Turn based two model critique for rounds to refine answer - any examples or FOSS projects? -- 2025-06-15
- POC: Running up to 123B as a Letterfriend on <300€ for all hardware. -- 2025-06-07
- Using LLaMA 3 locally to plan macOS UI actions (Vision + Accessibility demo) -- 2025-06-07
- Sarvam-M a 24B open-weights hybrid reasoning model -- 2025-06-07
- why isn’t anyone building legit tools with local LLMs? -- 2025-06-07
- Anyone on Oahu want to let me borrow an RTX 6000 Pro to benchmark against this dual 5090 rig? -- 2025-06-07
- Strange memory usage -- 2025-06-07
- jax and jaxlib in ubuntu -- 2025-06-07
- We accidentally solved the biggest bottleneck in vibe coding: secret sprawl aka secret leaks -- 2025-06-07
- Improving performance of rav1d video decoder -- 2025-06-07
- KumoRFM: Gen-purpose model for making instant predictions over relational data -- 2025-06-07
- The copilot delusion -- 2025-06-07
- AI is getting insane (generating 3d models ChatGPT + 3daistudio.com or open source models) -- 2025-06-07
- Is Qwen the new face of local LLMs? -- 2025-06-07
- mcp-use/mcp-use -- 2025-05-30
- Cloi CLI: Local debugging agent that runs in your terminal -- 2025-05-30
- mistralai/Devstral-Small-2505_gguf -- 2025-05-30
- Open-Sourced Multimodal Large Diffusion Language Models -- 2025-05-30
- Built a Python library for text classification because I got tired of reinventing the wheel -- 2025-05-30
- Cleaning up responses to fix up synthetic data -- 2025-05-30
- My Gemma-3 musing .... after a good time dragging it through a grinder -- 2025-05-30
- Title: Seeking Help: A -- 2025-05-30
- Trying to learn ML - Book Recommendations -- 2025-05-30
- How to have cursor auto-apply code suggestions? -- 2025-05-30
- Triangle splatting: radiance fields represented by triangles -- 2025-05-30
- Why I Built My Own Audio Player -- 2025-05-30
- Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming -- 2025-05-30
- IDEA: Record your voice prompts, copy them straight into Ollama (100% local) -- 2025-05-30
- Migration to Postgres - Success -- 2025-05-30