LLMs

Model releases, capabilities, comparisons, architectures

260 articles across 69 editions

Articles

  1. [Editorial] LLM Processing Internals -- 2026-02-18
  2. Krites: Asynchronous Verified Semantic Caching for Tiered LLM Architectures -- 2026-02-18
  3. The Strix Halo feels like an amazing super power [Activation Guide] -- 2026-02-18
  4. [Editorial] https://windley.com/archives/2026/02/a_policy-aware_agent_loop_with_cedar_and_openclaw.shtml -- 2026-02-12
  5. Open Source Kreuzberg benchmarks and new release -- 2026-02-12
  6. [NVIDIA Nemotron] How can I assess general knowledge on a benchmaxxed model? -- 2026-02-12
  7. I built a rough .gguf LLM visualizer -- 2026-02-12
  8. Local-First Fork of OpenClaw for using open source models--LocalClaw -- 2026-02-12
  9. Measuring output stability across LLM runs (JSON drift problem) -- 2026-02-09
  10. BalatroBench - Benchmark LLMs' strategic performance in Balatro -- 2026-02-09
  11. llama.cpp performance breakthrough for multi-GPU setups -- 2026-01-06
  12. Llama 3.2 3B fMRI LOAD BEARING DIMS FOUND -- 2026-01-06
  13. Hyperbolic Math w Mac GPU acceleration -- 2026-01-06
  14. I built a local voice assistant that learns new abilities via auto-discovered n8n workflows exposed as tools via MCP (LiveKit + Ollama + n8n) -- 2025-12-29
  15. exllamav3 adds support for GLM 4.7 (and 4.6V, + Ministral & OLMO 3) -- 2025-12-29
  16. Tencent just released WeDLM 8B Instruct on Hugging Face -- 2025-12-29
  17. Gen 3D with local llm -- 2025-12-29
  18. Offline vector DB experiment — anyone want to test on their local setup? -- 2025-12-29
  19. Roo Code 3.37 | GLM 4.7 | MM 2.1 | Custom tools | MORE!!! -- 2025-12-29
  20. mini-SGLang released: Learn how LLM inference actually works (5K lines, weekend-readable) -- 2025-12-18
  21. Run Mistral Devstral 2 locally Guide + Fixes! (25GB RAM) - Unsloth -- 2025-12-18
  22. I vibe coded (I hope) useful tool for local LLMs inference -- 2025-12-18
  23. I built a local Python agent that catches stderr and self-heals using Ollama. No cloud APIs involved. (Demo) -- 2025-12-18
  24. mistralai/Devstral-Small-2-24B-Instruct-2512 -- 2025-12-18
  25. running Deepseek v32 on consumer hardware llama.cpp/Sglang/vLLm -- 2025-12-15
  26. Found a REAP variant of Qwen3-coder that I can use for 100K tokens in Roo Code on my macbook -- 2025-12-15
  27. Understanding the new router mode in llama cpp server -- 2025-12-15
  28. Letting a local Ollama model judge my AI agents and it’s surprisingly usable -- 2025-12-15
  29. [Editorial] https://www.linkedin.com/posts/stuart-winter-tear_assessing-llms-for-serendipity-discovery-activity-7396596796938153984-JY9u -- 2025-12-12
  30. Confused and unsure -- 2025-12-10
  31. The "Confident Idiot" Problem: Why LLM-as-a-Judge fails in production. -- 2025-12-10
  32. We gave 5 LLMs $100K to trade stocks for 8 months -- 2025-12-08
  33. An explainer blog on attention, KV-caching, continuous batching -- 2025-11-28
  34. Hardcore function calling benchmark in backend coding agent. -- 2025-11-26
  35. Python script to stress-test LangChain agents against infinite loops (Open Logic) -- 2025-11-26
  36. Browser extension Powered by Ollama for Code Reviews on Gitlab and Azure DO -- 2025-11-26
  37. Mimir - Oauth and GDPR++ compliance + vscode plugin update -- 2025-11-26
  38. alpkeskin/gotoon -- 2025-11-26
  39. Cross-GPU prefix KV reuse with RDMA / NVLink - early experimental results -- 2025-11-13
  40. When does RTX 6000 Pro make sense over a 5090? -- 2025-11-13
  41. My (open-source) continuation (FlexAttention, RoPE, BlockMasks, Muon, etc.) to Karpathy's NanoGPT -- 2025-11-13
  42. Anyone running code model in cpu only VPS? -- 2025-11-13
  43. Hardware recommendations for Ollama for homelab -- 2025-11-10
  44. [Editorial] https://www.evokesecurity.com/blogs/prompt-injection-is-for-everyone -- 2025-11-04
  45. Found a remote file inclusion vulnerability in an AI-generated app before launch -- 2025-11-04
  46. Launch HN: Propolis (YC X25) – Browser agents that QA your web app autonomously -- 2025-11-04
  47. Attacking macOS XPC Helpers: Protocol Reverse Engineering and Interface Analysis -- 2025-11-04
  48. [Research] Unvalidated Trust: Cross-Stage Failure Modes in LLM/agent pipelines arXiv -- 2025-11-04
  49. AI Agents Reasoning Collapse Imminent (CMU, Berkeley) -- 2025-11-02
  50. Natural Language Programming: Run Natural Language as Script -- 2025-11-02
  51. Claude Code is a Beast – Tips from 6 Months of Hardcore Use -- 2025-11-02
  52. It has been 4 hrs since the release of nanochat from Karpathy and no sign of it here! A new full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable, dependency-lite codebase -- 2025-10-19
  53. Reproducing Karpathy’s NanoChat on a Single GPU — Step by Step with AI Tools -- 2025-10-17
  54. Best path for a unified Gaming, AI & Server machine? Custom build vs. Mac Studio/DGX Spark -- 2025-10-17
  55. Ollama kinda dead since OpenAI partnership. Virtually no new models, and kimi2 is cloud only? Why? I run it fine locally with lmstudio. -- 2025-10-17
  56. [Editorial] The best ChatGPT that $100 can buy. -- 2025-10-16
  57. [Editorial] Train your own LLM -- 2025-10-16
  58. Writing an LLM from scratch, part 22 – training our LLM -- 2025-10-16
  59. Kwaipilot/KAT-Dev-72B-Exp -- 2025-10-16
  60. What's the best local LLM for coding I can run on MacBook Pro M4 32Gb? -- 2025-10-12
  61. How do you benchmark the cognitive performance of local LLM models? -- 2025-10-12
  62. Granite 4.0 Micro (3.4B) running 100% locally in your browser w/ WebGPU acceleration -- 2025-10-08
  63. ibm-granite/granite-docling-258M -- 2025-10-08
  64. What are the best models for legal work in Oct 2025? -- 2025-10-07
  65. [Update] FamilyBench: New models tested - Claude Sonnet 4.5 takes 2nd place, Qwen 3 Next breaks 70%, new Kimi weirdly below the old version, same for GLM 4.6 -- 2025-10-07
  66. princeton-pli/RLMT -- 2025-10-07
  67. TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning -- 2025-10-07
  68. I created a simple tool to manage your llama.cpp settings & installation -- 2025-10-03
  69. Looking for a web-based open-source Claude agent/orchestration framework (not for coding, just orchestration) -- 2025-10-03
  70. Codexia GUI for Codex CLI new features -- 2025-10-03
  71. ArchGW 🚀 - Use Ollama-based LLMs with Anthropic client (release 0.3.13) -- 2025-10-03
  72. How am I supposed to know which third party provider can be trusted not to completely lobotomize a model? -- 2025-09-28
  73. A step by step guide on how to build a LLM from scratch -- 2025-09-28
  74. YannQi/R-4B -- 2025-09-28
  75. AI and licensing (commercial use) -- 2025-09-26
  76. I trained an LLM from scratch AMA! -- 2025-09-26
  77. pengzhangzhi/Open-dLLM -- 2025-09-26
  78. Seeking Local LLM Recommendations for AST Generation (by Function Calling) -- 2025-09-24
  79. Uncensored LLM -- 2025-09-23
  80. Building RAG Systems at Enterprise Scale: Our Lessons and Challenges -- 2025-09-23
  81. Can you use a Claude Max account with Cascade? -- 2025-09-22
  82. Permanently alter context history from function -- 2025-09-22
  83. How do you use agents.md in codex cli or vs code extension? -- 2025-09-22
  84. Engineer's Guide to Local LLMs with LLaMA.cpp and QwenCode on Linux -- 2025-09-21
  85. gpt-oss-20b TTFT very slow with llama.cpp? -- 2025-09-21
  86. Answer Matching Outperforms Multiple Choice for Language Model Evaluation -- 2025-09-21
  87. Built an OpenWebUI Mobile Companion (Conduit): Alternative to Commercial Chat Apps -- 2025-09-16
  88. Local LLM suite on iOS powered by llama cpp - with web search and RAG -- 2025-09-16
  89. What are the local TTS models with voice cloning? -- 2025-09-16
  90. [OSS] Beelzebub — “Canary tools” for AI Agents via MCP -- 2025-09-12
  91. Defeating Nondeterminism in LLM Inference -- 2025-09-12
  92. This Week in Security: NPM, Kerbroasting, and The Rest of the Story -- 2025-09-12
  93. Is the "cost of inference" going up or down? -- 2025-09-08
  94. Show HN: Entropy-Guided Loop – How to make small models reason -- 2025-09-06
  95. Kwaipilot/KAT-V1-40B -- 2025-09-06
  96. Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning -- 2025-09-06
  97. I locally benchmarked 41 open-source LLMs across 19 tasks and ranked them -- 2025-09-05
  98. Good setup for coder LLM under 12GB VRam and 64GB DDR5? -- 2025-09-04
  99. openai/gpt-oss-120b -- 2025-09-04
  100. Some thoughts on LLMs and software development -- 2025-08-30
  101. LLM speedup breakthrough? 53x faster generation and 6x prefilling from NVIDIA -- 2025-08-29
  102. Intel Granite Rapids CPU on sale at Newegg up to 65% off MSRP -- 2025-08-29
  103. unsloth/DeepSeek-V3.1-GGUF -- 2025-08-29
  104. baichuan-inc/Baichuan-M2-32B -- 2025-08-28
  105. [Editorial] AI, cve, auto exploitation -- 2025-08-26
  106. [Editorial] Promptware Attacks Against LLM-Powered Assistants -- 2025-08-26
  107. [Editorial] AI portscan -- 2025-08-26
  108. Prompt Obfuscation -- 2025-08-26
  109. synacktiv/GroupPolicyBackdoor -- 2025-08-26
  110. DavidBuchanan314/anubis_offload -- 2025-08-26
  111. Some legend finally posted working quants of GLM-4.5 Air for Ollama -- 2025-08-24
  112. Had some beginner questions regarding how to use Ollama? -- 2025-08-24
  113. AI Mode in Search gets new agentic features and expands globally -- 2025-08-24
  114. Practical approach for streaming UI from LLMs -- 2025-08-24
  115. New Tool for Finding Why Your LLM Inference is Slow -- 2025-08-14
  116. I ran OpenAI’s GPT-OSS 20B locally on a 16GB Mac with Ollama — setup, gotchas, and mini demo -- 2025-08-14
  117. GLM 4.5 Air - Optimizing - Vulkan vs. CUDA? -- 2025-08-14
  118. google/gemma-3n-E4B -- 2025-08-07
  119. IntervitensInc/pangu-pro-moe-model -- 2025-08-07
  120. Welcome GPT OSS, the new open-source model family from OpenAI! -- 2025-08-07
  121. PSA: zai/glm-4.5 is absolutely crushing it for coding - way better than Claude’s recent performance -- 2025-08-03
  122. Wan-AI/Wan2.2-TI2V-5B -- 2025-08-03
  123. moonshotai/Kimi-K2-Instruct -- 2025-08-03
  124. realtime-ai/blastoff-llm -- 2025-08-02
  125. On the Predictive Power of Representation Dispersion in Language Models -- 2025-08-01
  126. [Editorial] The Anatomy of a Modern LLM -- 2025-07-31
  127. PowerInfer/SmallThinker-21BA3B-Instruct -- 2025-07-31
  128. Building a quiet LLM machine for 24/7 use, is this setup overkill or smart? -- 2025-07-30
  129. haykgrigo3/TimeCapsuleLLM -- 2025-07-30
  130. unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF -- 2025-07-30
  131. PhysicsWallahAI/Aryabhata-1.0 -- 2025-07-30
  132. How are people extracting system prompts? -- 2025-07-29
  133. cherrydra/mcpurl -- 2025-07-29
  134. LLMs are bad at returning code in JSON -- 2025-07-29
  135. Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems -- 2025-07-27
  136. Shanghai AI Lab Just Released a Massive 97-Page Safety Evaluation of Frontier AI Models - Here Are the Most Concerning Findings -- 2025-07-27
  137. Chain-GPT/Solidity-LLM -- 2025-07-25
  138. Anyone interested in adding their fine-tuned / open source models to this benchmark? -- 2025-07-25
  139. Qwen Code: A command-line AI workflow tool, optimized for Qwen3-Coder models -- 2025-07-23
  140. Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization -- 2025-07-23
  141. Ignoring instructions? Or am I dumb? (claude.md) -- 2025-07-18
  142. Open source and free iOS app to chat with your LLMs when you are away from home. -- 2025-07-16
  143. Requirements and architecture for a good enough model with scientific papers RAG -- 2025-07-16
  144. Excited to share updates to Open WebUI Starter! New docs, Docker support, and templates for everyone -- 2025-07-16
  145. (Kramer UI for Ollama) I was tired of dealing with Docker, so I built a simple, portable Windows UI for Ollama. -- 2025-07-11
  146. BastionChat: Finally got Qwen3 + Gemma3 (thinking models) running locally on iPhone/iPad with full RAG and voice mode -- 2025-07-11
  147. Which is the best small local LLM models for tasks like doing research and generating insights -- 2025-07-10
  148. Looking for practical advice with my MSc thesis “On-Premise Orchestration of SLMs” (OpenWebUI + SLM v LLM benchmarking on multiple GPUs) -- 2025-07-10
  149. High Precision -- 2025-07-09
  150. skt/A.X-4.0 -- 2025-07-09
  151. Medical language model - for STT and summarize things -- 2025-07-09
  152. Upskill your LLMs with Gradio MCP Servers -- 2025-07-09
  153. Are non-autoregressive models really faster than autoregressive ones after all the denoising steps? -- 2025-07-06
  154. Yuan-ManX/ComfyUI-OmniGen2 -- 2025-07-06
  155. Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 -- 2025-07-06
  156. Helping someone build a local continuity LLM for writing and memory—does this setup make sense? -- 2025-07-06
  157. Is running a local LLM useful? How? -- 2025-07-06
  158. Any local models that has less restraints? -- 2025-07-06
  159. baidu/ERNIE-4.5-VL-424B-A47B-PT -- 2025-07-06
  160. baidu/ERNIE-4.5-300B-A47B-PT -- 2025-07-06
  161. skt/A.X-4.0-Light -- 2025-07-04
  162. ChatDOC/OCRFlux-3B -- 2025-07-04
  163. Self-hosting LLaMA: What are your biggest pain points? -- 2025-07-02
  164. Built memX: a shared memory backend for LLM agents (demo + open-source code) -- 2025-06-29
  165. Automatically Evaluating AI Coding Assistants with Each Git Commit (Open Source) -- 2025-06-29
  166. Secure Minions: private collaboration between Ollama and frontier models -- 2025-06-29
  167. Privacy implications of sending data to OpenRouter -- 2025-06-29
  168. Exploring Practical Uses for Small Language Models (e.g., Microsoft Phi) -- 2025-06-29
  169. LLM with OCR capabilities -- 2025-06-29
  170. How to create a speech recognition model from scratch -- 2025-06-29
  171. Arch 0.3.0 is out - I added support for the Claude family of LLMs in the proxy server framework for agents 🚀 -- 2025-06-29
  172. Gemini Cli MCP Agent just released! -- 2025-06-29
  173. Freeplane xml mind maps locally: only Qwen3 and Phi4 Reasoning Plus can create them in one shot? -- 2025-06-29
  174. automated debugging using Ollama -- 2025-06-27
  175. Knowledge Database Advise needed/ Local RAG for IT Asset Discovery - Best approach for varied data? -- 2025-06-27
  176. Need feedback for a RAG using Ollama as background. -- 2025-06-27
  177. Best tutorial for installing a local llm with GUI setup? -- 2025-06-27
  178. Create 2 and 3-bit GPTQ quantization for Qwen3-235B-A22B? -- 2025-06-27
  179. Ollama Frontend/GUI -- 2025-06-27
  180. Newtonian Formulation of Attention: Treating Tokens as Interacting Masses? -- 2025-06-27
  181. Gemini CLI: Open-source AI agent. Write code, debug, and automate tasks with Gemini 2.5 Pro with industry-leading high usage limits at no cost. -- 2025-06-27
  182. A Plan for SIMD -- 2025-06-27
  183. Squiggle: A simple programming language for intuitive probabilistic estimation -- 2025-06-27
  184. I Built a Symbolic Cognitive System to Fix AI Drift — It’s Now Public (SCS 2.0) -- 2025-06-27
  185. Menlo/Jan-nano-128k -- 2025-06-25
  186. Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) -- 2025-06-25
  187. Is it possible to give Gemma 3 or any other model on-device screen awareness? -- 2025-06-25
  188. Chonkie update. -- 2025-06-25
  189. Memory Layer Compatible with Local Llama -- 2025-06-25
  190. Ollama/AnythingLLM on Windows 11 with AMD RX 6600: GPU Not Utilized for LLM Inference - Help! -- 2025-06-25
  191. How can synthetic data improve a model if the model was the thing that generated that data? -- 2025-06-25
  192. After reading OpenAI's GPT-4.1 prompt engineering cookbook, I created this comprehensive Python coding template -- 2025-06-25
  193. The 55% Regret Club: How AI-First Companies Are Learning Lessons the Hard Way -- 2025-06-25
  194. Microsoft-backed Builder.ai enters insolvency proceedings -- 2025-06-25
  195. Skywork/Skywork-SWE-32B -- 2025-06-25
  196. moonshotai/Kimi-VL-A3B-Thinking-2506 -- 2025-06-25
  197. POLARIS-Project/Polaris-4B-Preview -- 2025-06-25
  198. XiaomiMiMo/MiMo -- 2025-06-22
  199. nvidia/AceReason-Nemotron-1.1-7B -- 2025-06-22
  200. Menlo/Jan-nano -- 2025-06-22
  201. Quartet - a new algorithm for training LLMs in native FP4 on 5090s -- 2025-06-22
  202. Open Discussion: Improving HTML-to-Markdown Extraction Using Local LLMs (7B/8B, llama.cpp) – Seeking Feedback on My Approach! -- 2025-06-22
  203. Update: My agent model now supports OpenAI function calling format! (mirau-agent-base) -- 2025-06-22
  204. 🧙‍♂️ I Built a Local AI Dungeon Master – Meet Dungeo_ai (Open Source & Powered by your local LLM ) -- 2025-06-22
  205. Got an LLM to write a fully standards-compliant HTTP 2.0 server via a code-compile-test loop -- 2025-06-22
  206. Use offline voice controlled agents to search and browse the internet with a contextually aware LLM in the next version of AI Runner -- 2025-06-22
  207. ReMind: AI-Powered Study Companion that Transforms how You Retain Knowledge! -- 2025-06-22
  208. My AI coding workflow that's actually working (not just hype) -- 2025-06-22
  209. Double-Entry Ledgers: The Missing Primitive in Modern Software -- 2025-06-22
  210. gpt_agents.py -- 2025-06-22
  211. Demo Video of AutoBE, Backend Vibe Coding Agent Achieving 100% Compilation Success (Open Source) -- 2025-06-19
  212. How to set up local llms on a 6700 xt -- 2025-06-19
  213. Jetson Orin AGX 32gb -- 2025-06-19
  214. AMD GPU support -- 2025-06-19
  215. Much lower performance for Mistral-Small 24B on RTX 3090 and from deepinfra API -- 2025-06-19
  216. Extract Website Information -- 2025-06-19
  217. Looking for a verified copy of big-lama.ckpt (181MB) used in the original LaMa inpainting model trained on Places2. -- 2025-06-19
  218. Is it true that all tools like Cline/Copilot Agent/Roo Code/Windsurf/Claude Code/Cursor are roughly the same thing? -- 2025-06-19
  219. SkyRoof: New Ham Satellite Tracking and SDR Receiver Software -- 2025-06-19
  220. 100% Elimination of Hallucinations on RAGTruth for GPT-4 and GPT-3.5 Turbo -- 2025-06-17
  221. lyeslabs/mcpgen -- 2025-06-17
  222. binary-husky/gpt_academic -- 2025-06-16
  223. [Update] Rensa: added full CMinHash + OptDensMinHash support (fast MinHash in Rust for dataset deduplication / LLM fine-tuning) -- 2025-06-15
  224. Open Source Unsiloed AI Chunker (EF2024) -- 2025-06-15
  225. ether0 - Mistral 24B with RL on several molecular design tasks in chemistry -- 2025-06-15
  226. Need selfhosted AI to generate better bash scripts and ansible playbooks -- 2025-06-15
  227. How do I finetune Devstral with vision support? -- 2025-06-15
  228. What's the best approach for including niche dependency source files and associated documentation reference material in context? -- 2025-06-15
  229. Airlines Don't Want You to Know They Sold Your Flight Data to DHS -- 2025-06-15
  230. John Deere Must Face Second Right to Repair Lawsuit -- 2025-06-15
  231. What vector database and embeddings are y'all using -- 2025-06-15
  232. Turn based two model critique for rounds to refine answer - any examples or FOSS projects? -- 2025-06-15
  233. POC: Running up to 123B as a Letterfriend on <300€ for all hardware. -- 2025-06-07
  234. Using LLaMA 3 locally to plan macOS UI actions (Vision + Accessibility demo) -- 2025-06-07
  235. Sarvam-M a 24B open-weights hybrid reasoning model -- 2025-06-07
  236. why isn’t anyone building legit tools with local LLMs? -- 2025-06-07
  237. Anyone on Oahu want to let me borrow an RTX 6000 Pro to benchmark against this dual 5090 rig? -- 2025-06-07
  238. Strange memory usage -- 2025-06-07
  239. jax and jaxlib in ubuntu -- 2025-06-07
  240. We accidentally solved the biggest bottleneck in vibe coding: secret sprawl aka secret leaks -- 2025-06-07
  241. Improving performance of rav1d video decoder -- 2025-06-07
  242. KumoRFM: Gen-purpose model for making instant predictions over relational data -- 2025-06-07
  243. The copilot delusion -- 2025-06-07
  244. AI is getting insane (generating 3d models ChatGPT + 3daistudio.com or open source models) -- 2025-06-07
  245. Is Qwen the new face of local LLMs? -- 2025-06-07
  246. mcp-use/mcp-use -- 2025-05-30
  247. Cloi CLI: Local debugging agent that runs in your terminal -- 2025-05-30
  248. mistralai/Devstral-Small-2505_gguf -- 2025-05-30
  249. Open-Sourced Multimodal Large Diffusion Language Models -- 2025-05-30
  250. Built a Python library for text classification because I got tired of reinventing the wheel -- 2025-05-30
  251. Cleaning up responses to fix up synthetic data -- 2025-05-30
  252. My Gemma-3 musing .... after a good time dragging it through a grinder -- 2025-05-30
  253. Title: Seeking Help: A -- 2025-05-30
  254. Trying to learn ML - Book Recommendations -- 2025-05-30
  255. How to have cursor auto-apply code suggestions? -- 2025-05-30
  256. Triangle splatting: radiance fields represented by triangles -- 2025-05-30
  257. Why I Built My Own Audio Player -- 2025-05-30
  258. Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming -- 2025-05-30
  259. IDEA: Record your voice prompts, copy them straight into Ollama (100% local) -- 2025-05-30
  260. Migration to Postgres - Success -- 2025-05-30