LLMs

Model releases, capabilities, comparisons, architectures

265 articles across 70 editions

Articles

  1. [Editorial] -- 2026-03-28
  2. [Editorial] -- 2026-03-28
  3. [Editorial] -- 2026-03-28
  4. [Editorial] -- 2026-03-28
  5. [Editorial] -- 2026-03-28
  6. [Editorial] LLM Processing Internals -- 2026-02-18
  7. Krites: Asynchronous Verified Semantic Caching for Tiered LLM Architectures -- 2026-02-18
  8. The Strix Halo feels like an amazing super power [Activation Guide] -- 2026-02-18
  9. [Editorial] https://windley.com/archives/2026/02/a_policy-aware_agent_loop_with_cedar_and_openclaw.shtml -- 2026-02-12
  10. Open Source Kreuzberg benchmarks and new release -- 2026-02-12
  11. [NVIDIA Nemotron] How can I assess general knowledge on a benchmaxxed model? -- 2026-02-12
  12. I built a rough .gguf LLM visualizer -- 2026-02-12
  13. Local-First Fork of OpenClaw for using open source models--LocalClaw -- 2026-02-12
  14. Measuring output stability across LLM runs (JSON drift problem) -- 2026-02-09
  15. BalatroBench - Benchmark LLMs' strategic performance in Balatro -- 2026-02-09
  16. llama.cpp performance breakthrough for multi-GPU setups -- 2026-01-06
  17. Llama 3.2 3B fMRI LOAD BEARING DIMS FOUND -- 2026-01-06
  18. Hyperbolic Math w Mac GPU acceleration -- 2026-01-06
  19. I built a local voice assistant that learns new abilities via auto-discovered n8n workflows exposed as tools via MCP (LiveKit + Ollama + n8n) -- 2025-12-29
  20. exllamav3 adds support for GLM 4.7 (and 4.6V, + Ministral & OLMO 3) -- 2025-12-29
  21. Tencent just released WeDLM 8B Instruct on Hugging Face -- 2025-12-29
  22. Gen 3D with local llm -- 2025-12-29
  23. Offline vector DB experiment — anyone want to test on their local setup? -- 2025-12-29
  24. Roo Code 3.37 | GLM 4.7 | MM 2.1 | Custom tools | MORE!!! -- 2025-12-29
  25. mini-SGLang released: Learn how LLM inference actually works (5K lines, weekend-readable) -- 2025-12-18
  26. Run Mistral Devstral 2 locally Guide + Fixes! (25GB RAM) - Unsloth -- 2025-12-18
  27. I vibe coded (I hope) useful tool for local LLMs inference -- 2025-12-18
  28. I built a local Python agent that catches stderr and self-heals using Ollama. No cloud APIs involved. (Demo) -- 2025-12-18
  29. mistralai/Devstral-Small-2-24B-Instruct-2512 -- 2025-12-18
  30. running Deepseek v32 on consumer hardware llama.cpp/Sglang/vLLm -- 2025-12-15
  31. Found a REAP variant of Qwen3-coder that I can use for 100K tokens in Roo Code on my macbook -- 2025-12-15
  32. Understanding the new router mode in llama cpp server -- 2025-12-15
  33. Letting a local Ollama model judge my AI agents and it’s surprisingly usable -- 2025-12-15
  34. [Editorial] https://www.linkedin.com/posts/stuart-winter-tear_assessing-llms-for-serendipity-discovery-activity-7396596796938153984-JY9u -- 2025-12-12
  35. Confused and unsure -- 2025-12-10
  36. The "Confident Idiot" Problem: Why LLM-as-a-Judge fails in production. -- 2025-12-10
  37. We gave 5 LLMs $100K to trade stocks for 8 months -- 2025-12-08
  38. An explainer blog on attention, KV-caching, continuous batching -- 2025-11-28
  39. Hardcore function calling benchmark in backend coding agent. -- 2025-11-26
  40. Python script to stress-test LangChain agents against infinite loops (Open Logic) -- 2025-11-26
  41. Browser extension Powered by Ollama for Code Reviews on Gitlab and Azure DO -- 2025-11-26
  42. Mimir - Oauth and GDPR++ compliance + vscode plugin update -- 2025-11-26
  43. alpkeskin/gotoon -- 2025-11-26
  44. Cross-GPU prefix KV reuse with RDMA / NVLink - early experimental results -- 2025-11-13
  45. When does RTX 6000 Pro make sense over a 5090? -- 2025-11-13
  46. My (open-source) continuation (FlexAttention, RoPE, BlockMasks, Muon, etc.) to Karpathy's NanoGPT -- 2025-11-13
  47. Anyone running code model in cpu only VPS? -- 2025-11-13
  48. Hardware recommendations for Ollama for homelab -- 2025-11-10
  49. [Editorial] https://www.evokesecurity.com/blogs/prompt-injection-is-for-everyone -- 2025-11-04
  50. Found a remote file inclusion vulnerability in an AI-generated app before launch -- 2025-11-04
  51. Launch HN: Propolis (YC X25) – Browser agents that QA your web app autonomously -- 2025-11-04
  52. Attacking macOS XPC Helpers: Protocol Reverse Engineering and Interface Analysis -- 2025-11-04
  53. [Research] Unvalidated Trust: Cross-Stage Failure Modes in LLM/agent pipelines arXiv -- 2025-11-04
  54. AI Agents Reasoning Collapse Imminent (CMU, Berkeley) -- 2025-11-02
  55. Natural Language Programming: Run Natural Language as Script -- 2025-11-02
  56. Claude Code is a Beast – Tips from 6 Months of Hardcore Use -- 2025-11-02
  57. It has been 4 hrs since the release of nanochat from Karpathy and no sign of it here! A new full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable, dependency-lite codebase -- 2025-10-19
  58. Reproducing Karpathy’s NanoChat on a Single GPU — Step by Step with AI Tools -- 2025-10-17
  59. Best path for a unified Gaming, AI & Server machine? Custom build vs. Mac Studio/DGX Spark -- 2025-10-17
  60. Ollama kinda dead since OpenAI partnership. Virtually no new models, and kimi2 is cloud only? Why? I run it fine locally with lmstudio. -- 2025-10-17
  61. [Editorial] The best ChatGPT that $100 can buy. -- 2025-10-16
  62. [Editorial] Train your own LLM -- 2025-10-16
  63. Writing an LLM from scratch, part 22 – training our LLM -- 2025-10-16
  64. Kwaipilot/KAT-Dev-72B-Exp -- 2025-10-16
  65. What's the best local LLM for coding I can run on MacBook Pro M4 32Gb? -- 2025-10-12
  66. How do you benchmark the cognitive performance of local LLM models? -- 2025-10-12
  67. Granite 4.0 Micro (3.4B) running 100% locally in your browser w/ WebGPU acceleration -- 2025-10-08
  68. ibm-granite/granite-docling-258M -- 2025-10-08
  69. What are the best models for legal work in Oct 2025? -- 2025-10-07
  70. [Update] FamilyBench: New models tested - Claude Sonnet 4.5 takes 2nd place, Qwen 3 Next breaks 70%, new Kimi weirdly below the old version, same for GLM 4.6 -- 2025-10-07
  71. princeton-pli/RLMT -- 2025-10-07
  72. TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning -- 2025-10-07
  73. I created a simple tool to manage your llama.cpp settings & installation -- 2025-10-03
  74. Looking for a web-based open-source Claude agent/orchestration framework (not for coding, just orchestration) -- 2025-10-03
  75. Codexia GUI for Codex CLI new features -- 2025-10-03
  76. ArchGW 🚀 - Use Ollama-based LLMs with Anthropic client (release 0.3.13) -- 2025-10-03
  77. How am I supposed to know which third party provider can be trusted not to completely lobotomize a model? -- 2025-09-28
  78. A step by step guide on how to build a LLM from scratch -- 2025-09-28
  79. YannQi/R-4B -- 2025-09-28
  80. AI and licensing (commercial use) -- 2025-09-26
  81. I trained an LLM from scratch AMA! -- 2025-09-26
  82. pengzhangzhi/Open-dLLM -- 2025-09-26
  83. Seeking Local LLM Recommendations for AST Generation (by Function Calling) -- 2025-09-24
  84. Uncensored LLM -- 2025-09-23
  85. Building RAG Systems at Enterprise Scale: Our Lessons and Challenges -- 2025-09-23
  86. Can you use a Claude Max account with Cascade? -- 2025-09-22
  87. Permanently alter context history from function -- 2025-09-22
  88. How do you use agents.md in codex cli or vs code extension? -- 2025-09-22
  89. Engineer's Guide to Local LLMs with LLaMA.cpp and QwenCode on Linux -- 2025-09-21
  90. gpt-oss-20b TTFT very slow with llama.cpp? -- 2025-09-21
  91. Answer Matching Outperforms Multiple Choice for Language Model Evaluation -- 2025-09-21
  92. Built an OpenWebUI Mobile Companion (Conduit): Alternative to Commercial Chat Apps -- 2025-09-16
  93. Local LLM suite on iOS powered by llama cpp - with web search and RAG -- 2025-09-16
  94. What are the local TTS models with voice cloning? -- 2025-09-16
  95. [OSS] Beelzebub — “Canary tools” for AI Agents via MCP -- 2025-09-12
  96. Defeating Nondeterminism in LLM Inference -- 2025-09-12
  97. This Week in Security: NPM, Kerbroasting, and The Rest of the Story -- 2025-09-12
  98. Is the "cost of inference" going up or down? -- 2025-09-08
  99. Show HN: Entropy-Guided Loop – How to make small models reason -- 2025-09-06
  100. Kwaipilot/KAT-V1-40B -- 2025-09-06
  101. Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning -- 2025-09-06
  102. I locally benchmarked 41 open-source LLMs across 19 tasks and ranked them -- 2025-09-05
  103. Good setup for coder LLM under 12GB VRam and 64GB DDR5? -- 2025-09-04
  104. openai/gpt-oss-120b -- 2025-09-04
  105. Some thoughts on LLMs and software development -- 2025-08-30
  106. LLM speedup breakthrough? 53x faster generation and 6x prefilling from NVIDIA -- 2025-08-29
  107. Intel Granite Rapids CPU on sale at Newegg up to 65% off MSRP -- 2025-08-29
  108. unsloth/DeepSeek-V3.1-GGUF -- 2025-08-29
  109. baichuan-inc/Baichuan-M2-32B -- 2025-08-28
  110. [Editorial] AI, cve, auto exploitation -- 2025-08-26
  111. [Editorial] Promptware Attacks Against LLM-Powered Assistants -- 2025-08-26
  112. [Editorial] AI portscan -- 2025-08-26
  113. Prompt Obfuscation -- 2025-08-26
  114. synacktiv/GroupPolicyBackdoor -- 2025-08-26
  115. DavidBuchanan314/anubis_offload -- 2025-08-26
  116. Some legend finally posted working quants of GLM-4.5 Air for Ollama -- 2025-08-24
  117. Had some beginner questions regarding how to use Ollama? -- 2025-08-24
  118. AI Mode in Search gets new agentic features and expands globally -- 2025-08-24
  119. Practical approach for streaming UI from LLMs -- 2025-08-24
  120. New Tool for Finding Why Your LLM Inference is Slow -- 2025-08-14
  121. I ran OpenAI’s GPT-OSS 20B locally on a 16GB Mac with Ollama — setup, gotchas, and mini demo -- 2025-08-14
  122. GLM 4.5 Air - Optimizing - Vulkan vs. CUDA? -- 2025-08-14
  123. google/gemma-3n-E4B -- 2025-08-07
  124. IntervitensInc/pangu-pro-moe-model -- 2025-08-07
  125. Welcome GPT OSS, the new open-source model family from OpenAI! -- 2025-08-07
  126. PSA: zai/glm-4.5 is absolutely crushing it for coding - way better than Claude’s recent performance -- 2025-08-03
  127. Wan-AI/Wan2.2-TI2V-5B -- 2025-08-03
  128. moonshotai/Kimi-K2-Instruct -- 2025-08-03
  129. realtime-ai/blastoff-llm -- 2025-08-02
  130. On the Predictive Power of Representation Dispersion in Language Models -- 2025-08-01
  131. [Editorial] The Anatomy of a Modern LLM -- 2025-07-31
  132. PowerInfer/SmallThinker-21BA3B-Instruct -- 2025-07-31
  133. Building a quiet LLM machine for 24/7 use, is this setup overkill or smart? -- 2025-07-30
  134. haykgrigo3/TimeCapsuleLLM -- 2025-07-30
  135. unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF -- 2025-07-30
  136. PhysicsWallahAI/Aryabhata-1.0 -- 2025-07-30
  137. How are people extracting system prompts? -- 2025-07-29
  138. cherrydra/mcpurl -- 2025-07-29
  139. LLMs are bad at returning code in JSON -- 2025-07-29
  140. Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems -- 2025-07-27
  141. Shanghai AI Lab Just Released a Massive 97-Page Safety Evaluation of Frontier AI Models - Here Are the Most Concerning Findings -- 2025-07-27
  142. Chain-GPT/Solidity-LLM -- 2025-07-25
  143. Anyone interested in adding their fine-tuned / open source models to this benchmark? -- 2025-07-25
  144. Qwen Code: A command-line AI workflow tool, optimized for Qwen3-Coder models -- 2025-07-23
  145. Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization -- 2025-07-23
  146. Ignoring instructions? Or am I dumb? (claude.md) -- 2025-07-18
  147. Open source and free iOS app to chat with your LLMs when you are away from home. -- 2025-07-16
  148. Requirements and architecture for a good enough model with scientific papers RAG -- 2025-07-16
  149. Excited to share updates to Open WebUI Starter! New docs, Docker support, and templates for everyone -- 2025-07-16
  150. (Kramer UI for Ollama) I was tired of dealing with Docker, so I built a simple, portable Windows UI for Ollama. -- 2025-07-11
  151. BastionChat: Finally got Qwen3 + Gemma3 (thinking models) running locally on iPhone/iPad with full RAG and voice mode -- 2025-07-11
  152. Which is the best small local LLM models for tasks like doing research and generating insights -- 2025-07-10
  153. Looking for practical advice with my MSc thesis “On-Premise Orchestration of SLMs” (OpenWebUI + SLM v LLM benchmarking on multiple GPUs) -- 2025-07-10
  154. High Precision -- 2025-07-09
  155. skt/A.X-4.0 -- 2025-07-09
  156. Medical language model - for STT and summarize things -- 2025-07-09
  157. Upskill your LLMs with Gradio MCP Servers -- 2025-07-09
  158. Are non-autoregressive models really faster than autoregressive ones after all the denoising steps? -- 2025-07-06
  159. Yuan-ManX/ComfyUI-OmniGen2 -- 2025-07-06
  160. Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 -- 2025-07-06
  161. Helping someone build a local continuity LLM for writing and memory—does this setup make sense? -- 2025-07-06
  162. Is running a local LLM useful? How? -- 2025-07-06
  163. Any local models that has less restraints? -- 2025-07-06
  164. baidu/ERNIE-4.5-VL-424B-A47B-PT -- 2025-07-06
  165. baidu/ERNIE-4.5-300B-A47B-PT -- 2025-07-06
  166. skt/A.X-4.0-Light -- 2025-07-04
  167. ChatDOC/OCRFlux-3B -- 2025-07-04
  168. Self-hosting LLaMA: What are your biggest pain points? -- 2025-07-02
  169. Built memX: a shared memory backend for LLM agents (demo + open-source code) -- 2025-06-29
  170. Automatically Evaluating AI Coding Assistants with Each Git Commit (Open Source) -- 2025-06-29
  171. Secure Minions: private collaboration between Ollama and frontier models -- 2025-06-29
  172. Privacy implications of sending data to OpenRouter -- 2025-06-29
  173. Exploring Practical Uses for Small Language Models (e.g., Microsoft Phi) -- 2025-06-29
  174. LLM with OCR capabilities -- 2025-06-29
  175. How to create a speech recognition model from scratch -- 2025-06-29
  176. Arch 0.3.0 is out - I added support for the Claude family of LLMs in the proxy server framework for agents 🚀 -- 2025-06-29
  177. Gemini CLI MCP Agent just released! -- 2025-06-29
  178. Freeplane xml mind maps locally: only Qwen3 and Phi4 Reasoning Plus can create them in one shot? -- 2025-06-29
  179. automated debugging using Ollama -- 2025-06-27
  180. Knowledge Database Advise needed/ Local RAG for IT Asset Discovery - Best approach for varied data? -- 2025-06-27
  181. Need feedback for a RAG using Ollama as background. -- 2025-06-27
  182. Best tutorial for installing a local llm with GUI setup? -- 2025-06-27
  183. Create 2 and 3-bit GPTQ quantization for Qwen3-235B-A22B? -- 2025-06-27
  184. Ollama Frontend/GUI -- 2025-06-27
  185. Newtonian Formulation of Attention: Treating Tokens as Interacting Masses? -- 2025-06-27
  186. Gemini CLI: Open-source AI agent. Write code, debug, and automate tasks with Gemini 2.5 Pro with industry-leading high usage limits at no cost. -- 2025-06-27
  187. A Plan for SIMD -- 2025-06-27
  188. Squiggle: A simple programming language for intuitive probabilistic estimation -- 2025-06-27
  189. I Built a Symbolic Cognitive System to Fix AI Drift — It’s Now Public (SCS 2.0) -- 2025-06-27
  190. Menlo/Jan-nano-128k -- 2025-06-25
  191. Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) -- 2025-06-25
  192. Is it possible to give Gemma 3 or any other model on-device screen awareness? -- 2025-06-25
  193. Chonkie update. -- 2025-06-25
  194. Memory Layer Compatible with Local Llama -- 2025-06-25
  195. Ollama/AnythingLLM on Windows 11 with AMD RX 6600: GPU Not Utilized for LLM Inference - Help! -- 2025-06-25
  196. How can synthetic data improve a model if the model was the thing that generated that data? -- 2025-06-25
  197. After reading OpenAI's GPT-4.1 prompt engineering cookbook, I created this comprehensive Python coding template -- 2025-06-25
  198. The 55% Regret Club: How AI-First Companies Are Learning Lessons the Hard Way -- 2025-06-25
  199. Microsoft-backed Builder.ai enters insolvency proceedings -- 2025-06-25
  200. Skywork/Skywork-SWE-32B -- 2025-06-25
  201. moonshotai/Kimi-VL-A3B-Thinking-2506 -- 2025-06-25
  202. POLARIS-Project/Polaris-4B-Preview -- 2025-06-25
  203. XiaomiMiMo/MiMo -- 2025-06-22
  204. nvidia/AceReason-Nemotron-1.1-7B -- 2025-06-22
  205. Menlo/Jan-nano -- 2025-06-22
  206. Quartet - a new algorithm for training LLMs in native FP4 on 5090s -- 2025-06-22
  207. Open Discussion: Improving HTML-to-Markdown Extraction Using Local LLMs (7B/8B, llama.cpp) – Seeking Feedback on My Approach! -- 2025-06-22
  208. Update: My agent model now supports OpenAI function calling format! (mirau-agent-base) -- 2025-06-22
  209. 🧙‍♂️ I Built a Local AI Dungeon Master – Meet Dungeo_ai (Open Source & Powered by your local LLM ) -- 2025-06-22
  210. Got an LLM to write a fully standards-compliant HTTP 2.0 server via a code-compile-test loop -- 2025-06-22
  211. Use offline voice controlled agents to search and browse the internet with a contextually aware LLM in the next version of AI Runner -- 2025-06-22
  212. ReMind: AI-Powered Study Companion that Transforms how You Retain Knowledge! -- 2025-06-22
  213. My AI coding workflow that's actually working (not just hype) -- 2025-06-22
  214. Double-Entry Ledgers: The Missing Primitive in Modern Software -- 2025-06-22
  215. gpt_agents.py -- 2025-06-22
  216. Demo Video of AutoBE, Backend Vibe Coding Agent Achieving 100% Compilation Success (Open Source) -- 2025-06-19
  217. How to set up local llms on a 6700 xt -- 2025-06-19
  218. Jetson Orin AGX 32gb -- 2025-06-19
  219. AMD GPU support -- 2025-06-19
  220. Much lower performance for Mistral-Small 24B on RTX 3090 and from deepinfra API -- 2025-06-19
  221. Extract Website Information -- 2025-06-19
  222. Looking for a verified copy of big-lama.ckpt (181MB) used in the original LaMa inpainting model trained on Places2. -- 2025-06-19
  223. Is it true that all tools like Cline/Copilot Agent/Roo Code/Windsurf/Claude Code/Cursor are roughly the same thing? -- 2025-06-19
  224. SkyRoof: New Ham Satellite Tracking and SDR Receiver Software -- 2025-06-19
  225. 100% Elimination of Hallucinations on RAGTruth for GPT-4 and GPT-3.5 Turbo -- 2025-06-17
  226. lyeslabs/mcpgen -- 2025-06-17
  227. binary-husky/gpt_academic -- 2025-06-16
  228. [Update] Rensa: added full CMinHash + OptDensMinHash support (fast MinHash in Rust for dataset deduplication / LLM fine-tuning) -- 2025-06-15
  229. Open Source Unsiloed AI Chunker (EF2024) -- 2025-06-15
  230. ether0 - Mistral 24B with RL on several molecular design tasks in chemistry -- 2025-06-15
  231. Need selfhosted AI to generate better bash scripts and ansible playbooks -- 2025-06-15
  232. How do I finetune Devstral with vision support? -- 2025-06-15
  233. What's the best approach for including niche dependency source files and associated documentation reference material in context? -- 2025-06-15
  234. Airlines Don't Want You to Know They Sold Your Flight Data to DHS -- 2025-06-15
  235. John Deere Must Face Second Right to Repair Lawsuit -- 2025-06-15
  236. What vector database and embeddings are y'all using -- 2025-06-15
  237. Turn based two model critique for rounds to refine answer - any examples or FOSS projects? -- 2025-06-15
  238. POC: Running up to 123B as a Letterfriend on <300€ for all hardware. -- 2025-06-07
  239. Using LLaMA 3 locally to plan macOS UI actions (Vision + Accessibility demo) -- 2025-06-07
  240. Sarvam-M a 24B open-weights hybrid reasoning model -- 2025-06-07
  241. why isn’t anyone building legit tools with local LLMs? -- 2025-06-07
  242. Anyone on Oahu want to let me borrow an RTX 6000 Pro to benchmark against this dual 5090 rig? -- 2025-06-07
  243. Strange memory usage -- 2025-06-07
  244. jax and jaxlib in ubuntu -- 2025-06-07
  245. We accidentally solved the biggest bottleneck in vibe coding: secret sprawl aka secret leaks -- 2025-06-07
  246. Improving performance of rav1d video decoder -- 2025-06-07
  247. KumoRFM: Gen-purpose model for making instant predictions over relational data -- 2025-06-07
  248. The copilot delusion -- 2025-06-07
  249. AI is getting insane (generating 3d models ChatGPT + 3daistudio.com or open source models) -- 2025-06-07
  250. Is Qwen the new face of local LLMs? -- 2025-06-07
  251. mcp-use/mcp-use -- 2025-05-30
  252. Cloi CLI: Local debugging agent that runs in your terminal -- 2025-05-30
  253. mistralai/Devstral-Small-2505_gguf -- 2025-05-30
  254. Open-Sourced Multimodal Large Diffusion Language Models -- 2025-05-30
  255. Built a Python library for text classification because I got tired of reinventing the wheel -- 2025-05-30
  256. Cleaning up responses to fix up synthetic data -- 2025-05-30
  257. My Gemma-3 musing .... after a good time dragging it through a grinder -- 2025-05-30
  258. Title: Seeking Help: A -- 2025-05-30
  259. Trying to learn ML - Book Recommendations -- 2025-05-30
  260. How to have cursor auto-apply code suggestions? -- 2025-05-30
  261. Triangle splatting: radiance fields represented by triangles -- 2025-05-30
  262. Why I Built My Own Audio Player -- 2025-05-30
  263. Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming -- 2025-05-30
  264. IDEA: Record your voice prompts, copy them straight into Ollama (100% local) -- 2025-05-30
  265. Migration to Postgres - Success -- 2025-05-30