LLMs

Model releases, capabilities, comparisons, architectures

280 articles across 75 editions

Articles

End of an Agony: Post-Mortem of a Real Production LLM Service — 'If LLM Decided It Will Shit Itself, It Will Stay Shat' -- 2026-07-07
Every Eval Ever: Community Evaluation Results Featured on HuggingFace Model Pages -- 2026-07-07
Making LLMs Better at Creative Writing Using Entropy -- 2026-07-07
55 LLMs blind-grade each other: 22K judgments reveal systematic same-family bias -- 2026-06-30
If LLMs Have Human-Like Attributes, Then So Does Age of Empires II -- 2026-06-30
The Doorman's Fallacy in action -- 2026-06-30
[Editorial] -- 2026-06-18
[Editorial] -- 2026-06-18
ESP32 Bit Pirate, a Hardware Hacking Tool with WebCLI That Speaks Every Protocol -- 2026-06-09
Porting the ThinkPad X61 to Coreboot -- 2026-06-09
[Editorial] -- 2026-06-09
1k Data Breaches Later, the Disclosure Lag Is Worse -- 2026-06-09
CAM-LDS: Cyber Attack Manifestations for Automatic Interpretation of System Logs and Security Alerts -- 2026-05-26
aaron-kidwell/goLoL -- 2026-05-26
[Editorial] -- 2026-05-26
[Editorial] -- 2026-03-28
[Editorial] -- 2026-03-28
[Editorial] -- 2026-03-28
[Editorial] -- 2026-03-28
[Editorial] -- 2026-03-28
[Editorial] LLM Processing Internals -- 2026-02-18
Krites: Asynchronous Verified Semantic Caching for Tiered LLM Architectures -- 2026-02-18
The Strix Halo feels like an amazing super power [Activation Guide] -- 2026-02-18
[Editorial] https://windley.com/archives/2026/02/a_policy-aware_agent_loop_with_cedar_and_openclaw.shtml -- 2026-02-12
Open Source Kreuzberg benchmarks and new release -- 2026-02-12
[NVIDIA Nemotron] How can I assess general knowledge on a benchmaxxed model? -- 2026-02-12
I built a rough .gguf LLM visualizer -- 2026-02-12
Local-First Fork of OpenClaw for using open source models--LocalClaw -- 2026-02-12
Measuring output stability across LLM runs (JSON drift problem) -- 2026-02-09
BalatroBench - Benchmark LLMs' strategic performance in Balatro -- 2026-02-09
llama.cpp performance breakthrough for multi-GPU setups -- 2026-01-06
Llama 3.2 3B fMRI LOAD BEARING DIMS FOUND -- 2026-01-06
Hyperbolic Math w Mac GPU acceleration -- 2026-01-06
I built a local voice assistant that learns new abilities via auto-discovered n8n workflows exposed as tools via MCP (LiveKit + Ollama + n8n) -- 2025-12-29
exllamav3 adds support for GLM 4.7 (and 4.6V, + Ministral & OLMO 3) -- 2025-12-29
Tencent just released WeDLM 8B Instruct on Hugging Face -- 2025-12-29
Gen 3D with local llm -- 2025-12-29
Offline vector DB experiment — anyone want to test on their local setup? -- 2025-12-29
Roo Code 3.37 | GLM 4.7 | MM 2.1 | Custom tools | MORE!!! -- 2025-12-29
mini-SGLang released: Learn how LLM inference actually works (5K lines, weekend-readable) -- 2025-12-18
Run Mistral Devstral 2 locally Guide + Fixes! (25GB RAM) - Unsloth -- 2025-12-18
I vibe coded (I hope) useful tool for local LLMs inference -- 2025-12-18
I built a local Python agent that catches stderr and self-heals using Ollama. No cloud APIs involved. (Demo) -- 2025-12-18
mistralai/Devstral-Small-2-24B-Instruct-2512 -- 2025-12-18
running Deepseek v32 on consumer hardware llama.cpp/Sglang/vLLm -- 2025-12-15
Found a REAP variant of Qwen3-coder that I can use for 100K tokens in Roo Code on my macbook -- 2025-12-15
Understanding the new router mode in llama cpp server -- 2025-12-15
Letting a local Ollama model judge my AI agents and it’s surprisingly usable -- 2025-12-15
[Editorial] https://www.linkedin.com/posts/stuart-winter-tear_assessing-llms-for-serendipity-discovery-activity-7396596796938153984-JY9u -- 2025-12-12
Confused and unsure -- 2025-12-10
The "Confident Idiot" Problem: Why LLM-as-a-Judge fails in production. -- 2025-12-10
We gave 5 LLMs $100K to trade stocks for 8 months -- 2025-12-08
An explainer blog on attention, KV-caching, continuous batching -- 2025-11-28
Hardcore function calling benchmark in backend coding agent. -- 2025-11-26
Python script to stress-test LangChain agents against infinite loops (Open Logic) -- 2025-11-26
Browser extension Powered by Ollama for Code Reviews on Gitlab and Azure DO -- 2025-11-26
Mimir - Oauth and GDPR++ compliance + vscode plugin update -- 2025-11-26
alpkeskin/gotoon -- 2025-11-26
Cross-GPU prefix KV reuse with RDMA / NVLink - early experimental results -- 2025-11-13
When does RTX 6000 Pro make sense over a 5090? -- 2025-11-13
My (open-source) continuation (FlexAttention, RoPE, BlockMasks, Muon, etc.) to Karpathy's NanoGPT -- 2025-11-13
Anyone running code model in cpu only VPS? -- 2025-11-13
Hardware recommendations for Ollama for homelab -- 2025-11-10
[Editorial] https://www.evokesecurity.com/blogs/prompt-injection-is-for-everyone -- 2025-11-04
Found a remote file inclusion vulnerability in an AI-generated app before launch -- 2025-11-04
Launch HN: Propolis (YC X25) – Browser agents that QA your web app autonomously -- 2025-11-04
Attacking macOS XPC Helpers: Protocol Reverse Engineering and Interface Analysis -- 2025-11-04
[Research] Unvalidated Trust: Cross-Stage Failure Modes in LLM/agent pipelines arXiv -- 2025-11-04
AI Agents Reasoning Collapse Imminent (CMU, Berkeley) -- 2025-11-02
Natural Language Programming: Run Natural Language as Script -- 2025-11-02
Claude Code is a Beast – Tips from 6 Months of Hardcore Use -- 2025-11-02
It has been 4 hrs since the release of nanochat from Karpathy and no sign of it here! A new full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable, dependency-lite codebase -- 2025-10-19
Reproducing Karpathy’s NanoChat on a Single GPU — Step by Step with AI Tools -- 2025-10-17
Best path for a unified Gaming, AI & Server machine? Custom build vs. Mac Studio/DGX Spark -- 2025-10-17
Ollama kinda dead since OpenAI partnership. Virtually no new models, and kimi2 is cloud only? Why? I run it fine locally with lmstudio. -- 2025-10-17
[Editorial] The best ChatGPT that $100 can buy. -- 2025-10-16
[Editorial] Train your own LLM -- 2025-10-16
Writing an LLM from scratch, part 22 – training our LLM -- 2025-10-16
Kwaipilot/KAT-Dev-72B-Exp -- 2025-10-16
What's the best local LLM for coding I can run on MacBook Pro M4 32Gb? -- 2025-10-12
How do you benchmark the cognitive performance of local LLM models? -- 2025-10-12
Granite 4.0 Micro (3.4B) running 100% locally in your browser w/ WebGPU acceleration -- 2025-10-08
ibm-granite/granite-docling-258M -- 2025-10-08
What are the best models for legal work in Oct 2025? -- 2025-10-07
[Update] FamilyBench: New models tested - Claude Sonnet 4.5 takes 2nd place, Qwen 3 Next breaks 70%, new Kimi weirdly below the old version, same for GLM 4.6 -- 2025-10-07
princeton-pli/RLMT -- 2025-10-07
TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning -- 2025-10-07
I created a simple tool to manage your llama.cpp settings & installation -- 2025-10-03
Looking for a web-based open-source Claude agent/orchestration framework (not for coding, just orchestration) -- 2025-10-03
Codexia GUI for Codex CLI new features -- 2025-10-03
ArchGW 🚀 - Use Ollama-based LLMs with Anthropic client (release 0.3.13) -- 2025-10-03
How am I supposed to know which third party provider can be trusted not to completely lobotomize a model? -- 2025-09-28
A step by step guide on how to build a LLM from scratch -- 2025-09-28
YannQi/R-4B -- 2025-09-28
AI and licensing (commercial use) -- 2025-09-26
I trained an LLM from scratch AMA! -- 2025-09-26
pengzhangzhi/Open-dLLM -- 2025-09-26
Seeking Local LLM Recommendations for AST Generation (by Function Calling) -- 2025-09-24
Uncensored LLM -- 2025-09-23
Building RAG Systems at Enterprise Scale: Our Lessons and Challenges -- 2025-09-23
Can you use a Claude Max account with Cascade? -- 2025-09-22
Permanently alter context history from function -- 2025-09-22
How do you use agents.md in codex cli or vs code extension? -- 2025-09-22
Engineer's Guide to Local LLMs with LLaMA.cpp and QwenCode on Linux -- 2025-09-21
gpt-oss-20b TTFT very slow with llama.cpp? -- 2025-09-21
Answer Matching Outperforms Multiple Choice for Language Model Evaluation -- 2025-09-21
Built an OpenWebUI Mobile Companion (Conduit): Alternative to Commercial Chat Apps -- 2025-09-16
Local LLM suite on iOS powered by llama cpp - with web search and RAG -- 2025-09-16
What are the local TTS models with voice cloning? -- 2025-09-16
[OSS] Beelzebub — “Canary tools” for AI Agents via MCP -- 2025-09-12
Defeating Nondeterminism in LLM Inference -- 2025-09-12
This Week in Security: NPM, Kerbroasting, and The Rest of the Story -- 2025-09-12
Is the "cost of inference" going up or down? -- 2025-09-08
Show HN: Entropy-Guided Loop – How to make small models reason -- 2025-09-06
Kwaipilot/KAT-V1-40B -- 2025-09-06
Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning -- 2025-09-06
I locally benchmarked 41 open-source LLMs across 19 tasks and ranked them -- 2025-09-05
Good setup for coder LLM under 12GB VRam and 64GB DDR5? -- 2025-09-04
openai/gpt-oss-120b -- 2025-09-04
Some thoughts on LLMs and software development -- 2025-08-30
LLM speedup breakthrough? 53x faster generation and 6x prefilling from NVIDIA -- 2025-08-29
Intel Granite Rapids CPU on sale at Newegg up to 65% off MSRP -- 2025-08-29
unsloth/DeepSeek-V3.1-GGUF -- 2025-08-29
baichuan-inc/Baichuan-M2-32B -- 2025-08-28
[Editorial] AI, cve, auto exploitation -- 2025-08-26
[Editorial] Promptware Attacks Against LLM-Powered Assistants -- 2025-08-26
[Editorial] AI portscan -- 2025-08-26
Prompt Obfuscation -- 2025-08-26
synacktiv/GroupPolicyBackdoor -- 2025-08-26
DavidBuchanan314/anubis_offload -- 2025-08-26
Some legend finally posted working quants of GLM-4.5 Air for Ollama -- 2025-08-24
Had some beginner questions regarding how to use Ollama? -- 2025-08-24
AI Mode in Search gets new agentic features and expands globally -- 2025-08-24
Practical approach for streaming UI from LLMs -- 2025-08-24
New Tool for Finding Why Your LLM Inference is Slow -- 2025-08-14
I ran OpenAI’s GPT-OSS 20B locally on a 16GB Mac with Ollama — setup, gotchas, and mini demo -- 2025-08-14
GLM 4.5 Air - Optimizing - Vulkan vs. CUDA? -- 2025-08-14
google/gemma-3n-E4B -- 2025-08-07
IntervitensInc/pangu-pro-moe-model -- 2025-08-07
Welcome GPT OSS, the new open-source model family from OpenAI! -- 2025-08-07
PSA: zai/glm-4.5 is absolutely crushing it for coding - way better than Claude’s recent performance -- 2025-08-03
Wan-AI/Wan2.2-TI2V-5B -- 2025-08-03
moonshotai/Kimi-K2-Instruct -- 2025-08-03
realtime-ai/blastoff-llm -- 2025-08-02
On the Predictive Power of Representation Dispersion in Language Models -- 2025-08-01
[Editorial] The Anatomy of a Modern LLM -- 2025-07-31
PowerInfer/SmallThinker-21BA3B-Instruct -- 2025-07-31
Building a quiet LLM machine for 24/7 use, is this setup overkill or smart? -- 2025-07-30
haykgrigo3/TimeCapsuleLLM -- 2025-07-30
unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF -- 2025-07-30
PhysicsWallahAI/Aryabhata-1.0 -- 2025-07-30
How are people extracting system prompts? -- 2025-07-29
cherrydra/mcpurl -- 2025-07-29
LLMs are bad at returning code in JSON -- 2025-07-29
Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems -- 2025-07-27
Shanghai AI Lab Just Released a Massive 97-Page Safety Evaluation of Frontier AI Models - Here Are the Most Concerning Findings -- 2025-07-27
Chain-GPT/Solidity-LLM -- 2025-07-25
Anyone interested in adding their fine-tuned / open source models to this benchmark? -- 2025-07-25
Qwen Code: A command-line AI workflow tool, optimized for Qwen3-Coder models -- 2025-07-23
Chain-of-Descriptions: Improving Code LLMs for VHDL Code Generation and Summarization -- 2025-07-23
Ignoring instructions? Or am I dumb? (claude.md) -- 2025-07-18
Open source and free iOS app to chat with your LLMs when you are away from home. -- 2025-07-16
Requirements and architecture for a good enough model with scientific papers RAG -- 2025-07-16
Excited to share updates to Open WebUI Starter! New docs, Docker support, and templates for everyone -- 2025-07-16
(Kramer UI for Ollama) I was tired of dealing with Docker, so I built a simple, portable Windows UI for Ollama. -- 2025-07-11
BastionChat: Finally got Qwen3 + Gemma3 (thinking models) running locally on iPhone/iPad with full RAG and voice mode -- 2025-07-11
Which is the best small local LLM models for tasks like doing research and generating insights -- 2025-07-10
Looking for practical advice with my MSc thesis “On-Premise Orchestration of SLMs” (OpenWebUI + SLM v LLM benchmarking on multiple GPUs) -- 2025-07-10
High Precision -- 2025-07-09
skt/A.X-4.0 -- 2025-07-09
Medical language model - for STT and summarize things -- 2025-07-09
Upskill your LLMs with Gradio MCP Servers -- 2025-07-09
Are non-autoregressive models really faster than autoregressive ones after all the denoising steps? -- 2025-07-06
Yuan-ManX/ComfyUI-OmniGen2 -- 2025-07-06
Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 -- 2025-07-06
Helping someone build a local continuity LLM for writing and memory—does this setup make sense? -- 2025-07-06
Is running a local LLM useful? How? -- 2025-07-06
Any local models that has less restraints? -- 2025-07-06
baidu/ERNIE-4.5-VL-424B-A47B-PT -- 2025-07-06
baidu/ERNIE-4.5-300B-A47B-PT -- 2025-07-06
skt/A.X-4.0-Light -- 2025-07-04
ChatDOC/OCRFlux-3B -- 2025-07-04
Self-hosting LLaMA: What are your biggest pain points? -- 2025-07-02
Built memX: a shared memory backend for LLM agents (demo + open-source code) -- 2025-06-29
Automatically Evaluating AI Coding Assistants with Each Git Commit (Open Source) -- 2025-06-29
Secure Minions: private collaboration between Ollama and frontier models -- 2025-06-29
Privacy implications of sending data to OpenRouter -- 2025-06-29
Exploring Practical Uses for Small Language Models (e.g., Microsoft Phi) -- 2025-06-29
LLM with OCR capabilities -- 2025-06-29
How to create a speech recognition model from scratch -- 2025-06-29
Arch 0.3.0 is out - I added support for the Claude family of LLMs in the proxy server framework for agents 🚀 -- 2025-06-29
Gemini CLI MCP Agent just released! -- 2025-06-29
Freeplane xml mind maps locally: only Qwen3 and Phi4 Reasoning Plus can create them in one shot? -- 2025-06-29
automated debugging using Ollama -- 2025-06-27
Knowledge Database Advise needed/ Local RAG for IT Asset Discovery - Best approach for varied data? -- 2025-06-27
Need feedback for a RAG using Ollama as background. -- 2025-06-27
Best tutorial for installing a local llm with GUI setup? -- 2025-06-27
Create 2 and 3-bit GPTQ quantization for Qwen3-235B-A22B? -- 2025-06-27
Ollama Frontend/GUI -- 2025-06-27
Newtonian Formulation of Attention: Treating Tokens as Interacting Masses? -- 2025-06-27
Gemini CLI: Open-source AI agent. Write code, debug, and automate tasks with Gemini 2.5 Pro with industry-leading high usage limits at no cost. -- 2025-06-27
A Plan for SIMD -- 2025-06-27
Squiggle: A simple programming language for intuitive probabilistic estimation -- 2025-06-27
I Built a Symbolic Cognitive System to Fix AI Drift — It’s Now Public (SCS 2.0) -- 2025-06-27
Menlo/Jan-nano-128k -- 2025-06-25
Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) -- 2025-06-25
Is it possible to give Gemma 3 or any other model on-device screen awareness? -- 2025-06-25
Chonkie update. -- 2025-06-25
Memory Layer Compatible with Local Llama -- 2025-06-25
Ollama/AnythingLLM on Windows 11 with AMD RX 6600: GPU Not Utilized for LLM Inference - Help! -- 2025-06-25
How can synthetic data improve a model if the model was the thing that generated that data? -- 2025-06-25
After reading OpenAI's GPT-4.1 prompt engineering cookbook, I created this comprehensive Python coding template -- 2025-06-25
The 55% Regret Club: How AI-First Companies Are Learning Lessons the Hard Way -- 2025-06-25
Microsoft-backed Builder.ai enters insolvency proceedings -- 2025-06-25
Skywork/Skywork-SWE-32B -- 2025-06-25
moonshotai/Kimi-VL-A3B-Thinking-2506 -- 2025-06-25
POLARIS-Project/Polaris-4B-Preview -- 2025-06-25
XiaomiMiMo/MiMo -- 2025-06-22
nvidia/AceReason-Nemotron-1.1-7B -- 2025-06-22
Menlo/Jan-nano -- 2025-06-22
Quartet - a new algorithm for training LLMs in native FP4 on 5090s -- 2025-06-22
Open Discussion: Improving HTML-to-Markdown Extraction Using Local LLMs (7B/8B, llama.cpp) – Seeking Feedback on My Approach! -- 2025-06-22
Update: My agent model now supports OpenAI function calling format! (mirau-agent-base) -- 2025-06-22
🧙‍♂️ I Built a Local AI Dungeon Master – Meet Dungeo_ai (Open Source & Powered by your local LLM ) -- 2025-06-22
Got an LLM to write a fully standards-compliant HTTP 2.0 server via a code-compile-test loop -- 2025-06-22
Use offline voice controlled agents to search and browse the internet with a contextually aware LLM in the next version of AI Runner -- 2025-06-22
ReMind: AI-Powered Study Companion that Transforms how You Retain Knowledge! -- 2025-06-22
My AI coding workflow that's actually working (not just hype) -- 2025-06-22
Double-Entry Ledgers: The Missing Primitive in Modern Software -- 2025-06-22
gpt_agents.py -- 2025-06-22
Demo Video of AutoBE, Backend Vibe Coding Agent Achieving 100% Compilation Success (Open Source) -- 2025-06-19
How to set up local llms on a 6700 xt -- 2025-06-19
Jetson Orin AGX 32gb -- 2025-06-19
AMD GPU support -- 2025-06-19
Much lower performance for Mistral-Small 24B on RTX 3090 and from deepinfra API -- 2025-06-19
Extract Website Information -- 2025-06-19
Looking for a verified copy of big-lama.ckpt (181MB) used in the original LaMa inpainting model trained on Places2. -- 2025-06-19
Is it true that all tools like Cline/Copilot Agent/Roo Code/Windsurf/Claude Code/Cursor are roughly the same thing? -- 2025-06-19
SkyRoof: New Ham Satellite Tracking and SDR Receiver Software -- 2025-06-19
100% Elimination of Hallucinations on RAGTruth for GPT-4 and GPT-3.5 Turbo -- 2025-06-17
lyeslabs/mcpgen -- 2025-06-17
binary-husky/gpt_academic -- 2025-06-16
[Update] Rensa: added full CMinHash + OptDensMinHash support (fast MinHash in Rust for dataset deduplication / LLM fine-tuning) -- 2025-06-15
Open Source Unsiloed AI Chunker (EF2024) -- 2025-06-15
ether0 - Mistral 24B with RL on several molecular design tasks in chemistry -- 2025-06-15
Need selfhosted AI to generate better bash scripts and ansible playbooks -- 2025-06-15
How do I finetune Devstral with vision support? -- 2025-06-15
What's the best approach for including niche dependency source files and associated documentation reference material in context? -- 2025-06-15
Airlines Don't Want You to Know They Sold Your Flight Data to DHS -- 2025-06-15
John Deere Must Face Second Right to Repair Lawsuit -- 2025-06-15
What vector database and embeddings are y'all using -- 2025-06-15
Turn based two model critique for rounds to refine answer - any examples or FOSS projects? -- 2025-06-15
POC: Running up to 123B as a Letterfriend on <300€ for all hardware. -- 2025-06-07
Using LLaMA 3 locally to plan macOS UI actions (Vision + Accessibility demo) -- 2025-06-07
Sarvam-M a 24B open-weights hybrid reasoning model -- 2025-06-07
why isn’t anyone building legit tools with local LLMs? -- 2025-06-07
Anyone on Oahu want to let me borrow an RTX 6000 Pro to benchmark against this dual 5090 rig? -- 2025-06-07
Strange memory usage -- 2025-06-07
jax and jaxlib in ubuntu -- 2025-06-07
We accidentally solved the biggest bottleneck in vibe coding: secret sprawl aka secret leaks -- 2025-06-07
Improving performance of rav1d video decoder -- 2025-06-07
KumoRFM: Gen-purpose model for making instant predictions over relational data -- 2025-06-07
The copilot delusion -- 2025-06-07
AI is getting insane (generating 3d models ChatGPT + 3daistudio.com or open source models) -- 2025-06-07
Is Qwen the new face of local LLMs? -- 2025-06-07
mcp-use/mcp-use -- 2025-05-30
Cloi CLI: Local debugging agent that runs in your terminal -- 2025-05-30
mistralai/Devstral-Small-2505_gguf -- 2025-05-30
Open-Sourced Multimodal Large Diffusion Language Models -- 2025-05-30
Built a Python library for text classification because I got tired of reinventing the wheel -- 2025-05-30
Cleaning up responses to fix up synthetic data -- 2025-05-30
My Gemma-3 musing .... after a good time dragging it through a grinder -- 2025-05-30
Seeking Help: A -- 2025-05-30
Trying to learn ML - Book Recommendations -- 2025-05-30
How to have cursor auto-apply code suggestions? -- 2025-05-30
Triangle splatting: radiance fields represented by triangles -- 2025-05-30
Why I Built My Own Audio Player -- 2025-05-30
Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming -- 2025-05-30
IDEA: Record your voice prompts, copy them straight into Ollama (100% local) -- 2025-05-30
Migration to Postgres - Success -- 2025-05-30