Reasoning Models

Chain of thought, thinking models, math and logic reasoning

387 articles across 123 editions

Articles

DyCon: Dynamic Reasoning Control via Evolving Difficulty Modeling -- 2026-07-09
Particle Scattering Sampler for llama.cpp -- 2026-07-06
[Editorial] NVIDIA Just Open-Sourced a Polyamorous AI -- 2026-07-06
[Paper] Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling -- 2026-07-06
55 LLMs blind-grade each other: 22K judgments reveal systematic same-family bias -- 2026-06-30
If LLMs Have Human-Like Attributes, Then So Does Age of Empires II -- 2026-06-30
The Doorman's Fallacy in action -- 2026-06-30
LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels -- 2026-06-26
Got GLM-5.2 + MTP speculative decode running on 4x DGX Spark (GB10) — and the build piece the public recipe is missing -- 2026-06-26
Run a vLLM Server on HF Jobs in One Command -- 2026-06-26
Running Sonnet 4.6 on every Instagram DM for a 7-location restaurant. 97% cache hit is the only reason it's affordable -- 2026-06-26
[Editorial] rupixel -- 2026-06-26
OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation -- 2026-06-19
Beyond LoRA: Can you beat the most popular fine-tuning technique? -- 2026-06-19
[Editorial] Can LLMs Be Computers? -- 2026-06-15
[Editorial] Reasoning Language Models — RLM and MinRLM -- 2026-06-15
[Editorial] Research Paper — AI/ML Methods -- 2026-06-15
[Editorial] Steve Yegge on Services and Complexity -- 2026-06-15
Holo3.1: Fast & Local Computer Use Agents -- 2026-06-03
Fused MoE dispatch kernel in pure Triton: 89-131% of Megablocks, runs on AMD with zero code changes -- 2026-06-03
ReAligned-Qwen3.5 Release -- 2026-06-03
KANX: A production-ready Kolmogorov-Arnold Network library -- 2026-06-03
SWE-rebench Leaderboard (March, April and May 2026): GPT-5.5, Opus 4.7, Cursor (Composer 2.5), Kimi K2.6 and More -- 2026-05-28
The frontier reasoning race is starting to look like a crowded subway station -- 2026-05-28
MiMo-V2.5-coder -- 2026-05-28
An OpenAI model has disproved a central conjecture in discrete geometry -- 2026-05-21
I've joined Anthropic -- 2026-05-21
Need a second pair of eyes, this Qwen3.6 27B quant recipe consistently thinks less and is correct -- 2026-05-20
llama: avoid copying logits during prompt decode in MTP by am17an - Pull Request #23198 - ggml-org/llama.cpp -- 2026-05-20
Extension idea: llama-server with custom samplers -- 2026-05-20
Simpler self hosted alt to Open WebUI -- 2026-05-20
EMO: Pretraining mixture of experts for emergent modularity -- 2026-05-13
[Editorial] -- 2026-05-13
2.5x Faster Inference with Qwen 3.6 27B Using MTP — Complete Hardware Guide -- 2026-05-08
Atlas Is Now Open Source — Pure Rust+CUDA Inference Engine for Blackwell -- 2026-05-08
antirez/ds4 — DeepSeek 4 Flash Local Inference Engine for Metal -- 2026-05-08
[Editorial] dflash — Flash Inference Tool -- 2026-05-08
Heretic 1.3: Reproducible Abliterated Models, Integrated Benchmarking, Reduced VRAM -- 2026-05-08
Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM! -- 2026-05-06
noonghunna/club-3090 — Community LLM serving recipes for RTX 3090 -- 2026-05-06
Mistral Medium 3.5 128B and Qwen 3.5 122B A10B on 4x RTX 3080 20GB -- 2026-05-06
Decoupled Attention from Weights - Gemma 4 26B -- 2026-05-06
Writing an LLM Compiler from Scratch: PyTorch to CUDA -- 2026-05-04
Karpathy's MicroGPT Running at 50,000 Tokens/Second on an FPGA -- 2026-05-04
Hipfire: Full AMD Architecture Validation Across RDNA 1–4, Strix Halo, and BC250 -- 2026-05-04
Qwen 3.6-35B KV Cache Benchmark: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M Context -- 2026-05-04
llama.cpp DeepSeek v4 Flash experimental inference -- 2026-05-01
llama.cpp benchmark native vs. non native NVFP4 on Blackwell — summary -- 2026-05-01
Speculative decoding with Gemma-4-31B + Gemma-4-E2B enables 120-200 tok/s -- 2026-05-01
The 4B class of 2026 (benchmark) -- 2026-05-01
Study: 2x+ coding performance of 7B model without touching the coding agent -- 2026-05-01
[Editorial] RuVLLM ESP32 v0.3.0-rc2 — LLM Inference on Microcontrollers -- 2026-05-01
Zyphra/ZUNA — New Model on Hugging Face -- 2026-05-01
[Editorial] Schmidhuber: The World Model Boom -- 2026-04-20
[Editorial] Schmidhuber: World Models, Planning & Curiosity (1990 Origins) -- 2026-04-20
[Editorial] The Ontology Problem (Technical) — Kurt Cagle -- 2026-04-20
All elementary functions from a single binary operator -- 2026-04-14
Falcon Perception -- 2026-04-02
TRL v1.0: Post-Training Library Built to Move with the Field -- 2026-04-02
lucas-maes/le-wm -- 2026-04-02
Toward explaining why traditional ablation/abliteration works -- 2026-04-02
The Cognitive Dark Forest -- 2026-03-30
$500K/year pharmacovigilance platform replicated in a weekend with Claude Code -- 2026-03-30
Google Stitch is insane -- 2026-03-30
[Editorial] Time Traveled from Frontier AI -- 2026-03-30
[Editorial] Ambient Intelligence -- 2026-03-30
Controllable Reasoning Models Are Private Thinkers -- 2026-03-26
Nvidia built a silent opinion engine into NemotronH to gaslight you and they're not the only ones doing it -- 2026-03-26
Gemini thoughts turned very violent -- 2026-03-26
Towards a Neural Debugger for Python -- 2026-03-26
Mathematics behind extreme quantization of Microsoft's BitNet -- 2026-03-26
A.T.L.A.S - Adaptive Test-time Learning and Autonomous Specialization -- 2026-03-26
[Editorial] -- 2026-03-25
[Editorial] arxiv:2603.15371 -- 2026-03-23
[Editorial] The Open World / Closed World Conundrum -- 2026-03-16
[Editorial] W3C Context Graph Community Group -- 2026-03-16
[Editorial] LLM vs LRM -- 2026-03-10
To everyone using still ollama/lm-studio... llama-swap is the real deal -- 2026-03-10
[Editorial] Markov Chains -- 2026-03-10
[Editorial] Speeding One Cog Breaks the Machine -- 2026-03-10
[Editorial] KatanaLarp -- 2026-03-07
[Editorial] Research Paper -- 2026-03-07
[Editorial] The Builders PRD -- 2026-03-07
[Editorial] Our Design Docs Write Themselves -- 2026-03-05
[Editorial] Claude Code: Brilliant Until the Repo... -- 2026-03-05
[Editorial] Context Is King, But Bad Context Is Poison -- 2026-03-05
[Editorial] Semantic Anchors for LLM Coding -- 2026-03-05
Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers -- 2026-02-25
O(1) Inference and Causal Monoid State Compression in Spartacus-1B -- 2026-02-25
Sink-Aware Pruning for Diffusion Language Models -- 2026-02-25
Consistency of Large Reasoning Models Under Multi-Turn Attacks -- 2026-02-16
[Editorial] AI Testing and Quality Engineering -- 2026-02-16
[Editorial] https://github.com/GMaN1911/claude-cognitive -- 2026-01-02
SA-RAG: Using spreading activation to improve multi-hop retrieval in RAG systems -- 2026-01-02
Is there a way to see what is trashing my context? -- 2026-01-02
A zero-setup agent that benchmarks multiple open / closed source LLMs on your specific problem / data -- 2026-01-02
What is a good model for assisting with patching source code? -- 2026-01-02
Just got an RTX Pro 6000 - need recommendations for processing a massive dataset with instruction following -- 2026-01-02
MiniMaxAI/MiniMax-M2.1 -- 2026-01-02
[Editorial] https://www.linkedin.com/posts/stuart-winter-tear_realist-and-pluralist-conceptions-of-intelligence-activity-7397231918871703554-FmSP -- 2025-12-11
VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection -- 2025-12-11
Nanbeige4-3B: Lightweight with strong reasoning capabilities -- 2025-12-10
mistralai/Devstral-2-123B-Instruct-2512 -- 2025-12-10
MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark -- 2025-12-04
PrimeIntellect/INTELLECT-3 -- 2025-12-04
cerebras/MiniMax-M2-REAP-162B-A10B -- 2025-12-04
62-day fixed-prompt probe on Grok-4: strong semantic attractors, thematic inversion, and refusal onset (1,242 samples, fully public) -- 2025-12-03
I built an open-source "Passport" for Claude Agents (MCP) so they can cryptographically sign their own actions -- 2025-12-01
Implemented Anthropic's Programmatic Tool Calling with Langchain so you can use it with any models and tune it for your own use case -- 2025-12-01
CodeModeToon -- 2025-12-01
WeiboAI/VibeThinker-1.5B -- 2025-11-28
[Editorial] https://ai.google.dev/gemini-api/docs/prompting-strategies#agentic-si-template -- 2025-11-28
An explainer blog on attention, KV-caching, continuous batching -- 2025-11-28
I built an open-source CLI that generates context.json bundles for React/TypeScript projects -- 2025-11-28
GraphLite: An Embeddable Graph Database with ISO Graph Query Language Support -- 2025-11-26
allenai/Olmo-3-32B-Think -- 2025-11-25
tencent/HunyuanOCR -- 2025-11-25
peteromallet/Qwen-Image-Edit-InScene -- 2025-11-25
[Editorial] https://www.linkedin.com/posts/stuart-winter-tear_decision-making-amid-information-based-threats-activity-7396539314815533056-4pVx -- 2025-11-18
[Editorial] https://gist.github.com/ruvnet/d6d2739400943037443b78c3ef86d8a5 -- 2025-11-18
[Editorial] https://github.com/mrwadams/stride-gpt/blob/master/docs/operationalization-guide.md -- 2025-11-18
janhq/Jan-v2-VL-high -- 2025-11-18
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers -- 2025-11-18
[Editorial] https://arxiv.org/pdf/2506.21734 -- 2025-11-11
OPERA: A Reinforcement Learning-Enhanced Orchestrated Planner-Executor Architecture for Reasoning-Oriented Multi-Hop Retrieval -- 2025-11-11
[Editorial] https://www.linkedin.com/posts/andriyburkov_this-paper-shows-a-27-million-parameter-model-activity-7393432619365052416-SFLO -- 2025-11-10
Trajectory Distillation for Foundation Models -- 2025-11-10
sail-sg/Precision-RL -- 2025-11-10
inclusionAI/LLaDA2.0-flash-preview -- 2025-11-10
AI Agents Reasoning Collapse Imminent (CMU, Berkeley) -- 2025-11-02
Natural Language Programming: Run Natural Language as Script -- 2025-11-02
Claude Code is a Beast – Tips from 6 Months of Hardcore Use -- 2025-11-02
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs -- 2025-10-31
[Editorial] https://www.linkedin.com/posts/anthony-alcaraz-b80763155_ai-agents-cant-reason-without-semantic-structure-activity-7389222435906244608-QMMc -- 2025-10-31
Raezil/lattice-agent -- 2025-10-31
driaforall/mem-agent -- 2025-10-31
Memp: Exploring Agent Procedural Memory -- 2025-10-31
I built the HuggingChat Omni Router 🥳 🎈 -- 2025-10-28
Claude Code 2.0.27 -- 2025-10-28
ltjed/freephdlabor -- 2025-10-28
Show HN: Whatdidido – CLI to summarize your work from Jira/Linear -- 2025-10-28
Reasoning should be thought of as a drawback, not a feature -- 2025-10-21
inclusionAI/Ring-1T-preview -- 2025-10-21
Learning Lifted Action Models From Traces of Incomplete Actions and States -- 2025-10-20
I got tired of OpenAI dependency. Built a multi-LLM control center instead. -- 2025-10-19
Turn ChatGPT into a real-time meeting assistant (via MCP + Apps SDK) -- 2025-10-19
Claude Code taking a coffee break 🤔 -- 2025-10-19
Show HN: Cmux – Coding Agent Multiplexer -- 2025-10-19
[Editorial] Sqlite vector -- 2025-10-17
Meta Superintelligence group publishes paper on new RAG technique -- 2025-10-17
[Editorial] ReasoningBank is a self-learning, local-first memory system -- 2025-10-16
I tested if tiny LLMs can self-improve through memory: Qwen3-1.7B gained +8% accuracy on MATH problems -- 2025-10-16
Tested 9 RAG query transformation techniques – HydE is absurdly underrated -- 2025-10-16
GPT-OSS from Scratch on AMD GPUs -- 2025-10-11
How do I compare cost per token for serverless vs provisioned hardware? -- 2025-10-11
OpenAI is good at deals -- 2025-10-11
meituan-longcat/LongCat-Flash-Chat -- 2025-10-11
adb1274/batchi -- 2025-10-11
What are the best models for legal work in Oct 2025? -- 2025-10-07
[Update] FamilyBench: New models tested - Claude Sonnet 4.5 takes 2nd place, Qwen 3 Next breaks 70%, new Kimi weirdly below the old version, same for GLM 4.6 -- 2025-10-07
princeton-pli/RLMT -- 2025-10-07
TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning -- 2025-10-07
DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models -- 2025-10-05
swiss-ai/Apertus-8B-2509 -- 2025-10-04
Qwen3-Omni thinking model running on local H100 (major leap over 2.5) -- 2025-09-30
Seeking Advice: Best Model + Framework for Max Tokens/sec on Dual L40S (Testing Rig) -- 2025-09-30
For local models, has anyone benchmarked tool calling protocols performance? -- 2025-09-30
A step by step guide on how to build a LLM from scratch -- 2025-09-28
YannQi/R-4B -- 2025-09-28
New Agent benchmark from Meta Super Intelligence Lab and Hugging Face -- 2025-09-27
evalops/dspy-micro-agent -- 2025-09-27
nvidia/NVIDIA-Nemotron-Nano-9B-v2 -- 2025-09-27
inclusionAI/Ling-flash-2.0 -- 2025-09-27
Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model -- 2025-09-26
CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation -- 2025-09-26
A1: Asynchronous Test-Time Scaling via Conformal Prediction -- 2025-09-25
DeepLink-org/DeepTrace -- 2025-09-25
GLM 4.5 Air Template Breaking llamacpp Prompt Caching -- 2025-09-25
Tracking prompt evolution for RAG systems - anyone else doing this? -- 2025-09-25
MAESTRO v0.1.6 Update: Better support for models that struggle with JSON mode (DeepSeek, Kimi K2, etc.) -- 2025-09-25
Dead-simple example code for Ollama function calling. -- 2025-09-25
nvidia/NVIDIA-Nemotron-Nano-12B-v2 -- 2025-09-23
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning -- 2025-09-22
Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward -- 2025-09-22
support for the upcoming Olmo3 model has been merged into llama.cpp -- 2025-09-21
Running Nvidia CUDA Pytorch/vLLM projects and pipelines on AMD with no modifications -- 2025-09-21
A Quick Look At The AMD Instinct MI355X With ROCm 7.0 -- 2025-09-21
Uncensored AI model for from 4b Max 8b -- 2025-09-21
GPT-OSS-120B Performance Benchmarks and Provider Trade-Offs -- 2025-09-20
Why are there three different Codex variants? -- 2025-09-20
zli12321/Vision-SR1 -- 2025-09-19
lrzjason/Comfyui-QwenEditUtils -- 2025-09-19
Mini-o3/Mini-o3 -- 2025-09-19
vLLM is kinda awesome -- 2025-09-19
Public AI on Hugging Face Inference Providers 🔥 -- 2025-09-19
HyST: LLM-Powered Hybrid Retrieval over Semi-Structured Tabular Data -- 2025-09-19
GPT-OSS:20b & Qwen 4b are a match made in heaven for 24GB VRAM builds -- 2025-09-18
Was working in RAG recently got to know how well Gemma3 4B performs -- 2025-09-18
[Editorial] which patterns truly survived compression -- 2025-09-16
[Editorial] AI Kill Chain -- 2025-09-16
TsinghuaC3I/Unify-Post-Training -- 2025-09-16
[Editorial] Tricks from OpenAI gpt-oss YOU can use with transformers -- 2025-09-15
openbmb/MiniCPM4.1-8B -- 2025-09-15
nunchaku-tech/nunchaku-qwen-image -- 2025-09-15
ggml-org/gpt-oss-20b-GGUF -- 2025-09-15
MBZUAI releases K2 Think. 32B reasoning model based on Qwen 2.5 32B backbone, focusing on high performance in math, coding and science. -- 2025-09-14
unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-GGUF -- 2025-09-14
[vllm] Hints to run Qwen3-235B MoE on 8x AMD mixed cards! -- 2025-09-12
Inference for 24 people with a 5000€ budget -- 2025-09-12
$142 upgrade kit and spare modules turn Nvidia RTX 4090 24GB to 48GB AI card -- 2025-09-12
Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers -- 2025-09-12
Small Open Models Achieve Near Parity with Large Models in Low Resource Literary Translation at a Fraction of the Cost -- 2025-09-10
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic -- 2025-09-10
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search -- 2025-09-10
Introducing FineVision: a huge open-source dataset for training SOTA Vision Language Models -- 2025-09-10
wildminder/ComfyUI-VibeVoice -- 2025-09-10
bytedance/USO -- 2025-09-10
Wan-AI/Wan2.2-I2V-A14B -- 2025-09-10
[Editorial] Update from Anthropic regarding their poor performance of late -- 2025-09-09
LiquidAI/LFM2-VL-450M -- 2025-09-09
An LLM-powered Natural-to-Robotic Language Translation Framework with Correctness Guarantees -- 2025-09-09
Qwen3 30B A3B 2507 Hybrid Deep Reasoning Showcase -- 2025-09-08
Is the "cost of inference" going up or down? -- 2025-09-08
Smartphone Sensors Unlocked: Turn Your Phone into a Physics Lab -- 2025-09-08
UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets -- 2025-09-08
Voice cloning -- 2025-09-08
haasonsaas/dspy-0to1-guide -- 2025-09-06
16 reproducible failures → upgraded into a 300+ page Global Fix Map. one link inside, feedback wanted -- 2025-09-06
Show HN: Entropy-Guided Loop – How to make small models reason -- 2025-09-06
Kwaipilot/KAT-V1-40B -- 2025-09-06
Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning -- 2025-09-06
nasa-ibm-ai4science/Surya-1.0 -- 2025-09-06
Context Reasoning Benchmarks: GPT-5, Claude, Gemini, Grok on Real Tasks -- 2025-09-05
The CLAUDE.md Framework: A Guide to Structured AI-Assisted Work (prompts included) -- 2025-09-05
Team-intN18-SoybeanSeclab/Typhon -- 2025-09-05
DatarusAI/Datarus-R1-14B-preview -- 2025-09-05
Training & Querying 3 Ollama Models with Zer00logy: Symbolic Cognition Framework and Void-Math OS -- 2025-09-04
I'm building local, open-source, fast, efficient, minimal, and extendible RAG library I always wanted to use -- 2025-09-03
Creating the brain behind dumb models -- 2025-09-03
🌟Introducing Art-0-8B: Reasoning the way you want it to with Adaptive Thinking🌟 -- 2025-09-02
Fine Tune Model for Home Assistant? -- 2025-09-02
DeepSeek V3.1 improves on the multiplayer Step Game social reasoning benchmark -- 2025-08-31
I built Husk, a native, private, and open-source iOS client for your local models -- 2025-08-31
Would a “Knowledge Coverage Audit” tool be useful for RAG/chatbot builders? -- 2025-08-30
baichuan-inc/Baichuan-M2-32B -- 2025-08-28
Hierarchical Reasoning Model (HRM) implementation for text generation -- 2025-08-27
Datarus-R1-14B-Preview, an adaptive multi-step reasoning LLM for automated data analysis -- 2025-08-24
Fully Open source, serverless, community-driven MCP alternative built in Python, TS and Go -- 2025-08-24
unsloth/Kimi-K2-Instruct-GGUF -- 2025-08-24
DeepSeek V3.1 Reasoner improves over DeepSeek R1 on the Extended NYT Connections benchmark -- 2025-08-24
DeepSeek-V3.1 (Thinking and Non Thinking) -- 2025-08-22
Modify <think> to explore the impact on <answer> -- 2025-08-22
Tiny finance “thinking” model (Gemma-3 270M) with verifiable rewards (SFT → GRPO) — structured outputs + auto-eval (with code) -- 2025-08-22
Qwen/Qwen3-30B-A3B-Thinking-2507 -- 2025-08-22
tencent/Hunyuan-7B-Instruct -- 2025-08-22
🐧 llama.cpp on Steam Deck (Ubuntu 25.04) with GPU (Vulkan) — step-by-step that actually works -- 2025-08-22
Running Qwen3-Coder-30B-A3 Q4_LM in Cursor with Agent Mode unlocked -- 2025-08-22
Docker now support AI Models, anyone using it? -- 2025-08-22
Why does gpt-oss 120b run slower in ollama than in LM Studio in my setup? -- 2025-08-22
Speculative decoding in archgw candidate release 0.4.0. Could use feedback, -- 2025-08-16
Nvidia Tilus: A Tile-Level GPU Kernel Programming Language -- 2025-08-16
SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model -- 2025-08-16
HoML: vLLM's speed + Ollama like interface -- 2025-08-15
HoML vs. Ollama: A Deep Dive into Performance -- 2025-08-15
Sampler Settings for GLM 4.5-Air -- 2025-08-15
Fully verbal LLM program for OSX using whisper, ollama & XTTS -- 2025-08-15
Closing the Modality Gap for Mixed Modality Search -- 2025-08-07
ByteDance drops Seed-Prover -- 2025-08-06
naver-hyperclovax/HyperCLOVAX-SEED-Think-14B -- 2025-08-06
Context Management by Trimming Conversation -- 2025-08-06
Exploiting Primacy Effect To Improve Large Language Models -- 2025-08-06
[Editorial] HRM -- 2025-08-03
How are people running an MLX-compatible OpenAI API server locally? -- 2025-08-03
I built the perfect MCP client for broke developers (Ollama powered) -- 2025-08-03
character-ai/pipelining-sft -- 2025-08-03
CoexistAI – LLM-Powered Research Assistant (Now with MCP, Vision, Local File Chat, and More) -- 2025-08-02
Best <2B open-source LLMs for European languages? -- 2025-08-02
Local TTS quality -- 2025-08-02
[Editorial] The Anatomy of a Modern LLM -- 2025-07-31
PowerInfer/SmallThinker-21BA3B-Instruct -- 2025-07-31
Has vLLM made Ollama and llama.cpp redundant? -- 2025-07-30
[Editorial] Alternative to vector db rag -- 2025-07-30
How are people extracting system prompts? -- 2025-07-29
cherrydra/mcpurl -- 2025-07-29
[Editorial] neural networks don’t need to be giant to be powerful -- 2025-07-27
Qwen/Qwen3-235B-A22B-Thinking-2507 -- 2025-07-27
mistralai/Magistral-Small-2507 -- 2025-07-27
Running Qwen3 235B-A22B 2507 on a Threadripper 3970X + 3x RTX 3090 Machine at 15 tok/s -- 2025-07-25
The Latest GPT-5 Leaks and Teasers -- 2025-07-25
Qwen3-235B-A22B-Thinking-2507 released! -- 2025-07-25
From chaotic prompting to structured workflow: My Claude evolution -- 2025-07-24
Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support -- 2025-07-24
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models -- 2025-07-24
Building an MCP Server and Client with FastMCP 2.0 -- 2025-07-24
Does LLM architecture allow for injecting some more input tokens in the middle of token generation? -- 2025-07-24
Lucy: A Mobile-Capable 1.7B Reasoning Model That Rivals Jan-Nano -- 2025-07-23
microsoft/Phi-4-mini-flash-reasoning -- 2025-07-22
LGAI-EXAONE/EXAONE-4.0-32B -- 2025-07-22
Replacing thinking with tool usage enables reasoning in small language models -- 2025-07-22
A Request for Comments (RFC) for MCP-alternative Universal Tool Calling Protocol (UTCP) was created -- 2025-07-22
How to use the same context across LLMs and Agents -- 2025-07-22
new models from NVIDIA: OpenReasoning-Nemotron 32B/14B/7B/1.5B -- 2025-07-21
OpenAI Places Second Behind Human Coder at AtCoder Programming Event -- 2025-07-21
HelpingAI/Dhanishtha-2.0-preview -- 2025-07-21
Probing for Arithmetic Errors in Language Models -- 2025-07-21
Struggling to Generate Polished UI with Claude Code -- 2025-07-20
IMO 2025 LLM Mathematical Reasoning Evaluation -- 2025-07-20
A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1 -- 2025-07-19
Madness, the ignorant's question. Would it be possible to lighten an LLM model? -- 2025-07-18
Open source and free iOS app to chat with your LLMs when you are away from home. -- 2025-07-16
Requirements and architecture for a good enough model with scientific papers RAG -- 2025-07-16
Excited to share updates to Open WebUI Starter! New docs, Docker support, and templates for everyone -- 2025-07-16
OpenAI's open source LLM is a reasoning model, coming Next Thursday! -- 2025-07-14
The BastionRank Showdown: Crowning the Best On-Device AI Models of 2025 -- 2025-07-14
Local Llama with Home Assistant Integration and Multilingual-Fuzzy naming -- 2025-07-14
Podcast generation app -- works with Ollama -- 2025-07-14
support for Jamba hybrid Transformer-Mamba models has been merged into llama.cpp -- 2025-07-13
Asynchronous Robot Inference: Decoupling Action Prediction and Execution -- 2025-07-13
Suggestion: Grayscale-First Hack to Optimize Image Recognition in Grok—Save Compute Without Losing Accuracy? -- 2025-07-13
Upskill your LLMs with Gradio MCP Servers -- 2025-07-09
AGI is not multimodal -- 2025-07-09
How Do Vision-Language Models Process Conflicting Information Across Modalities? -- 2025-07-09
High Precision -- 2025-07-09
skt/A.X-4.0 -- 2025-07-09
Medical language model - for STT and summarize things -- 2025-07-09
Ollama alternatives -- 2025-07-09
Dealing with tool_calls hallucinations -- 2025-07-09
SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model -- 2025-07-07
i made a commit message generator that can be used offline and for free -- 2025-07-05
THUDM/GLM-4.1V-9B-Thinking -- 2025-07-05
baidu/ERNIE-4.5-VL-424B-A47B-Base-Paddle -- 2025-07-05
LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs -- 2025-07-05
skt/A.X-4.0-Light -- 2025-07-04
ChatDOC/OCRFlux-3B -- 2025-07-04
Is there a local model that can solve this text decoding riddle? -- 2025-07-03
Seven replies to the viral Apple reasoning paper and why they fall short -- 2025-07-03
R1-0528 won't stop thinking -- 2025-07-03
Running Deepseek R1 0528 q4_K_M and mlx 4-bit on a Mac Studio M3 -- 2025-07-02
Hoshinonyaruko/Gensokyo-MCP -- 2025-07-01
THU-KEG/AdaptThink -- 2025-06-28
modelcontextprotocol/registry -- 2025-06-27
Skywork/Skywork-SWE-32B -- 2025-06-25
moonshotai/Kimi-VL-A3B-Thinking-2506 -- 2025-06-25
POLARIS-Project/Polaris-4B-Preview -- 2025-06-25
XiaomiMiMo/MiMo -- 2025-06-22
nvidia/AceReason-Nemotron-1.1-7B -- 2025-06-22
Menlo/Jan-nano -- 2025-06-22
MiniMax-AI/SynLogic -- 2025-06-15
The Fractured Entangled Representation Hypothesis -- 2025-06-15
mistralai/Magistral-Small-2506_gguf -- 2025-06-14
Ruminate: From All-or-Nothing to Just-Right Reasoning in LLMs -- 2025-06-14
[update] Restructured repo under rvn-tools — modular CLI for LLM formats -- 2025-06-14
Testing Quant Quality for Shisa V2 405B -- 2025-06-14
Old model, new implementation -- 2025-06-14
Ollama vs Llamacpp: Different output for same model -- 2025-06-14
How to improve my ViT model -- 2025-06-14
From RPC to transactions and durable executions -- 2025-06-14
Flattening Rust’s learning curve -- 2025-06-14
Async from scratch 3: Pinned against the wall -- 2025-06-14
How to get the most out of my AMD 7900XT? -- 2025-06-14
typelevel/cats -- 2025-06-11
wesm/pydata-book -- 2025-06-11
lerobot/smolvla_base -- 2025-06-10
jedisct1/openapi-mcp -- 2025-06-09
sarvamai/sarvam-m -- 2025-06-07
Qwen/Qwen3-Reranker-0.6B -- 2025-06-07
arcee-ai/Homunculus -- 2025-06-04
PRIME-RL/Entropy-Mechanism-of-RL -- 2025-06-02
Atlas: Learning to Optimally Memorize the Context at Test Time -- 2025-06-02
Gen-Verse/MMaDA -- 2025-06-01
osmosis-ai/Osmosis-Structure-0.6B -- 2025-06-01
0-1 phase transitions in sparse spiked matrix estimation -- 2025-06-01
0-Step Capturability, Motion Decomposition and Global Feedback Control of the 3D Variable Height-Inverted Pendulum -- 2025-06-01
simplescaling/s1 -- 2025-05-31
FractalAIResearch/Fathom-R1-14B -- 2025-05-31
unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF -- 2025-05-31
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B -- 2025-05-31
I made Model Version Control Protocol for AI agents -- 2025-05-31
AI Baby Monitor – fully local Video-LLM nanny (beeps when safety rules are violated) -- 2025-05-31
LMStudio - llama.cpp - vLLM -- 2025-05-31
Built an ADK Agent that finds Jobs based on your Resume -- 2025-05-31
Should I resize the image before sending it to Qwen VL 7B? Would it give better results? -- 2025-05-31
How to start a LLM project? -- 2025-05-31
Beware of Fast-Math -- 2025-05-31