Reasoning Models

Chain of thought, thinking models, math and logic reasoning

362 articles across 115 editions

Articles

  1. An OpenAI model has disproved a central conjecture in discrete geometry -- 2026-05-21
  2. I've joined Anthropic -- 2026-05-21
  3. Need a second pair of eyes, this Qwen3.6 27B quant recipe consistently thinks less and is correct -- 2026-05-20
  4. llama: avoid copying logits during prompt decode in MTP by am17an - Pull Request #23198 - ggml-org/llama.cpp -- 2026-05-20
  5. Extension idea: llama-server with custom samplers -- 2026-05-20
  6. Simpler self hosted alt to Open WebUI -- 2026-05-20
  7. EMO: Pretraining mixture of experts for emergent modularity -- 2026-05-13
  8. [Editorial] -- 2026-05-13
  9. 2.5x Faster Inference with Qwen 3.6 27B Using MTP — Complete Hardware Guide -- 2026-05-08
  10. Atlas Is Now Open Source — Pure Rust+CUDA Inference Engine for Blackwell -- 2026-05-08
  11. antirez/ds4 — DeepSeek 4 Flash Local Inference Engine for Metal -- 2026-05-08
  12. [Editorial] dflash — Flash Inference Tool -- 2026-05-08
  13. Heretic 1.3: Reproducible Abliterated Models, Integrated Benchmarking, Reduced VRAM -- 2026-05-08
  14. Ryzen AI Max+ 495 (Gorgon Halo) with 192GB VRAM! -- 2026-05-06
  15. noonghunna/club-3090 — Community LLM serving recipes for RTX 3090 -- 2026-05-06
  16. Mistral Medium 3.5 128B and Qwen 3.5 122B A10B on 4x RTX 3080 20GB -- 2026-05-06
  17. Decoupled Attention from Weights - Gemma 4 26B -- 2026-05-06
  18. Writing an LLM Compiler from Scratch: PyTorch to CUDA -- 2026-05-04
  19. Karpathy's MicroGPT Running at 50,000 Tokens/Second on an FPGA -- 2026-05-04
  20. Hipfire: Full AMD Architecture Validation Across RDNA 1–4, Strix Halo, and BC250 -- 2026-05-04
  21. Qwen 3.6-35B KV Cache Benchmark: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M Context -- 2026-05-04
  22. llama.cpp DeepSeek v4 Flash experimental inference -- 2026-05-01
  23. llama.cpp benchmark native vs. non native NVFP4 on Blackwell — summary -- 2026-05-01
  24. Speculative decoding with Gemma-4-31B + Gemma-4-E2B enables 120-200 tok/s -- 2026-05-01
  25. The 4B class of 2026 (benchmark) -- 2026-05-01
  26. Study: 2x+ coding performance of 7B model without touching the coding agent -- 2026-05-01
  27. [Editorial] RuVLLM ESP32 v0.3.0-rc2 — LLM Inference on Microcontrollers -- 2026-05-01
  28. Zyphra/ZUNA — New Model on Hugging Face -- 2026-05-01
  29. [Editorial] Schmidhuber: The World Model Boom -- 2026-04-20
  30. [Editorial] Schmidhuber: World Models, Planning & Curiosity (1990 Origins) -- 2026-04-20
  31. [Editorial] The Ontology Problem (Technical) — Kurt Cagle -- 2026-04-20
  32. All elementary functions from a single binary operator -- 2026-04-14
  33. Falcon Perception -- 2026-04-02
  34. TRL v1.0: Post-Training Library Built to Move with the Field -- 2026-04-02
  35. lucas-maes/le-wm -- 2026-04-02
  36. Toward explaining why traditional ablation/abliteration works -- 2026-04-02
  37. The Cognitive Dark Forest -- 2026-03-30
  38. $500K/year pharmacovigilance platform replicated in a weekend with Claude Code -- 2026-03-30
  39. Google Stitch is insane -- 2026-03-30
  40. [Editorial] Time Traveled from Frontier AI -- 2026-03-30
  41. [Editorial] Ambient Intelligence -- 2026-03-30
  42. Controllable Reasoning Models Are Private Thinkers -- 2026-03-26
  43. Nvidia built a silent opinion engine into NemotronH to gaslight you and they're not the only ones doing it -- 2026-03-26
  44. Gemini thoughts turned very violent -- 2026-03-26
  45. Towards a Neural Debugger for Python -- 2026-03-26
  46. Mathematics behind extreme quantization of Microsoft's BitNet -- 2026-03-26
  47. A.T.L.A.S - Adaptive Test-time Learning and Autonomous Specialization -- 2026-03-26
  48. [Editorial] -- 2026-03-25
  49. [Editorial] -- 2026-03-25
  50. [Editorial] -- 2026-03-25
  51. [Editorial] -- 2026-03-25
  52. [Editorial] -- 2026-03-25
  53. [Editorial] arxiv:2603.15371 -- 2026-03-23
  54. [Editorial] The Open World / Closed World Conundrum -- 2026-03-16
  55. [Editorial] W3C Context Graph Community Group -- 2026-03-16
  56. [Editorial] LLM vs LRM -- 2026-03-10
  57. To everyone using still ollama/lm-studio... llama-swap is the real deal -- 2026-03-10
  58. [Editorial] Markov Chains -- 2026-03-10
  59. [Editorial] Speeding One Cog Breaks the Machine -- 2026-03-10
  60. [Editorial] KatanaLarp -- 2026-03-07
  61. [Editorial] Research Paper -- 2026-03-07
  62. [Editorial] The Builders PRD -- 2026-03-07
  63. [Editorial] Our Design Docs Write Themselves -- 2026-03-05
  64. [Editorial] Claude Code: Brilliant Until the Repo... -- 2026-03-05
  65. [Editorial] Context Is King, But Bad Context Is Poison -- 2026-03-05
  66. [Editorial] Semantic Anchors for LLM Coding -- 2026-03-05
  67. Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers -- 2026-02-25
  68. O(1) Inference and Causal Monoid State Compression in Spartacus-1B -- 2026-02-25
  69. Sink-Aware Pruning for Diffusion Language Models -- 2026-02-25
  70. Consistency of Large Reasoning Models Under Multi-Turn Attacks -- 2026-02-16
  71. [Editorial] AI Testing and Quality Engineering -- 2026-02-16
  72. [Editorial] https://github.com/GMaN1911/claude-cognitive -- 2026-01-02
  73. SA-RAG: Using spreading activation to improve multi-hop retrieval in RAG systems -- 2026-01-02
  74. Is there a way to see what is trashing my context? -- 2026-01-02
  75. A zero-setup agent that benchmarks multiple open / closed source LLMs on your specific problem / data -- 2026-01-02
  76. What is a good model for assisting with patching source code? -- 2026-01-02
  77. Just got an RTX Pro 6000 - need recommendations for processing a massive dataset with instruction following -- 2026-01-02
  78. MiniMaxAI/MiniMax-M2.1 -- 2026-01-02
  79. [Editorial] https://www.linkedin.com/posts/stuart-winter-tear_realist-and-pluralist-conceptions-of-intelligence-activity-7397231918871703554-FmSP -- 2025-12-11
  80. VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection -- 2025-12-11
  81. Nanbeige4-3B: Lightweight with strong reasoning capabilities -- 2025-12-10
  82. mistralai/Devstral-2-123B-Instruct-2512 -- 2025-12-10
  83. MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark -- 2025-12-04
  84. PrimeIntellect/INTELLECT-3 -- 2025-12-04
  85. cerebras/MiniMax-M2-REAP-162B-A10B -- 2025-12-04
  86. 62-day fixed-prompt probe on Grok-4: strong semantic attractors, thematic inversion, and refusal onset (1,242 samples, fully public) -- 2025-12-03
  87. I built an open-source "Passport" for Claude Agents (MCP) so they can cryptographically sign their own actions -- 2025-12-01
  88. Implemented Anthropic's Programmatic Tool Calling with Langchain so you can use it with any models and tune it for your own use case -- 2025-12-01
  89. CodeModeToon -- 2025-12-01
  90. WeiboAI/VibeThinker-1.5B -- 2025-11-28
  91. [Editorial] https://ai.google.dev/gemini-api/docs/prompting-strategies#agentic-si-template -- 2025-11-28
  92. An explainer blog on attention, KV-caching, continuous batching -- 2025-11-28
  93. I built an open-source CLI that generates context.json bundles for React/TypeScript projects -- 2025-11-28
  94. GraphLite: An Embeddable Graph Database with ISO Graph Query Language Support -- 2025-11-26
  95. allenai/Olmo-3-32B-Think -- 2025-11-25
  96. tencent/HunyuanOCR -- 2025-11-25
  97. peteromallet/Qwen-Image-Edit-InScene -- 2025-11-25
  98. [Editorial] https://www.linkedin.com/posts/stuart-winter-tear_decision-making-amid-information-based-threats-activity-7396539314815533056-4pVx -- 2025-11-18
  99. [Editorial] https://gist.github.com/ruvnet/d6d2739400943037443b78c3ef86d8a5 -- 2025-11-18
  100. [Editorial] https://github.com/mrwadams/stride-gpt/blob/master/docs/operationalization-guide.md -- 2025-11-18
  101. janhq/Jan-v2-VL-high -- 2025-11-18
  102. Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers -- 2025-11-18
  103. [Editorial] https://arxiv.org/pdf/2506.21734 -- 2025-11-11
  104. OPERA: A Reinforcement Learning-Enhanced Orchestrated Planner-Executor Architecture for Reasoning-Oriented Multi-Hop Retrieval -- 2025-11-11
  105. [Editorial] https://www.linkedin.com/posts/andriyburkov_this-paper-shows-a-27-million-parameter-model-activity-7393432619365052416-SFLO -- 2025-11-10
  106. Trajectory Distillation for Foundation Models -- 2025-11-10
  107. sail-sg/Precision-RL -- 2025-11-10
  108. inclusionAI/LLaDA2.0-flash-preview -- 2025-11-10
  109. AI Agents Reasoning Collapse Imminent (CMU, Berkeley) -- 2025-11-02
  110. Natural Language Programming: Run Natural Language as Script -- 2025-11-02
  111. Claude Code is a Beast – Tips from 6 Months of Hardcore Use -- 2025-11-02
  112. Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs -- 2025-10-31
  113. [Editorial] https://www.linkedin.com/posts/anthony-alcaraz-b80763155_ai-agents-cant-reason-without-semantic-structure-activity-7389222435906244608-QMMc -- 2025-10-31
  114. Raezil/lattice-agent -- 2025-10-31
  115. driaforall/mem-agent -- 2025-10-31
  116. Memp: Exploring Agent Procedural Memory -- 2025-10-31
  117. I built the HuggingChat Omni Router 🥳 🎈 -- 2025-10-28
  118. Claude Code 2.0.27 -- 2025-10-28
  119. ltjed/freephdlabor -- 2025-10-28
  120. Show HN: Whatdidido – CLI to summarize your work from Jira/Linear -- 2025-10-28
  121. Reasoning should be thought of as a drawback, not a feature -- 2025-10-21
  122. inclusionAI/Ring-1T-preview -- 2025-10-21
  123. Learning Lifted Action Models From Traces of Incomplete Actions and States -- 2025-10-20
  124. I got tired of OpenAI dependency. Built a multi-LLM control center instead. -- 2025-10-19
  125. Turn ChatGPT into a real-time meeting assistant (via MCP + Apps SDK) -- 2025-10-19
  126. Claude Code taking a coffee break 🤔 -- 2025-10-19
  127. Show HN: Cmux – Coding Agent Multiplexer -- 2025-10-19
  128. [Editorial] Sqlite vector -- 2025-10-17
  129. Meta Superintelligence group publishes paper on new RAG technique -- 2025-10-17
  130. [Editorial] ReasoningBank is a self-learning, local-first memory system -- 2025-10-16
  131. [Editorial] ReasoningBank is a self-learning, local-first memory system -- 2025-10-16
  132. I tested if tiny LLMs can self-improve through memory: Qwen3-1.7B gained +8% accuracy on MATH problems -- 2025-10-16
  133. Tested 9 RAG query transformation techniques – HydE is absurdly underrated -- 2025-10-16
  134. GPT-OSS from Scratch on AMD GPUs -- 2025-10-11
  135. How do I compare cost per token for serverless vs provisioned hardware? -- 2025-10-11
  136. OpenAI is good at deals -- 2025-10-11
  137. meituan-longcat/LongCat-Flash-Chat -- 2025-10-11
  138. adb1274/batchi -- 2025-10-11
  139. What are the best models for legal work in Oct 2025? -- 2025-10-07
  140. [Update] FamilyBench: New models tested - Claude Sonnet 4.5 takes 2nd place, Qwen 3 Next breaks 70%, new Kimi weirdly below the old version, same for GLM 4.6 -- 2025-10-07
  141. princeton-pli/RLMT -- 2025-10-07
  142. TGPO: Tree-Guided Preference Optimization for Robust Web Agent Reinforcement Learning -- 2025-10-07
  143. DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models -- 2025-10-05
  144. swiss-ai/Apertus-8B-2509 -- 2025-10-04
  145. Qwen3-Omni thinking model running on local H100 (major leap over 2.5) -- 2025-09-30
  146. Seeking Advice: Best Model + Framework for Max Tokens/sec on Dual L40S (Testing Rig) -- 2025-09-30
  147. For local models, has anyone benchmarked tool calling protocols performance? -- 2025-09-30
  148. A step by step guide on how to build a LLM from scratch -- 2025-09-28
  149. YannQi/R-4B -- 2025-09-28
  150. New Agent benchmark from Meta Super Intelligence Lab and Hugging Face -- 2025-09-27
  151. evalops/dspy-micro-agent -- 2025-09-27
  152. nvidia/NVIDIA-Nemotron-Nano-9B-v2 -- 2025-09-27
  153. inclusionAI/Ling-flash-2.0 -- 2025-09-27
  154. Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model -- 2025-09-26
  155. CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation -- 2025-09-26
  156. A1: Asynchronous Test-Time Scaling via Conformal Prediction -- 2025-09-25
  157. DeepLink-org/DeepTrace -- 2025-09-25
  158. GLM 4.5 Air Template Breaking llamacpp Prompt Caching -- 2025-09-25
  159. Tracking prompt evolution for RAG systems - anyone else doing this? -- 2025-09-25
  160. MAESTRO v0.1.6 Update: Better support for models that struggle with JSON mode (DeepSeek, Kimi K2, etc.) -- 2025-09-25
  161. Dead-simple example code for Ollama function calling. -- 2025-09-25
  162. nvidia/NVIDIA-Nemotron-Nano-12B-v2 -- 2025-09-23
  163. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning -- 2025-09-22
  164. Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward -- 2025-09-22
  165. support for the upcoming Olmo3 model has been merged into llama.cpp -- 2025-09-21
  166. Running Nvidia CUDA Pytorch/vLLM projects and pipelines on AMD with no modifications -- 2025-09-21
  167. A Quick Look At The AMD Instinct MI355X With ROCm 7.0 -- 2025-09-21
  168. Uncensored AI model for from 4b Max 8b -- 2025-09-21
  169. GPT-OSS-120B Performance Benchmarks and Provider Trade-Offs -- 2025-09-20
  170. Why are there three different Codex variants? -- 2025-09-20
  171. zli12321/Vision-SR1 -- 2025-09-19
  172. lrzjason/Comfyui-QwenEditUtils -- 2025-09-19
  173. Mini-o3/Mini-o3 -- 2025-09-19
  174. vLLM is kinda awesome -- 2025-09-19
  175. Public AI on Hugging Face Inference Providers 🔥 -- 2025-09-19
  176. HyST: LLM-Powered Hybrid Retrieval over Semi-Structured Tabular Data -- 2025-09-19
  177. GPT-OSS:20b & Qwen 4b are a match made in heaven for 24GB VRAM builds -- 2025-09-18
  178. Was working in RAG recently got to know how well Gemma3 4B performs -- 2025-09-18
  179. [Editorial] which patterns truly survived compression -- 2025-09-16
  180. [Editorial] AI Kill Chain -- 2025-09-16
  181. TsinghuaC3I/Unify-Post-Training -- 2025-09-16
  182. [Editorial] Tricks from OpenAI gpt-oss YOU can use with transformers -- 2025-09-15
  183. openbmb/MiniCPM4.1-8B -- 2025-09-15
  184. nunchaku-tech/nunchaku-qwen-image -- 2025-09-15
  185. ggml-org/gpt-oss-20b-GGUF -- 2025-09-15
  186. MBZUAI releases K2 Think. 32B reasoning model based on Qwen 2.5 32B backbone, focusing on high performance in math, coding and science. -- 2025-09-14
  187. unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-GGUF -- 2025-09-14
  188. [vllm] Hints to run Qwen3-235B MoE on 8x AMD mixed cards! -- 2025-09-12
  189. Inference for 24 people with a 5000€ budget -- 2025-09-12
  190. $142 upgrade kit and spare modules turn Nvidia RTX 4090 24GB to 48GB AI card -- 2025-09-12
  191. Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers -- 2025-09-12
  192. Small Open Models Achieve Near Parity with Large Models in Low Resource Literary Translation at a Fraction of the Cost -- 2025-09-10
  193. Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic -- 2025-09-10
  194. Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search -- 2025-09-10
  195. Introducing FineVision: a huge open-source dataset for training SOTA Vision Language Models -- 2025-09-10
  196. wildminder/ComfyUI-VibeVoice -- 2025-09-10
  197. bytedance/USO -- 2025-09-10
  198. Wan-AI/Wan2.2-I2V-A14B -- 2025-09-10
  199. [Editorial] Update from Anthropic regarding their poor performance of late -- 2025-09-09
  200. LiquidAI/LFM2-VL-450M -- 2025-09-09
  201. An LLM-powered Natural-to-Robotic Language Translation Framework with Correctness Guarantees -- 2025-09-09
  202. Qwen3 30B A3B 2507 Hybrid Deep Reasoning Showcase -- 2025-09-08
  203. Is the "cost of inference" going up or down? -- 2025-09-08
  204. Smartphone Sensors Unlocked: Turn Your Phone into a Physics Lab -- 2025-09-08
  205. UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets -- 2025-09-08
  206. Voice cloning -- 2025-09-08
  207. haasonsaas/dspy-0to1-guide -- 2025-09-06
  208. 16 reproducible failures → upgraded into a 300+ page Global Fix Map. one link inside, feedback wanted -- 2025-09-06
  209. Show HN: Entropy-Guided Loop – How to make small models reason -- 2025-09-06
  210. Kwaipilot/KAT-V1-40B -- 2025-09-06
  211. Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning -- 2025-09-06
  212. nasa-ibm-ai4science/Surya-1.0 -- 2025-09-06
  213. Context Reasoning Benchmarks: GPT-5, Claude, Gemini, Grok on Real Tasks -- 2025-09-05
  214. The CLAUDE.md Framework: A Guide to Structured AI-Assisted Work (prompts included) -- 2025-09-05
  215. Team-intN18-SoybeanSeclab/Typhon -- 2025-09-05
  216. DatarusAI/Datarus-R1-14B-preview -- 2025-09-05
  217. Training & Querying 3 Ollama Models with Zer00logy: Symbolic Cognition Framework and Void-Math OS -- 2025-09-04
  218. I'm building local, open-source, fast, efficient, minimal, and extendible RAG library I always wanted to use -- 2025-09-03
  219. Creating the brain behind dumb models -- 2025-09-03
  220. 🌟Introducing Art-0-8B: Reasoning the way you want it to with Adaptive Thinking🌟 -- 2025-09-02
  221. Fine Tune Model for Home Assistant? -- 2025-09-02
  222. DeepSeek V3.1 improves on the multiplayer Step Game social reasoning benchmark -- 2025-08-31
  223. I built Husk, a native, private, and open-source iOS client for your local models -- 2025-08-31
  224. Would a “Knowledge Coverage Audit” tool be useful for RAG/chatbot builders? -- 2025-08-30
  225. baichuan-inc/Baichuan-M2-32B -- 2025-08-28
  226. Hierarchical Reasoning Model (HRM) implementation for text generation -- 2025-08-27
  227. Datarus-R1-14B-Preview, an adaptive multi-step reasoning LLM for automated data analysis -- 2025-08-24
  228. Fully Open source, serverless, community-driven MCP alternative built in Python, TS and Go -- 2025-08-24
  229. unsloth/Kimi-K2-Instruct-GGUF -- 2025-08-24
  230. DeepSeek V3.1 Reasoner improves over DeepSeek R1 on the Extended NYT Connections benchmark -- 2025-08-24
  231. DeepSeek-V3.1 (Thinking and Non Thinking) -- 2025-08-22
  232. Modify <think> to explore the impact on <answer> -- 2025-08-22
  233. Tiny finance “thinking” model (Gemma-3 270M) with verifiable rewards (SFT → GRPO) — structured outputs + auto-eval (with code) -- 2025-08-22
  234. Qwen/Qwen3-30B-A3B-Thinking-2507 -- 2025-08-22
  235. tencent/Hunyuan-7B-Instruct -- 2025-08-22
  236. 🐧 llama.cpp on Steam Deck (Ubuntu 25.04) with GPU (Vulkan) — step-by-step that actually works -- 2025-08-22
  237. Running Qwen3-Coder-30B-A3 Q4_LM in Cursor with Agent Mode unlocked -- 2025-08-22
  238. Docker now support AI Models, anyone using it? -- 2025-08-22
  239. Why does gpt-oss 120b run slower in ollama than in LM Studio in my setup? -- 2025-08-22
  240. Speculative decoding in archgw candidate release 0.4.0. Could use feedback -- 2025-08-16
  241. Nvidia Tilus: A Tile-Level GPU Kernel Programming Language -- 2025-08-16
  242. SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model -- 2025-08-16
  243. HoML: vLLM's speed + Ollama like interface -- 2025-08-15
  244. HoML vs. Ollama: A Deep Dive into Performance -- 2025-08-15
  245. Sampler Settings for GLM 4.5-Air -- 2025-08-15
  246. Fully verbal LLM program for OSX using whisper, ollama & XTTS -- 2025-08-15
  247. Closing the Modality Gap for Mixed Modality Search -- 2025-08-07
  248. ByteDance drops Seed-Prover -- 2025-08-06
  249. naver-hyperclovax/HyperCLOVAX-SEED-Think-14B -- 2025-08-06
  250. Context Management by Trimming Conversation -- 2025-08-06
  251. Exploiting Primacy Effect To Improve Large Language Models -- 2025-08-06
  252. [Editorial] HRM -- 2025-08-03
  253. How are people running an MLX-compatible OpenAI API server locally? -- 2025-08-03
  254. I built the perfect MCP client for broke developers (Ollama powered) -- 2025-08-03
  255. character-ai/pipelining-sft -- 2025-08-03
  256. CoexistAI – LLM-Powered Research Assistant (Now with MCP, Vision, Local File Chat, and More) -- 2025-08-02
  257. Best <2B open-source LLMs for European languages? -- 2025-08-02
  258. Local TTS quality -- 2025-08-02
  259. [Editorial] The Anatomy of a Modern LLM -- 2025-07-31
  260. PowerInfer/SmallThinker-21BA3B-Instruct -- 2025-07-31
  261. Has vLLM made Ollama and llama.cpp redundant? -- 2025-07-30
  262. [Editorial] Alternative to vector db rag -- 2025-07-30
  263. How are people extracting system prompts? -- 2025-07-29
  264. cherrydra/mcpurl -- 2025-07-29
  265. [Editorial] neural networks don’t need to be giant to be powerful -- 2025-07-27
  266. Qwen/Qwen3-235B-A22B-Thinking-2507 -- 2025-07-27
  267. mistralai/Magistral-Small-2507 -- 2025-07-27
  268. Running Qwen3 235B-A22B 2507 on a Threadripper 3970X + 3x RTX 3090 Machine at 15 tok/s -- 2025-07-25
  269. The Latest GPT-5 Leaks and Teasers -- 2025-07-25
  270. Qwen3-235B-A22B-Thinking-2507 released! -- 2025-07-25
  271. From chaotic prompting to structured workflow: My Claude evolution -- 2025-07-24
  272. Never Come Up Empty: Adaptive HyDE Retrieval for Improving LLM Developer Support -- 2025-07-24
  273. MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models -- 2025-07-24
  274. Building an MCP Server and Client with FastMCP 2.0 -- 2025-07-24
  275. Does LLM architecture allow for injecting some more input tokens in the middle of token generation? -- 2025-07-24
  276. Lucy: A Mobile-Capable 1.7B Reasoning Model That Rivals Jan-Nano -- 2025-07-23
  277. microsoft/Phi-4-mini-flash-reasoning -- 2025-07-22
  278. LGAI-EXAONE/EXAONE-4.0-32B -- 2025-07-22
  279. Replacing thinking with tool usage enables reasoning in small language models -- 2025-07-22
  280. A Request for Comments (RFC) for MCP-alternative Universal Tool Calling Protocol (UTCP) was created -- 2025-07-22
  281. How to use the same context across LLMs and Agents -- 2025-07-22
  282. new models from NVIDIA: OpenReasoning-Nemotron 32B/14B/7B/1.5B -- 2025-07-21
  283. OpenAI Places Second Behind Human Coder at AtCoder Programming Event -- 2025-07-21
  284. HelpingAI/Dhanishtha-2.0-preview -- 2025-07-21
  285. Probing for Arithmetic Errors in Language Models -- 2025-07-21
  286. Struggling to Generate Polished UI with Claude Code -- 2025-07-20
  287. IMO 2025 LLM Mathematical Reasoning Evaluation -- 2025-07-20
  288. A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1 -- 2025-07-19
  289. Madness, the ignorant's question. Would it be possible to lighten an LLM model? -- 2025-07-18
  290. Open source and free iOS app to chat with your LLMs when you are away from home. -- 2025-07-16
  291. Requirements and architecture for a good enough model with scientific papers RAG -- 2025-07-16
  292. Excited to share updates to Open WebUI Starter! New docs, Docker support, and templates for everyone -- 2025-07-16
  293. OpenAI's open source LLM is a reasoning model, coming Next Thursday! -- 2025-07-14
  294. The BastionRank Showdown: Crowning the Best On-Device AI Models of 2025 -- 2025-07-14
  295. Local Llama with Home Assistant Integration and Multilingual-Fuzzy naming -- 2025-07-14
  296. Podcast generation app -- works with Ollama -- 2025-07-14
  297. support for Jamba hybrid Transformer-Mamba models has been merged into llama.cpp -- 2025-07-13
  298. Asynchronous Robot Inference: Decoupling Action Prediction and Execution -- 2025-07-13
  299. Suggestion: Grayscale-First Hack to Optimize Image Recognition in Grok—Save Compute Without Losing Accuracy? -- 2025-07-13
  300. Upskill your LLMs with Gradio MCP Servers -- 2025-07-09
  301. AGI is not multimodal -- 2025-07-09
  302. How Do Vision-Language Models Process Conflicting Information Across Modalities? -- 2025-07-09
  303. High Precision -- 2025-07-09
  304. skt/A.X-4.0 -- 2025-07-09
  305. Medical language model - for STT and summarize things -- 2025-07-09
  306. Ollama alternatives -- 2025-07-09
  307. Dealing with tool_calls hallucinations -- 2025-07-09
  308. SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model -- 2025-07-07
  309. i made a commit message generator that can be used offline and for free -- 2025-07-05
  310. THUDM/GLM-4.1V-9B-Thinking -- 2025-07-05
  311. baidu/ERNIE-4.5-VL-424B-A47B-Base-Paddle -- 2025-07-05
  312. LoRA Fine-Tuning Without GPUs: A CPU-Efficient Meta-Generation Framework for LLMs -- 2025-07-05
  313. skt/A.X-4.0-Light -- 2025-07-04
  314. ChatDOC/OCRFlux-3B -- 2025-07-04
  315. Is there a local model that can solve this text decoding riddle? -- 2025-07-03
  316. Seven replies to the viral Apple reasoning paper and why they fall short -- 2025-07-03
  317. R1-0528 won't stop thinking -- 2025-07-03
  318. Running Deepseek R1 0528 q4_K_M and mlx 4-bit on a Mac Studio M3 -- 2025-07-02
  319. Hoshinonyaruko/Gensokyo-MCP -- 2025-07-01
  320. THU-KEG/AdaptThink -- 2025-06-28
  321. modelcontextprotocol/registry -- 2025-06-27
  322. Skywork/Skywork-SWE-32B -- 2025-06-25
  323. moonshotai/Kimi-VL-A3B-Thinking-2506 -- 2025-06-25
  324. POLARIS-Project/Polaris-4B-Preview -- 2025-06-25
  325. XiaomiMiMo/MiMo -- 2025-06-22
  326. nvidia/AceReason-Nemotron-1.1-7B -- 2025-06-22
  327. Menlo/Jan-nano -- 2025-06-22
  328. MiniMax-AI/SynLogic -- 2025-06-15
  329. The Fractured Entangled Representation Hypothesis -- 2025-06-15
  330. mistralai/Magistral-Small-2506_gguf -- 2025-06-14
  331. Ruminate: From All-or-Nothing to Just-Right Reasoning in LLMs -- 2025-06-14
  332. [update] Restructured repo under rvn-tools — modular CLI for LLM formats -- 2025-06-14
  333. Testing Quant Quality for Shisa V2 405B -- 2025-06-14
  334. Old model, new implementation -- 2025-06-14
  335. Ollama vs Llamacpp: Different output for same model -- 2025-06-14
  336. How to improve my ViT model -- 2025-06-14
  337. From RPC to transactions and durable executions -- 2025-06-14
  338. Flattening Rust’s learning curve -- 2025-06-14
  339. Async from scratch 3: Pinned against the wall -- 2025-06-14
  340. How to get the most out of my AMD 7900XT? -- 2025-06-14
  341. typelevel/cats -- 2025-06-11
  342. wesm/pydata-book -- 2025-06-11
  343. lerobot/smolvla_base -- 2025-06-10
  344. jedisct1/openapi-mcp -- 2025-06-09
  345. sarvamai/sarvam-m -- 2025-06-07
  346. Qwen/Qwen3-Reranker-0.6B -- 2025-06-07
  347. arcee-ai/Homunculus -- 2025-06-04
  348. PRIME-RL/Entropy-Mechanism-of-RL -- 2025-06-02
  349. Atlas: Learning to Optimally Memorize the Context at Test Time -- 2025-06-02
  350. Gen-Verse/MMaDA -- 2025-06-01
  351. osmosis-ai/Osmosis-Structure-0.6B -- 2025-06-01
  352. 0-1 phase transitions in sparse spiked matrix estimation -- 2025-06-01
  353. 0-Step Capturability, Motion Decomposition and Global Feedback Control of the 3D Variable Height-Inverted Pendulum -- 2025-06-01
  354. simplescaling/s1 -- 2025-05-31
  355. FractalAIResearch/Fathom-R1-14B -- 2025-05-31
  356. unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF -- 2025-05-31
  357. deepseek-ai/DeepSeek-R1-0528-Qwen3-8B -- 2025-05-31
  358. I made Model Version Control Protocol for AI agents -- 2025-05-31
  359. AI Baby Monitor – fully local Video-LLM nanny (beeps when safety rules are violated) -- 2025-05-31
  360. LMStudio - llama.cpp - vLLM -- 2025-05-31
  361. Built an ADK Agent that finds Jobs based on your Resume -- 2025-05-31
  362. Should I resize the image before sending it to Qwen VL 7B? Would it give better results? -- 2025-05-31
  363. How to start a LLM project? -- 2025-05-31
  364. Beware of Fast-Math -- 2025-05-31