Model Architecture

Transformers, mixture of experts, attention mechanisms, model design

519 articles across 144 editions

Articles

  1. Abliterating Qwen3.5-397B on a Mac Studio revealed that MoE models encode refusal differently than dense models — safety refusals route through expert selection and survive weight-baking -- 2026-04-09
  2. Finally Abliterated Sarvam 30B and 105B! -- 2026-04-09
  3. Qwen3.6-Plus -- 2026-04-07
  4. Omnivoice - 600+ Language Open-Source TTS with Voice Cloning and Design -- 2026-04-07
  5. [Editorial] Generative Models and High-Quality Output -- 2026-04-07
  6. Falcon Perception -- 2026-04-02
  7. TRL v1.0: Post-Training Library Built to Move with the Field -- 2026-04-02
  8. lucas-maes/le-wm -- 2026-04-02
  9. Toward explaining why traditional ablation/abliteration works -- 2026-04-02
  10. [Editorial] ramimac/teampcp -- 2026-03-27
  11. This Week in Security: Second Verse, Worse Than the First -- 2026-03-27
  12. Towards a Neural Debugger for Python -- 2026-03-26
  13. Mathematics behind extreme quantization of Microsoft's BitNet -- 2026-03-26
  14. A.T.L.A.S - Adaptive Test-time Learning and Autonomous Specialization -- 2026-03-26
  15. Show HN: Three new Kitten TTS models – smallest less than 25MB -- 2026-03-23
  16. Flash-MoE: Running a 397B Parameter Model on a Laptop -- 2026-03-23
  17. [Editorial] Distillation Techniques -- 2026-03-23
  18. Jackrong/Qwen3.5-2B-Claude-4.6-Opus-Reasoning-Distilled-GGUF -- 2026-03-23
  19. Local Qwen 8B + 4B completes browser automation by replanning one step at a time -- 2026-03-23
  20. Does imatrix calibration data affect writing style? I ran a blind-scored experiment -- 2026-03-23
  21. Memory Chip Crunch to Persist Until 2030, SK Hynix Chairman Says -- 2026-03-19
  22. Meta's renewed commitment to jemalloc -- 2026-03-19
  23. We all had p2p wrong with vllm so I rtfm -- 2026-03-19
  24. [Editorial] LLM Architecture Gallery -- 2026-03-17
  25. Mistral small 4 PR on transformers. -- 2026-03-17
  26. I spent a weekend doing layer surgery on 6 different model architectures. There's a "danger zone" at ~50-56% depth. -- 2026-03-17
  27. Nemotron 3 Super and the no free lunch problem -- 2026-03-17
  28. Timber: AOT Compiler Turns XGBoost/sklearn/ONNX into Native C99 — 336x Faster -- 2026-03-13
  29. [Editorial] Living on the Edge: Advancements in Model Deployment -- 2026-03-13
  30. [Editorial] Translating a PDF While Preserving Layout -- 2026-03-13
  31. Heretic Defeats GPT-OSS with Arbitrary-Rank Ablation (ARA) Decensoring -- 2026-03-11
  32. Fat Fish — A Proper Upscale and Prune of Mistral Nemo -- 2026-03-11
  33. Optimizing Qwen3 Coder for RTX 5090 and PRO 6000 — Community Benchmarking Infrastructure -- 2026-03-11
  34. Yann LeCun's AMI Labs Raises $1.03 Billion to Build World Models -- 2026-03-11
  35. [Editorial] Bell Labs Solved Prompt Injection in 1976 -- 2026-02-27
  36. [Editorial] GitHub Copilot Exploited -- 2026-02-27
  37. [Editorial] Stop Telling Your AI Agent What Not to Do -- 2026-02-27
  38. [Editorial] AI MCP Integration Patterns -- 2026-02-27
  39. Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers -- 2026-02-25
  40. O(1) Inference and Causal Monoid State Compression in Spartacus-1B -- 2026-02-25
  41. Sink-Aware Pruning for Diffusion Language Models -- 2026-02-25
  42. Liquid AI releases LFM2-24B-A2B -- 2026-02-24
  43. Qwen3's most underrated feature: Voice embeddings -- 2026-02-24
  44. After many contributions craft, Crane now officially supports Qwen3-TTS! -- 2026-02-24
  45. We tested the same INT8 model on 5 Snapdragon chipsets. Accuracy ranged from 93% to 71%. Same weights, same ONNX file. -- 2026-02-24
  46. [Editorial] https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf -- 2026-02-12
  47. [Editorial] https://d3lm.medium.com/overly-agentic-why-anthropic-is-worried-about-opus-4-6-17eee0f8e5cd -- 2026-02-12
  48. [Editorial] https://www.linkedin.com/posts/avipil_i-got-my-first-bill-after-switching-to-claude-activity-7427320523870629889-vM5K -- 2026-02-12
  49. Pros/Cons and use case for bypassing permissions -- 2026-02-12
  50. [Editorial] https://www.kylerush.org/posts/opus-4-5-really-changed-things -- 2026-02-10
  51. [Editorial] https://www.linkedin.com/posts/ownyourai_i-taught-my-claude-code-to-swallow-a-32m-activity-7426902541868728321-2Z36 -- 2026-02-10
  52. [Editorial] https://docs.google.com/document/d/1I9r21TyQuAO1y2ecztBU0PSCpjHSL_vZJiA5v276Wro/mobilebasic -- 2026-02-10
  53. [Editorial] https://www.linkedin.com/posts/cole-medin-727752184_claude-codes-new-agent-teams-feature-is-share-7426633806792609792-pT3s -- 2026-02-10
  54. [Tool] claude-config-sync: Sync your Claude Code configuration across machines using GitHub Gists -- 2026-02-10
  55. [Editorial] https://www.linkedin.com/posts/ownyourai_i-just-woke-up-to-qwen3-coder-next-80b-activity-7424703876240695297-Nlqf -- 2026-02-04
  56. LiquidAI/LFM2.5-1.2B-Thinking -- 2026-02-04
  57. ByteDance-Seed/Stable-DiffCoder-8B-Instruct -- 2026-02-03
  58. tencent/Youtu-VL-4B-Instruct -- 2026-02-03
  59. NousResearch/NousCoder-14B -- 2026-02-03
  60. transformers v5 final is out 🔥 -- 2026-01-30
  61. PaddlePaddle/PaddleOCR-VL-1.5 -- 2026-01-30
  62. zai-org/GLM-4.7 -- 2026-01-30
  63. MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models -- 2026-01-30
  64. Sharing my set of distilled small language models (3B) + training data in more than 50 low-resource languages -- 2026-01-30
  65. Introducing Kimi K2.5, Open-Source Visual Agentic Intelligence -- 2026-01-28
  66. ~60GB models on coding: GLM 4.7 Flash vs. GPT OSS 120B vs. Qwen3 Coder 30B -- your comparisons? -- 2026-01-28
  67. openbmb/AgentCPM-Report -- 2026-01-28
  68. Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice -- 2026-01-28
  69. Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation -- 2026-01-28
  70. One Year Since the “DeepSeek Moment” -- 2026-01-28
  71. Qwen/Qwen-Image-2512 -- 2026-01-28
  72. [Editorial] https://www.linkedin.com/posts/ownyourai_deepseek-just-released-the-first-vision-ai-activity-7421818927657385987-V1yo -- 2026-01-27
  73. Unsloth announces support for finetuning embedding models -- 2026-01-27
  74. Qwen/Qwen3-TTS-12Hz-0.6B-Base -- 2026-01-27
  75. Qwen/Qwen3-VL-Reranker-2B -- 2026-01-27
  76. [Editorial] https://www.linkedin.com/posts/ivandj_early-claims-around-self-evolving-memory-activity-7421307316437676033-l0Jm -- 2026-01-26
  77. [Editorial] https://www.linkedin.com/posts/reuvencohen_introducing-ruvector-world-model-activity-7421556928910290944-cx4v -- 2026-01-26
  78. stepfun-ai/Step3-VL-10B -- 2026-01-26
  79. Qwen/Qwen3-VL-Embedding-2B -- 2026-01-21
  80. Phr00t/Qwen-Image-Edit-Rapid-AIO -- 2026-01-21
  81. Qwen/Qwen3-VL-Reranker-8B -- 2026-01-21
  82. Bartowski comes through again. GLM 4.7 flash GGUF -- 2026-01-21
  83. Step-Audio-R1.1 (Open Weight) by StepFun just set a new SOTA on the Artificial Analysis Speech Reasoning leaderboard -- 2026-01-20
  84. GLM-4.7-Flash -- 2026-01-20
  85. stepfun-ai/Step-Audio-R1.1 -- 2026-01-20
  86. tencent/HY-MT1.5-1.8B -- 2026-01-20
  87. Introducing GLM-Image -- 2026-01-14
  88. GPT-OSS -> MLA conversion breakthrough (20B), still looking for compute + collaborators -- 2026-01-14
  89. FrogBoss 32B and FrogMini 14B from Microsoft -- 2026-01-14
  90. Qwen/Qwen3-VL-Embedding-8B -- 2026-01-14
  91. MCP for Financial Ontology! -- 2026-01-09
  92. A community index for MCPs that don’t disappear after the thread ends -- 2026-01-09
  93. tencent/HY-WorldPlay -- 2026-01-09
  94. meituan-longcat/LongCat-Image -- 2026-01-09
  95. Introducing Falcon H1R 7B -- 2026-01-09
  96. [Editorial] https://www.linkedin.com/posts/ernst-van-gassen-9196a7b5_we-spend-a-lot-of-time-trying-to-make-prompts-ugcPost-7414966440023482368-dgh5 -- 2026-01-09
  97. [Editorial] https://leakhub.ai/ -- 2026-01-09
  98. [Editorial] https://www.linkedin.com/posts/hardmaru_survival-of-the-fittest-code-blog-https-activity-7415068590485458944-3tqP -- 2026-01-09
  99. A closer look at a BGP anomaly in Venezuela -- 2026-01-09
  100. Show HN: I visualized the entire history of Citi Bike in the browser -- 2026-01-09
  101. Modifying a QingPing Air Quality Monitor for Local MQTT Access -- 2026-01-09
  102. [Editorial] https://github.com/hiyouga/LlamaFactory -- 2026-01-07
  103. Tongyi-MAI/MAI-UI-8B -- 2026-01-07
  104. MultiverseComputingCAI/HyperNova-60B -- 2026-01-07
  105. [Editorial] https://www.alwaysfurther.ai/blog/train-4b-model-to-beat-claude-sonnet-gemini -- 2026-01-05
  106. [Experimental] Gemma 3 4B - Dark CoT: Pushing 4B Reasoning to 33%+ on GPQA Diamond -- 2026-01-05
  107. Youtu-LLM-2B-GGUF is here! -- 2026-01-05
  108. upstage/Solar-Open-100B -- 2026-01-05
  109. EditMGT — fast, localized image editing with Masked Generative Transformers -- 2025-12-30
  110. Francis-Rings/FlashPortrait -- 2025-12-30
  111. zai-org/GLM-TTS -- 2025-12-30
  112. Flowception: Temporally Expansive Flow Matching for Video Generation -- 2025-12-30
  113. I've been experimenting with SLM's a lot recently. My goal was to prove even SLMs can be accurate with the right architecture behind it. -- 2025-12-22
  114. MBZUAI releases K2-V2 - 70B fully open model. -- 2025-12-22
  115. microsoft/Fara-7B -- 2025-12-22
  116. Alibaba Tongyi Open Sources Two Audio Models: Fun-CosyVoice 3.0 (TTS) and Fun-ASR-Nano-2512 (ASR) -- 2025-12-19
  117. My professor lent me an A6000, so I tried to build a coding model. Here is Anni! (Qwen3-14B Fine-tune) -- 2025-12-19
  118. Two years ago, I was just a math major. Now I've built the 1.5B router model used by HuggingFace. Can I bring it to Cursor? -- 2025-12-19
  119. Independent evaluation of GPT5.2 on SWE-bench: 5.2 high is #3 behind Gemini, 5.2 medium behind Sonnet 4.5 -- 2025-12-16
  120. Did I overhype Claude Code? GPT + Comet are quietly beating it for me -- 2025-12-16
  121. 🚀 New: Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B -- 2025-12-16
  122. ByteDance/Dolphin-v2 -- 2025-12-16
  123. zai-org/AutoGLM-Phone-9B-Multilingual -- 2025-12-16
  124. [Editorial] https://www.linkedin.com/posts/eric-vyacheslav-156273169_a-7m-model-just-surpassed-deepseek-r1-gemini-activity-7405985266043297792-s1Jn -- 2025-12-15
  125. Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models -- 2025-12-15
  126. New in llama.cpp: Model Management -- 2025-12-12
  127. Successful prototype component prompt - For big projects -- 2025-12-12
  128. Building RNJ-1: What makes It different from Gemma 3? -- 2025-12-11
  129. EssentialAI/rnj-1 -- 2025-12-11
  130. From Azure Functions to FreeBSD -- 2025-12-10
  131. [Editorial] https://www.linkedin.com/posts/dakharlamov_too-slow-thats-what-they-called-engineers-activity-7403964370390646784-J27P -- 2025-12-09
  132. [Editorial] https://arxiv.org/abs/2511.20920 -- 2025-12-09
  133. [Editorial] https://www.linkedin.com/posts/anthony-alcaraz-b80763155_your-ai-agents-context-window-is-not-a-database-activity-7403394612893024256-9S87 -- 2025-12-09
  134. [Editorial] https://www.linkedin.com/posts/reuvencohen_introducing-ruvector-postgres-a-self-learning-activity-7403841311029837824-jogp -- 2025-12-09
  135. Emacs is my new window manager -- 2025-12-08
  136. shubh-io/DockMate -- 2025-12-08
  137. Comfy-Org/HunyuanVideo_1.5_repackaged -- 2025-12-08
  138. We were tired of guessing which local model to use for which query. built a speculative execution lib that figures it out (github) -- 2025-12-05
  139. Nimony (eventually Nim 3.0) Design Principles -- 2025-12-03
  140. nvidia/Orchestrator-8B -- 2025-12-03
  141. allenai/Olmo-3-1125-32B -- 2025-12-03
  142. moonshotai/Kimi-Linear-48B-A3B-Instruct -- 2025-12-02
  143. orabazes/FLUX.2-dev-GGUF -- 2025-12-02
  144. Transformers v5: Simple model definitions powering the AI ecosystem -- 2025-12-02
  145. Pocketbase – open-source realtime back end in 1 file -- 2025-11-28
  146. [Editorial] https://www.rand.org/pubs/commentary/2025/11/electromagnetic-warfare-natos-blind-spot-could-decide.html -- 2025-11-28
  147. In depth analysis of Nvidia's Jet Nemotron models -- 2025-11-26
  148. Hidden causes of LLM latency, it's not just the model size -- 2025-11-26
  149. Benchmark: Self-Hosted Qwen-30B (LoRA) vs. Llama-3.1-8B vs. GPT-4.1-nano. Comparison of parsing success rates and negative constraints. -- 2025-11-25
  150. [Editorial] https://arxiv.org/html/2510.04871v1 -- 2025-11-24
  151. [Editorial] https://www.linkedin.com/posts/ingason_agenticai-erp-aiagents-activity-7396551539538141185-PxNU -- 2025-11-24
  152. facebook/sam3 -- 2025-11-24
  153. rl-research/DR-Tulu-8B -- 2025-11-24
  154. Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning -- 2025-11-24
  155. marinero4972/Open-o3-Video -- 2025-11-17
  156. nanonets/Nanonets-OCR2-3B -- 2025-11-17
  157. nvidia/ChronoEdit-14B-Diffusers-Upscaler-Lora -- 2025-11-17
  158. ibm-granite/granite-4.0-h-350m -- 2025-11-14
  159. tencent/HunyuanWorld-Mirror -- 2025-11-14
  160. [Editorial] https://www.npmjs.com/package/neural-trader -- 2025-11-14
  161. Critical RCE patched in Imunify360 affects up to 50M+ websites -- 2025-11-14
  162. Kubernetes Ingress Nginx is retiring -- 2025-11-14
  163. About KeePassXC's Code Quality Control -- 2025-11-14
  164. Visualizing Quantization Types -- 2025-11-11
  165. Co-authored a book called "Build DeepSeek from Scratch" | Live Now -- 2025-11-11
  166. Writing your own BEAM -- 2025-11-11
  167. DIY Powerwall Blows Clouds, Competition Out of the Water -- 2025-11-11
  168. BAAI/Emu3.5-Image -- 2025-11-07
  169. Riemannian Optimization for LoRA on the Stiefel Manifold -- 2025-11-07
  170. "On-the-fly" code reviews with ollama. It kinda works.. -- 2025-11-07
  171. I Built an "AI Art Director" Agent to Orchestrate Image and Video Models. -- 2025-11-07
  172. What's your most unexpected Claude workflow discovery? -- 2025-11-07
  173. [Research] Cross-Stage Vulnerabilities in Large Language Model Architectures -- 2025-11-07
  174. runZeroInc/runZeroHound -- 2025-11-07
  175. openai/gpt-oss-safeguard-120b -- 2025-11-07
  176. Why does Image Recognition work in llama-server but not through Open WebUI? -- 2025-11-06
  177. Has anyone tested ollama on Whisplay HAT with Raspberry pi zero 2W? -- 2025-11-06
  178. allenai/olmOCR-2-7B-1025 -- 2025-11-06
  179. is there simple way like .bat to compress to q4-q8 like Unsloth, Qwen3-VL-30B-A3B-Thinking-abliterated model -- 2025-11-06
  180. MiniMax M2 Llama.cpp support merged -- 2025-11-05
  181. Which AI IDE should I use under $20/month? -- 2025-11-05
  182. lzA6/video-to-txt -- 2025-11-05
  183. xaviviro/python-toon -- 2025-11-05
  184. [Editorial] https://www.evokesecurity.com/blogs/prompt-injection-is-for-everyone -- 2025-11-04
  185. Found a remote file inclusion vulnerability in an AI-generated app before launch -- 2025-11-04
  186. Launch HN: Propolis (YC X25) – Browser agents that QA your web app autonomously -- 2025-11-04
  187. Attacking macOS XPC Helpers: Protocol Reverse Engineering and Interface Analysis -- 2025-11-04
  188. [Research] Unvalidated Trust: Cross-Stage Failure Modes in LLM/agent pipelines arXiv -- 2025-11-04
  189. [Editorial] Context Engineering Handbook -- 2025-11-03
  190. Qwen3-VL-32B Q8 speeds in llama.cpp vs vLLM FP8 on a RTX PRO 6000 -- 2025-11-03
  191. OSS alternative to Open WebUI - ChatGPT-like UI, API and CLI -- 2025-11-03
  192. Looking for a RAG UI manager to meet our needs to replace Zapier -- 2025-11-03
  193. LiquidAI/LFM2-1.2B-Extract -- 2025-11-03
  194. DeepSeek may have found a new way to improve AI’s ability to remember -- 2025-11-02
  195. Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection -- 2025-11-02
  196. OpenAI: gpt-oss-safeguard: two open-weight reasoning models built for safety classification (Now on Hugging Face) -- 2025-10-31
  197. briaai/FIBO -- 2025-10-31
  198. snowyfizz/Vision-Detection-API -- 2025-10-29
  199. Show HN: Apache Fory Rust – 10-20x faster serialization than JSON/Protobuf -- 2025-10-29
  200. huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning -- 2025-10-29
  201. AlphaXiv, Compare the DeepSeek-OCR and Mistral-OCR models -- 2025-10-26
  202. Open-Bee/Bee-8B-RL -- 2025-10-26
  203. datalab-to/chandra -- 2025-10-26
  204. Unlock the power of images with AI Sheets -- 2025-10-26
  205. Picture in Picture / Webcam detect model on HuggingFace -- 2025-10-25
  206. Show HN: Story Keeper – AI agents with narrative continuity instead of memory -- 2025-10-25
  207. Show HN: Deta Surf – An open source and local-first AI notebook -- 2025-10-25
  208. Show HN: Tommy – Turn ESP32 devices into through-wall motion sensors -- 2025-10-25
  209. 20x Max Plan (€216) takes 2% of weekly Opus usage for a single Deep Research Query. That equals 50 per week if you use it ONLY for this and never continue or respond -- 2025-10-24
  210. I got tired of OpenAI dependency. Built a multi-LLM control center instead. -- 2025-10-19
  211. Turn ChatGPT into a real-time meeting assistant (via MCP + Apps SDK) -- 2025-10-19
  212. Claude Code taking a coffee break 🤔 -- 2025-10-19
  213. Show HN: Cmux – Coding Agent Multiplexer -- 2025-10-19
  214. [Editorial] Agentic Orchestration -- 2025-10-19
  215. Chicken Squisher 3000: Squish-Proof Security -- 2025-10-19
  216. Show HN: Largest open-source multimodal AI dataset -- 2025-10-18
  217. KORMo-Team/KORMo-10B-sft -- 2025-10-18
  218. ByteDance/FaceCLIP -- 2025-10-18
  219. Do you use multiple AI models for coding? Trying to validate a workflow problem -- 2025-10-17
  220. The “Compounding Engineering” mindset changed how I think about AI coding tools -- 2025-10-17
  221. Show HN: Rebuilt Bible search app to run 100% client-side with Transformers.js -- 2025-10-13
  222. swiss-ai/Apertus-8B-Instruct-2509 -- 2025-10-13
  223. A Childhood Dream, Created and Open Sourced -- 2025-10-13
  224. Some small tools for you - Ollama Management UI, Passkey authentication proxy -- 2025-10-12
  225. Single prompt I run after git commit (before push) for AI diff/commit review -- 2025-10-12
  226. Show HN: Gitcasso – Syntax Highlighting and Draft Recovery for GitHub Comments -- 2025-10-12
  227. simonw/claude-skills -- 2025-10-12
  228. Custom models don't work after v0.6.33 update - Anyone else? -- 2025-10-12
  229. Pardus AI: Open source AI Assistant thanks for the help with Ollama -- 2025-10-09
  230. What to use for refactoring -- 2025-10-09
  231. Claude Code finally has a planning partner — I built an AI backlog manager for solo devs -- 2025-10-09
  232. Granite4 Small-h 32b-A9b (Q4_K_M) at FULL 1M context window is using only 73GB of VRAM - Life is good! -- 2025-10-09
  233. Run Open AI GPT-OSS on a mobile phone (Demo) -- 2025-10-09
  234. AI21 releases Jamba 3B, the tiny model outperforming Qwen 3 4B and IBM Granite 4 Micro! -- 2025-10-09
  235. inclusionAI/Ling-mini-2.0 -- 2025-10-09
  236. [Editorial] The Tiny Recursive Model -- 2025-10-08
  237. deepseek-ai/DeepSeek-V3.1-Terminus -- 2025-10-08
  238. Behavioral Modification Systems in Large Language Models: A Methodological Analysis of Long Conversation Reminders -- 2025-10-08
  239. Ring Flash 2.0 104B A6B with Linear Attention released a few days ago -- 2025-10-07
  240. Bring Your Own Data (BYOD) -- 2025-09-30
  241. CohereLabs/command-a-reasoning-08-2025 -- 2025-09-30
  242. Built an MCP server for Claude Desktop to browse Reddit in real-time -- 2025-09-30
  243. MetalQwen3: Full GPU-Accelerated Qwen3 Inference on Apple Silicon with Metal Shaders – Built on qwen3.c - WORK IN PROGRESS -- 2025-09-28
  244. yangdongchao/UniAudio2 -- 2025-09-28
  245. jimsweb/aiMIDI -- 2025-09-28
  246. Handy – Free open-source speech-to-text app written in Rust -- 2025-09-28
  247. IndexTeam/IndexTTS-2 -- 2025-09-28
  248. beankeji-cloud/SLiteIO -- 2025-09-28
  249. GitHub - shantur/jarvis-mcp: Bring your AI to life—talk to assistants instantly in your browser. Zero hassle, No API keys, No Whisper -- 2025-09-27
  250. PC memory costs to climb as fabs chase filthy lucre in servers and HBM -- 2025-09-27
  251. AI and licensing (commercial use) -- 2025-09-26
  252. I trained an LLM from scratch AMA! -- 2025-09-26
  253. pengzhangzhi/Open-dLLM -- 2025-09-26
  254. SWE-Bench Pro -- 2025-09-23
  255. Investigating Training Data Detection in AI Coders -- 2025-09-23
  256. A first stab at packaging llama.cpp in a performance-optimized manner -- 2025-09-23
  257. Model: Qwen3 Next Pull Request llama.cpp -- 2025-09-23
  258. Unobtanium No More; Perhaps We Already Have All The Elements We Need -- 2025-09-21
  259. support for the upcoming Olmo3 model has been merged into llama.cpp -- 2025-09-21
  260. Running Nvidia CUDA Pytorch/vLLM projects and pipelines on AMD with no modifications -- 2025-09-21
  261. A Quick Look At The AMD Instinct MI355X With ROCm 7.0 -- 2025-09-21
  262. Uncensored AI model for from 4b Max 8b -- 2025-09-21
  263. facebook/MobileLLM-R1-950M -- 2025-09-20
  264. OpenGVLab/InternVL3_5-241B-A28B -- 2025-09-20
  265. KBlueLeaf/HDM-xut-340M-anime -- 2025-09-20
  266. GPT-OSS:20b & Qwen 4b are a match made in heaven for 24GB VRAM builds -- 2025-09-18
  267. Was working in RAG recently got to know how well Gemma3 4B performs -- 2025-09-18
  268. ROCm 6.4.3 -> 7.0-rc1 after updating got +13.5% at 2xR9700 -- 2025-09-18
  269. Is it possible for different brand GPUs to work together? -- 2025-09-18
  270. Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts -- 2025-09-18
  271. LFM2-1.2B safety benchmark -- 2025-09-18
  272. Has anyone successfully gotten Ollama models (or any models) to execute SQL queries through natural language in Openwebui? -- 2025-09-18
  273. yangzhou24/OmniWorld -- 2025-09-18
  274. Analog Optical Computer for Inference and Combinatorial Optimization -- 2025-09-18
  275. Running Qwen-Next (Instruct and Thinking) MLX BF16 with MLX-LM on Macs -- 2025-09-17
  276. [Editorial] REFRAG: Rethinking RAG based Decoding -- 2025-09-17
  277. The Practicality Of Solar Powered Meshtastic -- 2025-09-17
  278. Kwai-Klear/Klear-46B-A2.5B-Instruct -- 2025-09-16
  279. WestZhang/VibeVoice-Large-pt -- 2025-09-16
  280. FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference -- 2025-09-16
  281. Qwen 3 Next Series – Qwen/Qwen3 Next 80B A3B Instruct Detected -- 2025-09-16
  282. Test-time Prompt Intervention -- 2025-09-16
  283. [Editorial] Tricks from OpenAI gpt-oss YOU can use with transformers -- 2025-09-15
  284. openbmb/MiniCPM4.1-8B -- 2025-09-15
  285. nunchaku-tech/nunchaku-qwen-image -- 2025-09-15
  286. ggml-org/gpt-oss-20b-GGUF -- 2025-09-15
  287. MBZUAI releases K2 Think. 32B reasoning model based on Qwen 2.5 32B backbone, focusing on high performance in math, coding and science. -- 2025-09-14
  288. unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-GGUF -- 2025-09-14
  289. Efficient hot-swappable LoRA variant supported in llama.cpp -- 2025-09-12
  290. Qwen/Qwen3-Next-80B-A3B-Instruct -- 2025-09-12
  291. swiss-ai/Apertus-70B-Instruct-2509 -- 2025-09-12
  292. GRASPED: Graph Anomaly Detection using Autoencoder with Spectral Encoder and Decoder (Full Version) -- 2025-09-12
  293. stepfun-ai/step3 -- 2025-09-12
  294. Exploring State-Space-Model based Language Model in Music Generation -- 2025-09-12
  295. HuggingFaceModelDownloader v2.0 — fast resume, a slick TUI, and powerful filters for GGUF/variants -- 2025-09-12
  296. Roo Code Cloud is here with Task Sync & Roomote Control || Roo Code 3.28.0 Release Notes -- 2025-09-12
  297. Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers -- 2025-09-12
  298. Made a one-click SearXNG fork with Redis, plus Dockerized Tika+OCR, and soon: local TTS/STT on Intel iGPU + AMD NPU -- 2025-09-12
  299. Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search -- 2025-09-10
  300. Introducing FineVision: a huge open-source dataset for training SOTA Vision Language Models -- 2025-09-10
  301. wildminder/ComfyUI-VibeVoice -- 2025-09-10
  302. bytedance/USO -- 2025-09-10
  303. Wan-AI/Wan2.2-I2V-A14B -- 2025-09-10
  304. YanoljaNEXT-Rosetta: A Collection of Translation Models in Different Sizes -- 2025-09-07
  305. nasa-ibm-ai4science/Surya-1.0 -- 2025-09-06
  306. Vulkan back ends, what do you use? -- 2025-09-06
  307. A new OpenAI model? Could this be 5.1 or 5o? What do you think? -- 2025-09-06
  308. haasonsaas/dspy-0to1-guide -- 2025-09-06
  309. 16 reproducible failures → upgraded into a 300+ page Global Fix Map. one link inside, feedback wanted -- 2025-09-06
  310. VibeVoice RIP? What do you think? -- 2025-09-05
  311. lodestones/Chroma1-HD -- 2025-09-05
  312. Welcome EmbeddingGemma, Google's new efficient embedding model -- 2025-09-05
  313. NousResearch/Hermes-4-405B -- 2025-09-05
  314. After deepseekv3 I feel like other MoE architectures are old or outdated. Why did Qwen chose a simple MoE architecture with softmax routing and aux loss for their Qwen3 models when there’s been better architectures for a while? -- 2025-09-02
  315. The Hacker's Guide to Building an AI Supercluster -- 2025-09-02
  316. CAD, From Scratch: MakerCAD -- 2025-09-02
  317. PSO-Merging: Merging Models Based on Particle Swarm Optimization -- 2025-09-02
  318. Coral-Protocol/Anemoi -- 2025-09-01
  319. After researchers unmasked a prolific SMS scammer, a new operation has emerged -- 2025-09-01
  320. Silent No More: Open-Source Fix for Mic Mishaps -- 2025-09-01
  321. How to reliably detect cross-listed job ads across multiple sites? -- 2025-09-01
  322. GPT OSS Fine-tuning QAT -- 2025-08-31
  323. Semantic Structure in Large Language Model Embeddings -- 2025-08-31
  324. facebook/dinov3-vit7b16-pretrain-lvd1689m -- 2025-08-31
  325. MizzenAI/HPSv3 -- 2025-08-31
  326. Some thoughts on LLMs and software development -- 2025-08-30
  327. Show HN: Sideko – Hybrid deterministic/LLM generator for API SDKs and docs -- 2025-08-30
  328. [Editorial] AI interfaces for future -- 2025-08-29
  329. I’ve Debugged 100+ RAG/LLM Pipelines. These 16 Bugs Always Come Back. (70 days, 800 stars) -- 2025-08-29
  330. Updates to Consumer Terms and Privacy Policy -- 2025-08-29
  331. Hierarchical Reasoning Model (HRM) implementation for text generation -- 2025-08-27
  332. SQLite-Vector adds support for float16 and bfloat16 (CPU, NEON, AVX2 and SSE2) -- 2025-08-26
  333. zai-org/GLM-4.5 -- 2025-08-26
  334. rednote-hilab/dots.vlm1.inst -- 2025-08-26
  335. black-forest-labs/FLUX.1-Krea-dev -- 2025-08-26
  336. Datarus-R1-14B-Preview, an adaptive multi-step reasoning LLM for automated data analysis -- 2025-08-24
  337. Fully Open source, serverless, community-driven MCP alternative built in Python, TS and Go -- 2025-08-24
  338. unsloth/Kimi-K2-Instruct-GGUF -- 2025-08-24
  339. DeepSeek V3.1 Reasoner improves over DeepSeek R1 on the Extended NYT Connections benchmark -- 2025-08-24
  340. Qwen3-30B-A3B-Instruct 2507 vs Qwen3-Coder Flash -- 2025-08-23
  341. Your model zoo for Software dev / webdev -- 2025-08-23
  342. sriniously/go-boilerplate -- 2025-08-23
  343. Design Patterns in MCP: Literate Reasoning -- 2025-08-22
  344. FlyMyAI/flymyai-lora-trainer -- 2025-08-22
  345. Zedless: Zed fork focused on privacy and being local-first -- 2025-08-22
  346. Show HN: Rucat – Cat for Prompt Engineers -- 2025-08-22
  347. NEW VERSION: 0.6.23 Has Just Released! - Many fixes and new features, huge changelog -- 2025-08-22
  348. Docker Model Runner is really neat -- 2025-08-20
  349. Build a Powerful RAG Web Scraper with Ollama and LangChain -- 2025-08-20
  350. Ollama interface with memory -- 2025-08-20
  351. YuminosukeSato/pyproc -- 2025-08-20
  352. AGENTS.md – Open format for guiding coding agents -- 2025-08-20
  353. NVIDIA Nemotron Nano 2 and the Nemotron Pretraining Dataset v1 -- 2025-08-20
  354. deepseek-ai/DeepSeek-V3.1-Base -- 2025-08-20
  355. mistralai/Devstral-Small-2507 -- 2025-08-20
  356. zai-org/GLM-4.5-Air -- 2025-08-20
  357. microsoft/Phi-4-mini-flash-reasoning -- 2025-08-20
  358. ModelTC/Qwen-Image-Lightning -- 2025-08-20
  359. Fast Type-Aware Linting in Oxlint -- 2025-08-20
  360. GPT-5, where does it shine for you? -- 2025-08-19
  361. vidore/colqwen-omni-v0.1 -- 2025-08-19
  362. Tutorial: Open WebUI and llama-swap works great together! Demo of setup, model swapping and activity monitoring. -- 2025-08-18
  363. Concurrency in open-weight/open-source models? -- 2025-08-18
  364. [Editorial] Claude Flow, Alpha 90 release -- 2025-08-17
  365. Optimizing Text gen webui (oobabooga) for MOE models (Qwen3-235b, GLM 4.5) -- 2025-08-17
  366. Chen-zexi/vllm-cli -- 2025-08-17
  367. zyfoxx/subhunter -- 2025-08-17
  368. HuggingFaceTB/SmolLM3-3B-Base -- 2025-08-16
  369. mistralai/Voxtral-Small-24B-2507 -- 2025-08-16
  370. TiTan - a tiny model for tags and titles -- 2025-08-16
  371. baidu/ERNIE-4.5-VL-424B-A47B-PT -- 2025-08-16
  372. MiniLM (BERT) embeddings in C from scratch -- 2025-08-16
  373. GENNAI CLI - A ReAct-based agent CLI -- 2025-08-16
  374. gptme v0.28.0 major release - agent CLI with local model support -- 2025-08-16
  375. WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling -- 2025-08-16
  376. character-ai/pipelining-sft -- 2025-08-15
  377. Compass-Thinker-7B Technical Report -- 2025-08-15
  378. Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning -- 2025-08-15
  379. SaaS Is Dead -- 2025-08-15
  380. PYX: The next step in Python packaging -- 2025-08-15
  381. [Editorial] GLM-4.5, enterprise use -- 2025-08-13
  382. Best local model with function calling? -- 2025-08-13
  383. agentica-org/DeepSWE-Preview -- 2025-08-13
  384. janhq/Jan-v1-4B-GGUF -- 2025-08-13
  385. gpt-oss jailbreak workflow -- 2025-08-11
  386. GPT-5 removed logprob support from the API - technical breakdown and implications -- 2025-08-11
  387. A model for pure text continuation (not chirpy little Q&A assistant)? -- 2025-08-11
  388. Suggestion for upgrading hardware for MOE inference and fine-tuning. -- 2025-08-09
  389. Best models under 16GB?? -- 2025-08-09
  390. Explicit tail calls are now available on Rust Nightly (become keyword) -- 2025-08-09
  391. HuggingFaceTB/SmolLM3-3B -- 2025-08-09
  392. mistralai/Magistral-Small-2507 -- 2025-08-09
  393. N8N + OpenWebUI -- 2025-08-08
  394. Create space-saving clones on macOS with Python -- 2025-08-08
  395. Experience with GLM-4.5-Air + claude code? -- 2025-08-08
  396. Just when you thought Qwen was done... -- 2025-08-08
  397. Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs -- 2025-08-08
  398. Context Management by Trimming Conversation -- 2025-08-06
  399. Exploiting Primacy Effect To Improve Large Language Models -- 2025-08-06
  400. Help: Qwen3-Coder + LM Studio + Continue.dev (VSCode) + Mac 64GB M3 Max — 500 Internal Server Error, Even After Unsloth Fix -- 2025-08-04
  401. CWC now supports kimi.com (K2) and chat.z.ai (GLM-4.5) to enable coding with top tier models at no cost -- 2025-08-04
  402. 🧠 ICM+DPO: Used Qwen3's coherent understanding to improve Gemma3 at math - cross-model capability transfer with zero supervision -- 2025-08-03
  403. NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining -- 2025-08-03
  404. Build an AI Shopping Assistant with Gradio MCP Servers -- 2025-08-01
  405. [Editorial] Alternative to vector db rag -- 2025-07-30
  406. This year’s best open-source models and most cost-effective models -- 2025-07-29
  407. I’m looking for multimodal image input support and uncensored LLM -- 2025-07-29
  408. nvidia/audio-flamingo-3 -- 2025-07-29
  409. mistralai/Voxtral-Mini-3B-2507 -- 2025-07-29
  410. ziangcao0312/PhysX-3D -- 2025-07-29
  411. UI/UX benchmark update 7/22: Newest Qwen models added, Qwen3 takes the lead in terms of win rate (though still early) -- 2025-07-28
  412. unsloth/Qwen3-235B-A22B-Thinking-2507-GGUF -- 2025-07-28
  413. zai-org/GLM-4.5 -- 2025-07-28
  414. Tesslate/UIGEN-X-32B-0727 -- 2025-07-28
  415. had to fine-tune qwen since llama sucks at summarizing -- 2025-07-28
  416. orchestre-dev/ccproxy -- 2025-07-28
  417. [Editorial] neural networks don’t need to be giant to be powerful -- 2025-07-27
  418. Qwen/Qwen3-235B-A22B-Thinking-2507 -- 2025-07-27
  419. mistralai/Magistral-Small-2507 -- 2025-07-27
  420. cmdaltctr/claude-gemini-mcp-slim -- 2025-07-26
  421. RichardAtCT/claude-code-openai-wrapper -- 2025-07-26
  422. Freigeist - The new Vibe Coding Platform -- 2025-07-26
  423. From chaotic prompting to structured workflow: My Claude evolution -- 2025-07-24
  424. A Request for Comments (RFC) for MCP-alternative Universal Tool Calling Protocol (UTCP) was created -- 2025-07-22
  425. How to use the same context across LLMs and Agents -- 2025-07-22
  426. Has anyone actually ran VLAs locally and how good are they? -- 2025-07-22
  427. google/medsiglip-448 -- 2025-07-22
  428. Questions about AI for translation -- 2025-07-22
  429. microsoft/Phi-4-mini-flash-reasoning -- 2025-07-22
  430. LGAI-EXAONE/EXAONE-4.0-32B -- 2025-07-22
  431. Replacing thinking with tool usage enables reasoning in small language models -- 2025-07-22
  432. Struggling to Generate Polished UI with Claude Code -- 2025-07-20
  433. Skywork/Skywork-R1V3-38B -- 2025-07-20
  434. ByteDance-Seed/Seed-X-PPO-7B -- 2025-07-20
  435. Diffusion model support in llama.cpp. -- 2025-07-16
  436. GLM-4 MoE incoming -- 2025-07-16
  437. GeoArrow and GeoParquet, and the Future of Geospatial Data Analysis -- 2025-07-15
  438. Programming Affordances That Invite Mistakes -- 2025-07-14
  439. HuggingFaceTB/SmolLM3-3B-Base -- 2025-07-14
  440. OLMo 2 - a family of fully-open language models -- 2025-07-12
  441. google/videoprism -- 2025-07-12
  442. Why don’t we have a big torrent repo for open-source LLMs? -- 2025-07-12
  443. Local PDF Database searchable with ollama - best setup? -- 2025-07-12
  444. Tinyllama on old Mediatek G80 android device -- 2025-07-12
  445. I used Ollama to build a Cursor for PDFs -- 2025-07-12
  446. Advice on switching to LLM -- 2025-07-12
  447. Building the Hugging Face MCP Server -- 2025-07-11
  448. Support for the upcoming IBM Granite 4.0 has been merged into llama.cpp -- 2025-07-11
  449. support for Falcon-H1 model family has been merged into llama.cpp -- 2025-07-11
  450. fsndzomga/metadspy -- 2025-07-10
  451. osmosis-ai/Osmosis-Apply-1.7B -- 2025-07-10
  452. IntervitensInc/pangu-pro-moe-model -- 2025-07-10
  453. Continual Gradient Low-Rank Projection Fine-Tuning for LLMs -- 2025-07-10
  454. pola-rs/polars -- 2025-07-09
  455. Introducing an open source cross-platform graphical interface LLM client -- 2025-07-08
  456. llama-server vs llama python binding -- 2025-07-08
  457. Llama server completion not working correctly -- 2025-07-08
  458. introducing cocoindex - super simple etl to prepare data for ai, with dynamic index (ollama integrated) -- 2025-07-08
  459. Higher topk and num_ctx or map/reduce ? -- 2025-07-08
  460. Looking for an upgrade from Meta-Llama-3.1-8B-Instruct-Q4_K_L.gguf, especially for letter parsing. Last time I looked into this was a very long time ago (7 months!) What are the best models nowadays? -- 2025-07-08
  461. Best models by size? -- 2025-07-08
  462. Planning a 7–8B Model Benchmark on 8GB GPU — What Should I Test & Measure? -- 2025-07-08
  463. DLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching -- 2025-07-08
  464. Efficient MultiModal Data Pipeline -- 2025-07-08
  465. Are non-autoregressive models really faster than autoregressive ones after all the denoising steps? -- 2025-07-06
  466. Yuan-ManX/ComfyUI-OmniGen2 -- 2025-07-06
  467. Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 -- 2025-07-06
  468. Using local models with Void -- 2025-07-05
  469. Intel GPU vLLM Docker Compose Bootstrap with Phi-lthy4 on A770 -- 2025-07-05
  470. Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm -- 2025-07-05
  471. Gemma 3n fully available in the open-source ecosystem! -- 2025-07-05
  472. 5060ti 16gb or 9060xt 16gb for small llm server -- 2025-07-05
  473. Qwen3 models in MLX format! -- 2025-07-05
  474. Best local coding model right now? -- 2025-07-05
  475. Found a Web3 LLM That Actually Gets DeFi Right -- 2025-07-03
  476. apple/DiffuCoder-7B-cpGRPO -- 2025-07-03
  477. Hoshinonyaruko/Gensokyo-MCP -- 2025-07-01
  478. ahmadallobani/BaldHead -- 2025-06-29
  479. modelcontextprotocol/registry -- 2025-06-27
  480. 0-concordance of knotted surfaces and Alexander ideals -- 2025-06-26
  481. unfinishedtr/progressbar -- 2025-06-24
  482. wearyfurnitur/confiq -- 2025-06-24
  483. Show HN: Controlling 3D models with voice and hand gestures -- 2025-06-24
  484. open-webui/mcpo -- 2025-06-23
  485. Built a fully local Whisper + pyannote stack to replace Otter. Full diarisation, transcripts & summaries on GPU. -- 2025-06-23
  486. MiniMax latest open-sourcing LLM, MiniMax-M1 — setting new standards in long-context reasoning -- 2025-06-23
  487. Run qwen 30b-a3b on Android local with Alibaba MNN Chat -- 2025-06-23
  488. A new PDF translation tool -- 2025-06-23
  489. What Really Happens When You Ask a Cursor a Question with GitHub MCP Integrated -- 2025-06-23
  490. [Q] How to Speed Up Mistral 7B Inference in LM Studio? 31s/Chunk on RTX 3070 -- 2025-06-23
  491. Cyber security guys are about to become very on demand in the coming few years -- 2025-06-23
  492. The first big AI disaster is yet to happen -- 2025-06-23
  493. Trading with Claude, and writing your own MCP server -- 2025-06-23
  494. Ecne AI Podcast Generator - Update -- 2025-06-23
  495. Help me decide on hardware for LLMs -- 2025-06-23
  496. chungmin99/pyroki -- 2025-06-23
  497. nvidia/GR00T-N1.5-3B -- 2025-06-23
  498. nvidia/Cosmos-Predict2-2B-Text2Image -- 2025-06-22
  499. trendmicro/vision-one-mcp-server -- 2025-06-20
  500. Minidoracat/mcp-feedback-enhanced -- 2025-06-20
  501. meta-llama/Llama-3.1-8B-Instruct -- 2025-06-19
  502. MiniMaxAI/MiniMax-M1-80k -- 2025-06-19
  503. ckanthony/openapi-mcp -- 2025-06-18
  504. Ta0ing/MCP-SecurityTools -- 2025-06-18
  505. fluxions/vui -- 2025-06-13
  506. carbon-language/carbon-lang -- 2025-06-12
  507. CJackHwang/AIstudioProxyAPI -- 2025-06-11
  508. News publishers call Google's AI Mode 'theft' -- 2025-06-11
  509. typelevel/cats -- 2025-06-11
  510. wesm/pydata-book -- 2025-06-11
  511. jedisct1/openapi-mcp -- 2025-06-09
  512. arcee-ai/Homunculus -- 2025-06-04
  513. GitHub MCP exploited: Accessing private repositories via MCP -- 2025-06-01
  514. dipampaul17/KVSplit -- 2025-06-01
  515. facebook/OMol25 -- 2025-05-30
  516. Is Microsoft’s new Foundry Local going to be the “easy button” for running newer transformers models locally? -- 2025-05-28
  517. Cobolt is now available on Linux! 🎉 -- 2025-05-28
  518. Round Up: Current Best Local Models under 40B for Code & Tool Calling, General Chatting, Vision, and Creative Story Writing. -- 2025-05-28
  519. Best local model for M2 16gb MacBook Air for Analyzing Transcripts -- 2025-05-28
  520. Prompt Debugging -- 2025-05-28
  521. Not so Smart Agent (Ollama, Spring AI, MCP) -- 2025-05-28
  522. [Q] How can one get better at fixing models,training etc.? -- 2025-05-28
  523. FOSS - MCP Server generator from OpenAPI specification files (swagger/etapi) -- 2025-05-28
  524. Digital Payment System GNU Taler Gets Green Light to Operate in Switzerland -- 2025-05-28