Quantization & Efficiency

Model compression, GGUF, efficient inference, optimization

515 articles across 145 editions

Articles

  1. Separable Expert Architecture: Privacy-Preserving LLM Personalization via Composable Adapters -- 2026-05-21
  2. 512k Context Pre-training on a 12GB Consumer GPU with O(n) Attention -- 2026-05-21
  3. Introducing the Ettin Reranker Family -- 2026-05-20
  4. OlmoEarth v1.1: A more efficient family of models -- 2026-05-20
  5. Regex Chess: A 2-ply minimax chess engine in 84,688 regular expressions -- 2026-05-19
  6. FastDMS: 6.4X KV-cache compression running faster than vLLM BF16/FP8 -- 2026-05-11
  7. [Editorial] RuVector Sparse Attention Crate -- 2026-05-11
  8. AMD to release slottable GPU -- 2026-05-11
  9. Taiwanese company Skymizer announces HTX301 - PCIE inference card with 384GB of Memory at ~240 Watts -- 2026-05-11
  10. Apple Removes 256GB M3 Ultra Mac Studio Model From Online Store -- 2026-05-11
  11. Virtual violin produces realistic sounds (MIT) -- 2026-05-06
  12. Lightricks/LTX-2.3-22b-IC-LoRA-HDR -- 2026-05-06
  13. [Editorial] Finding Zero-Days with Any Model -- 2026-05-01
  14. [Editorial] OMLX.ai -- 2026-04-30
  15. llama.cpp - NVFP4 native support on Blackwell from now - b8967 -- 2026-04-30
  16. Sigilant: GGUF Quality Benchmarking Beyond TPS — Tool-Calling Pass Rate as Selection Criterion -- 2026-04-30
  17. I'm done with using local LLMs for coding -- 2026-04-30
  18. Qwen3.6-27B IQ4_XS FULL VRAM with 110k context -- 2026-04-28
  19. Can we already use Google's TurboQuant (TQ) for KV Cache in llama-server? Or are we waiting for a PR? -- 2026-04-28
  20. Gemma 4 beats Qwen 3.5 (UPDATE), and Qwen 3.6 27B + MiniMax M2.7 is the best OpenCode setup -- 2026-04-28
  21. Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with Claude Code, well I can't believe it but it is usable -- 2026-04-28
  22. BitNet is the AI future? -- 2026-04-28
  23. FP4 inference in llama.cpp (NVFP4) and ik_llama.cpp (MXFP4) landed -- 2026-04-27
  24. [Editorial] Bonsai-8B MLX 1-bit -- 2026-04-27
  25. VRAM.cpp: Running llama-fit-params directly in your browser -- 2026-04-27
  26. Thoughts on using an AMD Alveo V80 FPGA as a poor man's Taalas HC1 -- 2026-04-27
  27. [Editorial] hw-smi — Cross-Platform Hardware Monitor -- 2026-04-27
  28. China's DeepSeek valuation rockets above $20B!! -- 2026-04-24
  29. [Editorial] DeepSeek Open-Sources Tile Kernels -- 2026-04-24
  30. DeepSeek v4 -- 2026-04-24
  31. [Editorial] Video Content -- 2026-04-22
  32. [Editorial] Four Horsemen of the AIpocalypse -- 2026-04-22
  33. Closest replacement for Claude + Claude Code? (got banned, no explanation) -- 2026-04-22
  34. [Editorial] Roomote: Remote Development Tool -- 2026-04-22
  35. [Editorial] Video Content -- 2026-04-22
  36. Ternary Bonsai: Top Intelligence at 1.58 Bits -- 2026-04-22
  37. Personal Eval: Gemma4 26B MoE vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared -- 2026-04-22
  38. NVIDIA Nemotron-3-Super-120B-A12B-FP8 -- 2026-04-22
  39. [Editorial] Stanford HAI AI Index Report 2026 -- 2026-04-17
  40. [Editorial] Steve Yegge on AI -- 2026-04-17
  41. [Editorial] The AI Resentment Stage -- 2026-04-17
  42. A cryptography engineer's perspective on quantum computing timelines -- 2026-04-07
  43. Sam Altman may control our future – can he be trusted? -- 2026-04-07
  44. [Editorial] Anthropic's Claude Code Source Leak — What It Means -- 2026-04-01
  45. [Editorial] Claude Code Was Leaked — I Read All of It -- 2026-04-01
  46. [Editorial] The Claude Code 'Oops' — Source Code Leak -- 2026-04-01
  47. [Editorial] Claude Code Just Open-Sourced Itself (Not Intentionally) -- 2026-04-01
  48. [Editorial] nirholas/claude-code Repository -- 2026-04-01
  49. [Editorial] arxiv:2603.15569 -- 2026-03-30
  50. TinyLoRA: LoRA training works at just 13 parameters -- 2026-03-30
  51. KV rotation PR: q8 quants tank performance on AIME25, recovered with rotation -- 2026-03-30
  52. [Editorial] AI ASIC for LLMs -- 2026-03-30
  53. [Editorial] Heretic -- 2026-03-30
  54. mlx-snn: Spiking Neural Network library for Apple MLX -- 2026-03-30
  55. TurboQuant: Redefining AI efficiency with extreme compression -- 2026-03-27
  56. [Editorial] TurboQuant Deep Dive -- 2026-03-27
  57. RightNow-AI/autokernel -- 2026-03-27
  58. NVIDIA 2026 Conference LIVE. New Base model coming! -- 2026-03-20
  59. Nemotron 3 Nano 4B: A Compact Hybrid Model for Efficient Local AI -- 2026-03-20
  60. Granite 4.0 1B Speech: Compact, Multilingual, and Built for the Edge -- 2026-03-20
  61. [New Model & Agent] LocoTrainer-4B: A Claude Code-style local agent designed specifically to master the MS-SWIFT framework (4B, 32K, GGUF) -- 2026-03-20
  62. shallowdream204/BitDance-14B-16x -- 2026-03-20
  63. [Editorial] Claude 1M Context GA -- 2026-03-14
  64. NVIDIA Nemotron 3 Super: open-weight 120B MoE hybrid with 1M-token context -- 2026-03-14
  65. Expert parallelism for 1T MoE finetuning on a single node - 50x faster and 2x cheaper than alternatives -- 2026-03-14
  66. Fine-tuned Qwen3 SLMs (0.6-8B) beat frontier LLMs on narrow tasks -- 2026-03-12
  67. Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis -- 2026-03-12
  68. [Editorial] -- 2026-03-09
  69. My journey through Reverse Engineering SynthID -- 2026-03-09
  70. Did Alibaba just kneecap its powerful Qwen AI team? -- 2026-03-07
  71. [Editorial] You're 1,191 Days Late — Here's What to Do -- 2026-03-07
  72. [Editorial] The Zero-Day Clock Is Ticking -- 2026-03-06
  73. [Editorial] Step-by-Step Guide to Exploiting AI Systems -- 2026-03-06
  74. [Editorial] Unprompted 2026: Top Insights Day One -- 2026-03-06
  75. [Editorial] Unprompted 2026: Top Insights Day Two -- 2026-03-06
  76. The L in "LLM" Stands for Lying -- 2026-03-05
  77. [Editorial] Am I Living in a Parallel AI Universe? -- 2026-03-05
  78. [Editorial] Video Pick -- 2026-03-05
  79. Qwen3.5 122B in 72GB VRAM (3x3090) is the best model available at this time — also it nails the "car wash test" -- 2026-03-03
  80. Qwen 3.5 is multimodal. Here is how to enable image understanding in opencode with llama cpp -- 2026-03-03
  81. pplx-embed: State-of-the-Art Embedding Models for Web-Scale Retrieval -- 2026-03-03
  82. andimarafioti/faster-qwen3-tts -- 2026-03-03
  83. Meta's AI smart glasses and data privacy concerns -- 2026-03-03
  84. [Editorial] AI Search Index -- 2026-03-03
  85. Mercury 2: Fast reasoning LLM powered by diffusion -- 2026-02-26
  86. I Benchmarked Opus 4.6 vs Sonnet 4.6 on agentic PR review and browser QA the results weren't what I expected -- 2026-02-26
  87. [Editorial] Bullshit meter :) -- 2026-02-26
  88. The Qwen team verified that there are serious problems with the data quality of the GPQA and HLE test sets. -- 2026-02-25
  89. Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to. -- 2026-02-25
  90. ChatGPT isn't the only chatbot pulling answers from Elon Musk's Grokipedia -- 2026-02-25
  91. [Editorial] Benchmarking LLMs for Voice Agent Use Cases -- 2026-02-21
  92. Claude Opus 4.6 Surges Past Forecasts on METR's 50% Time-Horizon Benchmark with Exponential Gains -- 2026-02-21
  93. [Editorial] Unsloth: MiniMax M2.5 Fine-Tuning Guide -- 2026-02-21
  94. Let your coding agent benchmark llama.cpp for you (auto-hunt the fastest params per model) -- 2026-02-06
  95. GGML implementation of Qwen3-ASR -- 2026-02-06
  96. Running LLMs & VLMs Fully On-Device on iPhone(6GB RAM) — Offline, Privacy-Focused, Real-Time Performance -- 2026-02-06
  97. We benchmarked every 4-bit quantization method in vLLM 👀 -- 2026-01-12
  98. Gpu inference with model that does not fit in one GPU -- 2026-01-12
  99. Llama.cpp rpc experiment -- 2026-01-12
  100. Performance improvements in llama.cpp over time -- 2026-01-12
  101. [Editorial] https://docs.rs/crate/bitchat-qudag/latest -- 2026-01-02
  102. [Editorial] https://github.com/permissionlesstech/bitchat/blob/main/WHITEPAPER.md -- 2026-01-02
  103. [Editorial] https://www.npmjs.com/package/@ruvector/edge-net -- 2026-01-02
  104. Why I Ditched Serverless Neptune/OpenSearch for Dockerized Neo4j/pgvector on EC2 (60% Cost Cut) -- 2025-12-30
  105. Llama-3.3-8B-Instruct -- 2025-12-30
  106. Benchmarking local llms for speed with CUDA and vulkan, found an unexpected speedup for select models -- 2025-12-30
  107. Why Kimi K2 Thinking chose Int4 QAT, from an infra engineer at Kimi -- 2025-12-30
  108. Help RTX 5090 + llama.cpp crashes after 2-3 inferences (VFIO passthrough, SM120 CUDA) -- 2025-12-30
  109. AI-Doomsday-Toolbox Distributed inference + workflows -- 2025-12-30
  110. [Tool] imesde: Zero-GPU, In-Memory Vector Engine for Real-Time Local RAG -- 2025-12-22
  111. I built a Rust-based HTML-to-Markdown converter to save RAG tokens (Self-Hosted / API) -- 2025-12-22
  112. Golang optimizations for high‑volume services -- 2025-12-12
  113. PaCoRe: The first open-source deep think 8B model beats GPT-5 on HMMT25 -- 2025-12-11
  114. RnJ-1-Instruct FP8 Quantization -- 2025-12-10
  115. Optical Context Compression Is Just (Bad) Autoencoding -- 2025-12-10
  116. Masked Diffusion Models as Energy Minimization -- 2025-12-10
  117. Miles + FSDP2 = Megatron-Level Performance with More Flexibility -- 2025-12-10
  118. P4nda0s/IDA-NO-MCP -- 2025-12-09
  119. Toyota unintended acceleration and the big bowl of "spaghetti" code (2013) -- 2025-12-09
  120. https://huggingface.co/Doradus/Hermes-4.3-36B-FP8 -- 2025-12-09
  121. Support for rnj-1 now in llama.cpp -- 2025-12-09
  122. Comfy-Org/flux2-dev -- 2025-12-09
  123. baidu/ERNIE-4.5-VL-28B-A3B-Thinking -- 2025-12-09
  124. I built a personal assistant script, and the CPU inference speed beats my Llama setup. -- 2025-12-08
  125. Semantic Compression (2014) -- 2025-12-08
  126. A Deep Dive into Using PIO and DMA on the RP2350 -- 2025-12-05
  127. Free yourself from the Spotify desktop client with spotifyd -- 2025-12-04
  128. I cooked abliterated gemma3-27b-it with norm-preserving technique -- 2025-12-04
  129. Qwen3 VL built from scratch with PyTorch -- 2025-12-03
  130. EmbeddingGemma: Powerful and Lightweight Text Representations -- 2025-12-03
  131. Z-Image: Powerful and highly efficient image generation model with 6B parameters -- 2025-12-03
  132. RTX 5090 + Qwen 30B MoE @ 135 tok/s in NVFP4 - Full guide with C++ patches -- 2025-12-02
  133. [Editorial] https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration -- 2025-12-02
  134. Optimizing Token Generation in llama.cpp's CUDA Backend -- 2025-12-01
  135. [Editorial] https://arxiv.org/html/2511.09030v1 -- 2025-11-28
  136. You're using HuggingFace wrong. Stop downloading pre-quantized GGUFs and start building hardware-optimized, domain-specific models. Here's the pipeline I built to do it. -- 2025-11-26
  137. Binary Quantization For LLMs Through Dynamic Grouping -- 2025-11-26
  138. dx8152/Relight -- 2025-11-26
  139. Question About Motherboards -- 2025-11-26
  140. [Release] DragonMemory: 16× semantic compression for local RAG context (open-source, AGPL) -- 2025-11-25
  141. ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation -- 2025-11-25
  142. Continuous batching from first principles -- 2025-11-25
  143. Can an expert chime in and explain what is holding Vulkan back from becoming the standard API for ML? -- 2025-11-25
  144. luozijian1990/network-traffic-ebpf-exporter -- 2025-11-24
  145. We found cryptography bugs in the elliptic library using Wycheproof -- 2025-11-24
  146. Show HN: Cynthia – Reliably play MIDI music files – MIT / Portable / Windows -- 2025-11-24
  147. Browser Fingerprinting and Why VPNs Won’t Make You Anonymous -- 2025-11-24
  148. [Release] Memory-Isolated Recursive Compression (MIRC). A local-first probabilistic compression utility for Apple Silicon. Research Preview (Open Source) -- 2025-11-21
  149. Read long podcasts locally with Whisper + LLM, open sourced -- 2025-11-21
  150. Local all-in-one AI system (Local multimodal AI) -- 2025-11-21
  151. JMS1717/8mb.local -- 2025-11-21
  152. Mimir Memory Bank now uses llama.cpp! -- 2025-11-21
  153. Quantum physicists have shrunk and "de-censored" DeepSeek R1 -- 2025-11-20
  154. Built a tool to solve the "how much GPU do I actually need?" problem for LLM deployment -- 2025-11-20
  155. New Parameter Browser added to Llamacpp Model Launcher! Experimental model parameter tuning (Windows/CUDA only) -- 2025-11-20
  156. cuda device list mismatch - ggml_cuda_init / ubuntu - significance to using --main-gpu flag -- 2025-11-20
  157. What Size of LLM Can 4x RTX 5090 Handle? (96GB VRAM) -- 2025-11-20
  158. Gain 60% performance on RDNA 4 using this fix -- 2025-11-19
  159. wildminder/ComfyUI-DyPE -- 2025-11-19
  160. lightx2v/Autoencoders -- 2025-11-19
  161. Scale-out is the silent killer of LLM applications. Are we solving the wrong problem? -- 2025-11-19
  162. PyTorch 2.10.0a0 w/ Blackwell (sm_120) Support — Patched & Packaged for One-Command Install -- 2025-11-17
  163. Half-trillion parameter model on a machine with 128 GB RAM + 24 GB VRAM -- 2025-11-17
  164. [Editorial] https://www.linkedin.com/posts/andriyburkov_when-you-train-a-model-on-one-dataset-it-activity-7392804316769701888-166x/ -- 2025-11-14
  165. xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning -- 2025-11-14
  166. [Editorial] Balancing order, freedom, and technology -- 2025-11-12
  167. AMD warns the Intel and Nvidia partnership is a risk to its business -- 2025-11-12
  168. A Pentium In Your Hand -- 2025-11-12
  169. [Editorial] https://www.linkedin.com/posts/ismaelvelasco_theres-an-ai-text-model-comparable-to-sota-activity-7393850964731912192-nZT1 -- 2025-11-12
  170. Last week in Multimodal AI - Local Edition -- 2025-11-12
  171. Apache Iggy is a high-performance, persistent message streaming platform -- 2025-11-07
  172. [P] Training Better LLMs with 30% Less Data – Entropy-Based Data Distillation -- 2025-11-06
  173. I fine tuned a (small) model to help with reasoning backfill on old/non-reasoning datasets -- 2025-11-06
  174. Superhuman AI for Multiplayer Poker -- 2025-11-06
  175. cerebras/GLM-4.5-Air-REAP-82B-A12B -- 2025-11-06
  176. Retrieval Enhanced Feedback via In-context Neural Error-book -- 2025-11-06
  177. Kimi release Kimi K2 Thinking, an open-source trillion-parameter reasoning model -- 2025-11-06
  178. OpenAI asks U.S. for loan guarantees to fund $1T AI expansion -- 2025-11-06
  179. Kimi released Kimi K2 Thinking, an open-source trillion-parameter reasoning model -- 2025-11-06
  180. [Editorial] Frequently wrong, but never in doubt -- 2025-11-05
  181. The Zero Freeze Formula: Teaching Local LLaMA Real Physics Through Python (SU(3) Mass Gap Simulation) to solve the Yang–Mills Mass Gap -- 2025-11-05
  182. Audio Sound Capture Project Needs Help -- 2025-11-05
  183. [D] It turns out WDDM driver mode is making our RAM-to-GPU transfers much slower than TCC or MCDM mode. Has anyone figured out how to bypass NVIDIA's software-level restrictions? -- 2025-11-05
  184. GLaDOS TTS finetuning on MLX from the original game files -- 2025-11-04
  185. zeusftk/FTK_CANVAS_AGENT_for_Comfyui -- 2025-11-04
  186. guyyariv/DyPE -- 2025-11-04
  187. Qwen3-VL-32B Q8 speeds in llama.cpp vs vLLM FP8 on a RTX PRO 6000 -- 2025-11-03
  188. Help me decide: EPYC 7532 128GB + 2 x 3080 20GB vs GMKtec EVO-X2 -- 2025-11-03
  189. amd/Nitro-E -- 2025-11-03
  190. CISA and NSA share tips on securing Microsoft Exchange servers -- 2025-11-02
  191. The Smol Training Playbook: The Secrets to Building World-Class LLMs -- 2025-11-02
  192. Latest Update from Anthropic's new model - Neptune V6 -- 2025-11-02
  193. AI "Phone Farm" Startup Gets Funding from Marc Andreessen to Flood Social Media With Spam -- 2025-11-02
  194. FlashPack: High-throughput tensor loading for PyTorch -- 2025-11-01
  195. M5 Neural Accelerator benchmark results from Llama.cpp -- 2025-11-01
  196. Kafka is Fast – I'll use Postgres -- 2025-11-01
  197. [Editorial] https://www.linkedin.com/posts/busiel-morley_economic-shifts-in-the-age-of-ai-ugcPost-7390349517612806144-8djS -- 2025-11-01
  198. Analog Surround Sound Was Everywhere, But You Probably Didn’t Notice -- 2025-11-01
  199. US Gas Turbine Shortage Likely to Slow AI Demand Growth -- 2025-10-31
  200. The Supercon 2025 Badge is Built to be Customized -- 2025-10-31
  201. Experimenting with Qwen3-VL for Computer-Using Agents -- 2025-10-30
  202. Built a full voice AI assistant running locally on my RX 6700 with Vulkan - Proof AMD cards excel at LLM inference -- 2025-10-30
  203. Streaming datasets: 100x More Efficient -- 2025-10-30
  204. Cerebras REAP'd GLM4.6: 25%, 30%, 40% pruned FP8 checkpoints on HF! -- 2025-10-28
  205. Qwen/Qwen3-VL-30B-A3B-Instruct-FP8 -- 2025-10-28
  206. lightx2v/Wan2.2-Distill-Loras -- 2025-10-28
  207. [Editorial] Periodic table for ai algorithms -- 2025-10-26
  208. Need help understanding OpenAIs API usage for text-embedding -- 2025-10-26
  209. Qwen3 Next support in llama.cpp ready for review -- 2025-10-25
  210. GLM Air REAP tool call problems -- 2025-10-25
  211. Reverse Engineering STL Files with FreeCAD -- 2025-10-25
  212. Un-LOCC (Universal Lossy Optical Context Compression), Achieve Up To 3× context compression with 93.65% Accuracy. -- 2025-10-24
  213. LiquidAI/LFM2-1.2B-RAG -- 2025-10-24
  214. zai-org/GLM-4.6 -- 2025-10-24
  215. inference-net/Schematron-3B -- 2025-10-21
  216. [By GLM Team] Glyph: Scaling Context Windows via Visual-Text Compression -- 2025-10-21
  217. DGX SPARK Compiled llama.cpp Benchmarks Compared to M4 MAX (non-MLX) -- 2025-10-21
  218. perplexityai/search_evals -- 2025-10-21
  219. Hetzner: The Simple Cloud just got more flexible and more affordable -- 2025-10-21
  220. A new, super simple LLM benchmark for testing changes across models, quants, parameters, samplers, engines, etc -- 2025-10-21
  221. riptideslabs/tokenex -- 2025-10-20
  222. Multi-Tenant SaaS's Wildcard TLS: An Overview of DNS-01 Challenges -- 2025-10-20
  223. From cloud to OCP? Be ready to wrangle firmware -- 2025-10-20
  224. FLOSS Weekly Episode 851: Buckets of Money -- 2025-10-20
  225. Significant speedup for local models -- 2025-10-20
  226. Cursor tricking paid users with fake Claude Sonnet 4.5 -- 2025-10-20
  227. inclusionAI/Ring-1T -- 2025-10-20
  228. volantvm/volant -- 2025-10-19
  229. Wireshark 4.6.0 Supports macOS Pktap Metadata (PID, Process Name, etc.) -- 2025-10-19
  230. A classified network of SpaceX satellites is emitting a mysterious signal -- 2025-10-19
  231. linkedlist771/SoraWatermarkCleaner -- 2025-10-19
  232. Qwen/Qwen-Image-Edit-2509 -- 2025-10-19
  233. The Entire Process of Building an Open Source Analog ASIC -- 2025-10-15
  234. Built a 1288x RTFx Parakeet Speech-to-Text server... Enjoy! -- 2025-10-13
  235. Novel OpenGL Pixel Shader Dewarping -- 2025-10-13
  236. lovis93/next-scene-qwen-image-lora-2509 -- 2025-10-13
  237. Beyond Token Count: Our Research Suggests "Contextual Weight" is a Key Limiter on Large Context Windows -- 2025-10-13
  238. FractalAIResearch/Fathom-Search-4B -- 2025-10-13
  239. LLM Robustness Leaderboard v1 -- Technical report -- 2025-10-13
  240. Hide and Seek with LLMs: An Adversarial Game for Sneaky Error Generation and Self-Improving Diagnosis -- 2025-10-13
  241. Preference optimization with ORPO and LoRA -- 2025-10-12
  242. [Show] SpiralTorch: A Rust-based PyTorch-style autograd engine (Python 3.14-ready) -- 2025-10-12
  243. Qwen3-VL-30B-A3B-Thinking GGUF with llama.cpp patch to run it -- 2025-10-10
  244. Did anyone try out GLM-4.5-Air-GLM-4.6-Distill ? -- 2025-10-10
  245. What and when 7900xtx is boosted? -- 2025-10-10
  246. Modelfile. Do I need these tags PER prompt? -- 2025-10-10
  247. Divining Air Quality With A Cheap Computer Vision Device -- 2025-10-09
  248. Awesome Local LLM Speech-to-Speech Models & Frameworks -- 2025-10-08
  249. FabioSarracino/VibeVoice-Large-Q8 -- 2025-10-08
  250. CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models -- 2025-10-08
  251. Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking -- 2025-10-08
  252. How to make the AI Bot to understand the exact design and App flow -- 2025-10-04
  253. Creating a Full Stack App W/Cloudflare Works and BetterAuth -- 2025-10-04
  254. Comprehension debt: A ticking time bomb of LLM-generated code -- 2025-10-04
  255. deepseek-ai/DeepSeek-V3.2-Exp -- 2025-10-03
  256. moondream/moondream3-preview -- 2025-10-03
  257. [Editorial] https://github.com/emcie-co/parlant -- 2025-10-02
  258. Built a persistent memory system for LLMs - 3 months testing with Claude/Llama -- 2025-10-02
  259. Do I need to run /init on a repo if I already have AGENTS.md? -- 2025-10-02
  260. Inside NVIDIA GPUs: Anatomy of high performance matmul kernels -- 2025-09-29
  261. Bit is all we need: binary normalized neural networks -- 2025-09-29
  262. Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models -- 2025-09-29
  263. This $5,999 RTX PRO 6000 Ebay listing is a scam, right? -- 2025-09-26
  264. Accelerating Local AI on Consumer GPUs: A Hardware-Aware Dynamic Strategy for YOLOv10s -- 2025-09-26
  265. Efficient 4B parameter gpt OSS distillation without the over-censorship -- 2025-09-22
  266. [Project] I created an AI photo organizer that uses Ollama to sort photos, filter duplicates, and write Instagram captions. -- 2025-09-22
  267. Pointer Tagging in C++: The Art of Packing Bits into a Pointer -- 2025-09-22
  268. inclusionAI/Ring-mini-2.0 -- 2025-09-22
  269. Local real-time assistant that remembers convo + drafts a doc -- 2025-09-22
  270. XiaomiMiMo/MiMo-Audio-7B-Instruct -- 2025-09-21
  271. Scaling Self-Supervised Representation Learning for Symbolic Piano Performance -- 2025-09-21
  272. Uncensor Qwen3 models without retraining -- 2025-09-20
  273. Depth upscaling? -- 2025-09-20
  274. Qwen3‑Next‑80B‑A3B‑Instruct (FP8) on Windows 11 WSL2 + vLLM + Docker (Blackwell) -- 2025-09-19
  275. unsloth/Qwen3-Next-80B-A3B-Instruct -- 2025-09-19
  276. The AI-Scraping Free-for-All Is Coming to an End -- 2025-09-18
  277. Visible Watermarking with Gradio -- 2025-09-18
  278. xiaomi-research/q-frame -- 2025-09-17
  279. google/embeddinggemma-300m -- 2025-09-17
  280. 3-month Claude Code Max user review - considering alternatives -- 2025-09-15
  281. Chesars/whatsapp-mcp -- 2025-09-15
  282. Claude’s memory architecture is the opposite of ChatGPT’s -- 2025-09-15
  283. The Internet Will Be More Dead Than Alive Within 3 Years, Trend Shows | All signs point to a future internet where bot-driven interactions far outnumber human ones. -- 2025-09-15
  284. New "speech" mode in Imagine... -- 2025-09-15
  285. I made local RAG, web search, and voice mode on iPhones completely open source, private, and free -- 2025-09-08
  286. jwest33/jam_model_memory -- 2025-09-08
  287. How was your experience with Claude vs Codex? -- 2025-09-08
  288. [Project/Code] Fine-Tuning LLMs on Windows with GRPO + TRL -- 2025-09-07
  289. nvidia/NVIDIA-Nemotron-Nano-12B-v2-Base -- 2025-09-07
  290. huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated -- 2025-09-07
  291. RX570 compatibility issues -- 2025-09-07
  292. Continue.dev setup -- 2025-09-07
  293. Little SSM (RWKV7 7B) state checkpointing demo. -- 2025-09-04
  294. Need advice on how to get VLLM working with 2xR9700 + 2x7900xtx? -- 2025-09-04
  295. pwnfuzz/diffrays -- 2025-09-04
  296. Chromium Hardening Guide -- 2025-09-04
  297. roomkangali/dursgo -- 2025-09-04
  298. QuEST/Quartet authors discuss their work on SOTA 4-bit training optimizations -- 2025-09-01
  299. F-Stack – A network development kit with high performance based on DPDK -- 2025-09-01
  300. An Empirical Study of Knowledge Distillation for Code Understanding Tasks -- 2025-09-01
  301. A Comparative Analysis of Vision Language Models for Scientific Data Interpretation -- 2025-08-31
  302. Sparrow: Custom language model architecture for microcontrollers like the ESP32 -- 2025-08-30
  303. Password only for this week: Welcome to Hugston -- 2025-08-26
  304. Prism MCP Rust SDK v0.1.0 - Production-Grade Model Context Protocol Implementation -- 2025-08-26
  305. Compute Where It Counts: High Quality Sparsely Activated LLMs -- 2025-08-25
  306. moonshotai/Kimi-K2-Base -- 2025-08-25
  307. unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF -- 2025-08-25
  308. BlueLM-2.5-3B Technical Report -- 2025-08-25
  309. lightx2v/Qwen-Image-Lightning -- 2025-08-25
  310. Made Chatterbox TTS a bit faster again on CUDA (155it/s on 3090) -- 2025-08-25
  311. KittenML/KittenTTS -- 2025-08-25
  312. city96/Qwen-Image-gguf -- 2025-08-23
  313. Menlo/Lucy-128k -- 2025-08-23
  314. NVIDIA just accelerated output of OpenAI’s gpt-oss-120B by nearly 2x -- 2025-08-23
  315. COMponent-Aware Pruning for Accelerated Control Tasks in Latent Space Models -- 2025-08-23
  316. Speculative decoding in archgw candidate release 0.4.0. Could use feedback -- 2025-08-16
  317. Nvidia Tilus: A Tile-Level GPU Kernel Programming Language -- 2025-08-16
  318. SynapseRoute: An Auto-Route Switching Framework on Dual-State Large Language Model -- 2025-08-16
  319. New Tool for Finding Why Your LLM Inference is Slow -- 2025-08-14
  320. I ran OpenAI’s GPT-OSS 20B locally on a 16GB Mac with Ollama — setup, gotchas, and mini demo -- 2025-08-14
  321. GLM 4.5 Air - Optimizing - Vulkan vs. CUDA? -- 2025-08-14
  322. GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface -- 2025-08-13
  323. TextQuests: How Good are LLMs at Text-Based Video Games? -- 2025-08-13
  324. Mitigate Hallucinations by Fine-tuning gpt-oss-120b with One Example -- 2025-08-10
  325. uncensored gpt-oss-20b, bf16 and mxfp4 both available -- 2025-08-10
  326. LGAI-EXAONE/EXAONE-4.0-1.2B -- 2025-08-10
  327. New Open-Source Text-to-Image Model Just Dropped Qwen-Image (20B MMDiT) by Alibaba! -- 2025-08-10
  328. Kitten TTS Web Demo -- 2025-08-09
  329. Show HN: I built a tool to replace capcut audio transcription -- 2025-08-09
  330. Whispers From The Void, Transcribed With AI -- 2025-08-09
  331. The Tape Speed Keyboard -- 2025-08-08
  332. 0.82 um 105 W diode-pumped thulium-doped all silica fiber laser -- 2025-08-06
  333. GLM-4.5 llama.cpp PR is nearing completion -- 2025-08-05
  334. glm-4.5-Air appreciation post - if you have not done so already, give this model a try -- 2025-08-05
  335. Amazon's AI Coding Revealed a Dirty Little Secret -- 2025-08-02
  336. On the Interaction of Compressibility and Adversarial Robustness -- 2025-08-02
  337. realtime-ai/blastoff-llm -- 2025-08-02
  338. Quantize your own GGUFs the same way as your fav Unsloth Dynamic GGUFs -- 2025-08-01
  339. unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF -- 2025-08-01
  340. On the Predictive Power of Representation Dispersion in Language Models -- 2025-08-01
  341. Wan 2.2 T2V,I2V 14B MoE Models -- 2025-07-31
  342. PowerInfer/SmallThinker-21BA3B-Instruct -- 2025-07-31
  343. Ollama + Open WebUI -- is there a way for the same query to run through the same model multiple times (could be 3 times, could be 100 times), then gather all the answers together to summarise/count? -- 2025-07-25
  344. WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding -- 2025-07-25
  345. Semantic chunking using LLMs -- 2025-07-20
  346. Does the OpenWebUi run the sentence transformer models locally? -- 2025-07-20
  347. Dataset for structured (JSON) output? -- 2025-07-19
  348. support for Kimi-K2 has been merged into llama.cpp -- 2025-07-19
  349. t-tech/T-pro-it-2.0 -- 2025-07-19
  350. Madness, the ignorant's question. Would it be possible to lighten an LLM model? -- 2025-07-18
  351. ETH Zurich and EPFL will release a fully open-source LLM developed on public infrastructure. Trained on the “Alps” supercomputer at the Swiss National Supercomputing Centre (CSCS). Trained on 60% english/40% non-english, it will be released in 8B and 70B sizes. -- 2025-07-17
  352. Moonshot AI’s open source Kimi K2 outperforms GPT-4 in key benchmarks -- 2025-07-17
  353. Advice Needed: Best way to replace Together API with self-hosted LLM for high-concurrency app -- 2025-07-17
  354. baidu/ERNIE-4.5-0.3B-PT -- 2025-07-17
  355. LiquidAI/LFM2-700M -- 2025-07-17
  356. RekaAI/reka-flash-3.1 · Hugging Face -- 2025-07-17
  357. Seq vs Seq: the Ettin Suite of Paired Encoders and Decoders -- 2025-07-17
  358. T5Gemma: A new collection of encoder-decoder Gemma models- Google Developers Blog -- 2025-07-17
  359. H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data -- 2025-07-17
  360. How I build software quickly -- 2025-07-16
  361. RekaAI/reka-flash-3.1 -- 2025-07-15
  362. What kind of throughput can I expect with Llama 3.1 on a H200? -- 2025-07-15
  363. MIDI-VALLE: Improving Expressive Piano Performance Synthesis Through Neural Codec Language Modelling -- 2025-07-14
  364. Replication of Quantum Factorisation Records with an 8-bit Home Computer [pdf] -- 2025-07-14
  365. Local llms works great! -- 2025-07-12
  366. LiquidAI/LFM2-350M -- 2025-07-12
  367. QPART: Adaptive Model Quantization and Dynamic Workload Balancing for Accuracy-aware Edge Inference -- 2025-07-12
  368. Issues with Qwen 3 Embedding models (4B and 0.6B) -- 2025-07-12
  369. [Tool Release] Finetune & Quantize 1–3B LLMs on 8GB RAM using LoFT CLI (TinyLlama + QLoRA + llama.cpp) -- 2025-07-11
  370. Qwen3-8B-BitNet -- 2025-07-11
  371. Megakernel doubles Llama-1B inference speed for batch size 1 -- 2025-07-09
  372. Smallest & best OCR model that can read math & code? -- 2025-07-07
  373. Qwen/WorldPM-72B -- 2025-07-07
  374. Code single file with multiple LLM models -- 2025-07-07
  375. Gen-Verse/CURE -- 2025-07-07
  376. Run Deepseek locally on a 24g GPU: Quantizing on our Giga Computing 6980P Xeon -- 2025-07-01
  377. I built a document workflow system using VLMs: processes complex docs end-to-end (runs locally!!) -- 2025-07-01
  378. Jan Nano + Deepseek R1: Combining Remote Reasoning with Local Models using MCP -- 2025-07-01
  379. Query Classifier for RAG - Save your $$$ and users from irrelevant responses -- 2025-07-01
  380. Building a memory-heavy AI agent — looking for local-first storage & recall solutions -- 2025-07-01
  381. Is there any easy way to get up and running with chatgpt-like capabilities at home? -- 2025-07-01
  382. No recognition of slavic characters. English characters recognized are separate singular characters, not a block of text when using PaddleOCR. -- 2025-07-01
  383. Tired of copy-pasting from ChatGPT for coding? I am building an open-source tool (Athanor) to fix that - Alpha testers/feedback wanted! -- 2025-07-01
  384. VideoGameBench from Princeton: Can vision-language models play 90s video games? -- 2025-07-01
  385. New band surges to 500k listeners on Spotify, but turns out it's AI slop -- 2025-07-01
  386. How we cut CKEditor's bundle size by 40% -- 2025-07-01
  387. My VSCode → AI chat website connector extension just got 3 new features! -- 2025-07-01
  388. 100 Gbps Indoor Access and 4.8 Gbps Outdoor Point-to-Point LiFi Transmission Systems using Laser-based Light Sources -- 2025-06-30
  389. (0,4) brane box models -- 2025-06-30
  390. Cursor 1.0 -- 2025-06-30
  391. Help me design a robust on-prem Llama 3 70B infrastructure for 30 users – Complete hardware/software list wanted -- 2025-06-30
  392. Jan-nano, a 4B model that can outperform 671B on MCP -- 2025-06-30
  393. Models that are good and fast at Long Document Processing -- 2025-06-30
  394. I am making an AI batteries included Web Framework (like Django but for AI) -- 2025-06-30
  395. [New Features & Better] Tabulens: A Vision-LLM Powered PDF Table Extractor -- 2025-06-30
  396. I tested 10 LLMs locally on my MacBook Air M1 (8GB RAM!) – Here's what actually works -- 2025-06-30
  397. Chatbot without ChatGPT -- 2025-06-30
  398. What's the best way to save and manage different text files for the models to reference? PRD, cursor rules, tech stack, design reference, etc? -- 2025-06-30
  399. Bzip2 crate switches from C to 100% Rust -- 2025-06-30
  400. Litestream: Revamped -- 2025-06-30
  401. Stop using REST for state synchronization (2024) -- 2025-06-30
  402. Announcing `mcp-protocol-sdk`: A New Enterprise grade Rust SDK for AI Tool Calling (Model Context Protocol) -- 2025-06-30
  403. Reinforcement Pre-Training -- 2025-06-29
  404. unsloth/gemma-3n-E4B-it-GGUF -- 2025-06-29
  405. chandar-lab/NeoBERT -- 2025-06-29
  406. tencent/Hunyuan-A13B-Instruct -- 2025-06-27
  407. maya-research/Veena -- 2025-06-27
  408. Meet Mistral Devstral, SOTA open model designed specifically for coding agents -- 2025-06-26
  409. 1.93bit Deepseek R1 0528 beats Claude Sonnet 4 -- 2025-06-26
  410. DeepSeek R1 05/28 performance on five independent benchmarks -- 2025-06-26
  411. Few-Shot Examples: Overfitting / Leakage -- 2025-06-26
  412. Finetune a model to think and use tools -- 2025-06-26
  413. I need help using open web UI with Ollama. Help installing and getting it running win 11 -- 2025-06-26
  414. I built/am building a micro-transformer for learning and experimentation -- 2025-06-26
  415. I shipped more code yesterday with Claude 4 than the last 3 weeks combined -- 2025-06-26
  416. A deep dive into self-improving AI and the Darwin-Gödel Machine -- 2025-06-26
  417. 100% of the zeros of the Riemann zeta-function are on the critical line -- 2025-06-25
  418. 100% of odd hyperelliptic Jacobians have no rational points of small height -- 2025-06-25
  419. deepseek-ai/DualPipe -- 2025-06-23
  420. 100 Particles Quantum Heat Engine: Exploring the Impact of Criticality on Efficiency -- 2025-06-23
  421. 0-Auslander correspondence -- 2025-06-23
  422. Advanced Time Manipulation with GDB -- 2025-06-21
  423. Practical SDR: Getting started with software-defined radio -- 2025-06-21
  424. 1000-10,000 M$_\odot$ Primordial Stars Created the Nitrogen Excess in the Galaxy GS 3073 at $z = 5.55$ -- 2025-06-21
  425. $0^+$ to $2^+$ neutrinoless double-$β$ decay of $^{76}$Ge, $^{82}$Se, $^{130}$Te and $^{136}$Xe in the microscopic interacting boson model -- 2025-06-21
  426. 0-1 laws for pattern occurrences in phylogenetic trees and networks -- 2025-06-20
  427. 100ps time resolution with thin silicon pixel detectors and a SiGe HBT amplifier -- 2025-06-18
  428. 0-$\pi$ quantum transition in a carbon nanotube Josephson junction: universal phase dependence and orbital degeneracy -- 2025-06-18
  429. openbmb/MiniCPM4-8B -- 2025-06-17
  430. lym00/Wan2.1-T2V-1.3B-Self-Forcing-VACE-Addon-Experiment -- 2025-06-17
  431. DeepSeek R1 05 28 Tested. It finally happened. The ONLY model to score 100% on everything I threw at it. -- 2025-06-17
  432. ubergarm/DeepSeek-R1-0528-GGUF -- 2025-06-17
  433. LLM training on RTX 5090 -- 2025-06-17
  434. [DEMO] I created a coding agent that can do dynamic, runtime debugging. -- 2025-06-17
  435. Is anyone productively using Aider and Ollama together? -- 2025-06-17
  436. For everyone who's still confused by Attention... I made this spreadsheet just for you(FREE) -- 2025-06-17
  437. What setup/model do you use and what’s your monthly spend? -- 2025-06-17
  438. Xiaomi released an updated 7B reasoning model and VLM version claiming SOTA for their size -- 2025-06-17
  439. UPDATE: Inference needs nontrivial amount of PCIe bandwidth (8x RTX 3090 rig, tensor parallelism) -- 2025-06-16
  440. IQ1_Smol_Boi -- 2025-06-16
  441. Qwen releases official MLX quants for Qwen3 models in 4 quantization levels: 4bit, 6bit, 8bit, and BF16 -- 2025-06-16
  442. Seeking Help Setting Up a Local LLM Assistant for TTRPG Worldbuilding + RAG on Windows 11 -- 2025-06-16
  443. New VS Code Pair Programming Extension, Need Help Testing -- 2025-06-16
  444. Claude-Trace -- 2025-06-16
  445. Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training -- 2025-06-16
  446. best fine tuned local LLM for Github Copilot Agent specificaly -- 2025-06-16
  447. A Simulation in C++ of Joseph Weizenbaum's 1966 Eliza -- 2025-06-16
  448. 0D-2D Heterostructure for making very Large Quantum Registers using itinerant Bose-Einstein Condensate of Excitons -- 2025-06-16
  449. 100-mJ class, sub-two-cycle, carrier-envelope phase-stable dual-chirped optical parametric amplification -- 2025-06-16
  450. [Update] Rensa: added full CMinHash + OptDensMinHash support (fast MinHash in Rust for dataset deduplication / LLM fine-tuning) -- 2025-06-15
  451. Open Source Unsiloed AI Chunker (EF2024) -- 2025-06-15
  452. ether0 - Mistral 24B with RL on several molecular design tasks in chemistry -- 2025-06-15
  453. Need selfhosted AI to generate better bash scripts and ansible playbooks -- 2025-06-15
  454. How do I finetune Devstral with vision support? -- 2025-06-15
  455. What's the best approach for including niche dependency source files and associated documentation reference material in context? -- 2025-06-15
  456. Airlines Don't Want You to Know They Sold Your Flight Data to DHS -- 2025-06-15
  457. John Deere Must Face Second Right to Repair Lawsuit -- 2025-06-15
  458. What vector database and embeddings are y'all using -- 2025-06-15
  459. Turn based two model critique for rounds to refine answer - any examples or FOSS projects? -- 2025-06-15
  460. mistralai/Magistral-Small-2506_gguf -- 2025-06-14
  461. Ruminate: From All-or-Nothing to Just-Right Reasoning in LLMs -- 2025-06-14
  462. [update] Restructured repo under rvn-tools — modular CLI for LLM formats -- 2025-06-14
  463. Testing Quant Quality for Shisa V2 405B -- 2025-06-14
  464. Old model, new implementation -- 2025-06-14
  465. Ollama vs Llamacpp: Different output for same model -- 2025-06-14
  466. How to improve my ViT model -- 2025-06-14
  467. From RPC to transactions and durable executions -- 2025-06-14
  468. Flattening Rust’s learning curve -- 2025-06-14
  469. Async from scratch 3: Pinned against the wall -- 2025-06-14
  470. How to get the most out of my AMD 7900XT? -- 2025-06-14
  471. Faulty 120W charger analysis (Anker GAN Prime) [video] -- 2025-06-13
  472. unsloth/Magistral-Small-2506-GGUF -- 2025-06-12
  473. mistralai/Magistral-Small-2506 -- 2025-06-12
  474. rednote-hilab/dots.llm1.base -- 2025-06-12
  475. [Tool] rvn-convert: OSS Rust-based SafeTensors to GGUF v3 converter (single-shard, fast, no Python) -- 2025-06-12
  476. GuidedQuant: Boost LLM layer-wise PTQ methods using the end loss guidance (Qwen3, Gemma3, Llama3.3 / 2~4bit Quantization) -- 2025-06-12
  477. I built a memory MCP that understands you (so Sam Altman can't). -- 2025-06-12
  478. Built an open source desktop app to easily play with local LLMs and MCP -- 2025-06-12
  479. mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) by ngxson · Pull Request #13784 · ggml-org/llama.cpp -- 2025-06-12
  480. i got tired of the errors, so automated debugging using Ollama -- 2025-06-12
  481. Ablating Gemma 3 27B variants with synthetic data from Sonnet 4 (Few-shot vs LoRA) -- 2025-06-12
  482. The LLM Gateway gets a major upgrade: becomes a data-plane for Agents. -- 2025-06-12
  483. Introducing stronger dependencies on systemd -- 2025-06-12
  484. How we decreased GitLab repo backup times from 48 hours to 41 minutes -- 2025-06-12
  485. The Quest for 100k - LLAMA.CPP Setting for a Noobie -- 2025-06-12
  486. Clipjacking: Hacked by copying text – Clickjacking but better -- 2025-06-11
  487. 0/1 Deep Neural Networks via Block Coordinate Descent -- 2025-06-10
  488. turbulentdrom/sing-srs-converter -- 2025-06-10
  489. abi/screenshot-to-code -- 2025-06-10
  490. Qwen/Qwen3-Embedding-4B -- 2025-06-10
  491. 100 Gbps Quantum-safe IPsec VPN Tunnels over 46 km Deployed Fiber -- 2025-06-09
  492. Qwen/Qwen3-Embedding-0.6B -- 2025-06-08
  493. 0-$π$ qubit in one Josephson junction -- 2025-06-07
  494. 100-kT Magnetic field generation using paisley targets by femtosecond laser-plasma interactions -- 2025-06-07
  495. 100 GHz Micrometer compact broadband Monolithic ITO Mach Zehnder Interferometer Modulator enabling 3500 times higher Packing Density -- 2025-06-06
  496. 0-$\pi$ phase-controllable $thermal$ Josephson junction -- 2025-06-06
  497. Precomputing Transparency Order in 3D -- 2025-06-06
  498. ban6cat6/aparecium -- 2025-06-03
  499. 0-Gaps on 3D Digital Curves -- 2025-06-03
  500. Reports of Deno's Demise Have Been Greatly Exaggerated -- 2025-06-02
  501. Comparing Parallel Functional Array Languages: Programming and Performance -- 2025-06-02
  502. What Every Programmer Should Know About Enumerative Combinatorics -- 2025-06-02
  503. DuckLake: SQL as a Lakehouse Format -- 2025-05-31
  504. 1000x Faster Camera and Machine Vision with Ordinary Devices -- 2025-05-31
  505. 0.75 Gbit/s high-speed classical key distribution with mode-shift keying chaos synchronization of Fabry-Perot lasers -- 2025-05-31
  506. EdinburghNLP/MMLongBench -- 2025-05-29
  507. Show HN: Model2vec-Rs – Fast Static Text Embeddings in Rust -- 2025-05-29
  508. unsloth/DeepSeek-R1-0528-GGUF -- 2025-05-29
  509. QuantStack/Wan2.1-VACE-14B-GGUF -- 2025-05-29
  510. 100,000 frames-per-second compressive imaging with a conventional rolling-shutter camera by random point-spread-function engineering -- 2025-05-29
  511. 1,000-Fold Enhancement of Light-Induced Magnetism in Plasmonic Au Nanoparticles -- 2025-05-29
  512. nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1 -- 2025-05-28
  513. Tongyi-Zhiwen/QwenLong-L1-32B -- 2025-05-28
  514. PKU-DS-LAB/FairyR1-32B -- 2025-05-28
  515. google/medgemma-4b-pt -- 2025-05-28