Benchmarks & Evaluation

Leaderboards, evaluation frameworks, model comparison

352 articles across 117 editions

Articles

  1. Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! -- 2026-05-20
  2. Ran the same models across Strix Halo, RTX 3090, and RTX 5070 because I wanted my own numbers -- 2026-05-20
  3. Intel's Crescent Island PCB Leaks, Showing a Massive Xe3P GPU, 16-Pin Connector, 160GB LPDDR5X as Intel Sidesteps the HBM Shortage -- 2026-05-20
  4. Sipeed's K3 RISC-V SBCs can run 30B-parameter LLMs: 60 TOPS (INT4), supports BF16/FP16/INT4 -- 2026-05-20
  5. club-5060ti: practical RTX 5060 Ti local LLM notes and configs -- 2026-05-20
  6. [Editorial] -- 2026-05-18
  7. [Editorial] -- 2026-05-18
  8. [Editorial] -- 2026-05-18
  9. [Editorial] -- 2026-05-18
  10. [Editorial] Synaptic-Tuner — LLM Tuning Framework -- 2026-05-11
  11. [Editorial] Video — AI Tools & Frameworks -- 2026-05-11
  12. A C++ port of Echo-TTS -- 2026-05-11
  13. [Editorial] -- 2026-05-07
  14. ProgramBench: Can we really rebuild huge binaries from scratch? (doesn't look like it) -- 2026-05-07
  15. Adding Benchmaxxer Repellant to the Open ASR Leaderboard -- 2026-05-07
  16. AI Evals Are Becoming the New Compute Bottleneck -- 2026-05-04
  17. Function Calling Harness 2: Schema-Driven CoT Compliance from 9.91% to 100% -- 2026-05-04
  18. Microsoft and OpenAI end their exclusive and revenue-sharing deal -- 2026-05-01
  19. [Editorial] Video: AI Development Insights -- 2026-05-01
  20. Talkie: a 13B vintage language model from 1930 -- 2026-05-01
  21. SWE-bench Verified no longer measures frontier coding capabilities -- 2026-04-29
  22. Opus 4.7: Are these first signs of model collapse? -- 2026-04-29
  23. Ternary Bonsai: Top Intelligence at 1.58 Bits -- 2026-04-22
  24. Personal Eval: Gemma4 26B MoE vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared -- 2026-04-22
  25. NVIDIA Nemotron-3-Super-120B-A12B-FP8 -- 2026-04-22
  26. High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction -- 2026-04-21
  27. Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals -- 2026-04-21
  28. Building a Fast Multilingual OCR Model with Synthetic Data -- 2026-04-21
  29. Physical Simulator In-the-Loop Video Generation -- 2026-04-21
  30. LLM Novice Uplift on Dual-Use Biology Tasks — 4x Accuracy Boost Bypasses Safeguards -- 2026-04-10
  31. [Editorial] Your AI Is Developing Capabilities Nobody Tested -- 2026-04-10
  32. 1-bit llms on device?! -- 2026-04-07
  33. Running SmolLM2-360M on a Samsung Galaxy Watch 4 (380MB RAM) – 74% RAM reduction in llama.cpp -- 2026-04-07
  34. Built my 10x NVidia V100 AI Server - 320gb vram - vLLM Testing Linux Headless -- 2026-04-07
  35. [Editorial] arxiv:2603.15569 -- 2026-03-30
  36. TinyLoRA: LoRA training works at just 13 parameters -- 2026-03-30
  37. KV rotation PR: q8 quants tank performance on AIME25, recovered with rotation -- 2026-03-30
  38. [Editorial] AI ASIC for LLMs -- 2026-03-30
  39. [Editorial] Heretic -- 2026-03-30
  40. mlx-snn: Spiking Neural Network library for Apple MLX -- 2026-03-30
  41. ARC-AGI-3 -- 2026-03-27
  42. LABSHIELD: A Multimodal Benchmark for Safety-Critical Reasoning and Planning in Scientific Laboratories -- 2026-03-27
  43. [Editorial] IAWG — AI Governance Working Group -- 2026-03-18
  44. Anthropic CEO says 50% entry-level white-collar jobs will be eradicated within 3 years -- 2026-03-18
  45. GPT-5.4 -- 2026-03-09
  46. [Editorial] -- 2026-03-09
  47. How do you automate end to end testing without coding when you vibe coded the whole app -- 2026-03-03
  48. [Editorial] Visual Learning for AI Coding -- 2026-03-03
  49. darrenburns/dv -- 2026-03-03
  50. ReasonDB – open-source document DB where the LLM navigates a tree instead of vector search (RAG alternative) -- 2026-03-03
  51. [Editorial] Claude Code Nano-Banana Plugin -- 2026-03-03
  52. Mercury 2: Fast reasoning LLM powered by diffusion -- 2026-02-26
  53. I Benchmarked Opus 4.6 vs Sonnet 4.6 on agentic PR review and browser QA the results weren't what I expected -- 2026-02-26
  54. [Editorial] Bullshit meter :) -- 2026-02-26
  55. The Qwen team verified that there are serious problems with the data quality of the GPQA and HLE test sets. -- 2026-02-25
  56. Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to. -- 2026-02-25
  57. ChatGPT isn't the only chatbot pulling answers from Elon Musk's Grokipedia -- 2026-02-25
  58. [Editorial] Benchmarking LLMs for Voice Agent Use Cases -- 2026-02-21
  59. Claude Opus 4.6 Surges Past Forecasts on METR's 50% Time-Horizon Benchmark with Exponential Gains -- 2026-02-21
  60. [Editorial] Unsloth: MiniMax M2.5 Fine-Tuning Guide -- 2026-02-21
  61. [Editorial] When everyone can build software, who learns well? -- 2026-02-19
  62. Sonnet 4.6 feels like Opus 4.5 at Sonnet pricing -- 2026-02-19
  63. Anthropic Raises $30,000,000,000 As Run-Rate Revenue Grew 10x Annually Over Three Years -- 2026-02-19
  64. REASONING AUGMENTED RETRIEVAL (RAR) is the production-grade successor to single-pass RAG -- 2026-02-19
  65. [Editorial] Antigravity Awesome Skills -- 2026-02-18
  66. I built an MCP that connects your agent to 8,000+ skills with zero setup -- 2026-02-18
  67. Is the Nvidia T4 actually viable for 70B (EXL2) daily driving, or is it just pure cope compared to dual 3090s? -- 2026-02-13
  68. Open weight kimi k2.5 overtakes opus 4.5 non thinking on arena -- 2026-02-13
  69. When did we go from 400k to 256k? -- 2026-02-13
  70. [Editorial] https://github.com/d-Rickyy-b/certstream-server-go?tab=readme-ov-file -- 2026-02-13
  71. [Editorial] https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf -- 2026-02-12
  72. [Editorial] https://d3lm.medium.com/overly-agentic-why-anthropic-is-worried-about-opus-4-6-17eee0f8e5cd -- 2026-02-12
  73. [Editorial] https://www.linkedin.com/posts/avipil_i-got-my-first-bill-after-switching-to-claude-activity-7427320523870629889-vM5K -- 2026-02-12
  74. Pros/Cons and use case for bypassing permissions -- 2026-02-12
  75. [Editorial] https://www.linkedin.com/posts/dragan-spiridonov_agentic-qe-competitive-landscape-2026-activity-7427362099175211010-pd1J -- 2026-02-11
  76. jmuncor/sherlock -- 2026-02-11
  77. [Editorial] https://github.com/mitkox/megacode -- 2026-02-10
  78. [Editorial] https://www.marktechpost.com/2026/02/07/google-ai-introduces-paperbanana-an-agentic-framework-that-automates-publication-ready-methodology-diagrams-and-statistical-plots -- 2026-02-10
  79. [Editorial] https://www.linkedin.com/posts/ryansmith108_frank-lee-amplitude-skills-are-now-indexed-activity-7426777024284893184-8eTf -- 2026-02-10
  80. [Editorial] https://arxiv.org/abs/2602.04118 -- 2026-02-10
  81. Measuring output stability across LLM runs (JSON drift problem) -- 2026-02-09
  82. BalatroBench - Benchmark LLMs' strategic performance in Balatro -- 2026-02-09
  83. ykushch/ask -- 2026-02-05
  84. zaolin/vanguard -- 2026-02-05
  85. Tadpole – A modular and extensible DSL built for web scraping -- 2026-02-05
  86. Coding assistants are solving the wrong problem -- 2026-02-05
  87. How Vibe Coding is Killing Open Source -- 2026-02-05
  88. [Editorial] https://github.com/mondweep/vibe-cast/tree/claude/claude-code-v3-skill-KucJF/claude-code-v3-qe-skill -- 2026-02-04
  89. [Editorial] https://forge-quality.dev/articles/case-of-passing-tests-investigation -- 2026-02-02
  90. MultiX0/last-archive -- 2026-01-28
  91. roborev-dev/roborev -- 2026-01-28
  92. Why I Stopped Using Nbdev -- 2026-01-21
  93. VectorDBZ update: Pinecone, pgvector, custom embeddings, search stats -- 2026-01-19
  94. Prompt tool I built/use with Ollama daily - render prompt variations without worrying about text files -- 2026-01-19
  95. Need people to get excited part 2 -- 2026-01-19
  96. Binary Fuse Filters: Fast and Smaller Than XOR Filters -- 2026-01-19
  97. Read_once(), Write_once(), but Not for Rust -- 2026-01-19
  98. Show HN: HTTP:COLON – A quick HTTP header/directive inspector and reference -- 2026-01-19
  99. [Editorial] https://www.linkedin.com/posts/daniel-cuthbert0x_last-year-i-spent-most-of-my-time-reviewing-activity-7414597548050665472-dYjg -- 2026-01-08
  100. Anyone tried IQuest-Coder-V1 yet? The 40B numbers look wild -- 2026-01-06
  101. open-thoughts/OpenThinker-Agent-v1 -- 2026-01-06
  102. zrougamed/orion-belt -- 2026-01-06
  103. Leo-Mu/montecarlo-ip-searcher -- 2026-01-06
  104. wkhtmltopdf - Convert HTML to PDF Using QtWebKit (2021) -- 2026-01-06
  105. zakirkun/guardian-cli -- 2026-01-05
  106. orneryd/NornicDB -- 2026-01-02
  107. Build a Deep Learning Library -- 2026-01-02
  108. Liquid CO2 For Grid Scale Energy Storage Isn’t Just Hot Air -- 2026-01-02
  109. How llama.cpp implements 2.9x faster top-k sampling with bucket sort -- 2025-12-31
  110. Built an offline-first vector database (v0.2.0) looking for real-world feedback -- 2025-12-31
  111. Linux 7.0 Expected to Bring IO_uring Iopoll Polling Improvements -- 2025-12-31
  112. rix4uni/subhijack -- 2025-12-30
  113. Worktrunk – CLI for Git worktree management -- 2025-12-30
  114. [Editorial] https://github.com/JohannesLks/CVE-2025-14558 -- 2025-12-29
  115. batterdaysahead/cipher0 -- 2025-12-29
  116. MongoBleed -- 2025-12-29
  117. dsl-learn/cutile-learn -- 2025-12-18
  118. Errors in Rust: A Deep Dive -- 2025-12-18
  119. Plug Into USB, Read Hostname and IP Address -- 2025-12-18
  120. Gouryella/drip -- 2025-12-17
  121. Koko-boya/Comfyui-Z-Image-Utilities -- 2025-12-17
  122. Show HN: Generate Passwords from Regex Constraints -- 2025-12-17
  123. Generating synthetic test data for LLM applications (our approach) -- 2025-12-12
  124. Benchmarked A100 vs H100 local storage for Multi-GPU loading. The Gen4 bottleneck is brutal for cold starts. -- 2025-12-11
  125. [Toolkit] TinyLlama Fine-Tuning + RAG Lab (Full FT / LoRA / QLoRA | T4-friendly | Unified pipeline) -- 2025-12-04
  126. Introducing Lynkr — an open-source Claude-style AI coding proxy built specifically for Databricks model endpoints 🚀 -- 2025-12-04
  127. AI Runner v5.0.5 -- 2025-12-04
  128. Which local model for 3090 5069 TI combo -- 2025-12-04
  129. I built a macOS app to monitor all my Claude Code sessions at once -- 2025-12-04
  130. nvidia/Orchestrator-8B · Hugging Face -- 2025-12-03
  131. Llamacpp Parameters Tuning -- 2025-12-02
  132. 4xRTX 4000 Pro Blackwell vs 1x6000 RTX Pro -- 2025-12-02
  133. ardanlabs/kronk -- 2025-12-02
  134. Zig Book – An open, technical and introductory book for Zig -- 2025-12-02
  135. Arcee Trinity Mini: US-Trained Moe Model -- 2025-12-02
  136. Build Your Own Glasshole Detector -- 2025-12-02
  137. Askimo: Open source of Ollama native desktop client -- 2025-12-01
  138. Created 24 Claude Code learning units (beginner → power user) - Free on GitHub -- 2025-12-01
  139. You can now do FP8 reinforcement learning locally! (<5GB VRAM) -- 2025-12-01
  140. A Repository with 44 Years of Unix Evolution -- 2025-11-28
  141. Strix Halo batching with tensor parallel and pipeline parallel using vllm benchmarked -- 2025-11-28
  142. RTX 3090 vs RX 7900 with ROCm, also Vulkan -- 2025-11-26
  143. moonshotai/Kimi-K2-Thinking -- 2025-11-26
  144. Ollama Not Using GPU on RTX 5070 Ti (Blackwell) -- 2025-11-25
  145. PCIE Bifurcation - More than 4 GPUs on a consumer motherboard -- 2025-11-18
  146. What is the best GPU for Llama 3 (.1 or .3) -- 2025-11-18
  147. PyTorch 2.10.0a0 w/ Blackwell (sm_120) Support — Patched & Packaged for One-Command Install -- 2025-11-17
  148. Half-trillion parameter model on a machine with 128 GB RAM + 24 GB VRAM -- 2025-11-17
  149. Real-Time BART in a Box Smaller Than Your Coffee Mug -- 2025-11-17
  150. etalazz/vsa -- 2025-11-13
  151. Pi Compute Modules Make for Compact Cluster -- 2025-11-13
  152. antarys-ai/antarys -- 2025-11-11
  153. [Editorial] https://www.linkedin.com/posts/daniel-cuthbert0x_a-month-ago-gadi-evron-and-i-set-about-building-ugcPost-7393643597729845248-TSTD -- 2025-11-11
  154. Breakdown of New RunC Vulnerabilities -- 2025-11-11
  155. When Your Hash Becomes a String: Hunting Ruby's Million-to-One Memory Bug -- 2025-11-07
  156. Maude 3 Manual -- 2025-11-07
  157. [Editorial] ‘Frequently wrong, but never in doubt’ -- 2025-11-05
  158. The Zero Freeze Formula: Teaching Local LLaMA Real Physics Through Python (SU(3) Mass Gap Simulation) to solve the Yang–Mills Mass Gap -- 2025-11-05
  159. Audio Sound Capture Project Needs Help -- 2025-11-05
  160. [Editorial] https://blog.peerllm.com/2025/11/02/announcing-v0.7.6.html -- 2025-11-04
  161. Faster llama.cpp ROCm performance for AMD RDNA3 (tested on Strix Halo/Ryzen AI Max 395) -- 2025-11-04
  162. KTransformers Open Source New Era: Local Fine-tuning of Kimi K2 and DeepSeek V3 -- 2025-11-04
  163. FlashPack: High-throughput tensor loading for PyTorch -- 2025-11-01
  164. M5 Neural Accelerator benchmark results from Llama.cpp -- 2025-11-01
  165. Kafka is Fast – I'll use Postgres -- 2025-11-01
  166. ZOZO's Contact Solver for physics-based simulations -- 2025-11-01
  167. Need advice on building a GPU-based render/AI compute setup: Unsure about hardware direction -- 2025-11-01
  168. [Editorial] https://pivot-to-ai.com/2025/10/15/ai-is-not-popular-and-ai-users-are-unpleasant-asshats/ -- 2025-10-30
  169. [Editorial] Developer machine part of attack chain -- 2025-10-29
  170. DGX SPARK Compiled llama.cpp Benchmarks Compared to M4 MAX (non-MLX) -- 2025-10-21
  171. perplexityai/search_evals -- 2025-10-21
  172. Hetzner: The Simple Cloud just got more flexible and more affordable -- 2025-10-21
  173. A new, super simple LLM benchmark for testing changes across models, quants, parameters, samplers, engines, etc -- 2025-10-21
  174. Significant speedup for local models -- 2025-10-20
  175. Cursor tricking paid users with fake Claude Sonnet 4.5 -- 2025-10-20
  176. inclusionAI/Ring-1T -- 2025-10-20
  177. 1r0BIT/TaskHound -- 2025-10-18
  178. armai92/goauth -- 2025-10-18
  179. Chinese gang used ArcGIS as a backdoor for a year – and no one noticed -- 2025-10-18
  180. We built 3B and 8B models that rival GPT-5 at HTML extraction while costing 40-80x less - fully open source -- 2025-10-17
  181. Comparing Popular AI Evaluation Platforms for 2025 -- 2025-10-17
  182. State of AI Report 2025 -- 2025-10-17
  183. Signed Backdoor Hiding in Plain Sight on Framework Devices -- 2025-10-15
  184. Three ways formally verified code can go wrong in practice -- 2025-10-15
  185. Jeep pushed software update that bricked all 2024 Wrangler 4xe models -- 2025-10-15
  186. junron/agar -- 2025-10-15
  187. A modern approach to preventing CSRF in Go -- 2025-10-15
  188. Stop flexing Pass@N — show Pass-all-N -- 2025-10-11
  189. Architecting a project for optimal AI coding, any tips? -- 2025-10-11
  190. Basekick-Labs/arc -- 2025-10-11
  191. ServiceNow-AI/Apriel-1.5-15b-Thinker -- 2025-10-11
  192. meituan-longcat/LongCat-Flash-Chat -- 2025-10-11
  193. Did anyone try out GLM-4.5-Air-GLM-4.6-Distill ? -- 2025-10-10
  194. Thank you Anthropic & this community! Our little side project just hit 1M visits and even made it on National TV! -- 2025-10-10
  195. Sneak Preview: Ollama Bench -- 2025-10-08
  196. When Curl Works but IntelliJ Doesn't: The Ollama Connection Mystery -- 2025-10-08
  197. Local Open Deep Research with Offline Wikipedia Search Source -- 2025-10-07
  198. Ollama drops MI50 support -- 2025-10-07
  199. CoexistAI Now Supports Docker Setup, Also now you can turn any text into Podcasts and Speech Easily -- 2025-10-07
  200. MCP_File_Generation_Tool - v0.6.0 Update! -- 2025-10-07
  201. How do I help Codex critique my ideas rather than just go along with it everytime? -- 2025-10-06
  202. Plan with Codex, code with Sonnet 4.5. What's your simple workflow here? -- 2025-10-06
  203. aminofox/zentrox -- 2025-10-06
  204. Linus Torvalds Vents over "Completely Crazy Rust Format Checking" -- 2025-10-06
  205. vllm setup for nvidia (can use llama) -- 2025-10-05
  206. Full-fine tuning doesn't require much vRAM with gradient checkpointing... -- 2025-10-05
  207. Qwen/Qwen3-Omni-30B-A3B-Thinking -- 2025-10-05
  208. inclusionAI/Ring-mini-linear-2.0 -- 2025-10-05
  209. llama.cpp: Quantizing from bf16 vs f16 -- 2025-10-05
  210. GLM 4.6 is nice -- 2025-10-04
  211. NVFP4 or MXFP4 MOE on sm120 (RTX 5900 RTX 6000 PRO) -- 2025-10-04
  212. K2-Think 32B - Reasoning model from UAE -- 2025-10-03
  213. MoonshotAI/checkpoint-engine -- 2025-10-03
  214. Whither the Chip Shortage? -- 2025-10-02
  215. A tiny receipt per AI run: κ (stress), Δhol (drift), and guards—in plain JSON. -- 2025-10-02
  216. Microsoft Agent Framework (Preview): Making AI Agents Simple for Every Developer -- 2025-10-02
  217. How bad to have RTX Pro 6000 run at PCIE x8? -- 2025-09-24
  218. A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code -- 2025-09-24
  219. SWE-Bench Pro -- 2025-09-23
  220. Investigating Training Data Detection in AI Coders -- 2025-09-23
  221. Comparison H100 vs RTX 6000 PRO with VLLM and GPT-OSS-120B -- 2025-09-23
  222. Answer Matching Outperforms Multiple Choice for Language Model Evaluation -- 2025-09-21
  223. facebook/MobileLLM-R1-950M -- 2025-09-20
  224. OpenGVLab/InternVL3_5-241B-A28B -- 2025-09-20
  225. KBlueLeaf/HDM-xut-340M-anime -- 2025-09-20
  226. Definitive proof openai/gpt-oss-20b is dumb as hell -- 2025-09-19
  227. Qwen3‑Next‑80B‑A3B‑Instruct (FP8) on Windows 11 WSL2 + vLLM + Docker (Blackwell) -- 2025-09-19
  228. Free 10%+ Speedup for CPU/Hybrid Inference on Intel CPUs with Efficiency Cores -- 2025-09-17
  229. PSA/RFC: KV Cache quantization forces excess processing onto CPU in llama.cpp -- 2025-09-15
  230. native tool calling support for DeepSeek V3.1 just merged in llama.cpp -- 2025-09-15
  231. model : add grok-2 support by CISC · Pull Request #15539 · ggml-org/llama.cpp -- 2025-09-15
  232. [Editorial] Defeating Nondeterminism in LLM Inference -- 2025-09-14
  233. Nvidia Unveils Rubin CPX Amidst Chart-Topping Blackwell Ultra MLPerf Results -- 2025-09-14
  234. Repair-R1: Better Test Before Repair -- 2025-09-14
  235. Jupyter Agents: training LLMs to reason with notebooks -- 2025-09-14
  236. Intel Files Patent for "Software Defined Super Cores" -- 2025-09-04
  237. I tried almost every tts model on my ryzen 7 5000 series 16gb ram rtx 3060 laptop 6-8GB Vram -- 2025-09-02
  238. devnen/Kitten-TTS-Server -- 2025-09-02
  239. internlm/Intern-S1-mini -- 2025-09-02
  240. stepfun-ai/Step-Audio-2-mini -- 2025-09-02
  241. QuEST/Quartet authors discuss their work on SOTA 4-bit training optimizations -- 2025-09-01
  242. F-Stack – A network development kit with high performance based on DPDK -- 2025-09-01
  243. An Empirical Study of Knowledge Distillation for Code Understanding Tasks -- 2025-09-01
  244. A Comparative Analysis of Vision Language Models for Scientific Data Interpretation -- 2025-08-31
  245. CaddyManager 0.0.1 – Web UI for managing Caddy servers -- 2025-08-30
  246. [Editorial] AI interfaces for future -- 2025-08-29
  247. I’ve Debugged 100+ RAG/LLM Pipelines. These 16 Bugs Always Come Back. (70 days, 800 stars) -- 2025-08-29
  248. Updates to Consumer Terms and Privacy Policy -- 2025-08-29
  249. LLM speedup breakthrough? 53x faster generation and 6x prefilling from NVIDIA -- 2025-08-29
  250. Intel Granite Rapids CPU on sale at Newegg up to 65% off MSRP -- 2025-08-29
  251. unsloth/DeepSeek-V3.1-GGUF -- 2025-08-29
  252. Deepseek V3.1 benchmarks released -- 2025-08-25
  253. Is openrouters tokens per second reading super bugged? -- 2025-08-22
  254. It’s a Pi, But it’s not Quite a Raspberry Pi -- 2025-08-22
  255. nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 -- 2025-08-21
  256. Mistral 7B fine tuning training loss stagnant after adding more fine tuning prompts -- 2025-08-20
  257. Detecting Hallucinations in LLM Function Calling with Entropy (Part 2) -- 2025-08-20
  258. Anyone have the deets on ROCM 7.0's 3x perf claims? -- 2025-08-19
  259. Rust in 2025: Targeting foundational software -- 2025-08-19
  260. I built a small cli tool to execute agentic workflows -- 2025-08-19
  261. AvatarNova - Local AI companion -- 2025-08-19
  262. 🤖 Built an AI-powered DOCX viewer that extracts & analyzes images with Ollama! -- 2025-08-19
  263. Davincible/claude-code-open -- 2025-08-19
  264. OpenVINO GenAI 2025.2 adds a GGUF reader (preview) -- 2025-08-18
  265. CLI Agent that Supports Multiple Models? -- 2025-08-18
  266. Is there a standard oci image format for models? -- 2025-08-18
  267. moonshotai/Kimi-K2-Instruct -- 2025-08-17
  268. KittenML/kitten-tts-nano-0.1 -- 2025-08-17
  269. ilkerzgi/Overlay-Kontext-Dev-LoRA -- 2025-08-17
  270. JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 -- 2025-08-17
  271. PSA: Don't waste time trying Gemma 3 27B on V100s - it's architecturally impossible -- 2025-08-16
  272. People with MacBook Pro with 36gb of memory, which models you are running for coding? -- 2025-08-16
  273. GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface -- 2025-08-13
  274. TextQuests: How Good are LLMs at Text-Based Video Games? -- 2025-08-13
  275. Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face -- 2025-08-09
  276. [Editorial] https://unhypedai.substack.com/p/unhyped-ai-week-4-digest -- 2025-08-04
  277. 100+ AI Benchmarks list -- 2025-08-04
  278. google/langextract -- 2025-08-04
  279. Chain-GPT/Solidity-LLM -- 2025-07-25
  280. Anyone interested in adding their fine-tuned / open source models to this benchmark? -- 2025-07-25
  281. What kind of rig would you build with a 5k budget for local LLM? -- 2025-07-16
  282. What is your "perfect" £10,000 for Local LLM, Gaming, plex with the following conditional and context. -- 2025-07-16
  283. How to use Claude code -- 2025-07-16
  284. unsloth/Kimi-K2-Instruct-GGUF -- 2025-07-16
  285. moonshotai/Kimi-K2-Base -- 2025-07-16
  286. It's been a while, I'm out of date, suggest me a model -- 2025-07-16
  287. i need the best local llm i can run on my gaming pc -- 2025-07-16
  288. Import of ChatGPT Export Zip File with Images of entire previous chats -- 2025-07-14
  289. Which is the best small local LLM models for tasks like doing research and generating insights -- 2025-07-10
  290. Looking for practical advice with my MSc thesis “On-Premise Orchestration of SLMs” (OpenWebUI + SLM v LLM benchmarking on multiple GPUs) -- 2025-07-10
  291. Deep Research with local LLM and local documents -- 2025-07-10
  292. WikipeQA : An evaluation dataset for both web-browsing agents and vector DB RAG systems -- 2025-07-10
  293. Looking for advice. -- 2025-07-10
  294. OpenAI to release open-source model this summer - everything we know so far -- 2025-07-09
  295. Accelerating Docker Builds by Halving EC2 Boot Time -- 2025-06-30
  296. Show HN: Inspect and extract files from MSI installers directly in your browser -- 2025-06-28
  297. Meet Mistral Devstral, SOTA open model designed specifically for coding agents -- 2025-06-26
  298. 1.93bit Deepseek R1 0528 beats Claude Sonnet 4 -- 2025-06-26
  299. DeepSeek R1 05/28 performance on five independent benchmarks -- 2025-06-26
  300. Few-Shot Examples: Overfitting / Leakage -- 2025-06-26
  301. Finetune a model to think and use tools -- 2025-06-26
  302. I need help using open web UI with Ollama. Help installing and getting it running win 11 -- 2025-06-26
  303. I built/am building a micro-transformer for learning and experimentation -- 2025-06-26
  304. I shipped more code yesterday with Claude 4 than the last 3 weeks combined -- 2025-06-26
  305. A deep dive into self-improving AI and the Darwin-Gödel Machine -- 2025-06-26
  306. Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) -- 2025-06-25
  307. Is it possible to give Gemma 3 or any other model on-device screen awareness? -- 2025-06-25
  308. Chonkie update. -- 2025-06-25
  309. Memory Layer Compatible with Local Llama -- 2025-06-25
  310. Ollama/AnythingLLM on Windows 11 with AMD RX 6600: GPU Not Utilized for LLM Inference - Help! -- 2025-06-25
  311. How can synthetic data improve a model if the model was the thing that generated that data? -- 2025-06-25
  312. After reading OpenAI's GPT-4.1 prompt engineering cookbook, I created this comprehensive Python coding template -- 2025-06-25
  313. The 55% Regret Club: How AI-First Companies Are Learning Lessons the Hard Way -- 2025-06-25
  314. Microsoft-backed Builder.ai enters insolvency proceedings -- 2025-06-25
  315. Show HN: FaynoSync Self-Hosted API for Automatic App Updates -- 2025-06-23
  316. identicallead/mse6 -- 2025-06-22
  317. GCC 13.4 Released with 129 additional bug fixes -- 2025-06-22
  318. Databricks acquires Neon -- 2025-06-22
  319. Java Virtual Threads Ate My Memory: A Web Crawler's Tale of Speed vs. Memory -- 2025-06-20
  320. Show HN: Zeekstd – Rust Implementation of the ZSTD Seekable Format -- 2025-06-20
  321. DeepSeek R1 05 28 Tested. It finally happened. The ONLY model to score 100% on everything I threw at it. -- 2025-06-17
  322. ubergarm/DeepSeek-R1-0528-GGUF -- 2025-06-17
  323. LLM training on RTX 5090 -- 2025-06-17
  324. [DEMO] I created a coding agent that can do dynamic, runtime debugging. -- 2025-06-17
  325. Is anyone productively using Aider and Ollama together? -- 2025-06-17
  326. For everyone who's still confused by Attention... I made this spreadsheet just for you(FREE) -- 2025-06-17
  327. What setup/model do you use and what’s your monthly spend? -- 2025-06-17
  328. Xiaomi released an updated 7B reasoning model and VLM version claiming SOTA for their size -- 2025-06-17
  329. hhftechnology/middleware-manager -- 2025-06-17
  330. Show HN: McWig – A modal, Vim-like text editor written in Go -- 2025-06-17
  331. The Unreliability of LLMs and What Lies Ahead -- 2025-06-10
  332. 007: Democratically Finding The Cause of Packet Drops -- 2025-06-08
  333. langtalks/swe-agent -- 2025-06-08
  334. wey-gu/py-pglite -- 2025-06-08
  335. 0-π qubit in one Josephson junction -- 2025-06-07
  336. 100-kT Magnetic field generation using paisley targets by femtosecond laser-plasma interactions -- 2025-06-07
  337. fileshare-go/fileshare -- 2025-06-04
  338. Rust Coreutils 0.1.0 Release -- 2025-06-04
  339. Show HN: Samchika – A Java Library for Fast, Multithreaded File Processing -- 2025-06-04
  340. 100 Drivers, 2200 km: A Natural Dataset of Driving Style toward Human-centered Intelligent Driving Systems -- 2025-06-04
  341. 100+ Metrics for Software Startups - A Multi-Vocal Literature Review -- 2025-06-04
  342. Building a plug-and-play vector store for any data stream (text, audio, video, etc.)—searchable by your LLM via MCP -- 2025-05-29
  343. Building a real-world LLM agent with open-source models—structure > prompt engineering -- 2025-05-29
  344. New LocalLLM Hardware complete -- 2025-05-29
  345. Parameter-Efficient Fine-Tuning (PEFT) Explained -- 2025-05-29
  346. LLM help for recovering deleted data? -- 2025-05-29
  347. AI Runner v4.10.0 Release Notes -- 2025-05-29
  348. Unpopular opinion: RAG is actively hurting your coding agents -- 2025-05-29
  349. Teal – A statically-typed dialect of Lua -- 2025-05-29
  350. I think it's time to give Nix a chance -- 2025-05-29
  351. deepseek-ai/DeepSeek-R1-0528 -- 2025-05-29
  352. AM5 or TRX4 for local LLMs? -- 2025-05-29