Benchmarks & Evaluation

Leaderboards, evaluation frameworks, model comparison

295 articles across 99 editions

Articles

  1. [Editorial] Benchmarking LLMs for Voice Agent Use Cases -- 2026-02-21
  2. Claude Opus 4.6 Surges Past Forecasts on METR's 50% Time-Horizon Benchmark with Exponential Gains -- 2026-02-21
  3. [Editorial] Unsloth: MiniMax M2.5 Fine-Tuning Guide -- 2026-02-21
  4. [Editorial] When everyone can build software, who learns well? -- 2026-02-19
  5. Sonnet 4.6 feels like Opus 4.5 at Sonnet pricing -- 2026-02-19
  6. Anthropic Raises $30,000,000,000 As Run-Rate Revenue Grew 10x Annually Over Three Years -- 2026-02-19
  7. REASONING AUGMENTED RETRIEVAL (RAR) is the production-grade successor to single-pass RAG -- 2026-02-19
  8. [Editorial] Antigravity Awesome Skills -- 2026-02-18
  9. I built an MCP that connects your agent to 8,000+ skills with zero setup -- 2026-02-18
  10. Is the Nvidia T4 actually viable for 70B (EXL2) daily driving, or is it just pure cope compared to dual 3090s? -- 2026-02-13
  11. Open weight kimi k2.5 overtakes opus 4.5 non thinking on arena -- 2026-02-13
  12. When did we go from 400k to 256k? -- 2026-02-13
  13. [Editorial] https://github.com/d-Rickyy-b/certstream-server-go?tab=readme-ov-file -- 2026-02-13
  14. [Editorial] https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf -- 2026-02-12
  15. [Editorial] https://d3lm.medium.com/overly-agentic-why-anthropic-is-worried-about-opus-4-6-17eee0f8e5cd -- 2026-02-12
  16. [Editorial] https://www.linkedin.com/posts/avipil_i-got-my-first-bill-after-switching-to-claude-activity-7427320523870629889-vM5K -- 2026-02-12
  17. Pros/Cons and use case for bypassing permissions -- 2026-02-12
  18. [Editorial] https://www.linkedin.com/posts/dragan-spiridonov_agentic-qe-competitive-landscape-2026-activity-7427362099175211010-pd1J -- 2026-02-11
  19. jmuncor/sherlock -- 2026-02-11
  20. [Editorial] https://github.com/mitkox/megacode -- 2026-02-10
  21. [Editorial] https://www.marktechpost.com/2026/02/07/google-ai-introduces-paperbanana-an-agentic-framework-that-automates-publication-ready-methodology-diagrams-and-statistical-plots -- 2026-02-10
  22. [Editorial] https://www.linkedin.com/posts/ryansmith108_frank-lee-amplitude-skills-are-now-indexed-activity-7426777024284893184-8eTf -- 2026-02-10
  23. [Editorial] https://arxiv.org/abs/2602.04118 -- 2026-02-10
  24. Measuring output stability across LLM runs (JSON drift problem) -- 2026-02-09
  25. BalatroBench - Benchmark LLMs' strategic performance in Balatro -- 2026-02-09
  26. ykushch/ask -- 2026-02-05
  27. zaolin/vanguard -- 2026-02-05
  28. Tadpole – A modular and extensible DSL built for web scraping -- 2026-02-05
  29. Coding assistants are solving the wrong problem -- 2026-02-05
  30. How Vibe Coding is Killing Open Source -- 2026-02-05
  31. [Editorial] https://github.com/mondweep/vibe-cast/tree/claude/claude-code-v3-skill-KucJF/claude-code-v3-qe-skill -- 2026-02-04
  32. [Editorial] https://forge-quality.dev/articles/case-of-passing-tests-investigation -- 2026-02-02
  33. MultiX0/last-archive -- 2026-01-28
  34. roborev-dev/roborev -- 2026-01-28
  35. Why I Stopped Using Nbdev -- 2026-01-21
  36. VectorDBZ update: Pinecone, pgvector, custom embeddings, search stats -- 2026-01-19
  37. Prompt tool I built/use with Ollama daily - render prompt variations without worrying about text files -- 2026-01-19
  38. Need people to get excited part 2 -- 2026-01-19
  39. Binary Fuse Filters: Fast and Smaller Than XOR Filters -- 2026-01-19
  40. Read_once(), Write_once(), but Not for Rust -- 2026-01-19
  41. Show HN: HTTP:COLON – A quick HTTP header/directive inspector and reference -- 2026-01-19
  42. [Editorial] https://www.linkedin.com/posts/daniel-cuthbert0x_last-year-i-spent-most-of-my-time-reviewing-activity-7414597548050665472-dYjg -- 2026-01-08
  43. Anyone tried IQuest-Coder-V1 yet? The 40B numbers look wild -- 2026-01-06
  44. open-thoughts/OpenThinker-Agent-v1 -- 2026-01-06
  45. zrougamed/orion-belt -- 2026-01-06
  46. Leo-Mu/montecarlo-ip-searcher -- 2026-01-06
  47. wkhtmltopdf - Convert HTML to PDF Using QtWebKit (2021) -- 2026-01-06
  48. zakirkun/guardian-cli -- 2026-01-05
  49. orneryd/NornicDB -- 2026-01-02
  50. Build a Deep Learning Library -- 2026-01-02
  51. Liquid CO2 For Grid Scale Energy Storage Isn’t Just Hot Air -- 2026-01-02
  52. How llama.cpp implements 2.9x faster top-k sampling with bucket sort -- 2025-12-31
  53. Built an offline-first vector database (v0.2.0) looking for real-world feedback -- 2025-12-31
  54. Linux 7.0 Expected to Bring IO_uring Iopoll Polling Improvements -- 2025-12-31
  55. rix4uni/subhijack -- 2025-12-30
  56. Worktrunk – CLI for Git worktree management -- 2025-12-30
  57. [Editorial] https://github.com/JohannesLks/CVE-2025-14558 -- 2025-12-29
  58. batterdaysahead/cipher0 -- 2025-12-29
  59. MongoBleed -- 2025-12-29
  60. dsl-learn/cutile-learn -- 2025-12-18
  61. Errors in Rust: A Deep Dive -- 2025-12-18
  62. Plug Into USB, Read Hostname and IP Address -- 2025-12-18
  63. Gouryella/drip -- 2025-12-17
  64. Koko-boya/Comfyui-Z-Image-Utilities -- 2025-12-17
  65. Show HN: Generate Passwords from Regex Constraints -- 2025-12-17
  66. Generating synthetic test data for LLM applications (our approach) -- 2025-12-12
  67. Benchmarked A100 vs H100 local storage for Multi-GPU loading. The Gen4 bottleneck is brutal for cold starts. -- 2025-12-11
  68. [Toolkit] TinyLlama Fine-Tuning + RAG Lab (Full FT / LoRA / QLoRA | T4-friendly | Unified pipeline) -- 2025-12-04
  69. Introducing Lynkr — an open-source Claude-style AI coding proxy built specifically for Databricks model endpoints 🚀 -- 2025-12-04
  70. AI Runner v5.0.5 -- 2025-12-04
  71. Which local model for 3090 5069 TI combo -- 2025-12-04
  72. I built a macOS app to monitor all my Claude Code sessions at once -- 2025-12-04
  73. nvidia/Orchestrator-8B · Hugging Face -- 2025-12-03
  74. Llamacpp Parameters Tuning -- 2025-12-02
  75. 4xRTX 4000 Pro Blackwell vs 1x6000 RTX Pro -- 2025-12-02
  76. ardanlabs/kronk -- 2025-12-02
  77. Zig Book – An open, technical and introductory book for Zig -- 2025-12-02
  78. Arcee Trinity Mini: US-Trained Moe Model -- 2025-12-02
  79. Build Your Own Glasshole Detector -- 2025-12-02
  80. Askimo: Open source of Ollama native desktop client -- 2025-12-01
  81. Created 24 Claude Code learning units (beginner → power user) - Free on GitHub -- 2025-12-01
  82. You can now do FP8 reinforcement learning locally! (<5GB VRAM) -- 2025-12-01
  83. A Repository with 44 Years of Unix Evolution -- 2025-11-28
  84. Strix Halo batching with tensor parallel and pipeline parallel using vllm benchmarked -- 2025-11-28
  85. RTX 3090 vs RX 7900 with ROCm, also Vulcan -- 2025-11-26
  86. moonshotai/Kimi-K2-Thinking -- 2025-11-26
  87. Ollama Not Using GPU on RTX 5070 Ti (Blackwell) -- 2025-11-25
  88. PCIE Bifurcation - More than 4 GPUs on a consumer motherboard -- 2025-11-18
  89. What is the best GPU for Llama 3 (.1 or .3) -- 2025-11-18
  90. PyTorch 2.10.0a0 w/ Blackwell (sm_120) Support — Patched & Packaged for One-Command Install -- 2025-11-17
  91. Half-trillion parameter model on a machine with 128 GB RAM + 24 GB VRAM -- 2025-11-17
  92. Real-Time BART in a Box Smaller Than Your Coffee Mug -- 2025-11-17
  93. etalazz/vsa -- 2025-11-13
  94. Pi Compute Modules Make for Compact Cluster -- 2025-11-13
  95. antarys-ai/antarys -- 2025-11-11
  96. [Editorial] https://www.linkedin.com/posts/daniel-cuthbert0x_a-month-ago-gadi-evron-and-i-set-about-building-ugcPost-7393643597729845248-TSTD -- 2025-11-11
  97. Breakdown of New RunC Vulnerabilities -- 2025-11-11
  98. When Your Hash Becomes a String: Hunting Ruby's Million-to-One Memory Bug -- 2025-11-07
  99. Maude 3 Manual -- 2025-11-07
  100. [Editorial] ‘Frequently wrong, but never in doubt’ -- 2025-11-05
  101. The Zero Freeze Formula: Teaching Local LLaMA Real Physics Through Python (SU(3) Mass Gap Simulation) to solve the Yang–Mills Mass Gap -- 2025-11-05
  102. Audio Sound Capture Project Needs Help -- 2025-11-05
  103. [Editorial] https://blog.peerllm.com/2025/11/02/announcing-v0.7.6.html -- 2025-11-04
  104. Faster llama.cpp ROCm performance for AMD RDNA3 (tested on Strix Halo/Ryzen AI Max 395) -- 2025-11-04
  105. KTransformers Open Source New Era: Local Fine-tuning of Kimi K2 and DeepSeek V3 -- 2025-11-04
  106. FlashPack: High-throughput tensor loading for PyTorch -- 2025-11-01
  107. M5 Neural Accelerator benchmark results from Llama.cpp -- 2025-11-01
  108. Kafka is Fast – I'll use Postgres -- 2025-11-01
  109. ZOZO's Contact Solver for physics-based simulations -- 2025-11-01
  110. Need advice on building a GPU-based render/AI compute setup: Unsure about hardware direction -- 2025-11-01
  111. [Editorial] https://pivot-to-ai.com/2025/10/15/ai-is-not-popular-and-ai-users-are-unpleasant-asshats/ -- 2025-10-30
  112. [Editorial] Developer machine part of attack chain -- 2025-10-29
  113. DGX SPARK Compiled llama.cpp Benchmarks Compared to M4 MAX (non-MLX) -- 2025-10-21
  114. perplexityai/search_evals -- 2025-10-21
  115. Hetzner: The Simple Cloud just got more flexible and more affordable -- 2025-10-21
  116. A new, super simple LLM benchmark for testing changes across models, quants, parameters, samplers, engines, etc -- 2025-10-21
  117. Significant speedup for local models -- 2025-10-20
  118. Cursor tricking paid users with fake Claude Sonnet 4.5 -- 2025-10-20
  119. inclusionAI/Ring-1T -- 2025-10-20
  120. 1r0BIT/TaskHound -- 2025-10-18
  121. armai92/goauth -- 2025-10-18
  122. Chinese gang used ArcGIS as a backdoor for a year – and no one noticed -- 2025-10-18
  123. We built 3B and 8B models that rival GPT-5 at HTML extraction while costing 40-80x less - fully open source -- 2025-10-17
  124. Comparing Popular AI Evaluation Platforms for 2025 -- 2025-10-17
  125. State of AI Report 2025 -- 2025-10-17
  126. Signed Backdoor Hiding in Plain Sight on Framework Devices -- 2025-10-15
  127. Three ways formally verified code can go wrong in practice -- 2025-10-15
  128. Jeep pushed software update that bricked all 2024 Wrangler 4xe models -- 2025-10-15
  129. junron/agar -- 2025-10-15
  130. A modern approach to preventing CSRF in Go -- 2025-10-15
  131. Stop flexing Pass@N — show Pass-all-N -- 2025-10-11
  132. Architecting a project for optimal AI coding, any tips? -- 2025-10-11
  133. Basekick-Labs/arc -- 2025-10-11
  134. ServiceNow-AI/Apriel-1.5-15b-Thinker -- 2025-10-11
  135. meituan-longcat/LongCat-Flash-Chat -- 2025-10-11
  136. Did anyone try out GLM-4.5-Air-GLM-4.6-Distill ? -- 2025-10-10
  137. Thank you Anthropic & this community! Our little side project just hit 1M visits and even made it on National TV! -- 2025-10-10
  138. Sneak Preview: Ollama Bench -- 2025-10-08
  139. When Curl Works but IntelliJ Doesn't: The Ollama Connection Mystery -- 2025-10-08
  140. Local Open Deep Research with Offline Wikipedia Search Source -- 2025-10-07
  141. Ollama drops MI50 support -- 2025-10-07
  142. CoexistAI Now Supports Docker Setup, Also now you can turn any text into Podcasts and Speech Easily -- 2025-10-07
  143. MCP_File_Generation_Tool - v0.6.0 Update! -- 2025-10-07
  144. How do I help Codex critique my ideas rather than just go along with it everytime? -- 2025-10-06
  145. Plan with Codex, code with Sonnet 4.5. What's your simple workflow here? -- 2025-10-06
  146. aminofox/zentrox -- 2025-10-06
  147. Linus Torvalds Vents over "Completely Crazy Rust Format Checking" -- 2025-10-06
  148. vllm setup for nvidia (can use llama) -- 2025-10-05
  149. Full-fine tuning doesn't require much vRAM with gradient checkpointing... -- 2025-10-05
  150. Qwen/Qwen3-Omni-30B-A3B-Thinking -- 2025-10-05
  151. inclusionAI/Ring-mini-linear-2.0 -- 2025-10-05
  152. llama.cpp: Quantizing from bf16 vs f16 -- 2025-10-05
  153. GLM 4.6 is nice -- 2025-10-04
  154. NVFP4 or MXFP4 MOE on sm120 (RTX 5900 RTX 6000 PRO) -- 2025-10-04
  155. K2-Think 32B - Reasoning model from UAE -- 2025-10-03
  156. MoonshotAI/checkpoint-engine -- 2025-10-03
  157. Whither the Chip Shortage? -- 2025-10-02
  158. A tiny receipt per AI run: κ (stress), Δhol (drift), and guards—in plain JSON. -- 2025-10-02
  159. Microsoft Agent Framework (Preview): Making AI Agents Simple for Every Developer -- 2025-10-02
  160. How bad to have RTX Pro 6000 run at PCIE x8? -- 2025-09-24
  161. A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code -- 2025-09-24
  162. SWE-Bench Pro -- 2025-09-23
  163. Investigating Training Data Detection in AI Coders -- 2025-09-23
  164. Comparison H100 vs RTX 6000 PRO with VLLM and GPT-OSS-120B -- 2025-09-23
  165. Answer Matching Outperforms Multiple Choice for Language Model Evaluation -- 2025-09-21
  166. facebook/MobileLLM-R1-950M -- 2025-09-20
  167. OpenGVLab/InternVL3_5-241B-A28B -- 2025-09-20
  168. KBlueLeaf/HDM-xut-340M-anime -- 2025-09-20
  169. Definitive proof openai/gpt-oss-20b is dumb as hell -- 2025-09-19
  170. Qwen3‑Next‑80B‑A3B‑Instruct (FP8) on Windows 11 WSL2 + vLLM + Docker (Blackwell) -- 2025-09-19
  171. Free 10%+ Speedup for CPU/Hybrid Inference on Intel CPUs with Efficiency Cores -- 2025-09-17
  172. PSA/RFC: KV Cache quantization forces excess processing onto CPU in llama.cpp -- 2025-09-15
  173. native tool calling support for DeepSeek V3.1 just merged in llama.cpp -- 2025-09-15
  174. model : add grok-2 support by CISC · Pull Request #15539 · ggml-org/llama.cpp -- 2025-09-15
  175. [Editorial] Defeating Nondeterminism in LLM Inference -- 2025-09-14
  176. Nvidia Unveils Rubin CPX Amidst Chart-Topping Blackwell Ultra MLPerf Results -- 2025-09-14
  177. Repair-R1: Better Test Before Repair -- 2025-09-14
  178. Jupyter Agents: training LLMs to reason with notebooks -- 2025-09-14
  179. Intel Files Patent for "Software Defined Super Cores" -- 2025-09-04
  180. I tried almost every tts model on my ryzen 7 5000 series 16gb ram rtx 3060 laptop 6-8GB Vram -- 2025-09-02
  181. devnen/Kitten-TTS-Server -- 2025-09-02
  182. internlm/Intern-S1-mini -- 2025-09-02
  183. stepfun-ai/Step-Audio-2-mini -- 2025-09-02
  184. QuEST/Quartet authors discuss their work on SOTA 4-bit training optimizations -- 2025-09-01
  185. F-Stack – A network development kit with high performance based on DPDK -- 2025-09-01
  186. An Empirical Study of Knowledge Distillation for Code Understanding Tasks -- 2025-09-01
  187. A Comparative Analysis of Vision Language Models for Scientific Data Interpretation -- 2025-08-31
  188. CaddyManager 0.0.1 – Web UI for managing Caddy servers -- 2025-08-30
  189. [Editorial] AI interfaces for future -- 2025-08-29
  190. I’ve Debugged 100+ RAG/LLM Pipelines. These 16 Bugs Always Come Back. (70 days, 800 stars) -- 2025-08-29
  191. Updates to Consumer Terms and Privacy Policy -- 2025-08-29
  192. LLM speedup breakthrough? 53x faster generation and 6x prefilling from NVIDIA -- 2025-08-29
  193. Intel Granite Rapids CPU on sale at Newegg up to 65% off MSRP -- 2025-08-29
  194. unsloth/DeepSeek-V3.1-GGUF -- 2025-08-29
  195. Deepseek V3.1 benchmarks released -- 2025-08-25
  196. Is openrouters tokens per second reading super bugged? -- 2025-08-22
  197. It’s a Pi, But it’s not Quite a Raspberry Pi -- 2025-08-22
  198. nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 -- 2025-08-21
  199. Mistral 7B fine tuning training loss stagnant after adding more fine tuning prompts -- 2025-08-20
  200. Detecting Hallucinations in LLM Function Calling with Entropy (Part 2) -- 2025-08-20
  201. Anyone have the deets on ROCM 7.0's 3x perf claims? -- 2025-08-19
  202. Rust in 2025: Targeting foundational software -- 2025-08-19
  203. I built a small cli tool to execute agentic workflows -- 2025-08-19
  204. AvatarNova - Local AI companion -- 2025-08-19
  205. 🤖 Built an AI-powered DOCX viewer that extracts & analyzes images with Ollama! -- 2025-08-19
  206. Davincible/claude-code-open -- 2025-08-19
  207. OpenVINO GenAI 2025.2 adds a GGUF reader (preview) -- 2025-08-18
  208. CLI Agent that Supports Multiple Models? -- 2025-08-18
  209. Is there a standard oci image format for models? -- 2025-08-18
  210. moonshotai/Kimi-K2-Instruct -- 2025-08-17
  211. KittenML/kitten-tts-nano-0.1 -- 2025-08-17
  212. ilkerzgi/Overlay-Kontext-Dev-LoRA -- 2025-08-17
  213. JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 -- 2025-08-17
  214. PSA: Don't waste time trying Gemma 3 27B on V100s - it's architecturally impossible -- 2025-08-16
  215. People with MacBook Pro with 36gb of memory, which models you are running for coding? -- 2025-08-16
  216. GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface -- 2025-08-13
  217. TextQuests: How Good are LLMs at Text-Based Video Games? -- 2025-08-13
  218. Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face -- 2025-08-09
  219. [Editorial] https://unhypedai.substack.com/p/unhyped-ai-week-4-digest -- 2025-08-04
  220. 100+ AI Benchmarks list -- 2025-08-04
  221. google/langextract -- 2025-08-04
  222. Chain-GPT/Solidity-LLM -- 2025-07-25
  223. Anyone interested in adding their fine-tuned / open source models to this benchmark? -- 2025-07-25
  224. What kind of rig would you build with a 5k budget for local LLM? -- 2025-07-16
  225. What is your "perfect" £10,000 for Local LLM, Gaming, plex with the following conditional and context. -- 2025-07-16
  226. How to use Claude code -- 2025-07-16
  227. unsloth/Kimi-K2-Instruct-GGUF -- 2025-07-16
  228. moonshotai/Kimi-K2-Base -- 2025-07-16
  229. It's been a while, I'm out of date, suggest me a model -- 2025-07-16
  230. i need the best local llm i can run on my gaming pc -- 2025-07-16
  231. Import of chatgbt Export Zip File with Images of entire previous chats -- 2025-07-14
  232. Which is the best small local LLM models for tasks like doing research and generating insights -- 2025-07-10
  233. Looking for practical advice with my MSc thesis “On-Premise Orchestration of SLMs” (OpenWebUI + SLM v LLM benchmarking on multiple GPUs) -- 2025-07-10
  234. Deep Research with local LLM and local documents -- 2025-07-10
  235. WikipeQA : An evaluation dataset for both web-browsing agents and vector DB RAG systems -- 2025-07-10
  236. Looking for advice. -- 2025-07-10
  237. OpenAI to release open-source model this summer - everything we know so far -- 2025-07-09
  238. Accelerating Docker Builds by Halving EC2 Boot Time -- 2025-06-30
  239. Show HN: Inspect and extract files from MSI installers directly in your browser -- 2025-06-28
  240. Meet Mistral Devstral, SOTA open model designed specifically for coding agents -- 2025-06-26
  241. 1.93bit Deepseek R1 0528 beats Claude Sonnet 4 -- 2025-06-26
  242. DeepSeek R1 05/28 performance on five independent benchmarks -- 2025-06-26
  243. Few-Shot Examples: Overfitting / Leakage -- 2025-06-26
  244. Finetune a model to think and use tools -- 2025-06-26
  245. I need help using open web UI with Ollama. Help installing and getting it running win 11 -- 2025-06-26
  246. I built/am building a micro-transformer for learning and experimentation -- 2025-06-26
  247. I shipped more code yesterday with Claude 4 than the last 3 weeks combined -- 2025-06-26
  248. A deep dive into self-improving AI and the Darwin-Gödel Machine -- 2025-06-26
  249. Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) -- 2025-06-25
  250. Is it possible to give Gemma 3 or any other model on-device screen awareness? -- 2025-06-25
  251. Chonkie update. -- 2025-06-25
  252. Memory Layer Compatible with Local Llama -- 2025-06-25
  253. Ollama/AnythingLLM on Windows 11 with AMD RX 6600: GPU Not Utilized for LLM Inference - Help! -- 2025-06-25
  254. How can synthetic data improve a model if the model was the thing that generated that data? -- 2025-06-25
  255. After reading OpenAI's GPT-4.1 prompt engineering cookbook, I created this comprehensive Python coding template -- 2025-06-25
  256. The 55% Regret Club: How AI-First Companies Are Learning Lessons the Hard Way -- 2025-06-25
  257. Microsoft-backed Builder.ai enters insolvency proceedings -- 2025-06-25
  258. Show HN: FaynoSync Self-Hosted API for Automatic App Updates -- 2025-06-23
  259. identicallead/mse6 -- 2025-06-22
  260. GCC 13.4 Released with 129 additional bug fixes -- 2025-06-22
  261. Databricks acquires Neon -- 2025-06-22
  262. Java Virtual Threads Ate My Memory: A Web Crawler's Tale of Speed vs. Memory -- 2025-06-20
  263. Show HN: Zeekstd – Rust Implementation of the ZSTD Seekable Format -- 2025-06-20
  264. DeepSeek R1 05 28 Tested. It finally happened. The ONLY model to score 100% on everything I threw at it. -- 2025-06-17
  265. ubergarm/DeepSeek-R1-0528-GGUF -- 2025-06-17
  266. LLM training on RTX 5090 -- 2025-06-17
  267. [DEMO] I created a coding agent that can do dynamic, runtime debugging. -- 2025-06-17
  268. Is anyone productively using Aider and Ollama together? -- 2025-06-17
  269. For everyone who's still confused by Attention... I made this spreadsheet just for you(FREE) -- 2025-06-17
  270. What setup/model do you use and what’s your monthly spend? -- 2025-06-17
  271. Xiaomi released an updated 7B reasoning model and VLM version claiming SOTA for their size -- 2025-06-17
  272. hhftechnology/middleware-manager -- 2025-06-17
  273. Show HN: McWig – A modal, Vim-like text editor written in Go -- 2025-06-17
  274. The Unreliability of LLMs and What Lies Ahead -- 2025-06-10
  275. 007: Democratically Finding The Cause of Packet Drops -- 2025-06-08
  276. langtalks/swe-agent -- 2025-06-08
  277. wey-gu/py-pglite -- 2025-06-08
  278. 0-$π$ qubit in one Josephson junction -- 2025-06-07
  279. 100-kT Magnetic field generation using paisley targets by femtosecond laser-plasma interactions -- 2025-06-07
  280. fileshare-go/fileshare -- 2025-06-04
  281. Rust Coreutils 0.1.0 Release -- 2025-06-04
  282. Show HN: Samchika – A Java Library for Fast, Multithreaded File Processing -- 2025-06-04
  283. 100 Drivers, 2200 km: A Natural Dataset of Driving Style toward Human-centered Intelligent Driving Systems -- 2025-06-04
  284. 100+ Metrics for Software Startups - A Multi-Vocal Literature Review -- 2025-06-04
  285. Building a plug-and-play vector store for any data stream (text, audio, video, etc.)—searchable by your LLM via MCP -- 2025-05-29
  286. Building a real-world LLM agent with open-source models—structure > prompt engineering -- 2025-05-29
  287. New LocalLLM Hardware complete -- 2025-05-29
  288. Parameter-Efficient Fine-Tuning (PEFT) Explained -- 2025-05-29
  289. LLM help for recovering deleted data? -- 2025-05-29
  290. AI Runner v4.10.0 Release Notes -- 2025-05-29
  291. Unpopular opinion: RAG is actively hurting your coding agents -- 2025-05-29
  292. Teal – A statically-typed dialect of Lua -- 2025-05-29
  293. I think it's time to give Nix a chance -- 2025-05-29
  294. deepseek-ai/DeepSeek-R1-0528 -- 2025-05-29
  295. AM5 or TRX4 for local LLMs? -- 2025-05-29