Benchmarks & Evaluation

Leaderboards, evaluation frameworks, model comparison

323 articles across 108 editions

Articles

  1. LLM Novice Uplift on Dual-Use Biology Tasks — 4x Accuracy Boost Bypasses Safeguards -- 2026-04-10
  2. [Editorial] Your AI Is Developing Capabilities Nobody Tested -- 2026-04-10
  3. 1-bit llms on device?! -- 2026-04-07
  4. Running SmolLM2-360M on a Samsung Galaxy Watch 4 (380MB RAM) – 74% RAM reduction in llama.cpp -- 2026-04-07
  5. Built my 10x NVidia V100 AI Server - 320gb vram - vLLM Testing Linux Headless -- 2026-04-07
  6. [Editorial] arxiv:2603.15569 -- 2026-03-30
  7. TinyLoRA: LoRA training works at just 13 parameters -- 2026-03-30
  8. KV rotation PR: q8 quants tank performance on AIME25, recovered with rotation -- 2026-03-30
  9. [Editorial] AI ASIC for LLMs -- 2026-03-30
  10. [Editorial] Heretic -- 2026-03-30
  11. mlx-snn: Spiking Neural Network library for Apple MLX -- 2026-03-30
  12. ARC-AGI-3 -- 2026-03-27
  13. LABSHIELD: A Multimodal Benchmark for Safety-Critical Reasoning and Planning in Scientific Laboratories -- 2026-03-27
  14. [Editorial] IAWG — AI Governance Working Group -- 2026-03-18
  15. Anthropic CEO says 50% entry-level white-collar jobs will be eradicated within 3 years -- 2026-03-18
  16. GPT-5.4 -- 2026-03-09
  17. [Editorial] -- 2026-03-09
  18. How do you automate end to end testing without coding when you vibe coded the whole app -- 2026-03-03
  19. [Editorial] Visual Learning for AI Coding -- 2026-03-03
  20. darrenburns/dv -- 2026-03-03
  21. ReasonDB – open-source document DB where the LLM navigates a tree instead of vector search (RAG alternative) -- 2026-03-03
  22. [Editorial] Claude Code Nano-Banana Plugin -- 2026-03-03
  23. Mercury 2: Fast reasoning LLM powered by diffusion -- 2026-02-26
  24. I Benchmarked Opus 4.6 vs Sonnet 4.6 on agentic PR review and browser QA the results weren't what I expected -- 2026-02-26
  25. [Editorial] Bullshit meter :) -- 2026-02-26
  26. The Qwen team verified that there are serious problems with the data quality of the GPQA and HLE test sets. -- 2026-02-25
  27. Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to. -- 2026-02-25
  28. ChatGPT isn't the only chatbot pulling answers from Elon Musk's Grokipedia -- 2026-02-25
  29. [Editorial] Benchmarking LLMs for Voice Agent Use Cases -- 2026-02-21
  30. Claude Opus 4.6 Surges Past Forecasts on METR's 50% Time-Horizon Benchmark with Exponential Gains -- 2026-02-21
  31. [Editorial] Unsloth: MiniMax M2.5 Fine-Tuning Guide -- 2026-02-21
  32. [Editorial] When everyone can build software, who learns well? -- 2026-02-19
  33. Sonnet 4.6 feels like Opus 4.5 at Sonnet pricing -- 2026-02-19
  34. Anthropic Raises $30,000,000,000 As Run-Rate Revenue Grew 10x Annually Over Three Years -- 2026-02-19
  35. REASONING AUGMENTED RETRIEVAL (RAR) is the production-grade successor to single-pass RAG -- 2026-02-19
  36. [Editorial] Antigravity Awesome Skills -- 2026-02-18
  37. I built an MCP that connects your agent to 8,000+ skills with zero setup -- 2026-02-18
  38. Is the Nvidia T4 actually viable for 70B (EXL2) daily driving, or is it just pure cope compared to dual 3090s? -- 2026-02-13
  39. Open weight kimi k2.5 overtakes opus 4.5 non thinking on arena -- 2026-02-13
  40. When did we go from 400k to 256k? -- 2026-02-13
  41. [Editorial] https://github.com/d-Rickyy-b/certstream-server-go?tab=readme-ov-file -- 2026-02-13
  42. [Editorial] https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf -- 2026-02-12
  43. [Editorial] https://d3lm.medium.com/overly-agentic-why-anthropic-is-worried-about-opus-4-6-17eee0f8e5cd -- 2026-02-12
  44. [Editorial] https://www.linkedin.com/posts/avipil_i-got-my-first-bill-after-switching-to-claude-activity-7427320523870629889-vM5K -- 2026-02-12
  45. Pros/Cons and use case for bypassing permissions -- 2026-02-12
  46. [Editorial] https://www.linkedin.com/posts/dragan-spiridonov_agentic-qe-competitive-landscape-2026-activity-7427362099175211010-pd1J -- 2026-02-11
  47. jmuncor/sherlock -- 2026-02-11
  48. [Editorial] https://github.com/mitkox/megacode -- 2026-02-10
  49. [Editorial] https://www.marktechpost.com/2026/02/07/google-ai-introduces-paperbanana-an-agentic-framework-that-automates-publication-ready-methodology-diagrams-and-statistical-plots -- 2026-02-10
  50. [Editorial] https://www.linkedin.com/posts/ryansmith108_frank-lee-amplitude-skills-are-now-indexed-activity-7426777024284893184-8eTf -- 2026-02-10
  51. [Editorial] https://arxiv.org/abs/2602.04118 -- 2026-02-10
  52. Measuring output stability across LLM runs (JSON drift problem) -- 2026-02-09
  53. BalatroBench - Benchmark LLMs' strategic performance in Balatro -- 2026-02-09
  54. ykushch/ask -- 2026-02-05
  55. zaolin/vanguard -- 2026-02-05
  56. Tadpole – A modular and extensible DSL built for web scraping -- 2026-02-05
  57. Coding assistants are solving the wrong problem -- 2026-02-05
  58. How Vibe Coding is Killing Open Source -- 2026-02-05
  59. [Editorial] https://github.com/mondweep/vibe-cast/tree/claude/claude-code-v3-skill-KucJF/claude-code-v3-qe-skill -- 2026-02-04
  60. [Editorial] https://forge-quality.dev/articles/case-of-passing-tests-investigation -- 2026-02-02
  61. MultiX0/last-archive -- 2026-01-28
  62. roborev-dev/roborev -- 2026-01-28
  63. Why I Stopped Using Nbdev -- 2026-01-21
  64. VectorDBZ update: Pinecone, pgvector, custom embeddings, search stats -- 2026-01-19
  65. Prompt tool I built/use with Ollama daily - render prompt variations without worrying about text files -- 2026-01-19
  66. Need people to get excited part 2 -- 2026-01-19
  67. Binary Fuse Filters: Fast and Smaller Than XOR Filters -- 2026-01-19
  68. Read_once(), Write_once(), but Not for Rust -- 2026-01-19
  69. Show HN: HTTP:COLON – A quick HTTP header/directive inspector and reference -- 2026-01-19
  70. [Editorial] https://www.linkedin.com/posts/daniel-cuthbert0x_last-year-i-spent-most-of-my-time-reviewing-activity-7414597548050665472-dYjg -- 2026-01-08
  71. Anyone tried IQuest-Coder-V1 yet? The 40B numbers look wild -- 2026-01-06
  72. open-thoughts/OpenThinker-Agent-v1 -- 2026-01-06
  73. zrougamed/orion-belt -- 2026-01-06
  74. Leo-Mu/montecarlo-ip-searcher -- 2026-01-06
  75. wkhtmltopdf - Convert HTML to PDF Using QtWebKit (2021) -- 2026-01-06
  76. zakirkun/guardian-cli -- 2026-01-05
  77. orneryd/NornicDB -- 2026-01-02
  78. Build a Deep Learning Library -- 2026-01-02
  79. Liquid CO2 For Grid Scale Energy Storage Isn’t Just Hot Air -- 2026-01-02
  80. How llama.cpp implements 2.9x faster top-k sampling with bucket sort -- 2025-12-31
  81. Built an offline-first vector database (v0.2.0) looking for real-world feedback -- 2025-12-31
  82. Linux 7.0 Expected to Bring IO_uring Iopoll Polling Improvements -- 2025-12-31
  83. rix4uni/subhijack -- 2025-12-30
  84. Worktrunk – CLI for Git worktree management -- 2025-12-30
  85. [Editorial] https://github.com/JohannesLks/CVE-2025-14558 -- 2025-12-29
  86. batterdaysahead/cipher0 -- 2025-12-29
  87. MongoBleed -- 2025-12-29
  88. dsl-learn/cutile-learn -- 2025-12-18
  89. Errors in Rust: A Deep Dive -- 2025-12-18
  90. Plug Into USB, Read Hostname and IP Address -- 2025-12-18
  91. Gouryella/drip -- 2025-12-17
  92. Koko-boya/Comfyui-Z-Image-Utilities -- 2025-12-17
  93. Show HN: Generate Passwords from Regex Constraints -- 2025-12-17
  94. Generating synthetic test data for LLM applications (our approach) -- 2025-12-12
  95. Benchmarked A100 vs H100 local storage for Multi-GPU loading. The Gen4 bottleneck is brutal for cold starts. -- 2025-12-11
  96. [Toolkit] TinyLlama Fine-Tuning + RAG Lab (Full FT / LoRA / QLoRA | T4-friendly | Unified pipeline) -- 2025-12-04
  97. Introducing Lynkr — an open-source Claude-style AI coding proxy built specifically for Databricks model endpoints 🚀 -- 2025-12-04
  98. AI Runner v5.0.5 -- 2025-12-04
  99. Which local model for 3090 5069 TI combo -- 2025-12-04
  100. I built a macOS app to monitor all my Claude Code sessions at once -- 2025-12-04
  101. nvidia/Orchestrator-8B · Hugging Face -- 2025-12-03
  102. Llamacpp Parameters Tuning -- 2025-12-02
  103. 4xRTX 4000 Pro Blackwell vs 1x6000 RTX Pro -- 2025-12-02
  104. ardanlabs/kronk -- 2025-12-02
  105. Zig Book – An open, technical and introductory book for Zig -- 2025-12-02
  106. Arcee Trinity Mini: US-Trained MoE Model -- 2025-12-02
  107. Build Your Own Glasshole Detector -- 2025-12-02
  108. Askimo: Open source of Ollama native desktop client -- 2025-12-01
  109. Created 24 Claude Code learning units (beginner → power user) - Free on GitHub -- 2025-12-01
  110. You can now do FP8 reinforcement learning locally! (<5GB VRAM) -- 2025-12-01
  111. A Repository with 44 Years of Unix Evolution -- 2025-11-28
  112. Strix Halo batching with tensor parallel and pipeline parallel using vllm benchmarked -- 2025-11-28
  113. RTX 3090 vs RX 7900 with ROCm, also Vulkan -- 2025-11-26
  114. moonshotai/Kimi-K2-Thinking -- 2025-11-26
  115. Ollama Not Using GPU on RTX 5070 Ti (Blackwell) -- 2025-11-25
  116. PCIE Bifurcation - More than 4 GPUs on a consumer motherboard -- 2025-11-18
  117. What's the best GPU for Llama 3 (.1 or .3)? -- 2025-11-18
  118. PyTorch 2.10.0a0 w/ Blackwell (sm_120) Support — Patched & Packaged for One-Command Install -- 2025-11-17
  119. Half-trillion parameter model on a machine with 128 GB RAM + 24 GB VRAM -- 2025-11-17
  120. Real-Time BART in a Box Smaller Than Your Coffee Mug -- 2025-11-17
  121. etalazz/vsa -- 2025-11-13
  122. Pi Compute Modules Make for Compact Cluster -- 2025-11-13
  123. antarys-ai/antarys -- 2025-11-11
  124. [Editorial] https://www.linkedin.com/posts/daniel-cuthbert0x_a-month-ago-gadi-evron-and-i-set-about-building-ugcPost-7393643597729845248-TSTD -- 2025-11-11
  125. Breakdown of New RunC Vulnerabilities -- 2025-11-11
  126. When Your Hash Becomes a String: Hunting Ruby's Million-to-One Memory Bug -- 2025-11-07
  127. Maude 3 Manual -- 2025-11-07
  128. [Editorial] ‘Frequently wrong, but never in doubt’ -- 2025-11-05
  129. The Zero Freeze Formula: Teaching Local LLaMA Real Physics Through Python (SU(3) Mass Gap Simulation) to solve the Yang–Mills Mass Gap -- 2025-11-05
  130. Audio Sound Capture Project Needs Help -- 2025-11-05
  131. [Editorial] https://blog.peerllm.com/2025/11/02/announcing-v0.7.6.html -- 2025-11-04
  132. Faster llama.cpp ROCm performance for AMD RDNA3 (tested on Strix Halo/Ryzen AI Max 395) -- 2025-11-04
  133. KTransformers Open Source New Era: Local Fine-tuning of Kimi K2 and DeepSeek V3 -- 2025-11-04
  134. FlashPack: High-throughput tensor loading for PyTorch -- 2025-11-01
  135. M5 Neural Accelerator benchmark results from Llama.cpp -- 2025-11-01
  136. Kafka is Fast – I'll use Postgres -- 2025-11-01
  137. ZOZO's Contact Solver for physics-based simulations -- 2025-11-01
  138. Need advice on building a GPU-based render/Al compute setup: Unsure about hardware direction -- 2025-11-01
  139. [Editorial] https://pivot-to-ai.com/2025/10/15/ai-is-not-popular-and-ai-users-are-unpleasant-asshats/ -- 2025-10-30
  140. [Editorial] Developer machine part of attack chain -- 2025-10-29
  141. DGX SPARK Compiled llama.cpp Benchmarks Compared to M4 MAX (non-MLX) -- 2025-10-21
  142. perplexityai/search_evals -- 2025-10-21
  143. Hetzner: The Simple Cloud just got more flexible and more affordable -- 2025-10-21
  144. A new, super simple LLM benchmark for testing changes across models, quants, parameters, samplers, engines, etc -- 2025-10-21
  145. Significant speedup for local models -- 2025-10-20
  146. Cursor tricking paid users with fake Claude Sonnet 4.5 -- 2025-10-20
  147. inclusionAI/Ring-1T -- 2025-10-20
  148. 1r0BIT/TaskHound -- 2025-10-18
  149. armai92/goauth -- 2025-10-18
  150. Chinese gang used ArcGIS as a backdoor for a year – and no one noticed -- 2025-10-18
  151. We built 3B and 8B models that rival GPT-5 at HTML extraction while costing 40-80x less - fully open source -- 2025-10-17
  152. Comparing Popular AI Evaluation Platforms for 2025 -- 2025-10-17
  153. State of AI Report 2025 -- 2025-10-17
  154. Signed Backdoor Hiding in Plain Sight on Framework Devices -- 2025-10-15
  155. Three ways formally verified code can go wrong in practice -- 2025-10-15
  156. Jeep pushed software update that bricked all 2024 Wrangler 4xe models -- 2025-10-15
  157. junron/agar -- 2025-10-15
  158. A modern approach to preventing CSRF in Go -- 2025-10-15
  159. Stop flexing Pass@N — show Pass-all-N -- 2025-10-11
  160. Architecting a project for optimal AI coding, any tips? -- 2025-10-11
  161. Basekick-Labs/arc -- 2025-10-11
  162. ServiceNow-AI/Apriel-1.5-15b-Thinker -- 2025-10-11
  163. meituan-longcat/LongCat-Flash-Chat -- 2025-10-11
  164. Did anyone try out GLM-4.5-Air-GLM-4.6-Distill? -- 2025-10-10
  165. Thank you Anthropic & this community! Our little side project just hit 1M visits and even made it on National TV! -- 2025-10-10
  166. Sneak Preview: Ollama Bench -- 2025-10-08
  167. When Curl Works but IntelliJ Doesn't: The Ollama Connection Mystery -- 2025-10-08
  168. Local Open Deep Research with Offline Wikipedia Search Source -- 2025-10-07
  169. Ollama drops MI50 support -- 2025-10-07
  170. CoexistAI Now Supports Docker Setup, Also now you can turn any text into Podcasts and Speech Easily -- 2025-10-07
  171. MCP_File_Generation_Tool - v0.6.0 Update! -- 2025-10-07
  172. How do I help Codex critique my ideas rather than just go along with it every time? -- 2025-10-06
  173. Plan with Codex, code with Sonnet 4.5. What's your simple workflow here? -- 2025-10-06
  174. aminofox/zentrox -- 2025-10-06
  175. Linus Torvalds Vents over "Completely Crazy Rust Format Checking" -- 2025-10-06
  176. vllm setup for nvidia (can use llama) -- 2025-10-05
  177. Full-fine tuning doesn't require much vRAM with gradient checkpointing... -- 2025-10-05
  178. Qwen/Qwen3-Omni-30B-A3B-Thinking -- 2025-10-05
  179. inclusionAI/Ring-mini-linear-2.0 -- 2025-10-05
  180. llama.cpp: Quantizing from bf16 vs f16 -- 2025-10-05
  181. GLM 4.6 is nice -- 2025-10-04
  182. NVFP4 or MXFP4 MOE on sm120 (RTX 5900 RTX 6000 PRO) -- 2025-10-04
  183. K2-Think 32B - Reasoning model from UAE -- 2025-10-03
  184. MoonshotAI/checkpoint-engine -- 2025-10-03
  185. Whither the Chip Shortage? -- 2025-10-02
  186. A tiny receipt per AI run: κ (stress), Δhol (drift), and guards—in plain JSON. -- 2025-10-02
  187. Microsoft Agent Framework (Preview): Making AI Agents Simple for Every Developer -- 2025-10-02
  188. How bad to have RTX Pro 6000 run at PCIE x8? -- 2025-09-24
  189. A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code -- 2025-09-24
  190. SWE-Bench Pro -- 2025-09-23
  191. Investigating Training Data Detection in AI Coders -- 2025-09-23
  192. Comparison H100 vs RTX 6000 PRO with VLLM and GPT-OSS-120B -- 2025-09-23
  193. Answer Matching Outperforms Multiple Choice for Language Model Evaluation -- 2025-09-21
  194. facebook/MobileLLM-R1-950M -- 2025-09-20
  195. OpenGVLab/InternVL3_5-241B-A28B -- 2025-09-20
  196. KBlueLeaf/HDM-xut-340M-anime -- 2025-09-20
  197. Definitive proof openai/gpt-oss-20b is dumb as hell -- 2025-09-19
  198. Qwen3‑Next‑80B‑A3B‑Instruct (FP8) on Windows 11 WSL2 + vLLM + Docker (Blackwell) -- 2025-09-19
  199. Free 10%+ Speedup for CPU/Hybrid Inference on Intel CPUs with Efficiency Cores -- 2025-09-17
  200. PSA/RFC: KV Cache quantization forces excess processing onto CPU in llama.cpp -- 2025-09-15
  201. native tool calling support for DeepSeek V3.1 just merged in llama.cpp -- 2025-09-15
  202. model : add grok-2 support by CISC · Pull Request #15539 · ggml-org/llama.cpp -- 2025-09-15
  203. [Editorial] Defeating Nondeterminism in LLM Inference -- 2025-09-14
  204. Nvidia Unveils Rubin CPX Amidst Chart-Topping Blackwell Ultra MLPerf Results -- 2025-09-14
  205. Repair-R1: Better Test Before Repair -- 2025-09-14
  206. Jupyter Agents: training LLMs to reason with notebooks -- 2025-09-14
  207. Intel Files Patent for "Software Defined Super Cores" -- 2025-09-04
  208. I tried almost every tts model on my ryzen 7 5000 series 16gb ram rtx 3060 laptop 6-8GB Vram -- 2025-09-02
  209. devnen/Kitten-TTS-Server -- 2025-09-02
  210. internlm/Intern-S1-mini -- 2025-09-02
  211. stepfun-ai/Step-Audio-2-mini -- 2025-09-02
  212. QuEST/Quartet authors discuss their work on SOTA 4-bit training optimizations -- 2025-09-01
  213. F-Stack – A network development kit with high performance based on DPDK -- 2025-09-01
  214. An Empirical Study of Knowledge Distillation for Code Understanding Tasks -- 2025-09-01
  215. A Comparative Analysis of Vision Language Models for Scientific Data Interpretation -- 2025-08-31
  216. CaddyManager 0.0.1 – Web UI for managing Caddy servers -- 2025-08-30
  217. [Editorial] AI interfaces for future -- 2025-08-29
  218. I’ve Debugged 100+ RAG/LLM Pipelines. These 16 Bugs Always Come Back. (70 days, 800 stars) -- 2025-08-29
  219. Updates to Consumer Terms and Privacy Policy -- 2025-08-29
  220. LLM speedup breakthrough? 53x faster generation and 6x prefilling from NVIDIA -- 2025-08-29
  221. Intel Granite Rapids CPU on sale at Newegg up to 65% off MSRP -- 2025-08-29
  222. unsloth/DeepSeek-V3.1-GGUF -- 2025-08-29
  223. Deepseek V3.1 benchmarks released -- 2025-08-25
  224. Is openrouters tokens per second reading super bugged? -- 2025-08-22
  225. It’s a Pi, But it’s not Quite a Raspberry Pi -- 2025-08-22
  226. nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 -- 2025-08-21
  227. Mistral 7B fine tuning training loss stagnant after adding more fine tuning prompts -- 2025-08-20
  228. Detecting Hallucinations in LLM Function Calling with Entropy (Part 2) -- 2025-08-20
  229. Anyone have the deets on ROCM 7.0's 3x perf claims? -- 2025-08-19
  230. Rust in 2025: Targeting foundational software -- 2025-08-19
  231. I built a small cli tool to execute agentic workflows -- 2025-08-19
  232. AvatarNova - Local AI companion -- 2025-08-19
  233. 🤖 Built an AI-powered DOCX viewer that extracts & analyzes images with Ollama! -- 2025-08-19
  234. Davincible/claude-code-open -- 2025-08-19
  235. OpenVINO GenAI 2025.2 adds a GGUF reader (preview) -- 2025-08-18
  236. CLI Agent that Supports Multiple Models? -- 2025-08-18
  237. Is there a standard oci image format for models? -- 2025-08-18
  238. moonshotai/Kimi-K2-Instruct -- 2025-08-17
  239. KittenML/kitten-tts-nano-0.1 -- 2025-08-17
  240. ilkerzgi/Overlay-Kontext-Dev-LoRA -- 2025-08-17
  241. JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 -- 2025-08-17
  242. PSA: Don't waste time trying Gemma 3 27B on V100s - it's architecturally impossible -- 2025-08-16
  243. People with MacBook Pro with 36gb of memory, which models you are running for coding? -- 2025-08-16
  244. GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface -- 2025-08-13
  245. TextQuests: How Good are LLMs at Text-Based Video Games? -- 2025-08-13
  246. Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face -- 2025-08-09
  247. [Editorial] https://unhypedai.substack.com/p/unhyped-ai-week-4-digest -- 2025-08-04
  248. 100+ AI Benchmarks list -- 2025-08-04
  249. google/langextract -- 2025-08-04
  250. Chain-GPT/Solidity-LLM -- 2025-07-25
  251. Anyone interested in adding their fine-tuned / open source models to this benchmark? -- 2025-07-25
  252. What kind of rig would you build with a 5k budget for local LLM? -- 2025-07-16
  253. What is your "perfect" £10,000 for Local LLM, Gaming, plex with the following conditional and context. -- 2025-07-16
  254. How to use Claude code -- 2025-07-16
  255. unsloth/Kimi-K2-Instruct-GGUF -- 2025-07-16
  256. moonshotai/Kimi-K2-Base -- 2025-07-16
  257. It's been a while, I'm out of date, suggest me a model -- 2025-07-16
  258. i need the best local llm i can run on my gaming pc -- 2025-07-16
  259. Import of ChatGPT Export Zip File with Images of entire previous chats -- 2025-07-14
  260. Which is the best small local LLM models for tasks like doing research and generating insights -- 2025-07-10
  261. Looking for practical advice with my MSc thesis “On-Premise Orchestration of SLMs” (OpenWebUI + SLM v LLM benchmarking on multiple GPUs) -- 2025-07-10
  262. Deep Research with local LLM and local documents -- 2025-07-10
  263. WikipeQA : An evaluation dataset for both web-browsing agents and vector DB RAG systems -- 2025-07-10
  264. Looking for advice. -- 2025-07-10
  265. OpenAI to release open-source model this summer - everything we know so far -- 2025-07-09
  266. Accelerating Docker Builds by Halving EC2 Boot Time -- 2025-06-30
  267. Show HN: Inspect and extract files from MSI installers directly in your browser -- 2025-06-28
  268. Meet Mistral Devstral, SOTA open model designed specifically for coding agents -- 2025-06-26
  269. 1.93bit Deepseek R1 0528 beats Claude Sonnet 4 -- 2025-06-26
  270. DeepSeek R1 05/28 performance on five independent benchmarks -- 2025-06-26
  271. Few-Shot Examples: Overfitting / Leakage -- 2025-06-26
  272. Finetune a model to think and use tools -- 2025-06-26
  273. I need help using open web UI with Ollama. Help installing and getting it running win 11 -- 2025-06-26
  274. I built/am building a micro-transformer for learning and experimentation -- 2025-06-26
  275. I shipped more code yesterday with Claude 4 than the last 3 weeks combined -- 2025-06-26
  276. A deep dive into self-improving AI and the Darwin-Gödel Machine -- 2025-06-26
  277. Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) -- 2025-06-25
  278. Is it possible to give Gemma 3 or any other model on-device screen awareness? -- 2025-06-25
  279. Chonkie update. -- 2025-06-25
  280. Memory Layer Compatible with Local Llama -- 2025-06-25
  281. Ollama/AnythingLLM on Windows 11 with AMD RX 6600: GPU Not Utilized for LLM Inference - Help! -- 2025-06-25
  282. How can synthetic data improve a model if the model was the thing that generated that data? -- 2025-06-25
  283. After reading OpenAI's GPT-4.1 prompt engineering cookbook, I created this comprehensive Python coding template -- 2025-06-25
  284. The 55% Regret Club: How AI-First Companies Are Learning Lessons the Hard Way -- 2025-06-25
  285. Microsoft-backed Builder.ai enters insolvency proceedings -- 2025-06-25
  286. Show HN: FaynoSync Self-Hosted API for Automatic App Updates -- 2025-06-23
  287. identicallead/mse6 -- 2025-06-22
  288. GCC 13.4 Released with 129 additional bug fixes -- 2025-06-22
  289. Databricks acquires Neon -- 2025-06-22
  290. Java Virtual Threads Ate My Memory: A Web Crawler's Tale of Speed vs. Memory -- 2025-06-20
  291. Show HN: Zeekstd – Rust Implementation of the ZSTD Seekable Format -- 2025-06-20
  292. DeepSeek R1 05 28 Tested. It finally happened. The ONLY model to score 100% on everything I threw at it. -- 2025-06-17
  293. ubergarm/DeepSeek-R1-0528-GGUF -- 2025-06-17
  294. LLM training on RTX 5090 -- 2025-06-17
  295. [DEMO] I created a coding agent that can do dynamic, runtime debugging. -- 2025-06-17
  296. Is anyone productively using Aider and Ollama together? -- 2025-06-17
  297. For everyone who's still confused by Attention... I made this spreadsheet just for you(FREE) -- 2025-06-17
  298. What setup/model do you use and what’s your monthly spend? -- 2025-06-17
  299. Xiaomi released an updated 7B reasoning model and VLM version claiming SOTA for their size -- 2025-06-17
  300. hhftechnology/middleware-manager -- 2025-06-17
  301. Show HN: McWig – A modal, Vim-like text editor written in Go -- 2025-06-17
  302. The Unreliability of LLMs and What Lies Ahead -- 2025-06-10
  303. 007: Democratically Finding The Cause of Packet Drops -- 2025-06-08
  304. langtalks/swe-agent -- 2025-06-08
  305. wey-gu/py-pglite -- 2025-06-08
  306. 0-$π$ qubit in one Josephson junction -- 2025-06-07
  307. 100-kT Magnetic field generation using paisley targets by femtosecond laser-plasma interactions -- 2025-06-07
  308. fileshare-go/fileshare -- 2025-06-04
  309. Rust Coreutils 0.1.0 Release -- 2025-06-04
  310. Show HN: Samchika – A Java Library for Fast, Multithreaded File Processing -- 2025-06-04
  311. 100 Drivers, 2200 km: A Natural Dataset of Driving Style toward Human-centered Intelligent Driving Systems -- 2025-06-04
  312. 100+ Metrics for Software Startups - A Multi-Vocal Literature Review -- 2025-06-04
  313. Building a plug-and-play vector store for any data stream (text, audio, video, etc.)—searchable by your LLM via MCP -- 2025-05-29
  314. Building a real-world LLM agent with open-source models—structure > prompt engineering -- 2025-05-29
  315. New LocalLLM Hardware complete -- 2025-05-29
  316. Parameter-Efficient Fine-Tuning (PEFT) Explained -- 2025-05-29
  317. LLM help for recovering deleted data? -- 2025-05-29
  318. AI Runner v4.10.0 Release Notes -- 2025-05-29
  319. Unpopular opinion: RAG is actively hurting your coding agents -- 2025-05-29
  320. Teal – A statically-typed dialect of Lua -- 2025-05-29
  321. I think it's time to give Nix a chance -- 2025-05-29
  322. deepseek-ai/DeepSeek-R1-0528 -- 2025-05-29
  323. AM5 or TRX4 for local LLMs? -- 2025-05-29