Benchmarks & Evaluation

Leaderboards, evaluation frameworks, model comparison

389 articles across 125 editions

Articles

Claude Sonnet 5 -- 2026-07-03
A Red-Team Study of Anthropic Fable 5 & Opus 4.8 Models -- 2026-07-03
[Editorial] ruvnet Technical Reference -- 2026-07-03
Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers -- 2026-07-02
New bench designed for smaller models: ObviousBench.com -- 2026-07-02
I built an autonomous dev pipeline and ran the same project head to head: a 27B local on a modded 4090, then again on cheap cloud LLMs -- 2026-07-02
Stdlib or Third-Party? Empirical Performance and Correctness of LLM-Assisted Zero-Dependency Python Libraries -- 2026-07-02
StarTrail-org/PixelRAG -- 2026-06-25
georgebuilds/anneal -- 2026-06-25
numind/NuExtract3 -- 2026-06-25
[Editorial] Explainer Agent Harness Generator -- 2026-06-25
[Editorial] Chainguard Scans Source Code for Malware and Greyware -- 2026-06-12
Are insecure code completions in PyCharm a vulnerability? -- 2026-06-12
ShieldNet-360/prompt-gate -- 2026-06-12
[Editorial] Staris Tech -- 2026-06-12
[Editorial] Hallucinations -- 2026-06-11
Lines of Code Got a Better Publicist -- 2026-06-11
How's Linear so fast? A technical breakdown -- 2026-06-11
Port React Compiler to Rust -- 2026-06-11
Show HN: Extend UI – open-source UI kit for modern document apps -- 2026-06-11
[Editorial] LiveContainer — iOS App Sideloading Container -- 2026-06-03
[Editorial] SideStore — Alternative iOS App Store -- 2026-06-03
[Editorial] idevice_pair — Rust iOS Device Pairing -- 2026-06-03
Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action -- 2026-06-02
Nvidia announces new AI chip for personal computers -- 2026-06-02
Nvidia LocateAnything - Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding. (10x faster than Qwen3-VL) -- 2026-06-02
New DeepSWE benchmark finds Claude Opus cheats -- 2026-05-29
ITBench-AA: Frontier Models Score Below 50% on Enterprise IT Tasks — by Artificial Analysis and IBM -- 2026-05-29
Context, Reasoning, and Hierarchy: Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP -- 2026-05-29
10 years of AI robustness tricks (PGD, RLHF, Data Augmentation) are actually computing the same hidden matrix -- 2026-05-29
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization -- 2026-05-29
Shard — getting to 10× KV cache compression -- 2026-05-29
LightVLM: Efficient inference toolkit for vision-language models -- 2026-05-29
Speech Tokenizer Arena: Side-by-side benchmarking for discrete speech tokenizers -- 2026-05-29
DeepSeek just popped the American AI bubble. -- 2026-05-29
DeepSeek V4 Flash at 8.4 tok/s on 3×3090 — patching GGUF metadata for cchuter's fork -- 2026-05-29
Why are the AI Companies spreading F.U.D. about AI? -- 2026-05-29
Running DeepSeek-V4 locally with 4x legacy RTX 2080 Ti ($2k budget setup). Custom Turing kernels, W8A8 quantization, and 255 prefill tok/s! -- 2026-05-20
Ran the same models across Strix Halo, RTX 3090, and RTX 5070 because I wanted my own numbers -- 2026-05-20
Intel's Crescent Island PCB Leaks, Showing a Massive Xe3P GPU, 16-Pin Connector, 160GB LPDDR5X as Intel Sidesteps the HBM Shortage -- 2026-05-20
Sipeed's K3 RISC-V SBCs can run 30B-parameter LLMs 60 TOPS (INT4), Supports BF16/FP16/INT4 -- 2026-05-20
club-5060ti: practical RTX 5060 Ti local LLM notes and configs -- 2026-05-20
[Editorial] -- 2026-05-18
[Editorial] -- 2026-05-18
[Editorial] -- 2026-05-18
[Editorial] -- 2026-05-18
[Editorial] Synaptic-Tuner — LLM Tuning Framework -- 2026-05-11
[Editorial] Video — AI Tools & Frameworks -- 2026-05-11
A C++ port of Echo-TTS -- 2026-05-11
[Editorial] -- 2026-05-07
ProgramBench: Can we really rebuild huge binaries from scratch? (doesn't look like it) -- 2026-05-07
Adding Benchmaxxer Repellant to the Open ASR Leaderboard -- 2026-05-07
AI Evals Are Becoming the New Compute Bottleneck -- 2026-05-04
Function Calling Harness 2: Schema-Driven CoT Compliance from 9.91% to 100% -- 2026-05-04
Microsoft and OpenAI end their exclusive and revenue-sharing deal -- 2026-05-01
[Editorial] Video: AI Development Insights -- 2026-05-01
Talkie: a 13B vintage language model from 1930 -- 2026-05-01
SWE-bench Verified no longer measures frontier coding capabilities -- 2026-04-29
Opus 4.7: Are these first signs of model collapse? -- 2026-04-29
Ternary Bonsai: Top Intelligence at 1.58 Bits -- 2026-04-22
Personal Eval: Gemma4 26B MoE vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared -- 2026-04-22
NVIDIA Nemotron-3-Super-120B-A12B-FP8 -- 2026-04-22
High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction -- 2026-04-21
Density-Guided Response Optimization: Community-Grounded Alignment via Implicit Acceptance Signals -- 2026-04-21
Building a Fast Multilingual OCR Model with Synthetic Data -- 2026-04-21
Physical Simulator In-the-Loop Video Generation -- 2026-04-21
LLM Novice Uplift on Dual-Use Biology Tasks — 4x Accuracy Boost Bypasses Safeguards -- 2026-04-10
[Editorial] Your AI Is Developing Capabilities Nobody Tested -- 2026-04-10
1-bit llms on device?! -- 2026-04-07
Running SmolLM2-360M on a Samsung Galaxy Watch 4 (380MB RAM) – 74% RAM reduction in llama.cpp -- 2026-04-07
Built my 10x NVidia V100 AI Server - 320gb vram - vLLM Testing Linux Headless -- 2026-04-07
[Editorial] arxiv:2603.15569 -- 2026-03-30
TinyLoRA: LoRA training works at just 13 parameters -- 2026-03-30
KV rotation PR: q8 quants tank performance on AIME25, recovered with rotation -- 2026-03-30
[Editorial] AI ASIC for LLMs -- 2026-03-30
[Editorial] Heretic -- 2026-03-30
mlx-snn: Spiking Neural Network library for Apple MLX -- 2026-03-30
ARC-AGI-3 -- 2026-03-27
LABSHIELD: A Multimodal Benchmark for Safety-Critical Reasoning and Planning in Scientific Laboratories -- 2026-03-27
[Editorial] IAWG — AI Governance Working Group -- 2026-03-18
Anthropic CEO says 50% entry-level white-collar jobs will be eradicated within 3 years -- 2026-03-18
GPT-5.4 -- 2026-03-09
[Editorial] -- 2026-03-09
How do you automate end to end testing without coding when you vibe coded the whole app -- 2026-03-03
[Editorial] Visual Learning for AI Coding -- 2026-03-03
darrenburns/dv -- 2026-03-03
ReasonDB – open-source document DB where the LLM navigates a tree instead of vector search (RAG alternative) -- 2026-03-03
[Editorial] Claude Code Nano-Banana Plugin -- 2026-03-03
Mercury 2: Fast reasoning LLM powered by diffusion -- 2026-02-26
I Benchmarked Opus 4.6 vs Sonnet 4.6 on agentic PR review and browser QA; the results weren't what I expected -- 2026-02-26
[Editorial] Bullshit meter :) -- 2026-02-26
The Qwen team verified that there are serious problems with the data quality of the GPQA and HLE test sets. -- 2026-02-25
Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to. -- 2026-02-25
ChatGPT isn't the only chatbot pulling answers from Elon Musk's Grokipedia -- 2026-02-25
[Editorial] Benchmarking LLMs for Voice Agent Use Cases -- 2026-02-21
Claude Opus 4.6 Surges Past Forecasts on METR's 50% Time-Horizon Benchmark with Exponential Gains -- 2026-02-21
[Editorial] Unsloth: MiniMax M2.5 Fine-Tuning Guide -- 2026-02-21
[Editorial] When everyone can build software, who learns well? -- 2026-02-19
Sonnet 4.6 feels like Opus 4.5 at Sonnet pricing -- 2026-02-19
Anthropic Raises $30,000,000,000 As Run-Rate Revenue Grew 10x Annually Over Three Years -- 2026-02-19
REASONING AUGMENTED RETRIEVAL (RAR) is the production-grade successor to single-pass RAG -- 2026-02-19
[Editorial] Antigravity Awesome Skills -- 2026-02-18
I built an MCP that connects your agent to 8,000+ skills with zero setup -- 2026-02-18
Is the Nvidia T4 actually viable for 70B (EXL2) daily driving, or is it just pure cope compared to dual 3090s? -- 2026-02-13
Open weight kimi k2.5 overtakes opus 4.5 non thinking on arena -- 2026-02-13
When did we go from 400k to 256k? -- 2026-02-13
[Editorial] https://github.com/d-Rickyy-b/certstream-server-go?tab=readme-ov-file -- 2026-02-13
[Editorial] https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf -- 2026-02-12
[Editorial] https://d3lm.medium.com/overly-agentic-why-anthropic-is-worried-about-opus-4-6-17eee0f8e5cd -- 2026-02-12
[Editorial] https://www.linkedin.com/posts/avipil_i-got-my-first-bill-after-switching-to-claude-activity-7427320523870629889-vM5K -- 2026-02-12
Pros/Cons and use case for bypassing permissions -- 2026-02-12
[Editorial] https://www.linkedin.com/posts/dragan-spiridonov_agentic-qe-competitive-landscape-2026-activity-7427362099175211010-pd1J -- 2026-02-11
jmuncor/sherlock -- 2026-02-11
[Editorial] https://github.com/mitkox/megacode -- 2026-02-10
[Editorial] https://www.marktechpost.com/2026/02/07/google-ai-introduces-paperbanana-an-agentic-framework-that-automates-publication-ready-methodology-diagrams-and-statistical-plots -- 2026-02-10
[Editorial] https://www.linkedin.com/posts/ryansmith108_frank-lee-amplitude-skills-are-now-indexed-activity-7426777024284893184-8eTf -- 2026-02-10
[Editorial] https://arxiv.org/abs/2602.04118 -- 2026-02-10
Measuring output stability across LLM runs (JSON drift problem) -- 2026-02-09
BalatroBench - Benchmark LLMs' strategic performance in Balatro -- 2026-02-09
ykushch/ask -- 2026-02-05
zaolin/vanguard -- 2026-02-05
Tadpole – A modular and extensible DSL built for web scraping -- 2026-02-05
Coding assistants are solving the wrong problem -- 2026-02-05
How Vibe Coding is Killing Open Source -- 2026-02-05
[Editorial] https://github.com/mondweep/vibe-cast/tree/claude/claude-code-v3-skill-KucJF/claude-code-v3-qe-skill -- 2026-02-04
[Editorial] https://forge-quality.dev/articles/case-of-passing-tests-investigation -- 2026-02-02
MultiX0/last-archive -- 2026-01-28
roborev-dev/roborev -- 2026-01-28
Why I Stopped Using Nbdev -- 2026-01-21
VectorDBZ update: Pinecone, pgvector, custom embeddings, search stats -- 2026-01-19
Prompt tool I built/use with Ollama daily - render prompt variations without worrying about text files -- 2026-01-19
Need people to get excited part 2 -- 2026-01-19
Binary Fuse Filters: Fast and Smaller Than XOR Filters -- 2026-01-19
Read_once(), Write_once(), but Not for Rust -- 2026-01-19
Show HN: HTTP:COLON – A quick HTTP header/directive inspector and reference -- 2026-01-19
[Editorial] https://www.linkedin.com/posts/daniel-cuthbert0x_last-year-i-spent-most-of-my-time-reviewing-activity-7414597548050665472-dYjg -- 2026-01-08
Anyone tried IQuest-Coder-V1 yet? The 40B numbers look wild -- 2026-01-06
open-thoughts/OpenThinker-Agent-v1 -- 2026-01-06
zrougamed/orion-belt -- 2026-01-06
Leo-Mu/montecarlo-ip-searcher -- 2026-01-06
wkhtmltopdf - Convert HTML to PDF Using QtWebKit (2021) -- 2026-01-06
zakirkun/guardian-cli -- 2026-01-05
orneryd/NornicDB -- 2026-01-02
Build a Deep Learning Library -- 2026-01-02
Liquid CO2 For Grid Scale Energy Storage Isn’t Just Hot Air -- 2026-01-02
How llama.cpp implements 2.9x faster top-k sampling with bucket sort -- 2025-12-31
Built an offline-first vector database (v0.2.0) looking for real-world feedback -- 2025-12-31
Linux 7.0 Expected to Bring IO_uring Iopoll Polling Improvements -- 2025-12-31
rix4uni/subhijack -- 2025-12-30
Worktrunk – CLI for Git worktree management -- 2025-12-30
[Editorial] https://github.com/JohannesLks/CVE-2025-14558 -- 2025-12-29
batterdaysahead/cipher0 -- 2025-12-29
MongoBleed -- 2025-12-29
dsl-learn/cutile-learn -- 2025-12-18
Errors in Rust: A Deep Dive -- 2025-12-18
Plug Into USB, Read Hostname and IP Address -- 2025-12-18
Gouryella/drip -- 2025-12-17
Koko-boya/Comfyui-Z-Image-Utilities -- 2025-12-17
Show HN: Generate Passwords from Regex Constraints -- 2025-12-17
Generating synthetic test data for LLM applications (our approach) -- 2025-12-12
Benchmarked A100 vs H100 local storage for Multi-GPU loading. The Gen4 bottleneck is brutal for cold starts. -- 2025-12-11
[Toolkit] TinyLlama Fine-Tuning + RAG Lab (Full FT / LoRA / QLoRA | T4-friendly | Unified pipeline) -- 2025-12-04
Introducing Lynkr — an open-source Claude-style AI coding proxy built specifically for Databricks model endpoints 🚀 -- 2025-12-04
AI Runner v5.0.5 -- 2025-12-04
Which local model for 3090 5069 TI combo -- 2025-12-04
I built a macOS app to monitor all my Claude Code sessions at once -- 2025-12-04
nvidia/Orchestrator-8B · Hugging Face -- 2025-12-03
Llamacpp Parameters Tuning -- 2025-12-02
4xRTX 4000 Pro Blackwell vs 1x6000 RTX Pro -- 2025-12-02
ardanlabs/kronk -- 2025-12-02
Zig Book – An open, technical and introductory book for Zig -- 2025-12-02
Arcee Trinity Mini: US-Trained Moe Model -- 2025-12-02
Build Your Own Glasshole Detector -- 2025-12-02
Askimo: Open source of Ollama native desktop client -- 2025-12-01
Created 24 Claude Code learning units (beginner → power user) - Free on GitHub -- 2025-12-01
You can now do FP8 reinforcement learning locally! (<5GB VRAM) -- 2025-12-01
A Repository with 44 Years of Unix Evolution -- 2025-11-28
Strix Halo batching with tensor parallel and pipeline parallel using vllm benchmarked -- 2025-11-28
RTX 3090 vs RX 7900 with ROCm, also Vulcan -- 2025-11-26
moonshotai/Kimi-K2-Thinking -- 2025-11-26
Ollama Not Using GPU on RTX 5070 Ti (Blackwell) -- 2025-11-25
PCIE Bifurcation - More than 4 GPUs on a consumer motherboard -- 2025-11-18
What is the best GPU for Llama 3 (.1 or .3)? -- 2025-11-18
PyTorch 2.10.0a0 w/ Blackwell (sm_120) Support — Patched & Packaged for One-Command Install -- 2025-11-17
Half-trillion parameter model on a machine with 128 GB RAM + 24 GB VRAM -- 2025-11-17
Real-Time BART in a Box Smaller Than Your Coffee Mug -- 2025-11-17
etalazz/vsa -- 2025-11-13
Pi Compute Modules Make for Compact Cluster -- 2025-11-13
antarys-ai/antarys -- 2025-11-11
[Editorial] https://www.linkedin.com/posts/daniel-cuthbert0x_a-month-ago-gadi-evron-and-i-set-about-building-ugcPost-7393643597729845248-TSTD -- 2025-11-11
Breakdown of New RunC Vulnerabilities -- 2025-11-11
When Your Hash Becomes a String: Hunting Ruby's Million-to-One Memory Bug -- 2025-11-07
Maude 3 Manual -- 2025-11-07
[Editorial] 'Frequently wrong, but never in doubt' -- 2025-11-05
The Zero Freeze Formula: Teaching Local LLaMA Real Physics Through Python (SU(3) Mass Gap Simulation) to solve the Yang–Mills Mass Gap -- 2025-11-05
Audio Sound Capture Project Needs Help -- 2025-11-05
[Editorial] https://blog.peerllm.com/2025/11/02/announcing-v0.7.6.html -- 2025-11-04
Faster llama.cpp ROCm performance for AMD RDNA3 (tested on Strix Halo/Ryzen AI Max 395) -- 2025-11-04
KTransformers Open Source New Era: Local Fine-tuning of Kimi K2 and DeepSeek V3 -- 2025-11-04
FlashPack: High-throughput tensor loading for PyTorch -- 2025-11-01
M5 Neural Accelerator benchmark results from Llama.cpp -- 2025-11-01
Kafka is Fast – I'll use Postgres -- 2025-11-01
ZOZO's Contact Solver for physics-based simulations -- 2025-11-01
Need advice on building a GPU-based render/AI compute setup: Unsure about hardware direction -- 2025-11-01
[Editorial] https://pivot-to-ai.com/2025/10/15/ai-is-not-popular-and-ai-users-are-unpleasant-asshats/ -- 2025-10-30
[Editorial] Developer machine part of attack chain -- 2025-10-29
DGX SPARK Compiled llama.cpp Benchmarks Compared to M4 MAX (non-MLX) -- 2025-10-21
perplexityai/search_evals -- 2025-10-21
Hetzner: The Simple Cloud just got more flexible and more affordable -- 2025-10-21
A new, super simple LLM benchmark for testing changes across models, quants, parameters, samplers, engines, etc -- 2025-10-21
Significant speedup for local models -- 2025-10-20
Cursor tricking paid users with fake Claude Sonnet 4.5 -- 2025-10-20
inclusionAI/Ring-1T -- 2025-10-20
1r0BIT/TaskHound -- 2025-10-18
armai92/goauth -- 2025-10-18
Chinese gang used ArcGIS as a backdoor for a year – and no one noticed -- 2025-10-18
We built 3B and 8B models that rival GPT-5 at HTML extraction while costing 40-80x less - fully open source -- 2025-10-17
Comparing Popular AI Evaluation Platforms for 2025 -- 2025-10-17
State of AI Report 2025 -- 2025-10-17
Signed Backdoor Hiding in Plain Sight on Framework Devices -- 2025-10-15
Three ways formally verified code can go wrong in practice -- 2025-10-15
Jeep pushed software update that bricked all 2024 Wrangler 4xe models -- 2025-10-15
junron/agar -- 2025-10-15
A modern approach to preventing CSRF in Go -- 2025-10-15
Stop flexing Pass@N — show Pass-all-N -- 2025-10-11
Architecting a project for optimal AI coding, any tips? -- 2025-10-11
Basekick-Labs/arc -- 2025-10-11
ServiceNow-AI/Apriel-1.5-15b-Thinker -- 2025-10-11
meituan-longcat/LongCat-Flash-Chat -- 2025-10-11
Did anyone try out GLM-4.5-Air-GLM-4.6-Distill ? -- 2025-10-10
Thank you Anthropic & this community! Our little side project just hit 1M visits and even made it on National TV! -- 2025-10-10
Sneak Preview: Ollama Bench -- 2025-10-08
When Curl Works but IntelliJ Doesn't: The Ollama Connection Mystery -- 2025-10-08
Local Open Deep Research with Offline Wikipedia Search Source -- 2025-10-07
Ollama drops MI50 support -- 2025-10-07
CoexistAI Now Supports Docker Setup, Also now you can turn any text into Podcasts and Speech Easily -- 2025-10-07
MCP_File_Generation_Tool - v0.6.0 Update! -- 2025-10-07
How do I help Codex critique my ideas rather than just go along with it everytime? -- 2025-10-06
Plan with Codex, code with Sonnet 4.5. What's your simple workflow here? -- 2025-10-06
aminofox/zentrox -- 2025-10-06
Linus Torvalds Vents over "Completely Crazy Rust Format Checking" -- 2025-10-06
vllm setup for nvidia (can use llama) -- 2025-10-05
Full-fine tuning doesn't require much vRAM with gradient checkpointing... -- 2025-10-05
Qwen/Qwen3-Omni-30B-A3B-Thinking -- 2025-10-05
inclusionAI/Ring-mini-linear-2.0 -- 2025-10-05
llama.cpp: Quantizing from bf16 vs f16 -- 2025-10-05
GLM 4.6 is nice -- 2025-10-04
NVFP4 or MXFP4 MOE on sm120 (RTX 5900 RTX 6000 PRO) -- 2025-10-04
K2-Think 32B - Reasoning model from UAE -- 2025-10-03
MoonshotAI/checkpoint-engine -- 2025-10-03
Whither the Chip Shortage? -- 2025-10-02
A tiny receipt per AI run: κ (stress), Δhol (drift), and guards—in plain JSON. -- 2025-10-02
Microsoft Agent Framework (Preview): Making AI Agents Simple for Every Developer -- 2025-10-02
How bad to have RTX Pro 6000 run at PCIE x8? -- 2025-09-24
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code -- 2025-09-24
SWE-Bench Pro -- 2025-09-23
Investigating Training Data Detection in AI Coders -- 2025-09-23
Comparison H100 vs RTX 6000 PRO with VLLM and GPT-OSS-120B -- 2025-09-23
Answer Matching Outperforms Multiple Choice for Language Model Evaluation -- 2025-09-21
facebook/MobileLLM-R1-950M -- 2025-09-20
OpenGVLab/InternVL3_5-241B-A28B -- 2025-09-20
KBlueLeaf/HDM-xut-340M-anime -- 2025-09-20
Definitive proof openai/gpt-oss-20b is dumb as hell -- 2025-09-19
Qwen3‑Next‑80B‑A3B‑Instruct (FP8) on Windows 11 WSL2 + vLLM + Docker (Blackwell) -- 2025-09-19
Free 10%+ Speedup for CPU/Hybrid Inference on Intel CPUs with Efficiency Cores -- 2025-09-17
PSA/RFC: KV Cache quantization forces excess processing onto CPU in llama.cpp -- 2025-09-15
native tool calling support for DeepSeek V3.1 just merged in llama.cpp -- 2025-09-15
model : add grok-2 support by CISC · Pull Request #15539 · ggml-org/llama.cpp -- 2025-09-15
[Editorial] Defeating Nondeterminism in LLM Inference -- 2025-09-14
Nvidia Unveils Rubin CPX Amidst Chart-Topping Blackwell Ultra MLPerf Results -- 2025-09-14
Repair-R1: Better Test Before Repair -- 2025-09-14
Jupyter Agents: training LLMs to reason with notebooks -- 2025-09-14
Intel Files Patent for "Software Defined Super Cores" -- 2025-09-04
I tried almost every tts model on my ryzen 7 5000 series 16gb ram rtx 3060 laptop 6-8GB Vram -- 2025-09-02
devnen/Kitten-TTS-Server -- 2025-09-02
internlm/Intern-S1-mini -- 2025-09-02
stepfun-ai/Step-Audio-2-mini -- 2025-09-02
QuEST/Quartet authors discuss their work on SOTA 4-bit training optimizations -- 2025-09-01
F-Stack – A network development kit with high performance based on DPDK -- 2025-09-01
An Empirical Study of Knowledge Distillation for Code Understanding Tasks -- 2025-09-01
A Comparative Analysis of Vision Language Models for Scientific Data Interpretation -- 2025-08-31
CaddyManager 0.0.1 – Web UI for managing Caddy servers -- 2025-08-30
[Editorial] AI interfaces for future -- 2025-08-29
I’ve Debugged 100+ RAG/LLM Pipelines. These 16 Bugs Always Come Back. (70 days, 800 stars) -- 2025-08-29
Updates to Consumer Terms and Privacy Policy -- 2025-08-29
LLM speedup breakthrough? 53x faster generation and 6x prefilling from NVIDIA -- 2025-08-29
Intel Granite Rapids CPU on sale at Newegg up to 65% off MSRP -- 2025-08-29
unsloth/DeepSeek-V3.1-GGUF -- 2025-08-29
Deepseek V3.1 benchmarks released -- 2025-08-25
Is openrouters tokens per second reading super bugged? -- 2025-08-22
It’s a Pi, But it’s not Quite a Raspberry Pi -- 2025-08-22
nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 -- 2025-08-21
Mistral 7B fine tuning training loss stagnant after adding more fine tuning prompts -- 2025-08-20
Detecting Hallucinations in LLM Function Calling with Entropy (Part 2) -- 2025-08-20
Anyone have the deets on ROCM 7.0's 3x perf claims? -- 2025-08-19
Rust in 2025: Targeting foundational software -- 2025-08-19
I built a small cli tool to execute agentic workflows -- 2025-08-19
AvatarNova - Local AI companion -- 2025-08-19
🤖 Built an AI-powered DOCX viewer that extracts & analyzes images with Ollama! -- 2025-08-19
Davincible/claude-code-open -- 2025-08-19
OpenVINO GenAI 2025.2 adds a GGUF reader (preview) -- 2025-08-18
CLI Agent that Supports Multiple Models? -- 2025-08-18
Is there a standard oci image format for models? -- 2025-08-18
moonshotai/Kimi-K2-Instruct -- 2025-08-17
KittenML/kitten-tts-nano-0.1 -- 2025-08-17
ilkerzgi/Overlay-Kontext-Dev-LoRA -- 2025-08-17
JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 -- 2025-08-17
PSA: Don't waste time trying Gemma 3 27B on V100s - it's architecturally impossible -- 2025-08-16
People with MacBook Pro with 36gb of memory, which models you are running for coding? -- 2025-08-16
GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface -- 2025-08-13
TextQuests: How Good are LLMs at Text-Based Video Games? -- 2025-08-13
Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face -- 2025-08-09
[Editorial] https://unhypedai.substack.com/p/unhyped-ai-week-4-digest -- 2025-08-04
100+ AI Benchmarks list -- 2025-08-04
google/langextract -- 2025-08-04
Chain-GPT/Solidity-LLM -- 2025-07-25
Anyone interested in adding their fine-tuned / open source models to this benchmark? -- 2025-07-25
What kind of rig would you build with a 5k budget for local LLM? -- 2025-07-16
What is your "perfect" £10,000 for Local LLM, Gaming, plex with the following conditional and context. -- 2025-07-16
How to use Claude code -- 2025-07-16
unsloth/Kimi-K2-Instruct-GGUF -- 2025-07-16
moonshotai/Kimi-K2-Base -- 2025-07-16
It's been a while, I'm out of date, suggest me a model -- 2025-07-16
i need the best local llm i can run on my gaming pc -- 2025-07-16
Import of ChatGPT Export Zip File with Images of entire previous chats -- 2025-07-14
Which is the best small local LLM models for tasks like doing research and generating insights -- 2025-07-10
Looking for practical advice with my MSc thesis “On-Premise Orchestration of SLMs” (OpenWebUI + SLM v LLM benchmarking on multiple GPUs) -- 2025-07-10
Deep Research with local LLM and local documents -- 2025-07-10
WikipeQA : An evaluation dataset for both web-browsing agents and vector DB RAG systems -- 2025-07-10
Looking for advice. -- 2025-07-10
OpenAI to release open-source model this summer - everything we know so far -- 2025-07-09
Accelerating Docker Builds by Halving EC2 Boot Time -- 2025-06-30
Show HN: Inspect and extract files from MSI installers directly in your browser -- 2025-06-28
Meet Mistral Devstral, SOTA open model designed specifically for coding agents -- 2025-06-26
1.93bit Deepseek R1 0528 beats Claude Sonnet 4 -- 2025-06-26
DeepSeek R1 05/28 performance on five independent benchmarks -- 2025-06-26
Few-Shot Examples: Overfitting / Leakage -- 2025-06-26
Finetune a model to think and use tools -- 2025-06-26
I need help using open web UI with Ollama. Help installing and getting it running win 11 -- 2025-06-26
I built/am building a micro-transformer for learning and experimentation -- 2025-06-26
I shipped more code yesterday with Claude 4 than the last 3 weeks combined -- 2025-06-26
A deep dive into self-improving AI and the Darwin-Gödel Machine -- 2025-06-26
Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) -- 2025-06-25
Is it possible to give Gemma 3 or any other model on-device screen awareness? -- 2025-06-25
Chonkie update. -- 2025-06-25
Memory Layer Compatible with Local Llama -- 2025-06-25
Ollama/AnythingLLM on Windows 11 with AMD RX 6600: GPU Not Utilized for LLM Inference - Help! -- 2025-06-25
How can synthetic data improve a model if the model was the thing that generated that data? -- 2025-06-25
After reading OpenAI's GPT-4.1 prompt engineering cookbook, I created this comprehensive Python coding template -- 2025-06-25
The 55% Regret Club: How AI-First Companies Are Learning Lessons the Hard Way -- 2025-06-25
Microsoft-backed Builder.ai enters insolvency proceedings -- 2025-06-25
Show HN: FaynoSync Self-Hosted API for Automatic App Updates -- 2025-06-23
identicallead/mse6 -- 2025-06-22
GCC 13.4 Released with 129 additional bug fixes -- 2025-06-22
Databricks acquires Neon -- 2025-06-22
Java Virtual Threads Ate My Memory: A Web Crawler's Tale of Speed vs. Memory -- 2025-06-20
Show HN: Zeekstd – Rust Implementation of the ZSTD Seekable Format -- 2025-06-20
DeepSeek R1 05 28 Tested. It finally happened. The ONLY model to score 100% on everything I threw at it. -- 2025-06-17
ubergarm/DeepSeek-R1-0528-GGUF -- 2025-06-17
LLM training on RTX 5090 -- 2025-06-17
[DEMO] I created a coding agent that can do dynamic, runtime debugging. -- 2025-06-17
Is anyone productively using Aider and Ollama together? -- 2025-06-17
For everyone who's still confused by Attention... I made this spreadsheet just for you(FREE) -- 2025-06-17
What setup/model do you use and what’s your monthly spend? -- 2025-06-17
Xiaomi released an updated 7B reasoning model and VLM version claiming SOTA for their size -- 2025-06-17
hhftechnology/middleware-manager -- 2025-06-17
Show HN: McWig – A modal, Vim-like text editor written in Go -- 2025-06-17
The Unreliability of LLMs and What Lies Ahead -- 2025-06-10
007: Democratically Finding The Cause of Packet Drops -- 2025-06-08
langtalks/swe-agent -- 2025-06-08
wey-gu/py-pglite -- 2025-06-08
0-$π$ qubit in one Josephson junction -- 2025-06-07
100-kT Magnetic field generation using paisley targets by femtosecond laser-plasma interactions -- 2025-06-07
fileshare-go/fileshare -- 2025-06-04
Rust Coreutils 0.1.0 Release -- 2025-06-04
Show HN: Samchika – A Java Library for Fast, Multithreaded File Processing -- 2025-06-04
100 Drivers, 2200 km: A Natural Dataset of Driving Style toward Human-centered Intelligent Driving Systems -- 2025-06-04
100+ Metrics for Software Startups - A Multi-Vocal Literature Review -- 2025-06-04
Building a plug-and-play vector store for any data stream (text, audio, video, etc.)—searchable by your LLM via MCP -- 2025-05-29
Building a real-world LLM agent with open-source models—structure > prompt engineering -- 2025-05-29
New LocalLLM Hardware complete -- 2025-05-29
Parameter-Efficient Fine-Tuning (PEFT) Explained -- 2025-05-29
LLM help for recovering deleted data? -- 2025-05-29
AI Runner v4.10.0 Release Notes -- 2025-05-29
Unpopular opinion: RAG is actively hurting your coding agents -- 2025-05-29
Teal – A statically-typed dialect of Lua -- 2025-05-29
I think it's time to give Nix a chance -- 2025-05-29
deepseek-ai/DeepSeek-R1-0528 -- 2025-05-29
AM5 or TRX4 for local LLMs? -- 2025-05-29