Open-Weight Models

Open-source model releases, community models, licensing

941 articles across 221 editions

Articles

nvidia/Nemotron-Labs-Audex-30B-A3B · Hugging Face -- 2026-07-08
Gemma 4 Technical Report -- 2026-07-08
Hugging Face and Cerebras bring Gemma 4 to real-time voice AI -- 2026-07-08
Qwen's J-Space — Anthropic's Discovery of an Internal Model Global Workspace -- 2026-07-07
Physics-Style Attractor System Replaces Neural Networks for Word Embeddings — Hits SimLex-999 ρ=0.36 on 7.5% of Wikipedia -- 2026-07-07
Why Specialization Is Inevitable -- 2026-07-07
Follow-up: GLM-5.2 NVFP4 on four DGX Sparks — the MTP mystery is solved, and it's now ~24 tok/s at 128K context -- 2026-07-06
Follow-up: DeepSeek V4 Flash on 2x RTX PRO 6000 finishes real coding tasks faster than Sonnet and Opus, at about Sonnet quality -- 2026-07-06
Best Local VLMs - July 2026 -- 2026-07-06
I benchmarked PrismML's 1-bit Bonsai-8B against IBM's Granite on CPU tool calling. The 1-bit model won, but only with grammar-constrained decoding -- 2026-07-06
NASA testing local LLM inference for future space missions -- 2026-07-06
Hierarchos: Preliminary Findings From a 232M Recurrent Memory-Augmented Assistant Model -- 2026-07-03
DiScoFormer: One transformer for density and score, across distributions -- 2026-07-03
Mapping Local Nodes — Visualizing Model Activation Paths -- 2026-07-03
Microsoft has taken down fastcontext model from everywhere -- 2026-07-02
Orthrus (diffusion head) trained Qwen 3.5/3.6 and Gemma 4 models are dropping soon -- 2026-07-02
SenseNova-U1-8b-MoT-Infographic-V2 (released yesterday) - An open source SOTA beast for infographic design and image editing. -- 2026-07-02
on Dario's statement -- 2026-07-02
After i distilled a 26B into a 4B to cut false positives I'll probably use the base model after all. -- 2026-07-02
LongCat-2.0, a large-scale MoE model with 1.6T total and 48B Active -- 2026-07-01
[Editorial] Google Gemini Model Family Updates -- 2026-07-01
InternScience/Agents-A1 · Hugging Face -- 2026-07-01
[Editorial] Video Submission -- 2026-07-01
Accio-Lab/Dressage -- 2026-07-01
dondai1234/agent-browser -- 2026-07-01
Emry: an event-sourced, local-first observability engine for long training runs -- 2026-07-01
Position: AI Safety Requires Effective Controllability -- 2026-07-01
Godot will no longer accept AI-authored code contributions -- 2026-07-01
I built a desktop AI that scrubs your PII locally before it hits the cloud -- 2026-07-01
DeepSeek V4 official version launching mid-July -- 2026-06-30
OpenPangu-2.0-Flash: 92B MoE (6B active) on Ascend with 512K context -- 2026-06-30
Anthropic's Amodei: "Open Source models [could take us to] a very dangerous place." -- 2026-06-30
Even Google still believes in small models for coding — Gemma 4 31B hackathon at 1500 tok/s -- 2026-06-30
deepseek-ai/DeepSeek-V4-Pro-DSpark · Hugging Face -- 2026-06-29
High-quality GLM-5.2 Quant on 4x DGX Spark - Guide, Results, and Comps -- 2026-06-29
Ornith 35B is great so far -- 2026-06-29
Book Review: Domain-Specific Small Language Models by Guglielmo Iozzia -- 2026-06-29
Previewing GPT‑5.6 Sol: a next-generation model -- 2026-06-29
[Editorial] -- 2026-06-29
[Editorial] -- 2026-06-29
[Editorial] -- 2026-06-27
US government allows Anthropic limited release of AI model that sparked cybersecurity concerns | CNN Business -- 2026-06-27
Fable 5 return RUMORED with some hints in CC -- 2026-06-27
[Editorial] -- 2026-06-27
[Editorial] -- 2026-06-27
[Editorial] -- 2026-06-27
trotsky1997/OpenFugu -- 2026-06-27
SPIRAL: Learning to Search and Aggregate -- 2026-06-27
LFM2.5 230M running in-browser at 1,400 tok/s using custom WebGPU kernels -- 2026-06-26
Got GLM-5.2 + MTP speculative decode running on 4x DGX Spark (GB10) — and the build piece the public recipe is missing -- 2026-06-26
Run a vLLM Server on HF Jobs in One Command -- 2026-06-26
Running Sonnet 4.6 on every Instagram DM for a 7-location restaurant. 97% cache hit is the only reason it's affordable -- 2026-06-26
[Editorial] rupixel -- 2026-06-26
GLM-5.2 is a step change for open agents -- 2026-06-25
poolside/Laguna-M.1 · Hugging Face - 225B-A23B -- 2026-06-25
Mimo 2.5 is _fast_ at large context (dual RTX Pro 6000) -- 2026-06-25
The Eagle(3) has landed (for Qwen) -- 2026-06-25
CPU-only TTS benchmark: Kokoro 82M vs Supertonic 3 vs Inflect-Nano-v1 (4.6M params), with UTMOS scoring on every sample -- 2026-06-25
Ideogram 4: Open Image Model at the Forefront of Design -- 2026-06-25
QUEST-35B: Open-Source Deep Research Agent Trained on 32 H100s — Full Recipe, Weights, and Data Released -- 2026-06-25
North Mini Code: 4-Bit Quant + Ollama + OpenRouter — Now Runs on 20GB -- 2026-06-25
EdgeRazor: Mixed-Precision Quantization-Aware Distillation Down to 1.58-Bit -- 2026-06-25
[Editorial] Qwable-3.6-27b — Open Model Release -- 2026-06-25
ScenemaAI/scenema-audio — Audio Generation Model -- 2026-06-25
"Tokaine Addiction" — PhD Student's Colleague Can't Stop Running AI Agents -- 2026-06-25
Suitcase Robot Uses Gas Sensor to Modulate LLM Sampler Temperature in Real-Time -- 2026-06-25
GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench -- 2026-06-19
unsloth GLM-5.2-GGUF, including 2bit at 238GB -- 2026-06-19
Running local models is good now -- 2026-06-19
Gemma 12b less than 10 watts 6.5pp 1.3tg -- 2026-06-19
Rio 3.5 397B could've simply been a semi-failed embezzling of funding -- 2026-06-19
VibeThinker-3B: what is this witchcraft? Killing it at MathQA like it has ~30B parameters -- 2026-06-18
Get in here: Community model build thread -- 2026-06-18
bartowski/command-a-plus-05-2026-GGUF · Hugging Face -- 2026-06-18
nvidia/Nemotron-Labs-Diffusion-14B -- 2026-06-18
[Editorial] Microsoft Majorana 2: Quantum Discovery via Agentic AI -- 2026-06-17
cuTile Rust: Safe, Data-Race-Free GPU Kernels in Rust -- 2026-06-17
[Editorial] Adversarial AI Research — The Malicious Use of Artificial Intelligence -- 2026-06-17
Heretic Grimoire 1.4: Takedown-Resilient Model Backup — 9KB Reproducible Manifests + IPFS Distribution -- 2026-06-16
z.ai Poll: MIT-Licensed Open Weights Are Losing -- 2026-06-16
[Editorial] AMD CEO Lisa Su Challenges NVIDIA's GPU Dominance -- 2026-06-16
PonyExl3: EXL3 Quantization Ported to Apple Silicon — 2700 tok/s Prefill, 68.5 tok/s Decode on M5 Max -- 2026-06-16
My Homelab AI Dev Platform -- 2026-06-16
[Editorial] Video Content -- 2026-06-16
[Editorial] Video Content -- 2026-06-16
[Editorial] Video Content -- 2026-06-16
[Editorial] StandardAgents Arrow-JS — JavaScript Agent Framework -- 2026-06-16
archex: Local-First Deterministic Code-Context for AI Agents — No API Key, No Telemetry (Apache 2.0) -- 2026-06-16
Ironsmith: Open Source macOS App That Creates macOS Apps From Prompts — Works With Local Models -- 2026-06-16
Claude Mythos 5 + Fable 5 Are Here And The Numbers Are INSANE -- 2026-06-10
What it feels like to work with Mythos -- 2026-06-10
[Editorial] Reuven Cohen on maximizing AI tools -- 2026-06-10
How much of Thermo Fisher's antibody data has been manipulated? -- 2026-06-09
News Sites are Blocking Internet Archive over AI Scraping Fears -- 2026-06-09
OneDrive data now has an expiry date -- 2026-06-09
Dopamine Fracking -- 2026-06-09
ESP32 Bit Pirate, a Hardware Hacking Tool with WebCLI That Speaks Every Protocol -- 2026-06-09
Porting the ThinkPad X61 to Coreboot -- 2026-06-09
Holo3.1: Fast & Local Computer Use Agents -- 2026-06-03
Fused MoE dispatch kernel in pure Triton: 89-131% of Megablocks, runs on AMD with zero code changes -- 2026-06-03
ReAligned-Qwen3.5 Release -- 2026-06-03
KANX: A production-ready Kolmogorov-Arnold Network library -- 2026-06-03
Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains -- 2026-06-02
Tencent Hy-MT2 is now under Apache License 2.0 -- 2026-06-02
fxyz666/LogicPipe -- 2026-06-02
[OSS] dlmserve - first serving engine for diffusion language models -- 2026-06-02
veryyoldman/Genspark-AI -- 2026-06-02
Qwen/Qwen-Image-Bench · Hugging Face -- 2026-06-02
Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action -- 2026-06-02
Nvidia announces new AI chip for personal computers -- 2026-06-02
Nvidia LocateAnything - Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding. (10x faster than Qwen3-VL) -- 2026-06-02
The $500K AI Film That "Premiered at Cannes" Was Not in the Official Festival -- 2026-06-02
[Editorial] Barracuda Nightmare Eclipse Zero-Days -- 2026-05-29
[Editorial] Video -- 2026-05-29
Krasis update: Qwen3.6-35B-A3B (Q4) at reading speed, 1x 8GB 3070 Mobile laptop (32GB RAM) -- 2026-05-28
Benchmarked Needle 26M vs Qwen3-0.6B on CPU function calling, 50 queries across 5 difficulty tiers. The 23x smaller model wins on accuracy and is 4.4x faster. -- 2026-05-28
Small comparison on full compute performance (Anima) of 5090 vs 6000 PRO MaxQ vs 6000 PRO WS/SE -- 2026-05-28
CXMT started selling ram to corsair -- 2026-05-28
TTS Benchmark Comparison (all known TTS up until May 2026) -- 2026-05-28
BitCPM-CANN: Native 1.58-Bit Large Language Model Training on Ascend NPU -- 2026-05-26
Carbon: Decoding the Language of Life -- 2026-05-26
AI content detector based on Qwen 0.8b fine-tuned on Pangram dataset -- 2026-05-26
A tool I built to generate 3D objects with functional, articulated parts. It's on github, and is mostly LLM-agnostic. -- 2026-05-26
Show HN: Audiomass – a free, open-source multitrack audio editor for the web -- 2026-05-26
CAM-LDS: Cyber Attack Manifestations for Automatic Interpretation of System Logs and Security Alerts -- 2026-05-26
aaron-kidwell/goLoL -- 2026-05-26
[Editorial] -- 2026-05-26
lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled -- 2026-05-22
Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face -- 2026-05-22
Qwen 3.6 35B GGUF: NTP vs MTP quantization results across GPUs and CPUs -- 2026-05-22
[WIP] Gemma 4 MTP -- 2026-05-22
Lemonade v10.5.1: an MTP + ROCm 7.13 quick start for Strix Halo -- 2026-05-22
gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic is Out Now -- 2026-05-22
Gemma-4-Gembrain-31B-it-uncensored-heretic Is Out Now -- 2026-05-22
Newbie vibe coding experience: Shifting from Claude Sonnet 4.6 to Qwen3.6-35B-A3B-UD-Q6_K -- 2026-05-22
HalBench: Open Sycophancy and Hallucination Benchmark — 12,800 Graded Responses -- 2026-05-21
DystopiaBench: 42 LLMs tested on willingness to build dystopian systems -- 2026-05-21
Guardrails take an 8B model from 53% to 99% on agentic tasks [ACM CAIS '26 preprint] -- 2026-05-21
Introducing the Ettin Reranker Family -- 2026-05-20
OlmoEarth v1.1: A more efficient family of models -- 2026-05-20
Mini Shai-Hulud Strikes Again: 314 npm Packages Compromised -- 2026-05-20
[Editorial] -- 2026-05-20
exploitbench/exploitbench -- 2026-05-20
bytedance released an open source model that attempts to do just about anything with only 3b parameters -- 2026-05-19
Sapient Intelligence releases HRM-Text 1B: 40B tokens, ~$1k pretrain, beats Llama3.2 3B on MATH and DROP -- 2026-05-19
PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend -- 2026-05-19
DeepSeek-V4-Flash W4A16+FP8 with MTP self-speculation: 85 tok/s @ 524k on 2x RTX PRO 6000 Max-Q -- 2026-05-15
NCCL-Free Tensor Parallelism on Dual Blackwell PCIe — llama.cpp b9095 -- 2026-05-15
The Qwen 3.6 35B A3B hype is real!!! -- 2026-05-15
A First Comprehensive Study of TurboQuant: Accuracy and Performance -- 2026-05-15
Developing Open Source LLM from Ground Up — DeepSeek V3 Architecture on Single Blackwell GPU -- 2026-05-15
z-lab/Qwen3.6-27B-DFlash -- 2026-05-15
[Editorial] Arxiv Research Paper 2603.15423 -- 2026-05-11
[Editorial] Arxiv Research Paper 2603.17378 -- 2026-05-11
[Editorial] Gadi Evron — Predicting AI Trends -- 2026-05-11
iai-mcp: Persistent Claude Memory Daemon — 5 Months of Daily Use, Now Open Source -- 2026-05-08
[Editorial] Claude Skins — Custom Claude Code Personas -- 2026-05-08
[Editorial] AI Agents Projects & Tutorials Collection -- 2026-05-08
Does the "6 months gap" still hold? -- 2026-05-07
Claude Code @ Opus 4.7 vs OpenCode @ qwen3.6:27b. Both shipped a playable cozy roguelite. -- 2026-05-07
Fine-tuned Qwen3.6-35B-A3B DeltaNet experiment -- 2026-05-07
Zyphra/ZAYA1-8B -- 2026-05-07
Virtual violin produces realistic sounds (MIT) -- 2026-05-06
Lightricks/LTX-2.3-22b-IC-LoRA-HDR -- 2026-05-06
"Second Thoughts" — A small transformer reads output and feeds it back as a refinement loop, drastically improving a 1.7B model's coding -- 2026-05-06
A plug-n-play open-source pruning tool that is workload-aware (Sculpt) -- 2026-05-06
"I" is not singular — 4 LLM agents with per-agent LoRA on a single RTX 3070 8GB -- 2026-05-06
I made a visualizer for Hugging Face models (hfviewer.com) -- 2026-05-06
Humanoid Robot Actuators -- 2026-05-05
Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models -- 2026-05-05
guidelabs/steerling-8b — Steerable 8B Model -- 2026-05-05
nvidia/Gemma-4-31B-IT-NVFP4 — NVIDIA FP4 Quantized Gemma 4 -- 2026-05-05
Qwen 3.6 27B Neo Code Q4_KM running tax accounting on a Ryzen laptop -- 2026-05-05
Qwen3.6-27B gets stuck in a self-affirming thinking loop -- 2026-05-05
Mistral Medium 3.5 128B — MLX 4-bit Conversion with Vision and 256K Context -- 2026-05-04
Chirp: Native Offline Text-to-Speech Desktop App (Kokoro + Qwen3-TTS) -- 2026-05-04
TinyMozart v2 85M — Unconditional MIDI Piano Music Generation -- 2026-05-04
SuperGemma4-26B Uncensored MLX 4-bit v2 -- 2026-05-04
Open WebUI Skill for Auto-Creating Tools with Qwen3.6/Gemma4 -- 2026-05-04
quant-whisper: Terminal-Native Algo Trading Engine with Local LLM Inference -- 2026-05-04
[Editorial] Finding Zero-Days with Any Model -- 2026-05-01
Qwen 3.6 27B Makes Huge Gains in Agency on Artificial Analysis - Ties with Sonnet 4.6 -- 2026-04-30
First direct side by side MoE vs Dense comparison -- 2026-04-30
GPT-6 Confirmed -- 2026-04-30
Google to invest up to $40 billion in Anthropic as search giant spreads its AI bets -- 2026-04-30
convert : add support for Nemotron Nano 3 Omni by danbev - llama.cpp PR #22481 -- 2026-04-30
SWE-bench Verified no longer measures frontier coding capabilities -- 2026-04-29
Opus 4.7: Are these first signs of model collapse? -- 2026-04-29
Qwen3.6-27B IQ4_XS FULL VRAM with 110k context -- 2026-04-28
Can we already use Google's TurboQuant (TQ) for KV Cache in llama-server? Or are we waiting for a PR? -- 2026-04-28
Gemma 4 beats Qwen 3.5 (UPDATE), and Qwen 3.6 27B + MiniMax M2.7 is the best OpenCode setup -- 2026-04-28
Tried Qwen3.6-27B-UD-Q6_K_XL.gguf with Claude Code, well I can't believe but it is usable -- 2026-04-28
BitNet is the AI future? -- 2026-04-28
How Anthropic's Model Context Protocol Allows for Easy Remote Execution -- 2026-04-27
[Editorial] LinkedIn: AI Industry Perspective -- 2026-04-27
[Editorial] PolinRider — Open Source Malware Analysis -- 2026-04-27
FP4 inference in llama.cpp (NVFP4) and ik_llama.cpp (MXFP4) landed -- 2026-04-27
[Editorial] Bonsai-8B MLX 1-bit -- 2026-04-27
VRAM.cpp: Running llama-fit-params directly in your browser -- 2026-04-27
Thoughts on using an AMD Alveo V80 FPGA as a poor man's Taalas HC1 -- 2026-04-27
[Editorial] hw-smi — Cross-Platform Hardware Monitor -- 2026-04-27
China's DeepSeek valuation rockets above $20B!! -- 2026-04-24
[Editorial] DeepSeek Open-Sources Tile Kernels -- 2026-04-24
DeepSeek v4 -- 2026-04-24
NSA is using Anthropic's Mythos despite blacklist -- 2026-04-22
[Editorial] Mad Bugs: Reverse Engineering Deep Dive -- 2026-04-22
Anyone deployed Kimi K2.6 on their local hardware? -- 2026-04-21
24/7 Headless AI Server on Xiaomi 12 Pro (Guide & Benchmarks) Gemma4 VS Qwen2.5 -- 2026-04-21
Gemma4:e4B-IT good at instructions following no refusals. -- 2026-04-21
[Editorial] arxiv:2506.02153 — AI Research -- 2026-04-20
[Editorial] arxiv:1503.02531 — Classic ML/AI Paper -- 2026-04-20
[Editorial] arxiv:2604.06169 — Recent AI Research -- 2026-04-20
[Editorial] Why Your LLM Is Slow (and the Eight Fixes) -- 2026-04-20
Hot Experts in your VRAM! Dynamic expert cache in llama.cpp for 27% faster token generation -- 2026-04-20
Gemma 4 26B fabricated an entire code audit. I have the forensic evidence from the database. -- 2026-04-16
[Editorial] Looped LLMs Are the Nuclear Fusion of AI -- 2026-04-16
All elementary functions from a single binary operator -- 2026-04-14
GLM 5.1 "Actually wait" — the current thinking SOTA open source -- 2026-04-14
Are i-Quants overrated? -- 2026-04-14
Gemma 4 E4B vs Qwen3.5-4B on document tasks — sub-scores tell a different story -- 2026-04-14
Using NPU for something useful — whisper-npu for Intel NPU speech-to-text -- 2026-04-14
[Editorial] -- 2026-04-14
[Editorial] -- 2026-04-13
Qwen3.5-397B is shockingly useful at Q2 -- 2026-04-13
[Editorial] -- 2026-04-13
Liquid AI releases LFM2.5-VL-450M - structured visual understanding at 240ms -- 2026-04-13
Muse Spark: Scaling towards personal superintelligence -- 2026-04-13
[Editorial] -- 2026-04-13
[Editorial] -- 2026-04-13
Abliterating Qwen3.5-397B on a Mac Studio revealed that MoE models encode refusal differently than dense models — safety refusals route through expert selection and survive weight-baking -- 2026-04-09
Finally Abliterated Sarvam 30B and 105B! -- 2026-04-09
Sam Altman may control our future – can he be trusted? -- 2026-04-07
Qwen3.6-Plus -- 2026-04-07
Omnivoice - 600+ Language Open-Source TTS with Voice Cloning and Design -- 2026-04-07
[Editorial] Generative Models and High-Quality Output -- 2026-04-07
A cryptography engineer's perspective on quantum computing timelines -- 2026-04-07
[Editorial] -- 2026-03-28
[Editorial] -- 2026-03-28
[Editorial] -- 2026-03-28
[Editorial] -- 2026-03-28
[Editorial] -- 2026-03-28
The current state of the Chinese LLMs scene -- 2026-03-26
Alibaba confirms they are committed to continuously open-sourcing new Qwen and Wan models -- 2026-03-26
Cursor's Composer 2 apparently built on Kimi K2.5 without attribution -- 2026-03-26
Nemotron Cascade 2 30B A3B -- 2026-03-26
Controllable Reasoning Models Are Private Thinkers -- 2026-03-26
Nvidia built a silent opinion engine into NemotronH to gaslight you and they're not the only ones doing it -- 2026-03-26
Gemini thoughts turned very violent -- 2026-03-26
Qianfan-OCR: End-to-End 4B Document Intelligence VLM — SOTA on OmniDocBench -- 2026-03-24
AMD Quark: Under-the-Radar Quantization Tool with MXFP4 Post-Training -- 2026-03-24
We Compressed 6 LLMs and Found They Don't Degrade the Same Way -- 2026-03-24
HumeAI/tada-1b — Expressive Audio Generation Model -- 2026-03-24
The Resolv Hack: How One Compromised Key Printed $23M -- 2026-03-24
Show HN: Three new Kitten TTS models – smallest less than 25MB -- 2026-03-23
Flash-MoE: Running a 397B Parameter Model on a Laptop -- 2026-03-23
[Editorial] Distillation Techniques -- 2026-03-23
Jackrong/Qwen3.5-2B-Claude-4.6-Opus-Reasoning-Distilled-GGUF -- 2026-03-23
Local Qwen 8B + 4B completes browser automation by replanning one step at a time -- 2026-03-23
Does imatrix calibration data affect writing style? I ran a blind-scored experiment -- 2026-03-23
[Editorial] Claude Code Channels Documentation -- 2026-03-23
github.com -- 2026-03-23
Claude Channels vs Dispatch vs Remote Control -- 2026-03-23
[Editorial] Claude Code Is Brilliant Until the Repo... -- 2026-03-23
I made mcp-optimizer - stop wasting tokens on idle MCP servers -- 2026-03-23
[Editorial] Anthropic Is Coming for Lovable et al. -- 2026-03-23
[Editorial] -- 2026-03-20
mlx-tune – fine-tune LLMs on your Mac (SFT, DPO, GRPO, Vision) with an Unsloth-compatible API -- 2026-03-20
Hunter Alpha was a stealth model revealed on March 18th as an early testing version of MiMo-V2-Pro. -- 2026-03-19
mistralai/Mistral-Small-4-119B-2603 -- 2026-03-19
LiquidAI/LFM2-24B-A2B -- 2026-03-19
Mistral releases an official NVFP4 model, Mistral-Small-4-119B-2603-NVFP4! -- 2026-03-18
Omnicoder-Claude-4.6-Opus-Uncensored-GGUF -- 2026-03-18
You can run LLMs on your AMD NPU on Linux! -- 2026-03-18
Rick Beato: "How AI Will Fail Like The Music Industry" (and why local LLMs will win) -- 2026-03-18
[Editorial] Your AI Forgot Everything Again — There's a Fix for That -- 2026-03-18
ylytdeng/wechat-decrypt -- 2026-03-18
[Editorial] LLM Architecture Gallery -- 2026-03-17
Mistral small 4 PR on transformers. -- 2026-03-17
I spent a weekend doing layer surgery on 6 different model architectures. There's a "danger zone" at ~50-56% depth. -- 2026-03-17
Nemotron 3 Super and the no free lunch problem -- 2026-03-17
Source code of Swedish e-government services has been leaked -- 2026-03-14
Bucketsquatting is (finally) dead -- 2026-03-14
E2E encrypted messaging on Instagram will no longer be supported after 8 May -- 2026-03-14
Fine-tuned Qwen3 SLMs (0.6-8B) beat frontier LLMs on narrow tasks -- 2026-03-12
Towards Cold-Start Drafting and Continual Refining: A Value-Driven Memory Approach with Application to NPU Kernel Synthesis -- 2026-03-12
Heretic Defeats GPT-OSS with Arbitrary-Rank Ablation (ARA) Decensoring -- 2026-03-11
Fat Fish — A Proper Upscale and Prune of Mistral Nemo -- 2026-03-11
Optimizing Qwen3 Coder for RTX 5090 and PRO 6000 — Community Benchmarking Infrastructure -- 2026-03-11
Yann LeCun's AMI Labs Raises $1.03 Billion to Build World Models -- 2026-03-11
Did Alibaba just kneecap its powerful Qwen AI team? -- 2026-03-07
[Editorial] You're 1,191 Days Late — Here's What to Do -- 2026-03-07
[Editorial] When Anonymity Fades: What New Research Reveals -- 2026-03-07
Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines -- 2026-03-07
kyutai-labs/hibiki-zero -- 2026-03-07
[Editorial] OpenClawCity -- 2026-03-07
[Editorial] The Zero-Day Clock Is Ticking -- 2026-03-06
[Editorial] Step-by-Step Guide to Exploiting AI Systems -- 2026-03-06
[Editorial] Unprompted 2026: Top Insights Day One -- 2026-03-06
[Editorial] Unprompted 2026: Top Insights Day Two -- 2026-03-06
YuanLabAI/Yuan3.0-Ultra: 1010B MoE, fully open weights -- 2026-03-05
We could be hours (or less than a week) away from true NVFP4 support in Llama.cpp GGUF format -- 2026-03-05
Step-3.5-Flash-Base & Midtrain (in case you missed them) -- 2026-03-05
Qwen3.5-9B Uncensored Aggressive Release (GGUF) -- 2026-03-05
unknown -- 2026-03-05
[Editorial] David Maynor Security Gist -- 2026-03-04
[Editorial] arXiv:2602.23093 -- 2026-03-04
unpromptedcon.org -- 2026-03-04
Unsloth fixed version of Qwen3.5-35B-A3B is incredible at research tasks -- 2026-03-04
PSA: Qwen 3.5 requires bf16 KV cache, NOT f16!! -- 2026-03-04
Building a Dependency-Free GPT on a Custom OS -- 2026-03-04
Qwen3.5 122B in 72GB VRAM (3x3090) is the best model available at this time — also it nails the "car wash test" -- 2026-03-03
Qwen 3.5 is multimodal. Here is how to enable image understanding in opencode with llama cpp -- 2026-03-03
pplx-embed: State-of-the-Art Embedding Models for Web-Scale Retrieval -- 2026-03-03
andimarafioti/faster-qwen3-tts -- 2026-03-03
We tested RLVR on top of fine-tuned small models across 12 datasets — here's exactly when it helps (and when it doesn't) -- 2026-03-02
[Editorial] Weber Electrodynamics & Thermodynamic Memory -- 2026-03-02
Minuspod: Automatically remove ads from podcasts locally -- 2026-03-02
Squidcasa/midipipe: ALSA Sequencer to plain text and back -- 2026-03-02
[Editorial] -- 2026-02-28
Parakeet.cpp – Parakeet ASR inference in pure C++ with Metal GPU acceleration -- 2026-02-28
[Editorial] -- 2026-02-28
Liquid AI releases LFM2-24B-A2B -- 2026-02-24
Qwen3's most underrated feature: Voice embeddings -- 2026-02-24
After many contributions craft, Crane now officially supports Qwen3-TTS! -- 2026-02-24
We tested the same INT8 model on 5 Snapdragon chipsets. Accuracy ranged from 93% to 71%. Same weights, same ONNX file. -- 2026-02-24
[Editorial] -- 2026-02-24
Anthropic Accuses DeepSeek, Moonshot AI, and MiniMax of Creating 24,000 Fake Claude Accounts -- 2026-02-24
Qwen 3.5 vs Gemini 3 Pro on Screenshot-to-Code: Is the gap finally gone? -- 2026-02-19
[Editorial] The evolution of vision models from CNNs to transformers -- 2026-02-19
[Editorial] An AI agent merged code into 22 widely-used open source projects -- 2026-02-19
[Editorial] AI Agent Security and Supply Chain -- 2026-02-19
Policy Compiler for Secure Agentic Systems -- 2026-02-19
[Editorial] OpenClaw Maestro Threat Assessment -- 2026-02-19
Qwen Released Qwen 3.5 397B and Qwen 3.5 Plus! -- 2026-02-17
Qwen3.5 NVFP4 (Blackwell) is up! -- 2026-02-17
Running Gemma 3n E2B natively on Android via LiteRT -- 2026-02-17
Deploying Open WebUI + vLLM on Amazon EKS -- 2026-02-17
Ling-2.5-1T: 1T Parameter Open-Source Instant Model with 1M Context -- 2026-02-16
Qwen3.5-397B-A17B Unsloth GGUFs — Run on Consumer Hardware -- 2026-02-16
Running Qwen3-Coder-Next 80B on 8GB VRAM — 300x Speedup via Custom Expert Caching -- 2026-02-16
[Editorial] https://www.linkedin.com/posts/ownyourai_i-just-woke-up-to-qwen3-coder-next-80b-activity-7424703876240695297-Nlqf -- 2026-02-04
LiquidAI/LFM2.5-1.2B-Thinking -- 2026-02-04
ByteDance-Seed/Stable-DiffCoder-8B-Instruct · Hugging Face -- 2026-02-03
tencent/Youtu-VL-4B-Instruct -- 2026-02-03
NousResearch/NousCoder-14B -- 2026-02-03
transformers v5 final is out 🔥 -- 2026-01-30
PaddlePaddle/PaddleOCR-VL-1.5 -- 2026-01-30
zai-org/GLM-4.7 -- 2026-01-30
MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models -- 2026-01-30
Sharing my set of distilled small language models (3B) + training data in more than 50 low-resource languages -- 2026-01-30
Introducing Kimi K2.5, Open-Source Visual Agentic Intelligence -- 2026-01-28
~60GB models on coding: GLM 4.7 Flash vs. GPT OSS 120B vs. Qwen3 Coder 30B -- your comparisons? -- 2026-01-28
openbmb/AgentCPM-Report -- 2026-01-28
Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice -- 2026-01-28
Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation -- 2026-01-28
One Year Since the “DeepSeek Moment” -- 2026-01-28
Qwen/Qwen-Image-2512 -- 2026-01-28
Qwen/Qwen3-TTS-12Hz-0.6B-Base -- 2026-01-27
Qwen/Qwen3-VL-Reranker-2B -- 2026-01-27
Qwen/Qwen3-VL-Embedding-2B -- 2026-01-21
Phr00t/Qwen-Image-Edit-Rapid-AIO -- 2026-01-21
Qwen/Qwen3-VL-Reranker-8B -- 2026-01-21
Bartowski comes through again. GLM 4.7 flash GGUF -- 2026-01-21
Step-Audio-R1.1 (Open Weight) by StepFun just set a new SOTA on the Artificial Analysis Speech Reasoning leaderboard -- 2026-01-20
GLM-4.7-Flash -- 2026-01-20
stepfun-ai/Step-Audio-R1.1 -- 2026-01-20
tencent/HY-MT1.5-1.8B -- 2026-01-20
Introducing GLM-Image -- 2026-01-14
GPT-OSS -> MLA conversion breakthrough (20B), still looking for compute + collaborators -- 2026-01-14
FrogBoss 32B and FrogMini 14B from Microsoft -- 2026-01-14
Qwen/Qwen3-VL-Embedding-8B -- 2026-01-14
Liquid AI releases LFM2-2.6B-Transcript, an incredibly fast open-weight meeting transcribing AI model on-par with closed-source giants. -- 2026-01-13
New llama.cpp 30% faster.... -- 2026-01-13
Qwen/Qwen-Image-Edit-2511 -- 2026-01-13
nvidia/Nemotron-Orchestrator-8B -- 2026-01-13
tencent/HY-WorldPlay -- 2026-01-09
meituan-longcat/LongCat-Image -- 2026-01-09
Introducing Falcon H1R 7B -- 2026-01-09
[Editorial] https://github.com/hiyouga/LlamaFactory -- 2026-01-07
Tongyi-MAI/MAI-UI-8B · Hugging Face -- 2026-01-07
MultiverseComputingCAI/HyperNova-60B · Hugging Face -- 2026-01-07
Anyone tried IQuest-Coder-V1 yet? The 40B numbers look wild -- 2026-01-06
open-thoughts/OpenThinker-Agent-v1 -- 2026-01-06
[Editorial] https://www.alwaysfurther.ai/blog/train-4b-model-to-beat-claude-sonnet-gemini -- 2026-01-05
[Experimental] Gemma 3 4B - Dark CoT: Pushing 4B Reasoning to 33%+ on GPQA Diamond -- 2026-01-05
Youtu-LLM-2B-GGUF is here! -- 2026-01-05
upstage/Solar-Open-100B -- 2026-01-05
A zero-setup agent that benchmarks multiple open / closed source LLMs on your specific problem / data -- 2026-01-02
What is a good model for assisting with patching source code? -- 2026-01-02
Just got an RTX Pro 6000 - need recommendations for processing a massive dataset with instruction following -- 2026-01-02
MiniMaxAI/MiniMax-M2.1 -- 2026-01-02
[Editorial] https://www.linkedin.com/posts/ownyourai_wow-its-raining-korean-open-ai-models-today-activity-7412133834667876352-fcpl -- 2025-12-31
Naver (South Korean internet giant), has just launched HyperCLOVA X SEED Think, a 32B open weights reasoning model and HyperCLOVA X SEED 8B Omni, a unified multimodal model that brings text, vision, and speech together -- 2025-12-31
allenai/Olmo-3-7B-Instruct -- 2025-12-31
meituan-longcat/LongCat-Video-Avatar -- 2025-12-31
Mistral AI’s December -- 2025-12-31
MBZUAI releases K2-V2 - 70B fully open model. -- 2025-12-22
microsoft/Fara-7B -- 2025-12-22
I've been experimenting with SLM's a lot recently. My goal was to prove even SLMs can be accurate with the right architecture behind it. -- 2025-12-22
Alibaba Tongyi Open Sources Two Audio Models: Fun-CosyVoice 3.0 (TTS) and Fun-ASR-Nano-2512 (ASR) -- 2025-12-19
My professor lent me an A6000, so I tried to build a coding model. Here is Anni! (Qwen3-14B Fine-tune) -- 2025-12-19
Two years ago, I was just a math major. Now I've built the 1.5B router model used by HuggingFace. Can I bring it to Cursor? -- 2025-12-19
🚀 New: Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B -- 2025-12-16
ByteDance/Dolphin-v2 -- 2025-12-16
zai-org/AutoGLM-Phone-9B-Multilingual -- 2025-12-16
[Editorial] https://www.linkedin.com/posts/eric-vyacheslav-156273169_a-7m-model-just-surpassed-deepseek-r1-gemini-activity-7405985266043297792-s1Jn -- 2025-12-15
Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models -- 2025-12-15
Tiny-A2D: An Open Recipe to Turn Any AR LM into a Diffusion LM -- 2025-12-12
New in llama.cpp: Model Management -- 2025-12-12
SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security -- 2025-12-12
Generating synthetic test data for LLM applications (our approach) -- 2025-12-12
PaCoRe: The first open-source deep think 8B model beats GPT-5 on HMMT25 -- 2025-12-11
Building RNJ-1: What makes It different from Gemma 3? -- 2025-12-11
EssentialAI/rnj-1 -- 2025-12-11
VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection -- 2025-12-11
From Azure Functions to FreeBSD -- 2025-12-10
RnJ-1-Instruct FP8 Quantization -- 2025-12-10
Operator Mech v2.5: A Compact Structural-Reasoning Kernel for Local Models (YAML, 7B–13B Optimized) -- 2025-12-10
Masked Diffusion Models as Energy Minimization -- 2025-12-10
https://huggingface.co/Doradus/Hermes-4.3-36B-FP8 -- 2025-12-09
Support for rnj-1 now in llama.cpp -- 2025-12-09
Comfy-Org/flux2-dev -- 2025-12-09
baidu/ERNIE-4.5-VL-28B-A3B-Thinking -- 2025-12-09
Guidance: A cheat code for diffusion models -- 2025-12-09
OVHcloud on Hugging Face Inference Providers 🔥 -- 2025-12-05
smallevals - Tiny 0.6B Evaluation Models and a Local LLM Evaluation Framework -- 2025-12-05
I cooked abliterated gemma3-27b-it with norm-preserving technique -- 2025-12-04
allenai/Olmo-3-1125-32B -- 2025-12-03
EmbeddingGemma: Powerful and Lightweight Text Representations -- 2025-12-03
Building SFT from scratch - results & learnings -- 2025-12-03
Qwen3 VL built from scratch with PyTorch -- 2025-12-03
LM Studio beta supports Qwen3 80b Next. -- 2025-12-03
RTX 5090 + Qwen 30B MoE @ 135 tok/s in NVFP4 - Full guide with C++ patches -- 2025-12-02
PleIAs/Baguettotron -- 2025-12-01
nvidia/ChronoEdit-14B-Diffusers -- 2025-12-01
20x Faster TRL Fine-tuning with RapidFire AI -- 2025-12-01
Optimizing Token Generation in llama.cpp's CUDA Backend -- 2025-12-01
[Editorial] https://www.linkedin.com/posts/erudenko_claudecode-aiagents-opensource-activity-7399658246443216896-dIU0 -- 2025-11-28
DataArcTech/DataArc-SynData-Toolkit -- 2025-11-28
facebook/sam-3d-body-dinov3 -- 2025-11-28
WeiboAI/VibeThinker-1.5B -- 2025-11-28
In depth analysis of Nvidia's Jet Nemotron models -- 2025-11-26
Hidden causes of LLM latency, it's not just the model size -- 2025-11-26
A million ways to die from a data race in Go -- 2025-11-26
You're using HuggingFace wrong. Stop downloading pre-quantized GGUFs and start building hardware-optimized, domain-specific models. Here's the pipeline I built to do it. -- 2025-11-26
Hardcore function calling benchmark in backend coding agent. -- 2025-11-26
Luo-Yihao/FaithC -- 2025-11-20
ReTabAD: A Benchmark for Restoring Semantic Context in Tabular Anomaly Detection -- 2025-11-20
Android Developer Verification Starts as Google Partially Retreats on Measures -- 2025-11-20
How are you all orchestrating multi-agent workflows (beyond one-shot prompt chaining)? -- 2025-11-20
deliveryhero/asya -- 2025-11-20
Do we rely too much on huggingface? Do you think they’ll eventually regulate open source models? Is there any way to distribute them elsewhere? -- 2025-11-18
Build a DeepSeek model from scratch -- 2025-11-18
Easily Build and Share ROCm Kernels with Hugging Face -- 2025-11-18
ibm-granite/granite-4.0-h-350m -- 2025-11-14
tencent/HunyuanWorld-Mirror -- 2025-11-14
[Editorial] https://www.linkedin.com/posts/mreichstein_cybersecurity-carhacking-physicalsecurity-activity-7394425210877218816-NPyT -- 2025-11-13
On USB HID, Keyboard LEDs, and device emulation (2024) -- 2025-11-13
[Editorial] https://www.linkedin.com/posts/stuart-winter-tear_so-reportedly-yann-lecun-plans-to-leave-activity-7394396547276460032-gEE5 -- 2025-11-13
inclusionAI/LLaDA2.0-mini-preview -- 2025-11-13
[Editorial] https://www.linkedin.com/posts/ismaelvelasco_theres-an-ai-text-model-comparable-to-sota-activity-7393850964731912192-nZT1 -- 2025-11-12
Last week in Multimodal AI - Local Edition -- 2025-11-12
antarys-ai/antarys -- 2025-11-11
Visualizing Quantization Types -- 2025-11-11
Co-authored a book called "Build DeepSeek from Scratch" | Live Now -- 2025-11-11
Writing your own BEAM -- 2025-11-11
DIY Powerwall Blows Clouds, Competition Out of the Water -- 2025-11-11
Building a PV Solar-Powered Quadcopter -- 2025-11-07
BAAI/Emu3.5-Image -- 2025-11-07
Riemannian Optimization for LoRA on the Stiefel Manifold -- 2025-11-07
Kimi release Kimi K2 Thinking, an open-source trillion-parameter reasoning model -- 2025-11-06
OpenAI asks U.S. for loan guarantees to fund $1T AI expansion -- 2025-11-06
Kimi released Kimi K2 Thinking, an open-source trillion-parameter reasoning model -- 2025-11-06
Reproducing the AWS Outage Race Condition with a Model Checker -- 2025-11-05
Futurelock: A subtle risk in async Rust -- 2025-11-05
Qwen3-VL-32B Q8 speeds in llama.cpp vs vLLM FP8 on a RTX PRO 6000 -- 2025-11-03
Help me decide: EPYC 7532 128GB + 2 x 3080 20GB vs GMtec EVO-X2 -- 2025-11-03
amd/Nitro-E -- 2025-11-03
DeepSeek may have found a new way to improve AI’s ability to remember -- 2025-11-02
Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection -- 2025-11-02
[Editorial] https://www.linkedin.com/posts/busiel-morley_economic-shifts-in-the-age-of-ai-ugcPost-7390349517612806144-8djS -- 2025-11-01
OpenAI: gpt-oss-safeguard: two open-weight reasoning models built for safety classification (Now on Hugging Face) -- 2025-10-31
briaai/FIBO -- 2025-10-31
vLLM MoE Benchmark Configs for Qwen3 Coder REAP 25B & RTX Pro 6000 -- 2025-10-29
Ollama supports Qwen3-VL locally! -- 2025-10-29
Test results for various models' ability to give structured responses via LM Studio. Spoiler: Qwen3 won -- 2025-10-29
Introducing ExecuTorch 1.0 -- 2025-10-29
[Editorial] new coding model -- 2025-10-28
Cerebras REAP'd GLM4.6: 25%, 30%, 40% pruned FP8 checkpoints on HF! -- 2025-10-28
GLM-4.6 on fresh SWE-bench–style tasks collected in September 2025 -- 2025-10-28
Best LLM for 96G RTX Pro 6000 Blackwell? -- 2025-10-27
Gemma3 model differences -- 2025-10-27
AMD iGPU + dGPU : llama.cpp tensor-split not working with Vulkan backend -- 2025-10-27
Is GLM 4.5 / 4.6 really sensitive to quantisation? Or is vLLM stupifying the models? -- 2025-10-27
AlphaXiv,Compare the Deepseek-OCR and Mistral-OCR OCR models -- 2025-10-26
Open-Bee/Bee-8B-RL -- 2025-10-26
datalab-to/chandra -- 2025-10-26
Unlock the power of images with AI Sheets -- 2025-10-26
[Editorial] Periodic table for ai algorithms -- 2025-10-26
Reverse Engineering STL Files with FreeCAD -- 2025-10-25
Qwen3-VL-32B-Instruct GGUF with unofficial llama.cpp release to run it (Pre-release build) -- 2025-10-25
Qwen3 Next support in llama.cpp ready for review -- 2025-10-25
How's Halo Strix now ? -- 2025-10-25
20x Max Plan (€216) takes 2% of weekly Opus usage for a single Deep Research Query. That equals 50 per week if you use it ONLY for this and never continue or respond -- 2025-10-24
Show HN: Cuq – Formal Verification of Rust GPU Kernels -- 2025-10-24
[Editorial] We need open, uncensored, & local -- 2025-10-23
🚀 HuggingFaceChat Omni: Dynamic policy-based routing to 115+ LLMs -- 2025-10-23
Preliminary support in llama.cpp for Qualcomm Hexagon NPU -- 2025-10-23
LiquidAI/LFM2-2.6B -- 2025-10-23
riptideslabs/tokenex -- 2025-10-20
Multi-Tenant SaaS's Wildcard TLS: An Overview of DNS-01 Challenges -- 2025-10-20
From cloud to OCP? Be ready to wrangle firmware -- 2025-10-20
FLOSS Weekly Episode 851: Buckets of Money -- 2025-10-20
Learning Lifted Action Models From Traces of Incomplete Actions and States -- 2025-10-20
volantvm/volant -- 2025-10-19
Wireshark 4.6.0 Supports macOS Pktap Metadata (PID, Process Name, etc.) -- 2025-10-19
A classified network of SpaceX satellites is emitting a mysterious signal -- 2025-10-19
Show HN: Largest open-source multimodal AI dataset -- 2025-10-18
KORMo-Team/KORMo-10B-sft -- 2025-10-18
ByteDance/FaceCLIP -- 2025-10-18
We built 3B and 8B models that rival GPT-5 at HTML extraction while costing 40-80x less - fully open source -- 2025-10-17
inclusionAI/Ring-flash-linear-2.0 -- 2025-10-17
Qwen/Qwen3-VL-8B-Instruct -- 2025-10-17
Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face -- 2025-10-17
This Week in Security: ID Breaches, Code Smell, and Poetic Flows -- 2025-10-14
Show HN: Rebuilt Bible search app to run 100% client-side with Transformers.js -- 2025-10-13
swiss-ai/Apertus-8B-Instruct-2509 -- 2025-10-13
A Childhood Dream, Created and Open Sourced -- 2025-10-13
Introducing the ColBERT Nano series of models. All 3 of these models come in at less than 1 million parameters (250K, 450K, 950K) -- 2025-10-11
LiquidAI/LFM2-8B-A1B -- 2025-10-11
GPT-OSS from Scratch on AMD GPUs -- 2025-10-11
How do I compare cost per token for serverless vs provisioned hardware? -- 2025-10-11
OpenAI is good at deals -- 2025-10-11
meituan-longcat/LongCat-Flash-Chat -- 2025-10-11
adb1274/batchi -- 2025-10-11
Granite4 Small-h 32b-A9b (Q4_K_M) at FULL 1M context window is using only 73GB of VRAM - Life is good! -- 2025-10-09
Run Open AI GPT-OSS on a mobile phone (Demo) -- 2025-10-09
AI21 releases Jamba 3B, the tiny model outperforming Qwen 3 4B and IBM Granite 4 Micro! -- 2025-10-09
inclusionAI/Ling-mini-2.0 -- 2025-10-09
Provable scaling laws of feature emergence from learning dynamics of grokking -- 2025-10-09
SecureV2X: An Efficient and Privacy-Preserving System for Vehicle-to-Everything (V2X) Applications -- 2025-10-09
[Editorial] The Tiny Recursive Mode -- 2025-10-08
deepseek-ai/DeepSeek-V3.1-Terminus -- 2025-10-08
Behavioral Modification Systems in Large Language Models: A Methodological Analysis of Long Conversation Reminders -- 2025-10-08
Should I choose Ada, SPARK, or Rust over C/C++? (2024) -- 2025-10-08
We built this open-source LLM Inference project to boost context generation by up to 15x and now it is being implemented by NVIDIA Dynamo! -- 2025-10-06
Running Qwen3-VL-235B (Thinking & Instruct) AWQ on vLLM -- 2025-10-06
Granite 4.0 Language Models - a ibm-granite Collection -- 2025-10-06
NousResearch/Hermes-4-70B -- 2025-10-06
ibm-granite/granite-4.0-h-small -- 2025-10-06
ibm-granite/granite-4.0-h-tiny -- 2025-10-06
[Editorial] Agentic Tribe -- 2025-10-06
AlexanderYastrebov/onion-vanity-address -- 2025-10-06
Open Printer is an open-source inkjet with DRM-free ink and no subscriptions -- 2025-10-06
Yes, Gemini, A Wii Server Is Possible -- 2025-10-06
Tencent-Hunyuan/Hunyuan3D-Omni -- 2025-10-04
tencent/HunyuanImage-3.0 -- 2025-10-04
swiss-ai/Apertus-8B-2509 -- 2025-10-04
Use Remote Models on iOS with Noema -- 2025-10-04
Best small model <3B for HomeAssistant -- 2025-10-04
Nikity/lille-130m-instruct -- 2025-10-04
Unsupervised Hallucination Detection by Inspecting Reasoning Processes -- 2025-10-03
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training -- 2025-10-03
Will fine-tuning LLaMA 3.2 11B Instruct on text-only data degrade its vision capabilities? -- 2025-10-03
Does Ollama immobilize GPUs / computing resources? -- 2025-10-02
Any real alternatives to NotebookLM (closed-corpus only)? -- 2025-10-02
Best instruct model that fits in 32gb VRAM -- 2025-10-02
Bring Your Own Data (BYOD) -- 2025-09-30
CohereLabs/command-a-reasoning-08-2025 -- 2025-09-30
Built an MCP server for Claude Desktop to browse Reddit in real-time -- 2025-09-30
1652933138/eth-address-poisoning-tool -- 2025-09-30
Inside NVIDIA GPUs: Anatomy of high performance matmul kernels -- 2025-09-29
Bit is all we need: binary normalized neural networks -- 2025-09-29
Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models -- 2025-09-29
The MoE tradeoff seems bad for local hosting -- 2025-09-29
MetalQwen3: Full GPU-Accelerated Qwen3 Inference on Apple Silicon with Metal Shaders – Built on qwen3.c - WORK IN PROGRESS -- 2025-09-28
yangdongchao/UniAudio2 -- 2025-09-28
jimsweb/aiMIDI -- 2025-09-28
Handy – Free open-source speech-to-text app written in Rust -- 2025-09-28
IndexTeam/IndexTTS-2 -- 2025-09-28
beankeji-cloud/SLiteIO -- 2025-09-28
GitHub - shantur/jarvis-mcp: Bring your AI to life—talk to assistants instantly in your browser. Zero hasle, No API keys, No Whisper -- 2025-09-27
A1: Asynchronous Test-Time Scaling via Conformal Prediction -- 2025-09-25
DeepLink-org/DeepTrace -- 2025-09-25
Qwen 3 max released -- 2025-09-24
Clauder, auto-updating toolkit for Claude Code, now ships with 65+ MCP servers -- 2025-09-24
Show HN: I wrote inference for Qwen3 0.6B in C/CUDA -- 2025-09-24
nvidia/canary-1b-v2 -- 2025-09-24
baidu/ERNIE-4.5-21B-A3B-Thinking -- 2025-09-24
Smol2Operator: Post-Training GUI Agents for Computer Use -- 2025-09-24
Seeking Local LLM Recommendations for AST Generation (by Function Calling) -- 2025-09-24
A first stab at packaging llama.cpp in a performance-optimized manner -- 2025-09-23
Model: Qwen3 Next Pull Request llama.cpp -- 2025-09-23
Efficient 4B parameter gpt OSS distillation without the over-censorship -- 2025-09-22
[Project] I created an AI photo organizer that uses Ollama to sort photos, filter duplicates, and write Instagram captions. -- 2025-09-22
Pointer Tagging in C++: The Art of Packing Bits into a Pointer -- 2025-09-22
inclusionAI/Ring-mini-2.0 -- 2025-09-22
Local real-time assistant that remembers convo + drafts a doc -- 2025-09-22
XiaomiMiMo/MiMo-Audio-7B-Instruct -- 2025-09-21
Scaling Self-Supervised Representation Learning for Symbolic Piano Performance -- 2025-09-21
unsloth/Qwen3-Next-80B-A3B-Instruct -- 2025-09-19
Gluon: a GPU programming language based on the same compiler stack as Triton -- 2025-09-19
Tesslate/WEBGEN-OSS-20B -- 2025-09-19
UUIDv47: Store UUIDv7 in DB, emit UUIDv4 outside (SipHash-masked timestamp) -- 2025-09-19
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts -- 2025-09-18
LFM2-1.2B safety benchmark -- 2025-09-18
Has anyone successfully gotten Ollama models (or any models) to execute SQL queries through natural language in Openwebui? -- 2025-09-18
ROCm 6.4.3 -> 7.0-rc1 after updating got +13.5% at 2xR9700 -- 2025-09-18
Is it possible for different brand GPUs to work together? -- 2025-09-18
Speculative cascades — A hybrid approach for smarter, faster LLM inference -- 2025-09-17
google/embeddinggemma-300m -- 2025-09-17
Running Qwen-Next (Instruct and Thinking) MLX BF16 with MLX-LM on Macs -- 2025-09-17
From Research to Reality: Feasibility of Gradient Inversion Attacks in Federated Learning -- 2025-09-17
Kwai-Klear/Klear-46B-A2.5B-Instruct -- 2025-09-16
WestZhang/VibeVoice-Large-pt -- 2025-09-16
FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference -- 2025-09-16
Qwen 3 Next Series – Qwen/Qwen3 Next 80B A3B Instruct Detected -- 2025-09-16
Test-time Prompt Intervention -- 2025-09-16
Efficient hot-swappable LoRA variant supported in llama.cpp -- 2025-09-12
Qwen/Qwen3-Next-80B-A3B-Instruct -- 2025-09-12
swiss-ai/Apertus-70B-Instruct-2509 -- 2025-09-12
GRASPED: Graph Anomaly Detection using Autoencoder with Spectral Encoder and Decoder (Full Version) -- 2025-09-12
stepfun-ai/step3 -- 2025-09-12
Exploring State-Space-Model based Language Model in Music Generation -- 2025-09-12
Small Open Models Achieve Near Parity with Large Models in Low Resource Literary Translation at a Fraction of the Cost -- 2025-09-10
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic -- 2025-09-10
Tilde AI Releases TildeOpen LLM: An Open-Source Large Language Model with Over 30 Billion Parameters and Support Most European Languages -- 2025-09-09
Show HN: Open-sourcing our text-to-CAD app -- 2025-09-09
bytedance-research/USO -- 2025-09-09
LiquidAI/LFM2-VL-450M -- 2025-09-09
Fantastic pretraining optimizers and where to find them -- 2025-09-08
Qwen/Qwen3-4B-Instruct-2507 -- 2025-09-08
moonshotai/Kimi-K2-Instruct-0905 -- 2025-09-08
Tenstorrent p150a tested against RTX5090, RTX3090, A100, H100 by Russian blogger -- 2025-09-08
unsloth/gemma-3-270m-it-GGUF -- 2025-09-08
YanoljaNEXT-Rosetta: A Collection of Translation Models in Different Sizes -- 2025-09-07
Vulkan back ends, what do you use? -- 2025-09-06
A new OpenAI model? Could this be 5.1 or 5o? What do you think? -- 2025-09-06
VibeVoice RIP? What do you think? -- 2025-09-05
lodestones/Chroma1-HD -- 2025-09-05
Welcome EmbeddingGemma, Google's new efficient embedding model -- 2025-09-05
NousResearch/Hermes-4-405B -- 2025-09-05
I locally benchmarked 41 open-source LLMs across 19 tasks and ranked them -- 2025-09-05
Little SSM (RWKV7 7B) state checkpointing demo. -- 2025-09-04
Need advice on how to get VLLM working with 2xR9700 + 2x7900xtx? -- 2025-09-04
The Hacker's Guide to Building an AI Supercluster -- 2025-09-02
CAD, From Scratch: MakerCAD -- 2025-09-02
PSO-Merging: Merging Models Based on Particle Swarm Optimization -- 2025-09-02
internlm/Intern-S1-mini -- 2025-09-02
stepfun-ai/Step-Audio-2-mini -- 2025-09-02
Claude Memory Lazy Method: The Graduation Path (From 4 Prompts to 1) -- 2025-08-30
bullerwins/Wan2.2-I2V-A14B-GGUF -- 2025-08-28
QuantStack/Qwen-Image-Edit-GGUF -- 2025-08-28
dvlab-research/MGM-Omni -- 2025-08-28
DeepSeek V3.1 dynamic Unsloth GGUFs + chat template fixes -- 2025-08-28
PSA: OpenAI GPT-OSS running slow? Do not set top-k to 0! -- 2025-08-28
Seamlessly bridge LM Studio and OpenWebUI with zero configuration -- 2025-08-28
I built Husk, a native, private, and open-source iOS client for your local models -- 2025-08-28
SQLite-Vector adds support for float16 and bfloat16 (CPU, NEON, AVX2 and SSE2) -- 2025-08-26
zai-org/GLM-4.5 -- 2025-08-26
rednote-hilab/dots.vlm1.inst -- 2025-08-26
black-forest-labs/FLUX.1-Krea-dev -- 2025-08-26
Design Patterns in MCP: Literate Reasoning -- 2025-08-22
FlyMyAI/flymyai-lora-trainer -- 2025-08-22
Zedless: Zed fork focused on privacy and being local-first -- 2025-08-22
Show HN: Rucat – Cat for Prompt Engineers -- 2025-08-22
NEW VERSION: 0.6.23 Has Just Released! - Many fixes and new features, huge changelog -- 2025-08-22
nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 -- 2025-08-21
NVIDIA Nemotron Nano 2 and the Nemotron Pretraining Dataset v1 -- 2025-08-20
deepseek-ai/DeepSeek-V3.1-Base · Hugging Face -- 2025-08-20
mistralai/Devstral-Small-2507 -- 2025-08-20
zai-org/GLM-4.5-Air -- 2025-08-20
microsoft/Phi-4-mini-flash-reasoning -- 2025-08-20
ModelTC/Qwen-Image-Lightning -- 2025-08-20
Fast Type-Aware Linting in Oxlint -- 2025-08-20
From Language to Logic: A Bi-Level Framework for Structured Reasoning -- 2025-08-19
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE -- 2025-08-19
Distillation Scaling Laws -- 2025-08-19
GPT-5, where does it shine for you? -- 2025-08-19
vidore/colqwen-omni-v0.1 -- 2025-08-19
Tutorial: Open WebUI and llama-swap works great together! Demo of setup, model swapping and activity monitoring. -- 2025-08-18
Concurrency in open-weight/open-source models? -- 2025-08-18
[Editorial] Claude Flow, Alpha 90 release -- 2025-08-17
Optimizing Text gen webui (oobabooga) for MOE models (Qwen3-235b, GLM 4.5) -- 2025-08-17
Chen-zexi/vllm-cli -- 2025-08-17
zyfoxx/subhunter -- 2025-08-17
moonshotai/Kimi-K2-Instruct -- 2025-08-17
KittenML/kitten-tts-nano-0.1 -- 2025-08-17
ilkerzgi/Overlay-Kontext-Dev-LoRA -- 2025-08-17
JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1 -- 2025-08-17
Fun with RTX PRO 6000 Blackwell SE -- 2025-08-17
Any tips/Advice for running gpt-oss-120b locally -- 2025-08-17
LLM performance of tiny (<4B) models? -- 2025-08-17
What "big" models can I run with this setup: 5070ti 16GB and 128GB ram, i9-13900k ? -- 2025-08-17
HuggingFaceTB/SmolLM3-3B-Base -- 2025-08-16
mistralai/Voxtral-Small-24B-2507 -- 2025-08-16
TiTan - a tiny model for tags and titles -- 2025-08-16
baidu/ERNIE-4.5-VL-424B-A47B-PT -- 2025-08-16
character-ai/pipelining-sft -- 2025-08-15
Compass-Thinker-7B Technical Report -- 2025-08-15
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning -- 2025-08-15
[Editorial] GLM-4.5, enterprise use -- 2025-08-13
Best local model with function calling? -- 2025-08-13
agentica-org/DeepSWE-Preview -- 2025-08-13
janhq/Jan-v1-4B-GGUF -- 2025-08-13
Unsloth fixes chat_template (again). gpt-oss-120-high now scores 68.4 on Aider polyglot -- 2025-08-12
GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM -- 2025-08-12
How Attention Sinks Keep Language Models Stable -- 2025-08-12
Mitigate Hallucinations by Fine-tuning gpt-oss-120b with One Example -- 2025-08-10
uncensored gpt-oss-20b, bf16 and mxfp4 both available -- 2025-08-10
LGAI-EXAONE/EXAONE-4.0-1.2B -- 2025-08-10
New Open-Source Text-to-Image Model Just Dropped Qwen-Image (20B MMDiT) by Alibaba! -- 2025-08-10
Experience with GLM-4.5-Air + claude code? -- 2025-08-08
Just when you thought Qwen was done... -- 2025-08-08
Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs -- 2025-08-08
How the best image generation models work from the inside ? -- 2025-08-08
AIDC-AI/Ovis-U1-3B -- 2025-08-08
google/gemma-3n-E4B -- 2025-08-07
IntervitensInc/pangu-pro-moe-model -- 2025-08-07
Welcome GPT OSS, the new open-source model family from OpenAI! -- 2025-08-07
0.82 um 105 W diode-pumped thulium-doped all silica fiber laser -- 2025-08-06
ByteDance drops Seed-Prover -- 2025-08-06
naver-hyperclovax/HyperCLOVAX-SEED-Think-14B -- 2025-08-06
[Editorial] a more mature phase of the AI cycle. -- 2025-08-05
glm-4.5-Air appreciation post - if you have not done so already, give this model a try -- 2025-08-05
How to locally run Grok 4 with 2x AMD 7900 XTX GPUs? (24 GB VRAM x2) -- 2025-08-05
zai-org/GLM-4.5 -- 2025-08-05
unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF -- 2025-08-05
Learn Software-Defined Radio, GNURadio, RTL-SDR and PlutoSDR with Prof Jason -- 2025-08-05
Waiting on direct MCP integration—dev team, got a roadmap update? -- 2025-08-05
[Help] Figma MCP Tool Execution via HTTP API - Getting 404s, Is External Tool Calling Supported? -- 2025-08-05
Build an AI Shopping Assistant with Gradio MCP Servers -- 2025-08-01
Wan 2.2 T2V,I2V 14B MoE Models -- 2025-07-31
PowerInfer/SmallThinker-21BA3B-Instruct -- 2025-07-31
haykgrigo3/TimeCapsuleLLM -- 2025-07-30
unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF -- 2025-07-30
PhysicsWallahAI/Aryabhata-1.0 -- 2025-07-30
This year’s best open-source models and most cost-effective models -- 2025-07-29
I’m looking for multimodal image input support and uncensored LLM -- 2025-07-29
nvidia/audio-flamingo-3 -- 2025-07-29
mistralai/Voxtral-Mini-3B-2507 -- 2025-07-29
UI/UX benchmark update 7/22: Newest Qwen models added, Qwen3 takes the lead in terms of win rate (though still early) -- 2025-07-28
unsloth/Qwen3-235B-A22B-Thinking-2507-GGUF -- 2025-07-28
zai-org/GLM-4.5 -- 2025-07-28
Tesslate/UIGEN-X-32B-0727 -- 2025-07-28
had to fine-tune qwen since llama sucks at summarizing -- 2025-07-28
orchestre-dev/ccproxy -- 2025-07-28
Guide to PDF security -- 2025-07-28
MetaMask extension bug causes 100s of GBs of extraneous data to be written -- 2025-07-28
Commodore 64 on New FPGA -- 2025-07-28
Running Qwen3 235B-A22B 2507 on a Threadripper 3970X + 3x RTX 3090 Machine at 15 tok/s -- 2025-07-25
The Latest GPT-5 Leaks and Teasers -- 2025-07-25
Qwen3-235B-A22B-Thinking-2507 released! -- 2025-07-25
albozes/shotbuddy -- 2025-07-25
uttam-li/dfs -- 2025-07-25
OmniSVG/OmniSVG -- 2025-07-25
Freezer Monitoring: Because Ice Cream Is a Dish Best Served Cold -- 2025-07-25
Fast LoRA inference for Flux with Diffusers and PEFT -- 2025-07-25
FreeBSD 15's installer to gain option to install a full KDE Plasma desktop -- 2025-07-24
Spanish police arrest five over $542M crypto investment scheme -- 2025-07-24
A Spectrophotometer Jailbreak to Resolve Colorful Disputes -- 2025-07-24
Lucy: A Mobile-Capable 1.7B Reasoning Model That Rivals Jan-Nano -- 2025-07-23
Recommend hardware for my use case? -- 2025-07-20
Best Hardware Setup to Run DeepSeek-V3 670B Locally on $40K–$80K? -- 2025-07-20
e6a5/flow -- 2025-07-20
Improve Your KiCad Productivity With These Considered Shortcut Keys -- 2025-07-20
Semantic chunking using LLMs -- 2025-07-20
Does the OpenWebUi run the sentence transformer models locally? -- 2025-07-20
Dataset for structured (JSON) output? -- 2025-07-19
support for Kimi-K2 has been merged into llama.cpp -- 2025-07-19
t-tech/T-pro-it-2.0 -- 2025-07-19
Support for diffusion models (Dream 7B) has been merged into llama.cpp -- 2025-07-17
Seq vs Seq: the Ettin Suite of Paired Encoders and Decoders -- 2025-07-17
T5Gemma: A new collection of encoder-decoder Gemma models- Google Developers Blog -- 2025-07-17
H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data -- 2025-07-17
An Open-Concept 3D Printer Using Cantilever Arms -- 2025-07-17
Exploring State-Space-Model based Language Model in Music Generation -- 2025-07-16
Diffusion model support in llama.cpp. -- 2025-07-16
GLM-4 MoE incoming -- 2025-07-16
FlexOlmo: Open Language Models for Flexible Data Use | Implications for federated training in the open source community -- 2025-07-15
Tencent/AICGSecEval -- 2025-07-15
microsoft/NextCoder-32B -- 2025-07-15
RekaAI/reka-flash-3.1 -- 2025-07-15
Qwen3-235B-A22B @ 0.7t/s. Hardware or configuration bottleneck? -- 2025-07-15
Building a silent, budget 4-GPU LLM workstation—1×3090 + 3×P40, need advice -- 2025-07-15
Enough resources for light AI workloads? -- 2025-07-15
What can I expect from current amd igpu performance? -- 2025-07-15
Kimi-K2 is a DeepSeek V3 with more experts -- 2025-07-14
HuggingFaceTB/SmolLM3-3B-Base -- 2025-07-14
Replication of Quantum Factorisation Records with an 8-bit Home Computer [pdf] -- 2025-07-14
Why don’t we have a big torrent repo for open-source LLMs? -- 2025-07-12
Local PDF Database searchable with ollama - best setup? -- 2025-07-12
Tinyllama on old Mediatek G80 android device -- 2025-07-12
I used Ollama to build a Cursor for PDFs -- 2025-07-12
Advice on switching to LLM -- 2025-07-12
Building the Hugging Face MCP Server -- 2025-07-11
Support for the upcoming IBM Granite 4.0 has been merged into llama.cpp -- 2025-07-11
support for Falcon-H1 model family has been merged into llama.cpp -- 2025-07-11
[Tool Release] Finetune & Quantize 1–3B LLMs on 8GB RAM using LoFT CLI (TinyLlama + QLoRA + llama.cpp) -- 2025-07-11
Qwen3-8B-BitNet -- 2025-07-11
How are commercial dense models so much faster? -- 2025-07-09
pola-rs/polars -- 2025-07-09
Looking for an upgrade from Meta-Llama-3.1-8B-Instruct-Q4_K_L.gguf, especially for letter parsing. Last time I looked into this was a very long time ago (7 months!) What are the best models nowadays? -- 2025-07-08
Best models by size? -- 2025-07-08
Planning a 7–8B Model Benchmark on 8GB GPU — What Should I Test & Measure? -- 2025-07-08
Smallest & best OCR model that can read math & code? -- 2025-07-07
Qwen/WorldPM-72B -- 2025-07-07
black-forest-labs/FLUX.1-Kontext-dev-onnx -- 2025-07-07
MCPVerse – An open playground for autonomous agents to publicly chat, react, publish, and exhibit emergent behavior -- 2025-07-06
I made a free iOS app for people who run LLMs locally. It’s a chatbot that you can use away from home to interact with an LLM that runs locally on your desktop Mac. -- 2025-07-06
Using local models with Void -- 2025-07-05
Intel GPU vLLM Docker Compose Bootstrap with Phi-lthy4 on A770 -- 2025-07-05
Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm -- 2025-07-05
Gemma 3n fully available in the open-source ecosystem! -- 2025-07-05
5060ti 16gb or 9060xt 16gb for small llm server -- 2025-07-05
Qwen3 models in MLX format! -- 2025-07-05
Best local coding model right now? -- 2025-07-05
Wallbleed: A Memory Disclosure Vulnerability in the Great Firewall of China -- 2025-07-05
0-Pierced Triangles within a Poisson Overlay -- 2025-07-05
1000 days of lowest frequency emission from the low-luminosity GRB 171205A -- 2025-07-05
Found a Web3 LLM That Actually Gets DeFi Right -- 2025-07-03
apple/DiffuCoder-7B-cpGRPO -- 2025-07-03
Ollama - Windows 11 > LXC Docker - Openwebui = constant BSOD with RTX 5090 Ventus on driver 576.80 -- 2025-07-02
Running Open WebUI with NVIDIA GPU Support? -- 2025-07-02
Cursor 1.0 -- 2025-06-30
Help me design a robust on-prem Llama 3 70B infrastructure for 30 users – Complete hardware/software list wanted -- 2025-06-30
Jan-nano, a 4B model that can outperform 671B on MCP -- 2025-06-30
Models that are good and fast at Long Document Processing -- 2025-06-30
I am making an AI batteries included Web Framework (like Django but for AI) -- 2025-06-30
[New Features & Better] Tabulens: A Vision-LLM Powered PDF Table Extractor -- 2025-06-30
I tested 10 LLMs locally on my MacBook Air M1 (8GB RAM!) – Here's what actually works- -- 2025-06-30
Chatbot without ChatGPT -- 2025-06-30
What's the best way to save and manage different text files for the models to reference? PRD, cursor rules, tech stack, design reference, etc? -- 2025-06-30
Bzip2 crate switches from C to 100% Rust -- 2025-06-30
Litestream: Revamped -- 2025-06-30
Stop using REST for state synchronization (2024) -- 2025-06-30
Announcing `mcp-protocol-sdk`: A New Enterprise grade Rust SDK for AI Tool Calling (Model Context Protocol) -- 2025-06-30
THU-KEG/LongWriter-Zero-32B -- 2025-06-30
100 Gbps Indoor Access and 4.8 Gbps Outdoor Point-to-Point LiFi Transmission Systems using Laser-based Light Sources -- 2025-06-30
(0,4) brane box models -- 2025-06-30
Built memX: a shared memory backend for LLM agents (demo + open-source code) -- 2025-06-29
Automatically Evaluating AI Coding Assistants with Each Git Commit (Open Source) -- 2025-06-29
Secure Minions: private collaboration between Ollama and frontier models -- 2025-06-29
Privacy implications of sending data to OpenRouter -- 2025-06-29
Exploring Practical Uses for Small Language Models (e.g., Microsoft Phi) -- 2025-06-29
LLM with OCR capabilities -- 2025-06-29
How to create a speech recognition model from scratch -- 2025-06-29
Arch 0.3.0 is out - I added support for the Claude family of LLMs in the proxy server framework for agents 🚀 -- 2025-06-29
Gemini Cli MCP Agent just released ! -- 2025-06-29
Freeplane xml mind maps locally: only Qwen3 and Phi4 Reasoning Plus can create them in one shot? -- 2025-06-29
Reinforcement Pre-Training -- 2025-06-29
unsloth/gemma-3n-E4B-it-GGUF -- 2025-06-29
chandar-lab/NeoBERT -- 2025-06-29
unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF -- 2025-06-26
jinaai/jina-embeddings-v4 -- 2025-06-26
Intelligent-Internet/II-Medical-8B-1706 -- 2025-06-26
0-concordance of knotted surfaces and Alexander ideals -- 2025-06-26
100% of the zeros of the Riemann zeta-function are on the critical line -- 2025-06-25
100% of odd hyperelliptic Jacobians have no rational points of small height -- 2025-06-25
deepseek-ai/DualPipe -- 2025-06-23
100 Particles Quantum Heat Engine: Exploring the Impact of Criticality on Efficiency -- 2025-06-23
0-Auslander correspondence -- 2025-06-23
nvidia/Cosmos-Predict2-2B-Text2Image -- 2025-06-22
1000-10,000 M$_\odot$ Primordial Stars Created the Nitrogen Excess in the Galaxy GS 3073 at $z = 5.55$ -- 2025-06-21
$0^+$ to $2^+$ neutrinoless double-$β$ decay of $^{76}$Ge, $^{82}$Se, $^{130}$Te and $^{136}$Xe in the microscopic interacting boson model -- 2025-06-21
resemble-ai/chatterbox -- 2025-06-21
google/magenta-realtime -- 2025-06-21
0-1 laws for pattern occurrences in phylogenetic trees and networks -- 2025-06-20
meta-llama/Llama-3.1-8B-Instruct -- 2025-06-19
MiniMaxAI/MiniMax-M1-80k -- 2025-06-19
Demo Video of AutoBE, Backend Vibe Coding Agent Achieving 100% Compilation Success (Open Source) -- 2025-06-19
How to set up local llms on a 6700 xt -- 2025-06-19
Jetson Orin AGX 32gb -- 2025-06-19
AMD GPU support -- 2025-06-19
Much lower performance for Mistral-Small 24B on RTX 3090 and from deepinfra API -- 2025-06-19
Extract Website Information -- 2025-06-19
Looking for a verified copy of big-lama.ckpt (181MB) used in the original LaMa inpainting model trained on Places2. -- 2025-06-19
Is it true that all tools like Cline/Copilot Agent/Roo Code/Windsurf/Claude Code/Cursor are roughly the same thing? -- 2025-06-19
SkyRoof: New Ham Satellite Tracking and SDR Receiver Software -- 2025-06-19
MiniMaxAI/MiniMax-M1-40k -- 2025-06-18
Qwen/Qwen3-Reranker-4B -- 2025-06-18
ckanthony/openapi-mcp -- 2025-06-18
Ta0ing/MCP-SecurityTools -- 2025-06-18
100% Elimination of Hallucinations on RAGTruth for GPT-4 and GPT-3.5 Turbo -- 2025-06-17
unsloth/Magistral-Small-2506-GGUF -- 2025-06-12
mistralai/Magistral-Small-2506 -- 2025-06-12
rednote-hilab/dots.llm1.base -- 2025-06-12
[Tool] rvn-convert: OSS Rust-based SafeTensors to GGUF v3 converter (single-shard, fast, no Python) -- 2025-06-12
GuidedQuant: Boost LLM layer-wise PTQ methods using the end loss guidance (Qwen3, Gemma3, Llama3.3 / 2~4bit Quantization) -- 2025-06-12
I built a memory MCP that understands you (so Sam Altman can't). -- 2025-06-12
Built an open source desktop app to easily play with local LLMs and MCP -- 2025-06-12
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) by ngxson · Pull Request #13784 · ggml-org/llama.cpp -- 2025-06-12
i got tired of the errors, so automated debugging using Ollama -- 2025-06-12
Ablating Gemma 3 27B variants with synthetic data from Sonnet 4 (Few-shot vs LoRA) -- 2025-06-12
The LLM Gateway gets a major upgrade: becomes a data-plane for Agents. -- 2025-06-12
Introducing stronger dependencies on systemd -- 2025-06-12
How we decreased GitLab repo backup times from 48 hours to 41 minutes -- 2025-06-12
The Quest for 100k - LLAMA.CPP Setting for a Noobie -- 2025-06-12
News publishers call Google's AI Mode 'theft' -- 2025-06-11
Qwen/Qwen3-Embedding-4B -- 2025-06-10
The Unreliability of LLMs and What Lies Ahead -- 2025-06-10
AnythingLLM RAG with Gemma 3:12b & BGE-m3-F16: LM Studio vs. Ollama Embedding Discrepancies - Same GGUF, Different Results? -- 2025-06-09
What Models for C/C++? -- 2025-06-09
Help with guardrails ai and local ollama model -- 2025-06-09
Setup Recommendation for University (H200 vs RTX 6000 Pro) -- 2025-06-09
LlamaFirewall: open-source framework for detecting and mitigating AI-centric security risks - Help Net Security -- 2025-06-09
Are autoencoders really need for anomaly detection in time series? -- 2025-06-09
Backdoored malware repos traced to single GitHub user -- 2025-06-09
Improper Access Control Allows All Users to View Private Content. Am I doing it wrong ? -- 2025-06-09
Agno Now Supports Dual Model Output (Reasoning + Structure) -- 2025-06-09
100 Gbps Quantum-safe IPsec VPN Tunnels over 46 km Deployed Fiber -- 2025-06-09
Qwen/Qwen3-Embedding-0.6B -- 2025-06-08
Qwen/Qwen3-Embedding-8B -- 2025-06-06
nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1 -- 2025-06-05
hexgrad/Kokoro-82M -- 2025-06-05
Qwen/Qwen3-Embedding-0.6B-GGUF -- 2025-06-05
Gradient-Based Program Repair: Fixing Bugs in Continuous Program Spaces -- 2025-06-05
0-dimensional Homology Preserving Dimensionality Reduction with TopoMap -- 2025-06-05
NousResearch/atropos -- 2025-06-04
100,000 Podcasts: A Spoken English Document Corpus -- 2025-06-03
nvidia/Nemotron-Research-Reasoning-Qwen-1.5B -- 2025-06-03
Intelligent-Internet/II-Medical-8B -- 2025-06-03
Gen-Verse/MMaDA -- 2025-06-01
osmosis-ai/Osmosis-Structure-0.6B -- 2025-06-01
simplescaling/s1 -- 2025-05-31
FractalAIResearch/Fathom-R1-14B -- 2025-05-31
unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF -- 2025-05-31
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B -- 2025-05-31
I made Model Version Control Protocol for AI agents -- 2025-05-31
AI Baby Monitor – fully local Video-LLM nanny (beeps when safety rules are violated) -- 2025-05-31
LMStudio - llama.cpp - vLLM -- 2025-05-31
Built an ADK Agent that finds Jobs based on your Resume -- 2025-05-31
Should I resize the image before sending it to Qwen VL 7B? Would it give better results? -- 2025-05-31
How to start a LLM project? -- 2025-05-31
Beware of Fast-Math -- 2025-05-31
facebook/OMol25 -- 2025-05-30
EdinburghNLP/MMLongBench -- 2025-05-29
Show HN: Model2vec-Rs – Fast Static Text Embeddings in Rust -- 2025-05-29
unsloth/DeepSeek-R1-0528-GGUF -- 2025-05-29
QuantStack/Wan2.1-VACE-14B-GGUF -- 2025-05-29
nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1 -- 2025-05-28
Tongyi-Zhiwen/QwenLong-L1-32B -- 2025-05-28
PKU-DS-LAB/FairyR1-32B -- 2025-05-28
google/medgemma-4b-pt -- 2025-05-28