Model Architecture

Transformers, mixture of experts, attention mechanisms, model design

635 articles across 169 editions

Articles

nvidia/Nemotron-Labs-Audex-30B-A3B · Hugging Face -- 2026-07-08
Gemma 4 Technical Report -- 2026-07-08
Hugging Face and Cerebras bring Gemma 4 to real-time voice AI -- 2026-07-08
Cloudflare Meerkat - Globally distributed consensus -- 2026-07-08
Run AI workloads on any cloud, store on Hugging Face: zero-egress storage with SkyPilot -- 2026-07-08
[Editorial] Video Submission -- 2026-07-01
Accio-Lab/Dressage -- 2026-07-01
dondai1234/agent-browser -- 2026-07-01
Emry: an event-sourced, local-first observability engine for long training runs -- 2026-07-01
LongCat-2.0, a large-scale MoE model with 1.6T total and 48B Active -- 2026-07-01
[Editorial] Google Gemini Model Family Updates -- 2026-07-01
InternScience/Agents-A1 · Hugging Face -- 2026-07-01
[Editorial] Anthropic Redeploying Fable 5 -- 2026-07-01
[Editorial] Anthropic / White House Lifting AI Export Limits -- 2026-07-01
Why Dario is on fire: lesson from dotcom bubble. -- 2026-07-01
youtube.com -- 2026-07-01
[Editorial] -- 2026-06-27
[Editorial] -- 2026-06-27
trotsky1997/OpenFugu -- 2026-06-27
SPIRAL: Learning to Search and Aggregate -- 2026-06-27
[Editorial] -- 2026-06-27
US government allows Anthropic limited release of AI model that sparked cybersecurity concerns | CNN Business -- 2026-06-27
Fable 5 return RUMORED with some hints in CC -- 2026-06-27
[Editorial] -- 2026-06-27
Pretraining Recurrent Networks without Recurrence -- 2026-06-26
Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows -- 2026-06-26
KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking -- 2026-06-26
[Editorial] arXiv 2606.24597 -- 2026-06-26
GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench -- 2026-06-19
unsloth GLM-5.2-GGUF, including 2bit at 238GB -- 2026-06-19
Running local models is good now -- 2026-06-19
Gemma 12b less than 10 watts 6.5pp 1.3tg -- 2026-06-19
Rio 3.5 397B could've simply been a semi-failed embezzling of funding -- 2026-06-19
Concept frustration: Aligning human concepts and machine representations -- 2026-06-18
[Emerging Ideas] Artificial Tripartite Intelligence: A Bio-Inspired, Sensor-First Architecture for Physical AI -- 2026-06-18
Probability-Conserving Flow Guidance -- 2026-06-18
VibeThinker-3B: what is this witchcraft? Killing it at MathQA like it has ~30B parameters -- 2026-06-18
Get in here: Community model build thread -- 2026-06-18
bartowski/command-a-plus-05-2026-GGUF · Hugging Face -- 2026-06-18
nvidia/Nemotron-Labs-Diffusion-14B -- 2026-06-18
Software is made between commits -- 2026-06-12
The Road to the WASM Component Model 1.0 -- 2026-06-12
macOS 27 Beta breaks the ability to boot Asahi Linux -- 2026-06-12
Deficient executive control in transformer attention -- 2026-06-11
Generalization in LLM Problem Solving: The Case of the Shortest Path -- 2026-06-11
zengxiao-he/tessera -- 2026-06-11
Tencent-Hunyuan/UniRL -- 2026-06-11
SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning -- 2026-06-11
Physics-Grounded Multi-Agent Architecture for Traceable, Risk-Aware Human-AI Decision Support in Manufacturing -- 2026-06-11
When Graph Structure Becomes a Liability: A Critical Re-Evaluation of Graph Neural Networks for Bitcoin Fraud Detection under Temporal Distribution Shift -- 2026-06-02
Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains -- 2026-06-02
Tencent Hy-MT2 is now under Apache License 2.0 -- 2026-06-02
fxyz666/LogicPipe -- 2026-06-02
[OSS] dlmserve - first serving engine for diffusion language models -- 2026-06-02
veryyoldman/Genspark-AI -- 2026-06-02
Qwen/Qwen-Image-Bench · Hugging Face -- 2026-06-02
Open-source LLMs administer maximum electric shocks in a Milgram-like obedience experiment -- 2026-05-27
[Editorial] Editor's Pick (Video) -- 2026-05-27
Anthropic researcher: "We keep finding things [inside AI models] that are unsettling" — evidence of introspection mirroring joy, fear, grief -- 2026-05-27
lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled -- 2026-05-22
Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face -- 2026-05-22
Qwen 3.6 35B GGUF: NTP vs MTP quantization results across GPUs and CPUs -- 2026-05-22
[WIP] Gemma 4 MTP -- 2026-05-22
Lemonade v10.5.1: an MTP + ROCm 7.13 quick start for Strix Halo -- 2026-05-22
gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic is Out Now -- 2026-05-22
Gemma-4-Gembrain-31B-it-uncensored-heretic Is Out Now -- 2026-05-22
Newbie vibe coding experience: Shifting from Claude Sonnet 4.6 to Qwen3.6-35B-A3B-UD-Q6_K -- 2026-05-22
HalBench: Open Sycophancy and Hallucination Benchmark — 12,800 Graded Responses -- 2026-05-21
DystopiaBench: 42 LLMs tested on willingness to build dystopian systems -- 2026-05-21
Guardrails take an 8B model from 53% to 99% on agentic tasks [ACM CAIS '26 preprint] -- 2026-05-21
bytedance released an open source model that attempts to do just about anything with only 3b parameters -- 2026-05-19
Sapient Intelligence releases HRM-Text 1B: 40B tokens, ~$1k pretrain, beats Llama3.2 3B on MATH and DROP -- 2026-05-19
PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend -- 2026-05-19
[Editorial] -- 2026-05-18
[Editorial] -- 2026-05-18
[Editorial] -- 2026-05-18
[Editorial] -- 2026-05-18
EMO: Pretraining mixture of experts for emergent modularity -- 2026-05-13
[Editorial] -- 2026-05-13
iai-mcp: Persistent Claude Memory Daemon — 5 Months of Daily Use, Now Open Source -- 2026-05-08
[Editorial] Claude Skins — Custom Claude Code Personas -- 2026-05-08
[Editorial] AI Agents Projects & Tutorials Collection -- 2026-05-08
"Second Thoughts" — A small transformer reads output and feeds it back as a refinement loop, drastically improving a 1.7B model's coding -- 2026-05-06
A plug-n-play open-source pruning tool that is workload-aware (Sculpt) -- 2026-05-06
"I" is not singular — 4 LLM agents with per-agent LoRA on a single RTX 3070 8GB -- 2026-05-06
I made a visualizer for Hugging Face models (hfviewer.com) -- 2026-05-06
The Road to a Billion-Token Context -- 2026-05-06
How OpenAI delivers low-latency voice AI at scale -- 2026-05-06
ArmSSL: Adversarial Robust Black-Box Watermarking for Self-Supervised Learning Pre-trained Encoders -- 2026-04-28
GTFOBins -- 2026-04-28
Anthropic admits to have made hosted models more stupid, proving the importance of open weight, local models -- 2026-04-24
Switching from Opus 4.7 to Qwen-35B-A3B -- 2026-04-24
Forgive my ignorance but how is a 27B model better than 397B? -- 2026-04-24
NSA is using Anthropic's Mythos despite blacklist -- 2026-04-22
[Editorial] Mad Bugs: Reverse Engineering Deep Dive -- 2026-04-22
Anyone deployed Kimi K2.6 on their local hardware? -- 2026-04-21
24/7 Headless AI Server on Xiaomi 12 Pro (Guide & Benchmarks) Gemma4 VS Qwen2.5 -- 2026-04-21
Gemm4:e4B-IT good at instructions following no refusals. -- 2026-04-21
[Editorial] arxiv:2506.02153 — AI Research -- 2026-04-20
[Editorial] arxiv:1503.02531 — Classic ML/AI Paper -- 2026-04-20
[Editorial] arxiv:2604.06169 — Recent AI Research -- 2026-04-20
[Editorial] Why Your LLM Is Slow (and the Eight Fixes) -- 2026-04-20
Hot Experts in your VRAM! Dynamic expert cache in llama.cpp for 27% faster token generation -- 2026-04-20
[Editorial] Schmidhuber: The World Model Boom -- 2026-04-20
[Editorial] Schmidhuber: World Models, Planning & Curiosity (1990 Origins) -- 2026-04-20
[Editorial] The Ontology Problem (Technical) — Kurt Cagle -- 2026-04-20
Gemma 4 26B fabricated an entire code audit. I have the forensic evidence from the database. -- 2026-04-16
[Editorial] Looped LLMs Are the Nuclear Fusion of AI -- 2026-04-16
Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems -- 2026-04-14
elastic/supply-chain-monitor -- 2026-04-14
[Editorial] UK AI Security Institute + Claude -- 2026-04-14
[Editorial] -- 2026-04-13
[Editorial] -- 2026-04-13
Muse Spark: Scaling towards personal superintelligence -- 2026-04-13
[Editorial] -- 2026-04-13
[Editorial] -- 2026-04-13
Abliterating Qwen3.5-397B on a Mac Studio revealed that MoE models encode refusal differently than dense models — safety refusals route through expert selection and survive weight-baking -- 2026-04-09
Finally Abliterated Sarvam 30B and 105B! -- 2026-04-09
Qwen3.6-Plus -- 2026-04-07
Omnivoice - 600+ Language Open-Source TTS with Voice Cloning and Design -- 2026-04-07
[Editorial] Generative Models and High-Quality Output -- 2026-04-07
Falcon Perception -- 2026-04-02
TRL v1.0: Post-Training Library Built to Move with the Field -- 2026-04-02
lucas-maes/le-wm -- 2026-04-02
Toward explaining why traditional ablation/abliteration works -- 2026-04-02
[Editorial] ramimac/teampcp -- 2026-03-27
This Week in Security: Second Verse, Worse Than the First -- 2026-03-27
Towards a Neural Debugger for Python -- 2026-03-26
Mathematics behind extreme quantization of Microsoft's BitNet -- 2026-03-26
A.T.L.A.S - Adaptive Test-time Learning and Autonomous Specialization -- 2026-03-26
Show HN: Three new Kitten TTS models – smallest less than 25MB -- 2026-03-23
Flash-MoE: Running a 397B Parameter Model on a Laptop -- 2026-03-23
[Editorial] Distillation Techniques -- 2026-03-23
Jackrong/Qwen3.5-2B-Claude-4.6-Opus-Reasoning-Distilled-GGUF -- 2026-03-23
Local Qwen 8B + 4B completes browser automation by replanning one step at a time -- 2026-03-23
Does imatrix calibration data affect writing style? I ran a blind-scored experiment -- 2026-03-23
Memory Chip Crunch to Persist Until 2030, SK Hynix Chairman Says -- 2026-03-19
Meta's renewed commitment to jemalloc -- 2026-03-19
We all had p2p wrong with vllm so I rtfm -- 2026-03-19
[Editorial] LLM Architecture Gallery -- 2026-03-17
Mistral small 4 PR on transformers. -- 2026-03-17
I spent a weekend doing layer surgery on 6 different model architectures. There's a "danger zone" at ~50-56% depth. -- 2026-03-17
Nemotron 3 Super and the no free lunch problem -- 2026-03-17
Timber: AOT Compiler Turns XGBoost/sklearn/ONNX into Native C99 — 336x Faster -- 2026-03-13
[Editorial] Living on the Edge: Advancements in Model Deployment -- 2026-03-13
[Editorial] Translating a PDF While Preserving Layout -- 2026-03-13
Heretic Defeats GPT-OSS with Arbitrary-Rank Ablation (ARA) Decensoring -- 2026-03-11
Fat Fish — A Proper Upscale and Prune of Mistral Nemo -- 2026-03-11
Optimizing Qwen3 Coder for RTX 5090 and PRO 6000 — Community Benchmarking Infrastructure -- 2026-03-11
Yann LeCun's AMI Labs Raises $1.03 Billion to Build World Models -- 2026-03-11
[Editorial] Bell Labs Solved Prompt Injection in 1976 -- 2026-02-27
[Editorial] GitHub Copilot Exploited -- 2026-02-27
[Editorial] Stop Telling Your AI Agent What Not to Do -- 2026-02-27
[Editorial] AI MCP Integration Patterns -- 2026-02-27
Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers -- 2026-02-25
O(1) Inference and Causal Monoid State Compression in Spartacus-1B -- 2026-02-25
Sink-Aware Pruning for Diffusion Language Models -- 2026-02-25
Liquid AI releases LFM2-24B-A2B -- 2026-02-24
Qwen3's most underrated feature: Voice embeddings -- 2026-02-24
After many contributions craft, Crane now officially supports Qwen3-TTS! -- 2026-02-24
We tested the same INT8 model on 5 Snapdragon chipsets. Accuracy ranged from 93% to 71%. Same weights, same ONNX file. -- 2026-02-24
[Editorial] https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf -- 2026-02-12
[Editorial] https://d3lm.medium.com/overly-agentic-why-anthropic-is-worried-about-opus-4-6-17eee0f8e5cd -- 2026-02-12
[Editorial] https://www.linkedin.com/posts/avipil_i-got-my-first-bill-after-switching-to-claude-activity-7427320523870629889-vM5K -- 2026-02-12
Pros/Cons and use case for bypassing permissions -- 2026-02-12
[Editorial] https://www.kylerush.org/posts/opus-4-5-really-changed-things -- 2026-02-10
[Editorial] https://www.linkedin.com/posts/ownyourai_i-taught-my-claude-code-to-swallow-a-32m-activity-7426902541868728321-2Z36 -- 2026-02-10
[Editorial] https://docs.google.com/document/d/1I9r21TyQuAO1y2ecztBU0PSCpjHSL_vZJiA5v276Wro/mobilebasic -- 2026-02-10
[Editorial] https://www.linkedin.com/posts/cole-medin-727752184_claude-codes-new-agent-teams-feature-is-share-7426633806792609792-pT3s -- 2026-02-10
[Tool] claude-config-sync: Sync your Claude Code configuration across machines using GitHub Gists -- 2026-02-10
[Editorial] https://www.linkedin.com/posts/ownyourai_i-just-woke-up-to-qwen3-coder-next-80b-activity-7424703876240695297-Nlqf -- 2026-02-04
LiquidAI/LFM2.5-1.2B-Thinking -- 2026-02-04
ByteDance-Seed/Stable-DiffCoder-8B-Instruct · Hugging Face -- 2026-02-03
tencent/Youtu-VL-4B-Instruct -- 2026-02-03
NousResearch/NousCoder-14B -- 2026-02-03
transformers v5 final is out 🔥 -- 2026-01-30
PaddlePaddle/PaddleOCR-VL-1.5 -- 2026-01-30
zai-org/GLM-4.7 -- 2026-01-30
MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models -- 2026-01-30
Sharing my set of distilled small language models (3B) + training data in more than 50 low-resource languages -- 2026-01-30
Introducing Kimi K2.5, Open-Source Visual Agentic Intelligence -- 2026-01-28
~60GB models on coding: GLM 4.7 Flash vs. GPT OSS 120B vs. Qwen3 Coder 30B -- your comparisons? -- 2026-01-28
openbmb/AgentCPM-Report -- 2026-01-28
Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice -- 2026-01-28
Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation -- 2026-01-28
One Year Since the “DeepSeek Moment” -- 2026-01-28
Qwen/Qwen-Image-2512 -- 2026-01-28
[Editorial] https://www.linkedin.com/posts/ownyourai_deepseek-just-released-the-first-vision-ai-activity-7421818927657385987-V1yo -- 2026-01-27
Unsloth announces support for finetuning embedding models -- 2026-01-27
Qwen/Qwen3-TTS-12Hz-0.6B-Base -- 2026-01-27
Qwen/Qwen3-VL-Reranker-2B -- 2026-01-27
[Editorial] https://www.linkedin.com/posts/ivandj_early-claims-around-self-evolving-memory-activity-7421307316437676033-l0Jm -- 2026-01-26
[Editorial] https://www.linkedin.com/posts/reuvencohen_introducing-ruvector-world-model-activity-7421556928910290944-cx4v -- 2026-01-26
stepfun-ai/Step3-VL-10B -- 2026-01-26
Qwen/Qwen3-VL-Embedding-2B -- 2026-01-21
Phr00t/Qwen-Image-Edit-Rapid-AIO -- 2026-01-21
Qwen/Qwen3-VL-Reranker-8B -- 2026-01-21
Bartowski comes through again. GLM 4.7 flash GGUF -- 2026-01-21
Step-Audio-R1.1 (Open Weight) by StepFun just set a new SOTA on the Artificial Analysis Speech Reasoning leaderboard -- 2026-01-20
GLM-4.7-Flash -- 2026-01-20
stepfun-ai/Step-Audio-R1.1 -- 2026-01-20
tencent/HY-MT1.5-1.8B -- 2026-01-20
Introducing GLM-Image -- 2026-01-14
GPT-OSS -> MLA conversion breakthrough (20B), still looking for compute + collaborators -- 2026-01-14
FrogBoss 32B and FrogMini 14B from Microsoft -- 2026-01-14
Qwen/Qwen3-VL-Embedding-8B -- 2026-01-14
MCP for Financial Ontology! -- 2026-01-09
A community index for MCPs that don’t disappear after the thread ends -- 2026-01-09
tencent/HY-WorldPlay -- 2026-01-09
meituan-longcat/LongCat-Image -- 2026-01-09
Introducing Falcon H1R 7B -- 2026-01-09
[Editorial] https://www.linkedin.com/posts/ernst-van-gassen-9196a7b5_we-spend-a-lot-of-time-trying-to-make-prompts-ugcPost-7414966440023482368-dgh5 -- 2026-01-09
[Editorial] https://leakhub.ai/ -- 2026-01-09
[Editorial] https://www.linkedin.com/posts/hardmaru_survival-of-the-fittest-code-blog-https-activity-7415068590485458944-3tqP -- 2026-01-09
A closer look at a BGP anomaly in Venezuela -- 2026-01-09
Show HN: I visualized the entire history of Citi Bike in the browser -- 2026-01-09
Modifying a QingPing Air Quality Monitor for Local MQTT Access -- 2026-01-09
[Editorial] https://github.com/hiyouga/LlamaFactory -- 2026-01-07
Tongyi-MAI/MAI-UI-8B · Hugging Face -- 2026-01-07
MultiverseComputingCAI/HyperNova-60B · Hugging Face -- 2026-01-07
[Editorial] https://www.alwaysfurther.ai/blog/train-4b-model-to-beat-claude-sonnet-gemini -- 2026-01-05
[Experimental] Gemma 3 4B - Dark CoT: Pushing 4B Reasoning to 33%+ on GPQA Diamond -- 2026-01-05
Youtu-LLM-2B-GGUF is here! -- 2026-01-05
upstage/Solar-Open-100B -- 2026-01-05
EditMGT — fast, localized image editing with Masked Generative Transformers -- 2025-12-30
Francis-Rings/FlashPortrait -- 2025-12-30
zai-org/GLM-TTS -- 2025-12-30
Flowception: Temporally Expansive Flow Matching for Video Generation -- 2025-12-30
I've been experimenting with SLM's a lot recently. My goal was to prove even SLMs can be accurate with the right architecture behind it. -- 2025-12-22
MBZUAI releases K2-V2 - 70B fully open model. -- 2025-12-22
microsoft/Fara-7B -- 2025-12-22
Alibaba Tongyi Open Sources Two Audio Models: Fun-CosyVoice 3.0 (TTS) and Fun-ASR-Nano-2512 (ASR) -- 2025-12-19
My professor lent me an A6000, so I tried to build a coding model. Here is Anni! (Qwen3-14B Fine-tune) -- 2025-12-19
Two years ago, I was just a math major. Now I've built the 1.5B router model used by HuggingFace. Can I bring it to Cursor? -- 2025-12-19
Independent evaluation of GPT5.2 on SWE-bench: 5.2 high is #3 behind Gemini, 5.2 medium behind Sonnet 4.5 -- 2025-12-16
Did I overhype Claude Code? GPT + Comet are quietly beating it for me -- 2025-12-16
🚀 New: Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B -- 2025-12-16
ByteDance/Dolphin-v2 -- 2025-12-16
zai-org/AutoGLM-Phone-9B-Multilingual -- 2025-12-16
[Editorial] https://www.linkedin.com/posts/eric-vyacheslav-156273169_a-7m-model-just-surpassed-deepseek-r1-gemini-activity-7405985266043297792-s1Jn -- 2025-12-15
Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models -- 2025-12-15
New in llama.cpp: Model Management -- 2025-12-12
Successful prototype component prompt - For big projects -- 2025-12-12
Building RNJ-1: What makes It different from Gemma 3? -- 2025-12-11
EssentialAI/rnj-1 -- 2025-12-11
From Azure Functions to FreeBSD -- 2025-12-10
[Editorial] https://www.linkedin.com/posts/dakharlamov_too-slow-thats-what-they-called-engineers-activity-7403964370390646784-J27P -- 2025-12-09
[Editorial] https://arxiv.org/abs/2511.20920 -- 2025-12-09
[Editorial] https://www.linkedin.com/posts/anthony-alcaraz-b80763155_your-ai-agents-context-window-is-not-a-database-activity-7403394612893024256-9S87 -- 2025-12-09
[Editorial] https://www.linkedin.com/posts/reuvencohen_introducing-ruvector-postgres-a-self-learning-activity-7403841311029837824-jogp -- 2025-12-09
Emacs is my new window manager -- 2025-12-08
shubh-io/DockMate -- 2025-12-08
Comfy-Org/HunyuanVideo_1.5_repackaged -- 2025-12-08
We were tired of guessing which local model to use for which query. built a speculative execution lib that figures it out (github) -- 2025-12-05
Nimony (eventually Nim 3.0) Design Principles -- 2025-12-03
nvidia/Orchestrator-8B · Hugging Face -- 2025-12-03
allenai/Olmo-3-1125-32B -- 2025-12-03
moonshotai/Kimi-Linear-48B-A3B-Instruct -- 2025-12-02
orabazes/FLUX.2-dev-GGUF -- 2025-12-02
Transformers v5: Simple model definitions powering the AI ecosystem -- 2025-12-02
Pocketbase – open-source realtime back end in 1 file -- 2025-11-28
[Editorial] https://www.rand.org/pubs/commentary/2025/11/electromagnetic-warfare-natos-blind-spot-could-decide.html -- 2025-11-28
In depth analysis of Nvidia's Jet Nemotron models -- 2025-11-26
Hidden causes of LLM latency, it's not just the model size -- 2025-11-26
Benchmark: Self-Hosted Qwen-30B (LoRA) vs. Llama-3.1-8B vs. GPT-4.1-nano. Comparison of parsing success rates and negative constraints. -- 2025-11-25
[Editorial] https://arxiv.org/html/2510.04871v1 -- 2025-11-24
[Editorial] https://www.linkedin.com/posts/ingason_agenticai-erp-aiagents-activity-7396551539538141185-PxNU -- 2025-11-24
facebook/sam3 -- 2025-11-24
rl-research/DR-Tulu-8B -- 2025-11-24
Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning -- 2025-11-24
marinero4972/Open-o3-Video -- 2025-11-17
nanonets/Nanonets-OCR2-3B -- 2025-11-17
nvidia/ChronoEdit-14B-Diffusers-Upscaler-Lora -- 2025-11-17
ibm-granite/granite-4.0-h-350m -- 2025-11-14
tencent/HunyuanWorld-Mirror -- 2025-11-14
[Editorial] https://www.npmjs.com/package/neural-trader -- 2025-11-14
Critical RCE patched in Imunify360 affects up to 50M+ websites -- 2025-11-14
Kubernetes Ingress Nginx is retiring -- 2025-11-14
About KeePassXC's Code Quality Control -- 2025-11-14
Visualizing Quantization Types -- 2025-11-11
Co-authored a book called "Build DeepSeek from Scratch" | Live Now -- 2025-11-11
Writing your own BEAM -- 2025-11-11
DIY Powerwall Blows Clouds, Competition Out of the Water -- 2025-11-11
BAAI/Emu3.5-Image -- 2025-11-07
Riemannian Optimization for LoRA on the Stiefel Manifold -- 2025-11-07
"On-the-fly" code reviews with ollama. It kinda works.. -- 2025-11-07
I Built an "AI Art Director" Agent to Orchestrate Image and Video Models. -- 2025-11-07
What's your most unexpected Claude workflow discovery? -- 2025-11-07
[Research] Cross-Stage Vulnerabilities in Large Language Model Architectures -- 2025-11-07
runZeroInc/runZeroHound -- 2025-11-07
openai/gpt-oss-safeguard-120b -- 2025-11-07
Why does Image Recognition work in llama-server but not through Open WebUI? -- 2025-11-06
Has anyone tested ollama on Whisplay HAT with Raspberry pi zero 2W? -- 2025-11-06
allenai/olmOCR-2-7B-1025 -- 2025-11-06
is there simple way like .bat to compress to q4-q8 like Unsloth, Qwen3-VL-30B-A3B-Thinking-abliterated model -- 2025-11-06
MiniMax M2 Llama.cpp support merged -- 2025-11-05
Which AI IDE should I use under $20/month? -- 2025-11-05
lzA6/video-to-txt -- 2025-11-05
xaviviro/python-toon -- 2025-11-05
[Editorial] https://www.evokesecurity.com/blogs/prompt-injection-is-for-everyone -- 2025-11-04
Found a remote file inclusion vulnerability in an AI-generated app before launch -- 2025-11-04
Launch HN: Propolis (YC X25) – Browser agents that QA your web app autonomously -- 2025-11-04
Attacking macOS XPC Helpers: Protocol Reverse Engineering and Interface Analysis -- 2025-11-04
[Research] Unvalidated Trust: Cross-Stage Failure Modes in LLM/agent pipelines arXiv -- 2025-11-04
[Editorial] Context Engineering Handbook -- 2025-11-03
Qwen3-VL-32B Q8 speeds in llama.cpp vs vLLM FP8 on a RTX PRO 6000 -- 2025-11-03
OSS alternative to Open WebUI - ChatGPT-like UI, API and CLI -- 2025-11-03
Looking for a RAG UI manager to meet our needs to replace Zapier -- 2025-11-03
LiquidAI/LFM2-1.2B-Extract -- 2025-11-03
DeepSeek may have found a new way to improve AI’s ability to remember -- 2025-11-02
Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection -- 2025-11-02
OpenAI: gpt-oss-safeguard: two open-weight reasoning models built for safety classification (Now on Hugging Face) -- 2025-10-31
briaai/FIBO -- 2025-10-31
snowyfizz/Vision-Detection-API -- 2025-10-29
Show HN: Apache Fory Rust – 10-20x faster serialization than JSON/Protobuf -- 2025-10-29
huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning -- 2025-10-29
AlphaXiv, Compare the Deepseek-OCR and Mistral-OCR OCR models -- 2025-10-26
Open-Bee/Bee-8B-RL -- 2025-10-26
datalab-to/chandra -- 2025-10-26
Unlock the power of images with AI Sheets -- 2025-10-26
Picture in Picture / Webcam detect model on HuggingFace -- 2025-10-25
Show HN: Story Keeper – AI agents with narrative continuity instead of memory -- 2025-10-25
Show HN: Deta Surf – An open source and local-first AI notebook -- 2025-10-25
Show HN: Tommy – Turn ESP32 devices into through-wall motion sensors -- 2025-10-25
20x Max Plan (€216) takes 2% of weekly Opus usage for a single Deep Research Query. That equals 50 per week if you use it ONLY for this and never continue or respond -- 2025-10-24
I got tired of OpenAI dependency. Built a multi-LLM control center instead. -- 2025-10-19
Turn ChatGPT into a real-time meeting assistant (via MCP + Apps SDK) -- 2025-10-19
Claude Code taking a coffee break 🤔 -- 2025-10-19
Show HN: Cmux – Coding Agent Multiplexer -- 2025-10-19
[Editorial] Agentic Orchestration -- 2025-10-19
Chicken Squisher 3000: Squish-Proof Security -- 2025-10-19
Show HN: Largest open-source multimodal AI dataset -- 2025-10-18
KORMo-Team/KORMo-10B-sft -- 2025-10-18
ByteDance/FaceCLIP -- 2025-10-18
Do you use multiple AI models for coding? Trying to validate a workflow problem -- 2025-10-17
The “Compounding Engineering” mindset changed how I think about AI coding tools -- 2025-10-17
Show HN: Rebuilt Bible search app to run 100% client-side with Transformers.js -- 2025-10-13
swiss-ai/Apertus-8B-Instruct-2509 -- 2025-10-13
A Childhood Dream, Created and Open Sourced -- 2025-10-13
Some small tools for you - Ollama Management UI, Passkey authentication proxy -- 2025-10-12
Single prompt I run after git commit (before push) for AI diff/commit review -- 2025-10-12
Show HN: Gitcasso – Syntax Highlighting and Draft Recovery for GitHub Comments -- 2025-10-12
simonw/claude-skills -- 2025-10-12
Custom models don't work after v0.6.33 update - Anyone else? -- 2025-10-12
Pardus AI: Open source AI Assistant thanks for the help with Ollama -- 2025-10-09
What to use for refactoring -- 2025-10-09
Claude Code finally has a planning partner — I built an AI backlog manager for solo devs -- 2025-10-09
Granite4 Small-h 32b-A9b (Q4_K_M) at FULL 1M context window is using only 73GB of VRAM - Life is good! -- 2025-10-09
Run Open AI GPT-OSS on a mobile phone (Demo) -- 2025-10-09
AI21 releases Jamba 3B, the tiny model outperforming Qwen 3 4B and IBM Granite 4 Micro! -- 2025-10-09
inclusionAI/Ling-mini-2.0 -- 2025-10-09
[Editorial] The Tiny Recursive Mode -- 2025-10-08
deepseek-ai/DeepSeek-V3.1-Terminus -- 2025-10-08
Behavioral Modification Systems in Large Language Models: A Methodological Analysis of Long Conversation Reminders -- 2025-10-08
Ring Flash 2.0 104B A6B with Linear Attention released a few days ago -- 2025-10-07
Bring Your Own Data (BYOD) -- 2025-09-30
CohereLabs/command-a-reasoning-08-2025 -- 2025-09-30
Built an MCP server for Claude Desktop to browse Reddit in real-time -- 2025-09-30
MetalQwen3: Full GPU-Accelerated Qwen3 Inference on Apple Silicon with Metal Shaders – Built on qwen3.c - WORK IN PROGRESS -- 2025-09-28
yangdongchao/UniAudio2 -- 2025-09-28
jimsweb/aiMIDI -- 2025-09-28
Handy – Free open-source speech-to-text app written in Rust -- 2025-09-28
IndexTeam/IndexTTS-2 -- 2025-09-28
beankeji-cloud/SLiteIO -- 2025-09-28
GitHub - shantur/jarvis-mcp: Bring your AI to life—talk to assistants instantly in your browser. Zero hassle, No API keys, No Whisper -- 2025-09-27
PC memory costs to climb as fabs chase filthy lucre in servers and HBM -- 2025-09-27
AI and licensing (commercial use) -- 2025-09-26
I trained an LLM from scratch AMA! -- 2025-09-26
pengzhangzhi/Open-dLLM -- 2025-09-26
SWE-Bench Pro -- 2025-09-23
Investigating Training Data Detection in AI Coders -- 2025-09-23
A first stab at packaging llama.cpp in a performance-optimized manner -- 2025-09-23
Model: Qwen3 Next Pull Request llama.cpp -- 2025-09-23
Unobtanium No More; Perhaps We Already Have All The Elements We Need -- 2025-09-21
support for the upcoming Olmo3 model has been merged into llama.cpp -- 2025-09-21
Running Nvidia CUDA Pytorch/vLLM projects and pipelines on AMD with no modifications -- 2025-09-21
A Quick Look At The AMD Instinct MI355X With ROCm 7.0 -- 2025-09-21
Uncensored AI model for from 4b Max 8b -- 2025-09-21
facebook/MobileLLM-R1-950M -- 2025-09-20
OpenGVLab/InternVL3_5-241B-A28B -- 2025-09-20
KBlueLeaf/HDM-xut-340M-anime -- 2025-09-20
GPT-OSS:20b & Qwen 4b are a match made in heaven for 24GB VRAM builds -- 2025-09-18
Was working in RAG recently got to know how well Gemma3 4B performs -- 2025-09-18
ROCm 6.4.3 -> 7.0-rc1 after updating got +13.5% at 2xR9700 -- 2025-09-18
Is it possible for different brand GPUs to work together? -- 2025-09-18
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts -- 2025-09-18
LFM2-1.2B safety benchmark -- 2025-09-18
Has anyone successfully gotten Ollama models (or any models) to execute SQL queries through natural language in Openwebui? -- 2025-09-18
yangzhou24/OmniWorld -- 2025-09-18
Analog Optical Computer for Inference and Combinatorial Optimization -- 2025-09-18
Running Qwen-Next (Instruct and Thinking) MLX BF16 with MLX-LM on Macs -- 2025-09-17
[Editorial] REFRAG: Rethinking RAG based Decoding -- 2025-09-17
The Practicality Of Solar Powered Meshtastic -- 2025-09-17
Kwai-Klear/Klear-46B-A2.5B-Instruct -- 2025-09-16
WestZhang/VibeVoice-Large-pt -- 2025-09-16
FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference -- 2025-09-16
Qwen 3 Next Series – Qwen/Qwen3 Next 80B A3B Instruct Detected -- 2025-09-16
Test-time Prompt Intervention -- 2025-09-16
[Editorial] Tricks from OpenAI gpt-oss YOU can use with transformers -- 2025-09-15
openbmb/MiniCPM4.1-8B -- 2025-09-15
nunchaku-tech/nunchaku-qwen-image -- 2025-09-15
ggml-org/gpt-oss-20b-GGUF -- 2025-09-15
MBZUAI releases K2 Think. 32B reasoning model based on Qwen 2.5 32B backbone, focusing on high performance in math, coding and science. -- 2025-09-14
unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-GGUF -- 2025-09-14
Efficient hot-swappable LoRA variant supported in llama.cpp -- 2025-09-12
Qwen/Qwen3-Next-80B-A3B-Instruct -- 2025-09-12
swiss-ai/Apertus-70B-Instruct-2509 -- 2025-09-12
GRASPED: Graph Anomaly Detection using Autoencoder with Spectral Encoder and Decoder (Full Version) -- 2025-09-12
stepfun-ai/step3 -- 2025-09-12
Exploring State-Space-Model based Language Model in Music Generation -- 2025-09-12
HuggingFaceModelDownloader v2.0 — fast resume, a slick TUI, and powerful filters for GGUF/variants -- 2025-09-12
Roo Code Cloud is here with Task Sync & Roomote Control || Roo Code 3.28.0 Release Notes -- 2025-09-12
Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers -- 2025-09-12
Made a one-click SearXNG fork with Redis, plus Dockerized Tika+OCR, and soon: local TTS/STT on Intel iGPU + AMD NPU -- 2025-09-12
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search -- 2025-09-10
Introducing FineVision: a huge open-source dataset for training SOTA Vision Language Models -- 2025-09-10
wildminder/ComfyUI-VibeVoice -- 2025-09-10
bytedance/USO -- 2025-09-10
Wan-AI/Wan2.2-I2V-A14B -- 2025-09-10
YanoljaNEXT-Rosetta: A Collection of Translation Models in Different Sizes -- 2025-09-07
nasa-ibm-ai4science/Surya-1.0 -- 2025-09-06
Vulkan back ends, what do you use? -- 2025-09-06
A new OpenAI model? Could this be 5.1 or 5o? What do you think? -- 2025-09-06
haasonsaas/dspy-0to1-guide -- 2025-09-06
16 reproducible failures → upgraded into a 300+ page Global Fix Map. one link inside, feedback wanted -- 2025-09-06
VibeVoice RIP? What do you think? -- 2025-09-05
lodestones/Chroma1-HD -- 2025-09-05
Welcome EmbeddingGemma, Google's new efficient embedding model -- 2025-09-05
NousResearch/Hermes-4-405B -- 2025-09-05
After deepseekv3 I feel like other MoE architectures are old or outdated. Why did Qwen chose a simple MoE architecture with softmax routing and aux loss for their Qwen3 models when there’s been better architectures for a while? -- 2025-09-02
The Hacker's Guide to Building an AI Supercluster -- 2025-09-02
CAD, From Scratch: MakerCAD -- 2025-09-02
PSO-Merging: Merging Models Based on Particle Swarm Optimization -- 2025-09-02
Coral-Protocol/Anemoi -- 2025-09-01
After researchers unmasked a prolific SMS scammer, a new operation has emerged -- 2025-09-01
Silent No More: Open-Source Fix for Mic Mishaps -- 2025-09-01
How to reliably detect cross-listed job ads across multiple sites? -- 2025-09-01
GPT OSS Fine-tuning QAT -- 2025-08-31
Semantic Structure in Large Language Model Embeddings -- 2025-08-31
facebook/dinov3-vit7b16-pretrain-lvd1689m -- 2025-08-31
MizzenAI/HPSv3 -- 2025-08-31
Some thoughts on LLMs and software development -- 2025-08-30
Show HN: Sideko – Hybrid deterministic/LLM generator for API SDKs and docs -- 2025-08-30
[Editorial] AI interfaces for future -- 2025-08-29
I’ve Debugged 100+ RAG/LLM Pipelines. These 16 Bugs Always Come Back. (70 days, 800 stars) -- 2025-08-29
Updates to Consumer Terms and Privacy Policy -- 2025-08-29
Hierarchical Reasoning Model (HRM) implementation for text generation -- 2025-08-27
SQLite-Vector adds support for float16 and bfloat16 (CPU, NEON, AVX2 and SSE2) -- 2025-08-26
zai-org/GLM-4.5 -- 2025-08-26
rednote-hilab/dots.vlm1.inst -- 2025-08-26
black-forest-labs/FLUX.1-Krea-dev -- 2025-08-26
Datarus-R1-14B-Preview, an adaptive multi-step reasoning LLM for automated data analysis -- 2025-08-24
Fully Open source, serverless, community-driven MCP alternative built in Python, TS and Go -- 2025-08-24
unsloth/Kimi-K2-Instruct-GGUF -- 2025-08-24
DeepSeek V3.1 Reasoner improves over DeepSeek R1 on the Extended NYT Connections benchmark -- 2025-08-24
Qwen3-30B-A3B-Instruct 2507 vs Qwen3-Coder Flash -- 2025-08-23
Your model zoo for Software dev / webdev -- 2025-08-23
sriniously/go-boilerplate -- 2025-08-23
Design Patterns in MCP: Literate Reasoning -- 2025-08-22
FlyMyAI/flymyai-lora-trainer -- 2025-08-22
Zedless: Zed fork focused on privacy and being local-first -- 2025-08-22
Show HN: Rucat – Cat for Prompt Engineers -- 2025-08-22
NEW VERSION: 0.6.23 Has Just Released! - Many fixes and new features, huge changelog -- 2025-08-22
Docker Model Runner is really neat -- 2025-08-20
Build a Powerful RAG Web Scraper with Ollama and LangChain -- 2025-08-20
Ollama interface with memory -- 2025-08-20
YuminosukeSato/pyproc -- 2025-08-20
AGENTS.md – Open format for guiding coding agents -- 2025-08-20
NVIDIA Nemotron Nano 2 and the Nemotron Pretraining Dataset v1 -- 2025-08-20
deepseek-ai/DeepSeek-V3.1-Base · Hugging Face -- 2025-08-20
mistralai/Devstral-Small-2507 -- 2025-08-20
zai-org/GLM-4.5-Air -- 2025-08-20
microsoft/Phi-4-mini-flash-reasoning -- 2025-08-20
ModelTC/Qwen-Image-Lightning -- 2025-08-20
Fast Type-Aware Linting in Oxlint -- 2025-08-20
GPT-5, where does it shine for you? -- 2025-08-19
vidore/colqwen-omni-v0.1 -- 2025-08-19
Tutorial: Open WebUI and llama-swap works great together! Demo of setup, model swapping and activity monitoring. -- 2025-08-18
Concurrency in open-weight/open-source models? -- 2025-08-18
[Editorial] Claude Flow, Alpha 90 release -- 2025-08-17
Optimizing Text gen webui (oobabooga) for MOE models (Qwen3-235b, GLM 4.5) -- 2025-08-17
Chen-zexi/vllm-cli -- 2025-08-17
zyfoxx/subhunter -- 2025-08-17
HuggingFaceTB/SmolLM3-3B-Base -- 2025-08-16
mistralai/Voxtral-Small-24B-2507 -- 2025-08-16
TiTan - a tiny model for tags and titles -- 2025-08-16
baidu/ERNIE-4.5-VL-424B-A47B-PT -- 2025-08-16
MiniLM (BERT) embeddings in C from scratch -- 2025-08-16
GENNAI CLI - A ReAct-based agent CLI -- 2025-08-16
gptme v0.28.0 major release - agent CLI with local model support -- 2025-08-16
WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling -- 2025-08-16
character-ai/pipelining-sft -- 2025-08-15
Compass-Thinker-7B Technical Report -- 2025-08-15
Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning -- 2025-08-15
SaaS Is Dead -- 2025-08-15
PYX: The next step in Python packaging -- 2025-08-15
[Editorial] GLM-4.5, enterprise use -- 2025-08-13
Best local model with function calling? -- 2025-08-13
agentica-org/DeepSWE-Preview -- 2025-08-13
janhq/Jan-v1-4B-GGUF -- 2025-08-13
gpt-oss jailbreak workflow -- 2025-08-11
GPT-5 removed logprob support from the API - technical breakdown and implications -- 2025-08-11
A model for pure text continuation (not chirpy little Q&A assistant)? -- 2025-08-11
Suggestion for upgrading hardware for MOE inference and fine-tuning. -- 2025-08-09
Best models under 16GB?? -- 2025-08-09
Explicit tail calls are now available on Rust Nightly (become keyword) -- 2025-08-09
HuggingFaceTB/SmolLM3-3B -- 2025-08-09
mistralai/Magistral-Small-2507 -- 2025-08-09
N8N + OpenWebUI -- 2025-08-08
Create space-saving clones on macOS with Python -- 2025-08-08
Experience with GLM-4.5-Air + claude code? -- 2025-08-08
Just when you thought Qwen was done... -- 2025-08-08
Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs -- 2025-08-08
Context Management by Trimming Conversation -- 2025-08-06
Exploiting Primacy Effect To Improve Large Language Models -- 2025-08-06
Help: Qwen3-Coder + LM Studio + Continue.dev (VSCode) + Mac 64GB M3 Max — 500 Internal Server Error, Even After Unsloth Fix -- 2025-08-04
CWC now supports kimi.com (K2) and chat.z.ai (GLM-4.5) to enable coding with top tier models at no cost -- 2025-08-04
🧠 ICM+DPO: Used Qwen3's coherent understanding to improve Gemma3 at math - cross-model capability transfer with zero supervision -- 2025-08-03
NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining -- 2025-08-03
Build an AI Shopping Assistant with Gradio MCP Servers -- 2025-08-01
[Editorial] Alternative to vector db rag -- 2025-07-30
This year’s best open-source models and most cost-effective models -- 2025-07-29
I’m looking for multimodal image input support and uncensored LLM -- 2025-07-29
nvidia/audio-flamingo-3 -- 2025-07-29
mistralai/Voxtral-Mini-3B-2507 -- 2025-07-29
ziangcao0312/PhysX-3D -- 2025-07-29
UI/UX benchmark update 7/22: Newest Qwen models added, Qwen3 takes the lead in terms of win rate (though still early) -- 2025-07-28
unsloth/Qwen3-235B-A22B-Thinking-2507-GGUF -- 2025-07-28
zai-org/GLM-4.5 -- 2025-07-28
Tesslate/UIGEN-X-32B-0727 -- 2025-07-28
had to fine-tune qwen since llama sucks at summarizing -- 2025-07-28
orchestre-dev/ccproxy -- 2025-07-28
[Editorial] neural networks don’t need to be giant to be powerful -- 2025-07-27
Qwen/Qwen3-235B-A22B-Thinking-2507 -- 2025-07-27
mistralai/Magistral-Small-2507 -- 2025-07-27
cmdaltctr/claude-gemini-mcp-slim -- 2025-07-26
RichardAtCT/claude-code-openai-wrapper -- 2025-07-26
Freigeist - The new Vibe Coding Platform -- 2025-07-26
From chaotic prompting to structured workflow: My Claude evolution -- 2025-07-24
A Request for Comments (RFC) for MCP-alternative Universal Tool Calling Protocol (UTCP) was created -- 2025-07-22
How to use the same context across LLMs and Agents -- 2025-07-22
Has anyone actually ran VLAs locally and how good are they? -- 2025-07-22
google/medsiglip-448 -- 2025-07-22
Questions about AI for translation -- 2025-07-22
microsoft/Phi-4-mini-flash-reasoning -- 2025-07-22
LGAI-EXAONE/EXAONE-4.0-32B -- 2025-07-22
Replacing thinking with tool usage enables reasoning in small language models -- 2025-07-22
Struggling to Generate Polished UI with Claude Code -- 2025-07-20
Skywork/Skywork-R1V3-38B -- 2025-07-20
ByteDance-Seed/Seed-X-PPO-7B -- 2025-07-20
Diffusion model support in llama.cpp. -- 2025-07-16
GLM-4 MoE incoming -- 2025-07-16
GeoArrow and GeoParquet, and the Future of Geospatial Data Analysis -- 2025-07-15
Programming Affordances That Invite Mistakes -- 2025-07-14
HuggingFaceTB/SmolLM3-3B-Base -- 2025-07-14
OLMo 2 - a family of fully-open language models -- 2025-07-12
google/videoprism -- 2025-07-12
Why don’t we have a big torrent repo for open-source LLMs? -- 2025-07-12
Local PDF Database searchable with ollama - best setup? -- 2025-07-12
Tinyllama on old Mediatek G80 android device -- 2025-07-12
I used Ollama to build a Cursor for PDFs -- 2025-07-12
Advice on switching to LLM -- 2025-07-12
Building the Hugging Face MCP Server -- 2025-07-11
Support for the upcoming IBM Granite 4.0 has been merged into llama.cpp -- 2025-07-11
support for Falcon-H1 model family has been merged into llama.cpp -- 2025-07-11
fsndzomga/metadspy -- 2025-07-10
osmosis-ai/Osmosis-Apply-1.7B -- 2025-07-10
IntervitensInc/pangu-pro-moe-model -- 2025-07-10
Continual Gradient Low-Rank Projection Fine-Tuning for LLMs -- 2025-07-10
pola-rs/polars -- 2025-07-09
Introducing an open source cross-platform graphical interface LLM client -- 2025-07-08
llama-server vs llama python binding -- 2025-07-08
Llama server completion not working correctly -- 2025-07-08
introducing cocoindex - super simple etl to prepare data for ai, with dynamic index (ollama integrated) -- 2025-07-08
Higher topk and num_ctx or map/reduce ? -- 2025-07-08
Looking for an upgrade from Meta-Llama-3.1-8B-Instruct-Q4_K_L.gguf, especially for letter parsing. Last time I looked into this was a very long time ago (7 months!) What are the best models nowadays? -- 2025-07-08
Best models by size? -- 2025-07-08
Planning a 7–8B Model Benchmark on 8GB GPU — What Should I Test & Measure? -- 2025-07-08
DLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching -- 2025-07-08
Efficient MultiModal Data Pipeline -- 2025-07-08
Are non-autoregressive models really faster than autoregressive ones after all the denoising steps? -- 2025-07-06
Yuan-ManX/ComfyUI-OmniGen2 -- 2025-07-06
Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 -- 2025-07-06
Using local models with Void -- 2025-07-05
Intel GPU vLLM Docker Compose Bootstrap with Phi-lthy4 on A770 -- 2025-07-05
Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm -- 2025-07-05
Gemma 3n fully available in the open-source ecosystem! -- 2025-07-05
5060ti 16gb or 9060xt 16gb for small llm server -- 2025-07-05
Qwen3 models in MLX format! -- 2025-07-05
Best local coding model right now? -- 2025-07-05
Found a Web3 LLM That Actually Gets DeFi Right -- 2025-07-03
apple/DiffuCoder-7B-cpGRPO -- 2025-07-03
Hoshinonyaruko/Gensokyo-MCP -- 2025-07-01
ahmadallobani/BaldHead -- 2025-06-29
modelcontextprotocol/registry -- 2025-06-27
0-concordance of knotted surfaces and Alexander ideals -- 2025-06-26
unfinishedtr/progressbar -- 2025-06-24
wearyfurnitur/confiq -- 2025-06-24
Show HN: Controlling 3D models with voice and hand gestures -- 2025-06-24
open-webui/mcpo -- 2025-06-23
Built a fully local Whisper + pyannote stack to replace Otter. Full diarisation, transcripts & summaries on GPU. -- 2025-06-23
MiniMax latest open-sourcing LLM, MiniMax-M1 — setting new standards in long-context reasoning -- 2025-06-23
Run qwen 30b-a3b on Android local with Alibaba MNN Chat -- 2025-06-23
A new PDF translation tool -- 2025-06-23
What Really Happens When You Ask a Cursor a Question with GitHub MCP Integrated -- 2025-06-23
[Q] How to Speed Up Mistral 7B Inference in LM Studio? 31s/Chunk on RTX 3070 -- 2025-06-23
Cyber security guys are about to become very on demand in the coming few years -- 2025-06-23
The first big AI disaster is yet to happen -- 2025-06-23
Trading with Claude, and writing your own MCP server -- 2025-06-23
Ecne AI Podcast Generator - Update -- 2025-06-23
Help me decide on hardware for LLMs -- 2025-06-23
chungmin99/pyroki -- 2025-06-23
nvidia/GR00T-N1.5-3B -- 2025-06-23
nvidia/Cosmos-Predict2-2B-Text2Image -- 2025-06-22
trendmicro/vision-one-mcp-server -- 2025-06-20
Minidoracat/mcp-feedback-enhanced -- 2025-06-20
meta-llama/Llama-3.1-8B-Instruct -- 2025-06-19
MiniMaxAI/MiniMax-M1-80k -- 2025-06-19
ckanthony/openapi-mcp -- 2025-06-18
Ta0ing/MCP-SecurityTools -- 2025-06-18
fluxions/vui -- 2025-06-13
carbon-language/carbon-lang -- 2025-06-12
CJackHwang/AIstudioProxyAPI -- 2025-06-11
News publishers call Google's AI Mode 'theft' -- 2025-06-11
typelevel/cats -- 2025-06-11
wesm/pydata-book -- 2025-06-11
jedisct1/openapi-mcp -- 2025-06-09
arcee-ai/Homunculus -- 2025-06-04
GitHub MCP exploited: Accessing private repositories via MCP -- 2025-06-01
dipampaul17/KVSplit -- 2025-06-01
facebook/OMol25 -- 2025-05-30
Is Microsoft’s new Foundry Local going to be the “easy button” for running newer transformers models locally? -- 2025-05-28
Cobolt is now available on Linux! 🎉 -- 2025-05-28
Round Up: Current Best Local Models under 40B for Code & Tool Calling, General Chatting, Vision, and Creative Story Writing. -- 2025-05-28
Best local model for M2 16gb MacBook Air for Analyzing Transcripts -- 2025-05-28
Prompt Debugging -- 2025-05-28
Not so Smart Agent (Ollama, Spring AI, MCP) -- 2025-05-28
[Q] How can one get better at fixing models,training etc.? -- 2025-05-28
FOSS - MCP Server generator from OpenAPI specification files (swagger/etapi) -- 2025-05-28
Digital Payment System GNU Taler Gets Green Light to Operate in Switzerland -- 2025-05-28