Local AI
Local inference, on-device AI, hardware selection, GPUs, Ollama, edge computing
927 articles across 196 editions
Articles
- [Editorial] -- 2026-02-24
- GGML and llama.cpp join HF to ensure the long-term progress of Local AI -- 2026-02-20
- [Editorial] Agentic AI for Enterprise -- 2026-02-20
- Vellium: open-source desktop app for creative writing with visual controls -- 2026-02-20
- The Anxiety of Influence: Bloom Filters in Transformer Attention Heads -- 2026-02-20
- Higher effort settings reduce deep research accuracy for GPT-5 and Gemini Flash 3 -- 2026-02-20
- I plugged a $30 radio into my Mac mini and told my AI "connect to this" — now I control my smart home and send voice messages over radio with zero internet -- 2026-02-19
- FlashLM v4: 4.3M ternary model trained on CPU in 2 hours — coherent stories from adds and subtracts only -- 2026-02-19
- [Editorial] Context Drift: How I Talked AI Agents Into Giving Up Their Secrets -- 2026-02-16
- [Editorial] PromptArmor — AI Security Defense -- 2026-02-16
- [Editorial] The Agentic AI Future of Threat Intelligence -- 2026-02-16
- [Editorial] ClawdInt — Agentic AI Threat Intelligence -- 2026-02-16
- I built a personal AI assistant in 815 lines of TypeScript — every capability is just a Markdown file -- 2026-02-13
- whisper.cpp + llama.cpp in a desktop app — local voice-to-text with LLM text cleanup -- 2026-02-13
- I built a social network where 6 Ollama agents debate each other autonomously — Mistral vs Llama 3.1 vs CodeLlama -- 2026-02-13
- Lorph: A Local AI Chat App with Advanced Web Search via Ollama -- 2026-02-13
- [Editorial] https://windley.com/archives/2026/02/a_policy-aware_agent_loop_with_cedar_and_openclaw.shtml -- 2026-02-12
- Open Source Kreuzberg benchmarks and new release -- 2026-02-12
- [NVIDIA Nemotron] How can I assess general knowledge on a benchmaxxed model? -- 2026-02-12
- I built a rough .gguf LLM visualizer -- 2026-02-12
- Local-First Fork of OpenClaw for using open-source models: LocalClaw -- 2026-02-12
- built a self-hosted API proxy that strips PII before prompts reach any LLM - works with Ollama too -- 2026-02-11
- Bitnet.cpp - Inference framework for 1-bit (ternary) LLMs -- 2026-02-11
- Last Week in Multimodal AI - Local Edition -- 2026-02-11
- [Editorial] https://www.zdnet.com/article/claude-code-alternative-free-local-open-source-goose -- 2026-02-10
- Recommend a model for OpenClaw clawdbot running locally on an old Asus laptop (4GB VRAM, 16GB RAM) -- 2026-02-10
- OpenWebui + Ace Step 1.5 -- 2026-02-10
- MichiAI: A 530M Full-Duplex Speech LLM with ~75ms Latency using Flow Matching -- 2026-02-10
- Qwen3-Coder-Next 80B (GGUF/BF16) on Zen 5 EPYC: 12-channel DDR5 & NVFP4 bench -- 2026-02-09
- I built Qwen3-TTS Studio – Clone your voice and generate podcasts locally, no ElevenLabs needed -- 2026-02-09
- Using Ollama for a real-time desktop assistant — latency vs usability tradeoffs? -- 2026-02-09
- ginkida/gokin -- 2026-02-09
- jarrodwatts/claude-stt -- 2026-02-09
- Let your coding agent benchmark llama.cpp for you (auto-hunt the fastest params per model) -- 2026-02-06
- GGML implementation of Qwen3-ASR -- 2026-02-06
- Running LLMs & VLMs Fully On-Device on iPhone(6GB RAM) — Offline, Privacy-Focused, Real-Time Performance -- 2026-02-06
- Run Ollama on your Android! -- 2026-02-04
- Is using the officially supported local LLM integration in Claude Code for business/corporate use a violation of ToS? -- 2026-02-04
- We should really try fine-tuning a MoLE model from a pre-trained model -- 2026-02-02
- I benchmarked a bunch of open weight LLMs on different Macs so you don't have to! -- 2026-02-02
- Why does using ollama run claude with glm-4.7-flash have zero memory? -- 2026-02-02
- Last Week in Multimodal AI - Local Edition -- 2026-02-02
- I vibe coded a local audio inference engine for Qwen3-TTS and Qwen3-ASR -- 2026-02-02
- Show: Fully Local Voice Assistant (with optional Voice Cloning) -- 2026-01-30
- Thoughts on PowerInfer as a way to break the memory bottleneck? -- 2026-01-30
- Ollama Models Ranked by VRAM Requirements -- 2026-01-30
- local-vision-bridge: OpenWebUI Function to intercept images, send them to a vision-capable model, and forward image descriptions to a text-only model -- 2026-01-30
- What secondary GPU should I get, mainly for local prompting? -- 2026-01-28
- On-device tool calling with Llama 3.2 3B on iPhone - made it suggest sushi restaurants [Open Source, React Native] -- 2026-01-28
- I have written gemma3 inference in pure C -- 2026-01-28
- A Tool to Calculate If a LLM Will Fit Your GPU -- 2026-01-27
- We indexed the entire Ollama Library (10TB+ VRAM). Here is how we run them all on 1 Node. -- 2026-01-27
- Is the next leap in AI architectural? Comparing VRAM-hungry Transformers with Compute-intensive Energy-Based Models -- 2026-01-27
- Show HN: A Local OS for LLMs. MIT License. Zero Hallucinations. Infinite Memory -- 2026-01-27
- I put an RTX PRO 4000 Blackwell SFF in my MS-S1 Max (Strix Halo), some benchmarks -- 2026-01-26
- ClaraVerse | Local AI workspace (4 months ago) -> Your feedback -> Back with improvements. -- 2026-01-26
- Beyond Vendor Lock-In: A Framework for LLM Sovereignty -- 2026-01-26
- Bringing Anthropic's "advanced tool use" pattern to local models with mcpx -- 2026-01-23
- native-devtools-mcp - An MCP server for testing native desktop applications -- 2026-01-23
- Built a lightweight Python agent framework to avoid “black box” abstractions, feedback welcome -- 2026-01-23
- Steam page is live! Time for non-technical folks to enjoy local AI too (for free). -- 2026-01-23
- Hi folks, I’ve built an open‑source project that could be useful to some of you -- 2026-01-23
- [Editorial] https://www.linkedin.com/posts/steveyegge_gas-town-hall-activity-7420008043712622592-Oh43 -- 2026-01-23
- [Editorial] https://www.linkedin.com/posts/unsloth_you-can-now-run-glm-47-flash-locally-on-activity-7419220348719624192-CV65 -- 2026-01-22
- Here is how to get GLM 4.7 working on llama.cpp with flash attention and correct outputs -- 2026-01-22
- unsloth/GLM-4.7-Flash-GGUF -- 2026-01-22
- I used Ollama (Mistral Small 24B) + LightRAG to build a graph pipeline that catches hidden risks where standard Vector RAG fails. -- 2026-01-21
- Hey all- I built a self-hosted MCP server to run AI semantic search over your own databases, files, and codebases. Supports Ollama and cloud providers if you want. Thought you all might find a good use for it. -- 2026-01-21
- I built Semantiq - a universal MCP server that gives semantic code understanding to Claude Code, Cursor, and any AI coding tool (100% local, no API keys) -- 2026-01-21
- I need feedback on an open-source CLI that scans AI models (Pickle, PyTorch, GGUF) for malware, verifies HF hashes, and checks licenses -- 2026-01-20
- Running multiple models locally on a single GPU, with model switching in 2-5 seconds. -- 2026-01-20
- EXAONE MoE support has been merged into llama.cpp -- 2026-01-20
- naklecha/simple-llm -- 2026-01-20
- 3x3090 + 3060 in a mid tower case -- 2026-01-19
- Built an 8× RTX 3090 monster… considering nuking it for 2× Pro 6000 Max-Q -- 2026-01-19
- vLLM on 2x/4x Tesla v100 32GB -- 2026-01-16
- M.2 to 4x Pcie for extra GPU Power Question -- 2026-01-16
- New version of Raspberry Pi Generative AI card (HAT+ 2) -- 2026-01-16
- Local AI App With SD-1.5 Models -- 2026-01-15
- For RAG serving: how do you balance GPU-accelerated index builds with cheap, scalable retrieval at query time? -- 2026-01-15
- Home workstation vs NYC/NJ colo for LLM/VLM + Whisper video-processing pipeline (start 1 GPU, scale to 4–8) -- 2026-01-15
- Create specialized Ollama models in 30 seconds -- 2026-01-15
- Two ASRock Radeon AI Pro R9700's cooking in CachyOS. -- 2026-01-14
- which small model can i use to read this gauge? -- 2026-01-14
- Supertone/supertonic-2 -- 2026-01-14
- [Editorial] https://github.com/VibiumDev/vibium -- 2026-01-13
- Battle of AI Gateways: Rust vs. Python for AI Infrastructure: Bridging a 3,400x Performance Gap -- 2026-01-13
- Built a local TTS app using Apple's MLX framework. No cloud, no API calls, runs entirely on device. -- 2026-01-13
- I built a tool to clean HTML pages for RAG (JSON / MD / low-noise HTML) -- 2026-01-13
- We benchmarked every 4-bit quantization method in vLLM 👀 -- 2026-01-12
- Gpu inference with model that does not fit in one GPU -- 2026-01-12
- Llama.cpp rpc experiment -- 2026-01-12
- Performance improvements in llama.cpp over time -- 2026-01-12
- Dual rx 9070 for LLMs? -- 2026-01-09
- Opus 4.5 head-to-head against Codex 5.2 xhigh on a real task. Neither won. -- 2026-01-09
- Solar-Open-100B-GGUF is here! -- 2026-01-08
- [HW TUNING] Finding the best GPU power limit for inference -- 2026-01-08
- HomeGenie v2.0: 100% Local Agentic AI (Sub-5s response on CPU, No Cloud) -- 2026-01-08
- WebGPU llama.cpp running in browser with Unity to drive NPC interactions (demo) -- 2026-01-08
- Offline agent testing chat mode using Ollama as the judge (EvalView) -- 2026-01-08
- Achieving 30x real-time transcription on CPU: multilingual STT, OpenAI-compatible API endpoint, plug and play in Open WebUI (Parakeet) -- 2026-01-07
- Local Image Edit API Server for Models like Qwen-Image-Edit or Flux2-dev -- 2026-01-07
- Using n8n to orchestrate DeepSeek/Llama3 Agents via SSH (True Memory Persistence) -- 2026-01-07
- llama.cpp performance breakthrough for multi-GPU setups -- 2026-01-06
- Llama 3.2 3B fMRI LOAD BEARING DIMS FOUND -- 2026-01-06
- Hyperbolic Math w Mac GPU acceleration -- 2026-01-06
- RTX 3090 vs RTX 4090 for local AI assistant - impact on Time To First Token (TTFT)? -- 2026-01-05
- Any Vision model on par with GPT-OSS 120B? -- 2026-01-05
- tobilg/ai-observer -- 2026-01-05
- Llama-3.3-8B-Instruct -- 2025-12-30
- Benchmarking local llms for speed with CUDA and vulkan, found an unexpected speedup for select models -- 2025-12-30
- Why Kimi K2 Thinking chose Int4 QAT, from an infra engineer at Kimi -- 2025-12-30
- Help RTX 5090 + llama.cpp crashes after 2-3 inferences (VFIO passthrough, SM120 CUDA) -- 2025-12-30
- AI-Doomsday-Toolbox Distributed inference + workflows -- 2025-12-30
- I built a local voice assistant that learns new abilities via auto-discovered n8n workflows exposed as tools via MCP (LiveKit + Ollama + n8n) -- 2025-12-29
- exllamav3 adds support for GLM 4.7 (and 4.6V, + Ministral & OLMO 3) -- 2025-12-29
- Tencent just released WeDLM 8B Instruct on Hugging Face -- 2025-12-29
- Gen 3D with local llm -- 2025-12-29
- Offline vector DB experiment — anyone want to test on their local setup? -- 2025-12-29
- Roo Code 3.37 | GLM 4.7 | MM 2.1 | Custom tools | MORE!!! -- 2025-12-29
- My problem: my agent code got tied to one provider. I built a thin wrapper so I can swap OpenAI ↔ Ollama without rewrites. -- 2025-12-23
- Hey r/LocalLLaMA, I built a fully local AI agent that runs completely offline (no external APIs, no cloud) and it just did something pretty cool: It noticed that the "panic button" in its own GUI was completely invisible on dark theme (black text on black background), reasoned about the problem, a -- 2025-12-23
- Demo - RPI4 wakes up a server with dynamically scalable 7 gpus -- 2025-12-23
- Show HN: I Built an Image Captioning Tool Using Llama.cpp -- 2025-12-23
- FedFusion: Federated Learning with Diversity- and Cluster-Aware Encoders for Robust Adaptation under Label Scarcity -- 2025-12-19
- mini-SGLang released: Learn how LLM inference actually works (5K lines, weekend-readable) -- 2025-12-18
- Run Mistral Devstral 2 locally Guide + Fixes! (25GB RAM) - Unsloth -- 2025-12-18
- I vibe coded a (hopefully) useful tool for local LLM inference -- 2025-12-18
- I built a local Python agent that catches stderr and self-heals using Ollama. No cloud APIs involved. (Demo) -- 2025-12-18
- mistralai/Devstral-Small-2-24B-Instruct-2512 -- 2025-12-18
- [Editorial] https://docs.unsloth.ai/new/deploy-llms-phone -- 2025-12-17
- Full AI Voice Agent (Whisper + 700M LLM + NeuTTS) running entirely on an Nvidia Jetson Orin Nano ($250 hardware) with no internet access -- 2025-12-17
- 8x Radeon 7900 XTX Build for Longer Context Local Inference - Performance Results & Build Details -- 2025-12-17
- Running DeepSeek v32 on consumer hardware with llama.cpp/SGLang/vLLM -- 2025-12-15
- Found a REAP variant of Qwen3-coder that I can use for 100K tokens in Roo Code on my macbook -- 2025-12-15
- Understanding the new router mode in llama cpp server -- 2025-12-15
- Letting a local Ollama model judge my AI agents and it’s surprisingly usable -- 2025-12-15
- I got tired of my agents losing context on topic shifts, so I hacked together a branch router - thoughts? -- 2025-12-12
- Tiny-A2D: An Open Recipe to Turn Any AR LM into a Diffusion LM -- 2025-12-12
- Am I overthinking GDPR/Privacy by moving my AI workflow local? -- 2025-12-12
- zai-org/GLM-ASR-Nano-2512 -- 2025-12-12
- Thoughts on decentralized training with Psyche? -- 2025-12-12
- Operator Mech v2.5: A Compact Structural-Reasoning Kernel for Local Models (YAML, 7B–13B Optimized) -- 2025-12-10
- stepfun-ai/GELab-Zero-4B-preview -- 2025-12-10
- Confused and unsure -- 2025-12-10
- Off-Grid, Small-Scale Payment System -- 2025-12-09
- Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs -- 2025-12-09
- [Tool] Tiny MCP server for local FAISS-based RAG (no external DB) -- 2025-12-09
- [help] RTX pro 6000 - llama.cpp Qwen3-Next-80B maxes out at 70% gpu? -- 2025-12-09
- Rate/roast my setup -- 2025-12-09
- [Editorial] https://github.com/robertelee78/claude-telegram-mirror -- 2025-12-09
- Gemini TTS for OpenWebUI using OpenAI endpoint -- 2025-12-08
- We Got Claude to Fine-Tune an Open Source LLM -- 2025-12-08
- CUA Local Opensource -- 2025-12-05
- OVHcloud on Hugging Face Inference Providers 🔥 -- 2025-12-05
- Using Ollama (qwen2.5-vl) to auto-tag RAW photos in a Python TUI -- 2025-12-05
- AMD PRO 395 Radeon 8060S Graphics - Any recent Benchmarks -- 2025-12-04
- LoRa Repeater Lasts 5 Years on PVC Pipe and D Cells -- 2025-12-03
- AI just proved Erdos Problem #124 -- 2025-12-02
- PleIAs/Baguettotron -- 2025-12-01
- nvidia/ChronoEdit-14B-Diffusers -- 2025-12-01
- 20x Faster TRL Fine-tuning with RapidFire AI -- 2025-12-01
- Deep learning models are vulnerable, but adversarial examples are even more vulnerable -- 2025-12-01
- Introducing GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization | "GeoVista is a new 7B open-source agentic model that achieves SOTA performance in geolocalization by integrating visual tools and web search into an RL loop." -- 2025-11-28
- [Editorial] https://arxiv.org/html/2511.09030v1 -- 2025-11-28
- [Editorial] https://ai.google.dev/gemini-api/docs/prompting-strategies#agentic-si-template -- 2025-11-28
- stepfun-ai/Step-Audio-R1 -- 2025-11-28
- Benchmark: Self-Hosted Qwen-30B (LoRA) vs. Llama-3.1-8B vs. GPT-4.1-nano. Comparison of parsing success rates and negative constraints. -- 2025-11-25
- In relation to the Ollama post, would you all be interested in an Apache-2 open-source alternative? -- 2025-11-24
- Study shows why local models might be the only private option -- 2025-11-24
- Best < $20k Configuration -- 2025-11-24
- [Release] Memory-Isolated Recursive Compression (MIRC). A local-first probabilistic compression utility for Apple Silicon. Research Preview (Open Source) -- 2025-11-21
- Read long podcasts locally with Whisper + LLM, open sourced -- 2025-11-21
- Local all-in-one AI system (Local multimodal AI) -- 2025-11-21
- JMS1717/8mb.local -- 2025-11-21
- Mimir Memory Bank now uses llama.cpp! -- 2025-11-21
- [Editorial] https://dropbox.tech/machine-learning/how-dash-uses-context-engineering-for-smarter-ai -- 2025-11-20
- The new Aider-CE fork of Aider is now official -- 2025-11-19
- Brimstone: ES2025 JavaScript engine written in Rust -- 2025-11-19
- Open WebUI Lite: an open-source, dependency-free Rust rewrite, with a standalone Tauri desktop client -- 2025-11-19
- Integrating Openwebui / local LLM into Firefox -- 2025-11-19
- A more surgical approach to abliteration -- 2025-11-19
- [30 Trillion token dataset] "HPLT 3.0: Very Large-Scale Multilingual Resources for LLM and MT. Mono- and Bi-lingual Data, Multilingual Evaluation, and Pre-Trained Models", Oepen et al. 2025 -- 2025-11-19
- Compress to Impress: Efficient LLM Adaptation Using a Single Gradient Step on 100 Samples -- 2025-11-19
- Smart Bandage Leverages AI Model For Healing Purposes -- 2025-11-19
- PyTorch 2.10.0a0 w/ Blackwell (sm_120) Support — Patched & Packaged for One-Command Install -- 2025-11-17
- Half-trillion parameter model on a machine with 128 GB RAM + 24 GB VRAM -- 2025-11-17
- [Editorial] https://www.linkedin.com/posts/andriyburkov_when-you-train-a-model-on-one-dataset-it-activity-7392804316769701888-166x/ -- 2025-11-14
- xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning -- 2025-11-14
- I built my own self-hosted GPT with LM Studio, Caddy, and Cloudflare Tunnel -- 2025-11-14
- Can't find Model in Ollama -- 2025-11-14
- [PSA] Claude Code Web users: Want something useful to do with your $1k free credits? Help fix all the borked HuggingFace Spaces. -- 2025-11-14
- Tiny386 on an Espressif ESP32-S3 -- 2025-11-14
- I built a RAG as a Service orchestrator for local models -- 2025-11-14
- Fine-Tuning SLMs and Running Them Securely in Your Web Browser -- 2025-11-14
- [Editorial] https://www.linkedin.com/posts/stuart-winter-tear_so-reportedly-yann-lecun-plans-to-leave-activity-7394396547276460032-gEE5 -- 2025-11-13
- inclusionAI/LLaDA2.0-mini-preview -- 2025-11-13
- Qwen3-VL works really good with Zoom-in Tool -- 2025-11-12
- [Update] mlx-knife 2.0 stable — MLX model manager for Apple Silicon -- 2025-11-12
- Vascura BAT - configuration tool for llama.cpp server via simple BAT files -- 2025-11-12
- Last week in Multimodal AI - Local Edition -- 2025-11-12
- Cerebras Code now supports GLM 4.6 at 1000 tokens/sec -- 2025-11-12
- Finetuning DeepSeek 671B locally with only 80GB VRAM and Server CPU -- 2025-11-07
- IPEX-LLM llama.cpp portable GPU and NPU working really well on laptop -- 2025-11-07
- [P] Training Better LLMs with 30% Less Data – Entropy-Based Data Distillation -- 2025-11-06
- I fine tuned a (small) model to help with reasoning backfill on old/non-reasoning datasets -- 2025-11-06
- Superhuman AI for Multiplayer Poker -- 2025-11-06
- cerebras/GLM-4.5-Air-REAP-82B-A12B -- 2025-11-06
- Retrieval Enhanced Feedback via In-context Neural Error-book -- 2025-11-06
- Worse Embedding Performance with Qwen 3 VL than with Qwen 2.5 VL? -- 2025-11-05
- Retrospective Sparse Attention for Efficient Long-Context Generation -- 2025-11-05
- [Open Source] We deployed numerous agents in production and ended up building our own GenAI framework -- 2025-11-04
- First LangFlow Flow Official Release - Elephant v1.0 -- 2025-11-04
- zeusftk/FTK_CANVAS_AGENT_for_Comfyui -- 2025-11-04
- Qwen3-VL-32B Q8 speeds in llama.cpp vs vLLM FP8 on a RTX PRO 6000 -- 2025-11-03
- OCR models: HF demos vs local performance -- 2025-11-03
- Help me decide: EPYC 7532 128GB + 2 x 3080 20GB vs GMtec EVO-X2 -- 2025-11-03
- Now you can deploy OpenStatus on Raspberry Pi -- 2025-11-03
- Flex Attention vs Flash Attention 3 -- 2025-11-02
- manifestai releases Brumby-14B-Base weights, claims "attention free" and inference "hundreds of times faster" for long context -- 2025-11-02
- What's the best model I can run with 32GB of RAM and 8GB of VRAM? -- 2025-11-02
- Minimax-M2 cracks top 10 overall LLMs (production LLM performance gap shrinking: 7 points from GPT-5 in Artificial Analysis benchmark) -- 2025-11-01
- 🚨 OpenAI Gives Microsoft 27% Stake, Completes For-Profit Shift -- 2025-11-01
- [Editorial] https://github.com/claraverse-space/ClaraVerse -- 2025-10-31
- You can now run Ollama models in Jan -- 2025-10-31
- Codex and Supabase -- 2025-10-31
- Flamingo 3 released in safetensors -- 2025-10-31
- My LLM-powered text adventure needed a dynamic soundtrack, so I'm training a MIDI generation model to compose it on the fly. Here's a video of its progress so far. -- 2025-10-31
- Experimenting with Qwen3-VL for Computer-Using Agents -- 2025-10-30
- Built a full voice AI assistant running locally on my RX 6700 with Vulkan - Proof AMD cards excel at LLM inference -- 2025-10-30
- Streaming datasets: 100x More Efficient -- 2025-10-30
- fireleyfreya/AI-Art-Generator -- 2025-10-30
- krea/krea-realtime-video -- 2025-10-30
- vLLM MoE Benchmark Configs for Qwen3 Coder REAP 25B & RTX Pro 6000 -- 2025-10-29
- Ollama supports Qwen3-VL locally! -- 2025-10-29
- Test results for various models' ability to give structured responses via LM Studio. Spoiler: Qwen3 won -- 2025-10-29
- Introducing ExecuTorch 1.0 -- 2025-10-29
- Myanmar military shuts down a major cybercrime center, detains over 2k people -- 2025-10-29
- [Editorial] Virtual false positive, physical problems -- 2025-10-28
- Show HN: A fast, privacy-first image converter that runs in browser -- 2025-10-28
- Microsoft Releases AI Call Center Stack with Voice, SMS, and Memory -- 2025-10-28
- Robot Phone Home…Or Else -- 2025-10-28
- Local alternatives to Atlas -- 2025-10-27
- Pardus CLI: Ollama support for Gemini CLI -- 2025-10-27
- karpathy/nanochat-d32 -- 2025-10-27
- Best LLM for 96G RTX Pro 6000 Blackwell? -- 2025-10-27
- Gemma3 model differences -- 2025-10-27
- AMD iGPU + dGPU : llama.cpp tensor-split not working with Vulkan backend -- 2025-10-27
- Is GLM 4.5 / 4.6 really sensitive to quantisation? Or is vLLM stupefying the models? -- 2025-10-27
- Looking for some advice/input for LLM and more -- 2025-10-26
- AlpinDale/ssh-dashboard -- 2025-10-26
- GPU 101 and Triton kernels -- 2025-10-26
- Picture in Picture / Webcam detect model on HuggingFace -- 2025-10-25
- Show HN: Story Keeper – AI agents with narrative continuity instead of memory -- 2025-10-25
- Show HN: Deta Surf – An open source and local-first AI notebook -- 2025-10-25
- Show HN: Tommy – Turn ESP32 devices into through-wall motion sensors -- 2025-10-25
- Qwen3 Next support in llama.cpp ready for review -- 2025-10-25
- GLM Air REAP tool call problems -- 2025-10-25
- [Editorial] We need open, uncensored, & local -- 2025-10-23
- 🚀 HuggingFaceChat Omni: Dynamic policy-based routing to 115+ LLMs -- 2025-10-23
- Preliminary support in llama.cpp for Qualcomm Hexagon NPU -- 2025-10-23
- LiquidAI/LFM2-2.6B -- 2025-10-23
- radicalnumerics/RND1-Base-0910 -- 2025-10-22
- Amadeus: Autoregressive Model with Bidirectional Attribute Modelling for Symbolic Music -- 2025-10-22
- I got Kokoro TTS running natively on iOS! 🎉 Natural-sounding speech synthesis entirely on-device -- 2025-10-22
- Mobile fully on device inference AI chat app with RAG support -- 2025-10-22
- I am generally impressed by iPhone 17 GPU -- 2025-10-22
- ⚡ Gemma 3 1B Smart Q4 — Bilingual (IT/EN) Offline AI for Raspberry Pi 4/5 -- 2025-10-22
- Valve Developer Contributes Major Improvement To RADV Vulkan For Llama.cpp AI -- 2025-10-22
- I want to build an AI inference server for 72B models...what should I do? -- 2025-10-22
- UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding -- 2025-10-21
- Built a 100% Local AI Medical Assistant in an afternoon - Zero Cloud, using LlamaFarm -- 2025-10-20
- Sharing my local voice-to-text setup on Apple Silicon (with fallback cascade) -- 2025-10-20
- LLM recommendation -- 2025-10-20
- Ollama's cloud: what are the limits? -- 2025-10-20
- Show HN: OnlyJPG – Client-Side PNG/HEIC/AVIF/PDF/etc to JPG -- 2025-10-20
- It has been 4 hrs since the release of nanochat from Karpathy and no sign of it here! A new full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable, dependency-lite codebase -- 2025-10-19
- [Editorial] Claude Skills are awesome, maybe a bigger deal than MCP -- 2025-10-18
- NVIDIA DGX Spark Benchmarks -- 2025-10-18
- Should I add another 5060 Ti 16GB or two? Already had 1 x 5070 Ti and 3 x 5060 Ti 16G -- 2025-10-18
- Reproducing Karpathy’s NanoChat on a Single GPU — Step by Step with AI Tools -- 2025-10-17
- Best path for a unified Gaming, AI & Server machine? Custom build vs. Mac Studio/DGX Spark -- 2025-10-17
- Ollama kinda dead since OpenAI partnership. Virtually no new models, and kimi2 is cloud only? Why? I run it fine locally with lmstudio. -- 2025-10-17
- Last week in Multimodal AI - Local Edition -- 2025-10-16
- BosonAI's Higgs-Llama-3-70B AWQ Quantized (140GB → 37GB) -- 2025-10-16
- Worthwhile using Ollama without Nvidia? -- 2025-10-16
- LiquidAI/LFM2-8B-A1B-GGUF -- 2025-10-16
- [Editorial] Limitations of Normalization in Attention Mechanism -- 2025-10-16
- [Editorial] Explaining the voodoo behind how AI works -- 2025-10-16
- McGill-NLP/the-markovian-thinker -- 2025-10-16
- [Editorial] ReasoningBank is a self-learning, local-first memory system -- 2025-10-16
- I tested if tiny LLMs can self-improve through memory: Qwen3-1.7B gained +8% accuracy on MATH problems -- 2025-10-16
- Tested 9 RAG query transformation techniques – HydE is absurdly underrated -- 2025-10-16
- Xrvitd/MeshMosaic -- 2025-10-15
- Scene Graph-Guided Proactive Replanning for Failure-Resilient Embodied Agent -- 2025-10-15
- Nvidia breakthrough gives 4-bit pretraining technique the accuracy of FP8 -- 2025-10-15
- AI-assisted suite - question about the n_gpu_layers test -- 2025-10-15
- ibm-granite/granite-4.0-h-micro -- 2025-10-15
- Qwen/Qwen3-VL-235B-A22B-Instruct -- 2025-10-15
- Get your VLM running in 3 simple steps on Intel CPUs -- 2025-10-15
- Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search -- 2025-10-15
- How to re-create OpenAI Assistants locally? -- 2025-10-14
- M2 Max 96GB - llama.cpp with codex and gpt-oss 120b to edit files and github upload -- 2025-10-14
- Why You Should Build AI Agents with Ollama First -- 2025-10-14
- OpenWebUI in Docker does not detect the LLaMA3 model installed with Ollama on Linux -- 2025-10-14
- I made a plugin to run LLMs on phones -- 2025-10-13
- 🚀 ToolNeuron Beta-4.5 — Offline & Privacy-First AI Hub for Android! -- 2025-10-13
- Emacs agent-shell (powered by ACP) -- 2025-10-13
- install package to open web ui gpt api env -- 2025-10-13
- Nanocoder Continues to Grow - A Small Update -- 2025-10-13
- liangdabiao/autogen-financial-analysis -- 2025-10-13
- forkspacer/forkspacer -- 2025-10-13
- Atlassian announces Rovo Dev in general availability - full SDLC context-aware AI agent in Jira, CLI, IDE, Github and Bitbucket -- 2025-10-13
- What's the best local LLM for coding I can run on a MacBook Pro M4 32GB? -- 2025-10-12
- How do you benchmark the cognitive performance of local LLM models? -- 2025-10-12
- Introducing the ColBERT Nano series of models. All 3 of these models come in at less than 1 million parameters (250K, 450K, 950K) -- 2025-10-11
- LiquidAI/LFM2-8B-A1B -- 2025-10-11
- Qwen3-VL-30B-A3B-Thinking GGUF with llama.cpp patch to run it -- 2025-10-10
- Does quantization need training data and will it lower performance for task outside of training data? -- 2025-10-10
- Qwen/Qwen3-VL-30B-A3B-Instruct -- 2025-10-10
- Sharing my free tool for easy handwritten fine-tuning datasets! -- 2025-10-09
- Just finished a fun open source project, a full stack system that fetches RSS feeds, uses an AI agent pipeline to write new articles, and automatically serves them through a Next.js site all done locally with Ollama and ChromaDB. -- 2025-10-08
- Desktop app for running local LLMs -- 2025-10-08
- Transcribe and summarize your meetings - local-first - on MacOS -- 2025-10-08
- Awesome Local LLM Speech-to-Speech Models & Frameworks -- 2025-10-08
- FabioSarracino/VibeVoice-Large-Q8 -- 2025-10-08
- Granite 4.0 Micro (3.4B) running 100% locally in your browser w/ WebGPU acceleration -- 2025-10-08
- ibm-granite/granite-docling-258M -- 2025-10-08
- Local Open Deep Research with Offline Wikipedia Search Source -- 2025-10-07
- Ollama drops MI50 support -- 2025-10-07
- CoexistAI Now Supports Docker Setup, Also now you can turn any text into Podcasts and Speech Easily -- 2025-10-07
- MCP_File_Generation_Tool - v0.6.0 Update! -- 2025-10-07
- I created the cheapest possible AI voice agent (over 30x less expensive than Elevenlabs and OpenAI Realtime). Check out the Github repo below if you want to try it for yourself! -- 2025-10-07
- MaximeRivest/maivi -- 2025-10-07
- nineninesix/kani-tts-370m -- 2025-10-07
- Running Qwen3-VL-235B (Thinking & Instruct) AWQ on vLLM -- 2025-10-06
- Granite 4 H Tiny Q8 in RTX 3090, It's a context king. -- 2025-10-06
- Video2X 6.x — open-source upscaler + frame interpolation (Anime4K v4 / Real-ESRGAN / Real-CUGAN / RIFE) 🚀 -- 2025-10-06
- DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models -- 2025-10-05
- [Launch Ollama compatible] ShareAI (open beta) — decentralized AI gateway, Ollama-native -- 2025-10-04
- Creating a Full Stack App W/Cloudflare Works and BetterAuth -- 2025-10-04
- shareAI-lab/mini_claude_code -- 2025-10-04
- Show HN: Simple WhatsApp API (Open Source) -- 2025-10-04
- Use Remote Models on iOS with Noema -- 2025-10-04
- Best small model <3B for HomeAssistant -- 2025-10-04
- Nikity/lille-130m-instruct -- 2025-10-04
- I created a simple tool to manage your llama.cpp settings & installation -- 2025-10-03
- Looking for a web-based open-source Claude agent/orchestration framework (not for coding, just orchestration) -- 2025-10-03
- Codexia GUI for Codex CLI new features -- 2025-10-03
- ArchGW 🚀 - Use Ollama-based LLMs with Anthropic client (release 0.3.13) -- 2025-10-03
- [Editorial] Build tools for where people are at -- 2025-10-03
- AppUse : Create virtual desktops for AI agents to focus on specific apps -- 2025-10-03
- what was that? -- 2025-10-03
- Does Ollama immobilize GPUs / computing resources? -- 2025-10-02
- Any real alternatives to NotebookLM (closed-corpus only)? -- 2025-10-02
- Best instruct model that fits in 32gb VRAM -- 2025-10-02
- Qwen3-Omni thinking model running on local H100 (major leap over 2.5) -- 2025-09-30
- Seeking Advice: Best Model + Framework for Max Tokens/sec on Dual L40S (Testing Rig) -- 2025-09-30
- For local models, has anyone benchmarked tool calling protocols performance? -- 2025-09-30
- Geolocation and Starlink -- 2025-09-30
- MusicSwarm: Biologically Inspired Intelligence for Music Composition -- 2025-09-30
- Nexa SDK launch + past-month updates for local AI builders -- 2025-09-29
- wildminder/ComfyUI-VoxCPM -- 2025-09-29
- apple/FastVLM-7B -- 2025-09-29
- tencent/Hunyuan-MT-Chimera-7B -- 2025-09-29
- Qwen/Qwen3-Omni-30B-A3B-Captioner -- 2025-09-29
- The MoE tradeoff seems bad for local hosting -- 2025-09-29
- LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs -- 2025-09-29
- Apple called out every major AI company for fake reasoning and Anthropic's response proves their point -- 2025-09-29
- More money than brains... building a workstation for local LLM. -- 2025-09-28
- Fully-Local AI Agent Runs on Raspberry Pi, With a Little Patience -- 2025-09-28
- Reinforcement Learning with Rubric Anchors -- 2025-09-28
- Qwen3 235b Q2 with Celeron, 2x8gb of 2400 RAM, 96GB VRAM @ 18.71 t/s -- 2025-09-27
- PAR LLAMA v0.7.0 Released - Enhanced Security & Execution Experience -- 2025-09-27
- Swift Transformers Reaches 1.0 — and Looks to the Future -- 2025-09-27
- FireRedTeam/FireRedChat -- 2025-09-27
- Chaos96/NTPP -- 2025-09-27
- An Ollama user seeking uncensored models that can generate images -- 2025-09-26
- nunchaku-tech/nunchaku-qwen-image-edit -- 2025-09-26
- bytedance-research/HuMo -- 2025-09-26
- Mitigating Cross-Image Information Leakage in LVLMs for Multi-Image Tasks -- 2025-09-26
- How to change the design of 3,500 images quickly, easily, and accurately? -- 2025-09-26
- GLM 4.5 Air Template Breaking llamacpp Prompt Caching -- 2025-09-25
- Tracking prompt evolution for RAG systems - anyone else doing this? -- 2025-09-25
- MAESTRO v0.1.6 Update: Better support for models that struggle with JSON mode (DeepSeek, Kimi K2, etc.) -- 2025-09-25
- Dead-simple example code for Ollama function calling. -- 2025-09-25
- Built LLM Colosseum - models battle each other in a kingdom system -- 2025-09-24
- Qwen 3 max released -- 2025-09-24
- Clauder, auto-updating toolkit for Claude Code, now ships with 65+ MCP servers -- 2025-09-24
- Show HN: I wrote inference for Qwen3 0.6B in C/CUDA -- 2025-09-24
- nvidia/canary-1b-v2 -- 2025-09-24
- baidu/ERNIE-4.5-21B-A3B-Thinking -- 2025-09-24
- Smol2Operator: Post-Training GUI Agents for Computer Use -- 2025-09-24
- I Upgrade 4090s to have 48GB VRAM: Comparative LLM Performance -- 2025-09-23
- Some things I learned about installing flash-attn -- 2025-09-23
- Google Android RAG SDK – Quick Comparison Study -- 2025-09-22
- Engineer's Guide to Local LLMs with LLaMA.cpp and QwenCode on Linux -- 2025-09-21
- gpt-oss-20b TTFT very slow with llama.cpp? -- 2025-09-21
- Unobtanium No More; Perhaps We Already Have All The Elements We Need -- 2025-09-21
- Listening for the Next Wow! Signal with Low-Cost SDR -- 2025-09-20
- vLLM is kinda awesome -- 2025-09-19
- Public AI on Hugging Face Inference Providers 🔥 -- 2025-09-19
- `LeRobotDataset`: Bringing large-scale datasets to lerobot -- 2025-09-17
- This AI assistant became our go-to Unity co-pilot (not just another LLM) -- 2025-09-17
- AI Bubble Watch -- 2025-09-16
- Built an OpenWebUI Mobile Companion (Conduit): Alternative to Commercial Chat Apps -- 2025-09-16
- Local LLM suite on iOS powered by llama.cpp - with web search and RAG -- 2025-09-16
- What are the local TTS models with voice cloning? -- 2025-09-16
- Switched to LobeChat from OpenWebUI because of crappy web search and no reasoning level support: a review -- 2025-09-15
- 0.6.27 is out - New Changelog Style -- 2025-09-15
- I built Claude Context but 100% local - semantic code search with no API keys -- 2025-09-14
- Building my Local AI Studio -- 2025-09-14
- Deploying 1.4kW GPUs (B300): what's the biggest bottleneck you've seen, power delivery or cooling? -- 2025-09-10
- New approach to block decoding from Meta: claims around 4x inference speedup is possible, with 4x fewer compute passes at the same time -- 2025-09-10
- Qwen3 30B A3B Q40 @ 13 tok/sec on Raspberry Pi cluster -- 2025-09-10
- SERVE 8B model directly from iPhone -- 2025-09-10
- I built a Graph RAG pipeline (VeritasGraph) that runs entirely locally with Ollama (Llama 3.1) and has full source attribution. -- 2025-09-09
- Built an offline AI CLI that generates apps and runs code safely -- 2025-09-09
- Three different models reviewing three different implementations coded by three different models -- 2025-09-09
- karpathy/rendergit -- 2025-09-09
- Shipping textures as PNGs is suboptimal -- 2025-09-09
- Qwen3 30B A3B 2507 Hybrid Deep Reasoning Showcase -- 2025-09-08
- High-level visual representations in the human brain are aligned with LLMs -- 2025-09-07
- [Project/Code] Fine-Tuning LLMs on Windows with GRPO + TRL -- 2025-09-07
- nvidia/NVIDIA-Nemotron-Nano-12B-v2-Base -- 2025-09-07
- huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated -- 2025-09-07
- RX570 compatibility issues -- 2025-09-07
- Continue.dev setup -- 2025-09-07
- LiquidGEMM: Seems interesting -- 2025-09-07
- How to use a Hugging Face embedding model in Ollama -- 2025-09-07
- Wal3: A Write-Ahead Log for Chroma, Built on Object Storage -- 2025-09-07
- Running LLM Locally with Ollama + RAG -- 2025-09-07
- yaof20/Flash-RL -- 2025-09-07
- In-Browser AI: WebLLM + WASM + WebWorkers -- 2025-09-06
- FluidAudio, a local-first Swift SDK for real-time speaker diarization, ASR & audio processing on iOS/MacOS -- 2025-09-06
- Is there a way to have models load in to vram quicker, or stay alive without persisting in vram? Or are there alternatives for fast models? -- 2025-09-06
- A simple zsh function to bring “Copilot Inline Chat for Terminal” to any shell -- 2025-09-06
- Show HN: I built a deep research tool for local file system -- 2025-09-06
- Who Owns, Operates, and Develops Your VPN Matters -- 2025-09-06
- Vulkan back ends, what do you use? -- 2025-09-06
- Relaxed-System-Lab/Flash-Sparse-Attention -- 2025-09-06
- A multi-interface (REST and MCP) server for automatic license plate recognition 🚗 -- 2025-09-05
- Replay - like Git for App States and Agent Context -- 2025-09-05
- Bringing Computer Use to the Web -- 2025-09-05
- Missing Agents -- 2025-09-05
- Show HN: Woomarks, transfer your Pocket links to this app or self-host it -- 2025-09-05
- Capture and Plot Serial Data in the Browser -- 2025-09-05
- ECA: free vendor lock alternative -- 2025-09-05
- AMD Ryzen 7 8700G for Local AI: User Experience with Integrated Graphics? -- 2025-09-05
- ChatGPT on the Road: Leveraging Large Language Model-Powered In-vehicle Conversational Agents for Safer and More Enjoyable Driving Experience -- 2025-09-04
- Good setup for coder LLM under 12GB VRAM and 64GB DDR5? -- 2025-09-04
- openai/gpt-oss-120b -- 2025-09-04
- gpt-oss:120b running on an AMD 7800X3D CPU and a 7900XTX GPU -- 2025-09-03
- unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF -- 2025-09-03
- RELEASED: ComfyUI Wrapper for Microsoft’s new VibeVoice TTS (voice cloning in seconds) -- 2025-09-03
- High-Logic/Genie -- 2025-09-03
- tencent/Hunyuan-GameCraft-1.0 -- 2025-09-03
- Enchanted: A privacy-first personal AI app -- 2025-09-03
- OpenAI says it's scanning users' conversations and reporting content to police -- 2025-09-03
- How do you do RL 100% locally without a NVIDIA GPU? -- 2025-08-31
- NiceWebRL: a Python library for human subject experiments with reinforcement learning environments -- 2025-08-31
- DeepSeek V3.1 improves on the multiplayer Step Game social reasoning benchmark -- 2025-08-31
- I built Husk, a native, private, and open-source iOS client for your local models -- 2025-08-31
- How's Seed-OSS 39B for coding? -- 2025-08-30
- I built a local “second brain” AI that actually remembers everything (321 tests passed) -- 2025-08-30
- DeepSeek V3.1 dynamic Unsloth GGUFs + chat template fixes -- 2025-08-28
- PSA: OpenAI GPT-OSS running slow? Do not set top-k to 0! -- 2025-08-28
- Seamlessly bridge LM Studio and OpenWebUI with zero configuration -- 2025-08-28
- texttron/BrowseComp-Plus -- 2025-08-28
- Scaling RL to Long Videos -- 2025-08-28
- Local Inference for Very Large Models - a Look at Current Options -- 2025-08-27
- This is just a test but works -- 2025-08-27
- Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 -- 2025-08-27
- Is there any way to run 100-120B MoE models at >32k context at 30 tokens/second without spending a lot? -- 2025-08-26
- Right GPU for AI research -- 2025-08-26
- How to add pdf extract abilities -- 2025-08-26
- Ollama + WebUI + IIS reverse proxy -- 2025-08-26
- Nvidia's new 'robot brain' goes on sale for $3,499 -- 2025-08-26
- Help me decide between these two pc builds -- 2025-08-25
- Some legend finally posted working quants of GLM-4.5 Air for Ollama -- 2025-08-24
- Had some beginner questions regarding how to use Ollama? -- 2025-08-24
- AI Mode in Search gets new agentic features and expands globally -- 2025-08-24
- Practical approach for streaming UI from LLMs -- 2025-08-24
- Show HN: Pinch – macOS voice translation for real-time conversations -- 2025-08-23
- Replicating the World’s Oldest Stringed Instrument -- 2025-08-23
- COMponent-Aware Pruning for Accelerated Control Tasks in Latent Space Models -- 2025-08-23
- 🐧 llama.cpp on Steam Deck (Ubuntu 25.04) with GPU (Vulkan) — step-by-step that actually works -- 2025-08-22
- Running Qwen3-Coder-30B-A3 Q4_LM in Cursor with Agent Mode unlocked -- 2025-08-22
- Docker now supports AI models, anyone using it? -- 2025-08-22
- Why does gpt-oss 120b run slower in ollama than in LM Studio in my setup? -- 2025-08-22
- gguf-eval: an evaluation framework for GGUF models using llama.cpp -- 2025-08-21
- My open-source agent Maestro is now faster and lets you configure context limits for better local model support -- 2025-08-21
- guide : running gpt-oss with llama.cpp -- 2025-08-21
- Docker container for running Claude Code in "dangerously skip permissions" mode -- 2025-08-21
- Llama Habitat Continues to Expand, Now Includes the PSP -- 2025-08-21
- OpenAI Cookbook - Verifying gpt-oss implementations -- 2025-08-21
- LocalAI Major Update: Modular Backends (update llama.cpp, stablediffusion.cpp, and others independently!), Qwen-VL, Qwen-Image Support, Image Editing & More -- 2025-08-18
- Drop-in Voice App Control for iOS with Local Models -- 2025-08-18
- GPT-OSS 20b runs on a RasPi 5, 16GB -- 2025-08-18
- PowerInfer/SmallThinker-21BA3B-Instruct -- 2025-08-18
- Fun with RTX PRO 6000 Blackwell SE -- 2025-08-17
- Any tips/Advice for running gpt-oss-120b locally -- 2025-08-17
- LLM performance of tiny (<4B) models? -- 2025-08-17
- What "big" models can I run with this setup: 5070ti 16GB and 128GB ram, i9-13900k ? -- 2025-08-17
- HoML: vLLM's speed + Ollama like interface -- 2025-08-15
- HoML vs. Ollama: A Deep Dive into Performance -- 2025-08-15
- Sampler Settings for GLM 4.5-Air -- 2025-08-15
- Fully verbal LLM program for OSX using whisper, ollama & XTTS -- 2025-08-15
- NO WAY BACK -- 2025-08-14
- X-Omni-Team/X-Omni -- 2025-08-14
- SkyworkAI/Matrix-3D -- 2025-08-14
- Update for Maestro - A Self-Hosted Research Assistant. Now with Windows/macOS support, Word/MD files support, and a smarter writing agent -- 2025-08-13
- Llama.cpp Vulkan is awesome, It gave new life to my old RX580 -- 2025-08-13
- Pairs of GPUs for inference? -- 2025-08-13
- Ollama 2x mi50 32GB -- 2025-08-13
- Go 1.25 Is Released -- 2025-08-13
- Unsloth fixes chat_template (again). gpt-oss-120-high now scores 68.4 on Aider polyglot -- 2025-08-12
- GPT-OSS-120B runs on just 8GB VRAM & 64GB+ system RAM -- 2025-08-12
- How Attention Sinks Keep Language Models Stable -- 2025-08-12
- Run GPT-OSS with MLX or GGUF in your CLI using 1 line of code -- 2025-08-11
- Built a new VLM (MicroLlaVA) on a single NVIDIA 4090 -- 2025-08-11
- nvidia/audio-flamingo-3 -- 2025-08-11
- LGAI-EXAONE/EXAONE-4.0-32B -- 2025-08-11
- llamacpp+ROCm7 beta is now supported on Lemonade -- 2025-08-10
- gpt-oss 120B runs ~13 t/s on laptop with iGPU -- 2025-08-10
- Throwing an MI50 32GB in a gaming PC -- 2025-08-10
- One File, Six Formats: Just Change The Extension -- 2025-08-10
- kyutai/tts-voices -- 2025-08-08
- CodeFu-7B-v0.1 - a Reinforcement Learning (RL)-trained 7B model for Competitive Programming -- 2025-08-08
- lucidrains/h-net-dynamic-chunking -- 2025-08-08
- Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training -- 2025-08-08
- MAESTRO, a deep research assistant/RAG pipeline that runs on your local LLMs -- 2025-08-07
- Quantize your own GGUFs the same way as your fav Unsloth Dynamic GGUFs -- 2025-08-07
- Read your code -- 2025-08-07
- [Editorial] Turn-Taking model for Voice AI Agents -- 2025-08-06
- Saidia: Offline-First AI Assistant for Educators in low-connectivity regions -- 2025-08-06
- Finding a local model for text table QA -- 2025-08-06
- [Editorial] A claude code class -- 2025-08-06
- [Editorial] Claude code with open models -- 2025-08-06
- Ollamacode - Local AI assistant that can create, run and understand the task at hand! -- 2025-08-06
- Turn ChatGPT Into a Local Coding Agent -- 2025-08-06
- Managing Multiple Claude Code Sessions Without Git Worktrees -- 2025-08-06
- unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF -- 2025-08-06
- [Editorial] a more mature phase of the AI cycle. -- 2025-08-05
- GLM-4.5-Air appreciation post - if you have not done so already, give this model a try -- 2025-08-05
- How to locally run Grok 4 with 2x AMD 7900 XTX GPUs? (24 GB VRAM x2) -- 2025-08-05
- zai-org/GLM-4.5 -- 2025-08-05
- unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF -- 2025-08-05
- Learn Software-Defined Radio, GNURadio, RTL-SDR and PlutoSDR with Prof Jason -- 2025-08-05
- Help: Qwen3-Coder + LM Studio + Continue.dev (VSCode) + Mac 64GB M3 Max — 500 Internal Server Error, Even After Unsloth Fix -- 2025-08-04
- CWC now supports kimi.com (K2) and chat.z.ai (GLM-4.5) to enable coding with top tier models at no cost -- 2025-08-04
- 🌟 Ming-lite-omni v1.5 is here! Our recent upgrade for omni-modal AI! 🚀 -- 2025-08-04
- Wan-AI/Wan2.2-T2V-A14B -- 2025-08-04
- [Editorial] Agentic security testing -- 2025-08-04
- [Editorial] ML System Design Case Studies Repository -- 2025-08-04
- I built a GitHub scanner that automatically discovers your AI tools using a new .awesome-ai.md standard I created -- 2025-08-04
- [Editorial] Why open-source AI became an American national priority -- 2025-08-04
- Built a full stack web app builder that runs locally and gives you full control -- 2025-08-04
- I made an open-source CAL-AI alternative using Ollama which runs completely locally and is fully free. -- 2025-08-04
- Wondered why in-context learning works so well? Or, ever wonder why Claude mirrors your unique linguistic patterns within a convo? This may be why. -- 2025-08-03
- Unleashing the Editing Superpower of Emacs -- 2025-08-03
- Single-File Qwen3 Inference in Pure CUDA C -- 2025-08-03
- Ursa: A leaderless, object storage–based alternative to Kafka -- 2025-08-03
- CoexistAI – LLM-Powered Research Assistant (Now with MCP, Vision, Local File Chat, and More) -- 2025-08-02
- Best <2B open-source LLMs for European languages? -- 2025-08-02
- Local TTS quality -- 2025-08-02
- NVIDIA RTX PRO 4000 Blackwell - 24GB GDDR7 -- 2025-08-02
- Help for new LLM Rig -- 2025-08-02
- My first finetune: Gemma 3 4B unslop via GRPO -- 2025-08-02
- Supervised Fine Tuning on Curated Data is Reinforcement Learning -- 2025-08-02
- What's the current go-to setup for a fully-local coding agent that continuously improves code? -- 2025-08-02
- Beginner-Friendly Guide to AWS Strands Agents -- 2025-08-02
- unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF -- 2025-08-02
- Suggest Best Coding model. -- 2025-08-02
- [Editorial] Voice AI -- 2025-08-01
- So you all loved my open-source voice AI when I first showed it off - I officially got response times to under 2 seconds AND it now fits all within 9 gigs of VRAM! Open Source Code included! -- 2025-08-01
- I built a local-first transcribing + summarizing tool that's FREE FOREVER -- 2025-08-01
- mistralai/Voxtral-Small-24B-2507 -- 2025-08-01
- 🚀 Qwen3-30B-A3B Small Update -- 2025-08-01
- These new Qwen3 models are cooking! -- 2025-08-01
- unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF -- 2025-08-01
- Implemented Test-Time Diffusion Deep Researcher (TTD-DR) - Turn any local LLM into a powerful research agent with real web sources -- 2025-07-31
- Introducing Agent Data Shuttle (ADS): fully open-source -- 2025-07-31
- Need help deciding on GPU options for inference -- 2025-07-31
- Has vLLM made Ollama and llama.cpp redundant? -- 2025-07-30
- Best Local LLM + Hardware Build for Coding With a $15k Budget (2025) -- 2025-07-29
- Optimizing inference on GPU + CPU -- 2025-07-29
- I got Ollama models running locally and exposed them via a public API with one command -- 2025-07-29
- I want to use llama 7b to check if a 5-7 sentence paragraph contains a given subject, what's the minimum GPU I need? -- 2025-07-29
- PrompTrend: Continuous Community-Driven Vulnerability Discovery and Assessment for Large Language Models -- 2025-07-29
- [Editorial] Local voice AI, 235B LLM -- 2025-07-28
- I stopped typing. Now I just use a hotkey. I built Agent-CLI to make it possible. -- 2025-07-28
- Local cross-platform speech-to-speech and real-time captioning with OpenAI Whisper, Vulkan GPU acceleration and more -- 2025-07-28
- Devstral & Magistral as adapters of Mistral -- 2025-07-28
- 🔓 I built Hearth-UI — A fully-featured desktop app for chatting with local LLMs (Ollama-ready, attachments, themes, markdown, and more) -- 2025-07-27
- UIGEN-X 8B supports React Headless, Flutter, React Native, Static Site Generators, Tauri, Vue, Gradio/Python, Tailwind, and prompt-based design. GGUF/GPTQ/MLX Available -- 2025-07-27
- Realtime codebase indexing for coding agents with ~ 50 lines of Python (open source) -- 2025-07-27
- Freigeist - The new Vibe Coding Platform -- 2025-07-27
- What are some unique uses of OpenWebUI that you can't get otherwise? -- 2025-07-27
- MegaTTS 3 Voice Cloning is Here -- 2025-07-27
- ControlGenAI/T-LoRA -- 2025-07-27
- tencent/HunyuanWorld-1 -- 2025-07-27
- Toward Real-World Table Agents: Capabilities, Workflows, and Design Principles for LLM-based Table Intelligence -- 2025-07-26
- Build advice: Consumer AI workstation with RTX 3090 + dual MI50s for LLM inference and Stable Diffusion (~$5k budget) -- 2025-07-26
- RTX 5090 (32GB VRAM) - Full Fine-Tuning: What Can I Expect? -- 2025-07-26
- Is there a reason to prefer Nvidia over AMD for programming use cases? -- 2025-07-26
- Entry GPU options - 5060 8GB enough to play with? -- 2025-07-26
- Best way to manage context/notes locally for API usage while optimizing token costs? -- 2025-07-26
- Advice on choice of model -- 2025-07-26
- Document processing -- 2025-07-26
- What upgrade option is better with $2000 available for my configuration? -- 2025-07-24
- Can hooks in CC custom slash commands trigger other commands? -- 2025-07-24
- AI Model Juggler automatically and transparently switches between LLM and image generation backends and models -- 2025-07-22
- Looking to possibly replace my ChatGPT subscription with running a local LLM. What local models match/rival 4o? -- 2025-07-22
- Nvidia GTX 1080 Ti 11GB VRAM -- 2025-07-22
- I messed up my brother's Llama AI workstation.. looking for advice -- 2025-07-22
- How do Claude Code token counts translate to “prompts” for usage limits? -- 2025-07-22
- Has anyone actually ran VLAs locally and how good are they? -- 2025-07-22
- google/medsiglip-448 -- 2025-07-22
- Questions about AI for translation -- 2025-07-22
- RTX 5090 performance with vLLM and batching? -- 2025-07-21
- How good are 2x 3090s for finetuning? -- 2025-07-21
- diptanshu1991/LoFT -- 2025-07-21
- LiquidAI/LFM2-1.2B -- 2025-07-21
- How can I benchmark different AI models? -- 2025-07-21
- Back to The Future: Evaluating AI Agents on Predicting Future Events -- 2025-07-20
- Localllama’s (first?) IFTA - I’ll Fine-Tune Anything -- 2025-07-20
- GitHub - boneylizard/Eloquent: A local front-end for open-weight LLMs with memory, RAG, TTS/STT, Elo ratings, and dynamic research tools. Built with React and FastAPI. -- 2025-07-18
- A free goldmine of tutorials for the components you need to create production-level agents: extensive open-source resource with tutorials for creating robust AI agents -- 2025-07-18
- Five Big Improvements to Gradio MCP Servers -- 2025-07-18
- Migrating a semantically-anchored assistant from OpenAI to local environment (Domina): any successful examples of memory-aware agent migration? -- 2025-07-18
- Where local is lagging behind... Wish lists for the rest of 2025 -- 2025-07-18
- Devstral-Vision-Small-2507 -- 2025-07-18
- XTTSv2 model, Chatterbox on MacBook Air 8GB -- 2025-07-18
- Support for diffusion models (Dream 7B) has been merged into llama.cpp -- 2025-07-17
- ETH Zurich and EPFL will release a fully open-source LLM developed on public infrastructure. Trained on the “Alps” supercomputer at the Swiss National Supercomputing Centre (CSCS) with 60% English / 40% non-English data, it will be released in 8B and 70B sizes. -- 2025-07-17
- Moonshot AI’s open source Kimi K2 outperforms GPT-4 in key benchmarks -- 2025-07-17
- Advice Needed: Best way to replace Together API with self-hosted LLM for high-concurrency app -- 2025-07-17
- baidu/ERNIE-4.5-0.3B-PT -- 2025-07-17
- LiquidAI/LFM2-700M -- 2025-07-17
- RekaAI/reka-flash-3.1 · Hugging Face -- 2025-07-17
- OPENCODE - Like Claude Code or Gemini CLI, but works with local models and/or paid ones as well -- 2025-07-15
- I built a Deep Researcher agent and exposed it as an MCP server! -- 2025-07-15
- awwaiid/gremllm -- 2025-07-15
- Ollama calling tools -- 2025-07-15
- The BastionRank Showdown: Crowning the Best On-Device AI Models of 2025 -- 2025-07-14
- Local Llama with Home Assistant Integration and Multilingual-Fuzzy naming -- 2025-07-14
- Podcast generation app -- works with Ollama -- 2025-07-14
- Show HN: Refine – A Local Alternative to Grammarly -- 2025-07-14
- Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure -- 2025-07-14
- Tencent-Hunyuan/Hunyuan-A13B -- 2025-07-14
- Nvidia RTX Pro 6000 (96 Gb) vs Apple M3 Ultra (512 Gb) -- 2025-07-14
- How fast is inference when utilizing DDR5 and PCIe 5.0x16? -- 2025-07-14
- Import of ChatGPT export zip file with images of entire previous chats -- 2025-07-14
- Kimi-K2 is a DeepSeek V3 with more experts -- 2025-07-14
- TheManticoreProject/LDAPWordlistHarvester -- 2025-07-14
- k2-fsa/ZipVoice -- 2025-07-13
- K-intelligence/Midm-2.0-Base-Instruct -- 2025-07-13
- support for Jamba hybrid Transformer-Mamba models has been merged into llama.cpp -- 2025-07-13
- Asynchronous Robot Inference: Decoupling Action Prediction and Execution -- 2025-07-13
- Suggestion: Grayscale-First Hack to Optimize Image Recognition in Grok—Save Compute Without Losing Accuracy? -- 2025-07-13
- Local LLM to back Elastic AI -- 2025-07-13
- Blackwell FP8 W8A8 NVFP4 support discussion -- 2025-07-13
- Is there some local LLM benchmarking tool to see how well your system will handle a model? -- 2025-07-13
- Unlocking AMD MI300X for High-Throughput, Low-Cost LLM Inference -- 2025-07-13
- Local AI server with Ollama and Tailscale integration looking for feedback -- 2025-07-12
- Running OpenWebUI Without RAG: Faster Web Search & Document Upload -- 2025-07-12
- What impressive (borderline creepy) local AI tools can I run now that everything is local? -- 2025-07-12
- ScreenEnv: Deploy your full stack Desktop Agent -- 2025-07-12
- Local LLMs work great! -- 2025-07-12
- LiquidAI/LFM2-350M -- 2025-07-12
- QPART: Adaptive Model Quantization and Dynamic Workload Balancing for Accuracy-aware Edge Inference -- 2025-07-12
- Issues with Qwen 3 Embedding models (4B and 0.6B) -- 2025-07-12
- DIY Navigation System Floats this Boat -- 2025-07-12
- (Kramer UI for Ollama) I was tired of dealing with Docker, so I built a simple, portable Windows UI for Ollama. -- 2025-07-11
- BastionChat: Finally got Qwen3 + Gemma3 (thinking models) running locally on iPhone/iPad with full RAG and voice mode -- 2025-07-11
- Thanks to you, I built an open-source website that can watch your screen and trigger actions. It runs 100% locally and was inspired by all of you! -- 2025-07-11
- Preceptor – A Local AI Focus App That Nudges You Back on Track | Waitlist + Suggestions needed -- 2025-07-11
- How are you selecting LLMs? -- 2025-07-10
- Getting started with local AI -- 2025-07-10
- Ollama alternatives -- 2025-07-09
- Dealing with tool_calls hallucinations -- 2025-07-09
- Megakernel doubles Llama-1B inference speed for batch size 1 -- 2025-07-09
- Introducing an open source cross-platform graphical interface LLM client -- 2025-07-08
- llama-server vs llama python binding -- 2025-07-08
- Llama server completion not working correctly -- 2025-07-08
- Introducing cocoindex - super simple ETL to prepare data for AI, with dynamic index (Ollama integrated) -- 2025-07-08
- Higher topk and num_ctx or map/reduce? -- 2025-07-08
- Best model for a RX 6950xt? -- 2025-07-07
- I built a minimal Web UI for interacting with locally running Ollama models – lightweight, fast, and clean ✨ -- 2025-07-07
- I added Ollama support to AI Runner -- 2025-07-07
- Use Ollama with browser -- 2025-07-07
- langchain-ai/local-deep-researcher -- 2025-07-07
- Google quietly released an app that lets you download and run AI models locally -- 2025-07-07
- Helping someone build a local continuity LLM for writing and memory—does this setup make sense? -- 2025-07-06
- Is running a local LLM useful? How? -- 2025-07-06
- Any local models that has less restraints? -- 2025-07-06
- baidu/ERNIE-4.5-VL-424B-A47B-PT -- 2025-07-06
- baidu/ERNIE-4.5-300B-A47B-PT -- 2025-07-06
- SSD Upgrade for Mac Mini M4 -- 2025-07-06
- Nvidia DGX Spark - what's the catch? -- 2025-07-06
- MCPVerse – An open playground for autonomous agents to publicly chat, react, publish, and exhibit emergent behavior -- 2025-07-06
- I made a free iOS app for people who run LLMs locally. It’s a chatbot that you can use away from home to interact with an LLM that runs locally on your desktop Mac. -- 2025-07-06
- [Open Source] Moondream MCP - Vision for AI Agents -- 2025-07-05
- Kyutai's STT with semantic VAD now opensource -- 2025-07-05
- brizzai/auto-mcp -- 2025-07-05
- Lifailon/openrouter-bot -- 2025-07-05
- Kyutai TTS is here: Real-time, voice-cloning, ultra-low-latency TTS, Robust Longform generation -- 2025-07-04
- Privacy preserving ChatGPT/Claude voice mode alternative -- 2025-07-04
- [Setup discussion] AMD RX 7900 XTX workstation for local LLMs — Linux or Windows as host OS? -- 2025-07-04
- 🧠💬 Introducing AI Dialogue Duo – A Two-AI Conversational Roleplay System (Open Source) -- 2025-07-04
- Qwen 2.5 32B or Similar Models -- 2025-07-04
- An optimizing compiler doesn't help much with long instruction dependencies -- 2025-07-04
- Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics -- 2025-07-04
- Transformers backend integration in SGLang -- 2025-07-04
- Tips for running a local RAG and llm? -- 2025-07-04
- I made an LLM tool to let you search offline Wikipedia/StackExchange/DevDocs ZIM files (llm-tools-kiwix, works with Python & LLM cli) -- 2025-07-04
- Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective -- 2025-07-04
- Llama-Server Launcher (Python with performance CUDA focus) -- 2025-07-03
- Lightweight Docker image for launching multiple MCP servers via MCPO with unified OpenAPI access -- 2025-07-03
- jetify-com/ai -- 2025-07-03
- supabase/supabase -- 2025-07-03
- Thinking about switching from cloud-based AI to something more local -- 2025-07-03
- Suggestions to build local voice assistant -- 2025-07-03
- google/gemma-3n-E4B -- 2025-07-03
- Local LLMs in web apps? -- 2025-07-03
- Running Deepseek R1 0528 q4_K_M and mlx 4-bit on a Mac Studio M3 -- 2025-07-02
- Self-hosting LLaMA: What are your biggest pain points? -- 2025-07-02
- Is AMD Ryzen AI Max+ 395 really the only consumer option for running Llama 70B locally? -- 2025-07-02
- Run Deepseek locally on a 24g GPU: Quantizing on our Giga Computing 6980P Xeon -- 2025-07-01
- I built a document workflow system using VLMs: processes complex docs end-to-end (runs locally!!) -- 2025-07-01
- Jan Nano + Deepseek R1: Combining Remote Reasoning with Local Models using MCP -- 2025-07-01
- Query Classifier for RAG - Save your $$$ and users from irrelevant responses -- 2025-07-01
- Building a memory-heavy AI agent — looking for local-first storage & recall solutions -- 2025-07-01
- Is there any easy way to get up and running with chatgpt-like capabilities at home? -- 2025-07-01
- No recognition of Slavic characters; English characters are recognized as separate single characters, not a block of text, when using PaddleOCR. -- 2025-07-01
- Tired of copy-pasting from ChatGPT for coding? I am building an open-source tool (Athanor) to fix that - Alpha testers/feedback wanted! -- 2025-07-01
- VideoGameBench from Princeton: Can vision-language models play 90s video games? -- 2025-07-01
- New band surges to 500k listeners on Spotify, but turns out it's AI slop -- 2025-07-01
- How we cut CKEditor's bundle size by 40% -- 2025-07-01
- My VSCode → AI chat website connector extension just got 3 new features! -- 2025-07-01
- Built an open-source DeepThink plugin that brings Gemini 2.5 style advanced reasoning to local models (DeepSeek R1, Qwen3, etc.) -- 2025-06-28
- Open-sourced Agent Gym: The framework behind mirau-agent's training data synthesis -- 2025-06-28
- Introducing llamate, an Ollama-like tool to run and manage your local AI models easily -- 2025-06-28
- DL: CLI Downloader - Hugging Face, Llama.cpp, Auto-Updates & More! -- 2025-06-28
- How is MCP tool calling different from basic function calling? -- 2025-06-28
- Fine-Tuning SmolVLM for Receipt OCR -- 2025-06-28
- Anyone working on alternative representations of codebases for LLMs? -- 2025-06-28
- The Void IDE, Open-Source Alternative to Cursor, Released in Beta -- 2025-06-28
- Writing a basic Linux device driver when you know nothing about Linux drivers -- 2025-06-28
- Meta and Yandex exfiltrating tracking data on Android via WebRTC -- 2025-06-28
- Rocknix is an immutable Linux distribution for handheld gaming devices -- 2025-06-28
- Ever wanted to embed Open WebUI into existing sites, apps or tools? Add a simple, embedded widget with just a few lines of code! -- 2025-06-28
- Does this mean we are free from the shackles of CUDA? We can use AMD GPUs wired up together to run models ? -- 2025-06-28
- black-forest-labs/FLUX.1-Kontext-dev -- 2025-06-27
- 0.71-Å resolution electron tomography enabled by deep-learning-aided information recovery -- 2025-06-26
- MeiGen-AI/MeiGen-MultiTalk -- 2025-06-26
- kyutai/stt-1b-en_fr -- 2025-06-24
- openai/whisper-large-v3 -- 2025-06-23
- Quartet - a new algorithm for training LLMs in native FP4 on 5090s -- 2025-06-22
- Open Discussion: Improving HTML-to-Markdown Extraction Using Local LLMs (7B/8B, llama.cpp) – Seeking Feedback on My Approach! -- 2025-06-22
- Update:My agent model now supports OpenAI function calling format! (mirau-agent-base) -- 2025-06-22
- 🧙♂️ I Built a Local AI Dungeon Master – Meet Dungeo_ai (Open Source & Powered by your local LLM ) -- 2025-06-22
- Got an LLM to write a fully standards-compliant HTTP 2.0 server via a code-compile-test loop -- 2025-06-22
- Use offline voice controlled agents to search and browse the internet with a contextually aware LLM in the next version of AI Runner -- 2025-06-22
- ReMind: AI-Powered Study Companion that Transforms how You Retain Knowledge! -- 2025-06-22
- My AI coding workflow that's actually working (not just hype) -- 2025-06-22
- Double-Entry Ledgers: The Missing Primitive in Modern Software -- 2025-06-22
- gpt_agents.py -- 2025-06-22
- crumbyte/noxdir -- 2025-06-21
- strapi/strapi -- 2025-06-21
- Optimized Chatterbox TTS (Up to 2-4x non-batched speedup) -- 2025-06-20
- [Discussion] Thinking Without Words: Continuous latent reasoning for local LLaMA inference – feedback? -- 2025-06-20
- Created a more accurate local speech-to-text tool for your Mac -- 2025-06-20
- Attention by Hand - Practice attention mechanism on an interactive webpage -- 2025-06-20
- Major update to my voice extractor (speech dataset creation program) -- 2025-06-20
- Building a Text Adventure Game with Persistent AI Agents Using Ollama -- 2025-06-20
- Azure OpenAI with latest version of NVIDIA'S Nemo Guardrails throwing error -- 2025-06-20
- GitHub RAG MCP Server - A GitIngest alternative for any IDE -- 2025-06-20
- Ruby on Rails Audit Complete -- 2025-06-20
- Largest context window model for 24GB VRAM? -- 2025-06-20
- GeneralistAI – Research Preview of Dextrous Bimanual Robotic Manipulation -- 2025-06-19
- llama-server is cooking! gemma3 27b, 100K context, vision on one 24GB GPU. -- 2025-06-18
- I built a lightweight, private, MCP server to share context between AI tools -- 2025-06-18
- KVzip: Query-agnostic KV Cache Eviction — 3~4× memory reduction and 2× lower decoding latency -- 2025-06-18
- Ollama now supports streaming responses with tool calling -- 2025-06-18
- Chainlit or Open webui for production? -- 2025-06-18
- Ollama not releasing VRAM after running a model -- 2025-06-18
- Help Shape the Future of AI in India - Survey on Local vs Cloud LLM Usage (Developers/Students/AI Enthusiasts) -- 2025-06-18
- llmcontext: Attach your whole project in large context chats -- 2025-06-18
- Semantic search engine for ArXiv, biorxiv and medrxiv -- 2025-06-18
- Oodle 2.9.14 and Intel 13th/14th gen CPUs -- 2025-06-18
- A multi-turn tool-calling base model for RL agent training -- 2025-06-18
- GUI RAG that can do an unlimited number of documents, or at least many -- 2025-06-18
- openbmb/MiniCPM4-8B -- 2025-06-17
- lym00/Wan2.1-T2V-1.3B-Self-Forcing-VACE-Addon-Experiment -- 2025-06-17
- Chinese AI firms smuggling suitcases full of hard drives to dodge US chip curbs -- 2025-06-17
- lapce/lapce -- 2025-06-16
- binary-husky/gpt_academic -- 2025-06-16
- mistralai/Magistral-Small-2506_gguf -- 2025-06-14
- maomaocun/dLLM-cache -- 2025-06-13
- AIR-THU/Asyncdriver-Tensorrt -- 2025-06-13
- System Prompt Learning: Teaching your local LLMs to learn problem-solving strategies from experience (optillm plugin) -- 2025-06-11
- GitHub - som1tokmynam/FusionQuant: FusionQuant Model Merge & GGUF Conversion Pipeline - Your Free Toolkit for Custom LLMs! -- 2025-06-11
- Semantic Search PoC for Hugging Face – Now with Parameter Size Filters (0-1B to 70B+) -- 2025-06-11
- Has anyone had success implementing a local FIM model? -- 2025-06-11
- Which agent-like terminal do you guys use? Something like Warp but free. -- 2025-06-11
- Rocm or vulkan support for AMD Radeon 780M? -- 2025-06-11
- What are the most important stages to learn ML properly, step by step? -- 2025-06-11
- What's the best open source coding agent as of now that can be run locally and can even test the created APIs by running the application and calling the endpoints with various payloads? -- 2025-06-11
- Gemini 2.5: Our most intelligent models are getting even better -- 2025-06-11
- Hugging Face unveils two new humanoid robots -- 2025-06-11
- Quick reference: Configure Ollama, Open WebUI installation paths in Windows 11 -- 2025-06-11
- Is there an alternative to LM Studio with first class support for MLX models? -- 2025-06-11
- litert-community/Gemma3-1B-IT -- 2025-06-11
- Qwen/Qwen3-Reranker-8B -- 2025-06-11
- 1001 Ways of Scenario Generation for Testing of Self-driving Cars: A Survey -- 2025-06-11
- 100G Data Center Interconnections with Silicon Dual-Drive Mach-Zehnder Modulator and Direct Detection -- 2025-06-11
- CJackHwang/AIstudioProxyAPI -- 2025-06-11
- Paper2Poster/Paper2Poster -- 2025-06-10
- LLM-D: Kubernetes-Native Distributed Inference -- 2025-06-09
- manycore-research/SpatialLM -- 2025-06-09
- XiaomiMiMo/MiMo-VL-7B-RL -- 2025-06-09
- fishaudio/openaudio-s1-mini -- 2025-06-09
- AnythingLLM RAG with Gemma 3:12b & BGE-m3-F16: LM Studio vs. Ollama Embedding Discrepancies - Same GGUF, Different Results? -- 2025-06-09
- What Models for C/C++? -- 2025-06-09
- Help with guardrails ai and local ollama model -- 2025-06-09
- Setup Recommendation for University (H200 vs RTX 6000 Pro) -- 2025-06-09
- LlamaFirewall: open-source framework for detecting and mitigating AI-centric security risks - Help Net Security -- 2025-06-09
- Are autoencoders really needed for anomaly detection in time series? -- 2025-06-09
- Backdoored malware repos traced to single GitHub user -- 2025-06-09
- Improper Access Control Allows All Users to View Private Content. Am I doing it wrong ? -- 2025-06-09
- Agno Now Supports Dual Model Output (Reasoning + Structure) -- 2025-06-09
- POC: Running up to 123B as a Letterfriend on <300€ for all hardware. -- 2025-06-07
- Using LLaMA 3 locally to plan macOS UI actions (Vision + Accessibility demo) -- 2025-06-07
- Sarvam-M a 24B open-weights hybrid reasoning model -- 2025-06-07
- why isn’t anyone building legit tools with local LLMs? -- 2025-06-07
- Anyone on Oahu want to let me borrow an RTX 6000 Pro to benchmark against this dual 5090 rig? -- 2025-06-07
- Strange memory usage -- 2025-06-07
- jax and jaxlib in ubuntu -- 2025-06-07
- We accidentally solved the biggest bottleneck in vibe coding: secret sprawl aka secret leaks -- 2025-06-07
- Improving performance of rav1d video decoder -- 2025-06-07
- KumoRFM: Gen-purpose model for making instant predictions over relational data -- 2025-06-07
- The copilot delusion -- 2025-06-07
- AI is getting insane (generating 3d models ChatGPT + 3daistudio.com or open source models) -- 2025-06-07
- Is Qwen the new face of local LLMs? -- 2025-06-07
- EvolutionAPI/evo-ai -- 2025-06-06
- ByteDance Bagel 14B MOE (7B active) Multimodal with image generation (open source, apache license) -- 2025-06-05
- VLLM with 4x7900xtx with Qwen3-235B-A22B-UD-Q2_K_XL -- 2025-06-05
- I would really like to start digging deeper into LLMs. If I have $1500-$2000 to spend, what hardware setup would you recommend assuming I have nothing currently. -- 2025-06-05
- Having trouble getting to 1-2req/s with vllm and Qwen3 30B-A3B -- 2025-06-05
- Trying to get to 24gb of vram - what are some sane options? -- 2025-06-05
- Locally downloading Qwen pretrained weights for finetuning -- 2025-06-05
- Web Application Frameworks Best Suited for AI Coding Assistants - putting the chicken before the egg. -- 2025-06-05
- Debian AI General Resolution Withdrawn -- 2025-06-05
- Authors Are Accidentally Leaving AI Prompts in Their Novels -- 2025-06-05
- Cancelling internet & switching to a LLM: what is the optimal model? -- 2025-06-05
- Show HN: I built an AI Agent that uses the iPhone -- 2025-06-04
- PipesHub - Open Source Enterprise Search Platform(Generative-AI Powered) -- 2025-06-04
- A Privacy-Focused Perplexity That Runs Locally on Your Phone -- 2025-06-04
- RL Based Sales Conversion - I Just built a PyPI package -- 2025-06-04
- Is a VectorDB the best solution for this? -- 2025-06-04
- Finetuning or running the new gemma 3n models locally? -- 2025-06-04
- Automate Your CSV Analysis with AI Agents – CrewAI + Ollama -- 2025-06-04
- A simple guide to downloading models using Open WebUI & Ollama — no stress, just steps -- 2025-06-04
- what's the best ai model for large refactors? -- 2025-06-04
- Show HN: Wetlands – a lightweight Python library for managing Conda environments -- 2025-06-04
- Deadlocks in Go: the dark side of concurrency (2021) -- 2025-06-04
- 100 Drivers, 2200 km: A Natural Dataset of Driving Style toward Human-centered Intelligent Driving Systems -- 2025-06-04
- 100+ Metrics for Software Startups - A Multi-Vocal Literature Review -- 2025-06-04
- Yappus. Your Terminal Just Started Talking Back (The Fuck, but Better) -- 2025-06-03
- GPU consideration: AMD Pro W7800 -- 2025-06-03
- Thoughts on which open source is best for what use-cases -- 2025-06-03
- I accidentally too many P100 -- 2025-06-03
- Has anyone come across a good (open source) -- 2025-06-03
- Need Suggestions regarding ML Laptop Configuration -- 2025-06-03
- Web search tool - bing decommissioning -- 2025-06-03
- LLM function calls don't scale; code orchestration is simpler, more effective -- 2025-06-03
- GitHub issues is almost the best notebook in the world -- 2025-06-03
- Strengths and limitations of diffusion language models -- 2025-06-03
- How LLM uses MCP tools setup in OpenWebUI ? -- 2025-06-03
- Database_url string for mysql -- 2025-06-03
- kubeflow/kubeflow -- 2025-06-03
- Enhancing MySQL: MySQL improvement project -- 2025-06-03
- PRIME-RL/Entropy-Mechanism-of-RL -- 2025-06-02
- Atlas: Learning to Optimally Memorize the Context at Test Time -- 2025-06-02
- I'm building a Self-Hosted Alternative to OpenAI Code Interpreter, E2B -- 2025-06-01
- Giving Qwen 3 0.6B a Toolbelt in the form of MCP Support, Running Locally in Your Browser with Adjustable Thinking! -- 2025-06-01
- Turning my PC into a headless AI workstation -- 2025-06-01
- Bind tools to a model for use with Ollama and OpenWebUI -- 2025-06-01
- I know it's -- 2025-06-01
- We believe the future of AI is local, private, and personalized. -- 2025-06-01
- image search and query with natural language that runs on the local machine -- 2025-06-01
- What's the verdict on the new OpenAI Codex? -- how's code quality? Comparing to Cursor? -- 2025-06-01
- MCP explained without hype or fluff -- 2025-06-01
- Augmented Coding: Better with Principles -- 2025-06-01
- Best open source model for enterprise conversational support agent - worth it? -- 2025-06-01
- Speed-up VLLM server boot -- 2025-06-01
- dipampaul17/KVSplit -- 2025-06-01
- mcp-use/mcp-use -- 2025-05-30
- Cloi CLI: Local debugging agent that runs in your terminal -- 2025-05-30
- mistralai/Devstral-Small-2505_gguf -- 2025-05-30
- Open-Sourced Multimodal Large Diffusion Language Models -- 2025-05-30
- Built a Python library for text classification because I got tired of reinventing the wheel -- 2025-05-30
- Cleaning up responses to fix up synthetic data -- 2025-05-30
- My Gemma-3 musings... after a good time dragging it through a grinder -- 2025-05-30
- Title: Seeking Help: A -- 2025-05-30
- Trying to learn ML - Book Recommendations -- 2025-05-30
- How to have cursor auto-apply code suggestions? -- 2025-05-30
- Triangle splatting: radiance fields represented by triangles -- 2025-05-30
- Why I Built My Own Audio Player -- 2025-05-30
- Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming -- 2025-05-30
- IDEA: Record your voice prompts, copy them straight into Ollama (100% local) -- 2025-05-30
- Migration to Postgres - Success -- 2025-05-30
- jinn091/go-form-parser -- 2025-05-28
- doganarif/GoVisual -- 2025-05-28
- Is Microsoft’s new Foundry Local going to be the “easy button” for running newer transformers models locally? -- 2025-05-28
- Cobolt is now available on Linux! 🎉 -- 2025-05-28
- Round Up: Current Best Local Models under 40B for Code & Tool Calling, General Chatting, Vision, and Creative Story Writing. -- 2025-05-28
- Best local model for M2 16gb MacBook Air for Analyzing Transcripts -- 2025-05-28
- Prompt Debugging -- 2025-05-28
- Not so Smart Agent (Ollama, Spring AI, MCP) -- 2025-05-28
- [Q] How can one get better at fixing models,training etc.? -- 2025-05-28
- FOSS - MCP Server generator from OpenAPI specification files (swagger/etapi) -- 2025-05-28
- Digital Payment System GNU Taler Gets Green Light to Operate in Switzerland -- 2025-05-28