Model Architecture
Transformers, mixture of experts, attention mechanisms, model design
478 articles across 132 editions
Articles
- Liquid AI releases LFM2-24B-A2B -- 2026-02-24
- Qwen3's most underrated feature: Voice embeddings -- 2026-02-24
- After many contributions craft, Crane now officially supports Qwen3-TTS! -- 2026-02-24
- We tested the same INT8 model on 5 Snapdragon chipsets. Accuracy ranged from 93% to 71%. Same weights, same ONNX file. -- 2026-02-24
- [Editorial] https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf -- 2026-02-12
- [Editorial] https://d3lm.medium.com/overly-agentic-why-anthropic-is-worried-about-opus-4-6-17eee0f8e5cd -- 2026-02-12
- [Editorial] https://www.linkedin.com/posts/avipil_i-got-my-first-bill-after-switching-to-claude-activity-7427320523870629889-vM5K -- 2026-02-12
- Pros/Cons and use case for bypassing permissions -- 2026-02-12
- [Editorial] https://www.kylerush.org/posts/opus-4-5-really-changed-things -- 2026-02-10
- [Editorial] https://www.linkedin.com/posts/ownyourai_i-taught-my-claude-code-to-swallow-a-32m-activity-7426902541868728321-2Z36 -- 2026-02-10
- [Editorial] https://docs.google.com/document/d/1I9r21TyQuAO1y2ecztBU0PSCpjHSL_vZJiA5v276Wro/mobilebasic -- 2026-02-10
- [Editorial] https://www.linkedin.com/posts/cole-medin-727752184_claude-codes-new-agent-teams-feature-is-share-7426633806792609792-pT3s -- 2026-02-10
- [Tool] claude-config-sync: Sync your Claude Code configuration across machines using GitHub Gists -- 2026-02-10
- [Editorial] https://www.linkedin.com/posts/ownyourai_i-just-woke-up-to-qwen3-coder-next-80b-activity-7424703876240695297-Nlqf -- 2026-02-04
- LiquidAI/LFM2.5-1.2B-Thinking -- 2026-02-04
- ByteDance-Seed/Stable-DiffCoder-8B-Instruct · Hugging Face -- 2026-02-03
- tencent/Youtu-VL-4B-Instruct -- 2026-02-03
- NousResearch/NousCoder-14B -- 2026-02-03
- transformers v5 final is out 🔥 -- 2026-01-30
- PaddlePaddle/PaddleOCR-VL-1.5 -- 2026-01-30
- zai-org/GLM-4.7 -- 2026-01-30
- MHA2MLA-VLM: Enabling DeepSeek's Economical Multi-Head Latent Attention across Vision-Language Models -- 2026-01-30
- Sharing my set of distilled small language models (3B) + training data in more than 50 low-resource languages -- 2026-01-30
- Introducing Kimi K2.5, Open-Source Visual Agentic Intelligence -- 2026-01-28
- ~60GB models on coding: GLM 4.7 Flash vs. GPT OSS 120B vs. Qwen3 Coder 30B -- your comparisons? -- 2026-01-28
- openbmb/AgentCPM-Report -- 2026-01-28
- Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice -- 2026-01-28
- Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation -- 2026-01-28
- One Year Since the “DeepSeek Moment” -- 2026-01-28
- Qwen/Qwen-Image-2512 -- 2026-01-28
- [Editorial] https://www.linkedin.com/posts/ownyourai_deepseek-just-released-the-first-vision-ai-activity-7421818927657385987-V1yo -- 2026-01-27
- Unsloth announces support for finetuning embedding models -- 2026-01-27
- Qwen/Qwen3-TTS-12Hz-0.6B-Base -- 2026-01-27
- Qwen/Qwen3-VL-Reranker-2B -- 2026-01-27
- [Editorial] https://www.linkedin.com/posts/ivandj_early-claims-around-self-evolving-memory-activity-7421307316437676033-l0Jm -- 2026-01-26
- [Editorial] https://www.linkedin.com/posts/reuvencohen_introducing-ruvector-world-model-activity-7421556928910290944-cx4v -- 2026-01-26
- stepfun-ai/Step3-VL-10B -- 2026-01-26
- Qwen/Qwen3-VL-Embedding-2B -- 2026-01-21
- Phr00t/Qwen-Image-Edit-Rapid-AIO -- 2026-01-21
- Qwen/Qwen3-VL-Reranker-8B -- 2026-01-21
- Bartowski comes through again. GLM 4.7 flash GGUF -- 2026-01-21
- Step-Audio-R1.1 (Open Weight) by StepFun just set a new SOTA on the Artificial Analysis Speech Reasoning leaderboard -- 2026-01-20
- GLM-4.7-Flash -- 2026-01-20
- stepfun-ai/Step-Audio-R1.1 -- 2026-01-20
- tencent/HY-MT1.5-1.8B -- 2026-01-20
- Introducing GLM-Image -- 2026-01-14
- GPT-OSS -> MLA conversion breakthrough (20B), still looking for compute + collaborators -- 2026-01-14
- FrogBoss 32B and FrogMini 14B from Microsoft -- 2026-01-14
- Qwen/Qwen3-VL-Embedding-8B -- 2026-01-14
- MCP for Financial Ontology! -- 2026-01-09
- A community index for MCPs that don’t disappear after the thread ends -- 2026-01-09
- tencent/HY-WorldPlay -- 2026-01-09
- meituan-longcat/LongCat-Image -- 2026-01-09
- Introducing Falcon H1R 7B -- 2026-01-09
- [Editorial] https://www.linkedin.com/posts/ernst-van-gassen-9196a7b5_we-spend-a-lot-of-time-trying-to-make-prompts-ugcPost-7414966440023482368-dgh5 -- 2026-01-09
- [Editorial] https://leakhub.ai/ -- 2026-01-09
- [Editorial] https://www.linkedin.com/posts/hardmaru_survival-of-the-fittest-code-blog-https-activity-7415068590485458944-3tqP -- 2026-01-09
- A closer look at a BGP anomaly in Venezuela -- 2026-01-09
- Show HN: I visualized the entire history of Citi Bike in the browser -- 2026-01-09
- Modifying a QingPing Air Quality Monitor for Local MQTT Access -- 2026-01-09
- [Editorial] https://github.com/hiyouga/LlamaFactory -- 2026-01-07
- Tongyi-MAI/MAI-UI-8B · Hugging Face -- 2026-01-07
- MultiverseComputingCAI/HyperNova-60B · Hugging Face -- 2026-01-07
- [Editorial] https://www.alwaysfurther.ai/blog/train-4b-model-to-beat-claude-sonnet-gemini -- 2026-01-05
- [Experimental] Gemma 3 4B - Dark CoT: Pushing 4B Reasoning to 33%+ on GPQA Diamond -- 2026-01-05
- Youtu-LLM-2B-GGUF is here! -- 2026-01-05
- upstage/Solar-Open-100B -- 2026-01-05
- EditMGT — fast, localized image editing with Masked Generative Transformers -- 2025-12-30
- Francis-Rings/FlashPortrait -- 2025-12-30
- zai-org/GLM-TTS -- 2025-12-30
- Flowception: Temporally Expansive Flow Matching for Video Generation -- 2025-12-30
- I've been experimenting with SLM's a lot recently. My goal was to prove even SLMs can be accurate with the right architecture behind it. -- 2025-12-22
- MBZUAI releases K2-V2 - 70B fully open model. -- 2025-12-22
- microsoft/Fara-7B -- 2025-12-22
- Alibaba Tongyi Open Sources Two Audio Models: Fun-CosyVoice 3.0 (TTS) and Fun-ASR-Nano-2512 (ASR) -- 2025-12-19
- My professor lent me an A6000, so I tried to build a coding model. Here is Anni! (Qwen3-14B Fine-tune) -- 2025-12-19
- Two years ago, I was just a math major. Now I've built the 1.5B router model used by HuggingFace. Can I bring it to Cursor? -- 2025-12-19
- Independent evaluation of GPT5.2 on SWE-bench: 5.2 high is #3 behind Gemini, 5.2 medium behind Sonnet 4.5 -- 2025-12-16
- Did I overhype Claude Code? GPT + Comet are quietly beating it for me -- 2025-12-16
- 🚀 New: Olmo 3.1 Think 32B & Olmo 3.1 Instruct 32B -- 2025-12-16
- ByteDance/Dolphin-v2 -- 2025-12-16
- zai-org/AutoGLM-Phone-9B-Multilingual -- 2025-12-16
- [Editorial] https://www.linkedin.com/posts/eric-vyacheslav-156273169_a-7m-model-just-surpassed-deepseek-r1-gemini-activity-7405985266043297792-s1Jn -- 2025-12-15
- Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models -- 2025-12-15
- New in llama.cpp: Model Management -- 2025-12-12
- Successful prototype component prompt - For big projects -- 2025-12-12
- Building RNJ-1: What makes It different from Gemma 3? -- 2025-12-11
- EssentialAI/rnj-1 -- 2025-12-11
- From Azure Functions to FreeBSD -- 2025-12-10
- [Editorial] https://www.linkedin.com/posts/dakharlamov_too-slow-thats-what-they-called-engineers-activity-7403964370390646784-J27P -- 2025-12-09
- [Editorial] https://arxiv.org/abs/2511.20920 -- 2025-12-09
- [Editorial] https://www.linkedin.com/posts/anthony-alcaraz-b80763155_your-ai-agents-context-window-is-not-a-database-activity-7403394612893024256-9S87 -- 2025-12-09
- [Editorial] https://www.linkedin.com/posts/reuvencohen_introducing-ruvector-postgres-a-self-learning-activity-7403841311029837824-jogp -- 2025-12-09
- Emacs is my new window manager -- 2025-12-08
- shubh-io/DockMate -- 2025-12-08
- Comfy-Org/HunyuanVideo_1.5_repackaged -- 2025-12-08
- We were tired of guessing which local model to use for which query. built a speculative execution lib that figures it out (github) -- 2025-12-05
- Nimony (eventually Nim 3.0) Design Principles -- 2025-12-03
- nvidia/Orchestrator-8B · Hugging Face -- 2025-12-03
- allenai/Olmo-3-1125-32B -- 2025-12-03
- moonshotai/Kimi-Linear-48B-A3B-Instruct -- 2025-12-02
- orabazes/FLUX.2-dev-GGUF -- 2025-12-02
- Transformers v5: Simple model definitions powering the AI ecosystem -- 2025-12-02
- Pocketbase – open-source realtime back end in 1 file -- 2025-11-28
- [Editorial] https://www.rand.org/pubs/commentary/2025/11/electromagnetic-warfare-natos-blind-spot-could-decide.html -- 2025-11-28
- In depth analysis of Nvidia's Jet Nemotron models -- 2025-11-26
- Hidden causes of LLM latency, its not just the model size -- 2025-11-26
- Benchmark: Self-Hosted Qwen-30B (LoRA) vs. Llama-3.1-8B vs. GPT-4.1-nano. Comparison of parsing success rates and negative constraints. -- 2025-11-25
- [Editorial] https://arxiv.org/html/2510.04871v1 -- 2025-11-24
- [Editorial] https://www.linkedin.com/posts/ingason_agenticai-erp-aiagents-activity-7396551539538141185-PxNU -- 2025-11-24
- facebook/sam3 -- 2025-11-24
- rl-research/DR-Tulu-8B -- 2025-11-24
- Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning -- 2025-11-24
- marinero4972/Open-o3-Video -- 2025-11-17
- nanonets/Nanonets-OCR2-3B -- 2025-11-17
- nvidia/ChronoEdit-14B-Diffusers-Upscaler-Lora -- 2025-11-17
- ibm-granite/granite-4.0-h-350m -- 2025-11-14
- tencent/HunyuanWorld-Mirror -- 2025-11-14
- [Editorial] https://www.npmjs.com/package/neural-trader -- 2025-11-14
- Critical RCE patched in Imunify360 affects up to 50M+ websites -- 2025-11-14
- Kubernetes Ingress Nginx is retiring -- 2025-11-14
- About KeePassXC's Code Quality Control -- 2025-11-14
- Visualizing Quantization Types -- 2025-11-11
- Co-authored a book called "Build DeepSeek from Scratch" | Live Now -- 2025-11-11
- Writing your own BEAM -- 2025-11-11
- DIY Powerwall Blows Clouds, Competition Out of the Water -- 2025-11-11
- BAAI/Emu3.5-Image -- 2025-11-07
- Riemannian Optimization for LoRA on the Stiefel Manifold -- 2025-11-07
- "On-the-fly" code reviews with ollama. It kinda works.. -- 2025-11-07
- I Built an "AI Art Director" Agent to Orchestrate Image and Video Models. -- 2025-11-07
- What's your most unexpected Claude workflow discovery? -- 2025-11-07
- [Research] Cross-Stage Vulnerabilities in Large Language Model Architectures -- 2025-11-07
- runZeroInc/runZeroHound -- 2025-11-07
- openai/gpt-oss-safeguard-120b -- 2025-11-07
- Why does Image Recognition work in llama-server but not through Open WebUI? -- 2025-11-06
- Has anyone tested ollama on Whisplay HAT with Raspberry pi zero 2W? -- 2025-11-06
- allenai/olmOCR-2-7B-1025 -- 2025-11-06
- is there simple way like .bat to compress to q4-q8 like Unsloth, Qwen3-VL-30B-A3B-Thinking-abliterated model -- 2025-11-06
- MiniMax M2 Llama.cpp support merged -- 2025-11-05
- Which AI IDE should I use under $20/month? -- 2025-11-05
- lzA6/video-to-txt -- 2025-11-05
- xaviviro/python-toon -- 2025-11-05
- [Editorial] https://www.evokesecurity.com/blogs/prompt-injection-is-for-everyone -- 2025-11-04
- Found a remote file inclusion vulnerability in an AI-generated app before launch -- 2025-11-04
- Launch HN: Propolis (YC X25) – Browser agents that QA your web app autonomously -- 2025-11-04
- Attacking macOS XPC Helpers: Protocol Reverse Engineering and Interface Analysis -- 2025-11-04
- [Research] Unvalidated Trust: Cross-Stage Failure Modes in LLM/agent pipelines arXiv -- 2025-11-04
- [Editorial] Context Engineering Handbook -- 2025-11-03
- Qwen3-VL-32B Q8 speeds in llama.cpp vs vLLM FP8 on a RTX PRO 6000 -- 2025-11-03
- OSS alternative to Open WebUI - ChatGPT-like UI, API and CLI -- 2025-11-03
- Looking for a RAG UI manager to meet our needs to replace Zapier -- 2025-11-03
- LiquidAI/LFM2-1.2B-Extract -- 2025-11-03
- DeepSeek may have found a new way to improve AI’s ability to remember -- 2025-11-02
- Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection -- 2025-11-02
- OpenAI: gpt-oss-safeguard: two open-weight reasoning models built for safety classification (Now on Hugging Face) -- 2025-10-31
- briaai/FIBO -- 2025-10-31
- snowyfizz/Vision-Detection-API -- 2025-10-29
- Show HN: Apache Fory Rust – 10-20x faster serialization than JSON/Protobuf -- 2025-10-29
- huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning -- 2025-10-29
- AlphaXiv, Compare the Deepseek-OCR and Mistral-OCR OCR models -- 2025-10-26
- Open-Bee/Bee-8B-RL -- 2025-10-26
- datalab-to/chandra -- 2025-10-26
- Unlock the power of images with AI Sheets -- 2025-10-26
- Picture in Picture / Webcam detect model on HuggingFace -- 2025-10-25
- Show HN: Story Keeper – AI agents with narrative continuity instead of memory -- 2025-10-25
- Show HN: Deta Surf – An open source and local-first AI notebook -- 2025-10-25
- Show HN: Tommy – Turn ESP32 devices into through-wall motion sensors -- 2025-10-25
- 20x Max Plan (€216) takes 2% of weekly Opus usage for a single Deep Research Query. That equals 50 per week if you use it ONLY for this and never continue or respond -- 2025-10-24
- I got tired of OpenAI dependency. Built a multi-LLM control center instead. -- 2025-10-19
- Turn ChatGPT into a real-time meeting assistant (via MCP + Apps SDK) -- 2025-10-19
- Claude Code taking a coffee break 🤔 -- 2025-10-19
- Show HN: Cmux – Coding Agent Multiplexer -- 2025-10-19
- [Editorial] Agentic Orchestration -- 2025-10-19
- Chicken Squisher 3000: Squish-Proof Security -- 2025-10-19
- Show HN: Largest open-source multimodal AI dataset -- 2025-10-18
- KORMo-Team/KORMo-10B-sft -- 2025-10-18
- ByteDance/FaceCLIP -- 2025-10-18
- Do you use multiple AI models for coding? Trying to validate a workflow problem -- 2025-10-17
- The “Compounding Engineering” mindset changed how I think about AI coding tools -- 2025-10-17
- Show HN: Rebuilt Bible search app to run 100% client-side with Transformers.js -- 2025-10-13
- swiss-ai/Apertus-8B-Instruct-2509 -- 2025-10-13
- A Childhood Dream, Created and Open Sourced -- 2025-10-13
- Some small tools for you - Ollama Managment UI, Passkey authentication proxy -- 2025-10-12
- Single prompt I run after git commit (before push) for AI diff/commit review -- 2025-10-12
- Show HN: Gitcasso – Syntax Highlighting and Draft Recovery for GitHub Comments -- 2025-10-12
- simonw/claude-skills -- 2025-10-12
- Custom models don't work after v0.6.33 update - Anyone else? -- 2025-10-12
- Pardus AI: Open source AI Assistant thanks for the help with Ollama -- 2025-10-09
- What to use for refactoring -- 2025-10-09
- Claude Code finally has a planning partner — I built an AI backlog manager for solo devs -- 2025-10-09
- Granite4 Small-h 32b-A9b (Q4_K_M) at FULL 1M context window is using only 73GB of VRAM - Life is good! -- 2025-10-09
- Run Open AI GPT-OSS on a mobile phone (Demo) -- 2025-10-09
- AI21 releases Jamba 3B, the tiny model outperforming Qwen 3 4B and IBM Granite 4 Micro! -- 2025-10-09
- inclusionAI/Ling-mini-2.0 -- 2025-10-09
- [Editorial] The Tiny Recursive Mode -- 2025-10-08
- deepseek-ai/DeepSeek-V3.1-Terminus -- 2025-10-08
- Behavioral Modification Systems in Large Language Models: A Methodological Analysis of Long Conversation Reminders -- 2025-10-08
- Ring Flash 2.0 104B A6B with Linear Attention released a few days ago -- 2025-10-07
- Bring Your Own Data (BYOD) -- 2025-09-30
- CohereLabs/command-a-reasoning-08-2025 -- 2025-09-30
- Built an MCP server for Claude Desktop to browse Reddit in real-time -- 2025-09-30
- MetalQwen3: Full GPU-Accelerated Qwen3 Inference on Apple Silicon with Metal Shaders – Built on qwen3.c - WORK IN PROGRESS -- 2025-09-28
- yangdongchao/UniAudio2 -- 2025-09-28
- jimsweb/aiMIDI -- 2025-09-28
- Handy – Free open-source speech-to-text app written in Rust -- 2025-09-28
- IndexTeam/IndexTTS-2 -- 2025-09-28
- beankeji-cloud/SLiteIO -- 2025-09-28
- GitHub - shantur/jarvis-mcp: Bring your AI to life—talk to assistants instantly in your browser. Zero hasle, No API keys, No Whisper -- 2025-09-27
- PC memory costs to climb as fabs chase filthy lucre in servers and HBM -- 2025-09-27
- AI and licensing (commercial use) -- 2025-09-26
- I trained an LLM from scratch AMA! -- 2025-09-26
- pengzhangzhi/Open-dLLM -- 2025-09-26
- SWE-Bench Pro -- 2025-09-23
- Investigating Training Data Detection in AI Coders -- 2025-09-23
- A first stab at packaging llama.cpp in a performance-optimized manner -- 2025-09-23
- Model: Qwen3 Next Pull Request llama.cpp -- 2025-09-23
- Unobtanium No More; Perhaps We Already Have All The Elements We Need -- 2025-09-21
- support for the upcoming Olmo3 model has been merged into llama.cpp -- 2025-09-21
- Running Nvidia CUDA Pytorch/vLLM projects and pipelines on AMD with no modifications -- 2025-09-21
- A Quick Look At The AMD Instinct MI355X With ROCm 7.0 -- 2025-09-21
- Uncensored AI model for from 4b Max 8b -- 2025-09-21
- facebook/MobileLLM-R1-950M -- 2025-09-20
- OpenGVLab/InternVL3_5-241B-A28B -- 2025-09-20
- KBlueLeaf/HDM-xut-340M-anime -- 2025-09-20
- GPT-OSS:20b & Qwen 4b are a match made in heaven for 24GB VRAM builds -- 2025-09-18
- Was working in RAG recently got to know how well Gemma3 4B performs -- 2025-09-18
- ROCm 6.4.3 -> 7.0-rc1 after updating got +13.5% at 2xR9700 -- 2025-09-18
- Is it possible for different brand GPUs to work together? -- 2025-09-18
- Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts -- 2025-09-18
- LFM2-1.2B safety benchmark -- 2025-09-18
- Has anyone successfully gotten Ollama models (or any models) to execute SQL queries through natural language in Openwebui? -- 2025-09-18
- yangzhou24/OmniWorld -- 2025-09-18
- Analog Optical Computer for Inference and Combinatorial Optimization -- 2025-09-18
- Running Qwen-Next (Instruct and Thinking) MLX BF16 with MLX-LM on Macs -- 2025-09-17
- [Editorial] REFRAG: Rethinking RAG based Decoding -- 2025-09-17
- The Practicality Of Solar Powered Meshtastic -- 2025-09-17
- Kwai-Klear/Klear-46B-A2.5B-Instruct -- 2025-09-16
- WestZhang/VibeVoice-Large-pt -- 2025-09-16
- FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference -- 2025-09-16
- Qwen 3 Next Series – Qwen/Qwen3 Next 80B A3B Instruct Detected -- 2025-09-16
- Test-time Prompt Intervention -- 2025-09-16
- [Editorial] Tricks from OpenAI gpt-oss YOU can use with transformers -- 2025-09-15
- openbmb/MiniCPM4.1-8B -- 2025-09-15
- nunchaku-tech/nunchaku-qwen-image -- 2025-09-15
- ggml-org/gpt-oss-20b-GGUF -- 2025-09-15
- MBZUAI releases K2 Think. 32B reasoning model based on Qwen 2.5 32B backbone, focusing on high performance in math, coding and science. -- 2025-09-14
- unsloth/Qwen3-Coder-30B-A3B-Instruct-1M-GGUF -- 2025-09-14
- Effecient hot-swappable LoRA variant supported in llama.cpp -- 2025-09-12
- Qwen/Qwen3-Next-80B-A3B-Instruct -- 2025-09-12
- swiss-ai/Apertus-70B-Instruct-2509 -- 2025-09-12
- GRASPED: Graph Anomaly Detection using Autoencoder with Spectral Encoder and Decoder (Full Version) -- 2025-09-12
- stepfun-ai/step3 -- 2025-09-12
- Exploring State-Space-Model based Language Model in Music Generation -- 2025-09-12
- HuggingFaceModelDownloader v2.0 — fast resume, a slick TUI, and powerful filters for GGUF/variants -- 2025-09-12
- Roo Code Cloud is here with Task Sync & Roomote Control || Roo Code 3.28.0 Release Notes -- 2025-09-12
- Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers -- 2025-09-12
- Made a one-click SearXNG fork with Redis, plus Dockerized Tika+OCR, and soon: local TTS/STT on Intel iGPU + AMD NPU -- 2025-09-12
- Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search -- 2025-09-10
- Introducing FineVision: a huge open-source dataset for training SOTA Vision Language Models -- 2025-09-10
- wildminder/ComfyUI-VibeVoice -- 2025-09-10
- bytedance/USO -- 2025-09-10
- Wan-AI/Wan2.2-I2V-A14B -- 2025-09-10
- YanoljaNEXT-Rosetta: A Collection of Translation Models in Different Sizes -- 2025-09-07
- nasa-ibm-ai4science/Surya-1.0 -- 2025-09-06
- Vulkan back ends, what do you use? -- 2025-09-06
- A new OpenAI model? Could this be 5.1 or 5o? What do you think? -- 2025-09-06
- haasonsaas/dspy-0to1-guide -- 2025-09-06
- 16 reproducible failures → upgraded into a 300+ page Global Fix Map. one link inside, feedback wanted -- 2025-09-06
- VibeVoice RIP? What do you think? -- 2025-09-05
- lodestones/Chroma1-HD -- 2025-09-05
- Welcome EmbeddingGemma, Google's new efficient embedding model -- 2025-09-05
- NousResearch/Hermes-4-405B -- 2025-09-05
- After deepseekv3 I feel like other MoE architectures are old or outdated. Why did Qwen chose a simple MoE architecture with softmax routing and aux loss for their Qwen3 models when there’s been better architectures for a while? -- 2025-09-02
- The Hacker's Guide to Building an AI Supercluster -- 2025-09-02
- CAD, From Scratch: MakerCAD -- 2025-09-02
- PSO-Merging: Merging Models Based on Particle Swarm Optimization -- 2025-09-02
- Coral-Protocol/Anemoi -- 2025-09-01
- After researchers unmasked a prolific SMS scammer, a new operation has emerged -- 2025-09-01
- Silent No More: Open-Source Fix for Mic Mishaps -- 2025-09-01
- How to reliably detect cross-listed job ads across multiple sites? -- 2025-09-01
- GPT OSS Fine-tuning QAT -- 2025-08-31
- Semantic Structure in Large Language Model Embeddings -- 2025-08-31
- facebook/dinov3-vit7b16-pretrain-lvd1689m -- 2025-08-31
- MizzenAI/HPSv3 -- 2025-08-31
- Some thoughts on LLMs and software development -- 2025-08-30
- Show HN: Sideko – Hybrid deterministic/LLM generator for API SDKs and docs -- 2025-08-30
- [Editorial] AI interfaces for future -- 2025-08-29
- I’ve Debugged 100+ RAG/LLM Pipelines. These 16 Bugs Always Come Back. (70 days, 800 stars) -- 2025-08-29
- Updates to Consumer Terms and Privacy Policy -- 2025-08-29
- Hierarchical Reasoning Model (HRM) implementation for text generation -- 2025-08-27
- SQLite-Vector adds support for float16 and bfloat16 (CPU, NEON, AVX2 and SSE2) -- 2025-08-26
- zai-org/GLM-4.5 -- 2025-08-26
- rednote-hilab/dots.vlm1.inst -- 2025-08-26
- black-forest-labs/FLUX.1-Krea-dev -- 2025-08-26
- Datarus-R1-14B-Preview, an adaptive multi-step reasoning LLM for automated data analysis -- 2025-08-24
- Fully Open source, serverless, community-driven MCP alternative built in Python, TS and Go -- 2025-08-24
- unsloth/Kimi-K2-Instruct-GGUF -- 2025-08-24
- DeepSeek V3.1 Reasoner improves over DeepSeek R1 on the Extended NYT Connections benchmark -- 2025-08-24
- Qwen3-30B-A3B-Instruct 2507 vs Qwen3-Coder Flash -- 2025-08-23
- Your model zoo for Software dev / webdev -- 2025-08-23
- sriniously/go-boilerplate -- 2025-08-23
- Design Patterns in MCP: Literate Reasoning -- 2025-08-22
- FlyMyAI/flymyai-lora-trainer -- 2025-08-22
- Zedless: Zed fork focused on privacy and being local-first -- 2025-08-22
- Show HN: Rucat – Cat for Prompt Engineers -- 2025-08-22
- NEW VERSION: 0.6.23 Has Just Released! - Many fixes and new features, huge changelog -- 2025-08-22
- Docker Model Runner is really neat -- 2025-08-20
- Build a Powerful RAG Web Scraper with Ollama and LangChain -- 2025-08-20
- Ollama interface with memory -- 2025-08-20
- YuminosukeSato/pyproc -- 2025-08-20
- AGENTS.md – Open format for guiding coding agents -- 2025-08-20
- NVIDIA Nemotron Nano 2 and the Nemotron Pretraining Dataset v1 -- 2025-08-20
- deepseek-ai/DeepSeek-V3.1-Base · Hugging Face -- 2025-08-20
- mistralai/Devstral-Small-2507 -- 2025-08-20
- zai-org/GLM-4.5-Air -- 2025-08-20
- microsoft/Phi-4-mini-flash-reasoning -- 2025-08-20
- ModelTC/Qwen-Image-Lightning -- 2025-08-20
- Fast Type-Aware Linting in Oxlint -- 2025-08-20
- GPT-5, where does it shine for you? -- 2025-08-19
- vidore/colqwen-omni-v0.1 -- 2025-08-19
- Tutorial: Open WebUI and llama-swap works great together! Demo of setup, model swapping and activity monitoring. -- 2025-08-18
- Concurrency in open-weight/open-source models? -- 2025-08-18
- [Editorial] Claude Flow, Alpha 90 release -- 2025-08-17
- Optimizing Text gen webui (oobabooga) for MOE models (Qwen3-235b, GLM 4.5) -- 2025-08-17
- Chen-zexi/vllm-cli -- 2025-08-17
- zyfoxx/subhunter -- 2025-08-17
- HuggingFaceTB/SmolLM3-3B-Base -- 2025-08-16
- mistralai/Voxtral-Small-24B-2507 -- 2025-08-16
- TiTan - a tiny model for tags and titles -- 2025-08-16
- baidu/ERNIE-4.5-VL-424B-A47B-PT -- 2025-08-16
- MiniLM (BERT) embeddings in C from scratch -- 2025-08-16
- GENNAI CLI - A ReAct-based agent CLI -- 2025-08-16
- gptme v0.28.0 major release - agent CLI with local model support -- 2025-08-16
- WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling -- 2025-08-16
- character-ai/pipelining-sft -- 2025-08-15
- Compass-Thinker-7B Technical Report -- 2025-08-15
- Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning -- 2025-08-15
- SaaS Is Dead -- 2025-08-15
- PYX: The next step in Python packaging -- 2025-08-15
- [Editorial] GLM-4.5, enterprise use -- 2025-08-13
- Best local model with function calling? -- 2025-08-13
- agentica-org/DeepSWE-Preview -- 2025-08-13
- janhq/Jan-v1-4B-GGUF -- 2025-08-13
- gpt-oss jailbreak workflow -- 2025-08-11
- GPT-5 removed logprob support from the API - technical breakdown and implications -- 2025-08-11
- A model for pure text continuation (not chirpy little Q&A assistant)? -- 2025-08-11
- Suggestion for upgrading hardware for MOE inference and fine-tuning. -- 2025-08-09
- Best models under 16GB?? -- 2025-08-09
- Explicit tail calls are now available on Rust Nightly (become keyword) -- 2025-08-09
- HuggingFaceTB/SmolLM3-3B -- 2025-08-09
- mistralai/Magistral-Small-2507 -- 2025-08-09
- N8N + OpenWebUI -- 2025-08-08
- Create space-saving clones on macOS with Python -- 2025-08-08
- Experience with GLM-4.5-Air + claude code? -- 2025-08-08
- Just when you thought Qwen was done... -- 2025-08-08
- Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs -- 2025-08-08
- Context Management by Trimming Conversation -- 2025-08-06
- Exploiting Primacy Effect To Improve Large Language Models -- 2025-08-06
- Help: Qwen3-Coder + LM Studio + Continue.dev (VSCode) + Mac 64GB M3 Max — 500 Internal Server Error, Even After Unsloth Fix -- 2025-08-04
- CWC now supports kimi.com (K2) and chat.z.ai (GLM-4.5) to enable coding with top tier models at no cost -- 2025-08-04
- 🧠 ICM+DPO: Used Qwen3's coherent understanding to improve Gemma3 at math - cross-model capability transfer with zero supervision -- 2025-08-03
- NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining -- 2025-08-03
- Build an AI Shopping Assistant with Gradio MCP Servers -- 2025-08-01
- [Editorial] Alternative to vector db rag -- 2025-07-30
- This year’s best open-source models and most cost-effective models -- 2025-07-29
- I’m looking for multimodal image input support and uncensored LLM -- 2025-07-29
- nvidia/audio-flamingo-3 -- 2025-07-29
- mistralai/Voxtral-Mini-3B-2507 -- 2025-07-29
- ziangcao0312/PhysX-3D -- 2025-07-29
- UI/UX benchmark update 7/22: Newest Qwen models added, Qwen3 takes the lead in terms of win rate (though still early) -- 2025-07-28
- unsloth/Qwen3-235B-A22B-Thinking-2507-GGUF -- 2025-07-28
- zai-org/GLM-4.5 -- 2025-07-28
- Tesslate/UIGEN-X-32B-0727 -- 2025-07-28
- had to fine-tune qwen since llama sucks at summarizing -- 2025-07-28
- orchestre-dev/ccproxy -- 2025-07-28
- [Editorial] neural networks don’t need to be giant to be powerful -- 2025-07-27
- Qwen/Qwen3-235B-A22B-Thinking-2507 -- 2025-07-27
- mistralai/Magistral-Small-2507 -- 2025-07-27
- cmdaltctr/claude-gemini-mcp-slim -- 2025-07-26
- RichardAtCT/claude-code-openai-wrapper -- 2025-07-26
- Freigeist - The new Vibe Coding Platform -- 2025-07-26
- From chaotic prompting to structured workflow: My Claude evolution -- 2025-07-24
- A Request for Comments (RFC) for MCP-alternative Universal Tool Calling Protocol (UTCP) was created -- 2025-07-22
- How to use the same context across LLMs and Agents -- 2025-07-22
- Has anyone actually ran VLAs locally and how good are they? -- 2025-07-22
- google/medsiglip-448 -- 2025-07-22
- Questions about AI for translation -- 2025-07-22
- microsoft/Phi-4-mini-flash-reasoning -- 2025-07-22
- LGAI-EXAONE/EXAONE-4.0-32B -- 2025-07-22
- Replacing thinking with tool usage enables reasoning in small language models -- 2025-07-22
- Struggling to Generate Polished UI with Claude Code -- 2025-07-20
- Skywork/Skywork-R1V3-38B -- 2025-07-20
- ByteDance-Seed/Seed-X-PPO-7B -- 2025-07-20
- Diffusion model support in llama.cpp. -- 2025-07-16
- GLM-4 MoE incoming -- 2025-07-16
- GeoArrow and GeoParquet, and the Future of Geospatial Data Analysis -- 2025-07-15
- Programming Affordances That Invite Mistakes -- 2025-07-14
- HuggingFaceTB/SmolLM3-3B-Base -- 2025-07-14
- OLMo 2 - a family of fully-open language models -- 2025-07-12
- google/videoprism -- 2025-07-12
- Why don’t we have a big torrent repo for open-source LLMs? -- 2025-07-12
- Local PDF Database searchable with ollama - best setup? -- 2025-07-12
- Tinyllama on old Mediatek G80 android device -- 2025-07-12
- I used Ollama to build a Cursor for PDFs -- 2025-07-12
- Advice on switching to LLM -- 2025-07-12
- Building the Hugging Face MCP Server -- 2025-07-11
- Support for the upcoming IBM Granite 4.0 has been merged into llama.cpp -- 2025-07-11
- support for Falcon-H1 model family has been merged into llama.cpp -- 2025-07-11
- fsndzomga/metadspy -- 2025-07-10
- osmosis-ai/Osmosis-Apply-1.7B -- 2025-07-10
- IntervitensInc/pangu-pro-moe-model -- 2025-07-10
- Continual Gradient Low-Rank Projection Fine-Tuning for LLMs -- 2025-07-10
- pola-rs/polars -- 2025-07-09
- Introducing an open source cross-platform graphical interface LLM client -- 2025-07-08
- llama-server vs llama python binding -- 2025-07-08
- Llama server completion not working correctly -- 2025-07-08
- introducing cocoindex - super simple etl to prepare data for ai, with dynamic index (ollama integrated) -- 2025-07-08
- Higher topk and num_ctx or map/reduce ? -- 2025-07-08
- Looking for an upgrade from Meta-Llama-3.1-8B-Instruct-Q4_K_L.gguf, especially for letter parsing. Last time I looked into this was a very long time ago (7 months!) What are the best models nowadays? -- 2025-07-08
- Best models by size? -- 2025-07-08
- Planning a 7–8B Model Benchmark on 8GB GPU — What Should I Test & Measure? -- 2025-07-08
- DLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching -- 2025-07-08
- Efficient MultiModal Data Pipeline -- 2025-07-08
- Are non-autoregressive models really faster than autoregressive ones after all the denoising steps? -- 2025-07-06
- Yuan-ManX/ComfyUI-OmniGen2 -- 2025-07-06
- Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 -- 2025-07-06
- Using local models with Void -- 2025-07-05
- Intel GPU vLLM Docker Compose Bootstrap with Phi-lthy4 on A770 -- 2025-07-05
- Accelerated LLM Inference on AMD Instinct™ GPUs with vLLM 0.9.x and ROCm -- 2025-07-05
- Gemma 3n fully available in the open-source ecosystem! -- 2025-07-05
- 5060ti 16gb or 9060xt 16gb for small llm server -- 2025-07-05
- Qwen3 models in MLX format! -- 2025-07-05
- Best local coding model right now? -- 2025-07-05
- Found a Web3 LLM That Actually Gets DeFi Right -- 2025-07-03
- apple/DiffuCoder-7B-cpGRPO -- 2025-07-03
- Hoshinonyaruko/Gensokyo-MCP -- 2025-07-01
- ahmadallobani/BaldHead -- 2025-06-29
- modelcontextprotocol/registry -- 2025-06-27
- 0-concordance of knotted surfaces and Alexander ideals -- 2025-06-26
- unfinishedtr/progressbar -- 2025-06-24
- wearyfurnitur/confiq -- 2025-06-24
- Show HN: Controlling 3D models with voice and hand gestures -- 2025-06-24
- open-webui/mcpo -- 2025-06-23
- Built a fully local Whisper + pyannote stack to replace Otter. Full diarisation, transcripts & summaries on GPU. -- 2025-06-23
- MiniMax latest open-sourcing LLM, MiniMax-M1 — setting new standards in long-context reasoning -- 2025-06-23
- Run qwen 30b-a3b on Android local with Alibaba MNN Chat -- 2025-06-23
- A new PDF translation tool -- 2025-06-23
- What Really Happens When You Ask a Cursor a Question with GitHub MCP Integrated -- 2025-06-23
- [Q] How to Speed Up Mistral 7B Inference in LM Studio? 31s/Chunk on RTX 3070 -- 2025-06-23
- Cyber security guys are about to become very on demand in the coming few years -- 2025-06-23
- The first big AI disaster is yet to happen -- 2025-06-23
- Trading with Claude, and writing your own MCP server -- 2025-06-23
- Ecne AI Podcast Generator - Update -- 2025-06-23
- Help me decide on hardware for LLMs -- 2025-06-23
- chungmin99/pyroki -- 2025-06-23
- nvidia/GR00T-N1.5-3B -- 2025-06-23
- nvidia/Cosmos-Predict2-2B-Text2Image -- 2025-06-22
- trendmicro/vision-one-mcp-server -- 2025-06-20
- Minidoracat/mcp-feedback-enhanced -- 2025-06-20
- meta-llama/Llama-3.1-8B-Instruct -- 2025-06-19
- MiniMaxAI/MiniMax-M1-80k -- 2025-06-19
- ckanthony/openapi-mcp -- 2025-06-18
- Ta0ing/MCP-SecurityTools -- 2025-06-18
- fluxions/vui -- 2025-06-13
- carbon-language/carbon-lang -- 2025-06-12
- CJackHwang/AIstudioProxyAPI -- 2025-06-11
- News publishers call Google's AI Mode 'theft' -- 2025-06-11
- typelevel/cats -- 2025-06-11
- wesm/pydata-book -- 2025-06-11
- jedisct1/openapi-mcp -- 2025-06-09
- arcee-ai/Homunculus -- 2025-06-04
- GitHub MCP exploited: Accessing private repositories via MCP -- 2025-06-01
- dipampaul17/KVSplit -- 2025-06-01
- facebook/OMol25 -- 2025-05-30
- Is Microsoft’s new Foundry Local going to be the “easy button” for running newer transformers models locally? -- 2025-05-28
- Cobolt is now available on Linux! 🎉 -- 2025-05-28
- Round Up: Current Best Local Models under 40B for Code & Tool Calling, General Chatting, Vision, and Creative Story Writing. -- 2025-05-28
- Best local model for M2 16gb MacBook Air for Analyzing Transcripts -- 2025-05-28
- Prompt Debugging -- 2025-05-28
- Not so Smart Agent (Ollama, Spring AI, MCP) -- 2025-05-28
- [Q] How can one get better at fixing models,training etc.? -- 2025-05-28
- FOSS - MCP Server generator from OpenAPI specification files (swagger/etapi) -- 2025-05-28
- Digital Payment System GNU Taler Gets Green Light to Operate in Switzerland -- 2025-05-28