Voice & Audio
Text-to-speech, speech recognition, voice cloning
76 articles across 35 editions
Articles
- Speech to text via LLM -- 2026-01-16
- kyutai-labs/pocket-tts -- 2026-01-16
- zai-org/GLM-ASR-Nano-2512 -- 2025-12-12
- zai-org/GLM-TTS -- 2025-12-11
- openbmb/VoxCPM1.5 -- 2025-12-11
- MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark -- 2025-12-04
- nvidia/parakeet_realtime_eou_120m-v1 -- 2025-12-03
- Qwen/Qwen3-VL-4B-Instruct -- 2025-11-20
- Soul-AILab/SoulX-Podcast-1.7B -- 2025-11-20
- Last week in Multimodal AI - Local Edition -- 2025-11-12
- pnnbao97/VieNeu-TTS -- 2025-11-12
- FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation -- 2025-11-12
- GLaDOS TTS finetuning on MLX from the original game files -- 2025-11-04
- zeusftk/FTK_CANVAS_AGENT_for_Comfyui -- 2025-11-04
- guyyariv/DyPE -- 2025-11-04
- Esonhugh/go-rex-java -- 2025-10-27
- SuperSonic – SuperCollider's audio engine in a Web AudioWorklet -- 2025-10-27
- 3-way FTP: Pushing files around with silly and unusual methods -- 2025-10-27
- HRV Gets Home Automation Upgrades -- 2025-10-27
- Open source streaming STT (Parakeet + Silero + Pipecat Smart Turn) -- 2025-10-19
- Turn ChatGPT into a real-time meeting assistant (via MCP + Apps SDK) -- 2025-10-19
- BASICODE: A Bit Like Java, But From The 1980s -- 2025-10-18
- Audio transcription with llama.cpp multimodal -- 2025-10-18
- I built a fully automated AI podcast generator that connects to ollama -- 2025-10-18
- Chinny (iOS/MacOS): offline, on-device voice cloning with an optimized Chatterbox model -- 2025-10-12
- herimor/voxtream -- 2025-10-12
- microsoft/VibeVoice-Large -- 2025-10-12
- chetwinlow1/Ovi -- 2025-10-12
- Phr00t/Qwen-Image-Edit-Rapid-AIO -- 2025-10-12
- kyomber/CVE-2025-8088 -- 2025-10-08
- This Week in Security: CVSS 0, Chwoot, and Not in the Threat Model -- 2025-10-08
- I created the cheapest possible AI voice agent (over 30x less expensive than Elevenlabs and OpenAI Realtime). Check out the Github repo below if you want to try it for yourself! -- 2025-10-07
- MaximeRivest/maivi -- 2025-10-07
- nineninesix/kani-tts-370m -- 2025-10-07
- We just open-sourced Kroko ASR: a fast, streaming alternative to Whisper. It’s early days, we’d love testers, feedback, and contributors. -- 2025-10-04
- Chaos96/NTPP -- 2025-09-27
- We made a new AI interface that is compatible with Ollama -- 2025-09-24
- if-ai/ComfyUI_HunyuanVideoFoley -- 2025-09-24
- Show HN: Inferencer – Run and deeply control local AI models (macOS release) -- 2025-09-24
- tencent/HunyuanWorld-Voyager -- 2025-09-24
- FireRedTeam/FireRedTTS2 -- 2025-09-24
- OpenBMB/VoxCPM -- 2025-09-22
- voicepowered-ai/VibeVoice-finetuning -- 2025-09-22
- Why is the name of a wireless mouse hard-coded into Windows Bluetooth drivers? -- 2025-09-17
- Qwen3-Coder-480B Q2_K_XL same speed as Qwen3-235b-instruct Q3_K_XL WHY? -- 2025-09-09
- Renting GPUs is hilariously cheap -- 2025-09-09
- Ex-Miner Turned Local LLM Enthusiast, now I have a Dilemma -- 2025-09-09
- Tencent-Hunyuan/HunyuanWorld-Voyager -- 2025-09-09
- Smartphone Sensors Unlocked: Turn Your Phone into a Physics Lab -- 2025-09-08
- UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets -- 2025-09-08
- Voice cloning -- 2025-09-08
- TencentARC/ToonComposer -- 2025-09-04
- MeiGen-AI/InfiniteTalk -- 2025-09-04
- RELEASED: ComfyUI Wrapper for Microsoft’s new VibeVoice TTS (voice cloning in seconds) -- 2025-09-03
- High-Logic/Genie -- 2025-09-03
- Has someone used OWebUi with Docling to talk to pdfs with visualizations? -- 2025-09-01
- THU-BPM/Omni-SafetyBench -- 2025-09-01
- AIDC-AI/Ovis2.5-9B -- 2025-09-01
- TTS VibeVoice FastAPI -- 2025-08-30
- Microsoft VibeVoice TTS: Open-Sourced, Supports 90 minutes speech, 4 distinct speakers at a time -- 2025-08-29
- tencent/HunyuanVideo-Foley -- 2025-08-29
- Made Chatterbox TTS a bit faster again on CUDA (155it/s on 3090) -- 2025-08-25
- KittenML/KittenTTS -- 2025-08-25
- Kitten TTS Web Demo -- 2025-08-09
- Show HN: I built a tool to replace capcut audio transcription -- 2025-08-09
- Whispers From The Void, Transcribed With AI -- 2025-08-09
- kyutai/tts-voices -- 2025-08-08
- Explore KittenTTS with Gradio: Easy Text-to-Speech model -- 2025-08-06
- Introducing KokoroDoki a Local, Open-Source and Real-Time TTS. -- 2025-07-19
- Voxtral – Frontier open source speech understanding models -- 2025-07-19
- AI can now translate brain scans to text -- 2025-07-19
- Suggestions to build local voice assistant -- 2025-07-03
- google/gemma-3n-E4B -- 2025-07-03
- openai/whisper-large-v3 -- 2025-06-23
- Audio-Foundation-Models/ConversationTTS -- 2025-06-18
- ResembleAI/chatterbox -- 2025-06-06