Voice & Audio

Text-to-speech, speech recognition, voice cloning

80 articles across 37 editions

Articles

  1. Show HN: Sub-500ms latency voice agent from scratch -- 2026-03-05
  2. PKU-YuanGroup/Helios: Real Real-Time Long Video Generation Model -- 2026-03-04
  3. StyleStream: Real-Time Zero-Shot Voice Style Conversion -- 2026-03-04
  4. KokoClone: Kokoro TTS, but it clones voices now -- 2026-03-04
  5. Speech to text via LLM -- 2026-01-16
  6. kyutai-labs/pocket-tts -- 2026-01-16
  7. zai-org/GLM-ASR-Nano-2512 -- 2025-12-12
  8. zai-org/GLM-TTS -- 2025-12-11
  9. openbmb/VoxCPM1.5 -- 2025-12-11
  10. MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark -- 2025-12-04
  11. nvidia/parakeet_realtime_eou_120m-v1 -- 2025-12-03
  12. Qwen/Qwen3-VL-4B-Instruct -- 2025-11-20
  13. Soul-AILab/SoulX-Podcast-1.7B -- 2025-11-20
  14. Last week in Multimodal AI - Local Edition -- 2025-11-12
  15. pnnbao97/VieNeu-TTS -- 2025-11-12
  16. FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation -- 2025-11-12
  17. GLaDOS TTS finetuning on MLX from the original game files -- 2025-11-04
  18. zeusftk/FTK_CANVAS_AGENT_for_Comfyui -- 2025-11-04
  19. guyyariv/DyPE -- 2025-11-04
  20. Esonhugh/go-rex-java -- 2025-10-27
  21. SuperSonic – SuperCollider's audio engine in a Web AudioWorklet -- 2025-10-27
  22. 3-way FTP: Pushing files around with silly and unusual methods -- 2025-10-27
  23. HRV Gets Home Automation Upgrades -- 2025-10-27
  24. Open source streaming STT (Parakeet + Silero + Pipecat Smart Turn) -- 2025-10-19
  25. Turn ChatGPT into a real-time meeting assistant (via MCP + Apps SDK) -- 2025-10-19
  26. BASICODE: A Bit Like Java, But From The 1980s -- 2025-10-18
  27. Audio transcription with llama.cpp multimodal -- 2025-10-18
  28. I built a fully automated AI podcast generator that connects to ollama -- 2025-10-18
  29. Chinny (iOS/MacOS): offline, on-device voice cloning with an optimized Chatterbox model -- 2025-10-12
  30. herimor/voxtream -- 2025-10-12
  31. microsoft/VibeVoice-Large -- 2025-10-12
  32. chetwinlow1/Ovi -- 2025-10-12
  33. Phr00t/Qwen-Image-Edit-Rapid-AIO -- 2025-10-12
  34. kyomber/CVE-2025-8088 -- 2025-10-08
  35. This Week in Security: CVSS 0, Chwoot, and Not in the Threat Model -- 2025-10-08
  36. I created the cheapest possible AI voice agent (over 30x less expensive than Elevenlabs and OpenAI Realtime). Check out the Github repo below if you want to try it for yourself! -- 2025-10-07
  37. MaximeRivest/maivi -- 2025-10-07
  38. nineninesix/kani-tts-370m -- 2025-10-07
  39. We just open-sourced Kroko ASR: a fast, streaming alternative to Whisper. It’s early days, we’d love testers, feedback, and contributors. -- 2025-10-04
  40. Chaos96/NTPP -- 2025-09-27
  41. We made a new AI interface that is compatible with Ollama -- 2025-09-24
  42. if-ai/ComfyUI_HunyuanVideoFoley -- 2025-09-24
  43. Show HN: Inferencer – Run and deeply control local AI models (macOS release) -- 2025-09-24
  44. tencent/HunyuanWorld-Voyager -- 2025-09-24
  45. FireRedTeam/FireRedTTS2 -- 2025-09-24
  46. OpenBMB/VoxCPM -- 2025-09-22
  47. voicepowered-ai/VibeVoice-finetuning -- 2025-09-22
  48. Why is the name of a wireless mouse hard-coded into Windows Bluetooth drivers? -- 2025-09-17
  49. Qwen3-Coder-480B Q2_K_XL same speed as Qwen3-235b-instruct Q3_K_XL WHY? -- 2025-09-09
  50. Renting GPUs is hilariously cheap -- 2025-09-09
  51. Ex-Miner Turned Local LLM Enthusiast, now I have a Dilemma -- 2025-09-09
  52. Tencent-Hunyuan/HunyuanWorld-Voyager -- 2025-09-09
  53. Smartphone Sensors Unlocked: Turn Your Phone into a Physics Lab -- 2025-09-08
  54. UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets -- 2025-09-08
  55. Voice cloning -- 2025-09-08
  56. TencentARC/ToonComposer -- 2025-09-04
  57. MeiGen-AI/InfiniteTalk -- 2025-09-04
  58. RELEASED: ComfyUI Wrapper for Microsoft’s new VibeVoice TTS (voice cloning in seconds) -- 2025-09-03
  59. High-Logic/Genie -- 2025-09-03
  60. Has someone used OWebUi with Docling to talk to pdfs with visualizations? -- 2025-09-01
  61. THU-BPM/Omni-SafetyBench -- 2025-09-01
  62. AIDC-AI/Ovis2.5-9B -- 2025-09-01
  63. TTS VibeVoice FastAPI -- 2025-08-30
  64. Microsoft VibeVoice TTS: Open-Sourced, Supports 90 minutes speech, 4 distinct speakers at a time -- 2025-08-29
  65. tencent/HunyuanVideo-Foley -- 2025-08-29
  66. Made Chatterbox TTS a bit faster again on CUDA (155it/s on 3090) -- 2025-08-25
  67. KittenML/KittenTTS -- 2025-08-25
  68. Kitten TTS Web Demo -- 2025-08-09
  69. Show HN: I built a tool to replace capcut audio transcription -- 2025-08-09
  70. Whispers From The Void, Transcribed With AI -- 2025-08-09
  71. kyutai/tts-voices -- 2025-08-08
  72. Explore KittenTTS with Gradio: Easy Text-to-Speech model -- 2025-08-06
  73. Introducing KokoroDoki, a Local, Open-Source and Real-Time TTS. -- 2025-07-19
  74. Voxtral – Frontier open source speech understanding models -- 2025-07-19
  75. AI can now translate brain scans to text -- 2025-07-19
  76. Suggestions to build local voice assistant -- 2025-07-03
  77. google/gemma-3n-E4B -- 2025-07-03
  78. openai/whisper-large-v3 -- 2025-06-23
  79. Audio-Foundation-Models/ConversationTTS -- 2025-06-18
  80. ResembleAI/chatterbox -- 2025-06-06