Voice & Audio

Text-to-speech, speech recognition, voice cloning

76 articles across 35 editions

Articles

  1. Speech to text via LLM -- 2026-01-16
  2. kyutai-labs/pocket-tts -- 2026-01-16
  3. zai-org/GLM-ASR-Nano-2512 -- 2025-12-12
  4. zai-org/GLM-TTS -- 2025-12-11
  5. openbmb/VoxCPM1.5 -- 2025-12-11
  6. MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark -- 2025-12-04
  7. nvidia/parakeet_realtime_eou_120m-v1 -- 2025-12-03
  8. Qwen/Qwen3-VL-4B-Instruct -- 2025-11-20
  9. Soul-AILab/SoulX-Podcast-1.7B -- 2025-11-20
  10. Last week in Multimodal AI - Local Edition -- 2025-11-12
  11. pnnbao97/VieNeu-TTS -- 2025-11-12
  12. FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation -- 2025-11-12
  13. GLaDOS TTS finetuning on MLX from the original game files -- 2025-11-04
  14. zeusftk/FTK_CANVAS_AGENT_for_Comfyui -- 2025-11-04
  15. guyyariv/DyPE -- 2025-11-04
  16. Esonhugh/go-rex-java -- 2025-10-27
  17. SuperSonic – SuperCollider's audio engine in a Web AudioWorklet -- 2025-10-27
  18. 3-way FTP: Pushing files around with silly and unusual methods -- 2025-10-27
  19. HRV Gets Home Automation Upgrades -- 2025-10-27
  20. Open source streaming STT (Parakeet + Silero + Pipecat Smart Turn) -- 2025-10-19
  21. Turn ChatGPT into a real-time meeting assistant (via MCP + Apps SDK) -- 2025-10-19
  22. BASICODE: A Bit Like Java, But From The 1980s -- 2025-10-18
  23. Audio transcription with llama.cpp multimodal -- 2025-10-18
  24. I built a fully automated AI podcast generator that connects to ollama -- 2025-10-18
  25. Chinny (iOS/MacOS): offline, on-device voice cloning with an optimized Chatterbox model -- 2025-10-12
  26. herimor/voxtream -- 2025-10-12
  27. microsoft/VibeVoice-Large -- 2025-10-12
  28. chetwinlow1/Ovi -- 2025-10-12
  29. Phr00t/Qwen-Image-Edit-Rapid-AIO -- 2025-10-12
  30. kyomber/CVE-2025-8088 -- 2025-10-08
  31. This Week in Security: CVSS 0, Chwoot, and Not in the Threat Model -- 2025-10-08
  32. I created the cheapest possible AI voice agent (over 30x less expensive than Elevenlabs and OpenAI Realtime). Check out the Github repo below if you want to try it for yourself! -- 2025-10-07
  33. MaximeRivest/maivi -- 2025-10-07
  34. nineninesix/kani-tts-370m -- 2025-10-07
  35. We just open-sourced Kroko ASR: a fast, streaming alternative to Whisper. It’s early days, we’d love testers, feedback, and contributors. -- 2025-10-04
  36. Chaos96/NTPP -- 2025-09-27
  37. We made a new AI interface that is compatible with Ollama -- 2025-09-24
  38. if-ai/ComfyUI_HunyuanVideoFoley -- 2025-09-24
  39. Show HN: Inferencer – Run and deeply control local AI models (macOS release) -- 2025-09-24
  40. tencent/HunyuanWorld-Voyager -- 2025-09-24
  41. FireRedTeam/FireRedTTS2 -- 2025-09-24
  42. OpenBMB/VoxCPM -- 2025-09-22
  43. voicepowered-ai/VibeVoice-finetuning -- 2025-09-22
  44. Why is the name of a wireless mouse hard-coded into Windows Bluetooth drivers? -- 2025-09-17
  45. Qwen3-Coder-480B Q2_K_XL same speed as Qwen3-235b-instruct Q3_K_XL WHY? -- 2025-09-09
  46. Renting GPUs is hilariously cheap -- 2025-09-09
  47. Ex-Miner Turned Local LLM Enthusiast, now I have a Dilemma -- 2025-09-09
  48. Tencent-Hunyuan/HunyuanWorld-Voyager -- 2025-09-09
  49. Smartphone Sensors Unlocked: Turn Your Phone into a Physics Lab -- 2025-09-08
  50. UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets -- 2025-09-08
  51. Voice cloning -- 2025-09-08
  52. TencentARC/ToonComposer -- 2025-09-04
  53. MeiGen-AI/InfiniteTalk -- 2025-09-04
  54. RELEASED: ComfyUI Wrapper for Microsoft’s new VibeVoice TTS (voice cloning in seconds) -- 2025-09-03
  55. High-Logic/Genie -- 2025-09-03
  56. Has someone used OWebUi with Docling to talk to pdfs with visualizations? -- 2025-09-01
  57. THU-BPM/Omni-SafetyBench -- 2025-09-01
  58. AIDC-AI/Ovis2.5-9B -- 2025-09-01
  59. TTS VibeVoice FastAPI -- 2025-08-30
  60. Microsoft VibeVoice TTS: Open-Sourced, Supports 90 minutes speech, 4 distinct speakers at a time -- 2025-08-29
  61. tencent/HunyuanVideo-Foley -- 2025-08-29
  62. Made Chatterbox TTS a bit faster again on CUDA (155it/s on 3090) -- 2025-08-25
  63. KittenML/KittenTTS -- 2025-08-25
  64. Kitten TTS Web Demo -- 2025-08-09
  65. Show HN: I built a tool to replace capcut audio transcription -- 2025-08-09
  66. Whispers From The Void, Transcribed With AI -- 2025-08-09
  67. kyutai/tts-voices -- 2025-08-08
  68. Explore KittenTTS with Gradio: Easy Text-to-Speech model -- 2025-08-06
  69. Introducing KokoroDoki, a Local, Open-Source and Real-Time TTS. -- 2025-07-19
  70. Voxtral – Frontier open source speech understanding models -- 2025-07-19
  71. AI can now translate brain scans to text -- 2025-07-19
  72. Suggestions to build local voice assistant -- 2025-07-03
  73. google/gemma-3n-E4B -- 2025-07-03
  74. openai/whisper-large-v3 -- 2025-06-23
  75. Audio-Foundation-Models/ConversationTTS -- 2025-06-18
  76. ResembleAI/chatterbox -- 2025-06-06