Voice & Audio

Text-to-speech, speech recognition, voice cloning

81 articles across 38 editions

Articles

  1. k2-fsa/OmniVoice — High-Quality Voice Cloning TTS for 600+ Languages -- 2026-04-16
  2. Show HN: Sub-500ms latency voice agent from scratch -- 2026-03-05
  3. PKU-YuanGroup/Helios: Real Real-Time Long Video Generation Model -- 2026-03-04
  4. StyleStream: Real-Time Zero-Shot Voice Style Conversion -- 2026-03-04
  5. KokoClone: Kokoro TTS, but it clones voices now -- 2026-03-04
  6. Speech to text via LLM -- 2026-01-16
  7. kyutai-labs/pocket-tts -- 2026-01-16
  8. zai-org/GLM-ASR-Nano-2512 -- 2025-12-12
  9. zai-org/GLM-TTS -- 2025-12-11
  10. openbmb/VoxCPM1.5 -- 2025-12-11
  11. MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark -- 2025-12-04
  12. nvidia/parakeet_realtime_eou_120m-v1 -- 2025-12-03
  13. Qwen/Qwen3-VL-4B-Instruct -- 2025-11-20
  14. Soul-AILab/SoulX-Podcast-1.7B -- 2025-11-20
  15. Last week in Multimodal AI - Local Edition -- 2025-11-12
  16. pnnbao97/VieNeu-TTS -- 2025-11-12
  17. FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation -- 2025-11-12
  18. GLaDOS TTS finetuning on MLX from the original game files -- 2025-11-04
  19. zeusftk/FTK_CANVAS_AGENT_for_Comfyui -- 2025-11-04
  20. guyyariv/DyPE -- 2025-11-04
  21. Esonhugh/go-rex-java -- 2025-10-27
  22. SuperSonic – SuperCollider's audio engine in a Web AudioWorklet -- 2025-10-27
  23. 3-way FTP: Pushing files around with silly and unusual methods -- 2025-10-27
  24. HRV Gets Home Automation Upgrades -- 2025-10-27
  25. Open source streaming STT (Parakeet + Silero + Pipecat Smart Turn) -- 2025-10-19
  26. Turn ChatGPT into a real-time meeting assistant (via MCP + Apps SDK) -- 2025-10-19
  27. BASICODE: A Bit Like Java, But From The 1980s -- 2025-10-18
  28. Audio transcription with llama.cpp multimodal -- 2025-10-18
  29. I built a fully automated AI podcast generator that connects to ollama -- 2025-10-18
  30. Chinny (iOS/MacOS): offline, on-device voice cloning with an optimized Chatterbox model -- 2025-10-12
  31. herimor/voxtream -- 2025-10-12
  32. microsoft/VibeVoice-Large -- 2025-10-12
  33. chetwinlow1/Ovi -- 2025-10-12
  34. Phr00t/Qwen-Image-Edit-Rapid-AIO -- 2025-10-12
  35. kyomber/CVE-2025-8088 -- 2025-10-08
  36. This Week in Security: CVSS 0, Chwoot, and Not in the Threat Model -- 2025-10-08
  37. I created the cheapest possible AI voice agent (over 30x less expensive than Elevenlabs and OpenAI Realtime). Check out the Github repo below if you want to try it for yourself! -- 2025-10-07
  38. MaximeRivest/maivi -- 2025-10-07
  39. nineninesix/kani-tts-370m -- 2025-10-07
  40. We just open-sourced Kroko ASR: a fast, streaming alternative to Whisper. It’s early days, we’d love testers, feedback, and contributors. -- 2025-10-04
  41. Chaos96/NTPP -- 2025-09-27
  42. We made a new AI interface that is compatible with Ollama -- 2025-09-24
  43. if-ai/ComfyUI_HunyuanVideoFoley -- 2025-09-24
  44. Show HN: Inferencer – Run and deeply control local AI models (macOS release) -- 2025-09-24
  45. tencent/HunyuanWorld-Voyager -- 2025-09-24
  46. FireRedTeam/FireRedTTS2 -- 2025-09-24
  47. OpenBMB/VoxCPM -- 2025-09-22
  48. voicepowered-ai/VibeVoice-finetuning -- 2025-09-22
  49. Why is the name of a wireless mouse hard-coded into Windows Bluetooth drivers? -- 2025-09-17
  50. Qwen3-Coder-480B Q2_K_XL same speed as Qwen3-235b-instruct Q3_K_XL WHY? -- 2025-09-09
  51. Renting GPUs is hilariously cheap -- 2025-09-09
  52. Ex-Miner Turned Local LLM Enthusiast, now I have a Dilemma -- 2025-09-09
  53. Tencent-Hunyuan/HunyuanWorld-Voyager -- 2025-09-09
  54. Smartphone Sensors Unlocked: Turn Your Phone into a Physics Lab -- 2025-09-08
  55. UniSLU: Unified Spoken Language Understanding from Heterogeneous Cross-Task Datasets -- 2025-09-08
  56. Voice cloning -- 2025-09-08
  57. TencentARC/ToonComposer -- 2025-09-04
  58. MeiGen-AI/InfiniteTalk -- 2025-09-04
  59. RELEASED: ComfyUI Wrapper for Microsoft’s new VibeVoice TTS (voice cloning in seconds) -- 2025-09-03
  60. High-Logic/Genie -- 2025-09-03
  61. Has someone used OWebUi with Docling to talk to pdfs with visualizations? -- 2025-09-01
  62. THU-BPM/Omni-SafetyBench -- 2025-09-01
  63. AIDC-AI/Ovis2.5-9B -- 2025-09-01
  64. TTS VibeVoice FastAPI -- 2025-08-30
  65. Microsoft VibeVoice TTS: Open-Sourced, Supports 90 minutes speech, 4 distinct speakers at a time -- 2025-08-29
  66. tencent/HunyuanVideo-Foley -- 2025-08-29
  67. Made Chatterbox TTS a bit faster again on CUDA (155it/s on 3090) -- 2025-08-25
  68. KittenML/KittenTTS -- 2025-08-25
  69. Kitten TTS Web Demo -- 2025-08-09
  70. Show HN: I built a tool to replace capcut audio transcription -- 2025-08-09
  71. Whispers From The Void, Transcribed With AI -- 2025-08-09
  72. kyutai/tts-voices -- 2025-08-08
  73. Explore KittenTTS with Gradio: Easy Text-to-Speech model -- 2025-08-06
  74. Introducing KokoroDoki, a Local, Open-Source and Real-Time TTS. -- 2025-07-19
  75. Voxtral – Frontier open source speech understanding models -- 2025-07-19
  76. AI can now translate brain scans to text -- 2025-07-19
  77. Suggestions to build local voice assistant -- 2025-07-03
  78. google/gemma-3n-E4B -- 2025-07-03
  79. openai/whisper-large-v3 -- 2025-06-23
  80. Audio-Foundation-Models/ConversationTTS -- 2025-06-18
  81. ResembleAI/chatterbox -- 2025-06-06