Encrypted chats still leak topics
Microsoft researchers detailed “Whisper Leak,” a side-channel that lets a passive adversary infer topics of encrypted, streaming LLM chats by analyzing packet sizes and inter-arrival timing. Trained classifiers (including CNNs, LSTMs, and Transformers) identified specific prompt topics with >98% scores for many models from Alibaba, DeepSeek, Mistral, Microsoft, OpenAI, and xAI; Google and Amazon services showed more resistance, likely due to token batching, but aren’t immune. Mitigations deployed by several providers include appending variable-length random text to mask token-length signals; users concerned about privacy should avoid sensitive prompts on untrusted networks, use VPNs, and consider non‑streaming modes where feasible (more: https://thehackernews.com/2025/11/microsoft-uncovers-whisper-leak-attack.html).
The broader lesson: security isn’t just about encryption, it’s also about traffic patterns. Whisper Leak joins prior timing and cache-based attacks, and its effectiveness grows with more samples over time. For enterprises, this is a governance issue—streaming is UX gold, but it increases metadata leakage risk. Providers are already rolling out mitigations, but teams should assume topic inference remains plausible for high-sensitivity use cases and design workflows accordingly.
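The padding mitigation is simple to picture. A minimal sketch, not any provider's actual scheme: prefix each streamed chunk with its true length, then append random filler so on-the-wire sizes stop tracking token lengths.

```python
import secrets

def obfuscate(chunk: bytes, max_pad: int = 32) -> bytes:
    """Append random-length filler so ciphertext size no longer tracks
    token length; a 2-byte length prefix lets the receiver strip it."""
    pad = secrets.token_bytes(secrets.randbelow(max_pad + 1))
    return len(chunk).to_bytes(2, "big") + chunk + pad

def deobfuscate(wire: bytes) -> bytes:
    """Recover the original chunk by reading the length prefix."""
    n = int.from_bytes(wire[:2], "big")
    return wire[2:2 + n]
```

The cost is bandwidth overhead; the benefit is that per-token size patterns, the main Whisper Leak signal, are blurred. Timing still leaks unless chunks are also batched or jittered.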
Separately, Cisco’s AI Defense team reported that open‑weight models are highly susceptible to adversarial manipulation in multi‑turn conversations, with attack success rates between 25.86% and 92.78%—2x to 10x higher than single‑turn baselines. Capability‑focused models (e.g., Llama 3.3, Qwen 3) were more vulnerable than safety‑oriented designs (e.g., Gemma 3). The takeaway echoed by practitioners: layer controls outside the model, assume eventual prompt injection success, and block dangerous actions downstream (more: https://www.linkedin.com/posts/helloamychang_death-by-a-thousand-prompts-open-model-vulnerability-activity-7392678891724861441-foCf/).
To safely wire tools into agent workflows, a solo developer released a “bridge + prompt injector” for AnythingLLM that emulates a Model Context Protocol (MCP) endpoint, routes calls to real MCP services across Docker networks, and adds a security layer—no docker.sock, no DinD, with input sanitization, allow‑lists, and audit logging. The injector decides when to call tools (e.g., time, weather, docs) via system rules and JSON directives, preserving isolation and control while enabling tool access in containerized setups (more: https://www.reddit.com/r/LocalLLaMA/comments/1otvabi/anythingllm_mcp_bridge_prompt_injector/).
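The gate pattern generalizes beyond AnythingLLM. A hedged sketch of the allow-list + sanitization + audit idea; tool names, size limits, and the directive shape here are hypothetical, not the project's actual API:

```python
ALLOWED_TOOLS = {"get_time", "get_weather", "search_docs"}  # hypothetical names
AUDIT_LOG: list[dict] = []

def route_tool_call(directive: dict) -> dict:
    """Validate a JSON tool directive before forwarding it to an MCP service."""
    tool = directive.get("tool", "")
    if tool not in ALLOWED_TOOLS:
        AUDIT_LOG.append({"tool": tool, "allowed": False})
        raise PermissionError(f"tool not on allow-list: {tool!r}")
    # sanitize: coerce args to bounded strings to limit the injection surface
    args = {str(k)[:64]: str(v)[:256] for k, v in directive.get("args", {}).items()}
    AUDIT_LOG.append({"tool": tool, "allowed": True, "args": args})
    return {"tool": tool, "args": args}
```

Failing closed on unknown tools, and logging the refusal, is what makes the bridge auditable rather than merely permissive.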
Agent coordination is also maturing around MCP semantics. The mcp_agent_mail project positions itself as “Gmail for coding agents,” adding message-based coordination plus consistent MCP server/client tooling. A recent refactor replaced “claims/claim_paths” with “file reservations/reserve_file_paths” across docs, tests, examples, and configs—reducing onboarding errors and making RBAC, OAuth, and API usage consistent after earlier terminology changes (more: https://github.com/Dicklesworthstone/mcp_agent_mail).
For execution safety, Katakate’s k7 offers self‑hosted, lightweight VM sandboxes for untrusted code using Kata Containers + Firecracker on K3s. It hardens workloads with VM-level isolation, seccomp, non‑root defaults, egress lockdowns, and hashed API keys, and targets use cases like custom serverless, hardened CI/CD, and AI agents that run arbitrary code at scale. It’s in early alpha and under security review, but shows the right “blast radius minimization” instincts for agentic systems (more: https://github.com/Katakate/k7).
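k7's hashed-API-key choice follows a standard server-side pattern: store only a digest, compare in constant time. A generic sketch of that pattern, not k7's code:

```python
import hashlib
import hmac

def hash_api_key(key: str) -> str:
    """Store this digest server-side instead of the raw key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Constant-time comparison avoids timing side channels on the check."""
    return hmac.compare_digest(hash_api_key(presented), stored_digest)
```

If a database leaks, attackers get digests rather than usable credentials, which fits the same blast-radius-minimization instinct as the VM isolation.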
A critical RCE in Imunify360’s Ai‑Bolit scanner (up to 56M sites impacted) was patched in v32.7.4.0; the flaw stems from deobfuscation routines that extracted attacker-controlled function names and executed them without safety checks via a helper, enabling calls like system/exec/shell_exec. Given limited vendor disclosure and prior critical issues, admins should update urgently and hunt for compromise (more: https://patchstack.com/articles/remote-code-execution-vulnerability-found-in-imunify360/).
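The bug class is dynamic dispatch on attacker-controlled names. In Python terms (Ai‑Bolit is PHP, so this is an analogy, not the patched code), the fix is to resolve names only through an explicit table and fail closed:

```python
import base64
import codecs

# explicit table: only these routines can ever be invoked by name
SAFE_DEOBFUSCATORS = {
    "base64_decode": lambda s: base64.b64decode(s).decode("utf-8"),
    "rot13": lambda s: codecs.decode(s, "rot13"),
}

def run_deobfuscator(name: str, payload: str) -> str:
    """Vulnerable shape: executing whatever function name the obfuscated
    sample hands us (e.g., system/exec). Safe shape: look the name up in
    an allow-list and refuse anything unknown."""
    fn = SAFE_DEOBFUSCATORS.get(name)
    if fn is None:
        raise ValueError(f"refusing unknown routine: {name!r}")
    return fn(payload)
```

The scanner's job is to analyze hostile input; any path where that input chooses which function runs is a direct code-execution primitive.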
Neural Trader is a high‑performance, AI‑driven trading platform built on Rust and TypeScript, designed to be controlled directly through natural‑language prompts in tools like Claude Code and Cursor. It combines fast backtesting, live trading, neural models, portfolio optimization, and risk management in a modular architecture of eighteen packages. The system supports self‑learning, multi‑agent swarms, and advanced features like federated learning and sublinear‑time solvers. Users can install the full platform or assemble only the components they need (more: https://www.npmjs.com/package/neural-trader).
Kubernetes SIG Network and the Security Response Committee announced the retirement of Ingress NGINX by March 2026. Ongoing maintenance is best‑effort until then; no security fixes afterward. Technical debt and insufficient maintainership, including unsafe “snippet” directives, drove the decision. Users should migrate to Gateway API or alternative Ingress controllers; existing artifacts remain available, but relying on a retired controller for ingress security is a gamble (more: https://www.kubernetes.dev/blog/2025/11/12/ingress-nginx-retirement/).
Security-critical apps are tightening code hygiene, too. KeePassXC outlined its quality-control process and updated contribution policies to address AI‑generated code, while shipping features like Proton Pass import, passkey refinements, and UX fixes. The message is simple: for password managers, rigorous review and signed, tested changes aren’t optional (more: https://keepassxc.org/blog/2025-11-09-about-keepassxcs-code-quality-control/).
A hobbyist setup combined LM Studio, Caddy, and Cloudflare Tunnels to self‑host a GPT‑style chat with Zero Trust auth. Commenters suggest that Cloudflared can front the UI directly, while OpenWebUI offers multi‑user auth and llama.cpp via llama‑swap enables multi‑model backends (STT/TTS/embeddings). The practical tip: secure the public endpoint, mind CORS, and keep it simple if Cloudflare already solves port mapping (more: https://www.reddit.com/r/LocalLLaMA/comments/1ov4isj/i_built_my_own_selfhosted_gpt_with_lm_studio/).
Troubleshooting Ollama: a user’s downloaded model wasn’t listed, but re‑pull reported it existed—likely a multi‑instance or script issue. Suggestions included pulling directly without the script and checking for multiple Ollama daemons, a common pitfall in homelabs (more: https://www.reddit.com/r/ollama/comments/1owcm4k/cant_find_model_in_ollama/).
Meanwhile, a PSA rallied Claude Code Web users to spend their free credits fixing broken Hugging Face Spaces—dependency hell, dead links, and mismatched Gradio versions—by cloning repos, letting Claude repair build/runtime issues, testing, and PR’ing. It’s a clever way to upskill while lifting the community (more: https://www.reddit.com/r/ClaudeAI/comments/1ow6n41/psa_claude_code_web_users_want_something_useful/).
For teams standardizing Retrieval‑Augmented Generation across projects, llama‑pg is an open-source “RAG as a Service” orchestrator that automates parsing (e.g., LlamaParse) and embeddings via TimescaleDB’s pgai, with OpenAI‑compatible models and Helm/Docker deployment options. Centralizing ingestion/vectorization improves governance and reuse while keeping data private (more: https://www.reddit.com/r/LocalLLaMA/comments/1otmmq5/i_built_a_rag_as_a_service_orchestrator_for_local/).
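Whatever the orchestrator, centralized ingestion reduces to the same primitives: parse, chunk, embed, upsert. A deliberately generic chunking sketch (llama‑pg itself delegates parsing to LlamaParse and embeddings to pgai; the sizes here are illustrative):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows with overlap so retrieval
    doesn't lose context at chunk boundaries."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Doing this once, centrally, is what makes governance tractable: every project inherits the same chunking, embedding model, and audit trail.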
On the edge, a tutorial and code demonstrate domain‑specific LoRA fine‑tuning of small language models, conversion to ONNX, and secure execution in browsers and Node.js. This pattern—private fine‑tuning + web‑runtime inference—shrinks latency, avoids sending data out, and fits modest hardware, useful for enterprise document assistants and offline apps (more: https://www.reddit.com/r/LocalLLaMA/comments/1ou1a2x/finetuning_slms_and_running_them_securely_in_your/).
A Rust reimplementation of Karpathy’s Nanochat is available as inference‑only today, aiming to be a small, hackable cognitive core with a centralized model loader and Hugging Face integration. The author notes that Candle’s GPU kernels were “buggy” in their experience and yielded only marginal speedups over PyTorch so far; training (SFT/RL) may come later. It’s a reminder: language choice can reduce dependencies, but GPU kernel maturity determines throughput (more: https://www.reddit.com/r/LocalLLaMA/comments/1otu6ez/hi_reddit_i_rebuilt_karpathys_nanochat_in_pure/).
Hugging Face and Google Cloud announced a deeper partnership: a CDN Gateway that caches models/datasets on Google Cloud for Vertex AI, GKE, Cloud Run, and VM users; native TPU support in HF libraries; and simplified deployment from model pages into Vertex Model Garden or GKE. With 10x growth in Google Cloud usage and petabytes of monthly downloads, the integration targets time‑to‑first‑token and model supply‑chain robustness while lowering costs on HF Inference Endpoints (more: https://huggingface.co/blog/google-cloud).
Salesforce and UIUC researchers proposed xRouter, an RL‑trained router that either answers directly or orchestrates calls to external models, optimizing a success‑contingent, cost‑sensitive reward: “no success, no reward; on success, cheaper is better.” It tracks costs per turn and per episode, logs decisions for audit, and shows that learned routing can outperform brittle escalation heuristics—though eliciting complex multi‑step behaviors in small open models remains hard. Tooling and evaluation are released to spur further work (more: https://arxiv.org/abs/2510.08439v1).
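The reward shape is easy to state. A minimal sketch of a success-contingent, cost-sensitive reward matching the description above, not the paper's exact formula:

```python
def route_reward(success: bool, episode_cost: float, cost_budget: float = 1.0) -> float:
    """'No success, no reward; on success, cheaper is better.'
    Zero on failure; on success, linearly discounted by total spend."""
    if not success:
        return 0.0
    return max(0.0, 1.0 - episode_cost / cost_budget)
```

Because failures earn nothing regardless of spend, the router has no incentive to save money by giving up; cost pressure only applies among successful strategies.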
A complementary problem is domain shift: models trained on one dataset often flop on another. An editorial highlights a simple training approach that encourages domain‑invariant features by making the network unable to tell which dataset any example came from. The editor calls it well‑motivated and easy to integrate, with consistent gains on out‑of‑domain data—substance over incremental benchmark shaving (more: https://www.linkedin.com/posts/andriyburkov_when-you-train-a-model-on-one-dataset-it-activity-7392804316769701888-166x/).
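One standard realization of "unable to tell which dataset an example came from" is a domain-adversarial setup (DANN-style): a domain classifier head learns to identify the source dataset, and a gradient-reversal layer flips its gradient before it reaches the feature extractor. Assuming the editorial describes this family, the core trick in isolation, framework-free:

```python
def reverse_gradient(domain_grads: list[float], lam: float = 1.0) -> list[float]:
    """Gradient reversal layer: identity on the forward pass; on the
    backward pass, multiply the domain classifier's gradient by -lambda
    before it reaches the feature extractor."""
    return [-lam * g for g in domain_grads]

def feature_update(task_grads: list[float], domain_grads: list[float],
                   lam: float = 1.0) -> list[float]:
    """Features descend the task loss while ascending the domain loss,
    which makes the domain classifier's job progressively harder."""
    return [t + r for t, r in zip(task_grads, reverse_gradient(domain_grads, lam))]
```

The appeal noted in the editorial holds here: it bolts onto an existing training loop as one extra head and one sign flip.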
IBM’s Granite‑4.0‑H‑350M is a 350M‑parameter instruct model under Apache‑2.0 with multilingual support and strong tool‑calling. The “H” variants blend attention with Mamba2 layers, long context (32K), and shared embeddings. Benchmarks show competitive small‑model performance (e.g., HumanEval pass@1 ≈39, GSM8K 8‑shot ≈31) and solid alignment metrics; intended for on‑device and fine‑tune‑friendly deployments, with simple Transformers examples provided (more: https://huggingface.co/ibm-granite/granite-4.0-h-350m).
Tencent’s HunyuanWorld‑Mirror is a feed‑forward model that ingests any subset of geometric priors—calibrated intrinsics, camera pose, depths—and predicts a suite of 3D outputs in one pass: point clouds, multi‑view depths, camera parameters, surface normals, even 3D Gaussians. The “any‑prior prompting” and universal prediction architecture aim to unify 3D reconstruction tasks across varied inputs (more: https://huggingface.co/tencent/HunyuanWorld-Mirror).
Digging into OpenAI’s Responses API with web_search enabled shows two distinct lists: all discovered sources and the subset actually cited in message annotations. To improve citation odds, practical content tweaks help: use tables over prose for extractability, semantic HTML (proper headings/lists), freshness signals (“Last updated: YYYY‑MM‑DD”), schema.org markup (FAQ/HowTo/Article), TL;DR up front, and allow OAI‑SearchBot in robots.txt. A tool called Datagum tests accessibility, source discovery, and citation gaps per query set—though reactions were mixed to negative in comments (more: https://www.reddit.com/r/ChatGPTCoding/comments/1os9la9/i_took_a_deep_dive_into_chatgpts_web_search_api/).
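A hedged sketch of the two-list split; the field names here mirror the post's description of the Responses payload and may not match the live API exactly:

```python
def split_sources(response: dict) -> tuple[set, set]:
    """Separate every URL the search step discovered from the subset the
    model actually cited in message annotations."""
    discovered, cited = set(), set()
    for item in response.get("output", []):
        if item.get("type") == "web_search_call":
            for result in item.get("results") or []:
                discovered.add(result.get("url"))
        elif item.get("type") == "message":
            for part in item.get("content", []):
                for ann in part.get("annotations", []):
                    if ann.get("type") == "url_citation":
                        cited.add(ann.get("url"))
    discovered.discard(None)
    cited.discard(None)
    return discovered, cited
```

The gap between the two sets, pages found but never cited, is exactly where the content tweaks above (tables, semantic HTML, freshness signals) are supposed to help.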
Tiny386 brings a C99 80386 emulator with peripherals (from TinyEMU/QEMU) and SeaBIOS to the ESP32‑S3, booting Windows 95 on a $30 touchscreen dev board. It’s “borderline usable” in the demo, with potential speedups from smarter graphics emulation (e.g., period‑correct 2D accelerators) and storage drivers. It won’t replace a vintage 386DX‑40, but for DOS titles and point‑and‑click classics, it’s an impressive feat—and a lot zippier than Linux on an 8‑bit AVR ever was (more: https://hackaday.com/2025/11/13/tiny386-on-an-espressif-esp32-s3/).
Sources (21 articles)
- [Editorial] https://www.linkedin.com/posts/andriyburkov_when-you-train-a-model-on-one-dataset-it-activity-7392804316769701888-166x/ (www.linkedin.com)
- [Editorial] https://www.npmjs.com/package/neural-trader (www.npmjs.com)
- [Editorial] https://thehackernews.com/2025/11/microsoft-uncovers-whisper-leak-attack.html (thehackernews.com)
- AnythingLLM MCP Bridge & Prompt Injector (www.reddit.com)
- I built a RAG as a Service orchestrator for local models (www.reddit.com)
- Fine-Tuning SLMs and Running Them Securely in Your Web Browser (www.reddit.com)
- I built my own self-hosted GPT with LM Studio, Caddy, and Cloudflare Tunnel (www.reddit.com)
- Hi reddit, I rebuilt Karpathy's Nanochat in pure Rust [nanochat-rs] (www.reddit.com)
- Can't find Model in Ollama (www.reddit.com)
- I took a deep dive into ChatGPT's web_search API to learn how to get my content cited. Here's what I found. (www.reddit.com)
- [PSA] Claude Code Web users: Want something useful to do with your $1k free credits? Help fix all the borked HuggingFace Spaces. (www.reddit.com)
- Katakate/k7 (github.com)
- Dicklesworthstone/mcp_agent_mail (github.com)
- Critical RCE patched in Imunify360 affects up to 50M+ websites (patchstack.com)
- Kubernetes Ingress Nginx is retiring (www.kubernetes.dev)
- About KeePassXC's Code Quality Control (keepassxc.org)
- ibm-granite/granite-4.0-h-350m (huggingface.co)
- tencent/HunyuanWorld-Mirror (huggingface.co)
- Tiny386 on an Espressif ESP32-S3 (hackaday.com)
- xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning (arxiv.org)
- Building for an Open Future - our new partnership with Google Cloud (huggingface.co)