AI Security and Safety Concerns

The rapid proliferation of AI coding agents has created a new category of security vulnerability that organizations are only beginning to understand—and the implications are sobering. Thomas Roccia's detailed analysis on SecurityBreak.io frames the problem bluntly: these tools have evolved from simple code assistants into "operators" that design interfaces, connect to email systems, send reminders, and execute real actions through terminal commands. The four capabilities that make modern agents like Claude Code, GitHub Copilot, and Codex useful—file system access, command execution, web browsing, and external service connections via MCP (Model Context Protocol) integrations—are precisely what makes them dangerous (more: https://blog.securitybreak.io/coding-agents-the-insider-threat-you-installed-yourself-35644a1d5409).

Roccia identifies two primary attack vectors. The first is prompt injection, where agents can be "poisoned" through natural language by reading external content or configuration files specifically crafted to exploit them. While AI models include guardrails, these protections remain easy to bypass. The second issue is visibility: during a 30-minute Claude Code session touching dozens of files and executing multiple commands, most teams cannot audit what actually happened. "At this point, most teams realize they do not have a tooling problem," Roccia writes. "They have a visibility problem." His proposed solution, NOVA Protector, uses Claude Code's hook system to intercept and log agent behavior at specific points in the execution lifecycle, providing the control layer that's currently missing from most deployments.
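To make the hook idea concrete, here is a minimal sketch of a PreToolUse-style logging hook in Python: it reads the JSON payload Claude Code passes to hook commands on stdin and appends it to an audit file. This is illustrative only, not NOVA Protector's implementation, and the payload field names should be checked against the current hooks documentation.

```python
#!/usr/bin/env python3
"""Append every intercepted tool call to a local audit log (JSON Lines)."""
import json
import sys
import time
from pathlib import Path

AUDIT_LOG = Path.home() / ".claude" / "audit.jsonl"

def main() -> int:
    event = json.load(sys.stdin)  # hook payload arrives as JSON on stdin
    record = {
        "ts": time.time(),
        "session": event.get("session_id"),
        "tool": event.get("tool_name"),
        "input": event.get("tool_input"),
    }
    AUDIT_LOG.parent.mkdir(parents=True, exist_ok=True)
    with AUDIT_LOG.open("a") as fh:
        fh.write(json.dumps(record) + "\n")
    return 0  # exit 0 lets the tool call proceed; a non-zero exit can block it

if __name__ == "__main__":
    sys.exit(main())
```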

The OpenCode vulnerability disclosed this week illustrates these risks concretely. A security researcher found that OpenCode can execute commands defined in a repository's configuration file—meaning if you clone a repo and run the tool, the repository can run code on your machine without prompts or confirmation. The maintainers responded professionally but classified this as expected behavior, not a vulnerability: the tool doesn't sandbox the AI agent, and MCP servers are the user's responsibility. It's documented, but documentation doesn't help if people don't read it, and most developers the researcher spoke with didn't know that configuration files could spawn processes (more: https://www.linkedin.com/posts/activity-7419736138325696512-R0qY). The recommended mitigation is straightforward but often ignored: check configuration files before running unfamiliar repos, consider containers for untrusted code, and understand that config files now have more power than traditional expectations suggest.
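That pre-flight habit can be scripted. The sketch below greps a handful of likely config files for shell-command patterns before an agent is pointed at a freshly cloned repo; the file names and patterns are illustrative assumptions, a clean result proves nothing, and containers remain the stronger mitigation.

```python
"""Pre-flight scan of a cloned repo's config files for shell-command patterns."""
import re
from pathlib import Path

# Config files that commonly carry tool/agent settings (illustrative list).
CANDIDATES = ["opencode.json", ".mcp.json", ".vscode/tasks.json", "Makefile"]
SUSPICIOUS = re.compile(r"(curl |wget |bash -c|sh -c|powershell|/dev/tcp)", re.I)

def scan(repo: Path) -> None:
    for rel in CANDIDATES:
        path = repo / rel
        if not path.is_file():
            continue
        hits = SUSPICIOUS.findall(path.read_text(errors="replace"))
        if hits:
            print(f"[review] {rel}: contains {sorted(set(h.strip() for h in hits))}")
        else:
            print(f"[ok]     {rel}: nothing obviously executable")

if __name__ == "__main__":
    scan(Path("."))
```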

Rob van der Veer's framework for protecting agentic AI systems outlines seven defensive layers, from model alignment through just-in-time authorization, with a sobering acknowledgment that each layer has been proven insufficient on its own (more: https://www.linkedin.com/posts/robvanderveer_ai-aisecurity-activity-7419684559086161920-XYNF). Meanwhile, the supply chain security problem extends beyond agents to the models themselves. Veritensor's scan of 2,500 Hugging Face models for malware demonstrates the scale of the challenge: the platform uses deep AST analysis and cryptographic verification to detect Remote Code Execution vulnerabilities, reverse shells, and Lambda injections that standard antivirus tools miss, while also flagging license compliance issues that could create legal exposure (more: https://github.com/ArseniiBrazhnyk/Veritensor). New research from Sahoo et al. proposes a Cross-Trace Verification Protocol that leverages a model's own predictions of execution traces across semantically equivalent program transformations to detect backdoors—an approach with theoretical guarantees against adversarial gaming (more: https://arxiv.org/abs/2512.13821). The common thread across all these developments: the security model for AI-assisted development remains fundamentally unsolved, and the community is still learning what questions to ask.
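To illustrate the AST-scanning idea behind tools like Veritensor (a toy, not its actual pipeline), the sketch below walks the Python files shipped alongside a model and flags call names commonly abused by malicious loaders; real scanners also inspect pickle opcodes, serialized payloads, and obfuscation.

```python
"""Walk the Python files in a model repo and flag risky call names."""
import ast
import sys
from pathlib import Path

RISKY_CALLS = {"eval", "exec", "system", "popen", "check_output", "run", "__import__"}

def flag_risky_calls(source: str, filename: str) -> list[str]:
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Call):
            # Handles both bare names (eval) and attributes (os.system).
            name = getattr(node.func, "id", None) or getattr(node.func, "attr", None)
            if name in RISKY_CALLS:
                findings.append(f"{filename}:{node.lineno}: call to {name}()")
    return findings

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for py in root.rglob("*.py"):
        try:
            for finding in flag_risky_calls(py.read_text(errors="replace"), str(py)):
                print(finding)
        except SyntaxError:
            print(f"{py}: unparseable Python, inspect manually")
```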

GLM-4.7-Flash has emerged as a compelling option for developers seeking high-performance local inference, and the tooling to run it properly is finally catching up. Unsloth announced that the 30B Mixture-of-Experts model—which tops benchmarks for its size class on SWE-Bench, GPQA, and several other evaluation suites—can now run locally on devices with 24GB RAM. The model features a 200K context window and excels at coding, agent workflows, chat, and reasoning tasks (more: https://www.linkedin.com/posts/unsloth_you-can-now-run-glm-47-flash-locally-on-activity-7419220348719624192-CV65).

Getting optimal performance required some community troubleshooting. Initial quantized versions produced nonsensical outputs due to an incorrect gating function parameter, but Unsloth's updated GGUFs inject the correct "sigmoid" scoring function directly into the metadata—no llama.cpp updates required, just re-download the quants (more: https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF). Users on RTX 6000 Blackwell hardware report over 2,000 tokens per second for prompt processing and 97 tokens per second for generation, with outputs described as "fantastic for a model this size." LM Studio's latest version includes native support, with users comparing its tool-calling capabilities favorably to Nemotron 30B (more: https://www.reddit.com/r/LocalLLaMA/comments/1qir5eq/here_is_how_to_get_glm_47_working_on_llamacpp/).

The recommended inference parameters require some tuning from defaults: adding --dry-multiplier 1.1 reduces looping issues (this is distinct from repeat penalty), while a temperature around 0.2 with top-k 50 and top-p 0.95 works well for general use cases. Tool-calling scenarios benefit from lower dry-multiplier values or disabling it entirely. For those building on llama.cpp directly, a specific branch enables flash attention on CUDA, and an override flag (--override-kv deepseek2.expert_gating_func=int:2) restores correct model behavior on quants downloaded before the metadata fix propagated.
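Pulling those settings together, the sketch below launches llama.cpp's server with the reported parameters; the binary name and GGUF filename are placeholders for whatever you have built and downloaded.

```python
"""Start llama.cpp's server with the community-recommended GLM-4.7-Flash settings."""
import subprocess

cmd = [
    "llama-server",                      # placeholder: your llama.cpp build
    "-m", "GLM-4.7-Flash-Q4_K_M.gguf",   # placeholder: your downloaded quant
    "--temp", "0.2",
    "--top-k", "50",
    "--top-p", "0.95",
    "--dry-multiplier", "1.1",           # lower or drop this for tool calling
    # Only needed on quants produced before the gating-function metadata fix:
    "--override-kv", "deepseek2.expert_gating_func=int:2",
]
subprocess.run(cmd, check=True)
```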

The gap between AI tool capabilities and their setup documentation continues to frustrate developers. A detailed community guide for connecting Aider to local inference endpoints reveals just how non-obvious the configuration process remains—a situation the author describes diplomatically as "nearly devoid of details or helpful examples" (more: https://www.reddit.com/r/LocalLLaMA/comments/1qis3y9/aiders_documentation_for_getting_connected_to/).

The solution requires three separate configuration files in the home directory, each serving distinct purposes. The main config file (~/.aider.conf.yml) handles API endpoint details, model identifiers, and paths to the other configuration files. A model settings file (~/.aider.model.settings.yml) defines edit format, repository map usage, and agentic coding flags—many with minimal documentation. Finally, a model metadata JSON file stores use-case agnostic parameters like max context and token costs. The critical gotcha: you must prepend openai/ to your model identifier everywhere, which Aider strips before passing requests to your OpenAI-compatible endpoint. So LM Studio only sees mistralai/devstral-small-2512 while the configuration files specify openai/mistralai/devstral-small-2512. The author notes that both Devstral and Gemini 3 Pro failed to help configure the setup correctly—a reflection of how sparse the documentation is rather than model capability.
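The sketch below writes minimal versions of the three files with the openai/ prefix applied consistently, assuming LM Studio's default localhost:1234 endpoint. Key names follow Aider's documented options but should be verified against your installed version, and the context and cost values are placeholders.

```python
"""Write the three Aider config files with the openai/ prefix in one place."""
from pathlib import Path

MODEL = "mistralai/devstral-small-2512"   # what LM Studio actually serves
AIDER_MODEL = f"openai/{MODEL}"           # what every Aider file must reference
home = Path.home()

# 1. Main config: endpoint, model id, pointers to the other two files.
(home / ".aider.conf.yml").write_text(f"""\
openai-api-base: http://localhost:1234/v1
openai-api-key: sk-local
model: {AIDER_MODEL}
model-settings-file: ~/.aider.model.settings.yml
model-metadata-file: ~/.aider.model.metadata.json
""")

# 2. Model settings: edit format, repo map, and similar behavior flags.
(home / ".aider.model.settings.yml").write_text(f"""\
- name: {AIDER_MODEL}
  edit_format: diff
  use_repo_map: true
""")

# 3. Model metadata: context window and (zeroed) per-token costs.
(home / ".aider.model.metadata.json").write_text(f"""\
{{
  "{AIDER_MODEL}": {{
    "max_input_tokens": 131072,
    "max_output_tokens": 8192,
    "input_cost_per_token": 0.0,
    "output_cost_per_token": 0.0
  }}
}}
""")
```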

For developers preferring a more unified approach, Polymcp now integrates Ollama for local and cloud execution with minimal boilerplate—a few lines of code switch between providers like gpt-oss:120b, Kimi K2, and Nemotron without additional setup (more: https://www.reddit.com/r/ollama/comments/1qepzot/polymcp_integrates_ollama_local_and_cloud/). Meanwhile, SageCompass demonstrates what production-ready LangGraph architecture looks like: a hexagonal pattern with 110 tests, organizing an AI decision system that checks whether business ideas actually need AI before resources are committed. The multi-stage pipeline—problem framing, measurable goals, data availability assessment, and decision synthesis—mirrors the workflow of human consultants while applying consistent logic (more: https://github.com/cleverhoods/sagecompass).
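For a sense of what such a multi-stage LangGraph pipeline looks like in code—a stripped-down illustration, not SageCompass's actual implementation—the sketch below wires the four described stages into a StateGraph with stubbed node logic.

```python
"""Four-stage 'does this idea need AI?' pipeline as a LangGraph StateGraph."""
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

class AuditState(TypedDict, total=False):
    idea: str
    framing: str
    goals: list[str]
    data_ok: bool
    verdict: str

def frame_problem(state: AuditState) -> AuditState:
    return {"framing": f"Restated problem for: {state['idea']}"}

def define_goals(state: AuditState) -> AuditState:
    return {"goals": ["baseline metric", "target uplift", "evaluation window"]}

def check_data(state: AuditState) -> AuditState:
    return {"data_ok": True}  # stub: would assess available datasets

def synthesize(state: AuditState) -> AuditState:
    ok = state.get("data_ok", False)
    return {"verdict": "AI is warranted" if ok else "start with rules or analytics"}

builder = StateGraph(AuditState)
for name, fn in [("frame", frame_problem), ("goals", define_goals),
                 ("data", check_data), ("decide", synthesize)]:
    builder.add_node(name, fn)
builder.add_edge(START, "frame")
builder.add_edge("frame", "goals")
builder.add_edge("goals", "data")
builder.add_edge("data", "decide")
builder.add_edge("decide", END)

graph = builder.compile()
print(graph.invoke({"idea": "churn prediction for a subscription app"})["verdict"])
```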

A fully enclosed 10-GPU mobile build designed for Mixture-of-Experts models demonstrates what's possible when someone commits $17,000 and considerable engineering effort to the local inference problem. The system targets Deepseek and Kimi K2 while supporting lengthy video generation and rapid high-detail image generation—the latter critical for a graphic designer's workflow (more: https://www.reddit.com/r/LocalLLaMA/comments/1qi4uj2/768gb_fully_enclosed_10x_gpu_mobile_ai_build/).

The hardware configuration combines an AMD Threadripper Pro 3995WX with 512GB DDR4 system RAM, eight NVIDIA RTX 3090s, and two RTX 5090s for a combined 768GB of addressable memory: 256GB of VRAM (eight 24GB cards plus two 32GB cards) on top of the 512GB of system RAM. The dual PSU setup—EVGA 1600W and Asrock 1300W—provides 2900W total power capacity. Running Ubuntu, the system achieves its primary design goals: mobility, full enclosure, and sufficient memory for extra-large MoE models.

The enclosure solution proved more challenging than expected. Mining frames zip-tied to wheeled racks were ruled out for aesthetic and structural reasons, and experimental attempts failed. The builder settled on the Thermaltake Core W200, a dual-system enclosure that most multi-GPU discussions overlook. By installing the motherboard upside-down in the secondary compartment, risers can connect to GPUs mounted in the main compartment—creating the ideal orientation for this density of cards. The tradeoff is working space: dense compartments, some zip ties still required to secure GPUs, and one minor caveat regarding full enclosure. Three of the 3090s use AIO hybrid cooling, requiring radiator mounting on a fan rail that prevents the glass panel from closing during operation. Without the two 5090s, the build cost would have been closer to $10,000 while remaining "extremely capable"—the 5090s were specifically included for image/video generation time savings. Two 6000 PROs alone would have consumed the entire project budget, illustrating the economics that drive these consumer-card configurations.

On-device browser agents are moving from research curiosity to practical reality. A new Chrome extension demonstrates local browser automation powered by WebGPU inference using Liquid LFM and Qwen models, running entirely within the browser without external hosting or Ollama (more: https://www.reddit.com/r/LocalLLaMA/comments/1qh10q9/demo_ondevice_browser_agent_qwen_running_locally/).

The architecture splits functionality between two models: a lightweight model interfaces directly with the browser via WebGPU extensions, handling DOM events and element interactions, while a more capable Qwen model processes natural language queries and knows the protocol for communicating with the browser model. This separation means the larger model doesn't need to see entire HTML documents—it sends high-level instructions like "click the submit button" rather than parsing full page structures that would rapidly fill context windows. The RunAnywhere SDK team plans to extend WebGPU integration across their Kotlin, Swift, React Native, and Flutter SDKs, all connecting to a C++ library managing multi-inference engine support. Current limitations include failures on google.com—any action targeting that domain or having it as the active tab causes execution failure.

Claude Code's remote development capabilities received significant upgrades that went largely unnoticed by the community. A developer using the Max200 subscription discovered that version 2.1.x releases introduced remote environment settings, pre-session hooks, and a Debian-on-Docker environment that enables a complete agentic configuration to run from a phone (more: https://www.reddit.com/r/ClaudeAI/comments/1qdp5ri/am_i_the_only_one_in_to_enjoy_the_latest_remote/). The workflow uses pre-session hooks to detect remote state, clone configuration from a private repo, and load all settings, agents, skills, and plugins into the remote container. Remote session performance now approaches laptop equivalents, including integration tests and Kubernetes management.
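A session-start hook for that kind of rehydration could look roughly like the sketch below: detect the remote environment, clone a private config repo, and copy agents, skills, plugins, and settings into place. The environment check and repository URL are placeholders, not the poster's actual setup.

```python
"""Rehydrate a personal Claude Code setup inside a fresh remote container."""
import os
import shutil
import subprocess
from pathlib import Path

CONFIG_REPO = "git@github.com:me/claude-config.git"  # placeholder private repo
CLAUDE_DIR = Path.home() / ".claude"

def running_remotely() -> bool:
    # Placeholder heuristic; substitute whatever marker your remote env exposes.
    return os.environ.get("REMOTE_SESSION") == "1"

def main() -> None:
    if not running_remotely():
        return
    workdir = Path("/tmp/claude-config")
    if not workdir.exists():
        subprocess.run(["git", "clone", "--depth", "1", CONFIG_REPO, str(workdir)],
                       check=True)
    CLAUDE_DIR.mkdir(parents=True, exist_ok=True)
    for item in ("agents", "skills", "plugins", "settings.json"):
        src = workdir / item
        if src.is_dir():
            shutil.copytree(src, CLAUDE_DIR / item, dirs_exist_ok=True)
        elif src.is_file():
            shutil.copy(src, CLAUDE_DIR / item)

if __name__ == "__main__":
    main()
```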

The Agentic QE Fleet's 14-day rewrite using Domain-Driven Design principles offers concrete metrics on what architectural discipline provides. Moving from 5,334 files across 60+ directories to 546 files organized into 12 clean bounded contexts didn't just reduce complexity—it improved agent output quality measurably. The team's "brutal honesty review skill" previously required three or more iteration rounds to get features properly implemented, integrated, and verified. With the DDD architecture, that dropped to two rounds consistently (more: https://forge-quality.dev/articles/14-days-12-domains-architecture). Integration between components remains the hardest problem in agentic development—components work in isolation, then struggle when they need to communicate—but explicit interfaces and separation of concerns reduce the blast radius of failures.

A candid Reddit thread cataloging simultaneous failures across multiple frontier models—Opus 4.5 "stupid again," GPT 5.2 "suddenly unable to fix stuff," Gemini 3 "tuned down to hell"—ended with an unexpectedly mundane diagnosis: a full disk caused by a runaway log file (more: https://www.reddit.com/r/ChatGPTCoding/comments/1qitqu3/all_major_ai_stupid_again_alternatives/). The author's self-deprecating update—"I've been doing this for 20 years+ and made a real rookie mistake"—offers a useful reminder that infrastructure problems often masquerade as model quality issues.

The thread nonetheless surfaced genuine observations about model performance variability. One commenter proposed a "decay theory": quality ebbs and flows, degrading over time until models are repackaged into newer versions. Major changes to thinking and compaction behavior do create temporary chaos as teams and users adapt. The "I'm leaving Claude forever" posts that spike after such releases reflect real frustration even when the underlying causes are prosaic. GLM 4.7 received praise as an alternative that "feels GPT 5ish" for users experiencing issues with commercial models.

A more fundamental critique comes from a developer who has built systems using attention as a control signal rather than a relevance score, with explicit graph structures modeling relationships and temporal compression enabling what he describes as "infinite context"—accumulated exact state rather than sliding windows of likely next words (more: https://www.linkedin.com/posts/reuvencohen_llms-are-a-dead-end-not-because-they-are-activity-7419916372274470912-_5Lc). The argument: LLMs put intelligence in the wrong place, making them good at sounding right without understanding. When language is needed, a sentence transformer converts mathematical concepts to natural language for human consumption. The commenters push back appropriately: LLMs aren't just next-token predictors; they're trained on essentially the entire corpus of human knowledge captured as weights. But the architectural critique—that prediction differs fundamentally from comprehension, and current systems conflate the two—resonates with developers hitting the limits of what prompt engineering can achieve.

A year spent processing over one million emails for context engineering produced insights that challenge common assumptions about AI data preparation. Thread reconstruction proved significantly harder than anticipated: replies, forwards, participants joining mid-conversation, and decisions revised multiple emails later all complicate the question of "who said what and why it matters" (more: https://www.reddit.com/r/LocalLLaMA/comments/1qg4d4t/what_we_learned_processing_1m_emails_for_context/).

Most existing systems concatenate text chronologically, hoping LLMs will understand context. This approach fails quickly because it loses attribution and significance. Standard RFC threading headers (In-Reply-To, References) initially seemed promising but proved insufficient, leading the team to build client-specific parsing. Gmail uses nested quote blocks, Outlook formats "From: X, Sent: Y" headers differently, and users vary between bottom-posting and top-posting styles. Each variation requires explicit handling.
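A toy boundary detector shows why the parsing ends up client-specific: Gmail-style "On ... wrote:" markers, Outlook-style "From:/Sent:" blocks, and generic separators each need their own pattern, and the list below is nowhere near exhaustive.

```python
"""Cut an email body at the earliest known reply boundary."""
import re

BOUNDARIES = [
    ("gmail",   re.compile(r"^On .+ wrote:$", re.M)),
    ("outlook", re.compile(r"^From: .+\r?\nSent: .+$", re.M)),
    ("generic", re.compile(r"^-{2,}\s*Original Message\s*-{2,}$", re.I | re.M)),
]

def split_latest_reply(body: str) -> tuple[str, str | None]:
    """Return (newest text, client hint) using whichever boundary appears first."""
    earliest, hint = len(body), None
    for name, pattern in BOUNDARIES:
        match = pattern.search(body)
        if match and match.start() < earliest:
            earliest, hint = match.start(), name
    return body[:earliest].strip(), hint

example = ("Sounds good, ship it.\n\n"
           "On Mon, Jan 12, 2026, Josef wrote:\n> see attached contract")
print(split_latest_reply(example))  # ('Sounds good, ship it.', 'gmail')
```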

The discovery that "attachments are half the conversation" forced a complete rethinking of what counts as content. PDFs, contracts, and invoices aren't metadata—they drive decisions. Building OCR and structure parsing capabilities became essential. Community members validated this finding with war stories: managers saving charts as images embedded in Word documents for PowerPoint presentations, contract versions named "contract20260112.v2.correct-by-josef.1.4.topsecret.docx" with macros enabled, logo designs shared inside Excel files. The real world's attachment practices defy any tidy classification system. Multilingual threads presented another unexpected challenge—global teams switch languages mid-conversation more frequently than anticipated, and semantic search that performs well in English "completely breaks down" when cross-language understanding becomes necessary.

A developer who has shipped over two dozen MCP tools reaching millions of users offered a blunt assessment: "MCPs generally kind of suck, and the community is even worse" (more: https://www.linkedin.com/posts/reuvencohen_mcps-generally-kind-of-suck-and-the-community-activity-7420106621437095936-0qkE). The original idea—a common contract for tools to communicate with AI systems consistently—was reasonable. In practice, MCP abstracts the wrong layer. Intelligence gets pushed into prompts while enforcement lives outside the system. Behavior, permissions, and intent are negotiated through text rather than enforced through structure. Security becomes "prompt hygiene," authentication becomes convention, state becomes implied. That's not something you harden; it's something you babysit.

The rapid feature churn compounds these problems. Monthly additions increase surface area without adding guarantees. Tool authors chase compatibility, integrators stack glue code, and complexity grows faster than correctness. Governance receives particular criticism: MCP is controlled by insiders operating through a corporate-backed nonprofit whose incentives align with funders rather than builders. The recommendations: bring in developers who actually ship, freeze the protocol before adding features, make structure and enforcement first-class concerns, treat CLIs and traditional APIs as peers, and hold events outside the United States where they're accessible to global contributors.

Docker's new Sandboxes feature addresses some of these concerns by providing secure local execution for coding agents. When you run docker ai sandbox, Docker creates a container from a template image, mounts your current working directory at the same path, injects Git configuration so commits are attributed correctly, and prompts for authentication that persists across sessions (more: https://docs.docker.com/ai/sandboxes). The approach gives agents autonomy within containerized boundaries—a middle ground between full system access and crippled functionality.

The Memories MCP server takes a different approach to the agent problem: providing long-term memory that allows agents to learn product domains, understand codebase quirks, and adapt to coding preferences without solving the same issues repeatedly (more: https://github.com/cvsouth/memories-mcp). Running completely locally with no external data transmission, it uses a memory consolidation process inspired by human neuroscience—when agents finish tasks, learnings are recorded, processed by a local daemon, and merged into a memory file for future recall. Tencent's Youtu-Tip project pursues similar goals through a different architecture: a proactive on-device AI assistant with 1.96B and 4B models providing agent invocation, contextual intent detection, and GUI automation, all running offline (more: https://github.com/TencentCloudADP/youtu-tip). The convergence is clear: the community increasingly wants capable agents that keep data local and provide visibility into what's happening.
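The record-then-consolidate loop is simple to picture; the sketch below is a bare-bones analogue (file names and merging logic are illustrative, not memories-mcp's actual format): learnings get appended to a local journal as tasks finish, and a periodic job folds unique entries into a single memory file.

```python
"""Append learnings to a journal, then merge unique entries into a memory file."""
import json
import time
from pathlib import Path

JOURNAL = Path("learnings.jsonl")   # raw notes appended as tasks finish
MEMORY = Path("memory.md")          # consolidated file the agent reads back later

def record_learning(topic: str, note: str) -> None:
    with JOURNAL.open("a") as fh:
        fh.write(json.dumps({"ts": time.time(), "topic": topic, "note": note}) + "\n")

def consolidate() -> None:
    """Fold journal entries into the memory file, skipping duplicates."""
    if not JOURNAL.exists():
        return
    seen = set(MEMORY.read_text().splitlines()) if MEMORY.exists() else set()
    with MEMORY.open("a") as out:
        for line in JOURNAL.read_text().splitlines():
            entry = json.loads(line)
            bullet = f"- [{entry['topic']}] {entry['note']}"
            if bullet not in seen:
                out.write(bullet + "\n")
                seen.add(bullet)
    JOURNAL.unlink()  # the journal is cleared once its contents are merged

record_learning("codebase", "integration tests live under tests/e2e, not tests/unit")
consolidate()
```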

Sources (22 articles)

  1. [Editorial] https://www.linkedin.com/posts/reuvencohen_mcps-generally-kind-of-suck-and-the-community-activity-7420106621437095936-0qkE (www.linkedin.com)
  2. [Editorial] https://www.linkedin.com/posts/unsloth_you-can-now-run-glm-47-flash-locally-on-activity-7419220348719624192-CV65 (www.linkedin.com)
  3. [Editorial] https://www.linkedin.com/posts/activity-7419736138325696512-R0qY (www.linkedin.com)
  4. [Editorial] https://www.linkedin.com/posts/reuvencohen_llms-are-a-dead-end-not-because-they-are-activity-7419916372274470912-_5Lc (www.linkedin.com)
  5. [Editorial] https://docs.docker.com/ai/sandboxes (docs.docker.com)
  6. [Editorial] https://forge-quality.dev/articles/14-days-12-domains-architecture (forge-quality.dev)
  7. [Editorial] https://www.linkedin.com/posts/robvanderveer_ai-aisecurity-activity-7419684559086161920-XYNF (www.linkedin.com)
  8. [Editorial] https://blog.securitybreak.io/coding-agents-the-insider-threat-you-installed-yourself-35644a1d5409 (blog.securitybreak.io)
  9. Here is how to get GLM 4.7 working on llama.cpp with flash attention and correct outputs (www.reddit.com)
  10. Demo: On-device browser agent (Qwen) running locally in Chrome (www.reddit.com)
  11. 768Gb Fully Enclosed 10x GPU Mobile AI Build (www.reddit.com)
  12. Aider's documentation for getting connected to local inference sucks. Hopefully this helps. (www.reddit.com)
  13. What we learned processing 1M+ emails for context engineering (www.reddit.com)
  14. Polymcp Integrates Ollama – Local and Cloud Execution Made Simple (www.reddit.com)
  15. All major AI stupid again, alternatives? (www.reddit.com)
  16. Am I the only one in to enjoy the latest remote code sessions on Claude.ai with my full agentic config? Anyone else had some breakthrough with it? (www.reddit.com)
  17. cvsouth/memories-mcp (github.com)
  18. TencentCloudADP/youtu-tip (github.com)
  19. I scanned 2,500 Hugging Face models for malware/issues. Here is the data (github.com)
  20. Show HN: LangGraph architecture that scales (hexagonal pattern, 110 tests) (github.com)
  21. Provably unmasking malicious behavior through execution traces (arxiv.org)
  22. unsloth/GLM-4.7-Flash-GGUF (huggingface.co)
