Agent Security -- From Prompt Injection Pentesting to Systems Foundations

Today's AI news: Agent Security -- From Prompt Injection Pentesting to Systems Foundations, Governing Agent Actions -- The Authorization Boundary Problem, The Maturing Agent Toolchain, AI Hardware -- Hard-Coded Silicon Meets Salvaged GPUs, The Human Side of AI-Assisted Development, Agent Learning and Memory Systems, Gemini 3.1 Pro. 22 sources curated from across the web.

Agent Security -- From Prompt Injection Pentesting to Systems Foundations

Prompt injection has held the number-one slot on the OWASP LLM Top 10 for long enough that it should feel familiar. It does not. A comprehensive new guide from Bugcrowd walks security engineers through the full attack surface of LLM-integrated systems, and the two real-world case studies it includes are worth the read alone. In the first, a pen tester coerced a chat assistant's unrestricted HTTP tool into fetching the AWS Instance Metadata Service at `169.254.169.254`, harvesting temporary IAM credentials -- access key ID, secret key, session token -- through nothing more than a conversational prompt. In the second, a poisoned PDF uploaded to a RAG-based document assistant silently overrode the system prompt, causing the model to leak nonpublic internal API URLs and staging environment details that the attacker then used to enumerate less-secured infrastructure. The core problem remains unchanged: "An LLM doesn't inherently distinguish" instructions from data; "everything it sees is just text that might contain instructions." The guide's testing methodology -- map what the model can call, what data it reads, what happens to its output, then probe systematically across direct injection, indirect injection, and insecure output handling -- is the closest thing the field has to a standardized playbook. (more: https://www.bugcrowd.com/blog/a-guide-to-the-hidden-threat-of-prompt-injection)
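The IMDS case study shows why the guide's first step -- map what the model can call -- matters: an unrestricted HTTP tool is a server-side request forgery primitive waiting for a prompt. A minimal, hypothetical guard for such a tool (not from the Bugcrowd guide; names and logic are illustrative) would deny link-local, loopback, and private targets before any fetch:

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical pre-fetch check for an agent's HTTP tool. Blocks link-local
# addresses (including the AWS IMDS at 169.254.169.254), loopback, and
# private ranges, so a prompt-injected URL cannot reach internal services.
def is_fetch_allowed(url: str) -> bool:
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostname, not an IP literal: a real guard must resolve it first
        # (and pin the resolved address) to prevent DNS-rebinding bypasses.
        return True
    return not (addr.is_link_local or addr.is_loopback or addr.is_private)

print(is_fetch_allowed("http://169.254.169.254/latest/meta-data/"))  # False
```

A deny-list like this is a mitigation, not a fix: the guide's point stands that the model itself cannot distinguish instructions from data, so enforcement has to live outside the model.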

A much larger effort to formalize this problem arrived from Google, UC San Diego, UW-Madison, Meta FAIR, Gray Swan AI, and Cornell in an updated Systematization of Knowledge paper. "Systems Security Foundations for Agentic Computing" identifies four fundamental challenges when applying classical security engineering to agents. First, the Trusted Computing Base is probabilistic -- "a probabilistic reference monitor might correctly deny unauthorized access 99% of the time, but the remaining 1% can be exploited." Second, security policies are dynamic and task-specific; unlike Android apps with fixed permission sets, agents synthesize programs on the fly from natural-language prompts. Third, the security boundary is fuzzy: agents lack the layered architectures (hardware, kernel, process) that traditional systems use for enforcement. Fourth, prompt injection is functionally analogous to dynamic code loading -- "a very challenging problem that remains difficult to address today." The paper catalogs 11 real-world attacks on deployed systems, including Microsoft 365 Copilot data exfiltration via ASCII smuggling and Claude Code DNS exfiltration via the ping command, mapping each to violated classical security principles. The conclusion is measured but firm: "traditional security principles and techniques still apply and can thwart many attacks on agentic systems, but some attack classes expose genuinely new problems that require new solutions." (more: https://arxiv.org/abs/2512.01295)
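The paper's probabilistic-TCB point is easy to quantify: a monitor that denies correctly 99% of the time fails open against any attacker who can retry. The compounding is just arithmetic:

```python
# Why a 99%-correct reference monitor is not a security boundary: with a
# 1% per-attempt bypass probability, an attacker who retries n times
# succeeds at least once with probability 1 - 0.99**n.
def bypass_probability(per_attempt: float, attempts: int) -> float:
    return 1 - (1 - per_attempt) ** attempts

print(round(bypass_probability(0.01, 100), 3))  # ~0.634
print(round(bypass_probability(0.01, 500), 3))  # ~0.993
```

A hundred free retries turn a 1% hole into better-than-even odds, which is why the paper treats probabilistic enforcement as categorically different from a deterministic reference monitor.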

On the offensive tooling side, Gadi Evron's Exploitation Validator takes a refreshingly honest approach to the hallucination problem in LLM-driven vulnerability scanning. The pipeline runs through five stages -- inventory, one-shot exploitability check, systematic attack-tree analysis, sanity validation against actual code ("file exists, path correct, lines accurate, code verbatim, flow is real"), and a final ruling that filters out findings with hedging language or unrealistic preconditions. The core design principle: "Assume exploitable until proven otherwise." It is a useful corrective to the wave of AI security tools that report findings without proving them. (more: https://github.com/gadievron/exploitation-validator) A related arXiv paper examines what breaks embodied AI security when LLM vulnerabilities intersect with cyber-physical system flaws, though the full paper was not yet available for analysis at publication time. (more: https://arxiv.org/abs/2602.17345v1)

Governing Agent Actions -- The Authorization Boundary Problem

When automation touches live workflows, the question stops being "does the AI work?" and becomes "who authorized this action?" The Unhyped AI newsletter crystallized this transition as the "AI Automation Ceiling" -- the predictable moment when agentic systems move from advisory pilots into production workflows and expose that "the organisation's capacity to govern delegation lagged behind deployment." The pattern is familiar to anyone who has watched enterprise software rollouts: "Exception rates rise. Definitions of 'good' fracture across functions. Outputs that are technically correct create operational rework." The essay's sharpest insight is that most organizations misdiagnose the slowdown as an execution problem when it is actually structural: "Agentic automation rarely fails first as intelligence. It fails as coordination." Shadow systems -- human workarounds built to compensate for the agent's blind spots -- become "the interest payment on exceptions you never costed." The three board-level questions it proposes (true exception rate including manual correction, named decision-rights owners for uncertainty, and explicit stop criteria without political fallout) are the kind of operational framework that separates organizations that scale agents from those that carry them. (more: https://unhypedai.substack.com/p/the-ai-automation-ceiling)

Faramesh offers an architectural answer to this coordination gap. The research paper (arXiv 2601.17744) introduces the Action Authorization Boundary (AAB) -- a mandatory, non-bypassable enforcement layer between agent reasoning and real-world execution. The core argument: "inference produces information, whereas execution produces consequences," and current frameworks collapse this distinction by treating proposal and execution as a single step. Every agent-generated action passes through a deterministic authorization function that maps a Canonical Action Representation, policy set, and system state to one of three outcomes: PERMIT, DEFER, or DENY. The system achieves 2.24ms median decision latency, sustains 7,800 actions per minute on a single M1 machine, and in stress testing produced zero double-executions across one million duplicated requests with 64 concurrent workers. The open-source implementation provides YAML-based policies with first-match-wins evaluation, automatic risk scoring, and a real-time web dashboard -- with drop-in wrappers for LangChain, CrewAI, AutoGen, and MCP. (more: https://www.arxiv.org/pdf/2601.17744) (more: https://github.com/faramesh/faramesh-core) The video walkthrough demonstrates the practical flow: an $85 damaged-shirt refund auto-approves against policy, a $225 missing-order claim pauses for manager review, and a $2,600 full-refund attempt is blocked instantly before touching the customer's account. The sealed audit trail -- policy version, exact parameters, human-vs-policy approval -- is the kind of evidence chain that compliance teams actually need. (more: https://www.youtube.com/watch?v=i9m3u9xz7d0)
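First-match-wins evaluation over a tiered policy set is simple to sketch. The rule schema below is illustrative, not Faramesh's actual YAML format, but it reproduces the three-outcome flow from the video walkthrough:

```python
# First-match-wins authorization sketch in the spirit of Faramesh's AAB:
# each rule matches an action type plus an amount threshold and yields
# PERMIT, DEFER, or DENY. Rule schema here is an assumption for
# illustration, not the project's real policy format.
POLICIES = [
    {"action": "refund", "max_amount": 100,  "decision": "PERMIT"},
    {"action": "refund", "max_amount": 500,  "decision": "DEFER"},   # manager review
    {"action": "refund", "max_amount": None, "decision": "DENY"},    # catch-all
]

def authorize(action: str, amount: float) -> str:
    for rule in POLICIES:
        if rule["action"] != action:
            continue
        if rule["max_amount"] is None or amount <= rule["max_amount"]:
            return rule["decision"]  # first matching rule wins
    return "DENY"  # default-deny when no rule matches at all

print(authorize("refund", 85))    # PERMIT
print(authorize("refund", 225))   # DEFER
print(authorize("refund", 2600))  # DENY
```

The load-bearing property is that `authorize` is deterministic and sits outside the model: the agent proposes, the function disposes, and the same inputs always produce the same ruling for the audit trail.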

The Maturing Agent Toolchain

The infrastructure around AI agents is converging fast. Charlotte, a new open-source MCP server for browser interaction, attacks the token-efficiency problem head-on. Instead of dumping raw accessibility trees into the context window, it decomposes pages into structured representations with landmarks, headings, and stable hash-based element IDs across three detail levels -- minimal for orientation, summary for context, full for deep inspection. The benchmarks are striking: a Wikipedia page consumes 7,667 characters through Charlotte versus 1,040,636 through Playwright MCP; Hacker News takes 336 characters versus 61,230. A 100-page browsing session drops from roughly $15.30 in input tokens on Claude Opus to about $0.09. The project was built and verified by an agent using Charlotte itself -- catching a mobile overflow bug and fixing 16 unlabeled SVG icons without a human looking at the page. (more: https://www.reddit.com/r/Anthropic/comments/1rbtqxi/i_built_an_open_source_browser_mcp_server_that/)
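Stable element IDs are what let an agent refer to "that button" across turns without re-reading the page. One plausible construction (an assumption for illustration, not Charlotte's actual scheme) hashes properties that survive re-renders rather than raw DOM position:

```python
import hashlib

# Sketch of stable, hash-based element IDs: derive the ID from properties
# that persist across page re-renders (role, accessible name, structural
# path), then truncate to a short handle the agent can reuse.
def element_id(role: str, name: str, path: str) -> str:
    digest = hashlib.sha256(f"{role}|{name}|{path}".encode()).hexdigest()
    return digest[:8]

print(element_id("button", "Submit", "main/form[0]"))
```

Because the ID is a pure function of the element's identity, two snapshots of the same page yield the same handles, while any change in role or name yields a new one -- which is exactly the property an agent needs to act on a page it summarized several turns ago.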

On the execution side, Kilntainers gives every agent an ephemeral Linux sandbox via MCP -- disposable containers that isolate tool execution and prevent agents from persisting state or escaping their environment. (more: https://www.reddit.com/r/LocalLLaMA/comments/1rbqwlh/give_every_agent_an_ephemeral_linux_sandbox_via/) Run-Agent takes orchestration in a different direction: a shell script and markdown-based message bus that launches Claude, Codex, or Gemini agents with full isolation and traceability. Each agent gets a defined role (orchestrator, researcher, implementer, reviewer, tester, debugger, monitor), a dedicated run folder capturing the exact prompt and full stdout/stderr, and an append-only message bus for inter-agent communication. The design philosophy is deliberately low-tech -- "no databases, no infrastructure, just a shared markdown file" -- which makes it auditable in ways that complex orchestration frameworks often are not. The agents run with full permissions by design, a tradeoff the documentation is refreshingly honest about. (more: https://run-agent.jonnyzzz.com)
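The append-only markdown bus is the kind of design that fits in a few lines. Run-Agent itself is shell-based; this Python sketch (file name and entry format are illustrative) shows the auditable core: agents only ever append, never edit:

```python
from datetime import datetime, timezone
from pathlib import Path

# Append-only markdown message bus in the spirit of Run-Agent's design.
# Earlier entries are never modified, so the file doubles as a complete,
# human-readable audit log of inter-agent communication.
BUS = Path("bus.md")

def post(agent: str, text: str) -> None:
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with BUS.open("a") as f:  # append mode: existing entries are immutable
        f.write(f"- **{agent}** ({stamp}): {text}\n")

post("researcher", "API schema confirmed; see the run folder's stdout log")
post("implementer", "Picked up the task; creating a branch now")
```

The tradeoff is deliberate: no locking, no delivery guarantees, no queries beyond `grep` -- in exchange, every message is inspectable with a text editor, which is the auditability claim the project makes.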

Manifold rounds out the tooling picture by giving each agent its own git branch, its own terminal, and real-time conflict detection when two agents touch the same file -- "Three agents. One repo. Zero conflicts." The UI is a multi-pane terminal environment with resizable layouts, a built-in diff viewer, and file browser with syntax highlighting. No Electron. No loading spinners. The tool is agent-agnostic (Claude Code, Codex, Gemini CLI) and the isolation model is simple: automatic branch creation on spawn, cleanup on completion. (more: https://manifold.no)

AI Hardware -- Hard-Coded Silicon Meets Salvaged GPUs

At one end of the inference hardware spectrum, Taalas emerged from stealth with over $200 million in funding and a radical proposition: stop trying to make compute engines malleable and just etch the model weights directly into transistors. The "Hard Coded Inference" architecture pairs a mask ROM recall fabric with SRAM for KV caches and adaptations. CEO Ljubisa Bajic -- who previously founded Tenstorrent -- claims the density is "basically insane": "We can store four bits away and do the multiply related to it -- everything -- with a single transistor." The first-generation HC1 chip is fabricated on TSMC 6nm with 53 billion transistors, supports 8 billion parameters per chip, and burns about 200 watts per card. Performance benchmarks on Llama 3.1 8B show substantial throughput advantages over Nvidia's B200, Groq, SambaNova, and Cerebras, with significantly lower latency and cost per token. The obvious limitation -- every model revision requires a new chip spin -- is mitigated by a two-month turnaround from weights to deployed PCI-Express cards, changing only two metal layers. With training runs costing billions and model release cadences lengthening, Taalas argues the economics work: it costs 100x more to train a model than to customize an HC chip. A frontier-class LLM across multiple HC2 cards is planned for year-end. (more: https://www.nextplatform.com/2026/02/19/taalas-etches-ai-models-onto-transistors-to-rocket-boost-inference)

At the other end of the spectrum, an anonymous team has converted 800 retired Ethereum mining RX 580 GPUs into an AI inference cluster optimized for mass document OCR. The fundamental constraint is that these cards have decent 8GB VRAM but low PCIe speeds and are each stranded on Celeron G3950 boards -- tensor parallelism across nodes is impossible. The solution: treat each GPU as a completely independent inference worker running Qwen3-VL-8B via llama.cpp's Vulkan backend. Getting there required building the entire graphics stack from source (libdrm, Wayland, Mesa with the RADV Vulkan driver, Vulkan SDK) and discovering that AVX2/FMA binaries segfault on Celerons. The practical result is 200 GPUs processing 200 pages simultaneously with a four-tier quality escalation path (q4 through bf16 on six GPUs). A 966-page clinical ophthalmology textbook -- dense medical terminology, complex diagrams, multi-column layouts -- costs about $0.50 in electricity versus $12 through the OpenAI API, at comparable quality. A video frame analysis pipeline (800 frames across 800 GPUs for 60 seconds of video) is in development. (more: https://www.reddit.com/r/LocalLLaMA/comments/1r87ou8/update3_repurposing_800_rx_580s_converted_to_ai/)
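With cross-node tensor parallelism off the table, the architecture reduces to an embarrassingly parallel work queue: every GPU is an independent worker pulling one page at a time. A minimal sketch of that dispatch pattern (the worker body is illustrative; the real cluster shells out to llama.cpp's Vulkan backend per GPU):

```python
import queue
import threading

# Independent-worker dispatch: no communication between workers, so slow
# PCIe links and weak host CPUs don't matter -- each GPU just needs to
# finish its own page and pull the next one.
def run_ocr_farm(pages, num_workers, ocr_page):
    jobs = queue.Queue()
    for page in pages:
        jobs.put(page)
    results, lock = {}, threading.Lock()

    def worker():
        while True:
            try:
                page = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            text = ocr_page(page)  # stands in for one GPU's inference call
            with lock:
                results[page] = text

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The quality-escalation tiers slot in naturally: a page whose q4 output fails validation just goes back on a second queue served by the higher-precision workers.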

The Human Side of AI-Assisted Development

Mark Russinovich and Scott Hanselman argue in a new ACM opinion piece that generative AI has created a "seniority-biased technological change" that amplifies experienced engineers while dragging down juniors. The evidence is concrete: Microsoft's Project Societas saw seven part-time engineers deliver a consumer-ready preview in 10 weeks with 98% AI-generated code. But the agents exhibit "intern-like" failures -- masking race conditions with sleeps, claiming success despite significant bugs, duplicating code, dismissing crashes. Labor data shows employment of 22-to-25-year-olds in AI-exposed jobs fell roughly 13% after GPT-4's release, even as senior roles grew. The proposed solution is a "preceptor program" pairing early-in-career developers with mentors at 3:1-5:1 ratios for at least a year, with coding assistants defaulting to Socratic coaching before code generation. The framing draws on MIT research showing "cognitive debt" -- reduced brain activity and lower recall -- in adults who used ChatGPT for writing. Quoting Ethan Mollick: "Every time we hand work to a wizard, we lose a chance to develop our own expertise; to build the very judgment we need to evaluate the wizard's work." (more: https://dl.acm.org/doi/10.1145/3779312)

That judgment crisis plays out in raw, unfiltered detail in "The Conductor Won't Stop Conducting," a personal essay documenting 81 Claude Code sessions, 596 messages, fifteen releases in eleven days, and 38 wrong-approach corrections. The author corrected the same CLAUDE.md rule -- don't run the full test suite -- twenty times across nine days. Each violation caused an out-of-memory crash, killing the session and destroying in-progress work. Three database losses in six days. An HNSW vector dimension mismatch (128 vs 768) that took six releases to eradicate. A code quality sweep revealing that a claimed "700+ patterns replaced" was actually 132 of 763 -- seventeen percent. The essay turns inward when the author discloses that his grandmother died a month prior and his father was hospitalized: "The moment I'm venting at an LLM, I'm no longer engineering. I'm coping. And coping isn't a debugging strategy." The actionable conclusion -- separate diagnosis from implementation, encode verification criteria so thoroughly they work even when you don't -- is the kind of hard-won operational wisdom that no benchmark captures. (more: https://forge-quality.dev/articles/conductor-wont-stop-conducting)

A developer building a local agent to manage Minecraft servers via Docker and RCON distilled complementary lessons from the opposite end of the sophistication spectrum. The single biggest improvement was not a better prompt but running real shell commands to discover the environment before asking the LLM to write code: "When the coder model gets `container_id = "a1b2c3d4"` injected as an actual Python variable, it uses it. When it has to guess, it invents IDs that don't exist." Structural fixes -- deleting bad state from context, rewriting task descriptions from scratch rather than appending errors -- beat "try again" retry logic every time. And hard pass/fail contracts ("each subtask declares what it must produce as STATE:key=value prints in stdout") catch the number-one local model failure mode: "the LLM writes code that prints 'Success!' without actually doing anything." (more: https://www.reddit.com/r/LocalLLaMA/comments/1r8e3ye/i_built_a_proof_of_concept_agent_that_manages/)
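The STATE:key=value contract is worth seeing concretely. A harness-side checker (a sketch of the pattern described in the post, not its actual code) accepts a subtask only if every declared key actually appears in stdout -- a bare "Success!" print proves nothing:

```python
# Hard pass/fail contract: the subtask declares which STATE keys it must
# emit; the harness parses stdout and fails the run if any key is missing,
# catching code that claims success without doing the work.
def check_contract(stdout, required_keys):
    state = {}
    for line in stdout.splitlines():
        if line.startswith("STATE:") and "=" in line:
            key, value = line[len("STATE:"):].split("=", 1)
            state[key.strip()] = value.strip()
    if all(k in state for k in required_keys):
        return state  # contract met: downstream subtasks get real values
    return None       # contract violated: treat the subtask as failed

out = "Starting...\nSTATE:container_id=a9f2\nSuccess!\n"
print(check_contract(out, ["container_id"]))  # {'container_id': 'a9f2'}
print(check_contract(out, ["rcon_port"]))     # None
```

The returned dict is also what makes the post's first lesson work: the harvested values get injected into the next subtask as real variables instead of leaving the model to guess.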

The thread connecting all three is that AI-assisted development bottlenecks have shifted from code generation to everything around it. An essay on intentional overuse as a learning strategy frames this explicitly: "The model isn't the bottleneck. The surrounding workflow is." The author advocates deliberately using AI for tasks it is probably bad at -- nuanced architecture decisions, subtle performance work -- because "the failures are the data." The mapped boundaries include review becoming slower than generation, context engineering emerging as its own discipline, multi-agent coordination overhead as the real cost nobody talks about, and the security surface growing with every tool the agent touches. The recommendation echoes the continuous delivery movement: "If deploying is painful, deploy more, until it isn't. If AI-assisted refactoring keeps going wrong, do more of it." (more: https://jedi.be/blog/2026/intentional-overuse-is-an-ai-coding-learning-strategy)

Agent Learning and Memory Systems

SkillRL, a new framework from the AIMING Lab, tackles the core limitation of episodic LLM agents: they cannot learn from past experience. Rather than storing raw trajectories (which are "lengthy and contain significant redundancy and noise"), SkillRL distills successful episodes into demonstration patterns and failed ones into concise failure lessons, then organizes both into a hierarchical skill library with general strategic principles and task-specific heuristics. A recursive evolution mechanism lets the skill library co-evolve with the agent's policy during reinforcement learning -- after each validation epoch, failed trajectories are analyzed to propose new skills and refine existing ones. The results on a 7B Qwen model are remarkable: 89.9% success rate on ALFWorld (beating GPT-4o by 41.9% and Gemini-2.5-Pro by 29.6%), with 10-20x token compression compared to raw trajectory storage. Removing the hierarchical structure costs 13.1%; replacing skills with raw trajectories causes up to 25% degradation. The skill library grows from 55 to 100 skills during training, with task-specific skills nearly doubling. (more: https://arxiv.org/html/2602.08234v1)
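The two-tier library maps to a simple retrieval structure at prompt-assembly time. The contents and layout below are assumptions for illustration, not the paper's actual format:

```python
# Illustrative two-tier skill library in SkillRL's spirit: general
# strategic principles go into every episode's context, while
# task-specific heuristics are retrieved by task type.
SKILL_LIBRARY = {
    "general": [
        "Verify an object's location before acting on it.",
        "Re-plan after two consecutive failed actions.",
    ],
    "task_specific": {
        "heat": ["Hold the object before opening the microwave."],
        "clean": ["Check the sink is empty before placing the object in it."],
    },
}

def retrieve_skills(task_type):
    # Prompt context = all general principles + this task type's heuristics.
    return SKILL_LIBRARY["general"] + SKILL_LIBRARY["task_specific"].get(task_type, [])

print(retrieve_skills("heat"))
```

The compression claim follows from the structure: the agent conditions on a handful of distilled sentences per episode instead of replaying full trajectories, which is where the reported 10-20x token savings comes from.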

AgentDB v3, built on RuVector, extends the memory-as-substrate idea into what it calls a "Cognitive Container" -- a single RVF file holding vectors with HNSW indexes, LoRA adapters and reinforcement policies, episodic and causal memory, compression tiers, and a cryptographic witness chain proving the history has not been tampered with. Because the container is append-safe and copy-on-write, you can branch an entire intelligence state in milliseconds, test a new strategy, compare reward signals, and promote the better branch. The practical implication for agentic coding: a code review agent accumulates structured experience across thousands of pull requests, a debugging agent retains causal memory from production incidents, and all of it is portable -- move the file and the agent behaves identically. (more: https://www.linkedin.com/posts/reuvencohen_introducing-agentdb-v3-not-just-a-vector-activity-7431370152413364224-SSdR)
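The tamper-evidence property comes from a standard hash-chain construction: each entry commits to its predecessor's hash, so altering any historical entry invalidates everything after it. A minimal sketch (entry format is illustrative, not AgentDB's actual RVF layout):

```python
import hashlib
import json

# Minimal witness-chain sketch: every entry stores the previous entry's
# hash, so tampering with any payload breaks verification from that
# point onward.
def append_entry(chain, payload):
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
    chain.append({"prev": prev, "payload": payload,
                  "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify(chain):
    prev = "0" * 64
    for entry in chain:
        body = json.dumps({"prev": prev, "payload": entry["payload"]}, sort_keys=True)
        if entry["prev"] != prev or entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True

chain = []
append_entry(chain, {"event": "tests_passed", "count": 42})
append_entry(chain, {"event": "deploy"})
print(verify(chain))                 # True
chain[0]["payload"]["count"] = 7     # tamper with history
print(verify(chain))                 # False
```

Combined with copy-on-write branching, this is what makes "move the file and the agent behaves identically" auditable rather than merely convenient: the receiving side can verify the entire history before trusting it.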

Agentic QE v3.7.0 integrates the RVF container format to give quality engineering agents a cryptographic witness chain -- "When an agent says 'all tests passed,' you can verify it. No more completion theater." The release also adds MinCut-based test optimization that models the test suite as a graph problem to identify critical-path versus safely-skippable tests, and copy-on-write learning isolation so agents can experiment with new testing strategies without contaminating production knowledge. (more: https://www.linkedin.com/posts/dragan-spiridonov_agenticqe-qualityengineering-opensource-share-7431695849065226240-Jpqq)

Gemini 3.1 Pro

Google shipped Gemini 3.1 Pro across consumer and developer products this week, building on the Gemini 3 series with what it calls a step forward in core reasoning. On ARC-AGI-2 -- a benchmark evaluating a model's ability to solve entirely new logic patterns -- 3.1 Pro scored 77.1%, more than doubling the reasoning performance of 3 Pro. The model is rolling out in preview via the Gemini API in AI Studio, Antigravity, Vertex AI, Gemini Enterprise, Gemini CLI, Android Studio, NotebookLM, and the Gemini app. The demo reel emphasizes practical applications over benchmarks: generating website-ready animated SVGs from text prompts, building a live ISS orbit dashboard from public telemetry APIs, and coding a 3D starling murmuration with hand-tracking and generative audio. Google notes it is releasing in preview "to validate these updates and continue to make further advancements in areas such as ambitious agentic workflows." Higher limits are available for Google AI Pro and Ultra subscribers. (more: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/)

Sources (22 articles)

  1. [Editorial] Bugcrowd Guide to Prompt Injection (bugcrowd.com)
  2. [Editorial] arXiv Research (arxiv.org)
  3. [Editorial] Exploitation Validator (github.com)
  4. What Breaks Embodied AI Security: LLM Vulnerabilities, CPS Flaws, or Something Else? (arxiv.org)
  5. [Editorial] The AI Automation Ceiling (unhypedai.substack.com)
  6. [Editorial] Faramesh — Research Paper (arxiv.org)
  7. [Editorial] Faramesh — Core Repository (github.com)
  8. [Editorial] Faramesh — Video Introduction (youtube.com)
  9. Charlotte: Open Source Browser MCP Server — 136x More Token-Efficient for Agents (reddit.com)
  10. Kilntainers: Give Every Agent an Ephemeral Linux Sandbox via MCP [Open Source] (reddit.com)
  11. [Editorial] Run-Agent (run-agent.jonnyzzz.com)
  12. [Editorial] Manifold (manifold.no)
  13. [Editorial] Taalas Etches AI Models onto Transistors to Rocket-Boost Inference (nextplatform.com)
  14. Repurposing 800 RX 580s into an AI Inference Cluster: Mass Document OCR at 24x Lower Cost (reddit.com)
  15. [Editorial] ACM Research (dl.acm.org)
  16. [Editorial] The Conductor Won't Stop Conducting (forge-quality.dev)
  17. Local Agent Managing Minecraft Servers: Hard Lessons in Making LLMs Actually Do Things (reddit.com)
  18. [Editorial] Intentional Overuse Is an AI Coding Learning Strategy (jedi.be)
  19. [Editorial] arXiv Research (arxiv.org)
  20. [Editorial] Introducing AgentDB v3 (linkedin.com)
  21. [Editorial] Agentic Quality Engineering (linkedin.com)
  22. Gemini 3.1 Pro (blog.google)