Ontologies and procedural memory rise

Ontologies and procedural memory rise

AI agents are getting smarter not by luck, but by structure. A widely shared industry take argues that agents “can’t reason without semantic structure,” and that production systems fail when they skip enforced schemas and ontologies in favor of free-form text (more: https://www.linkedin.com/posts/anthony-alcaraz-b80763155_ai-agents-cant-reason-without-semantic-structure-activity-7389222435906244608-QMMc). The prescription is layered: generation-time structured outputs (not just parse-time validation), an ontology-backed knowledge graph for entity relationships and constraints, orchestration logic grounded in business rules, and multi-agent coordination over structured interfaces. The payoff cited is material—hours to seconds processing, 10x write speedups via Apache Arrow, and cheaper entity matching at Netflix-like scale—though governance remains the bottleneck. The warning is sober: valid JSON does not equal correct facts; evaluation and monitoring are still must-haves (more: https://www.linkedin.com/posts/anthony-alcaraz-b80763155_ai-agents-cant-reason-without-semantic-structure-activity-7389222435906244608-QMMc).
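The distinction between parse-time validation and richer semantic checks can be made concrete. A minimal sketch (hypothetical schema and ontology, not from the post) of the two layers: schema validation catches malformed output, while an ontology constraint catches output that is valid JSON but violates the domain's entity relationships.

```python
# Toy ontology: entity type -> relations it is allowed to have.
ONTOLOGY = {
    "Invoice": {"issued_by": "Vendor", "billed_to": "Customer"},
}

def schema_valid(record: dict) -> bool:
    """Parse-time check: required keys and basic types only."""
    return (isinstance(record.get("type"), str)
            and isinstance(record.get("relations"), dict))

def ontology_valid(record: dict) -> bool:
    """Constraint check: every relation must be defined for this entity type."""
    allowed = ONTOLOGY.get(record["type"], {})
    return all(rel in allowed for rel in record["relations"])

good = {"type": "Invoice", "relations": {"issued_by": "Acme"}}
bad  = {"type": "Invoice", "relations": {"approved_by": "Acme"}}  # well-formed, but no such relation

print(schema_valid(good), ontology_valid(good))  # True True
print(schema_valid(bad), ontology_valid(bad))    # True False
```

The `bad` record is exactly the "valid JSON does not equal correct facts" case: only the ontology layer rejects it.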

Two fresh research artifacts push this discipline forward. First, Memp formalizes procedural memory for agents: instead of relearning workflows each time, agents distill past trajectories into reusable, two-level scripts (fine-grained steps and higher-level plans), retrieve by similarity, and update continuously as environments change (modeled as an MDP) (more: https://arxiv.org/abs/2508.06433v1). The paper details builders for memory units, retrieval via vector similarity, and maintenance functions that correct or deprecate stale memories—reducing token waste and latency for repeatable, long-horizon tasks (more: https://arxiv.org/abs/2508.06433v1).
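The build/retrieve/maintain loop can be sketched in a few lines. This is a toy illustration of the pattern, not the paper's implementation: scripts are keyed by a task vector (a bag-of-words stand-in for a real embedder), retrieved by cosine similarity, and deprecated after repeated failures.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # toy stand-in for a real embedding model

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ProceduralMemory:
    def __init__(self):
        self.entries = []  # each entry: {"vec", "script", "failures"}

    def add(self, task: str, script: list[str]):
        self.entries.append({"vec": embed(task), "script": script, "failures": 0})

    def retrieve(self, task: str):
        live = [e for e in self.entries if e["failures"] < 3]  # skip deprecated memories
        return max(live, key=lambda e: cosine(e["vec"], embed(task)), default=None)

    def report_failure(self, entry):
        entry["failures"] += 1  # maintenance: stale memories age out after 3 failures

mem = ProceduralMemory()
mem.add("book a flight to tokyo", ["open travel site", "search flights", "pay"])
hit = mem.retrieve("book flight to osaka")  # similar task reuses the stored script
```

The point of the maintenance hook is the one the paper stresses: memories that stop working in a changed environment must be corrected or retired, not retrieved forever.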

Second, mem-agent is a memory-centric, compact 4B-parameter model (Qwen3-4B-Thinking-2507 base) trained with GSPO over a scaffold that reads/writes an Obsidian-like markdown memory and interacts via tool APIs (file ops, directories, size, link navigation) (more: https://huggingface.co/driaforall/mem-agent). It organizes responses into <think>, <python>, and <reply> tags, executing code in a sandboxed loop. On the authors’ md-memory-bench with an o3 judge, it trails only a much larger Qwen variant and outperforms several closed and open contenders on an overall score—promising, especially as a Model Context Protocol (MCP) server in a bigger system (more: https://huggingface.co/driaforall/mem-agent).
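The scaffold idea is simple to picture: the model never touches the filesystem directly, only a small tool API over a memory directory. A toy sketch (hypothetical function names, not mem-agent's actual tool set):

```python
import os
import tempfile

MEMORY_DIR = tempfile.mkdtemp()  # stand-in for the agent's markdown memory root

def write_file(path: str, content: str) -> str:
    """Tool: create or overwrite a markdown file inside the memory."""
    full = os.path.join(MEMORY_DIR, path)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as f:
        f.write(content)
    return "ok"

def read_file(path: str) -> str:
    """Tool: return a memory file's contents."""
    with open(os.path.join(MEMORY_DIR, path)) as f:
        return f.read()

def list_files() -> list[str]:
    """Tool: directory listing, so the model can navigate the memory."""
    return sorted(os.listdir(MEMORY_DIR))

# Obsidian-style wiki links ([[...]]) give the model link navigation between notes.
write_file("user.md", "# User\n- name: Alice\n- details: [[alice.md]]\n")
print(read_file("user.md"))
```

Constraining the model to a loop of tool calls like these is what makes the sandboxing tractable: every side effect is a named, auditable operation.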

At the framework layer, Lattice Agent (Go) codifies the agent plumbing many teams rebuild ad hoc: graph-aware, RAG-backed memory with importance scoring and maximal marginal relevance; adapters for Gemini, Anthropic, and Ollama; and first-class Universal Tool Calling Protocol (UTCP) tools that become reusable modules across agents and teams (more: https://github.com/Raezil/lattice-agent). Beyond portability, its focus on memory stores (in-memory, PostgreSQL+pgvector, MongoDB, Neo4j, Qdrant) and automatic pruning addresses a production pain point: keeping context useful, bounded, and fast (more: https://github.com/Raezil/lattice-agent).
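Maximal marginal relevance, which lattice-agent uses for memory retrieval, is worth spelling out since it is what keeps retrieved context non-redundant. A generic sketch of the technique (not the repo's Go implementation): each pick trades relevance to the query against similarity to what was already selected.

```python
def mmr(query_sim, pairwise_sim, k, lam=0.7):
    """query_sim[i]: similarity of doc i to the query.
    pairwise_sim[i][j]: similarity between docs i and j.
    Greedily selects k indices, penalizing redundancy with prior picks."""
    selected, remaining = [], list(range(len(query_sim)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates; plain top-k would return both.
query_sim = [0.9, 0.85, 0.5]
pairwise = [[1.0, 0.95, 0.1],
            [0.95, 1.0, 0.1],
            [0.1, 0.1, 1.0]]
print(mmr(query_sim, pairwise, k=2))  # -> [0, 2]
```

With `lam=0.7`, doc 1's redundancy penalty (0.3 × 0.95) outweighs its relevance edge over doc 2, so the second slot goes to the dissimilar document.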

Drawing thoughts boosts multimodal reasoning

Multimodal LLMs can see; reasoning about what they see is harder. Latent Sketchpad proposes giving models an internal visual scratchpad—interleaving text reasoning with generated visual latents rendered as sketches via a pretrained decoder (more: https://www.reddit.com/r/LocalLLaMA/comments/1oivkg2/latent_sketchpad_sketching_visual_thoughts_to/). Two key parts make it tick: a Context-Aware Vision Head that autoregressively produces these visual representations and a Sketch Decoder that turns latents into human-interpretable drawings (more: https://www.reddit.com/r/LocalLLaMA/comments/1oivkg2/latent_sketchpad_sketching_visual_thoughts_to/).

Tested on their MazePlanning dataset and across frontier MLLMs (including Gemma3 and Qwen2.5-VL), the approach matches or beats the base models on reasoning, with the important bonus of interpretability—visual thoughts one can inspect (more: https://www.reddit.com/r/LocalLLaMA/comments/1oivkg2/latent_sketchpad_sketching_visual_thoughts_to/). The framing aligns with a broader trend: richer internal tools (memory, planners, sketches) that help models do less guessing and more deliberate problem solving.

Dynamic MIDI AI

A solo developer building a multi-LLM text adventure hit a familiar snag: if the world is open-ended, static soundtracks break immersion. The solution: train a small, local MIDI generator as a dynamic composer (about 20M parameters; ~140 MB in full precision, aiming for fp16 ONNX in-game) (more: https://www.reddit.com/r/LocalLLaMA/comments/1og8jil/my_llmpowered_text_adventure_needed_a_dynamic/). The 5-stage curriculum starts unconditional, evolves from mode collapse to simple melody/harmony, then learns velocity/rests and branches into “jazzy” vs. “lyrical,” and finally faces a multi-instrument dataset that initially yields an “instrument salad” before discovering orchestration and counterpoint (more: https://www.reddit.com/r/LocalLLaMA/comments/1og8jil/my_llmpowered_text_adventure_needed_a_dynamic/). The design goal is explicit: the smallest working model that can run locally alongside other small models (TTS, etc.), controlled by another LLM’s prompt encoder (more: https://www.reddit.com/r/LocalLLaMA/comments/1og8jil/my_llmpowered_text_adventure_needed_a_dynamic/).

On the release front, NVIDIA’s Audio Flamingo 3 is now on Hugging Face in safetensors, built on Qwen2.5 and Whisper (more: https://www.reddit.com/r/LocalLLaMA/comments/1ohh481/flamingo_3_released_in_safetensors/). Community questions immediately turned to practicality: can it be converted to GGUF/MLX and supported in llama.cpp? The answer depends on graph/layout support in target runtimes, but safetensors availability lowers friction for downstream experimentation (more: https://www.reddit.com/r/LocalLLaMA/comments/1ohh481/flamingo_3_released_in_safetensors/).

Open safety models spark debate

OpenAI released gpt-oss-safeguard, two open-weight safety reasoning models (120B and 20B) fine-tuned from gpt-oss under Apache 2.0, built to interpret custom policies and classify messages, responses, and conversations (more: https://www.reddit.com/r/LocalLLaMA/comments/1oj32pd/openai_gptosssafeguard_two_openweight_reasoning/). The models are on Hugging Face, with a guide for prompting and policy authoring (more: https://www.reddit.com/r/LocalLLaMA/comments/1oj32pd/openai_gptosssafeguard_two_openweight_reasoning/). Community response spans excitement and eye-rolls: some see a free, capable automoderation engine; others question latency costs of 20B+ classifiers and note that smaller guards were once the norm (more: https://www.reddit.com/r/LocalLLaMA/comments/1oj32pd/openai_gptosssafeguard_two_openweight_reasoning/). Practicality improved quickly: Unsloth posted GGUF and BF16 builds for both sizes, easing local deployment (more: https://www.reddit.com/r/LocalLLaMA/comments/1oj32pd/openai_gptosssafeguard_two_openweight_reasoning/).

Even with larger guardrails, usage context matters. One commenter framed it as automoderation (“policies” as community rules), while others mused about adapting such models as game masters with rules-aware flexibility—ambitious, but a reminder that safety classifiers excel when policies are crisp and evaluative, not creative (more: https://www.reddit.com/r/LocalLLaMA/comments/1oj32pd/openai_gptosssafeguard_two_openweight_reasoning/). In parallel, briaai surfaced FIBO on Hugging Face; details are sparse in the provided excerpt, but it’s another sign of the rapid cadence of model releases landing on open hubs (more: https://huggingface.co/briaai/FIBO).

Bending hardware to run giants

A hands-on guide shows OpenAI’s gpt-oss-120B (a 117B-parameter Mixture-of-Experts) running at ~19 tokens/sec on an AMD RX 6900 XT 16 GB under Ubuntu 24.04 LTS and ROCm 6.2.4 via llama.cpp’s ROCm backend—thanks to MoE’s ~5.1B active parameters per token and meticulous CPU/GPU offloading, thread tuning, and AVX-512-enabled builds (95% VRAM utilization reported) (more: https://www.reddit.com/r/LocalLLaMA/comments/1ofxt6s/optimizing_gptoss120b_on_amd_rx_6900_xt_16gb/). It’s a useful counterweight to “you need an H100” assumptions: architecture-aware deployment can make frontier-scale models usable on consumer hardware, within bounds (more: https://www.reddit.com/r/LocalLLaMA/comments/1ofxt6s/optimizing_gptoss120b_on_amd_rx_6900_xt_16gb/).
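The ~19 tok/s figure reads as a memory-bandwidth story. A back-of-envelope calculation (the quantization and bandwidth numbers below are assumptions, not from the post) shows why only the ~5.1B active parameters, not the full 117B, set the per-token cost:

```python
# Rough MoE decode ceiling: each generated token must stream the active
# parameters through memory once. Assumed numbers are illustrative only.
active_params = 5.1e9     # active parameters per token (from the post)
bytes_per_param = 0.55    # assumption: ~4.4 bits/param effective (MXFP4-style quant)
bandwidth = 512e9         # assumption: ~512 GB/s RX 6900 XT VRAM bandwidth

bytes_per_token = active_params * bytes_per_param
ceiling_tok_s = bandwidth / bytes_per_token
print(f"~{ceiling_tok_s:.0f} tok/s theoretical ceiling")
```

Real throughput lands far below this ceiling because part of the model spills to CPU RAM with much lower bandwidth; the point is only that a dense 117B model would have no headroom at all on this card, while the MoE's small active set leaves room for 19 tok/s even with offloading.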

Format hygiene is part of the story. Audio Flamingo 3’s release in safetensors triggered questions about conversion into GGUF/MLX and llama.cpp support—indicative of the community’s push to standardize runtimes and file formats for portability across desktops and mobile (more: https://www.reddit.com/r/LocalLLaMA/comments/1ohh481/flamingo_3_released_in_safetensors/). The open ecosystem continues to converge on a handful of formats, but conversion feasibility still depends on model graph quirks and runtime maturity.

Local-first AI apps consolidate

Jan, an open-source desktop ChatGPT alternative, now runs Ollama models without shuffling model folders: add an Ollama provider and point to http://localhost:11434/v1 in settings (more: https://www.reddit.com/r/ollama/comments/1oixco7/you_can_now_run_ollama_models_in_jan/). Users noted they’d long connected clients to Ollama; the team clarified details around remote access: to reach a non-localhost Ollama instance, enable CORS and bind properly (e.g., OLLAMA_ORIGINS=* and OLLAMA_HOST=0.0.0.0), and remember the /v1 path—alongside the obvious security caveat to avoid exposing broad origins on shared networks (more: https://www.reddit.com/r/ollama/comments/1oixco7/you_can_now_run_ollama_models_in_jan/). Jan’s llama.cpp extension supports Vulkan, and the team contrasted its ChatGPT-like simplicity with OpenWebUI/LM Studio, while acknowledging current gaps like custom headers and emphasizing a memory roadmap (more: https://www.reddit.com/r/ollama/comments/1oixco7/you_can_now_run_ollama_models_in_jan/).
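The remote-access settings mentioned in the thread amount to a couple of environment variables on the machine running Ollama (illustrative broad values shown; the security caveat about open origins applies):

```shell
# Ollama host configuration for access from another machine on the network.
export OLLAMA_HOST=0.0.0.0    # bind beyond localhost so remote clients can connect
export OLLAMA_ORIGINS='*'     # allow cross-origin requests from clients like Jan
# Jan's Ollama provider then points at the OpenAI-compatible base URL, e.g.:
#   http://<server-ip>:11434/v1   (the /v1 path is the easy part to forget)
```

On a shared network, scope `OLLAMA_ORIGINS` to the specific client origin instead of `*`.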

ClaraVerse takes consolidation further, pitching a single, fully local, privacy-first AI workspace that bundles Llama.cpp chat, multi-provider support, tool calling, agent building, Stable Diffusion, and n8n-style automation—no backend or API keys required (more: https://github.com/claraverse-space/ClaraVerse). It exposes a unified API over a React UI and “Clara Core,” ties into ComfyUI, N8N, and an MCP ecosystem, and embraces zero telemetry and offline operation—while candidly noting active development, incomplete docs, and a permissive “use it, modify it, sell it” license stance (more: https://github.com/claraverse-space/ClaraVerse).

Agent plumbing is also becoming more standardized in editors. A developer exploring Codex (VS Code extension) asked whether to wire Supabase via the Supabase MCP server or install Supabase CLI so Codex can drive it directly—reportedly saving tokens—alongside how “--read-only” affects new-project setup (you’d expect it to block creation) (more: https://www.reddit.com/r/ChatGPTCoding/comments/1ofxw8i/codex_and_supabase/). As a reminder for readers: MCP here is Model Context Protocol, and consistent tooling contracts are exactly what keeps multi-agent, tool-rich setups from devolving into prompt spaghetti (more: https://www.reddit.com/r/ChatGPTCoding/comments/1ofxw8i/codex_and_supabase/).

Privacy, security, and anti-tracking

Hashing is not anonymization. The FTC’s technology blog revisits a decade-old point with fresh cases: hashed email addresses, MAC addresses, device IDs, and ad IDs can still track or reidentify users; small identifier spaces are trivially brute-forceable; and claims of “non-identifiable” fall apart when platforms can match hashes back to people (more: https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/2024/07/no-hashing-still-doesnt-make-your-data-anonymous). Enforcement examples include Nomi (hashed MACs still uniquely identified shoppers), BetterHelp (hashed emails shared with Facebook allegedly reidentified mental-health seekers), Premom (ad IDs enabling cross-app tracking), and InMarket (device IDs tracking without informed consent) (more: https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/2024/07/no-hashing-still-doesnt-make-your-data-anonymous).
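The FTC's brute-force point fits in a few lines. This toy example "anonymizes" a 4-digit PIN with SHA-256 and then reverses it instantly by enumerating the space; phone numbers, MAC addresses, and ad IDs fail the same way at only slightly larger scale.

```python
import hashlib

def h(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

leaked = h("4271")  # the "non-identifiable" value a tracker shares

# Attacker precomputes the entire 10,000-value space once,
# then reverses any hash with a dictionary lookup.
rainbow = {h(f"{i:04d}"): f"{i:04d}" for i in range(10_000)}
print(rainbow[leaked])  # -> 4271
```

Salting raises the cost but does not help when the same party holds the salt and the match table, which is exactly the platform-matching scenario the FTC cases describe.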

On the user-side defense stack, Tor Browser 15.0 ships with a Firefox ESR 140 base, improved tab management and address bar on desktop, and an Android screen lock—plus a new unified Tor User Support portal to centralize documentation (more: https://blog.torproject.org/new-release-tor-browser-150/). Recent updates also include Arti 1.6.0 with circuit padding and side-channel mitigations, and Tails 7.0/7.1 based on Debian 13 with refreshed application versions (more: https://blog.torproject.org/new-release-tor-browser-150/).

Developers should mark their calendars: npm is rolling out security changes to authentication on October 13, 2025 (90-day token lifetime caps and TOTP 2FA restrictions), with classic tokens being revoked in November, so CI/CD pipelines need updates (more: https://www.npmjs.com/package/strange-loops) (more: https://www.npmjs.com/package/aidefencet). On the offensive research side, ShareHound maps SMB share permissions into BloodHound OpenGraphs, with multithreaded BFS discovery and customizable rule matching—plus ready-to-use Cypher queries to find write/full-control access and file types by extension (more: https://github.com/p0dalirius/ShareHound). It’s a crisp illustration of how graph representations make complex permission relationships tractable—and why defenders should assume adversaries have this lens too (more: https://github.com/p0dalirius/ShareHound).
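The discovery pass behind a tool like ShareHound can be sketched in miniature (hypothetical data model, single-threaded, not its code): walk the share tree breadth-first and collect every path where the current principal holds write access.

```python
from collections import deque

# Toy share tree: node -> (permissions held by the current principal, children).
TREE = {
    "\\\\srv\\public": ({"read"}, ["finance", "tools"]),
    "finance":          ({"read", "write"}, ["q3"]),
    "q3":               ({"read", "write"}, []),
    "tools":            ({"read"}, []),
}

def writable_paths(root: str) -> list[str]:
    """BFS over the share tree, collecting nodes with write access."""
    found, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        perms, children = TREE[node]
        if "write" in perms:
            found.append(node)
        queue.extend(children)
    return found

print(writable_paths("\\\\srv\\public"))  # -> ['finance', 'q3']
```

Loading those edges into a graph database is what turns ad hoc findings into reusable queries: "writable by group X, reachable from share Y" becomes a one-line Cypher pattern instead of a script per question.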

Compute limits and hacker culture

Regardless of model speedups, the physical world pushes back. US gas producers say there’s enough fuel for AI data centers, but a shortage of gas turbines could slow demand growth and raise costs, with utilities warning that pipeline constraints risk blackouts and price spikes (more: https://www.energyintel.com/00000199-e958-dd61-abb9-ebdf44e10000). Oil-field services firms are eyeing distributed data center power as an adjacent opportunity, while Southeast Asia’s LNG-to-power plans may feel similar turbine supply pressure (more: https://www.energyintel.com/00000199-e958-dd61-abb9-ebdf44e10000). The lesson: AI’s growth curve is increasingly gated by energy infrastructure, not just chips.

Meanwhile, hacker culture keeps hardware honest. Hackaday’s Supercon 2025 badge is intentionally customizable: the front is a second PCB with no electronics, solely to press the keyboard membrane—meaning attendees can replace it with CNC’d aluminum, laser-cut wood, or multi-material 3D prints using provided STEP/DXF/SVG files (more: https://hackaday.com/2025/10/27/the-supercon-2025-badge-is-built-to-be-customized/). Practical constraints are explicit—~2 mm spacing between boards, ~1.7 mm thickness near the keyboard, and M3 hardware to mount—so the platform remains hackable without becoming fragile (more: https://hackaday.com/2025/10/27/the-supercon-2025-badge-is-built-to-be-customized/). It’s a fitting counterpoint to the datacenter: local, tangible, modifiable, and fun.

Sources (20 articles)

  1. [Editorial] https://www.linkedin.com/posts/anthony-alcaraz-b80763155_ai-agents-cant-reason-without-semantic-structure-activity-7389222435906244608-QMMc (www.linkedin.com)
  2. [Editorial] https://www.npmjs.com/package/strange-loops (www.npmjs.com)
  3. [Editorial] https://github.com/claraverse-space/ClaraVerse (github.com)
  4. [Editorial] AIF? (www.npmjs.com)
  5. Optimizing gpt-oss-120B on AMD RX 6900 XT 16GB: Achieving 19 tokens/sec (www.reddit.com)
  6. OpenAI: gpt-oss-safeguard: two open-weight reasoning models built for safety classification (Now on Hugging Face) (www.reddit.com)
  7. Flamingo 3 released in safetensors (www.reddit.com)
  8. My LLM-powered text adventure needed a dynamic soundtrack, so I'm training a MIDI generation model to compose it on the fly. Here's a video of its progress so far. (www.reddit.com)
  9. Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs (www.reddit.com)
  10. You can now run Ollama models in Jan (www.reddit.com)
  11. Codex and Supabase (www.reddit.com)
  12. p0dalirius/ShareHound (github.com)
  13. Raezil/lattice-agent (github.com)
  14. FTC: No, hashing still doesn't make your data anonymous (www.ftc.gov)
  15. Tor Browser 15.0 (blog.torproject.org)
  16. US Gas Turbine Shortage Likely to Slow AI Demand Growth (www.energyintel.com)
  17. briaai/FIBO (huggingface.co)
  18. driaforall/mem-agent (huggingface.co)
  19. The Supercon 2025 Badge is Built to be Customized (hackaday.com)
  20. Memp: Exploring Agent Procedural Memory (arxiv.org)