Ontologies and procedural memory rise

Ontologies and procedural memory rise

AI agents are getting smarter not by luck, but by structure. A widely shared industry take argues that agents “can’t reason without semantic structure,” and that production systems fail when they skip enforced schemas and ontologies in favor of free-form text (more: https://www.linkedin.com/posts/anthony-alcaraz-b80763155_ai-agents-cant-reason-without-semantic-structure-activity-7389222435906244608-QMMc). The prescription is layered: generation-time structured outputs (not just parse-time validation), an ontology-backed knowledge graph for entity relationships and constraints, orchestration logic grounded in business rules, and multi-agent coordination over structured interfaces. The payoff cited is material—hours to seconds processing, 10x write speedups via Apache Arrow, and cheaper entity matching at Netflix-like scale—though governance remains the bottleneck. The warning is sober: valid JSON does not equal correct facts; evaluation and monitoring are still must-haves (more: https://www.linkedin.com/posts/anthony-alcaraz-b80763155_ai-agents-cant-reason-without-semantic-structure-activity-7389222435906244608-QMMc).
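The distinction between parse-time validation and richer semantic checks can be made concrete. A minimal sketch (hypothetical schema and ontology, not from the post) of the two layers: schema validation catches malformed output, while an ontology constraint catches output that is valid JSON but violates the domain's entity relationships.

```python
# Toy ontology: entity type -> relations it is allowed to have.
ONTOLOGY = {
    "Invoice": {"issued_by": "Vendor", "billed_to": "Customer"},
}

def schema_valid(record: dict) -> bool:
    """Parse-time check: required keys and basic types only."""
    return (isinstance(record.get("type"), str)
            and isinstance(record.get("relations"), dict))

def ontology_valid(record: dict) -> bool:
    """Constraint check: every relation must be defined for this entity type."""
    allowed = ONTOLOGY.get(record["type"], {})
    return all(rel in allowed for rel in record["relations"])

good = {"type": "Invoice", "relations": {"issued_by": "Acme"}}
bad  = {"type": "Invoice", "relations": {"approved_by": "Acme"}}  # well-formed, but no such relation

print(schema_valid(good), ontology_valid(good))  # True True
print(schema_valid(bad), ontology_valid(bad))    # True False
```

The `bad` record is exactly the "valid JSON does not equal correct facts" case: only the ontology layer rejects it.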

Two fresh research artifacts push this discipline forward. First, Memp formalizes procedural memory for agents: instead of relearning workflows each time, agents distill past trajectories into reusable, two-level scripts (fine-grained steps and higher-level plans), retrieve by similarity, and update continuously as environments change (modeled as an MDP) (more: https://arxiv.org/abs/2508.06433v1). The paper details builders for memory units, retrieval via vector similarity, and maintenance functions that correct or deprecate stale memories—reducing token waste and latency for repeatable, long-horizon tasks (more: https://arxiv.org/abs/2508.06433v1).
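The build/retrieve/maintain loop can be sketched in a few lines. This is a toy illustration of the pattern, not the paper's implementation: scripts are keyed by a task vector (a bag-of-words stand-in for a real embedder), retrieved by cosine similarity, and deprecated after repeated failures.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # toy stand-in for a real embedding model

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ProceduralMemory:
    def __init__(self):
        self.entries = []  # each entry: {"vec", "script", "failures"}

    def add(self, task: str, script: list[str]):
        self.entries.append({"vec": embed(task), "script": script, "failures": 0})

    def retrieve(self, task: str):
        live = [e for e in self.entries if e["failures"] < 3]  # skip deprecated memories
        return max(live, key=lambda e: cosine(e["vec"], embed(task)), default=None)

    def report_failure(self, entry):
        entry["failures"] += 1  # maintenance: stale memories age out after 3 failures

mem = ProceduralMemory()
mem.add("book a flight to tokyo", ["open travel site", "search flights", "pay"])
hit = mem.retrieve("book flight to osaka")  # similar task reuses the stored script
```

The point of the maintenance hook is the one the paper stresses: memories that stop working in a changed environment must be corrected or retired, not retrieved forever.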

Second, mem-agent is a memory-centric, compact 4B-parameter model (Qwen3-4B-Thinking-2507 base) trained with GSPO over a scaffold that reads/writes an Obsidian-like markdown memory and interacts via tool APIs (file ops, directories, size, link navigation) (more: https://huggingface.co/driaforall/mem-agent). It organizes responses into <think>, <python>, and <reply> tags, executing code in a sandboxed loop. On the authors’ md-memory-bench with an o3 judge, it trails only a much larger Qwen variant and outperforms several closed and open contenders on an overall score—promising, especially as a Model Context Protocol (MCP) server in a bigger system (more: https://huggingface.co/driaforall/mem-agent).
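The scaffold idea is simple to picture: the model never touches the filesystem directly, only a small tool API over a memory directory. A toy sketch (hypothetical function names, not mem-agent's actual tool set):

```python
import os
import tempfile

MEMORY_DIR = tempfile.mkdtemp()  # stand-in for the agent's markdown memory root

def write_file(path: str, content: str) -> str:
    """Tool: create or overwrite a markdown file inside the memory."""
    full = os.path.join(MEMORY_DIR, path)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as f:
        f.write(content)
    return "ok"

def read_file(path: str) -> str:
    """Tool: return a memory file's contents."""
    with open(os.path.join(MEMORY_DIR, path)) as f:
        return f.read()

def list_files() -> list[str]:
    """Tool: directory listing, so the model can navigate the memory."""
    return sorted(os.listdir(MEMORY_DIR))

# Obsidian-style wiki links ([[...]]) give the model link navigation between notes.
write_file("user.md", "# User\n- name: Alice\n- details: [[alice.md]]\n")
print(read_file("user.md"))
```

Constraining the model to a loop of tool calls like these is what makes the sandboxing tractable: every side effect is a named, auditable operation.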

At the framework layer, Lattice Agent (Go) codifies the agent plumbing many teams rebuild ad hoc: graph-aware, RAG-backed memory with importance scoring and maximal marginal relevance; adapters for Gemini, Anthropic, and Ollama; and first-class Universal Tool Calling Protocol (UTCP) tools that become reusable modules across agents and teams (more: https://github.com/Raezil/lattice-agent). Beyond portability, its focus on memory stores (in-memory, PostgreSQL+pgvector, MongoDB, Neo4j, Qdrant) and automatic pruning addresses a production pain point: keeping context useful, bounded, and fast (more: https://github.com/Raezil/lattice-agent).
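Maximal marginal relevance, which lattice-agent uses for memory retrieval, is worth spelling out since it is what keeps retrieved context non-redundant. A generic sketch of the technique (not the repo's Go implementation): each pick trades relevance to the query against similarity to what was already selected.

```python
def mmr(query_sim, pairwise_sim, k, lam=0.7):
    """query_sim[i]: similarity of doc i to the query.
    pairwise_sim[i][j]: similarity between docs i and j.
    Greedily selects k indices, penalizing redundancy with prior picks."""
    selected, remaining = [], list(range(len(query_sim)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates; plain top-k would return both.
query_sim = [0.9, 0.85, 0.5]
pairwise = [[1.0, 0.95, 0.1],
            [0.95, 1.0, 0.1],
            [0.1, 0.1, 1.0]]
print(mmr(query_sim, pairwise, k=2))  # -> [0, 2]
```

With `lam=0.7`, doc 1's redundancy penalty (0.3 × 0.95) outweighs its relevance edge over doc 2, so the second slot goes to the dissimilar document.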

Drawing thoughts boosts multimodal reasoning

Multimodal LLMs can see; reasoning about what they see is harder. Latent Sketchpad proposes giving models an internal visual scratchpad—interleaving text reasoning with generated visual latents rendered as sketches via a pretrained decoder (more: https://www.reddit.com/r/LocalLLaMA/comments/1oivkg2/latent_sketchpad_sketching_visual_thoughts_to/). Two key parts make it tick: a Context-Aware Vision Head that autoregressively produces these visual representations and a Sketch Decoder that turns latents into human-interpretable drawings (more: https://www.reddit.com/r/LocalLLaMA/comments/1oivkg2/latent_sketchpad_sketching_visual_thoughts_to/).

Tested on their MazePlanning dataset and across frontier MLLMs (including Gemma3 and Qwen2.5-VL), the approach matches or beats the base models on reasoning, with the important bonus of interpretability—visual thoughts one can inspect (more: https://www.reddit.com/r/LocalLLaMA/comments/1oivkg2/latent_sketchpad_sketching_visual_thoughts_to/). The framing aligns with a broader trend: richer internal tools (memory, planners, sketches) that help models do less guessing and more deliberate problem solving.

Dynamic MIDI AI

A solo developer building a multi-LLM text adventure hit a familiar snag: if the world is open-ended, static soundtracks break immersion. The solution: train a small, local MIDI generator as a dynamic composer (about 20M parameters; ~140 MB in full precision, aiming for fp16 ONNX in-game) (more: https://www.reddit.com/r/LocalLLaMA/comments/1og8jil/my_llmpowered_text_adventure_needed_a_dynamic/). The 5-stage curriculum starts unconditional, evolves from mode collapse to simple melody/harmony, then learns velocity/rests and branches into “jazzy” vs. “lyrical,” and finally faces a multi-instrument dataset that initially yields an “instrument salad” before discovering orchestration and counterpoint (more: https://www.reddit.com/r/LocalLLaMA/comments/1og8jil/my_llmpowered_text_adventure_needed_a_dynamic/). The design goal is explicit: the smallest working model that can run locally alongside other small models (TTS, etc.), controlled by another LLM’s prompt encoder (more: https://www.reddit.com/r/LocalLLaMA/comments/1og8jil/my_llmpowered_text_adventure_needed_a_dynamic/).

On the release front, NVIDIA’s Audio Flamingo 3 is now on Hugging Face in safetensors, built on Qwen2.5 and Whisper (more: https://www.reddit.com/r/LocalLLaMA/comments/1ohh481/flamingo_3_released_in_safetensors/). Community questions immediately turned to practicality: can it be converted to GGUF/MLX and supported in llama.cpp? The answer depends on graph/layout support in target runtimes, but safetensors availability lowers friction for downstream experimentation (more: https://www.reddit.com/r/LocalLLaMA/comments/1ohh481/flamingo_3_released_in_safetensors/).

Open safety models spark debate

OpenAI released gpt-oss-safeguard, two open-weight safety reasoning models (120B and 20B) fine-tuned from gpt-oss under Apache 2.0, built to interpret custom policies and classify messages, responses, and conversations (more: https://www.reddit.com/r/LocalLLaMA/comments/1oj32pd/openai_gptosssafeguard_two_openweight_reasoning/). The models are on Hugging Face, with a guide for prompting and policy authoring (more: https://www.reddit.com/r/LocalLLaMA/comments/1oj32pd/openai_gptosssafeguard_two_openweight_reasoning/). Community response spans excitement and eye-rolls: some see a free, capable automoderation engine; others question latency costs of 20B+ classifiers and note that smaller guards were once the norm (more: https://www.reddit.com/r/LocalLLaMA/comments/1oj32pd/openai_gptosssafeguard_two_openweight_reasoning/). Practicality improved quickly: Unsloth posted GGUF and BF16 builds for both sizes, easing local deployment (more: https://www.reddit.com/r/LocalLLaMA/comments/1oj32pd/openai_gptosssafeguard_two_openweight_reasoning/).

Even with larger guardrails, usage context matters. One commenter framed it as automoderation (“policies” as community rules), while others mused about adapting such models as game masters with rules-aware flexibility—ambitious, but a reminder that safety classifiers excel when policies are crisp and evaluative, not creative (more: https://www.reddit.com/r/LocalLLaMA/comments/1oj32pd/openai_gptosssafeguard_two_openweight_reasoning/). In parallel, briaai surfaced FIBO on Hugging Face; details are sparse in the provided excerpt, but it’s another sign of the rapid cadence of model releases landing on open hubs (more: https://huggingface.co/briaai/FIBO).

Bending hardware to run giants

A hands-on guide shows OpenAI’s gpt-oss-120B (a 117B-parameter Mixture-of-Experts) running at ~19 tokens/sec on an AMD RX 6900 XT 16 GB under Ubuntu 24.04 LTS and ROCm 6.2.4 via llama.cpp’s ROCm backend—thanks to MoE’s ~5.1B active parameters per token and meticulous CPU/GPU offloading, thread tuning, and AVX-512-enabled builds (95% VRAM utilization reported) (more: https://www.reddit.com/r/LocalLLaMA/comments/1ofxt6s/optimizing_gptoss120b_on_amd_rx_6900_xt_16gb/). It’s a useful counterweight to “you need an H100” assumptions: architecture-aware deployment can make frontier-scale models usable on consumer hardware, within bounds (more: https://www.reddit.com/r/LocalLLaMA/comments/1ofxt6s/optimizing_gptoss120b_on_amd_rx_6900_xt_16gb/).
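The ~19 tok/s figure reads as a memory-bandwidth story. A back-of-envelope calculation (the quantization and bandwidth numbers below are assumptions, not from the post) shows why only the ~5.1B active parameters, not the full 117B, set the per-token cost:

```python
# Rough MoE decode ceiling: each generated token must stream the active
# parameters through memory once. Assumed numbers are illustrative only.
active_params = 5.1e9     # active parameters per token (from the post)
bytes_per_param = 0.55    # assumption: ~4.4 bits/param effective (MXFP4-style quant)
bandwidth = 512e9         # assumption: ~512 GB/s RX 6900 XT VRAM bandwidth

bytes_per_token = active_params * bytes_per_param
ceiling_tok_s = bandwidth / bytes_per_token
print(f"~{ceiling_tok_s:.0f} tok/s theoretical ceiling")
```

Real throughput lands far below this ceiling because part of the model spills to CPU RAM with much lower bandwidth; the point is only that a dense 117B model would have no headroom at all on this card, while the MoE's small active set leaves room for 19 tok/s even with offloading.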

Format hygiene is part of the story. Audio Flamingo 3’s release in safetensors triggered questions about conversion into GGUF/MLX and llama.cpp support—indicative of the community’s push to standardize runtimes and file formats for portability across desktops and mobile (more: https://www.reddit.com/r/LocalLLaMA/comments/1ohh481/flamingo_3_released_in_safetensors/). The open ecosystem continues to converge on a handful of formats, but conversion feasibility still depends on model graph quirks and runtime maturity.

Local-first AI apps consolidate

Jan, an open-source desktop ChatGPT alternative, now runs Ollama models without shuffling model folders: add an Ollama provider and point to http://localhost:11434/v1 in settings (more: https://www.reddit.com/r/ollama/comments/1oixco7/you_can_now_run_ollama_models_in_jan/). Users noted they’d long connected clients to Ollama; the team clarified details around remote access: to reach a non-localhost Ollama instance, enable CORS and bind properly (e.g., OLLAMA_ORIGINS=* and OLLAMA_HOST=0.0.0.0), and remember the /v1 path—alongside the obvious security caveat to avoid exposing broad origins on shared networks (more: https://www.reddit.com/r/ollama/comments/1oixco7/you_can_now_run_ollama_models_in_jan/). Jan’s llama.cpp extension supports Vulkan, and the team contrasted its ChatGPT-like simplicity with OpenWebUI/LM Studio, while acknowledging current gaps like custom headers and emphasizing a memory roadmap (more: https://www.reddit.com/r/ollama/comments/1oixco7/you_can_now_run_ollama_models_in_jan/).
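The remote-access settings mentioned in the thread amount to a couple of environment variables on the machine running Ollama (illustrative broad values shown; the security caveat about open origins applies):

```shell
# Ollama host configuration for access from another machine on the network.
export OLLAMA_HOST=0.0.0.0    # bind beyond localhost so remote clients can connect
export OLLAMA_ORIGINS='*'     # allow cross-origin requests from clients like Jan
# Jan's Ollama provider then points at the OpenAI-compatible base URL, e.g.:
#   http://<server-ip>:11434/v1   (the /v1 path is the easy part to forget)
```

On a shared network, scope `OLLAMA_ORIGINS` to the specific client origin instead of `*`.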

ClaraVerse takes consolidation further, pitching a single, fully local, privacy-first AI workspace that bundles Llama.cpp chat, multi-provider support, tool calling, agent building, Stable Diffusion, and n8n-style automation—no backend or API keys required (more: https://github.com/claraverse-space/ClaraVerse). It exposes a unified API over a React UI and “Clara Core,” ties into ComfyUI, N8N, and an MCP ecosystem, and embraces zero telemetry and offline operation—while candidly noting active development, incomplete docs, and a permissive “use it, modify it, sell it” license stance (more: https://github.com/claraverse-space/ClaraVerse).

Agent plumbing is also becoming more standardized in editors. A developer exploring Codex (VS Code extension) asked whether to wire Supabase via the Supabase MCP server or install Supabase CLI so Codex can drive it directly—reportedly saving tokens—alongside how “--read-only” affects new-project setup (you’d expect it to block creation) (more: https://www.reddit.com/r/ChatGPTCoding/comments/1ofxw8i/codex_and_supabase/). As a reminder for readers: MCP here is Model Context Protocol, and consistent tooling contracts are exactly what keeps multi-agent, tool-rich setups from devolving into prompt spaghetti (more: https://www.reddit.com/r/ChatGPTCoding/comments/1ofxw8i/codex_and_supabase/).

Privacy, security, and anti-tracking

Hashing is not anonymization. The FTC’s technology blog revisits a decade-old point with fresh cases: hashed email addresses, MAC addresses, device IDs, and ad IDs can still track or reidentify users; small identifier spaces are trivially brute-forceable; and claims of “non-identifiable” fall apart when platforms can match hashes back to people (more: https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/2024/07/no-hashing-still-doesnt-make-your-data-anonymous). Enforcement examples include Nomi (hashed MACs still uniquely identified shoppers), BetterHelp (hashed emails shared with Facebook allegedly reidentified mental-health seekers), Premom (ad IDs enabling cross-app tracking), and InMarket (device IDs tracking without informed consent) (more: https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/2024/07/no-hashing-still-doesnt-make-your-data-anonymous).
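The FTC's brute-force point fits in a few lines. This toy example "anonymizes" a 4-digit PIN with SHA-256 and then reverses it instantly by enumerating the space; phone numbers, MAC addresses, and ad IDs fail the same way at only slightly larger scale.

```python
import hashlib

def h(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

leaked = h("4271")  # the "non-identifiable" value a tracker shares

# Attacker precomputes the entire 10,000-value space once,
# then reverses any hash with a dictionary lookup.
rainbow = {h(f"{i:04d}"): f"{i:04d}" for i in range(10_000)}
print(rainbow[leaked])  # -> 4271
```

Salting raises the cost but does not help when the same party holds the salt and the match table, which is exactly the platform-matching scenario the FTC cases describe.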

On the user-side defense stack, Tor Browser 15.0 ships with a Firefox ESR 140 base, improved tab management and address bar on desktop, and an Android screen lock—plus a new unified Tor User Support portal to centralize documentation (more: https://blog.torproject.org/new-release-tor-browser-150/). Recent updates also include Arti 1.6.0 with circuit padding and side-channel mitigations, and Tails 7.0/7.1 based on Debian 13 with refreshed application versions (more: https://blog.torproject.org/new-release-tor-browser-150/).

Developers should mark their calendars: npm is rolling out security changes to authentication on October 13, 2025 (90-day token lifetime caps and TOTP 2FA restrictions), with classic tokens being revoked in November, so CI/CD pipelines need updates (more: https://www.npmjs.com/package/strange-loops) (more: https://www.npmjs.com/package/aidefencet). On the offensive research side, ShareHound maps SMB share permissions into BloodHound OpenGraphs, with multithreaded BFS discovery and customizable rule matching—plus ready-to-use Cypher queries to find write/full-control access and file types by extension (more: https://github.com/p0dalirius/ShareHound). It’s a crisp illustration of how graph representations make complex permission relationships tractable—and why defenders should assume adversaries have this lens too (more: https://github.com/p0dalirius/ShareHound).
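The discovery pass behind a tool like ShareHound can be sketched in miniature (hypothetical data model, single-threaded, not its code): walk the share tree breadth-first and collect every path where the current principal holds write access.

```python
from collections import deque

# Toy share tree: node -> (permissions held by the current principal, children).
TREE = {
    "\\\\srv\\public": ({"read"}, ["finance", "tools"]),
    "finance":          ({"read", "write"}, ["q3"]),
    "q3":               ({"read", "write"}, []),
    "tools":            ({"read"}, []),
}

def writable_paths(root: str) -> list[str]:
    """BFS over the share tree, collecting nodes with write access."""
    found, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        perms, children = TREE[node]
        if "write" in perms:
            found.append(node)
        queue.extend(children)
    return found

print(writable_paths("\\\\srv\\public"))  # -> ['finance', 'q3']
```

Loading those edges into a graph database is what turns ad hoc findings into reusable queries: "writable by group X, reachable from share Y" becomes a one-line Cypher pattern instead of a script per question.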

Compute limits and hacker culture

Regardless of model speedups, the physical world pushes back. US gas producers say there’s enough fuel for AI data centers, but a shortage of gas turbines could slow demand growth and raise costs, with utilities warning that pipeline constraints risk blackouts and price spikes (more: https://www.energyintel.com/00000199-e958-dd61-abb9-ebdf44e10000). Oil-field services firms are eyeing distributed data center power as an adjacent opportunity, while Southeast Asia’s LNG-to-power plans may feel similar turbine supply pressure (more: https://www.energyintel.com/00000199-e958-dd61-abb9-ebdf44e10000). The lesson: AI’s growth curve is increasingly gated by energy infrastructure, not just chips.

Meanwhile, hacker culture keeps hardware honest. Hackaday’s Supercon 2025 badge is intentionally customizable: the front is a second PCB with no electronics, solely to press the keyboard membrane—meaning attendees can replace it with CNC’d aluminum, laser-cut wood, or multi-material 3D prints using provided STEP/DXF/SVG files (more: https://hackaday.com/2025/10/27/the-supercon-2025-badge-is-built-to-be-customized/). Practical constraints are explicit—~2 mm spacing between boards, ~1.7 mm thickness near the keyboard, and M3 hardware to mount—so the platform remains hackable without becoming fragile (more: https://hackaday.com/2025/10/27/the-supercon-2025-badge-is-built-to-be-customized/). It’s a fitting counterpoint to the datacenter: local, tangible, modifiable, and fun.

Sources (20 articles)

  1. [Editorial] https://www.linkedin.com/posts/anthony-alcaraz-b80763155_ai-agents-cant-reason-without-semantic-structure-activity-7389222435906244608-QMMc (www.linkedin.com)
  2. [Editorial] https://www.npmjs.com/package/strange-loops (www.npmjs.com)
  3. [Editorial] https://github.com/claraverse-space/ClaraVerse (github.com)
  4. [Editorial] AIF? (www.npmjs.com)
  5. Optimizing gpt-oss-120B on AMD RX 6900 XT 16GB: Achieving 19 tokens/sec (www.reddit.com)
  6. OpenAI: gpt-oss-safeguard: two open-weight reasoning models built for safety classification (Now on Hugging Face) (www.reddit.com)
  7. Flamingo 3 released in safetensors (www.reddit.com)
  8. My LLM-powered text adventure needed a dynamic soundtrack, so I'm training a MIDI generation model to compose it on the fly. Here's a video of its progress so far. (www.reddit.com)
  9. Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs (www.reddit.com)
  10. You can now run Ollama models in Jan (www.reddit.com)
  11. Codex and Supabase (www.reddit.com)
  12. p0dalirius/ShareHound (github.com)
  13. Raezil/lattice-agent (github.com)
  14. FTC: No, hashing still doesn't make your data anonymous (www.ftc.gov)
  15. Tor Browser 15.0 (blog.torproject.org)
  16. US Gas Turbine Shortage Likely to Slow AI Demand Growth (www.energyintel.com)
  17. briaai/FIBO (huggingface.co)
  18. driaforall/mem-agent (huggingface.co)
  19. The Supercon 2025 Badge is Built to be Customized (hackaday.com)
  20. Memp: Exploring Agent Procedural Memory (arxiv.org)