Platform Security & Offense Tooling

Published on May 22, 2026

Today's AI news: Platform Security & Offense Tooling, Censorship Circumvention & Network Privacy, Open-Weight Model Proliferation, AI Industry Consolidation & Monetization, Agent Architectures & Human-AI Collaboration, Research & Hacker Craft. 22 sources curated from across the web.

Platform Security & Offense Tooling

GitHub disclosed it is investigating unauthorized access to its internal repositories, stating: "We currently have no evidence of impact to customer information stored outside of GitHub's internal repositories (such as our customers' enterprises, organizations, and repositories), we are closely monitoring our infrastructure for follow-on activity" (more: https://twitter.com/github/status/2056884788179726685). The language is deliberately scoped (internal repos only, no confirmed impact on customer data), but the public acknowledgment suggests the incident is non-trivial. Internal repositories at a platform hosting over 100 million developers typically hold source code, CI/CD pipeline definitions, and deployment configurations, and they can contain infrastructure secrets. Anyone who has done incident response knows the difference between "no evidence of impact" and "confirmed no impact": the former means the investigation is still running. The disclosure comes months after CVE-2026-3854, where a single git push could trigger unsandboxed code execution on GitHub storage nodes via the babeld proxy, giving access to millions of public and private repositories on compromised nodes. Taken together, the two incidents are starting to look like a pattern rather than a one-off. The phrase "monitoring our infrastructure for follow-on activity" reads as incident response language for "we found the footprints but we're still mapping how far the intruder went."

Meanwhile, the offensive security tooling ecosystem around AI coding assistants hit another milestone. A new Claude Code plugin called hackingtool-plugin packages 183 pentesting and OSINT tools into a natural-language skill interface, wrapping the established Z4nzu/hackingtool project (more: https://github.com/AKCodez/hackingtool-plugin). The inventory spans 20+ categories: information gathering (Amass, nmap, Subfinder, TruffleHog), web attack tools (Nuclei, ffuf, Gobuster, Katana), post-exploitation frameworks (Havoc C2, Sliver, PEASS-ng), Active Directory tooling (BloodHound, Impacket, NetExec), cloud security scanners (Prowler, Pacu, Trivy), and mobile security (Frida, MobSF). Of the 183 tools, 56 are classified as plug-and-play; the other 127 depend on the execution environment. Every invocation routes through a central ht_run.py script that auto-selects the right backend — native Bash, WSL, or purpose-built Docker images from ProjectDiscovery, TruffleSecurity, and Kali Linux registries.
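The backend auto-selection described above can be pictured with a small sketch. This is not the plugin's ht_run.py logic, which the source does not spell out: the preference order (native first, then WSL, then Docker), the function name pick_backend, and the injected which/system parameters are all assumptions made for illustration.

```python
import platform
import shutil
from typing import Callable, Optional


def pick_backend(
    tool: str,
    which: Callable[[str], Optional[str]] = shutil.which,
    system: Callable[[], str] = platform.system,
) -> Optional[str]:
    """Return a hypothetical backend label for running `tool`.

    Assumed order: a native binary on PATH wins, then WSL on Windows,
    then a Docker image. The real plugin may decide differently.
    """
    if which(tool):
        return "native"
    if system() == "Windows" and which("wsl"):
        return "wsl"
    if which("docker"):
        return "docker"
    return None  # nothing usable on this machine
```

Passing which and system as parameters keeps the decision testable without touching the host; in normal use the defaults query the real PATH and operating system.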

Users interact in plain English ("recon example.com," "scan my repo for vulnerabilities"), and Claude picks the tools and interprets output. This follows the exact trajectory Trail of Bits documented when they standardized on Claude Code for security research — 94 plugins, 201 tools, bug discovery jumping from 15/week to 200/week. Dedicated security plugins for AI coding assistants are becoming a distinct product category. The "authorized security testing only" disclaimer does the necessary legal work, but the barrier between authorized and unauthorized use is thin when the tooling is this accessible and the install is two commands.

Censorship Circumvention & Network Privacy

Two new open-source tools address the problem of knowing exactly how and where internet traffic is being interfered with. The more technically rigorous is rkn-block-checker, a Python CLI that diagnoses which layer of the network stack Russia's RKN/TSPU censorship infrastructure is targeting for each blocked domain (more: https://github.com/MayersScott/rkn-block-checker). The tool walks DNS → TCP → TLS → HTTP for each target, stopping at the first failure. This layer-by-layer approach distinguishes between DNS poisoning (the ISP's resolver lies), TCP reset (IP-level blackholing), TLS DPI on SNI (the middlebox reads the Server Name Indication from the ClientHello and tears the connection down), and HTTP stub pages (an ISP-served "blocked by Roskomnadzor" response).
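A minimal sketch of that stop-at-first-failure walk, using only the Python standard library. It is not rkn-block-checker's code: the function names, the port, the timeout, and the simplification that any exception counts as a failed layer are assumptions. Detecting an ISP stub page would also require inspecting the response body, which is omitted here.

```python
import http.client
import socket
import ssl
from typing import Callable, List, Optional, Tuple

TIMEOUT = 5  # seconds; an arbitrary choice for this sketch


def probe_dns(host: str) -> None:
    socket.getaddrinfo(host, 443)  # raises socket.gaierror on failure


def probe_tcp(host: str) -> None:
    socket.create_connection((host, 443), timeout=TIMEOUT).close()


def probe_tls(host: str) -> None:
    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=TIMEOUT) as raw:
        # server_hostname puts the SNI into the ClientHello, which is
        # the field an SNI-filtering middlebox would react to.
        with ctx.wrap_socket(raw, server_hostname=host):
            pass


def probe_http(host: str) -> None:
    conn = http.client.HTTPSConnection(host, timeout=TIMEOUT)
    try:
        conn.request("GET", "/")
        conn.getresponse().read()
    finally:
        conn.close()


Stage = Tuple[str, Callable[[str], None]]
DEFAULT_STAGES: List[Stage] = [
    ("DNS", probe_dns),
    ("TCP", probe_tcp),
    ("TLS", probe_tls),
    ("HTTP", probe_http),
]


def first_failing_layer(host: str, stages: Optional[List[Stage]] = None) -> Optional[str]:
    """Run the probes in order and return the name of the first one that fails."""
    for name, probe in (DEFAULT_STAGES if stages is None else stages):
        try:
            probe(host)
        except (OSError, http.client.HTTPException):
            # gaierror, SSLError, resets, and timeouts are all OSError subclasses.
            return name
    return None  # every layer answered
```

Calling first_failing_layer("example.com") performs real network probes; passing a custom stages list allows the ordering logic to be exercised offline.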

What elevates this above a simple connectivity checker is its calibrated confidence system. Each verdict carries a confidence level: HIGH when two independent signals agree (DNS poisoning confirmed via Cloudflare DoH comparison), MEDIUM when a known censorship pattern matches but server-side issues cannot be excluded, and LOW for ambiguous symptoms. The DNS comparison uses full address sets from the system resolver versus Cloudflare DoH, flagging rewriting only when the sets are completely disjoint — avoiding false positives on CDN-heavy sites that rotate A records between queries. Privacy is baked in: the default User-Agent mimics Chrome-on-Windows after a contributor correctly identified that the original self-identifying UA was unique enough to appear in VPN provider logs that regulators routinely request. No telemetry, no exfiltration — the only outbound connections are the probes themselves and the DoH control queries. In example runs, the tool classifies Instagram as TLS_BLOCK (DPI on SNI), ProtonVPN as DNS_BLOCK (poisoned at the resolver level), and Rutracker as HTTP_STUB (redirected to an ISP block page) — exactly the kind of granular diagnosis that tells a user which circumvention approach will actually work.
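The disjoint-set rule for DNS comparison is simple enough to sketch as a pure function. The name dns_verdict and the labels INCONCLUSIVE and DNS_OK are hypothetical; only DNS_BLOCK and the HIGH/LOW confidence levels come from the description above, and the handling of empty answers is an assumption.

```python
from typing import Iterable, Optional, Tuple


def dns_verdict(
    system_addrs: Iterable[str], doh_addrs: Iterable[str]
) -> Tuple[str, Optional[str]]:
    """Compare the system resolver's answers with a DoH control lookup.

    Rewriting is flagged only when the two address sets share nothing,
    so a CDN that rotates A records between queries is not reported
    as poisoned as long as at least one address overlaps.
    """
    system_set, control_set = set(system_addrs), set(doh_addrs)
    if not system_set or not control_set:
        # Assumption: a missing answer on either side is ambiguous.
        return ("INCONCLUSIVE", "LOW")
    if system_set.isdisjoint(control_set):
        return ("DNS_BLOCK", "HIGH")
    return ("DNS_OK", None)
```

The addresses used to exercise it can come from the documentation ranges, for example dns_verdict(["203.0.113.7"], ["198.51.100.1", "198.51.100.2"]) for the fully disjoint case.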

GooseRelayVPN's Android client takes a more direct route to circumvention, tunneling SOCKS5 traffic through Google-facing HTTPS endpoints via Apps Script — local traffic → SOCKS5 → AES-encrypted GooseRelay framing → HTTPS through Google infrastructure → VPS exit server (more: https://github.com/Hidden-Node/GooseRelayVPN-AndroidClient). By routing through Google's own endpoints, the traffic profile blends with legitimate API calls, making DPI classification harder. The Android implementation wraps this in a standard VpnService with profile-based configuration, split tunneling, and real-time telemetry — enough polish to suggest active deployment, not a weekend experiment.

Open-Weight Model Proliferation

The open-weight model ecosystem continues to absorb and redistribute frontier capabilities at accelerating speed. A new Hugging Face upload, Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled, makes the pattern explicit in its name: distill reasoning capabilities from Anthropic's flagship model into an open-weight Qwen variant that anyone can download and run locally (more: https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled). It joins a growing family of hybrid distillations — Qwopus3.5-9B-Coder-GGUF being another entry that blends Qwen and Opus-style reasoning into a compact coding-focused model (more: https://www.reddit.com/r/LocalLLaMA/comments/1tfin40/jackrongqwopus359bcodergguf_hugging_face/). When the US government flagged adversarial distillation in a policy memo earlier this year, warning that state actors could distill capabilities from frontier models into smaller, ungoverned ones, the concern was precisely this — except the distillation is happening openly, by hobbyists, on a public model hub. The policy conversation assumed nation-state adversaries; the reality is a college student with a Hugging Face account.

The throughput story is equally consequential. Multi-token prediction (MTP) is emerging as the single most impactful inference optimization for local deployments. Community benchmarks comparing next-token-prediction (NTP) and MTP quantizations of Qwen 3.6 35B across GPUs and CPUs are showing the kind of gains that change deployment economics; earlier results on AMD Strix Halo showed the MTP variant doubling decode throughput to 80 tokens/second (more: https://www.reddit.com/r/LocalLLaMA/comments/1tipihx/qwen_36_35b_gguf_ntp_vs_mtp_quantization_results/). The momentum is broad: Gemma 4 MTP support is now in active development (more: https://www.reddit.com/r/LocalLLaMA/comments/1tijpwl/wip_gemma_4_mtp/), and AMD's Lemonade inference engine has shipped v10.5.1 with MTP and a ROCm 7.13 quick start specifically targeting Strix Halo hardware (more: https://www.reddit.com/r/LocalLLaMA/comments/1th0z6k/lemonade_v1051_an_mtp_rocm_713_quick_start_for/). MTP is becoming table stakes rather than an experimental feature, and the quantization-plus-MTP combination means a 35B-parameter model can deliver interactive-speed inference on hardware that costs under $2,500.

The uncensored model cottage industry is keeping pace with new architectures. Two new Gemma 4 31B "heretic" variants — Ortenzya-The-Creative-Wordsmith (more: https://www.reddit.com/r/LocalLLaMA/comments/1tf52m2/gemma4ortenzyathecreativewordsmith31bituncensoredh/) and Gembrain (more: https://www.reddit.com/r/LocalLLaMA/comments/1tg7s7j/gemma4gembrain31bituncensoredheretic_is_out_now_a/) — join a growing lineup of guardrail-removed variants. Gemma 4 continues to abliterate cleanly, unlike some chain-of-thought architectures that resist the technique entirely. The practical consequence of all this shows up in developer behavior: a self-described newbie vibe coder documented shifting from Claude Sonnet 4.6 to Qwen3.6-35B-A3B locally (more: https://www.reddit.com/r/LocalLLaMA/comments/1ti1f2k/newbie_vibe_coding_experience_shifting_from/), joining a pattern where earlier reports showed developers routing 65% of coding tasks to local models with quality parity above 88%, dropping monthly API costs from $85 to $22. The migration from cloud to local is no longer just a privacy or cost argument — for a growing set of tasks, the quality gap has functionally closed.

AI Industry Consolidation & Monetization

Google has officially announced that ads will appear in AI Mode search results, deploying two new Gemini-generated ad formats: Conversational Discovery ads (the ad answers a user's specific question with tailored creative) and Highlighted Answers (ads inserted into AI Mode's recommendation lists) (more: https://blog.google/products/ads-commerce/google-marketing-live-search-ads/). Both formats feature what Google calls an "independent AI explainer" — Gemini synthesizes context about a product alongside the advertiser's creative, ostensibly for transparency. Beyond AI Mode, Google is launching AI-powered Shopping ads where Gemini writes custom product explainers, a Business Agent for Leads that replaces static lead forms with a Gemini chatbot inside the ad unit, and expanding the Direct Offers pilot with promotion bundling, native checkout via Universal Commerce Protocol, and travel integrations with Booking and Expedia. The 75% stat Google cites — people making "faster, more confident decisions" in AI Mode — becomes a liability metric when the mode starts blending editorial and sponsored content at the model level. An "independent AI explainer" generated by the same company selling the ad placement is a redefinition of independence that would make any compliance officer raise an eyebrow.

Mistral AI is making a different kind of bet, acquiring Emmi AI to build what it calls "the leading AI stack for Industrial Engineering" (more: https://www.emmi.ai/news/mistral-ai-acquires-emmi-ai). Emmi, founded in Linz, develops Physics AI models that accelerate industrial simulation across energy, automotive, semiconductors, and aerospace. The team of 30+ researchers and engineers joins Mistral's Science and Applied AI groups, and Linz becomes an official Mistral office alongside Paris, London, and San Francisco. Emmi had raised EUR 15 million — the largest seed round by an Austrian startup — and published AB-UPT, an architecture scaling neural surrogates for computational fluid dynamics to problems with 100M+ mesh cells. CEO Arthur Mensch called it a play for "manufacturers in high-stakes sectors like aerospace, automotive, or semiconductors." Chief Science Officer Guillaume Lample framed it more ambitiously: "By engineering the first comprehensive AI stack fueled by Physics AI, we are set to deliver real-time simulations and sophisticated digital twins." Emmi's published work includes NeuralDEM, an open-source deep learning alternative to CFD-DEM multiphysics simulations that runs at production scale in real time. The signal is clear: Mistral is positioning as a vertical industrial AI platform, not just another foundation model shop, and acquiring a team with peer-reviewed physics AI credentials gives the claim substance.

Google's developer tooling strategy also underwent a significant shift: Gemini CLI will stop serving requests on June 18, 2026, replaced by Antigravity CLI as part of the new Google Antigravity platform (more: https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/). The blog post frames the transition in terms of multi-agent reality: "your workflows have simply outgrown those early days of 2025" and developers "now require multiple agents communicating with each other." Antigravity CLI carries over Agent Skills, Hooks, Subagents, and Extensions (renamed to plugins), but free and Pro/Ultra users lose access to Gemini CLI entirely on June 18. The 100,000+ GitHub stars and 6,000+ merged PRs on Gemini CLI represent real community investment, and Google is asking that community to migrate on four weeks' notice, a timeline that will leave some shops scrambling.

Agent Architectures & Human-AI Collaboration

A research paper from Barcelona Supercomputing Center and NTT DATA introduces CIF (Collaborative Innovation Framework), a workflow architecture that addresses a real gap in HPC environments: how to include human oversight in compute-intensive AI workflows without wasting supercomputer time waiting for someone to review an intermediate result (more: https://arxiv.org/abs/2605.03743v1). CIF defines workflows declaratively in TOML configuration files, with human-in-the-loop (HITL) checkpoints treated as first-class workflow tasks. The innovation is that these checkpoints are asynchronous and non-blocking — when a workflow hits a human review point, only that branch pauses while other HPC jobs continue executing on SLURM-managed clusters. The architecture is modular: a TaskScheduler parses the TOML and resolves dependencies, a TaskExecutor handles I/O and environment setup across execution sites, a Watcher polls for completion via lightweight .done marker files, and a JobRegistry maintains global state across all tasks. The framework supports execution across tactical edge devices, HPC clusters, and cloud infrastructure, using Docker and Singularity containers for reproducibility.

The validation is concrete: ship detection in maritime surveillance, with inference running on a tactical field unit and model retraining on the MN5 supercomputer at BSC. When a supervisor rejected false negatives at a HITL checkpoint, CIF automatically scheduled retraining on MN5 without interrupting concurrent jobs. The retrained model detected multiple ships the baseline had missed. A systematic comparison against nine established workflow frameworks (Airflow, Luigi, Prefect, Nextflow, Snakemake, Argo, COMPSs, Pegasus, Kepler) found that CIF is the only one combining configuration-based specification, asynchronous HITL, and hybrid HPC execution. Funded by the European Defence Fund under the KOIOS Project, the framework is clearly aimed at military and security applications where human oversight is non-negotiable but compute budgets are tight.

In more practical territory, one LocalLLaMA user describes deploying a simple multi-agent architecture across an entire organization, keeping everything coordinated through local models (more: https://www.reddit.com/r/LocalLLaMA/comments/1thm9ek/simple_multiagent_architecture_running_across_our/). OpenMelon takes the agent concept in a creative direction — a Go-based terminal TUI that turns content production into an LLM tool-use loop with persistent character libraries, reference images, and creative continuity across sessions (more: https://github.com/eight-acres-lab/openmelon). The tool maintains character portraits across runs via a tag-based search system (grep, not vectors), supports multiple LLM providers (OpenRouter, OpenAI, Anthropic), and gates shell commands through three permission modes (strict, auto, trusted). Its creative continuity model — durable canon, episodes, assets, and retrieval — is designed for sustained projects rather than one-shot generation. Meanwhile, GPT 5.5 (Codex) is reportedly leading future prediction benchmarks, continuing OpenAI's push toward agentic code generation as a competitive differentiator (more: https://www.reddit.com/r/OpenAI/comments/1tevkye/gpt_55_codex_leading_the_future_prediction_race/).

Research & Hacker Craft

A researcher claims to have discovered a new semi-supervised learning algorithm by orchestrating four LLMs in a collaborative discovery process — using the models not as coding assistants but as independent reasoning agents converging on a novel method (more: https://www.reddit.com/r/learnmachinelearning/comments/1tj74qe/discovered_new_ssl_algorithm_with_help_of_4_llms/). The approach inverts the typical LLM-as-tool pattern: rather than asking one model to implement a known technique, multiple models are set up as a committee that explores, critiques, and refines ideas across the hypothesis space. If the claim holds up under peer review, it marks a shift from "AI accelerates known research" to "AI participates in discovery." On the local inference front, llama.cpp's webui has gained video file support via PR #22830, extending the project's multimodal capabilities beyond images into video input — territory previously reserved for cloud APIs (more: https://www.reddit.com/r/LocalLLaMA/comments/1tfdkji/webui_support_video_files_as_input_by_foldl_pull/).

The week's most technically dense artifact is a detailed writeup of 16 bytes of x86 real-mode DOS assembly that simultaneously generate a visual Matrix rain effect and drive the PC speaker with the same underlying computation (more: https://hellmood.111mb.de//wake_up_16b_writeup.html). Released at the Outline Demoparty in Ommen, the program uses int 10h to set up 40×25 text mode, points the data segment to text-mode video memory at segment 0xB800, then enters a tight loop: lodsb reads a byte, sub si, byte 57 shifts the pointer backward, xor [si], al computes the fractal, and out 61h, al bangs the result directly into the speaker port. The mathematics are remarkable: carry-free XOR addition produces a Sierpinski triangle via Lucas's theorem, and bit 1 of each cell maps to Rule 60 in elementary cellular automata. Because lodsb itself advances SI by one (with the direction flag clear), the sub 57 leaves a net step of 56 bytes per iteration, so the 16-bit pointer traverses 65,536 / gcd(56, 65,536) = 8,192 positions before wrapping. A 40-column text row occupies 80 bytes, so those positions land on only 10 distinct screen columns, creating the diagonal rain effect. The sound comes from the Sierpinski geometry itself plus whatever happens to be in the video ROM BIOS shadow memory, a "secret ingredient" that gives the output its distinctively gritty, punky character, where the program's environment literally shapes its voice.
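The two mathematical claims can be checked with a short Python sketch. It does not emulate the 16-byte program. It only verifies that XOR-ing each cell with its left neighbour (Rule 60) reproduces Pascal's triangle modulo 2 as predicted by Lucas's theorem, and it recomputes the pointer orbit. The net step of 56 assumes lodsb increments SI by one before the sub 57, the start offset 0x100 is the usual .COM entry value and is an assumption here, and 80 bytes per row follows from 40 cells of character plus attribute.

```python
def rule60_rows(width: int, steps: int) -> list:
    """Iterate Rule 60 (new cell = cell XOR left neighbour) from a single 1."""
    row = [1] + [0] * (width - 1)
    rows = [row]
    for _ in range(steps - 1):
        row = [row[i] ^ (row[i - 1] if i else 0) for i in range(width)]
        rows.append(row)
    return rows


def binomial_is_odd(n: int, k: int) -> bool:
    """Lucas's theorem mod 2: C(n, k) is odd iff every set bit of k is set in n."""
    return (n & k) == k


def pointer_orbit(start: int = 0x100, step: int = 56, modulus: int = 1 << 16) -> list:
    """Offsets visited by a 16-bit pointer that moves back `step` bytes per loop."""
    seen, pos = [], start
    while True:
        seen.append(pos)
        pos = (pos - step) % modulus
        if pos == start:
            return seen


rows = rule60_rows(16, 16)
matches = all(
    rows[n][k] == int(binomial_is_odd(n, k)) for n in range(16) for k in range(16)
)
offsets = pointer_orbit()
columns = {(offset % 80) // 2 for offset in offsets}  # 80 bytes per 40-column row

print(matches)        # True
print(len(offsets))   # 8192
print(len(columns))   # 10
```

Printing the rows with "#" for 1 and a space for 0 shows the familiar Sierpinski pattern, which is the structure the demo reuses for both picture and sound.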

Sources (22 articles)

GitHub is investigating unauthorized access to their internal repositories (twitter.com)
AKCodez/hackingtool-plugin (github.com)
MayersScott/rkn-block-checker (github.com)
Hidden-Node/GooseRelayVPN-AndroidClient (github.com)
lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled (huggingface.co)
Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face (reddit.com)
Qwen 3.6 35B GGUF: NTP vs MTP quantization results across GPUs and CPUs (reddit.com)
[WIP] Gemma 4 MTP (reddit.com)
Lemonade v10.5.1: an MTP + ROCm 7.13 quick start for Strix Halo (reddit.com)
gemma-4-Ortenzya-The-Creative-Wordsmith-31B-it-uncensored-heretic is Out Now (reddit.com)
Gemma-4-Gembrain-31B-it-uncensored-heretic Is Out Now (reddit.com)
Newbie vibe coding experience: Shifting from Claude Sonnet 4.6 to Qwen3.6-35B-A3B-UD-Q6_K (reddit.com)
Google officially announces that ads will be included in AI Mode search results (blog.google)
Mistral AI acquires Emmi AI (emmi.ai)
Gemini CLI will stop working from June 18, 2026 (developers.googleblog.com)
A Workflow-Oriented Framework for Asynchronous Human-AI Collaboration in Hybrid and Compute-Intensive HPC Environments (arxiv.org)
Simple Multi-Agent Architecture Running Across Our Entire Org. Keeping everything in Loop. (reddit.com)
eight-acres-lab/openmelon (github.com)
GPT 5.5 (Codex) leading the future prediction race (reddit.com)
Discovered new semi-supervised learning algorithm by orchestrating 4 LLMs (reddit.com)
webui: support video files as input by foldl · Pull Request #22830 · ggml-org/llama.cpp (reddit.com)
WriteUp: 16 Bytes of x86 that turn Matrix rain into sound (hellmood.111mb.de)