Applied AI Digest: Volume 3

April 1 – April 15, 2026

Launches & Tools

Diffity gives you a local GitHub-style PR review for agent-written code
If you open throwaway PRs just to see the diff, this is for you. Diffity runs a GitHub-style diff view locally, lets you leave inline comments with severity tags, and your coding agent can resolve them directly. Integrates with Claude Code and Cursor via slash commands.

Claude-skillify turns a coding session into a reusable skill
Install the plugin (previously available only inside Anthropic), run /skillify at the end of a session, and it interviews you about the workflow you just completed, then writes a SKILL.md that captures only the tools you actually used. That could explain some of the skill explosion inside Anthropic.

Anthropic launches Managed Agents for cloud-hosted AI workloads
Managed Agents handles the infrastructure nobody wants to build: sandboxed execution, credential management, checkpointing, multi-agent coordination. Now in public beta at standard token rates plus $0.08 per session-hour.
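To put the $0.08 runtime charge in perspective, here is a minimal Python sketch that computes only the session-hour fee. It assumes billing is linear in session time (the digest doesn't say how partial hours are rounded) and leaves out token charges entirely, since those depend on the model and on usage.

```python
from decimal import Decimal, ROUND_HALF_UP

# Runtime rate quoted in the digest. Token charges are billed separately at
# standard rates and are NOT modeled here.
SESSION_HOUR_RATE = Decimal("0.08")


def session_runtime_fee(hours: Decimal) -> Decimal:
    """Runtime-only fee in USD, rounded to the cent.

    Assumption: the fee scales linearly with session hours (no rounding up
    to whole hours, no minimums).
    """
    fee = SESSION_HOUR_RATE * hours
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(session_runtime_fee(Decimal("0.5")))  # half-hour session
    print(session_runtime_fee(Decimal("720")))  # 30 days, around the clock
```

Under that assumption, a half-hour session adds about $0.04 and an agent kept alive around the clock for 30 days (720 hours) adds $57.60, before tokens.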

Meta launches Muse Spark and rebrands its AI lab
Muse Spark is a multimodal model with visual chain-of-thought and tool use, trained on an order of magnitude less compute than Llama 4 Maverick. Meta also renamed its AI research division to Meta Superintelligence Labs. The self-reported benchmarks look strong, but we trust third-party evals over Meta's numbers.

Cohere open-sources a 2B transcription model that beats Whisper
Cohere Transcribe is a 2B-parameter Conformer encoder-decoder covering 14 languages, and it is now #1 on Hugging Face's Open ASR Leaderboard at 5.42% WER (word error rate). Weights are free on Hugging Face, an API is available, and it's small enough to run locally on a single GPU.
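For readers new to ASR metrics: WER is the word-level edit distance (substitutions + deletions + insertions) between a model's transcript and a reference transcript, divided by the number of reference words, so 5.42% is roughly five errors per hundred reference words. Below is a minimal Python sketch of that standard computation. Leaderboards typically normalize text (casing, punctuation) before scoring, which this version skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum word edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```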

Research & Reads

Anthropic's Mythos escaped its sandbox, then bragged about it online
During testing, an earlier Mythos version broke out of a secured container, gained internet access through a multi-step exploit, and posted the details to public websites — without being asked. Anthropic also found it reasoning about how to deceive graders without using its visible scratchpad. SWE-bench Verified: 93.9%.

Altman says ChatGPT is still a year away from starting a timer
Snarky headline aside, the real story is agent tool hallucination. LLMs in agentic loops confidently claim they've called tools that don't exist. A timer sounds trivial to implement, but the underlying problem is models fabricating actions and reporting success. If you're building agents, this is the failure mode to watch.
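One pragmatic defense is to treat the model's account of what it did as a claim to verify, not a fact. The sketch below is hypothetical (`ToolRegistry`, `ToolResult`, and `unverified_claims` are illustrative names, not from any agent framework): tools can only run through a registry that logs real executions, unknown tools fail loudly, and any tool the model says it called that never appears in the log gets flagged before the agent reports success.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ToolResult:
    ok: bool
    output: Any = None
    error: Optional[str] = None


class ToolRegistry:
    """Hypothetical registry: the only path through which a tool call can execute."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self.executed: List[str] = []  # names of tool calls that actually ran

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> ToolResult:
        fn = self._tools.get(name)
        if fn is None:
            # The model asked for a tool that doesn't exist: fail instead of pretending.
            return ToolResult(ok=False, error=f"unknown tool: {name}")
        try:
            output = fn(**kwargs)
        except Exception as exc:
            return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")
        self.executed.append(name)  # only successful executions are logged
        return ToolResult(ok=True, output=output)


def unverified_claims(claimed: List[str], executed: List[str]) -> List[str]:
    """Return each claimed tool call that has no matching entry in the execution log."""
    remaining = Counter(executed)
    missing = []
    for name in claimed:
        if remaining[name] > 0:
            remaining[name] -= 1  # each real execution backs at most one claim
        else:
            missing.append(name)
    return missing


if __name__ == "__main__":
    registry = ToolRegistry()
    registry.register("get_time", lambda: "12:00")
    print(registry.call("get_time"))
    print(registry.call("start_timer", seconds=60))
    # Suppose the model's final message claims it called both tools.
    print(unverified_claims(["get_time", "start_timer"], registry.executed))
```

Counting executions, not just checking names, also catches a model that claims to have called the same tool more times than it actually did.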

Mintlify replaced their sandbox with a virtual filesystem backed by a vector DB
Spinning up an isolated container for each assistant session cost Mintlify ~46 seconds per session and $70k/year. They replaced it with ChromaFs, which intercepts Unix commands and translates them into queries against their existing Chroma database.
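The digest doesn't describe ChromaFs internals, so here is a hypothetical Python sketch of the general idea: a thin shell that parses `ls`, `cat`, and `grep` and answers them from a document store keyed by file path, without ever touching a real filesystem. `InMemoryDocStore` is a stand-in for the Chroma collection (it does not use Chroma's API), and the supported command set is an assumption.

```python
import re
import shlex
from typing import Dict, List, Optional


class InMemoryDocStore:
    """Stand-in for a document database whose records are keyed by file path (hypothetical)."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self._docs = dict(docs)

    def get(self, path: str) -> Optional[str]:
        return self._docs.get(path)

    def paths_with_prefix(self, prefix: str) -> List[str]:
        return sorted(p for p in self._docs if p.startswith(prefix))


class VirtualShell:
    """Answers a few read-only Unix commands from the store; no container, no real filesystem."""

    def __init__(self, store: InMemoryDocStore) -> None:
        self.store = store

    def run(self, command: str) -> str:
        argv = shlex.split(command)
        if not argv:
            return ""
        name, args = argv[0], argv[1:]
        if name == "ls":
            return self._ls(args[0] if args else "/")
        if name == "cat":
            return "\n".join(self._cat(path) for path in args)
        if name == "grep":
            if len(args) != 2:
                return "usage: grep PATTERN PATH"
            return self._grep(args[0], args[1])
        return f"{name}: command not supported"

    def _ls(self, directory: str) -> str:
        # Derive the directory listing from path prefixes stored in the database.
        prefix = directory.rstrip("/") + "/"
        entries = set()
        for path in self.store.paths_with_prefix(prefix):
            head, sep, _ = path[len(prefix):].partition("/")
            entries.add(head + sep)  # a trailing "/" marks a subdirectory
        return "\n".join(sorted(entries))

    def _cat(self, path: str) -> str:
        doc = self.store.get(path)
        return doc if doc is not None else f"cat: {path}: No such file or directory"

    def _grep(self, pattern: str, root: str) -> str:
        regex = re.compile(pattern)
        # Search a single file if root names one, otherwise every file under root/.
        if self.store.get(root) is not None:
            paths = [root]
        else:
            paths = self.store.paths_with_prefix(root.rstrip("/") + "/")
        hits = []
        for path in paths:
            for line in (self.store.get(path) or "").splitlines():
                if regex.search(line):
                    hits.append(f"{path}:{line}")
        return "\n".join(hits)


if __name__ == "__main__":
    shell = VirtualShell(InMemoryDocStore({
        "/docs/intro.md": "Welcome\nInstall with npm",
        "/docs/api/auth.md": "Use an API key\nRotate keys often",
    }))
    for cmd in ["ls /docs", "cat /docs/intro.md", "grep key /docs", "rm -rf /"]:
        print(f"$ {cmd}\n{shell.run(cmd)}")
```

Because every command resolves to a lookup, there is no container to boot, and anything outside the supported read-only set (like `rm`) simply fails. A production version would presumably also lean on the database's vector search for semantic queries, which this sketch leaves out.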

Fractional AI · Applied AI Digest
