AI Security Roadmap

I want to know how agents break, and how to make them hold. Not from papers. From a real one I built and run.

The method is simple. Build a thing, then try to take it apart. I poke at my own coaching agent until it slips, work out why, fix it in the code, and write down what held and what did not. Then I check the finding against what the field has found, so I am not just grading my own work. When a defense holds by accident, I say so. The point is to find the ones that do not hold.

The OWASP Agentic Top 10 — goal hijack, tool misuse, memory poisoning, on down to rogue agents — is the frame I hold this work against. Each writeup below picks off one of those failure modes on a real agent I built and run.

The graph below is generated live from the writeups themselves. This page sits at the center. The hubs around it are the exploration categories, one per failure mode I keep pulling on: prompt injection, memory poisoning, tool misuse, cost exhaustion, and isolation, plus the method posts that tie them together. Every node hanging off a hub is a post that lands in that category, and where two posts reference each other you will see the edge drawn straight between them. Drag to pan, scroll to zoom, hover a node to trace its neighbors. It redraws itself every time I publish, so it always shows exactly where the work actually is.
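
A minimal sketch of how a graph like this could be put together, assuming each post carries a category and a list of the posts it references. The `Post` fields, `build_graph`, the `hub:` prefix, and the placeholder post titles are all hypothetical names for illustration. This is not the site's actual generator.

```python
from dataclasses import dataclass, field

ROOT = "AI Security Roadmap"  # this page, the center of the graph


@dataclass
class Post:
    title: str
    category: str  # assumed field: the hub this post lands in
    references: list[str] = field(default_factory=list)  # titles it links to


def build_graph(posts: list[Post]) -> tuple[set[str], set[tuple[str, str]]]:
    """Return (nodes, edges) for a center -> hub -> post graph."""
    nodes = {ROOT}
    edges = set()
    titles = {p.title for p in posts}
    for post in posts:
        hub = f"hub:{post.category}"
        nodes.update({hub, post.title})
        edges.add((ROOT, hub))        # center to category hub
        edges.add((hub, post.title))  # hub to the post that lands in it
        for ref in post.references:
            # Only draw edges to posts that exist, and skip self-links.
            if ref in titles and ref != post.title:
                # Sort the pair so A-B and B-A collapse into one edge.
                edges.add(tuple(sorted((post.title, ref))))
    return nodes, edges


# Placeholder posts, not real writeups.
posts = [
    Post("post-a", "prompt injection", ["post-b"]),
    Post("post-b", "memory poisoning", ["post-a"]),
    Post("post-c", "prompt injection"),
]
nodes, edges = build_graph(posts)
```

Rebuilding the whole graph from the post list on every publish, rather than patching it by hand, is what would keep a picture like this in step with the work.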

Jones Codes

The map

Graph View

Recent Posts

Every Model Could Already Do It

First Experience Using Antigravity

The Fix I Didn't Write

A Screenshot Is Not a Test