Self-improving Coding Agents

Source: DEV Community
In this article, I demonstrate how a more capable coding agent (Codex/GPT-5.4) refined the prompt for a less capable agent (GitHub Copilot CLI/GPT-5-mini) using a custom evaluation tool called Katt. This "Test-Driven Agentic Workflow" (TDAW) increased passing evals from 0/3 to 3/3, though it also significantly increased runtime and token usage. It shows that an agent can improve a prompt's effectiveness both by clarifying instructions and by fixing the evaluation process itself.

Why this article now?

I realized I haven't been very vocal about my AI research into coding agents and their ability to produce accurate outputs, even with low-capability models. I've spent the last 3 years experimenting with ML and AI, and 1.5 years with coding agents, using my engineering skills to build different harness strategies on top of them. With this recent experiment, I felt it was time to be more outspoken and share some of my learnings and journey with AI. So here co
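To make the TDAW idea concrete, here is a minimal sketch of an eval-driven prompt-refinement loop. This is not Katt's actual interface (the article doesn't show it); `run_evals`, `refine_loop`, and the case format are hypothetical stand-ins for "run the evals, hand the failures to the stronger agent, rerun until everything passes or we give up."

```python
# Hypothetical sketch of a test-driven agentic workflow (TDAW) loop.
# Katt's real API is not shown here; these names are illustrative only.

def run_evals(prompt, cases):
    """Return the subset of eval cases the prompt currently passes."""
    return [c for c in cases if c["check"](prompt)]

def refine_loop(prompt, cases, refine, max_rounds=5):
    """Rerun evals after each refinement until all pass or rounds run out.

    `refine` stands in for the stronger agent: it takes the current
    prompt plus the failing cases and returns an improved prompt.
    """
    for _ in range(max_rounds):
        passed = run_evals(prompt, cases)
        if len(passed) == len(cases):
            return prompt, len(passed)
        failed = [c for c in cases if c not in passed]
        prompt = refine(prompt, failed)
    return prompt, len(run_evals(prompt, cases))
```

In the article's experiment the stronger model (GPT-5.4) plays the `refine` role, and each round costs extra runtime and tokens, which is where the 0/3 → 3/3 improvement traded off against cost.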