Karpathy's Autoresearch Generalized as Agent Skill
Summary
Reza Rezvani generalized Karpathy's autoresearch pattern (one file, one metric, one loop) into an AgentSkills-spec skill that works across Claude Code, Codex CLI, and 9 other tools. The skill applies the modify-evaluate-keep/discard cycle to any measurable domain — API speed, prompt quality, headline CTR, bundle size — not just ML training.
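The modify-evaluate-keep/discard cycle can be sketched as a simple hill-climb: propose a change, score it against the one fixed metric, keep it only if the score improves. This is a minimal illustration, not the skill's actual code; `optimize`, `mutate`, and the toy metric are hypothetical names chosen for the example.

```python
import random

def optimize(state, mutate, evaluate, iterations=200):
    """One-metric loop: modify, evaluate, keep only improvements."""
    best_score = evaluate(state)
    for _ in range(iterations):
        candidate = mutate(state)      # modify (one file, in the skill)
        score = evaluate(candidate)    # evaluate (the single metric)
        if score > best_score:         # keep
            state, best_score = candidate, score
        # else: discard (the skill reverts via git instead)
    return state, best_score

# Usage: climb a toy metric whose maximum sits at x = 3.
random.seed(0)
best, best_score = optimize(
    0.0,
    mutate=lambda x: x + random.uniform(-1.0, 1.0),
    evaluate=lambda x: -(x - 3.0) ** 2,
)
```

The same loop shape applies whether `evaluate` measures API latency, bundle size, or an LLM judge's score, which is what makes the pattern domain-agnostic.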
Key Details
- Core constraint: single file scope, fixed time budget, git-as-memory, one metric
- Domain system: engineering, marketing, content, prompts, custom
- Evaluator separation: agent cannot modify evaluate.py — prevents self-gaming (alignment in miniature)
- LLM judge evaluators extend the loop to non-numeric domains (headline quality, prompt effectiveness)
- Strategy escalation: low-hanging fruit → systematic → structural → radical experiments
- Built as AgentSkills spec skill, not a platform-locked fork — works on 11 tools
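The evaluator-separation point above can be made concrete: the metric lives in a separate `evaluate.py` that the optimizing agent can run but not rewrite, so it cannot game its own objective. The sketch below is a hypothetical illustration (a brevity metric for headlines, read-only enforcement via file permissions); the real skill enforces the boundary through the agent harness's rules, not this exact mechanism.

```python
import os
import subprocess
import sys
import tempfile

# Toy stand-in for evaluate.py: scores a headline file by brevity.
EVALUATOR_SRC = """import sys
text = open(sys.argv[1]).read()
print(1.0 / (1.0 + len(text)))
"""

def make_readonly_evaluator(workdir):
    """Write evaluate.py, then drop write permission so the
    optimizer process cannot alter its own objective function."""
    path = os.path.join(workdir, "evaluate.py")
    with open(path, "w") as f:
        f.write(EVALUATOR_SRC)
    os.chmod(path, 0o444)  # read/execute only
    return path

def score(evaluator_path, artifact_path):
    """Invoke the evaluator as a separate process and parse its score."""
    out = subprocess.run(
        [sys.executable, evaluator_path, artifact_path],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())
```

Running the evaluator as a separate process with a fixed interface (path in, score out) is what lets the same loop swap in an LLM judge for non-numeric domains: only the scoring script changes, never the loop.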
Why Rolf Thinks This Matters
The one-file-one-metric-one-loop constraint is a reusable optimization primitive we should consider adopting. The evaluator separation principle (optimizer cannot modify the objective function) is elegant and directly applicable to any skill that modifies content it also evaluates. The pattern could inform how we approach iterative improvement of skills, prompts, and agent configurations.