SkillOpt: Microsoft's Framework for Training Agent Skills Like Neural Networks
Microsoft just released SkillOpt, and it crystallizes something agent builders have been converging on independently: the skill file is the highest-leverage optimization target in a frozen-model agent system.
The core insight is simple. If you can’t change model weights (and for most deployment scenarios, you can’t), the next best thing to optimize is the instructions you feed the model. SkillOpt formalizes this by treating a plain markdown file as a trainable parameter and applying the same optimization discipline used in weight training.
The Neural Network Analogy
The mapping is precise:
| Neural Network Training | SkillOpt Training |
|---|---|
| Weights | Skill document (markdown) |
| Gradients | Trajectory-derived add/delete/replace edits |
| Learning rate | Edit budget cap |
| Validation split | Held-out task set |
| Epochs | Optimization rounds with slow/meta updates |
A frozen model runs tasks with the current skill, producing scored trajectories. A separate optimizer model analyzes failures in minibatches, proposes structured edits, and ranks them under a budget. If the candidate skill improves performance on held-out validation, the edit is accepted. If not, it’s rejected and stored so the optimizer doesn’t repeat failed changes.
What Gets Deployed
The output is a single best_skill.md file, typically 300 to 2,000 tokens. No weight changes. No extra inference-time calls. No special runtime dependencies.
The learned rules read like guidelines a thoughtful engineer would write after a day of debugging the benchmark — except they were discovered automatically through systematic optimization.
The Numbers
Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt matches or beats every competitor on all 52 evaluated cells. The competition includes human-written skills, one-shot LLM generation, Trace2Skill, TextGrad, GEPA, and EvoSkill.
On GPT-5.5 specifically:
- +23.5 points over no-skill baseline in direct chat
- +24.8 points inside the Codex agentic loop
- +19.1 points inside Claude Code
Transfer experiments show optimized skills retain value when moved across model scales, between Codex and Claude Code environments, and to adjacent math benchmarks without retraining.
Two Teams, Same Conclusion
SkillOpt isn’t the first system to optimize skill files. The Hermes Agent independently built the same concept through a combination of skill_manage, Curator, and an optimization loop called GEPA that scores, mutates, and promotes skill documents across runs.
Different architectures, different teams, same underlying insight: when you can’t touch the weights, the skill document becomes the natural place to accumulate learned behavior.
This convergence suggests we’re seeing a real pattern emerge in agent development, not just isolated experiments.
Practical Implications
For agent builders, SkillOpt provides a reproducible way to turn task feedback into better instructions. Instead of manually iterating on prompts based on intuition, you can run an optimization loop that systematically explores the edit space.
For agent frameworks that already use skill files (OpenClaw/Clawdbot, Claude Code with CLAUDE.md, Codex with AGENTS.md), SkillOpt offers a potential path to automatic skill improvement. Run tasks, collect scores, let the optimizer propose edits, validate on held-out examples, commit improvements.
The lightweight output (a markdown file under 2K tokens) also means optimized skills are portable. Train once, deploy anywhere that accepts text instructions.
Getting Started
git clone https://github.com/microsoft/SkillOpt.git
cd SkillOpt
pip install -e .
Training requires:
- A benchmark config (SearchQA, ALFWorld, DocVQA, etc.)
- Data in train/val/test splits
- An LLM endpoint (Azure OpenAI, OpenAI, Anthropic, or local vLLM)
python scripts/train.py \
--config configs/searchqa/default.yaml \
--split_dir /path/to/your/data \
--optimizer_model gpt-5.5 \
--target_model gpt-5.5
The output lands in outputs/<run_name>/best_skill.md.
Links
- Paper: arxiv.org/abs/2605.23904
- GitHub: github.com/microsoft/SkillOpt
- Demo: YouTube walkthrough
The skill file just became a first-class citizen in the optimization loop. That’s worth paying attention to.