SkillOpt: Microsoft's Framework for Training Agent Skills Like Neural Networks

By Prahlad Menon Published 2026-05-30 3 min read

Microsoft just released SkillOpt, and it crystallizes something agent builders have been converging on independently: the skill file is the highest-leverage optimization target in a frozen-model agent system.

The core insight is simple. If you can’t change model weights (and for most deployment scenarios, you can’t), the next best thing to optimize is the instructions you feed the model. SkillOpt formalizes this by treating a plain markdown file as a trainable parameter and applying the same optimization discipline used in weight training.

The Neural Network Analogy

The mapping is precise:

Neural Network Training	SkillOpt Training
Weights	Skill document (markdown)
Gradients	Trajectory-derived add/delete/replace edits
Learning rate	Edit budget cap
Validation split	Held-out task set
Epochs	Optimization rounds with slow/meta updates

A frozen model runs tasks with the current skill, producing scored trajectories. A separate optimizer model analyzes failures in minibatches, proposes structured edits, and ranks them under a budget. If the candidate skill improves performance on held-out validation, the edit is accepted. If not, it’s rejected and stored so the optimizer doesn’t repeat failed changes.

What Gets Deployed

The output is a single best_skill.md file, typically 300 to 2,000 tokens. No weight changes. No extra inference-time calls. No special runtime dependencies.

The learned rules read like guidelines a thoughtful engineer would write after a day of debugging the benchmark — except they were discovered automatically through systematic optimization.

The Numbers

Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt matches or beats every competitor on all 52 evaluated cells. The competition includes human-written skills, one-shot LLM generation, Trace2Skill, TextGrad, GEPA, and EvoSkill.

On GPT-5.5 specifically:

+23.5 points over no-skill baseline in direct chat
+24.8 points inside the Codex agentic loop
+19.1 points inside Claude Code

Transfer experiments show optimized skills retain value when moved across model scales, between Codex and Claude Code environments, and to adjacent math benchmarks without retraining.

Two Teams, Same Conclusion

SkillOpt isn’t the first system to optimize skill files. The Hermes Agent independently built the same concept through a combination of skill_manage, Curator, and an optimization loop called GEPA that scores, mutates, and promotes skill documents across runs.

Different architectures, different teams, same underlying insight: when you can’t touch the weights, the skill document becomes the natural place to accumulate learned behavior.

This convergence suggests we’re seeing a real pattern emerge in agent development, not just isolated experiments.

Practical Implications

For agent builders, SkillOpt provides a reproducible way to turn task feedback into better instructions. Instead of manually iterating on prompts based on intuition, you can run an optimization loop that systematically explores the edit space.

For agent frameworks that already use skill files (OpenClaw/Clawdbot, Claude Code with CLAUDE.md, Codex with AGENTS.md), SkillOpt offers a potential path to automatic skill improvement. Run tasks, collect scores, let the optimizer propose edits, validate on held-out examples, commit improvements.

The lightweight output (a markdown file under 2K tokens) also means optimized skills are portable. Train once, deploy anywhere that accepts text instructions.

Getting Started

git clone https://github.com/microsoft/SkillOpt.git
cd SkillOpt
pip install -e .

Training requires:

A benchmark config (SearchQA, ALFWorld, DocVQA, etc.)
Data in train/val/test splits
An LLM endpoint (Azure OpenAI, OpenAI, Anthropic, or local vLLM)

python scripts/train.py \
  --config configs/searchqa/default.yaml \
  --split_dir /path/to/your/data \
  --optimizer_model gpt-5.5 \
  --target_model gpt-5.5

The output lands in outputs/<run_name>/best_skill.md.

Links

The skill file just became a first-class citizen in the optimization loop. That’s worth paying attention to.