Context Engineering: The Discipline That Separates Good AI Agents from Great Ones
A deep dive into Agent Skills for Context Engineering — the open-source toolkit cited in academic research that teaches you how to curate context windows like a professional AI engineer.
You’ve heard of prompt engineering. But the engineers shipping production AI agents in 2025 are talking about something deeper: context engineering.
The Agent-Skills-for-Context-Engineering repository by Muratcan Koylan is a comprehensive, open collection of structured skills — already cited in academic research from Peking University — that teaches the art and science of curating what goes into a model’s context window to maximize agent effectiveness.
Part 1: Foundations — The Mental Model
Prompt Engineering vs. Context Engineering
Most developers stop at prompt engineering: writing clever instructions to steer model behavior. That’s necessary, but it’s only one piece of the puzzle.
Context engineering is the discipline of managing everything the language model can attend to at inference time:
- System prompts
- Tool definitions
- Retrieved documents
- Message history
- Tool outputs
The fundamental constraint is not raw token capacity — it’s attention mechanics. As context length grows, models exhibit predictable degradation: the “lost-in-the-middle” phenomenon, U-shaped attention curves, and attention scarcity. The goal is finding the smallest high-signal set of tokens that maximizes the probability of the desired outcome.
Think of the model’s context window like a detective’s investigation board. A great detective (model) doesn’t pin every newspaper clipping they’ve ever read on the board — they curate only the most relevant evidence. Context engineering is the art of being that detective’s assistant.
What is an “Agent Skill”?
This repository implements the Agent Skills specification — a structured way to package guidance for AI agents. Each skill follows a standard format:
skill-name/
├── SKILL.md # Required: instructions + metadata
├── scripts/ # Optional: executable code
└── references/ # Optional: additional docs
The design follows progressive disclosure: at startup, an agent loads only skill names and descriptions. Full content loads only when a skill is activated for the relevant task. This keeps agents fast while giving them access to deep expertise on demand.
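To make progressive disclosure concrete, here is a minimal sketch. It assumes each SKILL.md opens with a flat `key: value` front matter block (between two `---` lines) containing `name` and `description`. The `SkillStub`, `index_skills`, and `activate` names are hypothetical and are not the repository’s or any agent’s actual loader.

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass
class SkillStub:
    """Lightweight metadata that stays in context from startup."""
    name: str
    description: str
    path: Path


def read_front_matter(skill_md: Path) -> dict[str, str]:
    """Parse flat `key: value` lines between the first two `---` markers.

    Assumption: the front matter is simple enough for line parsing;
    a real loader would likely use a YAML parser.
    """
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    meta: dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def index_skills(root: Path) -> list[SkillStub]:
    """Startup step: collect only names and descriptions."""
    stubs = []
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta = read_front_matter(skill_md)
        stubs.append(
            SkillStub(
                name=meta.get("name", skill_md.parent.name),
                description=meta.get("description", ""),
                path=skill_md,
            )
        )
    return stubs


def activate(stub: SkillStub) -> str:
    """Activation step: read the full instructions only when needed."""
    return stub.path.read_text(encoding="utf-8")


# Demo with a throwaway skill directory (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "context-compression"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "---\n"
        "name: context-compression\n"
        "description: Reduce tokens while preserving information\n"
        "---\n"
        "# Full instructions go here\n",
        encoding="utf-8",
    )
    stubs = index_skills(Path(tmp))   # cheap: metadata only
    full_text = activate(stubs[0])    # expensive: full skill body
```

The point of the split is that `index_skills` costs a few dozen tokens per skill, while `activate` is paid only for the skill the task actually needs.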
Part 2: The Investigation — Skill Architecture
13 Skills Across 5 Categories
The collection organizes 13 skills into a coherent learning path:
| Category | Skills |
|---|---|
| Foundational | context-fundamentals, context-degradation, context-compression |
| Architectural | multi-agent-patterns, memory-systems, tool-design, filesystem-context, hosted-agents |
| Operational | context-optimization, evaluation, advanced-evaluation |
| Development | project-development |
| Cognitive | bdi-mental-states |
Claude Code Plugin Integration
A notable feature of this collection is its first-class integration with Claude Code’s plugin marketplace:
# Register the marketplace
/plugin marketplace add muratcankoylan/Agent-Skills-for-Context-Engineering
# Install specific plugin bundles
/plugin install context-engineering-fundamentals@context-engineering-marketplace
/plugin install agent-architecture@context-engineering-marketplace
/plugin install agent-evaluation@context-engineering-marketplace
Skills auto-activate based on task context — no manual configuration required.
Part 3: The Diagnosis — What This Actually Does for Developers
3.1 The Anatomy of Context (from context-fundamentals)
The context-fundamentals skill breaks down what actually lives in a model’s context:
| Component | Characteristics |
|---|---|
| System prompts | Loaded once, persist throughout the session |
| Tool definitions | Serialized near the front of context; descriptions steer behavior |
| Retrieved documents | Just-in-time loaded via RAG |
| Message history | Grows linearly; dominates long-running tasks |
| Tool outputs | Can reach 83.9% of total context in agent trajectories |
The key insight: context must be treated as a finite resource with diminishing marginal returns. Every new token depletes the attention budget.
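One way to see where the budget goes is to measure each component’s share of the window. The sketch below is illustrative only: the 4-characters-per-token heuristic is a rough assumption (a real system would use the model’s tokenizer), and the component sizes are made up.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def context_shares(components: dict[str, str]) -> dict[str, float]:
    """Return each component's fraction of the estimated total tokens."""
    counts = {name: estimate_tokens(text) for name, text in components.items()}
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Made-up sizes for illustration: tool outputs dominate the window.
shares = context_shares({
    "system_prompt": "x" * 400,
    "tool_definitions": "x" * 800,
    "message_history": "x" * 2800,
    "tool_outputs": "x" * 12000,
})

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Tracking shares like this makes it obvious which component to compress or offload first.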
Practical example — organizing system prompts with clear section boundaries:
<BACKGROUND_INFORMATION>
You are a Python expert helping a development team.
Current project: Data processing pipeline in Python 3.9+
</BACKGROUND_INFORMATION>
<INSTRUCTIONS>
- Write clean, idiomatic Python code
- Include type hints for function signatures
- Add docstrings for public functions
</INSTRUCTIONS>
<TOOL_GUIDANCE>
Use bash for shell operations, python for code tasks.
File operations should use pathlib for cross-platform compatibility.
</TOOL_GUIDANCE>
<OUTPUT_DESCRIPTION>
Provide code blocks with syntax highlighting.
Explain non-obvious decisions in comments.
</OUTPUT_DESCRIPTION>
3.2 Context Degradation Patterns (from context-degradation)
This is where things get empirically fascinating. The skill documents 5 distinct failure modes:
1. Lost-in-the-Middle — Information in the center of context receives 10-40% lower recall accuracy compared to information at the start or end. This is not a bug; it’s a consequence of attention mechanics.
2. Context Poisoning — A hallucination or incorrect tool output enters context and compounds through repeated reference. Recovery requires truncating to before the poisoning point.
3. Context Distraction — Even a single irrelevant document reduces performance. The effect is not proportional; it’s a step function.
4. Context Confusion — When context contains multiple task types, the model may apply constraints from the wrong task.
5. Context Clash — Multiple correct but conflicting pieces of information create contradictory guidance.
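For the lost-in-the-middle pattern, one common mitigation is to put the most relevant material at the edges of the context. The sketch below assumes you already have a relevance score per document (how you compute it is up to your retriever); `place_at_edges` is a hypothetical helper, not something shipped in the skill.

```python
def place_at_edges(docs: list[tuple[str, float]]) -> list[str]:
    """Order documents so the highest-scoring ones sit at the start and end.

    Documents are ranked by score, then dealt alternately to the front
    and the back, which leaves the weakest ones in the middle.
    """
    ranked = sorted(docs, key=lambda d: d[1], reverse=True)
    front: list[str] = []
    back: list[str] = []
    for i, (text, _score) in enumerate(ranked):
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]


ordered = place_at_edges([
    ("a", 0.9), ("b", 0.2), ("c", 0.7), ("d", 0.4), ("e", 0.1),
])
print(ordered)  # ['a', 'd', 'e', 'b', 'c']
```

Here the two strongest documents (`a` and `c`) land first and last, and the weakest (`e`) ends up in the center where recall is lowest.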
Model-specific degradation thresholds (from the skill’s reference data):
| Model | Degradation Onset | Severe Degradation |
|---|---|---|
| GPT-5.2 | ~64K tokens | ~200K tokens |
| Claude Opus 4.5 | ~100K tokens | ~180K tokens |
| Claude Sonnet 4.5 | ~80K tokens | ~150K tokens |
| Gemini 3 Pro | ~500K tokens | ~800K tokens |
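These thresholds can be turned into a simple guard that warns before a session drifts into a risky range. The sketch below copies the approximate values from the table and treats them as hard cutoffs, which is a simplification; `degradation_risk` and its labels are hypothetical.

```python
# Approximate (onset, severe) thresholds in tokens, taken from the table above.
THRESHOLDS = {
    "GPT-5.2": (64_000, 200_000),
    "Claude Opus 4.5": (100_000, 180_000),
    "Claude Sonnet 4.5": (80_000, 150_000),
    "Gemini 3 Pro": (500_000, 800_000),
}


def degradation_risk(model: str, context_tokens: int) -> str:
    """Classify the current context size against the model's thresholds."""
    onset, severe = THRESHOLDS[model]
    if context_tokens >= severe:
        return "severe"
    if context_tokens >= onset:
        return "degrading"
    return "ok"


print(degradation_risk("Claude Sonnet 4.5", 90_000))  # degrading
```

A "degrading" result is a reasonable trigger for compaction or for handing the task to a fresh sub-agent.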
Practical mitigation — the Four-Bucket approach:
# Write: save context outside the window
scratchpad: "Write intermediate results to filesystem"
# Select: pull only relevant context
retrieval: "Filter documents before loading"
# Compress: reduce tokens while preserving info
summarization: "Replace verbose outputs with compact references"
# Isolate: split across sub-agents
partition: "Give each agent a fresh, focused context"
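As an illustration of the Write and Compress buckets working together, the sketch below offloads a long tool output to a scratch file and keeps only a compact reference in context. The function name, file naming scheme, and 200-character cutoff are all assumptions chosen for the example.

```python
from pathlib import Path
import hashlib
import tempfile


def offload_tool_output(output: str, scratch_dir: Path, max_chars: int = 200) -> str:
    """Keep short outputs inline; write long ones to disk and return a reference.

    The returned string is what would go into the context window.
    """
    if len(output) <= max_chars:
        return output
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
    path = scratch_dir / f"tool_output_{digest}.txt"
    path.write_text(output, encoding="utf-8")
    preview = output[:max_chars].rstrip()
    return f"[stored {len(output)} chars at {path.name}] preview: {preview}"


# Demo in a temporary scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    long_output = "row\n" * 1000
    reference = offload_tool_output(long_output, Path(tmp))
    stored_files = [p.name for p in Path(tmp).iterdir()]
    short_result = offload_tool_output("short", Path(tmp))
```

The agent can re-read the file later if it needs the details, so nothing is lost, but the window only carries the reference and a short preview.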
3.3 Multi-Agent Patterns (from multi-agent-patterns)
The multi-agent skill reveals a critical insight: sub-agents exist primarily to isolate context, not to simulate organizational roles.
Token economics reality:
| Architecture | Token Multiplier |
|---|---|
| Single agent chat | 1× |
| Single agent with tools | ~4× |
| Multi-agent system | ~15× |
Despite the cost, multi-agent approaches unlock parallelization. Research on the BrowseComp evaluation found that token usage explains 80% of performance variance, which supports the case that distributing work across agents with separate context windows can be worth the overhead.
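A quick back-of-the-envelope calculation shows what the multipliers mean for a budget. The multipliers come from the table above; the baseline token count and the price per million tokens are made-up inputs, not real pricing.

```python
# Approximate multipliers from the table above.
MULTIPLIERS = {
    "single_agent_chat": 1,
    "single_agent_with_tools": 4,
    "multi_agent_system": 15,
}


def estimate_cost(baseline_tokens: int, architecture: str,
                  usd_per_million_tokens: float) -> float:
    """Scale a chat-only baseline by the architecture's token multiplier."""
    tokens = baseline_tokens * MULTIPLIERS[architecture]
    return tokens * usd_per_million_tokens / 1_000_000


# Hypothetical inputs: a 20K-token chat baseline at $3 per million tokens.
for arch in MULTIPLIERS:
    print(f"{arch}: ${estimate_cost(20_000, arch, 3.0):.2f}")
```

Running the same task as a multi-agent system costs roughly 15 times the chat baseline under these assumptions, so the approach pays off mainly on tasks where the extra tokens buy a real quality or latency gain.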
The Telephone Game Problem — a critical pitfall in supervisor architectures:
LangGraph benchmarks found that supervisor architectures initially performed 50% worse than optimized versions because supervisors paraphrased sub-agent responses incorrectly. The fix is a forward_message tool that passes a sub-agent’s response through without rewriting it:
def forward_message(message: str, to_user: bool = True):
    """
    Forward sub-agent response directly to user without supervisor synthesis.
    Use when:
    - Sub-agent response is final and complete
    - Supervisor synthesis would lose important details
    - Response format must be preserved exactly
    """
    if to_user:
        return {"type": "direct_response", "content": message}
    return {"type": "supervisor_input", "content": message}
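To show how a supervisor might use this tool, here is a small usage sketch. It includes a compact copy of `forward_message` so it runs on its own; `supervisor_step` and its `is_final` flag are hypothetical and stand in for whatever routing logic your framework provides.

```python
def forward_message(message: str, to_user: bool = True) -> dict[str, str]:
    """Compact copy of the tool above so this sketch is self-contained."""
    kind = "direct_response" if to_user else "supervisor_input"
    return {"type": kind, "content": message}


def supervisor_step(sub_agent_reply: str, is_final: bool) -> str:
    """Pass final replies through verbatim; keep others for further work."""
    routed = forward_message(sub_agent_reply, to_user=is_final)
    if routed["type"] == "direct_response":
        # No paraphrasing: the user sees exactly what the sub-agent wrote.
        return routed["content"]
    # Non-final replies stay with the supervisor as input for the next step.
    return f"Supervisor notes for next step: {routed['content']}"


table = "| metric | value |\n|---|---|\n| recall | 0.91 |"
final_answer = supervisor_step(table, is_final=True)
interim = supervisor_step("partial findings", is_final=False)
```

Because the final branch returns the content untouched, formatting such as tables or code survives the hop through the supervisor.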
3.4 Real-World Examples Included
The repo ships 5 production-quality examples that demonstrate how skills combine in practice:
| Example | What It Demonstrates |
|---|---|
| digital-brain-skill | Personal OS for founders: 6 modules, 4 automation scripts, JSONL append-only memory |
| x-to-book-system | Multi-agent pipeline monitoring X accounts → generating daily synthesized books |
| llm-as-judge-skills | TypeScript LLM evaluation tools: 19 passing tests, pairwise comparison, bias mitigation |
| book-sft-pipeline | Fine-tune 8B model on any author’s style for $2 total cost |
| interleaved_thinking | Cognitive architecture demonstration |
Part 4: The Resolution — How to Use It
For Claude Code Users
The fastest path:
# 1. Register marketplace
/plugin marketplace add muratcankoylan/Agent-Skills-for-Context-Engineering
# 2. Install the bundle you need
/plugin install context-engineering-fundamentals@context-engineering-marketplace
# Includes: context-fundamentals, context-degradation, context-compression, context-optimization
Skills activate automatically when you use trigger phrases:
| Trigger phrase | Skill activated |
|---|---|
| "compress context" | context-compression |
| "implement LLM-as-judge" | advanced-evaluation |
| "design multi-agent system" | multi-agent-patterns |
| "build background agent" | hosted-agents |
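If you are wiring skills into a custom agent, the mapping above can be approximated with a simple lookup. This is an illustration only: literal substring matching is an assumption made for the example, and in practice the agent most likely decides based on skill descriptions rather than exact phrases. The `match_skills` helper is hypothetical.

```python
# Phrase-to-skill pairs taken from the table above (lower-cased for matching).
TRIGGERS = {
    "compress context": "context-compression",
    "implement llm-as-judge": "advanced-evaluation",
    "design multi-agent system": "multi-agent-patterns",
    "build background agent": "hosted-agents",
}


def match_skills(user_request: str) -> list[str]:
    """Return skills whose trigger phrase appears in the request."""
    text = user_request.lower()
    return [skill for phrase, skill in TRIGGERS.items() if phrase in text]


print(match_skills("Can you compress context before the next step?"))
# ['context-compression']
```

A lookup like this is brittle against rephrasing, which is one reason description-based activation tends to be preferred.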
For Cursor / Codex / Any IDE
Copy the relevant SKILL.md content into your .rules file or project-specific instructions folder. The skills are deliberately platform-agnostic.
For Custom Implementations
The skills are designed as extractable patterns. Pick a skill that addresses your current challenge, extract the design principles, and implement them in your agent framework.
Learning Path Recommendation
- Start with context-fundamentals: builds the mental model
- Study context-degradation: understand how things go wrong
- Apply context-compression and context-optimization: prevent problems proactively
- Expand to architectural skills based on your system needs
Final Mental Model
Context Engineering Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
WHAT: A structured collection of 13 agent skills for building
production-grade AI systems through context management
WHY: Context windows are finite. Attention is scarce.
More tokens ≠ better performance.
HOW: Progressive disclosure + Four-Bucket strategy
(Write → Select → Compress → Isolate)
KEY INSIGHT:
Sub-agents exist to isolate context, not simulate org charts.
Place critical info at context START or END, never middle.
A single irrelevant document degrades performance measurably.
IMPACT:
Cited in Peking University research on agentic skill evolution.
13 skills covering fundamentals → architecture → operations.
5 complete production examples with real code.
PLATFORMS: Claude Code · Cursor · Codex · Any agent framework
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Whether you’re building your first AI agent or optimizing a multi-agent system for production, this collection gives you the vocabulary, mental models, and concrete patterns to do context engineering right.
Repository: Agent-Skills-for-Context-Engineering
Author: Muratcan Koylan