# Skills — Teaching AI Agents Domain-Specific Engineering Capabilities
READER BEWARE: THE FOLLOWING WAS WRITTEN ENTIRELY BY AI WITHOUT HUMAN EDITING.
## Introduction
AI agents like GitHub Copilot are increasingly capable of handling complex engineering workflows — but raw capability isn’t enough. An agent that loads every piece of knowledge it has into every interaction is wasteful, slow, and prone to confusion. The practical answer to this problem is skills: reusable, named capability packages that the agent loads only when they’re relevant.
This post digs into what skills are, how to structure them, when to use them over instruction files, and how teams building on top of LangChain, OpenTelemetry, Kubernetes, and Terraform are already thinking in skill-shaped patterns — even if they don’t call them that yet.
## The Context Window Problem
Before explaining skills, it helps to understand the problem they solve.
An LLM’s context window is finite. Everything injected into it — instructions, tool definitions, memory, conversation history, code snippets — competes for the same limited budget. Injecting content that isn’t relevant to the current task doesn’t just waste tokens; it actively degrades the agent’s focus. A well-studied phenomenon in LLM research is that models tend to pay more attention to content near the beginning and end of their context window and less to content in the middle — often called the “lost in the middle” effect. If your context is full of irrelevant specialised knowledge, the relevant signal gets diluted.
Instruction files — always-on documents injected into every session — are the right tool for conventions that apply universally. But domain-specific runbooks, diagnostic workflows, and tool-specific procedures don’t apply universally. Loading a Kubernetes pod-failure diagnosis workflow into every Copilot session, including sessions where the developer is writing a CSS stylesheet, is exactly the kind of noise that degrades performance.
Skills are the solution. They stay dormant until the agent identifies a task that matches their domain, then they’re dynamically loaded and applied.
## What a Skill Is
A skill is a self-contained, named unit of agent capability. It typically contains:
- A description of what the skill does and when it applies
- A workflow — the ordered set of steps, commands, or reasoning the agent should follow
- Examples — concrete inputs and outputs that ground the agent’s behaviour in your specific environment
The agent’s orchestration layer uses the skill’s description to decide when to invoke it. When a match is detected — either through explicit invocation by the user, or through the agent’s own inference — the skill’s content is loaded into the active context and the agent follows the embedded workflow.
### Example Skill Directory Structure
```text
skills/
  kubernetes-debugging/
    skill.md            # Capability description and trigger conditions
    workflow.md         # Step-by-step diagnostic procedure
    examples/
      crashloop.md      # Example: CrashLoopBackOff diagnosis
      oom-killed.md     # Example: OOMKilled pod recovery
  terraform-drift/
    skill.md
    workflow.md
    examples/
      state-drift.md
  distributed-tracing/
    skill.md
    workflow.md
    examples/
      latency-spike.md
  performance-profiling/
    skill.md
    workflow.md
    examples/
      cpu-hotspot.md
```
The structure is intentionally flat and readable. Each skill is a directory. The skill author controls what goes in it; the agent’s loader controls when it’s pulled in.
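Because the layout is so regular, discovery is mechanical. Here is a sketch of how a loader might index that tree, assuming the directory layout shown above; `discover_skills` is a hypothetical helper, not part of any agent's real API.

```python
from pathlib import Path

def discover_skills(root: str) -> dict[str, str]:
    """Scan the skills/ tree and return {skill-name: skill.md contents}.
    Only skill.md is read at discovery time; workflow.md and examples/
    stay on disk until the skill is actually activated."""
    registry = {}
    for skill_file in Path(root).glob("*/skill.md"):
        registry[skill_file.parent.name] = skill_file.read_text()
    return registry
```

The returned map is the cheap, always-resident index; everything else loads on demand.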
## A Concrete Skill: Kubernetes Pod Failure Diagnosis
Here’s what the skill.md for a Kubernetes debugging skill looks like in practice.
**`skills/kubernetes-debugging/skill.md`**

```markdown
# Skill: Kubernetes Pod Failure Diagnosis

## When to Activate

Activate this skill when the user is investigating:

- Pods in CrashLoopBackOff, OOMKilled, or Pending state
- Failing readiness or liveness probes
- Nodes in NotReady state
- Deployment rollout failures
- Service connectivity issues

## Capabilities

- **Analyze pod logs**: Retrieve and parse logs from the failing pod and its previous restart
- **Inspect events**: Query namespace events ordered by timestamp to reconstruct the failure sequence
- **Detect crashloop causes**: Pattern-match log output against known crash signatures (OOM, missing config, failed health check, dependency timeout)
- **Inspect resource constraints**: Compare actual resource usage against requests and limits
- **Suggest remediation steps**: Propose targeted fixes based on the detected failure mode

## Dependencies

Requires `kubectl` access to the affected cluster. Assumes the user has at minimum `get` and `list` permissions on pods, events, and deployments in the target namespace.
```
**`skills/kubernetes-debugging/workflow.md`**

````markdown
# Kubernetes Pod Failure Diagnosis Workflow

## Step 1: Identify the Failing Pod

```bash
kubectl get pods -n <namespace> --field-selector=status.phase!=Running
```

Note the pod name, restart count, and current status.

## Step 2: Retrieve Current and Previous Logs

```bash
# Current container logs
kubectl logs <pod-name> -n <namespace> --tail=100

# Previous container logs (most useful for CrashLoopBackOff)
kubectl logs <pod-name> -n <namespace> --previous --tail=100
```

Look for: panic messages, OOM signals, missing environment variables, connection refused errors.

## Step 3: Inspect Events

```bash
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | grep <pod-name>
```

Events reveal: image pull failures, scheduling failures, probe failures, and node pressure evictions.

## Step 4: Check Resource Usage

```bash
kubectl top pod <pod-name> -n <namespace>
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Limits\|Requests"
```

If the pod was OOMKilled, the actual memory usage will be at or above the limit.

## Step 5: Correlate and Recommend

Based on findings from steps 1–4, produce a structured summary:

- Root cause: [one-sentence description]
- Evidence: [the specific log lines or events that support the diagnosis]
- Remediation: [the specific change to make — config value, resource limit, image tag]
- Verification: [the command to run to confirm the fix worked]
````
This is a workflow the agent can follow mechanically once it's loaded. Without the skill, the agent would have to reason from first principles each time — which is slower, less consistent, and more likely to miss your environment's specific patterns.
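The "detect crashloop causes" capability can itself be sketched as a signature table that feeds the Step 5 summary. The patterns and remediations below are illustrative placeholders, not a vetted catalogue; a real skill would carry signatures tuned to the team's own services and base images.

```python
import re

# Illustrative crash signatures: (regex, root cause, suggested remediation)
SIGNATURES = [
    (r"OOMKilled|Out of memory|memory cgroup", "OOMKilled",
     "Raise the container memory limit or reduce the workload's footprint"),
    (r"no such file or directory|config.*not found", "Missing configuration",
     "Check mounted ConfigMaps/Secrets and the paths the container expects"),
    (r"connection refused|dial tcp.*timeout", "Dependency unreachable",
     "Verify the upstream service and its Service/NetworkPolicy"),
]

def classify_crash(log_text: str) -> dict:
    """Map raw log output onto the structured summary fields from Step 5."""
    for pattern, cause, remediation in SIGNATURES:
        found = re.search(pattern, log_text, re.IGNORECASE)
        if found:
            return {
                "root_cause": cause,
                "evidence": found.group(0),
                "remediation": remediation,
                "verification": "kubectl get pods -n <namespace>; confirm restarts stop",
            }
    return {"root_cause": "Unknown", "evidence": "",
            "remediation": "Escalate to manual review", "verification": ""}
```

The point is not the regexes themselves but the shape: raw evidence in, structured diagnosis out, every time.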
---
## Open Source Patterns That Map to Skills
You don't need to design skill workflows from scratch. Open source projects have been accumulating exactly this kind of encoded domain knowledge for years. The trick is recognising where it lives and translating it into skill form.
### LangChain: Agent Tools and Chains
[LangChain](https://github.com/langchain-ai/langchain) formalises the concept of tools — discrete capabilities that an agent can invoke by name — and chains — ordered sequences of steps that compose those tools into workflows. This is structurally identical to the skill model.
A LangChain chain for document retrieval followed by synthesis looks like:
```python
from langchain.chains import RetrievalQA
from langchain.vectorstores import FAISS

# llm and vectorstore (a FAISS index) are assumed to be constructed elsewhere
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    chain_type="stuff",
)
```
Translated into a Copilot skill, this becomes a document-retrieval skill with a workflow that instructs the agent to search the knowledge base before attempting to answer domain-specific questions. The agent doesn’t need to know retrieval semantics on every request — only when the query pattern suggests the answer might live in a document store.
LangChain’s agent executor pattern is also instructive: the executor doesn’t load all tools into the model’s context simultaneously. It gives the model a list of tool descriptions and lets the model decide which tool to call. Skills work the same way — the orchestrator gives the agent a list of available skill names and descriptions, and the agent selects based on relevance.
### OpenTelemetry: Debugging Distributed Traces
OpenTelemetry collector pipelines generate structured telemetry — traces, metrics, and logs — that require specialised knowledge to interpret. Debugging a latency spike in a distributed system means correlating spans across multiple services, identifying the slowest segments, and tracing the call back to a specific code path.
A distributed-tracing skill encapsulates this workflow:
```markdown
# Skill: Distributed Trace Latency Analysis

## When to Activate

User is investigating:

- Request latency above SLO threshold
- Timeout errors in a specific service
- Unexplained P99 spikes in dashboards

## Workflow

1. Retrieve the trace ID from the error log or alert
2. Query the tracing backend (Jaeger/Zipkin/Tempo) for the full trace
3. Identify the slowest span by comparing duration to parent span budget
4. Inspect span attributes on the slowest segment for: DB query details, external HTTP calls, queue wait times
5. Cross-reference with metrics: did CPU, memory, or connection pool utilisation spike at the same timestamp?
6. Produce a ranked list of suspected causes with supporting evidence from the trace data
```
Without this skill, an agent asked to “debug this latency spike” would need to reason about distributed tracing concepts from scratch. With it, the agent has a procedure to follow and knows what to look for.
### Kubernetes: Troubleshooting Playbooks
The kubectl project and its surrounding ecosystem have accumulated extensive troubleshooting playbooks. The Kubernetes documentation alone contains detailed diagnostic procedures for dozens of failure modes. These are exactly the kind of domain knowledge that belongs in a skill rather than an instruction file.
A cluster-diagnostics skill might package:
- Node health checks (`kubectl get nodes`, `kubectl describe node`)
- Control plane component status
- etcd health probes
- Certificate expiry checks
- Network policy conflict detection
Each of these is a discrete sub-workflow. The skill’s workflow.md orchestrates them into a coherent diagnostic runbook that the agent can execute against a real cluster.
The key insight from Kubernetes tooling is that good diagnostic workflows are ordered — you check the obvious things first (is the node ready?) before moving to the exotic ones (is the CNI plugin healthy?). Encoding that order into a skill means the agent produces systematic diagnostics rather than ad hoc guesses.
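That ordering can be encoded mechanically. The sketch below is a hypothetical runner where the check names and stub lambdas stand in for real `kubectl` probes; the useful property is that diagnosis stops at the first failing layer instead of guessing at random.

```python
def run_ordered_checks(checks):
    """Run (name, check_fn) pairs in order and return the name of the
    first failing check, or None if every layer is healthy."""
    for name, check in checks:
        if not check():
            return name  # the first failing layer is where to dig
    return None

# Hypothetical stand-ins for real cluster probes, ordered obvious-first
checks = [
    ("node ready", lambda: True),
    ("pod scheduled", lambda: False),   # fails first, so diagnosis stops here
    ("cni healthy", lambda: True),
]
first_failure = run_ordered_checks(checks)
```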
### Terraform: Infrastructure Drift Detection
Terraform drift — the divergence between what your state file says exists and what actually exists in the cloud — is a common source of subtle infrastructure bugs. The workflow for diagnosing it is well-defined:
```bash
# Check for drift: exit code 2 means the plan contains pending changes
terraform plan -detailed-exitcode

# Preview drift without modifying infrastructure or state
# (the modern replacement for the deprecated `terraform refresh`)
terraform plan -refresh-only

# Identify the specific drifted resources from a saved plan
terraform plan -out=tfplan
terraform show -json tfplan | jq -r '.resource_changes[] | select(.change.actions != ["no-op"]) | .address'
```
A terraform-drift skill packages this procedure, adds your organisation’s specific provider configurations, and tells the agent how to interpret the output in the context of your infrastructure topology. It can also encode your team’s policy on how to resolve drift — whether that’s terraform apply to reconcile to code, a manual cloud console fix, or an incident ticket for review.
Terraform’s own documentation on drift detection is a ready-made workflow that can be adapted almost directly into a skill.
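A skill can also interpret the plan output programmatically rather than via `jq`. This sketch reads the `resource_changes` array from Terraform's JSON plan representation; the sample plan below is a fabricated illustration, not output from a real run.

```python
import json

def drifted_addresses(plan_json: str) -> list[str]:
    """Return the addresses of resources with pending changes in the output
    of `terraform show -json tfplan`. A resource whose only action is
    "no-op" matches its configuration and is skipped."""
    plan = json.loads(plan_json)
    return [rc["address"] for rc in plan.get("resource_changes", [])
            if rc["change"]["actions"] != ["no-op"]]

# Fabricated plan excerpt for illustration only
sample = json.dumps({"resource_changes": [
    {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}},
    {"address": "aws_instance.web", "change": {"actions": ["update"]}},
]})
```

The workflow step that follows, applying your team's drift policy, would branch on this list.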
## Skills vs. Instruction Files: When to Use Which
The decision between a skill and an instruction file comes down to one question: does this knowledge apply to every task, or only some tasks?
| Characteristic | Instruction File | Skill |
|---|---|---|
| Loading | Always loaded | Loaded on demand |
| Scope | Repository-wide conventions | Domain-specific workflows |
| Content | What to do / not do generally | How to do a specific thing |
| Best for | Language conventions, architecture rules, team norms | Diagnostic runbooks, generation workflows, tool-specific procedures |
| Cost | Always consumes context tokens | Only consumes tokens when activated |
| Example | “Use slog for structured logging in Go” | “Kubernetes pod failure diagnosis workflow” |
The failure mode for over-using instruction files is context bloat: every session carries the weight of specialised knowledge that’s irrelevant to the task at hand. The failure mode for over-using skills is missed activation: the agent doesn’t recognise that a skill is relevant and proceeds without it.
The remedy for missed activation is precise trigger descriptions in your skill.md. Be specific about the conditions under which the skill applies. “Activate when the user mentions Kubernetes” is too broad. “Activate when the user is investigating a pod in CrashLoopBackOff, OOMKilled, or Pending state” is appropriately precise.
A practical rule of thumb:
- If you’d want the agent to know it on more than 70% of tasks → instruction file
- If you’d want the agent to know it on fewer than 30% of tasks → skill
- If it’s somewhere in between → write it as a skill with broad trigger conditions, and monitor whether the agent is activating it too rarely or too often
## Preventing Context Bloat with Skills
Context bloat is a real performance problem, not just an aesthetic one. An agent with a bloated context is slower (more tokens to process), more expensive (token costs scale with context length), and less accurate (relevant information gets buried in noise).
Skills prevent context bloat through a simple mechanism: lazy loading. The agent maintains a registry of skill names and their one-paragraph descriptions. This registry is cheap — a few hundred tokens at most. When the agent determines a skill is relevant, it loads that skill’s full content. When it moves on to a different task, the skill’s content drops out of the active context.
This is analogous to how a good software engineer handles specialised knowledge. You don’t memorise every detail of every library you might ever need. You know enough to recognise when a library is relevant, then you read the documentation. Skills give the agent the same two-level structure: shallow awareness across many domains, deep knowledge within the active domain.
The compound benefit appears in long-running sessions. Without skills, a session that touches Kubernetes, Terraform, and distributed tracing in sequence would accumulate all three bodies of specialised knowledge in the context simultaneously. With skills, each body of knowledge is loaded when needed and can be evicted when no longer relevant, keeping the active context lean throughout.
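The load-and-evict lifecycle can be sketched as a two-level registry. `SkillRegistry` is a hypothetical illustration built on the directory layout from earlier, and the word count is a crude proxy for tokens; real systems would use a proper tokenizer and the agent runtime's own context management.

```python
from pathlib import Path

class SkillRegistry:
    """Two-level store: descriptions always resident, bodies loaded lazily."""

    def __init__(self, root: str):
        self.root = Path(root)
        # Cheap always-resident index: name -> first line of skill.md
        self.index = {p.parent.name: p.read_text().splitlines()[0]
                      for p in self.root.glob("*/skill.md")}
        self.active: dict[str, str] = {}  # name -> full workflow text

    def activate(self, name: str) -> str:
        """Pull the full workflow into the active context on demand."""
        if name not in self.active:
            self.active[name] = (self.root / name / "workflow.md").read_text()
        return self.active[name]

    def evict(self, name: str) -> None:
        """Drop a skill's body once the task moves on."""
        self.active.pop(name, None)

    def context_tokens(self) -> int:
        """Rough proxy: whitespace-separated words currently resident."""
        resident = list(self.index.values()) + list(self.active.values())
        return sum(len(text.split()) for text in resident)
```

After `evict`, the context cost falls back to the cheap index, which is what keeps long multi-domain sessions lean.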
## Building Your First Skill
The minimal viable skill is three files: a skill.md with a description and trigger conditions, a workflow.md with the step-by-step procedure, and at least one example. Start there, get the agent to use it on a real task, and observe where it fails.
Common iteration patterns:
- **The agent isn't activating the skill.** Your trigger conditions are too narrow. Add more synonyms for the failure mode, more entry-point phrasings, more ways the user might describe the problem.
- **The agent activates the skill for irrelevant tasks.** Your trigger conditions are too broad. Be more specific about the context: require a specific tool, a specific error type, or a specific environment.
- **The agent follows the workflow but produces wrong results.** Your workflow is missing environment-specific context. Add details about your infrastructure topology, your specific tool versions, or your team's naming conventions.
- **The agent skips steps.** Add explicit ordering language ("Before doing X, always do Y first") and explain why the order matters. Agents are more likely to follow ordered procedures when they understand the reasoning behind the order.
The investment in building a good skill pays dividends every time the agent diagnoses a pod failure, detects infrastructure drift, or profiles a performance regression without you having to walk it through the procedure from scratch.
## Conclusion
Skills are the mechanism that makes AI agents practically useful for complex engineering workflows. They solve the context bloat problem that instruction files create when overloaded with domain-specific knowledge. They encode the kind of systematic, repeatable procedures that experienced engineers follow instinctively — and make that knowledge available to every member of your team, consistently, on demand.
Start with the workflows you repeat most often. The Kubernetes debugging runbook you follow every incident. The Terraform drift check before every production deployment. The distributed trace correlation procedure you walk through every latency spike. Package each one as a skill, tune the trigger conditions until the activation rate feels right, and let the agent carry the procedure so you can focus on the interpretation.
The best skills are the ones you stop noticing — because the agent just handles the diagnostic correctly, every time, without being asked.