Beyond Localhost: Building a Secure Kubernetes Sandbox for AI Coding Agents with Eclipse Che and GitHub Copilot

READER BEWARE: THE FOLLOWING WRITTEN MOSTLY BY AI WITH LIMITED HUMAN EDITING TO GET ALTERNATIVES DEFINED.

Introduction

Running an autonomous AI coding agent on your laptop is a liability. The agent has unrestricted access to your filesystem, your shell history, your cloud credentials, and your network interfaces. A misunderstood instruction, a malformed code-generation loop, or a subtle prompt-injection attack in a dependency’s README can cause real damage — and you won’t know until after it’s happened.

Local environments also suffer from configuration drift. The first time you run the agent it works perfectly. Three weeks later, after a brew upgrade and a Node version manager update, the agent silently starts generating subtly wrong code because the Ruby version it thinks it’s targeting is no longer the one installed. Reproducing the original environment is now a debugging exercise in its own right.

The fix isn’t to make the agent more careful. The fix is to give it a dedicated, disposable, hermetically sealed environment — one where the worst case is destroying a container, not corrupting a developer’s machine.

Eclipse Che is the right tool for this job. It is a Kubernetes-native development environment server that provisions isolated, spec-driven workspaces on demand. Each workspace is a Pod with a defined container image, resource limits, mounted secrets, and a running Code-OSS (VS Code) server. The workspace spec is a plain YAML file — a Devfile — that lives in version control alongside your code. When the AI breaks something, you delete the Pod and re-provision an identical one in minutes.

Pair that with GitHub Copilot’s Agent Mode running headless inside the workspace, and you have an architecture where the AI can build, test, and iterate on Ruby code inside a reproducible Kubernetes sandbox — without touching anything outside it.


The Architecture Overview

The three layers of the system interact as follows:

┌─────────────────────────────────────────────────────────────────┐
│                      Orchestrator / Agent Framework             │
│  (LLM + tool-calling loop; runs outside the cluster)            │
│                                                                 │
│  tools: { kubectl_exec, che_api, workspace_lifecycle }          │
└───────────────────────────┬─────────────────────────────────────┘
                            │  REST / WebSocket
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Eclipse Che Server                         │
│  (Kubernetes Operator; manages workspace lifecycle)             │
│                                                                 │
│  Devfile Registry ──► DevWorkspace CR ──► Pod Scheduling        │
└───────────────────────────┬─────────────────────────────────────┘
                            │  Pod API / exec pipe
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│               Target Container Workspace (Pod)                  │
│                                                                 │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────┐  │
│  │  ruby-dev        │  │  Code-OSS server │  │  Copilot     │  │
│  │  container       │  │  (port 3000)     │  │  Extension   │  │
│  │  Ruby 3.3        │  │                  │  │  (headless)  │  │
│  │  bundler, rake   │  │  ruby-lsp        │  │              │  │
│  └──────────────────┘  └──────────────────┘  └──────────────┘  │
│                                                                 │
│  Source code volume ──► /projects/app                           │
│  GITHUB_TOKEN ──────► injected from K8s Secret                  │
└─────────────────────────────────────────────────────────────────┘

The Orchestrator is your agent framework — think LangChain, AutoGen, or a custom tool-calling loop. It sits outside the cluster and communicates with the workspace via the Che API for lifecycle management and via kubectl exec (or the Che websocket terminal) for command execution. The orchestrator never touches the production cluster directly; it can only reach what is reachable from inside the workspace pod.

Eclipse Che Server translates a Devfile declaration into a running DevWorkspace custom resource. The Che Operator schedules the required containers, injects the Code-OSS server sidecar, mounts secrets, and exposes the HTTP endpoint. The orchestrator calls Che’s REST API to create, start, stop, and delete workspaces programmatically — no human clicks required.

The Target Workspace is the AI’s entire world. It has bounded CPU and memory (enforced by Kubernetes resource limits), network access scoped to what your NetworkPolicy allows, and a clean Ruby 3.3 environment rebuilt from scratch every time it is provisioned. The VS Code server running on port 3000 means a human can connect their desktop VS Code to the exact same environment the agent is working in — a genuinely useful debugging capability.
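
As a rough illustration of the network scoping mentioned above, the sketch below builds an egress-only NetworkPolicy manifest as a plain Python dict. The function name, the policy name, the pod label, and the CIDR in the usage note are assumptions made for illustration; none of them come from Che itself. kubectl accepts JSON manifests, so the result can be serialised with json.dumps and piped to kubectl apply -f -.

```python
import json


def egress_policy(namespace: str, pod_labels: dict[str, str],
                  allowed_cidrs: list[str], port: int = 443) -> dict:
    """Build a NetworkPolicy allowing only DNS plus one TCP port to given CIDRs."""
    if not allowed_cidrs:
        # An empty "to" list would match every destination, which is the
        # opposite of what this helper is for.
        raise ValueError("allowed_cidrs must not be empty")
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # "agent-workspace-egress" is a hypothetical name.
        "metadata": {"name": "agent-workspace-egress", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Egress"],
            "egress": [
                # DNS lookups on port 53, any destination.
                {"ports": [{"protocol": "UDP", "port": 53},
                           {"protocol": "TCP", "port": 53}]},
                # The chosen TCP port, restricted to the listed ranges.
                {"to": [{"ipBlock": {"cidr": cidr}} for cidr in allowed_cidrs],
                 "ports": [{"protocol": "TCP", "port": port}]},
            ],
        },
    }
```

Calling egress_policy("eclipse-che", {"app": "agent-workspace"}, ["203.0.113.0/24"]) yields a manifest that blocks all other outbound traffic from matching pods, provided the cluster’s network plugin enforces NetworkPolicy.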


Step-by-Step Implementation Guide

Step 1: Provision the Workspace with a Devfile

The Devfile is the single source of truth for the workspace. Commit it to your repository root as devfile.yaml.

schemaVersion: 2.2.0

metadata:
  name: ruby-ai-agent-workspace
  version: 1.0.0

components:
  - name: ruby-dev
    container:
      image: quay.io/devfile/ruby:3.3
      memoryLimit: 2Gi
      memoryRequest: 512Mi
      cpuLimit: "2"
      cpuRequest: "500m"
      mountSources: true
      endpoints:
        - name: http-app
          exposure: public
          protocol: http
          targetPort: 3000
      env:
        - name: GITHUB_TOKEN
          valueFrom:
            secretKeyRef:
              name: copilot-token
              key: token
        - name: BUNDLE_PATH
          value: /home/user/.bundle

  - name: ruby-lsp
    plugin:
      registry: https://open-vsx.org/api
      id: shopify/ruby-lsp
      version: latest

  - name: github-copilot
    plugin:
      registry: https://open-vsx.org/api
      id: github/copilot
      version: latest

commands:
  - id: install-deps
    exec:
      label: Install Ruby dependencies
      component: ruby-dev
      workingDir: /projects/app
      commandLine: bundle install
      group:
        kind: build
        isDefault: true

  - id: run-tests
    exec:
      label: Run test suite
      component: ruby-dev
      workingDir: /projects/app
      commandLine: bundle exec rake test
      group:
        kind: test
        isDefault: true

  - id: start-server
    exec:
      label: Start application server
      component: ruby-dev
      workingDir: /projects/app
      commandLine: bundle exec ruby app.rb -p 3000
      group:
        kind: run
        isDefault: true

A few design decisions worth explaining:

  • memoryLimit: 2Gi and cpuLimit: "2" are hard limits that Kubernetes enforces through cgroups. CPU use above the limit is throttled, and a container that exceeds its memory limit is OOM-killed, so a runaway loop { fork } or an accidental while true in agent-generated code cannot starve the node.
  • mountSources: true tells Che to mount a persistent volume at /projects. The agent’s work survives pod restarts without requiring an explicit PVC definition.
  • GITHUB_TOKEN comes from a Secret, as detailed in Step 3 below. Never hardcode tokens in the Devfile.
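
Because the resource limits carry the safety argument, it may be worth checking them mechanically before a Devfile is accepted. The following is a minimal sketch of such a check, assuming the Devfile has already been parsed into a dict (for example with a YAML library, not shown here). The function name is hypothetical.

```python
def containers_missing_limits(devfile: dict) -> list[str]:
    """Return names of container components lacking memoryLimit or cpuLimit."""
    missing = []
    for component in devfile.get("components", []):
        container = component.get("container")
        if container is None:
            continue  # plugin, volume, or other non-container component
        if not container.get("memoryLimit") or not container.get("cpuLimit"):
            missing.append(component.get("name", "<unnamed>"))
    return missing
```

A CI job could fail the build whenever the returned list is non-empty, so that an unbounded container never reaches the cluster.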

Apply the workspace to your Che instance via the API:

# CHE_TOKEN is a Bearer token obtained from your Che OIDC provider
# (e.g. `chectl auth:login` writes it to ~/.config/chectl/config.json)
curl -X POST https://che.example.com/api/workspace \
  -H "Authorization: Bearer ${CHE_TOKEN}" \
  -H "Content-Type: application/x-yaml" \
  --data-binary @devfile.yaml

Or use the chectl CLI:

chectl workspace:create --devfile=devfile.yaml
chectl workspace:start <workspace-id>

Step 2: Agent-to-Workspace Interactivity

Once the workspace Pod is running, the orchestrator issues commands by piping through kubectl exec. This is equivalent to the agent having a terminal inside the container without any additional infrastructure.

Here is a conceptual Python snippet showing how an agent tool wraps kubectl exec:

import subprocess
from typing import Any, Optional

NAMESPACE = "eclipse-che"

def run_in_workspace(
    pod_name: str,
    command: list[str],
    container: str = "ruby-dev",
    timeout: int = 120,
    workdir: Optional[str] = None,
) -> dict[str, Any]:
    """
    Execute a command inside a Che workspace container.

    kubectl exec has no working-directory flag, so when workdir is given
    the command is wrapped in a small sh script that changes directory first.

    Returns stdout, stderr, and the exit code so the orchestrator
    can decide whether to retry, abort, or continue.
    """
    if workdir is not None:
        # "$1" is workdir; after shift, "$@" is the original command.
        command = ["sh", "-c", 'cd "$1" && shift && exec "$@"', "sh", workdir, *command]

    kubectl_cmd = [
        "kubectl", "exec", pod_name,
        "--namespace", NAMESPACE,
        "--container", container,
        "--",
        *command,
    ]

    result = subprocess.run(
        kubectl_cmd,
        capture_output=True,
        text=True,
        timeout=timeout,
    )

    return {
        "stdout": result.stdout,
        "stderr": result.stderr,
        "exit_code": result.returncode,
        "success": result.returncode == 0,
    }


# Example: orchestrator installs dependencies, runs tests, and reads results
def agent_iteration(pod_name: str) -> None:
    steps = [
        (["bundle", "install"], "dependency installation"),
        (["bundle", "exec", "rake", "test"], "test suite"),
    ]

    for command, label in steps:
        # Run from the project root so bundler can find the Gemfile
        result = run_in_workspace(pod_name, command, workdir="/projects/app")

        if not result["success"]:
            # Surface the failure back to the LLM for diagnosis
            raise RuntimeError(
                f"{label} failed (exit {result['exit_code']}):\n{result['stderr']}"
            )

        print(f"[{label}] ✓\n{result['stdout']}")

The orchestrator’s LLM loop calls run_in_workspace as a tool. The LLM receives the stdout and stderr as tool output, diagnoses failures in context, generates a patch, writes it into the workspace with another kubectl exec call (or via the Che file API), and then re-runs the failing step. This is the core agentic loop: observe → diagnose → patch → verify.
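
The observe → diagnose → patch → verify cycle described above can be sketched independently of kubectl or any particular LLM. In this minimal sketch, StepResult, fix_loop, and the three callables are hypothetical names introduced for illustration; in the real orchestrator they would wrap run_in_workspace and the model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    success: bool
    output: str


def fix_loop(
    verify: Callable[[], StepResult],
    propose_patch: Callable[[str], str],
    apply_patch: Callable[[str], None],
    max_attempts: int = 3,
) -> int:
    """Run verify; on failure, request a patch, apply it, and verify again.

    Returns the number of patches applied before verification passed.
    Raises RuntimeError if verification still fails after max_attempts patches.
    """
    result = verify()                              # observe
    for patches_applied in range(max_attempts + 1):
        if result.success:
            return patches_applied
        if patches_applied == max_attempts:
            break
        patch = propose_patch(result.output)       # diagnose
        apply_patch(patch)                         # patch
        result = verify()                          # verify
    raise RuntimeError(
        f"still failing after {max_attempts} patches:\n{result.output}"
    )
```

Structuring the loop this way means every applied patch is followed by a verification run, so the last patch is never left untested.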

Because the orchestrator communicates through a narrow, well-defined interface (kubectl exec + Che REST API), it has no ambient access to the rest of your infrastructure. If you need to restrict further, scope the orchestrator’s service account to only the eclipse-che namespace.
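
The namespace scoping suggested above could look roughly like the following. This sketch renders a Role and RoleBinding as Python dicts that grant only pod lookup and exec. The role and service-account names are hypothetical, and the exact verbs needed on pods/exec may vary with the kubectl and cluster version, so treat the rule list as a starting point rather than a verified minimum.

```python
import json


def exec_only_role(namespace: str, name: str = "agent-orchestrator") -> dict:
    """Role that can find pods and exec into them, and nothing else."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
            # "create" is the classic exec verb; "get" is included on the
            # assumption that WebSocket-based clients may require it.
            {"apiGroups": [""], "resources": ["pods/exec"],
             "verbs": ["create", "get"]},
        ],
    }


def bind_role(namespace: str, service_account: str,
              role_name: str = "agent-orchestrator") -> dict:
    """RoleBinding attaching the Role to one service account in the namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": service_account,
                      "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "Role", "name": role_name},
    }
```

Because both objects are namespaced, a credential bound this way cannot exec into pods elsewhere in the cluster.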

Step 3: Handling Headless Authentication

GitHub Copilot’s standard authentication flow opens a browser tab to complete an OAuth device flow. That flow does not work in a headless container. The workaround is to pre-authorise using a GitHub Personal Access Token with the appropriate Copilot scopes, store it as a Kubernetes Secret, and inject it into the workspace via the Devfile environment variable you already saw in Step 1.

Create the Kubernetes Secret before the workspace starts:

kubectl create secret generic copilot-token \
  --namespace eclipse-che \
  --from-literal=token="${GITHUB_TOKEN}"
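
One caveat with --from-literal is that the expanded token becomes part of the kubectl command line, where it can be visible in process listings. A possible alternative, sketched below, is to render the Secret manifest in the orchestrator and feed it to kubectl apply -f - on stdin. The function name is hypothetical; the Secret name and key match the ones used in this article.

```python
import base64
import json


def secret_manifest(name: str, namespace: str, token: str) -> str:
    """Render an Opaque Secret as JSON with the token under the key 'token'."""
    encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
    manifest = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Values under "data" must be base64-encoded.
        "data": {"token": encoded},
    }
    return json.dumps(manifest)
```

The returned string can be passed as the input of a subprocess call that runs kubectl apply -f -, which keeps the token out of argv.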

For token rotation in a production environment, use External Secrets Operator to sync the token from your secret store (Vault, AWS Secrets Manager, or GCP Secret Manager) rather than managing it manually:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: copilot-token
  namespace: eclipse-che
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: copilot-token
    creationPolicy: Owner
  data:
    - secretKey: token
      remoteRef:
        key: secret/copilot
        property: github_token

With GITHUB_TOKEN available as an environment variable inside the container, Copilot’s VS Code extension (and its language server process) will authenticate silently on startup without requiring any interactive browser step. The token is never present in the Devfile YAML, in a ConfigMap, or in any log output — it exists only inside the Pod’s environment and is sourced from the Secret at runtime.

Token scope requirements: The PAT needs the copilot scope (for GitHub Copilot API access) and read:org if your Copilot subscription is managed at the organisation level.


End-to-End Walkthrough: From Task Assignment to Merged PR

The three steps above describe how to provision and wire up the environment. This section walks through a complete agentic task lifecycle — from the moment the orchestrator receives a coding ticket to the moment the workspace is suspended while humans review the pull request.

Phase 1 — Workspace Provisioning

The orchestrator receives a task, for example: “Add rate-limit middleware to the Sinatra app and cover it with tests.” Before writing a single line of code, it creates a dedicated workspace for this task:

import httpx
import os
import shlex
import subprocess
import time
import json
import base64

CHE_API = "https://che.example.com/api"
CHE_TOKEN = os.environ["CHE_TOKEN"]

def provision_workspace(devfile_path: str) -> dict:
    """Create and start a Che workspace. Returns workspace_id and pod_name."""
    with open(devfile_path) as f:
        devfile_content = f.read()

    # Create the workspace from the Devfile
    resp = httpx.post(
        f"{CHE_API}/workspace",
        headers={
            "Authorization": f"Bearer {CHE_TOKEN}",
            "Content-Type": "application/x-yaml",
        },
        content=devfile_content,
        timeout=30,
    )
    resp.raise_for_status()
    workspace_id = resp.json()["id"]

    # Start the workspace
    httpx.post(
        f"{CHE_API}/workspace/{workspace_id}/runtime",
        headers={"Authorization": f"Bearer {CHE_TOKEN}"},
    ).raise_for_status()

    # Poll until the pod is Running
    for _ in range(60):
        status = httpx.get(
            f"{CHE_API}/workspace/{workspace_id}",
            headers={"Authorization": f"Bearer {CHE_TOKEN}"},
        ).json()
        if status.get("status") == "RUNNING":
            # Resolve the Kubernetes pod name for kubectl exec calls
            pod_name = subprocess.check_output(
                [
                    "kubectl", "get", "pods",
                    "--namespace", "eclipse-che",
                    "--selector",
                    # The DevWorkspace Operator labels pods with devworkspace_id (underscore)
                    f"controller.devfile.io/devworkspace_id={workspace_id}",
                    "--output", "jsonpath={.items[0].metadata.name}",
                ],
                text=True,
            ).strip()
            return {"workspace_id": workspace_id, "pod_name": pod_name}
        time.sleep(5)

    raise TimeoutError("Workspace did not reach RUNNING state in 5 minutes")
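
The polling loop above counts iterations, so its real duration also depends on how long each HTTP call takes. A small helper that tracks a wall-clock deadline is one way to make the timeout explicit. This is a sketch only; wait_for and its parameters are hypothetical names, and the sleep and clock arguments exist so the helper can be tested without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    probe: Callable[[], Optional[T]],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call probe until it returns a non-None value or the deadline passes."""
    deadline = clock() + timeout
    while True:
        value = probe()
        if value is not None:
            return value
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.0f}s")
        sleep(interval)
```

A probe for this article’s use case would return the pod name once the workspace status is RUNNING and None otherwise.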

Phase 2 — Cloning, Implementing, and Testing

With the workspace running, the orchestrator clones the repository, calls the GitHub Copilot chat completions API to generate the implementation, writes the result into the workspace, and drives the test loop.

The Copilot completions API (api.githubcopilot.com) accepts the same GITHUB_TOKEN used for workspace auth. The orchestrator calls it from outside the cluster — no kubectl exec needed for this step:

COPILOT_API = "https://api.githubcopilot.com"
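# HTTPS remote: the pod has GITHUB_TOKEN but no SSH key. This assumes git in the
# workspace is configured to use the token (for example via `gh auth setup-git`).
REPO = "https://github.com/your-org/your-app.git"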
BRANCH = "agent/rate-limit-middleware"
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]

def copilot_generate(prompt: str, context_files: dict[str, str]) -> str:
    """
    Ask the Copilot chat completions API to produce code.

    context_files: mapping of filename -> current file content,
    so Copilot can see what already exists before generating changes.
    Returns the generated content as a string.
    """
    messages = [
        {
            "role": "system",
            "content": (
                "You are an expert Ruby developer. "
                "Respond with only the complete file content requested. "
                "No prose, no markdown fences."
            ),
        },
    ]
    for filename, content in context_files.items():
        messages.append({
            "role": "user",
            "content": f"Existing file `{filename}`:\n{content}",
        })
    messages.append({"role": "user", "content": prompt})

    resp = httpx.post(
        f"{COPILOT_API}/chat/completions",
        headers={
            "Authorization": f"Bearer {GITHUB_TOKEN}",
            "Content-Type": "application/json",
            # "vscode-chat" is the integration ID required by the Copilot
            # chat completions endpoint. Use your own registered ID in production.
            "Copilot-Integration-Id": "vscode-chat",
        },
        json={"model": "gpt-4o", "messages": messages},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


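def write_file_to_workspace(pod_name: str, remote_path: str, content: str) -> None:
    """
    Write content to a file inside the workspace via kubectl exec.

    The content is base64-encoded before it is placed on the command line,
    so shell metacharacters in generated code cannot be interpreted by bash.
    Very large files can exceed the argument-length limit; stream those
    over stdin instead.
    """
    encoded = base64.b64encode(content.encode()).decode()
    quoted_path = shlex.quote(remote_path)
    result = run_in_workspace(
        pod_name,
        [
            "bash", "-c",
            # Quote the dirname output so paths containing spaces survive
            f'mkdir -p "$(dirname {quoted_path})" && '
            f"echo {encoded} | base64 --decode > {quoted_path}",
        ],
    )
    if not result["success"]:
        raise RuntimeError(f"Failed to write {remote_path}:\n{result['stderr']}")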


MAX_FIX_ATTEMPTS = 3

def run_task(pod_name: str, task_description: str) -> None:
    # run_in_workspace is the kubectl exec helper from "Step-by-Step Implementation Guide".
    # It runs a command inside the workspace pod and returns {stdout, stderr, exit_code, success}.
    # Clone the repo and create a feature branch
    run_in_workspace(pod_name, ["git", "clone", REPO, "/projects/app"])
    run_in_workspace(pod_name, ["git", "-C", "/projects/app", "checkout", "-b", BRANCH])

    # Install dependencies
    run_in_workspace(pod_name, ["bundle", "install"], workdir="/projects/app")

    # Read existing files to give Copilot context
    app_rb = run_in_workspace(pod_name, ["cat", "/projects/app/app.rb"])
    gemfile = run_in_workspace(pod_name, ["cat", "/projects/app/Gemfile"])

    # Generate the middleware implementation
    middleware_code = copilot_generate(
        prompt=(
            f"Task: {task_description}\n"
            "Write a new file `lib/rate_limit_middleware.rb` that implements "
            "a Rack middleware for rate limiting, and update `app.rb` to use it. "
            "Return the content of `lib/rate_limit_middleware.rb` only."
        ),
        context_files={
            "app.rb": app_rb["stdout"],
            "Gemfile": gemfile["stdout"],
        },
    )

    # Write the generated file (base64-encoded to avoid shell metacharacter issues)
    write_file_to_workspace(
        pod_name, "/projects/app/lib/rate_limit_middleware.rb", middleware_code
    )

    # Generate the test file
    test_code = copilot_generate(
        prompt=(
            "Write a minitest test file `test/rate_limit_middleware_test.rb` "
            "that covers the RateLimitMiddleware class. Return only the file content."
        ),
        context_files={"lib/rate_limit_middleware.rb": middleware_code},
    )
    write_file_to_workspace(
        pod_name, "/projects/app/test/rate_limit_middleware_test.rb", test_code
    )
    # Run the test suite; surface failures back to Copilot for a fix iteration
    for attempt in range(MAX_FIX_ATTEMPTS):
        test_result = run_in_workspace(
            pod_name, ["bundle", "exec", "rake", "test"], workdir="/projects/app"
        )
        if test_result["success"]:
            print(f"Tests passed on attempt {attempt + 1}")
            return

        # Ask Copilot to diagnose and patch
        fix = copilot_generate(
            prompt=(
                f"The test suite failed. Fix the implementation.\n"
                f"Test output:\n{test_result['stdout']}\n{test_result['stderr']}\n"
                "Return only the corrected content of `lib/rate_limit_middleware.rb`."
            ),
            context_files={"lib/rate_limit_middleware.rb": middleware_code},
        )
        write_file_to_workspace(
            pod_name, "/projects/app/lib/rate_limit_middleware.rb", fix
        )
        middleware_code = fix

    raise RuntimeError(f"Tests still failing after {MAX_FIX_ATTEMPTS} fix attempts")

The orchestrator calls the Copilot API directly from outside the cluster, then writes generated files into the workspace via kubectl exec. This keeps the token-handling logic in the orchestrator, not inside the container.
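
The file-writing helper in this section places the base64 payload on the command line. For larger files, a variant that streams the content over stdin may be more robust. The sketch below assumes the workspace image provides sh, mkdir, dirname, and cat; build_write_command and write_via_stdin are hypothetical names, and the namespace and container defaults mirror the values used earlier.

```python
import subprocess

NAMESPACE = "eclipse-che"


def build_write_command(pod_name: str, remote_path: str,
                        container: str = "ruby-dev") -> list[str]:
    """argv for a kubectl exec that copies stdin into remote_path in the pod."""
    script = 'mkdir -p "$(dirname "$1")" && cat > "$1"'
    return [
        "kubectl", "exec", "-i", pod_name,
        "--namespace", NAMESPACE,
        "--container", container,
        "--",
        # "$0" is the literal "sh"; "$1" is the path, passed as data, not code.
        "sh", "-c", script, "sh", remote_path,
    ]


def write_via_stdin(pod_name: str, remote_path: str, content: str) -> None:
    """Stream content to the pod; raises CalledProcessError on failure."""
    subprocess.run(
        build_write_command(pod_name, remote_path),
        input=content.encode("utf-8"),
        check=True,
        timeout=120,
    )
```

Because the path travels as a positional argument and the content as stdin, neither is ever parsed as shell syntax.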

Phase 3 — Committing and Opening the Pull Request

Once tests pass, the orchestrator commits the changes and opens a pull request using the gh CLI, which is pre-installed in the workspace image and authenticated via the same GITHUB_TOKEN:

def push_and_open_pr(pod_name: str, branch: str, task_description: str) -> str:
    """Commit all changes, push the branch, and open a PR. Returns the PR URL."""
    workdir = "/projects/app"

    run_in_workspace(pod_name, ["git", "-C", workdir, "add", "--all"])
    run_in_workspace(
        pod_name,
        ["git", "-C", workdir, "commit", "-m", f"feat: {task_description}"],
    )
    run_in_workspace(pod_name, ["git", "-C", workdir, "push", "origin", branch])

    pr_result = run_in_workspace(
        pod_name,
        [
            "gh", "pr", "create",
            "--title", f"feat: {task_description}",
            "--body", "Implemented by AI agent. Tests passing. Awaiting human review.",
            "--base", "main",
            "--head", branch,
        ],
        workdir=workdir,
    )

    if not pr_result["success"]:
        raise RuntimeError(f"Failed to open PR:\n{pr_result['stderr']}")

    pr_url = pr_result["stdout"].strip()
    print(f"PR opened: {pr_url}")
    return pr_url

At this point the code is on GitHub, the PR is open, and the workspace has done its job.

Phase 4 — Suspending the Workspace During Review

A workspace that sits idle while waiting for a code review is pure waste: CPU throttled but memory still allocated, the container image still occupying node resources. Eclipse Che supports workspace suspension — stopping the pod but preserving the persistent volume so the source tree survives intact.

def suspend_workspace(workspace_id: str) -> None:
    """Stop the workspace pod while preserving /projects on the PV."""
    httpx.delete(
        f"{CHE_API}/workspace/{workspace_id}/runtime",
        headers={"Authorization": f"Bearer {CHE_TOKEN}"},
    ).raise_for_status()
    print(f"Workspace {workspace_id} suspended. /projects volume retained.")

Or via chectl if you prefer the CLI:

chectl workspace:stop <workspace-id>

The workspace transitions to STOPPED state. The pod is deleted. The persistent volume remains bound to the DevWorkspace custom resource. No source code is lost.
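
On installations where workspaces are represented as DevWorkspace custom resources, stopping and starting can also be expressed as a patch to the resource’s spec.started field. The sketch below only builds the kubectl command; the function name is hypothetical, and whether this path applies depends on the Che version in use, so treat it as an assumption to verify against your cluster.

```python
import json


def set_started_command(workspace_name: str, namespace: str,
                        started: bool) -> list[str]:
    """argv for a merge patch that flips spec.started on a DevWorkspace."""
    patch = json.dumps({"spec": {"started": started}})
    return [
        "kubectl", "patch", "devworkspace", workspace_name,
        "--namespace", namespace,
        "--type", "merge",
        "--patch", patch,
    ]
```

Passing started=False corresponds to suspension and started=True to resumption; the resulting list can be handed to subprocess.run.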

Phase 5 — Resuming for Review Feedback

If the reviewer requests changes, the orchestrator receives a webhook from GitHub (or polls the PR), starts the workspace back up, and re-enters the implement-test-commit loop without reprovisioning from scratch:

def resume_workspace(workspace_id: str) -> str:
    """Restart a stopped workspace and return its pod name once Running."""
    httpx.post(
        f"{CHE_API}/workspace/{workspace_id}/runtime",
        headers={"Authorization": f"Bearer {CHE_TOKEN}"},
    ).raise_for_status()

    for _ in range(60):
        status = httpx.get(
            f"{CHE_API}/workspace/{workspace_id}",
            headers={"Authorization": f"Bearer {CHE_TOKEN}"},
        ).json()
        if status.get("status") == "RUNNING":
            return subprocess.check_output(
                [
                    "kubectl", "get", "pods",
                    "--namespace", "eclipse-che",
                    "--selector",
                    # The DevWorkspace Operator labels pods with devworkspace_id (underscore)
                    f"controller.devfile.io/devworkspace_id={workspace_id}",
                    "--output", "jsonpath={.items[0].metadata.name}",
                ],
                text=True,
            ).strip()
        time.sleep(5)

    raise TimeoutError("Workspace did not resume in 5 minutes")

Resume is usually faster than cold-start provisioning because the container image is typically still cached on the node, and bundle install can skip gems that are already installed, provided BUNDLE_PATH (/home/user/.bundle in this Devfile) sits on storage that persists across restarts. Expect 60–90 seconds rather than three to five minutes.

Once the PR is merged and the branch is deleted, the orchestrator tears down the workspace entirely:

chectl workspace:delete <workspace-id>

The persistent volume is released. The cluster is back to baseline. The entire task lifecycle — from ticket to merged PR — leaves no orphaned resources behind.

The Complete Lifecycle at a Glance

Ticket assigned
      │
      ▼
provision_workspace()          ← cold start (~3–5 min)
      │
      ▼
run_task()                     ← clone → Copilot implement → test loop
      │
      ▼
push_and_open_pr()             ← git commit + gh pr create
      │
      ▼
suspend_workspace()            ← pod deleted, PV retained (~0 cost)
      │
 [PR under review]
      │
      ├── reviewer approves ──► workspace:delete  ←── done
      │
      └── changes requested ──► resume_workspace()
                                      │
                                      ▼
                                 run_task() [fix loop]
                                      │
                                      ▼
                                 push to same branch
                                      │
                                      ▼
                                 suspend_workspace()

The workspace is live only when work is actively happening. During human review — which can take hours or days — zero pod resources are consumed.
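
The branch point in the diagram can be driven by GitHub webhook deliveries. The following sketch maps an event to one of three orchestrator actions. The function name and the action strings are hypothetical, and the payload fields used here (action, review.state) reflect my understanding of GitHub’s pull_request_review and pull_request events, so check them against the webhook documentation before relying on them.

```python
def next_action(event_name: str, payload: dict) -> str:
    """Map a GitHub webhook delivery to 'resume', 'delete', or 'wait'."""
    action = payload.get("action")
    if event_name == "pull_request_review" and action == "submitted":
        review = payload.get("review") or {}
        state = str(review.get("state", "")).lower()
        # Only a change request needs the workspace back; approvals and
        # plain comments leave it suspended.
        return "resume" if state == "changes_requested" else "wait"
    if event_name == "pull_request" and action == "closed":
        # Merged or abandoned: either way the workspace is no longer needed.
        return "delete"
    return "wait"
```

This follows the prose above in tearing the workspace down once the PR is closed, rather than at the moment of approval, so a late change request can still reuse the suspended workspace.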


Alternative Architecture: Copilot as the In-Pod Agent

The walkthrough above has the orchestrator calling the Copilot API from outside the cluster and piping generated code into the workspace via kubectl exec. There is a second, equally valid approach: send only the task prompt into the pod and let Copilot — running directly inside the workspace — handle the code generation, file writing, and the test loop itself.

The architectural distinction is subtle but significant:

  • External orchestrator model — the orchestrator calls api.githubcopilot.com, receives generated code, and writes files into the pod via kubectl exec. The pod is a passive execution environment; the orchestrator is the reasoning layer.
  • In-pod agent model — the orchestrator delivers only the task description; a self-contained agent script running inside the pod calls Copilot, writes files directly to /projects, and runs the test loop. The pod is an active reasoning environment; the orchestrator is a thin lifecycle manager.

The in-pod model simplifies two pain points from the walkthrough. First, GITHUB_TOKEN is already present as an environment variable inside the container — there is no need for the orchestrator to hold or pass it. Second, file writes are plain Python open() calls against the local filesystem — no base64 encoding, no kubectl exec pipe, no shell metacharacter risk.

The In-Pod Agent Script

Place this script in your workspace image or write it into the pod at provisioning time. The orchestrator invokes it with a single kubectl exec call, passing the task description as a command-line argument.

#!/usr/bin/env python3
"""
in_pod_agent.py — executes INSIDE the workspace container.

The orchestrator triggers this script via a single kubectl exec call,
passing the task description as a positional argument. All Copilot API
calls, file writes, and test iterations happen inside the pod using the
GITHUB_TOKEN env var that the Devfile injects from the Kubernetes Secret.
"""
import os
import sys
import subprocess
import httpx

COPILOT_API = "https://api.githubcopilot.com"
APP_DIR = "/projects/app"
MAX_FIX_ATTEMPTS = 3


def call_copilot(prompt: str, context_files: dict[str, str]) -> str:
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        raise EnvironmentError("GITHUB_TOKEN is not set in the pod environment")
    messages = [
        {
            "role": "system",
            "content": (
                "You are an expert Ruby developer. "
                "Respond with only the complete file content requested. "
                "No prose, no markdown fences."
            ),
        }
    ]
    for filename, content in context_files.items():
        messages.append({
            "role": "user",
            "content": f"Existing file `{filename}`:\n{content}",
        })
    messages.append({"role": "user", "content": prompt})

    resp = httpx.post(
        f"{COPILOT_API}/chat/completions",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Copilot-Integration-Id": "vscode-chat",
        },
        json={"model": "gpt-4o", "messages": messages},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


def write_file(path: str, content: str) -> None:
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


def run(command: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(command, cwd=APP_DIR, capture_output=True, text=True)


def main(task_description: str) -> None:
    # Read existing source files to give Copilot context
    app_rb = read_file(f"{APP_DIR}/app.rb")
    gemfile = read_file(f"{APP_DIR}/Gemfile")

    # Generate middleware implementation — Copilot API called from inside the pod
    middleware_code = call_copilot(
        prompt=(
            f"Task: {task_description}\n"
            "Write `lib/rate_limit_middleware.rb` implementing a Rack middleware "
            "for rate limiting, and return only that file's content."
        ),
        context_files={"app.rb": app_rb, "Gemfile": gemfile},
    )
    # Write directly to the local filesystem — no kubectl exec, no base64
    write_file(f"{APP_DIR}/lib/rate_limit_middleware.rb", middleware_code)

    # Generate test file
    test_code = call_copilot(
        prompt=(
            "Write `test/rate_limit_middleware_test.rb` with minitest coverage "
            "for RateLimitMiddleware. Return only the file content."
        ),
        context_files={"lib/rate_limit_middleware.rb": middleware_code},
    )
    write_file(f"{APP_DIR}/test/rate_limit_middleware_test.rb", test_code)

    # Test-fix loop — all inside the pod, no round-trips to the orchestrator
    for attempt in range(MAX_FIX_ATTEMPTS):
        result = run(["bundle", "exec", "rake", "test"])
        if result.returncode == 0:
            print(f"Tests passed on attempt {attempt + 1}", flush=True)
            return

        # Copilot diagnoses and patches without leaving the pod
        middleware_code = call_copilot(
            prompt=(
                "The test suite failed. Fix the implementation.\n"
                f"Test output:\n{result.stdout}\n{result.stderr}\n"
                "Return only the corrected `lib/rate_limit_middleware.rb` content."
            ),
            context_files={"lib/rate_limit_middleware.rb": middleware_code},
        )
        write_file(f"{APP_DIR}/lib/rate_limit_middleware.rb", middleware_code)

    print(f"Tests still failing after {MAX_FIX_ATTEMPTS} attempts", file=sys.stderr)
    sys.exit(1)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: in_pod_agent.py '<task description>'", file=sys.stderr)
        sys.exit(1)
    main(sys.argv[1])
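
The script above writes whatever the model returns straight to disk. Models sometimes wrap their answer in a Markdown code fence even when the system prompt asks them not to, which would leave fence lines at the top and bottom of the Ruby file. A small defensive helper, sketched here under the hypothetical name strip_code_fences, could be applied to each response before write_file is called.

```python
FENCE = "`" * 3  # a Markdown code fence marker


def strip_code_fences(text: str) -> str:
    """Remove one surrounding Markdown fence, if present, and end with a newline."""
    lines = text.strip().splitlines()
    if (
        len(lines) >= 2
        and lines[0].startswith(FENCE)
        and lines[-1].strip() == FENCE
    ):
        lines = lines[1:-1]  # drop the opening (with language tag) and closing fence
    return "\n".join(lines) + "\n"
```

Responses that contain no fence pass through unchanged apart from whitespace normalisation at the edges.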

The Simplified Orchestrator

With the reasoning logic inside the pod, the orchestrator shrinks to lifecycle management and task delivery. It no longer holds the GITHUB_TOKEN, calls the Copilot API, generates code, or manages file writes:

def run_task_in_pod(pod_name: str, task_description: str) -> None:
    """
    Deliver a task to the in-pod Copilot agent and wait for completion.
    The pod handles all code generation, file writing, and test iteration.
    """
    # The agent script can be baked into the container image (preferred) or
    # written to the pod once at provisioning time using write_file_to_workspace().

    result = run_in_workspace(
        pod_name,
        ["python3", "/usr/local/bin/in_pod_agent.py", task_description],
        timeout=600,
    )

    if not result["success"]:
        raise RuntimeError(
            f"In-pod agent failed:\n{result['stdout']}\n{result['stderr']}"
        )

    print("In-pod agent completed successfully.")

From here the workflow is identical to Phases 3–5: push_and_open_pr(), then suspend_workspace(). The orchestrator’s role throughout the entire task is: provision → deliver prompt → wait for exit code → commit and push → suspend. It never touches the Copilot API directly.
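
The lifecycle described above can be sketched as a single function that takes each step as a callable. This is an illustration under assumptions, not the walkthrough's actual orchestrator: run_lifecycle and its parameter names are hypothetical, and the real provisioning, push, and suspend functions may take different arguments. The point it shows is that suspension belongs in a finally block, so a failed agent run does not leave a pod running.

```python
from typing import Callable


def run_lifecycle(
    provision: Callable[[], str],
    deliver: Callable[[str], None],
    publish: Callable[[str], None],
    suspend: Callable[[str], None],
) -> str:
    """Run provision -> deliver -> publish, and always suspend afterwards."""
    pod_name = provision()
    try:
        deliver(pod_name)  # blocks until the in-pod agent exits; raises on failure
        publish(pod_name)  # only reached when the agent succeeded
    finally:
        suspend(pod_name)  # runs on success and on failure
    return pod_name
```

In this arrangement, deliver would wrap run_task_in_pod() with the task description bound in advance, for example with functools.partial or a lambda.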

When to Choose Each Model

| Criterion | External orchestrator | In-pod Copilot agent | VSCode bridge |
| --- | --- | --- | --- |
| Code generation | Outside the pod | Inside the pod | Inside the pod (via vscode.lm) |
| File writes | kubectl exec + base64 pipe | Plain open() on local filesystem | kubectl exec + base64 pipe |
| Token management | Orchestrator holds GITHUB_TOKEN | Pod’s injected Secret only | Pod’s injected Secret only (VS Code manages the session) |
| Orchestrator complexity | Higher: manages Copilot calls | Lower: delivers prompt and waits | Medium: port-forwards bridge, no Copilot SDK |
| Pod autonomy | Passive execution environment | Active reasoning environment | Shared: extension reasons, orchestrator executes |
| Intermediate output visibility | At orchestrator layer | In pod logs (kubectl logs) | At orchestrator layer (bridge HTTP response) |
| Network egress from pod | Not required for code generation | Required (Copilot API calls) | Required (Copilot API calls via bridge) |
| Direct Copilot API integration | Orchestrator Python code | In-pod Python code | None: VS Code extension host only |

All three models share the same security boundary: the workspace pod. They differ in what that pod must be allowed to reach, because the in-pod and bridge models need egress to the Copilot API while the external model does not. The in-pod model keeps reasoning and execution collocated. The external model keeps every Copilot interaction observable at a single orchestrator layer. The VSCode bridge model eliminates Copilot API coupling from the orchestrator entirely, at the cost of a custom extension.


Third Alternative: Orchestrator → VSCode → Copilot via the Language Model API

Both previous models call api.githubcopilot.com directly — the external orchestrator from Python running outside the cluster, the in-pod agent from Python running inside the container. A third model removes those direct API calls entirely. Instead, the orchestrator sends task prompts to a bridge extension running inside the Code-OSS server, and the bridge uses VS Code’s built-in Language Model API (vscode.lm) to request completions from Copilot. VS Code’s Copilot extension — already authenticated via GITHUB_TOKEN from the Kubernetes Secret — handles the credential management and the API call. No Python code anywhere in the system calls api.githubcopilot.com.

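The vscode.lm API, finalized as part of the stable extension API in VS Code 1.90, lets any extension running in the Code-OSS extension host request chat completions from Copilot using whatever credentials the Copilot extension has already established at startup. Because the Devfile injects GITHUB_TOKEN as an environment variable, Copilot authenticates silently, and the bridge extension inherits that authenticated session transparently. One caveat for headless use: Copilot’s models require a one-time user consent before another extension can call them, so that consent has to be granted in the workspace before the first automated request.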

Orchestrator (Python, outside cluster)
    │  kubectl port-forward  →  HTTP POST /task
    ▼
Bridge Extension (TypeScript, Code-OSS extension host in pod)
    │  vscode.lm.selectChatModels + model.sendRequest
    ▼
Copilot Extension (authenticated via GITHUB_TOKEN from K8s Secret)
    │  HTTPS
    ▼
api.githubcopilot.com

From the orchestrator’s perspective, Copilot is a black box reached through VS Code. The orchestrator has no Copilot SDK dependency, holds no GITHUB_TOKEN in its environment, and needs no network route to api.githubcopilot.com; that egress happens entirely inside the pod.

The Bridge Extension

Add this TypeScript extension to your workspace image or declare it as a plugin component in the Devfile. It starts an HTTP server on 127.0.0.1:9001 when Code-OSS activates it and forwards incoming task requests to Copilot via vscode.lm.

// extension.ts — runs inside the Code-OSS extension host in the workspace pod
import * as vscode from 'vscode';
import * as http from 'http';

let server: http.Server | undefined;

export function activate(context: vscode.ExtensionContext): void {
    server = http.createServer(async (req, res) => {
        if (req.method !== 'POST' || req.url !== '/task') {
            res.writeHead(404);
            res.end();
            return;
        }

        try {
            // Read and parse inside the try block so a malformed body yields
            // a 500 response instead of an unhandled promise rejection.
            const body = await readBody(req);
            const { prompt, context_files = {} } = JSON.parse(body) as {
                prompt: string;
                context_files?: Record<string, string>;
            };

            const content = await runCopilotTask(prompt, context_files);
            res.writeHead(200, { 'Content-Type': 'application/json' });
            res.end(JSON.stringify({ success: true, content }));
        } catch (err: unknown) {
            res.writeHead(500, { 'Content-Type': 'application/json' });
            res.end(JSON.stringify({ success: false, error: String(err) }));
        }
    });

    server.listen(9001, '127.0.0.1', () =>
        console.log('Orchestrator bridge listening on 127.0.0.1:9001'),
    );
    context.subscriptions.push({ dispose: () => server?.close() });
}

async function runCopilotTask(
    prompt: string,
    contextFiles: Record<string, string>,
): Promise<string> {
    // vscode.lm is VS Code's Language Model API — no token needed here.
    // The Copilot extension, already authenticated via GITHUB_TOKEN, provides the model.
    const [model] = await vscode.lm.selectChatModels({
        vendor: 'copilot',
        family: 'gpt-4o',
    });

    if (!model) {
        throw new Error('No Copilot chat model available — is the Copilot extension active?');
    }

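    // The stable Language Model API exposes only User and Assistant roles (no
    // system role), so the instructions are sent as the first User message.
    const messages: vscode.LanguageModelChatMessage[] = [
        vscode.LanguageModelChatMessage.User(
            'You are an expert Ruby developer. ' +
            'Respond with only the complete file content requested. ' +
            'No prose, no markdown fences.',
        ),
    ];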

    for (const [filename, content] of Object.entries(contextFiles)) {
        messages.push(
            vscode.LanguageModelChatMessage.User(
                `Existing file \`${filename}\`:\n${content}`,
            ),
        );
    }
    messages.push(vscode.LanguageModelChatMessage.User(prompt));

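    const cts = new vscode.CancellationTokenSource();
    try {
        const response = await model.sendRequest(messages, {}, cts.token);

        // The response streams in as text fragments; collect them into one string.
        const chunks: string[] = [];
        for await (const chunk of response.text) {
            chunks.push(chunk);
        }
        return chunks.join('');
    } finally {
        cts.dispose(); // Release the token source whether or not the request succeeded
    }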
}

function readBody(req: http.IncomingMessage): Promise<string> {
    return new Promise((resolve, reject) => {
        const chunks: Buffer[] = [];
        req.on('data', (chunk: Buffer) => chunks.push(chunk));
        req.on('end', () => resolve(Buffer.concat(chunks).toString()));
        req.on('error', reject);
    });
}

export function deactivate(): void {
    server?.close();
}

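The bridge activates automatically when Code-OSS starts, provided the extension’s package.json declares a startup activation event such as onStartupFinished. No orchestrator action is required to initialise it.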

The Orchestrator: No Copilot Dependency

Replace every copilot_generate() call in the walkthrough with a call to run_via_vscode(). The orchestrator opens a kubectl port-forward tunnel to the bridge’s local port, sends the task as a JSON body, and receives the generated content in the response. It never imports an HTTP client pointing at api.githubcopilot.com, and GITHUB_TOKEN never appears in its environment:

import subprocess
import time
import httpx

def run_via_vscode(
    pod_name: str,
    prompt: str,
    context_files: dict[str, str],
    namespace: str = "eclipse-che",
    local_port: int = 19001,
) -> str:
    """
    Forward port 9001 from the workspace pod and call the bridge extension.

    VS Code's Copilot extension handles authentication internally.
    The orchestrator never reads or transmits GITHUB_TOKEN.
    """
    proc = subprocess.Popen(
        [
            "kubectl", "port-forward", pod_name,
            f"{local_port}:9001",
            "--namespace", namespace,
        ],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        time.sleep(2)  # Allow the tunnel to establish
        resp = httpx.post(
            f"http://127.0.0.1:{local_port}/task",
            json={"prompt": prompt, "context_files": context_files},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json()["content"]
    finally:
        proc.terminate()
        proc.wait()

Everything else in the walkthrough — write_file_to_workspace() via kubectl exec, the test loop, push_and_open_pr(), suspend_workspace() — is unchanged. Only the code generation step is delegated to VS Code.
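
The fixed two-second sleep in run_via_vscode() is a guess about how long the tunnel takes: too short on a slow cluster, wasted time on a fast one. A hedged alternative is to poll the local port until it accepts a connection. The helper name wait_for_port is hypothetical, and a successful connection only shows that kubectl is listening locally, not that the bridge inside the pod is ready, so the HTTP call still needs its own error handling.

```python
import socket
import time


def wait_for_port(
    host: str,
    port: int,
    timeout: float = 10.0,
    interval: float = 0.1,
) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Inside run_via_vscode(), the sleep could become a call like wait_for_port("127.0.0.1", local_port), with a RuntimeError raised when it returns False.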

Devfile Addition

Expose port 9001 from the workspace container so that kubectl port-forward can reach the bridge:

components:
  - name: ruby-dev
    container:
      # ... existing image, limits, env ...
      endpoints:
        - name: http-app
          exposure: public
          protocol: http
          targetPort: 3000
        - name: orchestrator-bridge
          exposure: none          # internal only — not published via Che's ingress
          protocol: http
          targetPort: 9001

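exposure: none keeps the bridge off the public ingress. Strictly speaking, kubectl port-forward does not depend on this entry: it tunnels into the pod’s network namespace and can reach any listening port there, including one bound to 127.0.0.1. Declaring the endpoint keeps the port documented in the workspace spec.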

Trade-offs of the VSCode Bridge Model

The bridge model makes sense when you want to keep the orchestrator completely decoupled from Copilot’s authentication and API surface. The orchestrator becomes a pure lifecycle and task-delivery mechanism; all credential handling stays inside the pod with VS Code.

The cost is an additional component to build and maintain: a TypeScript VS Code extension, its build toolchain, packaging it into the workspace image (or distributing it via Open VSX), and verifying it activates correctly when Code-OSS starts. The vscode.lm API is also extension-host-only — it cannot be called from a plain Node.js script or a terminal command, so the bridge extension is a genuine requirement, not an optional optimisation.

For teams that already build internal VS Code extensions and want a clean separation between “orchestration logic” and “AI credential management,” this model is the strongest architectural choice. For teams that want to minimise the number of moving parts, the in-pod agent model achieves most of the same credential isolation with a single Python script.


The Pros and Cons of This Approach

Every architectural decision is a trade-off. Here is an honest assessment of this one.

Pros

Bulletproof Sandboxing and Security

Kubernetes resource limits are enforced at the kernel level by cgroups. An agent that generates and executes loop { fork } in Ruby will hit the pod’s CPU throttle and, on clusters where the kubelet sets a per-pod PID limit, the PID cap, so it cannot cascade to other workloads on the node. The container’s filesystem is ephemeral by default; any destructive file operation is scoped to the pod. NetworkPolicy lets you cut off egress to the public internet entirely if your task does not require it. You get defence-in-depth without any agent-specific security code.
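
To make the NetworkPolicy point concrete, here is a minimal sketch that builds a deny-egress policy as a Python dict and prints it as JSON, which kubectl accepts on stdin. The policy name and the app: agent-workspace label are hypothetical; use whatever labels your workspace pods actually carry. The policy only takes effect if the cluster's CNI plugin enforces NetworkPolicy, and it fits the external-orchestrator model only, since the in-pod and bridge models need egress to the Copilot API.

```python
import json


def deny_egress_policy(
    name: str,
    namespace: str,
    pod_labels: dict[str, str],
    allow_dns: bool = True,
) -> dict:
    """Build a NetworkPolicy that blocks egress from pods matching pod_labels."""
    egress = []
    if allow_dns:
        # A rule with ports but no "to" clause allows any destination on those ports.
        egress.append(
            {"ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}]}
        )
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Egress"],
            "egress": egress,  # an empty list denies all egress
        },
    }


if __name__ == "__main__":
    # Hypothetical name and label, for illustration only.
    policy = deny_egress_policy("agent-no-egress", "eclipse-che", {"app": "agent-workspace"})
    print(json.dumps(policy, indent=2))
```

The printed manifest can be piped to kubectl apply -f - to create the policy.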

Standardised State and Instant Recovery

The Devfile is the environment. When the AI corrupts the workspace (a broken gem install, a botched configuration change, a runaway process that exhausted the disk), you issue one API call to delete the workspace and one to create a new one. In three to five minutes (image pull + pod init + bundle install) you have an environment rebuilt from the same Devfile, with no configuration drift and no manual cleanup step; pin the image by digest and commit Gemfile.lock so the rebuild is reproducible. That restart time is your recovery cost. For long-running investigations, prefer stopping and restarting the workspace over deleting it: the persistent volume at /projects survives a stop, so the source code is kept and only the transient runtime state is reset. Deleting a workspace typically removes its /projects data too, so push anything worth keeping before you delete.

Seamless Human-AI Collaboration

Because every workspace runs a full Code-OSS server, a human engineer can connect their desktop VS Code to the remote workspace via the VS Code Remote - Tunnels extension or directly through Che’s workspace URL. The human sees exactly what the agent sees: the same files, the same terminal history, the same Copilot extension state. There is no “export the agent’s work to review it” step. The human can open the Copilot Chat panel, read the agent’s conversation history, and intervene directly — or simply observe and let the agent continue. This makes human-in-the-loop review a first-class operation rather than an afterthought.

Cons

Resource Overhead

A minimal Che workspace with the Code-OSS server, the ruby-lsp language server, and the Copilot extension active consumes roughly 600–900 MB of RAM before your application’s own dependencies are loaded. Add bundle install and a running Rails app and you are comfortably above 1.5 GB. If you are running multiple concurrent agent workspaces — for example, one per open pull request — the cluster memory cost scales linearly. This is a meaningful cost compared to running a bare Ruby process locally. You will need to size your node pools accordingly and enforce workspace idle-timeout policies to avoid paying for dormant pods.
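
The linear scaling is easy to put numbers on. The sketch below is a rough capacity estimate that uses the 1.5 GB per-workspace figure from the paragraph above. The 2 GB reserved for the operating system and kubelet is an assumed placeholder, and both function names are hypothetical. Real sizing should use the memory requests and limits set in the Devfile.

```python
import math


def workspaces_per_node(
    node_memory_gb: float,
    workspace_memory_gb: float = 1.5,
    reserved_gb: float = 2.0,
) -> int:
    """How many workspaces fit on one node after reserving system memory."""
    usable = node_memory_gb - reserved_gb
    if usable <= 0 or workspace_memory_gb <= 0:
        return 0
    return math.floor(usable / workspace_memory_gb)


def nodes_needed(
    concurrent_workspaces: int,
    node_memory_gb: float,
    workspace_memory_gb: float = 1.5,
    reserved_gb: float = 2.0,
) -> int:
    """How many nodes are required to host the given number of workspaces."""
    per_node = workspaces_per_node(node_memory_gb, workspace_memory_gb, reserved_gb)
    if per_node == 0:
        raise ValueError("node is too small to host a single workspace")
    return math.ceil(concurrent_workspaces / per_node)
```

Under these assumptions a 16 GB node holds nine workspaces, so twenty concurrent pull-request workspaces need three such nodes.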

Cold Start Latency

The first time a workspace starts, the container image must be pulled. A reasonably sized Ruby development image (1–2 GB) can take 60–90 seconds on a cold node. Add pod scheduling, secret injection, language server initialisation, and Copilot extension startup, and the total cold start time can reach three to five minutes. That latency is acceptable for a workspace that will run for hours. It is not acceptable for a high-frequency task loop where the agent creates and destroys a workspace for every small task. The mitigation is to keep workspaces warm between tasks and return them to a known state with a reset command (for example, restoring the Git working tree to a clean checkout) rather than a full teardown-and-rebuild. The trade-off is that a warm reset only undoes what the reset command covers; state outside the working tree, such as installed gems or stray processes, persists until the pod is recreated.

Authentication Complexity

GitHub Copilot tokens tied to an individual user account create a single point of failure. If that account’s Copilot subscription lapses, every agent workspace that depends on it fails simultaneously. Organisation-managed tokens help, but they require org-admin cooperation and careful IAM scoping. Kubernetes Secret rotation adds another operational surface: secrets need to be refreshed before they expire, the pod needs to pick up the new value without a restart (or your automation needs to handle the restart), and the rotation event needs to be audited. External Secrets Operator handles the sync, but it introduces another Kubernetes controller to operate, monitor, and upgrade. None of this is insurmountable, but it is not zero-cost infrastructure.


Conclusion and Looking Forward

Running AI agents on a laptop is a prototype-level architecture. Running them inside Eclipse Che workspaces on Kubernetes is production-grade.

The combination reframes what “AI-assisted development” means. The agent is not a plugin that lives in your IDE session. It is an autonomous peer that occupies its own fully specified, resource-bounded, version-controlled development environment. Humans and agents share the same infrastructure, the same tooling, and the same reproducibility guarantees. The Devfile is not a configuration file — it is the agent’s contract with the team about what kind of environment it is allowed to operate in.

The next evolution of this pattern is Model Context Protocol in remote containers. MCP defines a standard interface through which an AI agent can discover and invoke tools — filesystem access, terminal execution, browser control, API clients — without those tools being hardcoded into the agent’s implementation. As MCP servers become deployable as Kubernetes sidecar containers, the architecture described in this post gains a powerful extension mechanism: drop an MCP server sidecar into the Devfile, and the agent automatically gains a new, well-scoped capability without any changes to the orchestrator. A database introspection MCP server. A GitHub Actions log reader. A Datadog metrics query client. All available as declarative additions to the Devfile, all scoped to the workspace pod, all cleanly revocable by removing the sidecar definition.
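
For readers who have not seen MCP on the wire: discovery and invocation are JSON-RPC 2.0 messages, with tools/list to enumerate what a server offers and tools/call to invoke one tool by name. The sketch below only builds those two request bodies. It omits the initialize handshake and the transport, which a real client needs, and the tool name and arguments are hypothetical stand-ins for something like the database introspection server mentioned above.

```python
import itertools
import json
from typing import Any, Optional

_request_ids = itertools.count(1)  # JSON-RPC requests carry a unique id


def jsonrpc_request(method: str, params: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    message: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools_request() -> dict[str, Any]:
    """Ask an MCP server which tools it exposes."""
    return jsonrpc_request("tools/list")


def call_tool_request(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Ask an MCP server to run one tool with the given arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    print(json.dumps(list_tools_request()))
    # "describe_table" is a hypothetical tool name used for illustration.
    print(json.dumps(call_tool_request("describe_table", {"table": "users"})))
```

Because the agent learns tool names from the tools/list response at runtime, adding a sidecar changes what the agent can do without changing the orchestrator.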

That is the trajectory: from “AI agent with a terminal” to “AI agent with a declarative, composable, auditable capability surface” — running in a container, on Kubernetes, next to your production workloads, but safely isolated from them.