Getting Started with Runme: Executable Documentation for Incident Management, Infrastructure, DevOps, and Security
READER BEWARE: THE FOLLOWING WAS WRITTEN ENTIRELY BY AI WITHOUT HUMAN EDITING.
Introduction
In the fast-paced world of DevOps, incident management, and security operations, documentation often becomes outdated the moment it’s written. Teams struggle with runbooks whose commands must be copied by hand into terminals, scripts scattered across multiple repositories, and processes that work on one engineer’s machine but fail elsewhere. Enter Runme.dev—an approach to documentation that makes your markdown files executable.
Runme transforms traditional static documentation into interactive, executable runbooks. Rather than copying commands from documentation into your terminal, Runme allows you to run commands directly from your markdown files with a single click, all while maintaining the context and explanations that make documentation valuable.
This guide is designed for teams just starting to adopt Runme, with a focus on practical use cases in incident management, infrastructure operations, DevOps workflows, and security operations. We’ll explore how Runme integrates with external systems like AWS, how authentication works, where code executes, and how state persists across sessions.
What is Runme?
Runme is a tool that bridges the gap between documentation and execution. It works as:
- VS Code Extension: The primary interface, turning your VS Code editor into an interactive notebook experience for markdown files
- CLI Tool: A command-line interface for running markdown-based runbooks in CI/CD pipelines or directly from the terminal
- Notebook Interface: A cell-based execution environment similar to Jupyter notebooks, but for operational tasks
At its core, Runme parses markdown files and makes code blocks executable. Instead of:
## Restart the service
Copy and run this command:
\`\`\`bash
kubectl rollout restart deployment/my-app -n production
\`\`\`
With Runme, users click a “Run” button next to the code block, and the command executes in a managed environment with proper context, logging, and state tracking.
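To make this concrete, the sketch below writes a minimal runbook with one named cell to /tmp. The fence characters are held in a variable only to avoid nesting code fences inside this example, and the `runme` invocation is shown as a comment because exact CLI flags vary by version:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a minimal runbook with one named cell. The fence string is kept
# in a variable only so this example does not nest code fences.
FENCE='```'
cat > /tmp/hello-runbook.md << EOF
# Hello Runbook

${FENCE}sh {"name":"greet"}
echo "Hello from Runme"
${FENCE}
EOF

# With the Runme CLI installed, the named cell can then be executed,
# e.g. (illustrative; flags vary by version):
#   runme run greet --chdir /tmp

grep -q '"name":"greet"' /tmp/hello-runbook.md && echo "runbook written"
```

Opening the same file in VS Code with the Runme extension shows the `greet` cell with its own Run button.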
Understanding Runme’s Architecture
Before diving into use cases, it’s crucial to understand three fundamental aspects of Runme: authentication, runtime environment, and persistence.
Authentication: How Runme Handles Credentials
Runme itself does not store or manage credentials; commands simply run with whatever credentials your shell environment already provides. This design keeps Runme out of the credential-handling path, an important property for sensitive operations.
Local Execution Model
When you run commands through Runme, they execute in your local shell environment with your existing credentials. This means:
- AWS Credentials: Commands use your configured AWS CLI credentials (~/.aws/credentials, environment variables, or SSO sessions)
- Kubernetes Contexts: kubectl commands use your current kubeconfig context (~/.kube/config)
- SSH Keys: SSH-based operations use your local SSH agent and keys
- API Tokens: Environment variables and credential files on your system are accessible
Example: If you have AWS SSO configured with multiple profiles:
# This command runs in your local shell with your AWS credentials
aws s3 ls --profile production
Runme executes this exactly as if you typed it in your terminal, using the credentials associated with the production profile.
Environment-Based Authentication
Runme supports environment variables within notebook cells, allowing you to:
- Set context-specific variables: Define AWS profiles, Kubernetes namespaces, or API endpoints per runbook
- Inherit from parent shell: Commands inherit environment variables from the shell where Runme was launched
- Scope credentials per cell: Different cells can use different credential contexts
Example runbook with environment configuration:
## Configuration
\`\`\`bash {"name":"config"}
export AWS_PROFILE=production
export AWS_REGION=us-east-1
export KUBE_CONTEXT=prod-eks-cluster
\`\`\`
## Check AWS Resources
\`\`\`bash {"name":"check-aws"}
# This uses the AWS_PROFILE set above
aws ec2 describe-instances --region $AWS_REGION
\`\`\`
## Check Kubernetes Pods
\`\`\`bash {"name":"check-k8s"}
kubectl config use-context $KUBE_CONTEXT
kubectl get pods -n production
\`\`\`
Integration with Credential Managers
Because cells simply invoke your local CLIs, Runme works with enterprise credential management systems:
- AWS SSO: Run aws sso login in a Runme cell before AWS operations
- HashiCorp Vault: Use Vault CLI commands to fetch secrets dynamically
- 1Password/LastPass: Use CLI tools to inject secrets at runtime
- Cloud IAM: Leverage cloud provider IAM roles when running in cloud environments
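As a sketch of the dynamic-secret pattern, the cell below captures a Vault secret into an environment variable at runtime. The mount path, field name, and placeholder fallback are illustrative, not taken from any particular Vault setup:

```shell
#!/usr/bin/env bash
# Fetch a secret at cell runtime instead of hardcoding it in the runbook.
# `vault kv get` is the HashiCorp Vault CLI; the path and field below
# are examples only.
fetch_db_password() {
  # Fall back to a placeholder when the CLI is unavailable, so the
  # runbook degrades gracefully on machines without Vault.
  vault kv get -field=password secret/prod/db 2>/dev/null || echo "PLACEHOLDER"
}

export DB_PASSWORD="$(fetch_db_password)"

# Never echo the secret itself; log only that it was set.
echo "DB_PASSWORD set (${#DB_PASSWORD} chars)"
```

Because cells share a session, later cells can use $DB_PASSWORD without re-fetching it.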
Runtime: Where Does Code Execute?
Understanding where your code runs is crucial for security and operational planning.
Local Execution (Default)
By default, Runme executes commands on your local machine in a shell session. This means:
- File system access: Commands can read/write files on your local disk
- Network access: Commands make network requests from your IP address
- Process isolation: Each cell runs as a subprocess of the Runme process
- Resource limits: Commands are subject to your machine’s CPU, memory, and network limits
Security Implications:
- Commands have the same permissions as your user account
- Malicious runbooks could potentially harm your system (always review before running)
- Network policies and firewalls apply as they would for any local process
Shell Sessions and State
Runme manages shell sessions intelligently:
- Persistent Sessions: By default, cells share a single shell session, meaning environment variables and directory changes persist between cells
- Named Sessions: You can create multiple named sessions to isolate different contexts
- Session Cleanup: Sessions terminate when you close the runbook or explicitly end them
Example showing session persistence:
## Navigate to Project Directory
\`\`\`bash
cd /opt/projects/my-app
export APP_ENV=production
\`\`\`
## Build Application
\`\`\`bash
# This runs in the same directory and has access to APP_ENV
npm run build:$APP_ENV
\`\`\`
Remote Execution (Advanced)
While Runme primarily runs locally, it can orchestrate remote execution:
- SSH into remote hosts: Use standard SSH commands in cells
- Cloud shell integration: Execute commands in AWS CloudShell, GCP Cloud Shell, or Azure Cloud Shell
- Container execution: Run commands inside Docker containers or Kubernetes pods
- CI/CD integration: Runme CLI can run runbooks in CI/CD pipeline runners
Example remote execution pattern:
## Execute on Production Server
\`\`\`bash
ssh production-server << 'EOF'
cd /var/www/app
sudo systemctl restart nginx
curl -sf http://localhost/health
EOF
\`\`\`
## Execute in Kubernetes Pod
\`\`\`bash
kubectl exec -n production deploy/api-server -- \
python manage.py check_health
\`\`\`
Persistence: How State is Maintained
Runme provides multiple persistence mechanisms to maintain context across time and sessions.
Cell Output History
Every cell execution is logged with:
- Stdout/stderr capture: Complete output from commands
- Exit codes: Success/failure status
- Execution timestamps: When commands ran
- Execution duration: How long commands took
This history is stored in:
- VS Code: In-memory during the session, with optional disk persistence
- Runme CLI: Output can be saved to files or streamed to logging systems
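When exporting history to an external logging system, the same per-cell metadata can be approximated in plain shell. The field names here are illustrative, not Runme's internal format:

```shell
#!/usr/bin/env bash
# Record roughly what Runme logs for a cell: output, exit code,
# timestamp, and duration.
start_ts=$(date +%s)
output=$(echo "deploy step ok")   # stand-in for the cell's actual command
exit_code=$?
end_ts=$(date +%s)

printf 'ts=%s exit=%d duration=%ds output=%s\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$exit_code" "$((end_ts - start_ts))" "$output"
```

A line like this per cell, appended to a file or shipped to a log aggregator, gives a durable audit trail even outside VS Code.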
Environment Variables and Context
Runme can persist context between sessions through:
- Markdown front matter: Store variables in YAML front matter at the top of runbooks
- Environment files: Load .env files or export variables to persist state
- Named cells: Reference outputs from previous cells by name
- Session files: Export session state to files for later restoration
Example with persistent configuration:
---
runme:
  version: v3
shell: bash
env:
  AWS_PROFILE: production
  AWS_REGION: us-west-2
  CLUSTER_NAME: prod-eks-01
---
# Production Operations Runbook
## Configuration is loaded automatically from front matter
\`\`\`bash {"name":"verify-config"}
echo "AWS Profile: $AWS_PROFILE"
echo "Region: $AWS_REGION"
echo "Cluster: $CLUSTER_NAME"
\`\`\`
State Management Patterns
For complex workflows, consider these patterns:
- Checkpoint cells: Save state to files that subsequent cells can load
- Idempotent operations: Design commands to be safely re-runnable
- State verification cells: Include cells that check system state before operations
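A minimal sketch of the checkpoint and idempotency patterns combined (the state directory and step names are illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

STATE_DIR=/tmp/runbook-state
mkdir -p "$STATE_DIR"              # idempotent: no-op when it already exists

run_once() {
  local step="$1"; shift
  if [ -f "$STATE_DIR/$step.done" ]; then
    echo "skip: $step already completed"
  else
    "$@"                           # run the step's command
    touch "$STATE_DIR/$step.done"  # checkpoint for re-runs and later cells
    echo "done: $step"
  fi
}

run_once scale-up echo "scaling deployment..."
run_once scale-up echo "scaling deployment..."   # second call is a no-op
```

Re-running the cell after a partial failure skips completed steps, which is exactly the behavior you want mid-incident.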
Example stateful incident response:
## Step 1: Capture Incident Context
\`\`\`bash {"name":"capture-context"}
INCIDENT_ID="INC-$(date +%Y%m%d-%H%M%S)"
INCIDENT_DIR="/tmp/incidents/$INCIDENT_ID"
mkdir -p "$INCIDENT_DIR"
echo "Incident ID: $INCIDENT_ID" | tee "$INCIDENT_DIR/metadata.txt"
echo "Started: $(date)" | tee -a "$INCIDENT_DIR/metadata.txt"
# Export for other cells
export INCIDENT_ID
export INCIDENT_DIR
\`\`\`
## Step 2: Gather Logs (uses context from Step 1)
\`\`\`bash {"name":"gather-logs"}
kubectl logs -n production -l app=api --tail=1000 \
> "$INCIDENT_DIR/api-logs.txt"
aws logs tail /aws/ecs/api-service --since 1h \
> "$INCIDENT_DIR/ecs-logs.txt"
\`\`\`
## Step 3: Generate Report (persists to disk)
\`\`\`bash {"name":"generate-report"}
cat > "$INCIDENT_DIR/summary.md" << EOF
# Incident Report: $INCIDENT_ID
Started: $(date)
Status: In Progress
## Symptoms Observed
- API latency increased to >2s
- Error rate at 5%
## Data Collected
- API pod logs: $(wc -l < "$INCIDENT_DIR/api-logs.txt") lines
- ECS task logs: $(wc -l < "$INCIDENT_DIR/ecs-logs.txt") lines
EOF
echo "Report saved to $INCIDENT_DIR/summary.md"
\`\`\`
Use Case 1: Incident Management
Incident response requires speed, accuracy, and clear communication. Runme transforms incident runbooks from static documents into interactive response tools.
Incident Response Runbook
Here’s a complete incident response runbook demonstrating Runme’s capabilities:
# API Service Incident Response Runbook
## 1. Initial Assessment
### Check Service Health
\`\`\`bash {"name":"health-check"}
# Check endpoint availability
curl -sf https://api.example.com/health || echo "❌ Health check failed"
# Check response time
time curl -sf https://api.example.com/health > /dev/null
\`\`\`
### Check Error Rates
\`\`\`bash {"name":"error-rates"}
# Query last 5 minutes of errors from CloudWatch
aws cloudwatch get-metric-statistics \
--namespace "API/Production" \
--metric-name ErrorRate \
--start-time $(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 60 \
--statistics Average,Maximum \
--dimensions Name=Environment,Value=production
\`\`\`
## 2. Identify Root Cause
### Check Pod Status
\`\`\`bash {"name":"pod-status"}
kubectl get pods -n production -l app=api-service -o wide
\`\`\`
### Review Recent Logs
\`\`\`bash {"name":"recent-logs"}
# Get logs from last 10 minutes
kubectl logs -n production -l app=api-service \
--since=10m \
--tail=100 \
| grep -i "error\|exception\|fatal"
\`\`\`
### Check Dependencies
\`\`\`bash {"name":"check-deps"}
# Database connectivity
kubectl exec -n production deploy/api-service -- \
timeout 5 nc -zv postgres-service 5432
# Redis connectivity
kubectl exec -n production deploy/api-service -- \
timeout 5 redis-cli -h redis-service ping
\`\`\`
### Check Recent Deployments
\`\`\`bash {"name":"recent-deploys"}
# Check rollout history
kubectl rollout history deployment/api-service -n production
# Check recent events
kubectl get events -n production \
--sort-by='.lastTimestamp' \
--field-selector involvedObject.name=api-service \
| tail -20
\`\`\`
## 3. Mitigation Actions
### Scale Up Pods
\`\`\`bash {"name":"scale-up"}
# Increase replica count
kubectl scale deployment/api-service -n production --replicas=10
# Wait for new pods to be ready
kubectl wait --for=condition=Ready pod \
-l app=api-service \
-n production \
--timeout=300s
\`\`\`
### Restart Deployment (if needed)
\`\`\`bash {"name":"restart-deployment"}
# Rolling restart
kubectl rollout restart deployment/api-service -n production
# Monitor rollout status
kubectl rollout status deployment/api-service -n production
\`\`\`
### Rollback to Previous Version (if regression)
\`\`\`bash {"name":"rollback"}
# Rollback to previous revision
kubectl rollout undo deployment/api-service -n production
# Verify rollback
kubectl rollout status deployment/api-service -n production
\`\`\`
## 4. Verification
### Verify Service Recovery
\`\`\`bash {"name":"verify-recovery"}
# Check health endpoint
for i in {1..5}; do
echo "Attempt $i:"
curl -sf https://api.example.com/health && echo "✓ OK" || echo "✗ Failed"
sleep 2
done
\`\`\`
### Monitor Error Rates Post-Fix
\`\`\`bash {"name":"monitor-errors"}
# Real-time error monitoring for 1 minute
timeout 60 watch -n 5 'kubectl logs -n production -l app=api-service --tail=50 | grep -c ERROR'
\`\`\`
## 5. Documentation
### Generate Incident Report
\`\`\`bash {"name":"incident-report"}
INCIDENT_ID="INC-$(date +%Y%m%d-%H%M%S)"
cat > "/tmp/incident-$INCIDENT_ID.md" << EOF
# Incident Report
**Incident ID**: $INCIDENT_ID
**Date**: $(date)
**Duration**: [TO BE FILLED]
**Severity**: [TO BE FILLED]
## Timeline
- $(date): Incident detected
- $(date): Initial assessment completed
- $(date): Mitigation applied
- $(date): Service recovered
## Root Cause
[TO BE FILLED AFTER INVESTIGATION]
## Actions Taken
1. Scaled deployment from 5 to 10 replicas
2. Restarted pods
3. Verified service health
## Next Steps
- [ ] Post-incident review
- [ ] Update monitoring alerts
- [ ] Document lessons learned
EOF
echo "Report created: /tmp/incident-$INCIDENT_ID.md"
cat "/tmp/incident-$INCIDENT_ID.md"
\`\`\`
Benefits for Incident Management
- Speed: One-click execution eliminates typing errors and command lookup time
- Consistency: Everyone follows the same tested procedure
- Audit Trail: Complete log of actions taken during incident response
- Collaboration: Team members can see what commands were run and their results
- Learning: New team members can execute runbooks with guidance, building expertise
- Version Control: Runbooks are versioned in Git, with history of improvements
Use Case 2: Infrastructure Management
Infrastructure teams manage cloud resources, server configurations, and deployment pipelines. Runme makes infrastructure operations repeatable and auditable.
AWS Infrastructure Management Runbook
# AWS Infrastructure Audit and Management
## Environment Setup
\`\`\`bash {"name":"setup-env"}
export AWS_PROFILE=production
export AWS_REGION=us-east-1
export ENVIRONMENT=production
# Verify credentials
aws sts get-caller-identity
\`\`\`
## 1. Inventory Check
### EC2 Instances
\`\`\`bash {"name":"ec2-inventory"}
# List all EC2 instances with key details
aws ec2 describe-instances \
--region $AWS_REGION \
--query 'Reservations[].Instances[].[InstanceId,InstanceType,State.Name,Tags[?Key==`Name`].Value|[0],PrivateIpAddress,LaunchTime]' \
--output table
\`\`\`
### RDS Databases
\`\`\`bash {"name":"rds-inventory"}
# List all RDS instances
aws rds describe-db-instances \
--region $AWS_REGION \
--query 'DBInstances[].[DBInstanceIdentifier,Engine,EngineVersion,DBInstanceClass,DBInstanceStatus,AllocatedStorage]' \
--output table
\`\`\`
### EKS Clusters
\`\`\`bash {"name":"eks-inventory"}
# List EKS clusters
aws eks list-clusters --region $AWS_REGION
# Get details for each cluster
for cluster in $(aws eks list-clusters --region $AWS_REGION --query 'clusters[]' --output text); do
echo -e "\n=== Cluster: $cluster ==="
aws eks describe-cluster --name $cluster --region $AWS_REGION \
--query 'cluster.[status,version,endpoint]' \
--output table
done
\`\`\`
## 2. Cost Analysis
### Monthly Cost by Service
\`\`\`bash {"name":"cost-analysis"}
# Get cost breakdown for current month
START_DATE=$(date +%Y-%m-01)
END_DATE=$(date +%Y-%m-%d)
aws ce get-cost-and-usage \
--time-period Start=$START_DATE,End=$END_DATE \
--granularity MONTHLY \
--metrics "UnblendedCost" \
--group-by Type=DIMENSION,Key=SERVICE \
--query 'ResultsByTime[0].Groups[].[Keys[0],Metrics.UnblendedCost.Amount]' \
--output table
\`\`\`
### Identify Unused Resources
\`\`\`bash {"name":"unused-resources"}
# Find unattached EBS volumes
echo "=== Unattached EBS Volumes ==="
aws ec2 describe-volumes \
--region $AWS_REGION \
--filters Name=status,Values=available \
--query 'Volumes[].[VolumeId,Size,VolumeType,CreateTime]' \
--output table
# Find unused Elastic IPs
echo -e "\n=== Unassociated Elastic IPs ==="
aws ec2 describe-addresses \
--region $AWS_REGION \
--query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' \
--output table
\`\`\`
## 3. Security Audit
### Check Security Groups
\`\`\`bash {"name":"security-groups"}
# Find security groups with overly permissive rules (0.0.0.0/0)
aws ec2 describe-security-groups \
--region $AWS_REGION \
--query 'SecurityGroups[?IpPermissions[?IpRanges[?CidrIp==`0.0.0.0/0`]]].{ID:GroupId,Name:GroupName,VPC:VpcId}' \
--output table
\`\`\`
### Check IAM Password Policy
\`\`\`bash {"name":"iam-policy"}
# Verify password policy meets requirements
aws iam get-account-password-policy
\`\`\`
### Check S3 Bucket Encryption
\`\`\`bash {"name":"s3-encryption"}
# Check which buckets lack encryption
for bucket in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
encryption=$(aws s3api get-bucket-encryption --bucket $bucket 2>&1)
if echo "$encryption" | grep -q "ServerSideEncryptionConfigurationNotFoundError"; then
echo "❌ $bucket: No encryption"
else
echo "✓ $bucket: Encrypted"
fi
done
\`\`\`
## 4. Maintenance Operations
### Update Auto Scaling Groups
\`\`\`bash {"name":"update-asg"}
ASG_NAME="production-api-asg"
# Get current configuration
aws autoscaling describe-auto-scaling-groups \
--auto-scaling-group-names $ASG_NAME \
--query 'AutoScalingGroups[0].[MinSize,MaxSize,DesiredCapacity]' \
--output table
# Update desired capacity (uncomment to execute)
# aws autoscaling set-desired-capacity \
# --auto-scaling-group-name $ASG_NAME \
# --desired-capacity 5
\`\`\`
### Rotate Access Keys (Audit)
\`\`\`bash {"name":"access-key-audit"}
# List access keys older than 90 days
for user in $(aws iam list-users --query 'Users[].UserName' --output text); do
echo -e "\n=== User: $user ==="
aws iam list-access-keys --user-name $user \
--query 'AccessKeyMetadata[].[AccessKeyId,CreateDate,Status]' \
--output table
done
\`\`\`
### Snapshot Critical Volumes
\`\`\`bash {"name":"snapshot-volumes"}
# Create snapshots of production volumes
VOLUME_IDS=$(aws ec2 describe-volumes \
--region $AWS_REGION \
--filters "Name=tag:Environment,Values=production" "Name=tag:Backup,Values=true" \
--query 'Volumes[].VolumeId' \
--output text)
for volume in $VOLUME_IDS; do
echo "Creating snapshot for $volume..."
aws ec2 create-snapshot \
--volume-id $volume \
--description "Manual backup $(date +%Y-%m-%d)" \
--tag-specifications "ResourceType=snapshot,Tags=[{Key=CreatedBy,Value=Runme},{Key=Date,Value=$(date +%Y-%m-%d)}]"
done
\`\`\`
## 5. Compliance Reporting
### Generate Compliance Report
\`\`\`bash {"name":"compliance-report"}
REPORT_FILE="/tmp/aws-compliance-$(date +%Y%m%d).txt"
{
echo "AWS Infrastructure Compliance Report"
echo "Generated: $(date)"
echo "Account: $(aws sts get-caller-identity --query Account --output text)"
echo "Region: $AWS_REGION"
echo ""
echo "=== CloudTrail Status ==="
aws cloudtrail describe-trails --region $AWS_REGION
echo -e "\n=== Config Recorder Status ==="
aws configservice describe-configuration-recorder-status --region $AWS_REGION
echo -e "\n=== GuardDuty Status ==="
aws guardduty list-detectors --region $AWS_REGION
echo -e "\n=== Security Hub Status ==="
aws securityhub describe-hub --region $AWS_REGION 2>&1
} > "$REPORT_FILE"
echo "Compliance report saved to: $REPORT_FILE"
cat "$REPORT_FILE"
\`\`\`
Infrastructure Benefits
- Consistency: Infrastructure operations follow standardized procedures
- Safety: Review commands before execution, with clear documentation
- Efficiency: Complex multi-step operations in a single runbook
- Knowledge Sharing: Junior engineers can run runbooks written by senior engineers
- Compliance: Auditable record of who ran what commands and when
Use Case 3: DevOps Workflows
DevOps teams orchestrate deployments, manage CI/CD pipelines, and maintain development environments. Runme streamlines these workflows.
Deployment Runbook
# Production Deployment Runbook - API Service v2.5.0
## Pre-Deployment Checklist
### Verify Prerequisites
\`\`\`bash {"name":"verify-prereqs"}
echo "Checking prerequisites..."
# Verify kubectl access
kubectl cluster-info | grep "Kubernetes control plane"
# Verify AWS access
aws sts get-caller-identity
# Verify Docker registry access
docker login registry.example.com
echo "✓ All prerequisites met"
\`\`\`
### Backup Current Configuration
\`\`\`bash {"name":"backup-config"}
BACKUP_DIR="/tmp/deployment-backup-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BACKUP_DIR"
# Backup current Kubernetes manifests
kubectl get deployment api-service -n production -o yaml > "$BACKUP_DIR/deployment.yaml"
kubectl get service api-service -n production -o yaml > "$BACKUP_DIR/service.yaml"
kubectl get configmap api-config -n production -o yaml > "$BACKUP_DIR/configmap.yaml"
echo "✓ Backup saved to $BACKUP_DIR"
export BACKUP_DIR
\`\`\`
## Deployment Steps
### 1. Build and Push Docker Image
\`\`\`bash {"name":"build-image"}
VERSION="2.5.0"
IMAGE_NAME="registry.example.com/api-service"
IMAGE_TAG="$VERSION"
# Build image
cd /path/to/api-service
docker build -t "$IMAGE_NAME:$IMAGE_TAG" .
# Tag as latest
docker tag "$IMAGE_NAME:$IMAGE_TAG" "$IMAGE_NAME:latest"
# Push to registry
docker push "$IMAGE_NAME:$IMAGE_TAG"
docker push "$IMAGE_NAME:latest"
echo "✓ Image pushed: $IMAGE_NAME:$IMAGE_TAG"
\`\`\`
### 2. Update Kubernetes Manifests
\`\`\`bash {"name":"update-manifests"}
VERSION="2.5.0"
IMAGE_NAME="registry.example.com/api-service:$VERSION"
# Update deployment with new image
kubectl set image deployment/api-service \
api-service="$IMAGE_NAME" \
-n production
# Annotate with change cause
kubectl annotate deployment/api-service \
kubernetes.io/change-cause="Deploy version $VERSION" \
-n production
echo "✓ Deployment updated to $VERSION"
\`\`\`
### 3. Monitor Rollout
\`\`\`bash {"name":"monitor-rollout"}
# Watch rollout status
kubectl rollout status deployment/api-service -n production --timeout=5m
# Verify new pods are running
kubectl get pods -n production -l app=api-service -o wide
echo "✓ Rollout completed successfully"
\`\`\`
### 4. Smoke Tests
\`\`\`bash {"name":"smoke-tests"}
# Get service endpoint
SERVICE_URL="https://api.example.com"
# Test health endpoint
echo "Testing health endpoint..."
curl -sf "$SERVICE_URL/health" | jq '.'
# Test version endpoint
echo -e "\nTesting version endpoint..."
curl -sf "$SERVICE_URL/version" | jq '.'
# Test sample API call
echo -e "\nTesting sample API call..."
curl -sf "$SERVICE_URL/api/v1/status" | jq '.'
echo "✓ All smoke tests passed"
\`\`\`
### 5. Performance Validation
\`\`\`bash {"name":"performance-test"}
SERVICE_URL="https://api.example.com"
# Run quick load test
echo "Running performance test (100 requests)..."
ab -n 100 -c 10 "$SERVICE_URL/api/v1/status"
# Check response times
echo -e "\nChecking p95 response time..."
# (Results from ab command above)
\`\`\`
## Post-Deployment
### Update Monitoring
\`\`\`bash {"name":"update-monitoring"}
# Add annotation to Datadog
DEPLOY_TIME=$(date +%s)
VERSION="2.5.0"
curl -X POST "https://api.datadoghq.com/api/v1/events" \
-H "DD-API-KEY: $DATADOG_API_KEY" \
-H "Content-Type: application/json" \
-d @- << EOF
{
"title": "API Service Deployed",
"text": "Version $VERSION deployed to production",
"priority": "normal",
"tags": ["environment:production", "service:api", "version:$VERSION"],
"alert_type": "info"
}
EOF
echo "✓ Monitoring updated"
\`\`\`
### Notify Team
\`\`\`bash {"name":"notify-team"}
VERSION="2.5.0"
DEPLOY_TIME=$(date)
# Post to Slack
curl -X POST "$SLACK_WEBHOOK_URL" \
-H "Content-Type: application/json" \
-d @- << EOF
{
"text": "✅ Production Deployment Complete",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*API Service v$VERSION* deployed to production\n*Time:* $DEPLOY_TIME\n*Status:* ✅ Success"
}
}
]
}
EOF
echo "✓ Team notified"
\`\`\`
## Rollback Procedure (If Needed)
### Quick Rollback
\`\`\`bash {"name":"rollback"}
echo "⚠️ Initiating rollback..."
# Rollback to previous revision
kubectl rollout undo deployment/api-service -n production
# Wait for rollback to complete
kubectl rollout status deployment/api-service -n production
# Restore backed up config if needed
if [ -n "$BACKUP_DIR" ]; then
kubectl apply -f "$BACKUP_DIR/"
fi
echo "✅ Rollback completed"
\`\`\`
DevOps Workflow Benefits
- Reproducibility: Same deployment process every time
- Visibility: Everyone can see the deployment steps and current status
- Safety: Built-in checkpoints and rollback procedures
- Speed: One-click deployments instead of manual command execution
- Training: New team members learn the deployment process by running runbooks
Use Case 4: Security Operations
Security teams need to respond to threats, audit systems, and maintain compliance. Runme provides secure, auditable security operations.
Security Incident Response Runbook
# Security Incident Response - Compromised AWS Account
## Phase 1: Containment
### Immediate Actions - Stop Active Threat
\`\`\`bash {"name":"emergency-stop"}
# Set up incident tracking
INCIDENT_ID="SEC-$(date +%Y%m%d-%H%M%S)"
INCIDENT_DIR="/tmp/security-incident-$INCIDENT_ID"
mkdir -p "$INCIDENT_DIR"
echo "Security Incident: $INCIDENT_ID" | tee "$INCIDENT_DIR/timeline.txt"
echo "Started: $(date)" | tee -a "$INCIDENT_DIR/timeline.txt"
export INCIDENT_ID
export INCIDENT_DIR
\`\`\`
### Disable Compromised User Access
\`\`\`bash {"name":"disable-user"}
COMPROMISED_USER="suspicious-user"
# Disable console access
aws iam delete-login-profile --user-name "$COMPROMISED_USER" 2>/dev/null
# Deactivate all access keys
aws iam list-access-keys --user-name "$COMPROMISED_USER" \
--query 'AccessKeyMetadata[].AccessKeyId' \
--output text | tr '\t' '\n' | while read -r key; do
aws iam update-access-key --user-name "$COMPROMISED_USER" --access-key-id "$key" --status Inactive
echo "$(date): Deactivated access key $key for $COMPROMISED_USER" | tee -a "$INCIDENT_DIR/timeline.txt"
done
echo "✓ User access disabled"
\`\`\`
### Revoke Active Sessions
\`\`\`bash {"name":"revoke-sessions"}
COMPROMISED_USER="suspicious-user"
# Attach policy to deny all actions
cat > /tmp/deny-all-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Action": "*",
"Resource": "*"
}
]
}
EOF
# Create and attach deny policy
POLICY_ARN=$(aws iam create-policy \
--policy-name "DenyAll-$INCIDENT_ID" \
--policy-document file:///tmp/deny-all-policy.json \
--query 'Policy.Arn' \
--output text)
aws iam attach-user-policy --user-name "$COMPROMISED_USER" --policy-arn "$POLICY_ARN"
echo "$(date): Attached deny-all policy to $COMPROMISED_USER" | tee -a "$INCIDENT_DIR/timeline.txt"
echo "✓ Active sessions effectively revoked"
\`\`\`
## Phase 2: Investigation
### Collect CloudTrail Logs
\`\`\`bash {"name":"collect-cloudtrail"}
COMPROMISED_USER="suspicious-user"
START_TIME=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S)
END_TIME=$(date -u +%Y-%m-%dT%H:%M:%S)
# Query CloudTrail for user activity
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=Username,AttributeValue="$COMPROMISED_USER" \
--start-time "$START_TIME" \
--end-time "$END_TIME" \
--max-results 50 \
> "$INCIDENT_DIR/cloudtrail-events.json"
# Extract key information
jq -r '.Events[] | "\(.EventTime) \(.EventName) \(.SourceIPAddress)"' \
"$INCIDENT_DIR/cloudtrail-events.json" \
| tee "$INCIDENT_DIR/event-summary.txt"
echo "✓ CloudTrail logs collected"
\`\`\`
### Identify Affected Resources
\`\`\`bash {"name":"identify-resources"}
# List resources created by compromised user in last 24 hours
echo "=== EC2 Instances ===" | tee -a "$INCIDENT_DIR/affected-resources.txt"
aws ec2 describe-instances \
--filters "Name=tag:CreatedBy,Values=$COMPROMISED_USER" \
--query 'Reservations[].Instances[].[InstanceId,LaunchTime,State.Name]' \
--output table | tee -a "$INCIDENT_DIR/affected-resources.txt"
echo -e "\n=== S3 Buckets ===" | tee -a "$INCIDENT_DIR/affected-resources.txt"
for bucket in $(aws s3api list-buckets --query 'Buckets[].Name' --output text); do
tags=$(aws s3api get-bucket-tagging --bucket $bucket 2>/dev/null || echo "")
if echo "$tags" | grep -q "$COMPROMISED_USER"; then
echo "$bucket" | tee -a "$INCIDENT_DIR/affected-resources.txt"
fi
done
echo -e "\n=== IAM Resources ===" | tee -a "$INCIDENT_DIR/affected-resources.txt"
aws iam list-users --query "Users[?contains(UserName, '$COMPROMISED_USER')]" \
| tee -a "$INCIDENT_DIR/affected-resources.txt"
echo "✓ Affected resources identified"
\`\`\`
### Check for Data Exfiltration
\`\`\`bash {"name":"check-exfiltration"}
# Query CloudTrail for potential data exfiltration events
jq -r '.Events[] | select(.EventName=="GetObject" or .EventName=="DownloadDBSnapshot" or .EventName=="CreateSnapshot") | "\(.EventTime) \(.EventName) \(.Resources[0].ResourceName)"' \
"$INCIDENT_DIR/cloudtrail-events.json" \
| tee "$INCIDENT_DIR/potential-exfiltration.txt"
# Check VPC Flow Logs for unusual outbound traffic
echo -e "\n=== Checking VPC Flow Logs ==="
# (Requires VPC Flow Logs to be enabled and stored in CloudWatch/S3)
aws logs filter-log-events \
--log-group-name "/aws/vpc/flowlogs" \
--start-time $(date -d '24 hours ago' +%s)000 \
--filter-pattern "[version, account, eni, source, destination, srcport, destport, protocol, packets, bytes, start, end, action=ACCEPT, status]" \
--query 'events[*].message' \
--output text \
> "$INCIDENT_DIR/vpc-outbound-traffic.txt"
# Note: flow log records do not label direction; filter the captured lines
# by destination address against your VPC CIDR to isolate outbound traffic
echo "✓ Exfiltration check complete"
\`\`\`
## Phase 3: Eradication
### Terminate Unauthorized Resources
\`\`\`bash {"name":"terminate-resources"}
# Terminate EC2 instances created by compromised user
INSTANCE_IDS=$(aws ec2 describe-instances \
--filters "Name=tag:CreatedBy,Values=$COMPROMISED_USER" "Name=instance-state-name,Values=running" \
--query 'Reservations[].Instances[].InstanceId' \
--output text)
if [ -n "$INSTANCE_IDS" ]; then
echo "Terminating instances: $INSTANCE_IDS"
aws ec2 terminate-instances --instance-ids $INSTANCE_IDS
echo "$(date): Terminated instances: $INSTANCE_IDS" | tee -a "$INCIDENT_DIR/timeline.txt"
else
echo "No unauthorized instances found"
fi
echo "✓ Unauthorized resources terminated"
\`\`\`
### Remove Malicious IAM Policies
\`\`\`bash {"name":"remove-policies"}
# List and detach suspicious policies
aws iam list-policies --scope Local \
--query "Policies[?contains(PolicyName, 'temp') || contains(PolicyName, 'test')]" \
--output json > "$INCIDENT_DIR/suspicious-policies.json"
# Review and manually remove if confirmed malicious
cat "$INCIDENT_DIR/suspicious-policies.json"
echo "⚠️ Review suspicious policies before removal"
\`\`\`
### Rotate Credentials
\`\`\`bash {"name":"rotate-credentials"}
# Force rotation of potentially exposed credentials
echo "=== Credentials to Rotate ===" | tee "$INCIDENT_DIR/credential-rotation.txt"
# List IAM users who may have been compromised
aws iam list-users --query 'Users[].UserName' --output text | tr '\t' '\n' | while read -r user; do
last_used=$(aws iam get-user --user-name "$user" --query 'User.PasswordLastUsed' --output text 2>/dev/null)
if [ "$last_used" != "None" ]; then
echo "User: $user - Last password use: $last_used" | tee -a "$INCIDENT_DIR/credential-rotation.txt"
fi
done
echo -e "\n⚠️ Manually rotate credentials for affected users"
echo "⚠️ Consider rotating AWS root account credentials"
\`\`\`
## Phase 4: Recovery
### Restore Access for Legitimate Users
\`\`\`bash {"name":"restore-access"}
# Remove deny-all policy after threat is contained
aws iam detach-user-policy \
--user-name "$COMPROMISED_USER" \
--policy-arn "$POLICY_ARN"
aws iam delete-policy --policy-arn "$POLICY_ARN"
echo "$(date): Removed containment policy" | tee -a "$INCIDENT_DIR/timeline.txt"
echo "✓ Containment policy removed (verify threat eliminated first)"
\`\`\`
### Implement Additional Security Controls
\`\`\`bash {"name":"security-controls"}
# Enable MFA requirement for sensitive operations
cat > /tmp/require-mfa-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Deny",
"Action": "*",
"Resource": "*",
"Condition": {
"BoolIfExists": {
"aws:MultiFactorAuthPresent": "false"
}
}
}
]
}
EOF
# Create MFA requirement policy
aws iam create-policy \
--policy-name "RequireMFA" \
--policy-document file:///tmp/require-mfa-policy.json
echo "✓ MFA-requirement policy created (attach it to users or groups to enforce)"
\`\`\`
## Phase 5: Post-Incident
### Generate Incident Report
\`\`\`bash {"name":"generate-report"}
cat > "$INCIDENT_DIR/incident-report.md" << EOF
# Security Incident Report: $INCIDENT_ID
## Executive Summary
**Incident Type**: Compromised AWS Account
**Detected**: $(head -2 "$INCIDENT_DIR/timeline.txt" | tail -1)
**Contained**: $(date)
**Severity**: HIGH
## Timeline
~~~
$(cat "$INCIDENT_DIR/timeline.txt")
~~~
## Impact Assessment
- **Compromised User**: $COMPROMISED_USER
- **Affected Resources**: $(wc -l < "$INCIDENT_DIR/affected-resources.txt") resources identified
- **Data Exfiltration**: See $INCIDENT_DIR/potential-exfiltration.txt
## Actions Taken
1. Disabled compromised user access
2. Revoked active sessions
3. Collected forensic evidence
4. Terminated unauthorized resources
5. Rotated credentials
6. Implemented additional security controls
## Root Cause
[TO BE COMPLETED AFTER FULL INVESTIGATION]
## Preventive Measures
- Implement MFA for all users
- Enable AWS CloudTrail across all regions
- Configure GuardDuty for threat detection
- Review and update IAM policies
- Implement least-privilege access
## Lessons Learned
[TO BE COMPLETED IN POST-INCIDENT REVIEW]
## Evidence Location
All evidence stored in: $INCIDENT_DIR
EOF
echo "✓ Incident report generated: $INCIDENT_DIR/incident-report.md"
cat "$INCIDENT_DIR/incident-report.md"
\`\`\`
### Archive Evidence
\`\`\`bash {"name":"archive-evidence"}
# Create encrypted archive of evidence
ARCHIVE_FILE="/secure/evidence/incident-$INCIDENT_ID.tar.gz.gpg"
mkdir -p "$(dirname "$ARCHIVE_FILE")"
tar -czf - "$INCIDENT_DIR" | gpg --encrypt --recipient security-team@example.com > "$ARCHIVE_FILE"
echo "✓ Evidence archived: $ARCHIVE_FILE"
echo "$(date): Evidence archived to $ARCHIVE_FILE" | tee -a "$INCIDENT_DIR/timeline.txt"
\`\`\`
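Encryption protects confidentiality, but a checksum manifest additionally lets you demonstrate that evidence was not altered after collection. A minimal sketch, using a scratch directory as a stand-in for `$INCIDENT_DIR`:

```shell
# Build a sha256 manifest of every evidence file, then verify it
EVIDENCE_DIR=$(mktemp -d)              # stand-in for "$INCIDENT_DIR" in this sketch
echo "sample evidence" > "$EVIDENCE_DIR/timeline.txt"

MANIFEST=$(mktemp)                     # keep the manifest outside the tree it covers
find "$EVIDENCE_DIR" -type f -exec sha256sum {} + > "$MANIFEST"
sha256sum -c "$MANIFEST"               # prints "<file>: OK" for each entry
```

Storing the manifest alongside the encrypted archive (or signing it) strengthens the chain of custody for audits.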
## Security Operations Benefits
- Speed: Rapid response to security incidents with pre-tested procedures
- Forensics: Complete audit trail of actions taken during incident response
- Consistency: Standard operating procedures followed every time
- Collaboration: Security team can work together using same runbooks
- Compliance: Demonstrates security incident response capability for audits
## Integration with External Systems
Runme excels at orchestrating interactions with external systems. Let’s explore common integration patterns.
### AWS Integration
Runme uses the AWS CLI, which respects standard AWS credential mechanisms:
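For instance, a profile like the `production` one used below is typically defined in the shared config file. The following sketch writes a sample SSO profile to a scratch file; every value is a placeholder, not a real account:

```shell
# A sample ~/.aws/config SSO profile entry (written to a scratch file here)
CONFIG_FILE=$(mktemp)
cat > "$CONFIG_FILE" << 'EOF'
[profile production]
sso_start_url  = https://example.awsapps.com/start
sso_region     = us-east-1
sso_account_id = 123456789012
sso_role_name  = ReadOnlyAccess
region         = us-east-1
output         = json
EOF
grep '^\[profile production\]' "$CONFIG_FILE"
```

Because Runme shells out to the AWS CLI, anything that works in your terminal (profiles, environment variables, SSO sessions) works unchanged in a runbook cell.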
## AWS SSO Authentication
\`\`\`bash {"name":"aws-sso-login"}
# Authenticate with AWS SSO
aws sso login --profile production
# Verify authentication
aws sts get-caller-identity --profile production
\`\`\`
## Multi-Account Operations
\`\`\`bash {"name":"multi-account"}
# Iterate through multiple AWS accounts
for profile in dev staging production; do
echo -e "\n=== Account: $profile ==="
aws s3 ls --profile $profile
done
\`\`\`
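When sweeping accounts like this, one failing profile should not abort the whole loop. A sketch of a more resilient pattern, with a stand-in command in place of the real AWS call:

```shell
# Keep sweeping remaining accounts when one profile fails, then summarize
failed=""
for profile in dev staging production; do
  # stand-in for the real call: aws s3 ls --profile "$profile"
  if [ "$profile" = "staging" ]; then false; else true; fi \
    || { failed="$failed $profile"; continue; }
  echo "OK: $profile"
done
if [ -n "$failed" ]; then
  echo "Failed profiles:$failed"
fi
```

Collecting failures and reporting them at the end gives a complete picture even when one account's credentials have expired.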
## AssumeRole for Cross-Account Access
\`\`\`bash {"name":"assume-role"}
# Assume role in another account
ROLE_ARN="arn:aws:iam::123456789012:role/CrossAccountAdmin"
# Get temporary credentials
CREDENTIALS=$(aws sts assume-role \
--role-arn "$ROLE_ARN" \
--role-session-name "runme-session" \
--query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
--output text)
# Export credentials for subsequent commands
read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<< "$CREDENTIALS"
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
# Use assumed role
aws ec2 describe-instances
\`\`\`
### Kubernetes Integration
## Switch Contexts
\`\`\`bash {"name":"switch-context"}
# List available contexts
kubectl config get-contexts
# Switch to production cluster
kubectl config use-context prod-eks-cluster
# Verify current context
kubectl config current-context
\`\`\`
## Multi-Cluster Operations
\`\`\`bash {"name":"multi-cluster"}
# Run command across all clusters
for context in $(kubectl config get-contexts -o name); do
echo -e "\n=== Cluster: $context ==="
kubectl --context=$context get nodes -o wide
done
\`\`\`
### API Integration with Authentication
## OAuth 2.0 Flow
\`\`\`bash {"name":"oauth-flow"}
# Get OAuth token
TOKEN=$(curl -sf -X POST https://auth.example.com/oauth/token \
-H "Content-Type: application/json" \
-d '{"client_id":"'$CLIENT_ID'","client_secret":"'$CLIENT_SECRET'","grant_type":"client_credentials"}' \
| jq -r '.access_token')
export API_TOKEN=$TOKEN
echo "✓ Authenticated"
\`\`\`
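Client-credentials tokens expire, so cells that run much later in a session may need to re-authenticate. A sketch that tracks expiry locally; the 3600-second lifetime is a hypothetical value that would normally come from the response's `expires_in` field:

```shell
# Remember when the token expires; refresh only when under 60s remain
EXPIRES_IN=3600                               # hypothetical expires_in from the response
TOKEN_EXPIRY=$(( $(date +%s) + EXPIRES_IN ))

token_is_fresh() {
  [ $(( TOKEN_EXPIRY - $(date +%s) )) -gt 60 ]
}

if token_is_fresh; then
  echo "reusing existing token"
else
  echo "re-running the authentication cell"
fi
```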
## Use Token for API Calls
\`\`\`bash {"name":"api-calls"}
# Make authenticated API call
curl -sf -X GET https://api.example.com/v1/resources \
-H "Authorization: Bearer $API_TOKEN" \
| jq '.'
\`\`\`
### Database Interactions
## PostgreSQL Query
\`\`\`bash {"name":"postgres-query"}
# Connect and query database
PGPASSWORD=$DB_PASSWORD psql -h postgres.example.com -U admin -d production << 'EOF'
SELECT
table_name,
pg_size_pretty(pg_total_relation_size(table_name::text)) as size
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY pg_total_relation_size(table_name::text) DESC
LIMIT 10;
EOF
\`\`\`
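Passing the password through `PGPASSWORD` leaves it visible in the process environment; libpq's password file is a common alternative. A sketch using a scratch file and placeholder values:

```shell
# libpq reads host:port:database:user:password lines from a pgpass file (must be mode 0600)
PGPASS_FILE=$(mktemp)
echo 'postgres.example.com:5432:production:admin:REDACTED' > "$PGPASS_FILE"
chmod 600 "$PGPASS_FILE"

# Then connect without exporting the password:
#   PGPASSFILE="$PGPASS_FILE" psql -h postgres.example.com -U admin -d production -w
grep -c ':' "$PGPASS_FILE"
```

libpq ignores a password file whose permissions are broader than 0600, so the `chmod` is not optional.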
## Getting Started: Adopting Runme in Your Organization
### Installation
VS Code Extension (Recommended for beginners):
- Open VS Code
- Go to Extensions (Ctrl+Shift+X / Cmd+Shift+X)
- Search for “Runme”
- Click Install
- Open any .md file to see Runme in action
CLI Installation:
\`\`\`bash
# macOS
brew install runme
# Linux
curl -fsSL https://download.runme.dev/install.sh | sh
# Verify installation
runme --version
\`\`\`
### Creating Your First Runbook
- Create a markdown file: ops-runbook.md
- Add code blocks with commands
- Open in VS Code with Runme extension
- Click “Run” buttons to execute
Example first runbook:
# My First Runbook
## Check System Status
\`\`\`bash
date
whoami
uname -a
\`\`\`
## List Running Processes
\`\`\`bash
ps aux | head -10
\`\`\`
## Check Disk Usage
\`\`\`bash
df -h
\`\`\`
### Best Practices for Adoption
- Start Small: Begin with simple operational tasks
- Version Control: Store runbooks in Git alongside code
- Document Context: Add explanatory text between code blocks
- Use Named Cells: Give cells meaningful names for better logs
- Test Thoroughly: Run runbooks in safe environments before production use
- Review Before Running: Always review commands before execution
- Set up Sessions: Use named sessions to isolate different workflows
- Add Safety Checks: Include verification steps before destructive operations
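The last two points can be combined in a small guard cell: before any destructive step, require the operator to type the target's name back. A sketch in plain bash (the piped answer simulates interactive input for demonstration; in a live session you would just call the function):

```shell
# Refuse to continue unless the operator retypes the exact resource name
confirm_destructive() {
  local resource="$1" answer
  read -r -p "Type '$resource' to confirm: " answer
  [ "$answer" = "$resource" ]
}

# Simulated operator input for the sketch
if echo "prod-db" | confirm_destructive "prod-db"; then
  echo "confirmed, proceeding"
else
  echo "aborted"
fi
```

Typing the resource name, rather than answering y/n, makes it much harder to confirm the wrong target on autopilot.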
### Runbook Template
---
runme:
version: v3
shell: bash
env:
ENVIRONMENT: staging
---
# [Task Name] Runbook
**Purpose**: [Brief description]
**Owner**: [Team/Person]
**Last Updated**: [Date]
## Prerequisites
- [ ] Access to [system/service]
- [ ] Required tools installed
- [ ] Credentials configured
## Safety Checks
### Verify Environment
\`\`\`bash {"name":"verify-env"}
echo "Environment: $ENVIRONMENT"
echo "Current user: $(whoami)"
echo "Current directory: $(pwd)"
# Add checks specific to your task
\`\`\`
## Procedure
### Step 1: [Description]
\`\`\`bash {"name":"step-1"}
# Your commands here
\`\`\`
### Step 2: [Description]
\`\`\`bash {"name":"step-2"}
# Your commands here
\`\`\`
## Verification
### Verify Results
\`\`\`bash {"name":"verify"}
# Commands to verify success
\`\`\`
## Rollback (If Needed)
### Rollback Procedure
\`\`\`bash {"name":"rollback"}
# Commands to rollback changes
\`\`\`
## Next Steps
- [ ] Update monitoring
- [ ] Notify stakeholders
- [ ] Document any issues
## Security Considerations
- Credential Storage: Never hardcode credentials in runbooks. Use environment variables or credential managers
- Access Control: Use Git permissions to control who can modify runbooks
- Audit Trail: Enable logging to track runbook executions
- Code Review: Require peer review for runbook changes
- Sensitive Operations: Add confirmation prompts for destructive operations
- Encryption: Encrypt runbooks containing sensitive information
- Least Privilege: Run runbooks with minimal required permissions
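A lightweight way to enforce the first point is a pre-commit scan for obvious hardcoded credentials in runbooks. A grep-based sketch; the patterns are illustrative and far from exhaustive, and the sample key is AWS's documented example key, not a real credential:

```shell
# Flag likely hardcoded credentials in runbook files
scan_for_secrets() {
  grep -nE 'AKIA[0-9A-Z]{16}|aws_secret_access_key|BEGIN (RSA|OPENSSH) PRIVATE KEY' "$@"
}

RUNBOOK=$(mktemp)
printf 'echo deploy\nexport AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n' > "$RUNBOOK"

if scan_for_secrets "$RUNBOOK"; then
  echo "potential secret found; fix before committing"
fi
```

For real use, a dedicated scanner wired into a Git pre-commit hook covers far more patterns than a handful of regexes.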
## Conclusion
Runme represents a paradigm shift in operational documentation. By making markdown files executable, it eliminates the gap between documentation and reality. For teams managing incidents, infrastructure, DevOps workflows, and security operations, Runme provides:
- Speed: Execute complex procedures with one click
- Consistency: Everyone follows the same tested procedures
- Auditability: Complete record of what was executed and when
- Collaboration: Share operational knowledge through versioned runbooks
- Safety: Review before execution, with built-in rollback procedures
The key to success with Runme is starting small, building comprehensive runbooks gradually, and fostering a culture where executable documentation becomes the norm. As your team gains experience, you’ll find that Runme becomes an indispensable tool for operational excellence.
Whether you’re responding to a production incident at 2 AM, deploying a critical security patch, or onboarding a new team member, Runme ensures that your operations are fast, reliable, and well-documented.
Start your journey with Runme today by converting your most frequently used runbook into an executable markdown file. Your future self (and your team) will thank you.