Port vs Backstage: Developer Portals for AWS EKS and AI Agent Self-Service
READER BEWARE: THE FOLLOWING WAS WRITTEN ENTIRELY BY AI WITHOUT HUMAN EDITING.
Introduction
As platform engineering matures, developer portals have become the front door to internal developer platforms (IDPs). Two tools dominate this space: Backstage (the open-source framework from Spotify, now a CNCF incubating project) and Port (a SaaS-first developer portal). Both promise to reduce cognitive load on developers, standardize workflows, and enable self-service — but they differ significantly in philosophy, architecture, and operational cost.
This post compares Port and Backstage with a focus on:
- Integration with AWS EKS (Elastic Kubernetes Service)
- Self-service capabilities for deploying AI agents
- The broader toolchain needed to build a complete AI agent self-service stack on AWS EKS
Overview: Port vs Backstage
Backstage
Backstage is an open-source framework released by Spotify in 2020. It is highly extensible and designed to be self-hosted. Organizations adopt Backstage by building their own instance, installing plugins, and writing custom React-based frontend components.
Key Characteristics:
- Open-source (Apache 2.0 license), CNCF Incubating project
- Self-hosted — you own and operate the deployment
- Plugin-based architecture with 200+ community plugins
- Software catalog at its core
- Techdocs for internal documentation
- Scaffolder for template-driven self-service workflows
- Strong community ecosystem (Spotify, Roadie, Frontside, ThoughtWorks)
Official Documentation: https://backstage.io/docs/
Port
Port is a SaaS developer portal platform founded in 2021. It provides a fully managed backend, a flexible data model, and a no-code/low-code approach to building developer portals. Teams get up and running quickly without having to manage infrastructure.
Key Characteristics:
- SaaS with a generous free tier and enterprise plans
- No infrastructure to manage — Port runs the backend
- Flexible data model (Blueprints and Entities)
- Built-in RBAC, audit logs, and scorecards
- Self-service actions powered by webhooks, GitHub Actions, and more
- Catalog, scorecards, and self-service actions in a single product
- Strong integrations with cloud providers and Kubernetes out of the box
Official Documentation: https://docs.getport.io/
Architecture and Deployment
Backstage on AWS EKS
Backstage is a Node.js application that consists of a frontend (React) and a backend (Express). It requires a PostgreSQL database and can be containerized and deployed on Kubernetes.
Sample Kubernetes Deployment for Backstage on EKS:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backstage
  namespace: backstage
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backstage
  template:
    metadata:
      labels:
        app: backstage
    spec:
      serviceAccountName: backstage
      containers:
        - name: backstage
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/backstage:latest
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 7007
          envFrom:
            - secretRef:
                name: backstage-secrets
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          readinessProbe:
            httpGet:
              path: /healthcheck
              port: 7007
            initialDelaySeconds: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthcheck
              port: 7007
            initialDelaySeconds: 60
            periodSeconds: 20
---
apiVersion: v1
kind: Service
metadata:
  name: backstage
  namespace: backstage
spec:
  selector:
    app: backstage
  ports:
    - port: 80
      targetPort: 7007
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backstage
  namespace: backstage
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/xxx
spec:
  rules:
    - host: backstage.internal.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backstage
                port:
                  number: 80
```
Backstage also needs a service account annotated for IAM Roles for Service Accounts (IRSA) so that it can call AWS APIs:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backstage
  namespace: backstage
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/backstage-role
```
The corresponding IAM role needs policies to read from ECR, SSM Parameter Store (for secrets), and optionally to call EKS APIs.
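A minimal permissions policy for that role might look like the following sketch — the ARNs, parameter path, and region are illustrative and should be scoped down to your environment:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EcrRead",
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchGetImage",
        "ecr:GetDownloadUrlForLayer"
      ],
      "Resource": "*"
    },
    {
      "Sid": "SsmParameters",
      "Effect": "Allow",
      "Action": ["ssm:GetParameter", "ssm:GetParametersByPath"],
      "Resource": "arn:aws:ssm:us-east-1:123456789012:parameter/backstage/*"
    },
    {
      "Sid": "EksDescribe",
      "Effect": "Allow",
      "Action": ["eks:DescribeCluster", "eks:ListClusters"],
      "Resource": "*"
    }
  ]
}
```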
Connecting Backstage to EKS:
Install the Kubernetes plugin and configure it in app-config.yaml:
```yaml
kubernetes:
  serviceLocatorMethod:
    type: multiTenant
  clusterLocatorMethods:
    - type: config
      clusters:
        - url: https://<EKS_ENDPOINT>
          name: production-eks
          authProvider: aws
          caData: <base64-encoded-CA>
```
With IRSA configured correctly, Backstage can list pods, deployments, and other resources across EKS clusters directly in the service catalog.
Port on AWS EKS
Port’s architecture is SaaS — the Port backend is hosted by Port. What you deploy on EKS is Port’s Kubernetes Exporter, a lightweight agent that reads your cluster state and pushes it into the Port catalog.
Installing Port’s Kubernetes Exporter via Helm:
```shell
helm repo add port-labs https://port-labs.github.io/helm-charts
helm repo update
helm install port-k8s-exporter port-labs/port-k8s-exporter \
  --namespace port-k8s-exporter \
  --create-namespace \
  --set secret.secrets.portClientId=<PORT_CLIENT_ID> \
  --set secret.secrets.portClientSecret=<PORT_CLIENT_SECRET> \
  --set clusterName=production-eks
```
Sample values.yaml for the Port Kubernetes Exporter:
```yaml
secret:
  secrets:
    portClientId: "<PORT_CLIENT_ID>"
    portClientSecret: "<PORT_CLIENT_SECRET>"
clusterName: "production-eks"
resources:
  - kind: v1/namespaces
    selector:
      query: .metadata.name | startswith("prod-")
  - kind: apps/v1/deployments
  - kind: apps/v1/replicasets
  - kind: v1/pods
  - kind: v1/services
  - kind: batch/v1/jobs
  - kind: batch/v1/cronjobs
# Map Kubernetes resources to Port Blueprints
mappings:
  - kind: apps/v1/deployments
    port:
      entity:
        mappings:
          identifier: .metadata.name + "-" + .metadata.namespace + "-" + env.CLUSTER_NAME
          title: .metadata.name
          blueprint: '"deployment"'
          properties:
            createdAt: .metadata.creationTimestamp
            createdBy: .metadata.annotations."kubectl.kubernetes.io/last-applied-configuration"
            namespace: .metadata.namespace
            replicas: .spec.replicas
            image: .spec.template.spec.containers[0].image
```
This approach is simpler operationally — there is no Backstage application to maintain, upgrade, or scale. Port’s exporter is a small stateless agent.
AWS EKS Integration Comparison
| Feature | Backstage | Port |
|---|---|---|
| Deployment model | Self-hosted on EKS | SaaS (exporter agent on EKS) |
| EKS cluster visibility | Kubernetes plugin (config-based) | Port Kubernetes Exporter (Helm chart) |
| Multi-cluster support | Yes, via plugin config | Yes, multiple exporters |
| Real-time data | Polling (configurable) | Near real-time push from exporter |
| IRSA / AWS IAM support | Yes, via aws auth provider | Not required (exporter uses API key) |
| AWS Service Catalog | Community plugins | Built-in AWS integrations |
| Cost to run | ~$100-500/month (EKS nodes, RDS, ALB) | Free tier + $X/user/month for enterprise |
| Operational overhead | High (maintain Backstage app, plugins, DB) | Low (SaaS backend, only manage exporter) |
| Customization | Unlimited (React plugins, TypeScript) | High (Blueprints, Actions, Pages) |
Self-Service Capabilities for AI Agent Deployment
One of the most compelling use cases for developer portals in 2025 and beyond is enabling teams to self-serve the deployment of AI agents — LLM-powered workloads that need compute, storage, model access, and often persistent state.
What Does “AI Agent Self-Service” Require?
A self-service AI agent deployment workflow typically needs to:
- Provision a Kubernetes namespace with appropriate RBAC
- Create an IAM role (for Bedrock, SageMaker, or S3 access)
- Deploy a containerized agent workload (Deployment or Job)
- Optionally provision a vector database (e.g., OpenSearch or pgvector on RDS)
- Wire up secrets (API keys, model endpoints) from AWS Secrets Manager
- Configure autoscaling (KEDA for event-driven or HPA for load-based)
- Set up observability (CloudWatch, Datadog, or OpenTelemetry)
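In Kubernetes terms, the first few items usually reduce to a namespace, an IRSA-annotated service account, and a Deployment wiring in the model ID and secrets. A minimal sketch (all names, the account ID, and the secret are hypothetical):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: research-assistant
  namespace: research-assistant
  annotations:
    # IRSA: binds the pod to an IAM role with Bedrock/S3 permissions
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/ai-agent-research-assistant
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: research-assistant
  namespace: research-assistant
spec:
  replicas: 1
  selector:
    matchLabels:
      app: research-assistant
  template:
    metadata:
      labels:
        app: research-assistant
    spec:
      serviceAccountName: research-assistant
      containers:
        - name: agent
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/research-assistant:latest
          env:
            - name: BEDROCK_MODEL_ID
              value: anthropic.claude-3-5-sonnet-20241022-v2:0
            - name: AGENT_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-agent-secrets
                  key: api-key
```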
Backstage Scaffolder for AI Agent Self-Service
Backstage’s Scaffolder is a template engine that lets platform engineers define software templates. Users fill out a form in the Backstage UI, and the template executes actions (create Git repos, call APIs, run scripts).
Sample template.yaml for AI Agent Deployment:
```yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: deploy-ai-agent
  title: Deploy AI Agent on EKS
  description: Self-service template to deploy an AI agent workload on AWS EKS
  tags:
    - ai
    - eks
    - aws
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Agent Configuration
      required:
        - agentName
        - agentImage
        - namespace
        - awsRegion
        - bedrockModelId
      properties:
        agentName:
          title: Agent Name
          type: string
          description: Unique name for the AI agent
          pattern: "^[a-z][a-z0-9-]{2,30}$"
        agentImage:
          title: Container Image
          type: string
          description: ECR image URI for the agent (e.g., 123456789.dkr.ecr.us-east-1.amazonaws.com/my-agent:latest)
        namespace:
          title: Kubernetes Namespace
          type: string
          description: Target EKS namespace
        awsRegion:
          title: AWS Region
          type: string
          default: us-east-1
          enum:
            - us-east-1
            - us-west-2
            - eu-west-1
        bedrockModelId:
          title: Bedrock Model ID
          type: string
          default: anthropic.claude-3-5-sonnet-20241022-v2:0
          enum:
            - anthropic.claude-3-5-sonnet-20241022-v2:0
            - amazon.nova-pro-v1:0
            - meta.llama3-70b-instruct-v1:0
        replicaCount:
          title: Initial Replicas
          type: integer
          default: 1
          minimum: 1
          maximum: 10
        enableVectorDb:
          title: Enable Vector Database (OpenSearch)
          type: boolean
          default: false
  steps:
    - id: fetch-template
      name: Fetch Agent Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          agentName: ${{ parameters.agentName }}
          agentImage: ${{ parameters.agentImage }}
          namespace: ${{ parameters.namespace }}
          awsRegion: ${{ parameters.awsRegion }}
          bedrockModelId: ${{ parameters.bedrockModelId }}
          replicaCount: ${{ parameters.replicaCount }}
    - id: create-github-pr
      name: Create GitHub PR with Agent Manifests
      action: publish:github:pull-request
      input:
        repoUrl: github.com?repo=eks-ai-agents&owner=myorg
        title: "feat: deploy ai-agent ${{ parameters.agentName }}"
        branchName: "agent/${{ parameters.agentName }}"
        description: "Automated PR to deploy AI agent ${{ parameters.agentName }} to EKS namespace ${{ parameters.namespace }}"
    - id: create-iam-role
      name: Create IAM Role via CloudFormation
      # aws:cloudformation:deploy is not a built-in scaffolder action;
      # it must come from a custom or community action package
      action: aws:cloudformation:deploy
      input:
        stackName: "ai-agent-${{ parameters.agentName }}-iam"
        templatePath: ./iam/agent-role-template.yaml
        parameters:
          AgentName: ${{ parameters.agentName }}
          BedrockModelId: ${{ parameters.bedrockModelId }}
  output:
    links:
      - title: GitHub PR
        url: ${{ steps['create-github-pr'].output.remoteUrl }}
      # assumes a catalog:register step with id "register" is added to the steps above
      - title: View in Backstage
        entityRef: ${{ steps['register'].output.entityRef }}
```
Port Self-Service Actions for AI Agent Deployment
Port’s self-service model is based on Actions — JSON-defined workflows triggered from the Port UI. Actions can call webhooks, trigger GitHub Actions, invoke AWS Lambda, or send Kafka messages.
Sample Port Action JSON for AI Agent Deployment:
```json
{
  "identifier": "deploy_ai_agent",
  "title": "Deploy AI Agent on EKS",
  "description": "Self-service action to deploy an AI agent workload to AWS EKS",
  "trigger": {
    "type": "self-service",
    "operation": "CREATE",
    "userInputs": {
      "properties": {
        "agentName": {
          "type": "string",
          "title": "Agent Name",
          "pattern": "^[a-z][a-z0-9-]{2,30}$"
        },
        "agentImage": {
          "type": "string",
          "title": "Container Image URI"
        },
        "namespace": {
          "type": "string",
          "title": "Kubernetes Namespace"
        },
        "bedrockModelId": {
          "type": "string",
          "title": "Bedrock Model ID",
          "enum": [
            "anthropic.claude-3-5-sonnet-20241022-v2:0",
            "amazon.nova-pro-v1:0",
            "meta.llama3-70b-instruct-v1:0"
          ],
          "default": "anthropic.claude-3-5-sonnet-20241022-v2:0"
        },
        "replicaCount": {
          "type": "number",
          "title": "Initial Replicas",
          "default": 1,
          "minimum": 1,
          "maximum": 10
        },
        "enableVectorDb": {
          "type": "boolean",
          "title": "Enable Vector Database",
          "default": false
        }
      },
      "required": ["agentName", "agentImage", "namespace", "bedrockModelId"]
    }
  },
  "invocationMethod": {
    "type": "GITHUB",
    "org": "myorg",
    "repo": "eks-ai-agents",
    "workflow": "deploy-ai-agent.yaml",
    "omitPayload": false,
    "reportWorkflowStatus": true
  }
}
```
Sample GitHub Actions Workflow Triggered by Port Action:
```yaml
# .github/workflows/deploy-ai-agent.yaml
name: Deploy AI Agent to EKS
on:
  workflow_dispatch:
    inputs:
      port_payload:
        required: true
        description: "Port's payload including action and general context"
permissions:
  id-token: write # required for OIDC federation with AWS
  contents: read
jobs:
  deploy-ai-agent:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Parse Port Payload
        id: parse
        run: |
          echo "AGENT_NAME=$(echo '${{ inputs.port_payload }}' | jq -r '.payload.properties.agentName')" >> $GITHUB_ENV
          echo "AGENT_IMAGE=$(echo '${{ inputs.port_payload }}' | jq -r '.payload.properties.agentImage')" >> $GITHUB_ENV
          echo "NAMESPACE=$(echo '${{ inputs.port_payload }}' | jq -r '.payload.properties.namespace')" >> $GITHUB_ENV
          echo "BEDROCK_MODEL=$(echo '${{ inputs.port_payload }}' | jq -r '.payload.properties.bedrockModelId')" >> $GITHUB_ENV
          echo "REPLICAS=$(echo '${{ inputs.port_payload }}' | jq -r '.payload.properties.replicaCount')" >> $GITHUB_ENV
          echo "RUN_ID=$(echo '${{ inputs.port_payload }}' | jq -r '.context.runId')" >> $GITHUB_ENV
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-role
          aws-region: us-east-1
      - name: Setup EKS Kubeconfig
        run: |
          aws eks update-kubeconfig --name production-eks --region us-east-1
      - name: Create Namespace and RBAC
        run: |
          kubectl create namespace $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -
          kubectl label namespace $NAMESPACE team=ai-agents environment=production --overwrite
      - name: Create IAM Role for Agent (IRSA)
        run: |
          # "|| true" tolerates a role left over from a previous run
          aws iam create-role \
            --role-name ai-agent-$AGENT_NAME \
            --assume-role-policy-document file://iam/trust-policy.json || true
          aws iam attach-role-policy \
            --role-name ai-agent-$AGENT_NAME \
            --policy-arn arn:aws:iam::aws:policy/AmazonBedrockFullAccess
      - name: Deploy AI Agent to EKS
        run: |
          helm upgrade --install $AGENT_NAME ./charts/ai-agent \
            --namespace $NAMESPACE \
            --set image=$AGENT_IMAGE \
            --set bedrockModelId=$BEDROCK_MODEL \
            --set replicaCount=$REPLICAS \
            --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::123456789012:role/ai-agent-$AGENT_NAME \
            --wait --timeout 5m
      - name: Update Port Run Status (Success)
        if: success()
        uses: port-labs/port-github-action@v1
        with:
          clientId: ${{ secrets.PORT_CLIENT_ID }}
          clientSecret: ${{ secrets.PORT_CLIENT_SECRET }}
          operation: PATCH_RUN
          runId: ${{ env.RUN_ID }}
          status: SUCCESS
          logMessage: "AI agent ${{ env.AGENT_NAME }} deployed successfully to EKS namespace ${{ env.NAMESPACE }}"
      - name: Update Port Run Status (Failure)
        if: failure()
        uses: port-labs/port-github-action@v1
        with:
          clientId: ${{ secrets.PORT_CLIENT_ID }}
          clientSecret: ${{ secrets.PORT_CLIENT_SECRET }}
          operation: PATCH_RUN
          runId: ${{ env.RUN_ID }}
          status: FAILURE
          logMessage: "Failed to deploy AI agent ${{ env.AGENT_NAME }}"
```
Self-Service Capabilities Comparison
| Capability | Backstage | Port |
|---|---|---|
| Self-service mechanism | Scaffolder Templates (YAML + actions) | Self-Service Actions (JSON + invocation methods) |
| Template authoring | YAML with Nunjucks templating, TypeScript actions | JSON with JQ mappings, built-in UI form builder |
| Approval workflows | Community plugin (backstage-plugin-approvals) | Built-in multi-step approvals with RBAC |
| Trigger types | Template form submission | Webhook, GitHub Actions, AWS Lambda, Kafka, GitLab CI |
| Status feedback | Basic (via scaffolder logs) | Rich — real-time run logs visible in Port UI |
| RBAC for actions | Plugin-based, complex to configure | Native — role and team-based action visibility |
| Auditability | Requires external setup | Built-in audit log |
| Dry run / preview | Not built-in | Not built-in |
| AI agent use case fit | High (flexible, code-based) | High (rapid iteration, less code) |
Comparing Key Platform Features
Software Catalog
Both tools maintain a catalog of services, resources, and components.
Backstage:
- Entities defined in `catalog-info.yaml` files in each repo
- Pull-based: Backstage reads from Git
- Relationships via `spec.dependsOn` and `spec.providesApis`
- Custom entity types via plugins
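For example, an AI agent service might carry a catalog-info.yaml like the following (entity names are illustrative; the backstage.io/kubernetes-id annotation is what the Kubernetes plugin uses to match workloads to the entity):

```yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: research-assistant
  description: LLM-powered research agent
  annotations:
    backstage.io/kubernetes-id: research-assistant
spec:
  type: service
  lifecycle: production
  owner: ai-platform-team
  dependsOn:
    - resource:default/opensearch-vector-db
```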
Port:
- Blueprints define the schema; Entities are instances
- Push-based: integrations push data to Port
- Relationships via “Relations” between Blueprints
- Highly flexible data model — works like a graph database
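A Blueprint for AI agents might look roughly like this — a sketch of the general shape only, with illustrative property and relation names (see Port's Blueprint documentation for the exact schema):

```json
{
  "identifier": "ai_agent",
  "title": "AI Agent",
  "schema": {
    "properties": {
      "bedrockModelId": { "type": "string", "title": "Bedrock Model ID" },
      "namespace": { "type": "string", "title": "Namespace" },
      "costCenter": { "type": "string", "title": "Cost Center" }
    },
    "required": ["bedrockModelId"]
  },
  "relations": {
    "deployment": {
      "title": "Deployment",
      "target": "deployment",
      "required": false,
      "many": false
    }
  }
}
```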
Scorecards
Backstage: Available via third-party plugins (e.g., Backstage Plugin Scorecard by Roadie). Limited out of the box.
Port: Scorecards are a first-class feature. You can define rules across any Blueprint property and visualize compliance across your entire catalog. This is extremely valuable for ensuring AI agent workloads meet standards (e.g., “all AI agents must have a cost tag”, “all agents must have autoscaling enabled”).
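As an illustrative sketch of such a rule (the exact scorecard schema is documented by Port; the identifiers and the costCenter property here are hypothetical), a check that every AI agent carries a cost tag might look like:

```json
{
  "identifier": "production_readiness",
  "title": "Production Readiness",
  "rules": [
    {
      "identifier": "has_cost_tag",
      "title": "Has a cost tag",
      "level": "Gold",
      "query": {
        "combinator": "and",
        "conditions": [
          { "property": "costCenter", "operator": "isNotEmpty" }
        ]
      }
    }
  ]
}
```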
AI-Specific Integrations
Neither Backstage nor Port has native integrations for LLM providers like Bedrock, OpenAI, or Anthropic out of the box. However:
- Backstage can integrate via custom plugins (TypeScript/React)
- Port can integrate via custom Blueprints and webhook-based actions
For AI agent deployments on EKS, both platforms serve as the orchestration layer — they trigger the actual provisioning via IaC tools (Terraform, CDK, Helm) and CI/CD pipelines (GitHub Actions, ArgoCD).
The Full AI Agent Self-Service Stack on AWS EKS
To build a production-grade self-service platform for AI agents on AWS EKS, you need more than just a developer portal. Here is the full reference architecture:
```
┌─────────────────────────────────────────────────────────────────┐
│                     Developer Portal Layer                      │
│                 (Port SaaS or Backstage on EKS)                 │
│   Catalog | Self-Service Actions | Scorecards | Documentation   │
└─────────────────────────┬───────────────────────────────────────┘
                          │ Triggers
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                      CI/CD & GitOps Layer                       │
│       GitHub Actions | ArgoCD | Atlantis | Argo Workflows       │
│          Helm Charts | Kustomize | GitOps Repositories          │
└─────────────────────────┬───────────────────────────────────────┘
                          │ Deploys to
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Infrastructure Layer (AWS)                   │
│      EKS Cluster | VPC | IAM (IRSA) | AWS Secrets Manager       │
│        ECR | S3 | OpenSearch | RDS (pgvector) | Bedrock         │
└─────────────────────────────────────────────────────────────────┘
```
Layer 1: Developer Portal (Port or Backstage)
The portal is the entry point for developers. It provides:
- A catalog of available AI agent templates
- Self-service forms to configure and deploy agents
- Visibility into running agents (via EKS exporter/plugin)
- Scorecards for agent health and compliance
Layer 2: Infrastructure as Code
Terraform or AWS CDK provisions the foundational AWS resources:
```hcl
# terraform/modules/ai-agent-namespace/main.tf
resource "aws_iam_role" "ai_agent" {
  name = "ai-agent-${var.agent_name}"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = "arn:aws:iam::${var.account_id}:oidc-provider/${var.oidc_provider}"
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${var.oidc_provider}:sub" = "system:serviceaccount:${var.agent_name}:${var.agent_name}"
          "${var.oidc_provider}:aud" = "sts.amazonaws.com"
        }
      }
    }]
  })
}

resource "aws_iam_role_policy" "bedrock_access" {
  name = "bedrock-access"
  role = aws_iam_role.ai_agent.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream",
          "bedrock:GetFoundationModel"
        ]
        Resource = "arn:aws:bedrock:${var.aws_region}::foundation-model/${var.bedrock_model_id}"
      },
      {
        Effect = "Allow"
        Action = [
          "secretsmanager:GetSecretValue"
        ]
        Resource = "arn:aws:secretsmanager:${var.aws_region}:${var.account_id}:secret:ai-agent/${var.agent_name}/*"
      }
    ]
  })
}

resource "kubernetes_namespace" "agent" {
  metadata {
    name = var.agent_name
    labels = {
      "app.kubernetes.io/managed-by" = "terraform"
      "agent-name"                   = var.agent_name
      "environment"                  = var.environment
    }
  }
}
```
Layer 3: GitOps with ArgoCD
ArgoCD watches Git repositories and reconciles the desired state to EKS. It is the ideal complement to both Backstage and Port because the developer portal triggers a Git commit (PR or direct push), and ArgoCD handles the actual deployment.
Sample ArgoCD Application for AI Agents:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ai-agent-research-assistant
  namespace: argocd
  labels:
    agent-type: research
spec:
  project: ai-agents
  source:
    repoURL: https://github.com/myorg/eks-ai-agents
    targetRevision: main
    path: agents/research-assistant
    helm:
      valueFiles:
        - values.yaml
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: research-assistant
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
ArgoCD ApplicationSet for multi-agent management:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: ai-agents
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/myorg/eks-ai-agents
        revision: main
        directories:
          - path: agents/*
  template:
    metadata:
      name: "{{path.basename}}"
    spec:
      project: ai-agents
      source:
        repoURL: https://github.com/myorg/eks-ai-agents
        targetRevision: main
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
```
Layer 4: AI Agent Helm Chart
A shared Helm chart reduces duplication across AI agent deployments:
```yaml
# charts/ai-agent/values.yaml
replicaCount: 1
image:
  repository: ""
  tag: "latest"
  pullPolicy: IfNotPresent
serviceAccount:
  create: true
  annotations: {}
bedrockModelId: "anthropic.claude-3-5-sonnet-20241022-v2:0"
awsRegion: "us-east-1"
resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
    nvidia.com/gpu: "0"
  limits:
    memory: "2Gi"
    cpu: "2000m"
autoscaling:
  enabled: true
  provider: keda # keda or hpa
  keda:
    triggers:
      - type: aws-sqs-queue
        metadata:
          queueURL: "https://sqs.us-east-1.amazonaws.com/123456789012/ai-agent-tasks"
          queueLength: "5"
          awsRegion: "us-east-1"
          identityOwner: pod
# Note: values files are not templated by Helm; these Go-template strings
# are only rendered if the chart pipes them through the tpl function
env:
  - name: BEDROCK_MODEL_ID
    value: "{{ .Values.bedrockModelId }}"
  - name: AWS_DEFAULT_REGION
    value: "{{ .Values.awsRegion }}"
secrets:
  - name: AGENT_API_KEY
    secretName: "ai-agent-secrets"
    secretKey: "api-key"
persistence:
  enabled: false
  storageClass: "gp3"
  size: "10Gi"
monitoring:
  enabled: true
  serviceMonitor: true
  prometheusRule: true
```
Layer 5: KEDA for Event-Driven AI Agent Scaling
AI agents often process tasks from queues (SQS, Kafka). KEDA enables them to scale to zero when idle and scale up when work arrives:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: research-assistant-scaler
  namespace: research-assistant
spec:
  scaleTargetRef:
    name: research-assistant
  pollingInterval: 15
  cooldownPeriod: 300
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-aws-credentials
      metadata:
        queueURL: "https://sqs.us-east-1.amazonaws.com/123456789012/research-tasks"
        queueLength: "5"
        awsRegion: us-east-1
        identityOwner: pod
```
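The authenticationRef above assumes a TriggerAuthentication that tells KEDA to use the workload's own IRSA identity when polling SQS — a minimal sketch:

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-credentials
  namespace: research-assistant
spec:
  podIdentity:
    # Use the pod's IAM role (IRSA) to read SQS queue attributes
    provider: aws
```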
Scaling AI agents to zero is especially important for cost control — LLM inference agents typically have bursty workloads.
Layer 6: Observability
AWS CloudWatch Container Insights provides basic EKS metrics. For richer observability of AI agents, combine with:
- OpenTelemetry Collector on EKS to collect traces from LLM calls
- Datadog or Grafana + Prometheus for dashboards and alerting
- AWS Cost Explorer tags for per-agent cost tracking
Sample OpenTelemetry Collector config for AI agent tracing:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: observability
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 10s
      resource:
        attributes:
          - action: insert
            key: cloud.provider
            value: aws
          - action: insert
            key: cloud.platform
            value: aws_eks
    exporters:
      datadog:
        api:
          key: "${DD_API_KEY}"
      awsxray:
        region: us-east-1
      debug:
        verbosity: normal
    service:
      pipelines:
        traces:
          receivers: [otlp]
          # resource attributes first, batching last
          processors: [resource, batch]
          exporters: [datadog, awsxray]
```
Layer 7: Secret Management with AWS Secrets Manager and External Secrets Operator
AI agents need API keys, model credentials, and database passwords. Use External Secrets Operator (ESO) to sync AWS Secrets Manager into Kubernetes Secrets:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: research-assistant-secrets
  namespace: research-assistant
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
    kind: ClusterSecretStore
  target:
    name: research-assistant-secrets
    creationPolicy: Owner
  data:
    - secretKey: openai-api-key
      remoteRef:
        key: ai-agents/research-assistant
        property: openai_api_key
    - secretKey: pinecone-api-key
      remoteRef:
        key: ai-agents/research-assistant
        property: pinecone_api_key
```
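The secretStoreRef above assumes a ClusterSecretStore named aws-secretsmanager. A minimal sketch using IRSA-based authentication (the service account name and namespace are illustrative):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secretsmanager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          # ESO's own service account, annotated for IRSA
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
```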
Additional Tools for the Complete Stack
Here is a summary of all the tools recommended for a production AI agent self-service stack on AWS EKS:
Developer Portal
| Tool | Purpose |
|---|---|
| Port | SaaS developer portal — catalog, self-service, scorecards |
| Backstage | Self-hosted developer portal — highly customizable |
Infrastructure Provisioning
| Tool | Purpose |
|---|---|
| Terraform | IaC for AWS resources (EKS, IAM, VPC, RDS, OpenSearch) |
| AWS CDK | IaC alternative, native AWS construct library |
| Crossplane | Kubernetes-native infrastructure provisioning (AWS provider) |
| Helm | Kubernetes application packaging and templating |
GitOps & CI/CD
| Tool | Purpose |
|---|---|
| ArgoCD | GitOps continuous delivery for EKS workloads |
| Argo Workflows | Kubernetes-native workflow engine for complex agent pipelines |
| GitHub Actions | CI/CD triggered by Port/Backstage self-service actions |
| Atlantis | Terraform PR automation |
AI Agent Runtime
| Tool | Purpose |
|---|---|
| Amazon Bedrock | Managed LLM API (Claude, Nova, Llama models) |
| LangChain / LangGraph | Agent orchestration framework |
| LlamaIndex | Data ingestion and RAG pipeline framework |
| Haystack | Enterprise NLP and RAG pipelines |
Scaling
| Tool | Purpose |
|---|---|
| KEDA | Event-driven pod autoscaling (SQS, Kafka triggers) |
| Karpenter | Node-level autoscaling for right-sized compute |
| AWS SQS / Kafka | Task queue for agent workloads |
Data & Vector Storage
| Tool | Purpose |
|---|---|
| Amazon OpenSearch | Managed vector database for RAG |
| Amazon RDS (pgvector) | PostgreSQL with vector extension |
| Amazon S3 | Document storage and agent state |
| Amazon DynamoDB | Agent session state and memory storage |
Secret Management
| Tool | Purpose |
|---|---|
| AWS Secrets Manager | Centralized secret storage |
| External Secrets Operator | Sync AWS secrets to Kubernetes |
| AWS KMS | Encryption key management |
Observability
| Tool | Purpose |
|---|---|
| OpenTelemetry Collector | Vendor-neutral telemetry collection |
| AWS CloudWatch | Native AWS monitoring and logs |
| Datadog | Full-stack observability and APM |
| Grafana + Prometheus | Open-source metrics and dashboards |
| AWS X-Ray | Distributed tracing for LLM call chains |
Security & Policy
| Tool | Purpose |
|---|---|
| Kyverno | Kubernetes-native policy engine |
| AWS IAM + IRSA | Fine-grained pod-level AWS permissions |
| Cosign + Rekor | Container image signing and provenance |
| Falco | Runtime security for agent containers |
Decision Framework: Port vs Backstage
Choose Port if:
- You want to get started quickly without managing infrastructure
- Your team lacks TypeScript/React expertise for plugin development
- You need built-in RBAC, audit logs, and scorecards out of the box
- You want native integrations with AWS, GitHub, Datadog, PagerDuty, etc.
- You prefer a SaaS model with predictable operational costs
- You need real-time feedback on self-service action runs
Choose Backstage if:
- You need complete control over the portal’s UI and behavior
- You have React/TypeScript developers who can build and maintain plugins
- Your organization requires on-premises or air-gapped deployment
- You want to leverage the large community ecosystem of 200+ plugins
- You need deep integration with internal tools that are not supported by Port
- You have a dedicated platform engineering team (3+ engineers) to maintain it
For AI agent self-service on AWS EKS specifically, Port offers a faster path to production due to its built-in GitHub Actions integration, real-time action feedback, and Kubernetes exporter. Backstage offers more power for teams willing to invest in customization.
Conclusion
Both Port and Backstage are excellent developer portals for enabling AI agent self-service on AWS EKS. Port reduces operational overhead and accelerates time-to-value; Backstage provides maximum flexibility for teams with platform engineering resources.
The real differentiation for AI agent deployments comes from the surrounding toolchain: ArgoCD for GitOps delivery, KEDA for event-driven scaling, External Secrets Operator for secret management, and Terraform/CDK for infrastructure provisioning. These tools work equally well with both portals.
The winning strategy is to treat the developer portal as the control plane — the single pane of glass where developers discover, configure, and deploy AI agents — while delegating the actual provisioning and deployment to specialized tools that each do one thing well.