Backstage Portal: Progressively Complex Application Deployments on AWS
READER BEWARE: THE FOLLOWING WAS WRITTEN ENTIRELY BY AI WITHOUT HUMAN EDITING.
Introduction
Every application starts with an idea. In those early days, speed and simplicity matter most — you need to validate your concept without drowning in infrastructure complexity. But as the application gains traction, as users multiply and requirements deepen, the infrastructure must evolve alongside it. The challenge is bridging those two worlds: the lean startup mindset of “ship it fast” and the enterprise reality of “it must never go down.”
Backstage, the open-source developer portal framework originally built by Spotify and now a CNCF incubating project, offers a compelling answer. By centralizing software templates, service catalogs, and self-service workflows, Backstage enables teams to codify their infrastructure progression — allowing developers to deploy a simple AWS Lambda function on day one, then graduate to AWS Fargate, and eventually to a fully managed AWS EKS cluster, all through the same portal interface.
This post explores how Backstage can be used to manage progressively more complex application deployments on AWS, touching on compute, storage, messaging, caching, and the cost and reliability trade-offs at each stage.
Official Backstage Documentation: https://backstage.io/docs/
The Progressive Deployment Philosophy
Modern applications rarely start complex. A product idea begins as a proof of concept, grows into a minimum viable product, then evolves into a production-grade service with demanding reliability, scalability, and security requirements. Infrastructure should follow the same arc.
The problem is that jumping straight to Kubernetes for a new application is wasteful, slow, and operationally expensive. Conversely, staying on Lambda forever means hitting concurrency limits, cold start latency problems, and architectural constraints as the application grows.
Backstage’s Software Templates (also called Scaffolder templates) allow platform teams to encode this progression as a series of self-service actions that developers can invoke directly from the portal. Rather than filing JIRA tickets or waiting for a DevOps engineer to manually provision infrastructure, developers choose the right tier for their current needs and Backstage handles the provisioning.
The result is a tiered progression:
| Stage | Compute | Complexity | Best For |
|---|---|---|---|
| 1 — Prototype | AWS Lambda | Very Low | Idea validation, event-driven scripts |
| 2 — Early Production | AWS Fargate | Low–Medium | Containerized services, small teams |
| 3 — Scale | AWS EKS | Medium–High | High traffic, multi-service, platform teams |
Stage 1: AWS Lambda — Start Small, Ship Fast
Why Lambda First?
AWS Lambda is the natural starting point for a new application. There is no infrastructure to provision, no containers to build, and no scaling configuration to tune. You write code, deploy it, and pay only for what you use.
Lambda shines for:
- API backends with intermittent or unpredictable traffic
- Event-driven workflows triggered by S3 uploads, DynamoDB streams, or SQS messages
- Scheduled jobs using EventBridge (CloudWatch Events)
- Prototype validation before committing to a heavier compute layer
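Stage 1 needs remarkably little code. A minimal sketch of a Python handler for an API Gateway proxy integration (the response shape follows the Lambda proxy convention; the `name` query parameter is purely illustrative):

```python
import json


def handler(event, context):
    """Minimal AWS Lambda handler for an API Gateway proxy integration.

    `event` carries the HTTP request; the returned dict becomes the
    HTTP response (status code, headers, JSON body).
    """
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

This single function, plus the IaC the template generates, is the entire deployable unit at this stage.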
Backstage Template: Lambda Deployment
A Backstage Software Template for Lambda might look like this:
```yaml
# templates/lambda-service/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: aws-lambda-service
  title: AWS Lambda Service
  description: Deploy a new serverless service on AWS Lambda
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Configuration
      required: [name, runtime, region]
      properties:
        name:
          title: Service Name
          type: string
          description: Unique name for the Lambda function
        runtime:
          title: Runtime
          type: string
          enum: [python3.12, nodejs20.x, java21]
          default: python3.12
        region:
          title: AWS Region
          type: string
          default: us-east-1
        memorySize:
          title: Memory (MB)
          type: integer
          default: 256
          enum: [128, 256, 512, 1024, 2048]
        timeout:
          title: Timeout (seconds)
          type: integer
          default: 30
  steps:
    - id: fetch-template
      name: Fetch Lambda Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          runtime: ${{ parameters.runtime }}
          region: ${{ parameters.region }}
          memorySize: ${{ parameters.memorySize }}
          timeout: ${{ parameters.timeout }}
    - id: create-repo
      name: Create GitHub Repository
      action: publish:github
      input:
        repoUrl: github.com?owner=my-org&repo=${{ parameters.name }}
    - id: register
      name: Register in Backstage Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
```
This template creates a GitHub repository with a pre-configured Lambda function, an IaC definition (Terraform or AWS CDK), a GitHub Actions pipeline for deployment, and a catalog entry in Backstage so the service is immediately discoverable.
Supporting Services at Lambda Stage
At this stage, keep dependencies minimal:
Storage: Use Amazon S3 for file storage and Amazon DynamoDB for structured data. Both are serverless and require zero management. DynamoDB on-demand pricing means you only pay for actual reads and writes.
Messaging: Use Amazon SQS for decoupled messaging between Lambda functions. A dead-letter queue (DLQ) is a must-have for reliability.
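Wiring up that DLQ amounts to setting a redrive policy on the source queue. A small sketch of building the attribute map (the queue ARN is a placeholder; with boto3 the resulting dict would be passed to the real `set_queue_attributes` SQS call):

```python
import json


def redrive_policy_attributes(dlq_arn: str, max_receives: int = 5) -> dict:
    """Build the SQS attribute map that attaches a dead-letter queue.

    After `max_receives` failed processing attempts, SQS moves the
    message to the DLQ instead of redelivering it forever.
    """
    return {
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receives}
        )
    }


# With boto3 this would be applied as (not executed here):
#   sqs.set_queue_attributes(QueueUrl=queue_url,
#                            Attributes=redrive_policy_attributes(dlq_arn))
attrs = redrive_policy_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq")
```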
Email: Use Amazon SES in sandbox mode initially. SES integrates directly with Lambda via the AWS SDK and requires no additional infrastructure.
Cost Profile at Stage 1
AWS Lambda pricing is among the most cost-effective options for low-to-moderate workloads:
- Compute: $0.0000166667 per GB-second. A function with 256 MB RAM running for 100ms costs roughly $0.00000042 per invocation.
- Requests: $0.20 per 1 million requests (first 1 million free each month).
- DynamoDB on-demand: $1.25 per million write request units, $0.25 per million read request units.
For a prototype handling tens of thousands of requests per month, the total AWS bill is often under $5/month.
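Plugging the prices above into a quick estimate shows how small the bill stays at prototype scale (a simplified model that ignores the monthly free tier):

```python
def lambda_monthly_cost(requests: int, avg_ms: int, memory_mb: int) -> float:
    """Estimate monthly Lambda cost from the published us-east-1 prices
    quoted above. Ignores the free tier for simplicity."""
    gb_seconds = requests * (avg_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * 0.0000166667          # $ per GB-second
    request_fee = requests / 1_000_000 * 0.20    # $ per million requests
    return compute + request_fee


# 50,000 requests/month at 256 MB and 100 ms each:
print(round(lambda_monthly_cost(50_000, 100, 256), 4))   # ~ $0.03/month
```

Even at a million requests per month this model stays well under a dollar of compute, which is why Lambda dominates the prototype stage.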
Limitations of Lambda
Lambda works until it doesn’t:
- Cold starts introduce latency spikes (100ms–3s for JVM runtimes) that hurt user experience under light load.
- 15-minute execution limit rules out long-running processes.
- Concurrency limits (default 1,000 per region) can cause throttling under sudden traffic spikes.
- Stateless architecture requires externalizing all state, which adds complexity.
- Container image size limits (10 GB) and ephemeral storage (512 MB default, up to 10 GB) constrain large ML or data-processing workloads.
When you start hitting these limits, it’s time to consider Stage 2.
Stage 2: AWS Fargate — Containers Without the Cluster
Why Fargate?
AWS Fargate provides serverless compute for containers. You package your application into a Docker container, define the CPU and memory you need, and Fargate runs it — without you managing EC2 instances, AMI updates, or node groups.
Fargate fits the gap between Lambda’s strict execution model and the full control (and complexity) of Kubernetes. It works well for:
- Long-running HTTP services that need consistent latency without cold starts
- Background workers that process queues or run scheduled batch jobs
- Stateful-ish services that benefit from container persistence (sidecars, shared memory within a task)
- Teams without Kubernetes expertise who need more than Lambda offers
Backstage Template: Fargate Service
```yaml
# templates/fargate-service/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: aws-fargate-service
  title: AWS Fargate Service
  description: Deploy a containerized service on AWS Fargate (ECS)
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Configuration
      required: [name, region, cpu, memory]
      properties:
        name:
          title: Service Name
          type: string
        region:
          title: AWS Region
          type: string
          default: us-east-1
        cpu:
          title: vCPU Units
          type: integer
          enum: [256, 512, 1024, 2048, 4096]
          default: 512
        memory:
          title: Memory (MB)
          type: integer
          enum: [512, 1024, 2048, 4096, 8192]
          default: 1024
        desiredCount:
          title: Desired Task Count
          type: integer
          default: 2
        enableAutoScaling:
          title: Enable Auto Scaling
          type: boolean
          default: true
        minCapacity:
          title: Minimum Task Count
          type: integer
          default: 1
        maxCapacity:
          title: Maximum Task Count
          type: integer
          default: 10
    - title: Networking
      properties:
        enableAlb:
          title: Enable Application Load Balancer
          type: boolean
          default: true
        enableHttps:
          title: Enable HTTPS (requires ACM certificate)
          type: boolean
          default: true
  steps:
    - id: fetch-template
      name: Fetch Fargate Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          cpu: ${{ parameters.cpu }}
          memory: ${{ parameters.memory }}
    - id: create-repo
      name: Create GitHub Repository
      action: publish:github
      input:
        repoUrl: github.com?owner=my-org&repo=${{ parameters.name }}
    - id: register
      name: Register in Backstage Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
```
The generated repository includes a Dockerfile, ECS task definition, Application Load Balancer configuration, and a GitHub Actions pipeline for building and deploying container images to Amazon ECR.
Supporting Services at Fargate Stage
At this stage, the application has real users and real traffic. Infrastructure choices become more consequential.
Storage:
- Amazon RDS (PostgreSQL/MySQL) replaces DynamoDB for relational workloads. Use Multi-AZ deployments for production. Aurora Serverless v2 offers a good middle ground — it scales automatically and you don’t pay for idle capacity.
- Amazon S3 continues to serve as object storage.
- Amazon EFS can provide shared file storage across Fargate tasks when needed.
Caching:
- Amazon ElastiCache for Redis dramatically reduces database load and improves response times for read-heavy workloads. A single-node Redis instance can serve as a session store and query cache.
- For even simpler caching, DynamoDB Accelerator (DAX) adds a microsecond-latency cache in front of DynamoDB tables.
Messaging:
- Amazon SQS scales naturally to Fargate workers consuming messages.
- Amazon SNS provides pub/sub fanout — one SNS topic can fan out to multiple SQS queues, Lambda functions, or email endpoints via SES.
- Amazon EventBridge replaces polling patterns with event-driven routing between services.
Email:
- Amazon SES moves out of sandbox mode once the application has a real domain and verified sending addresses. SES supports transactional email, bulk campaigns, and inbound email processing.
Cost Profile at Stage 2
Fargate pricing is based on vCPU and memory consumed per second:
- vCPU: $0.04048 per vCPU-hour
- Memory: $0.004445 per GB-hour
- A task with 0.5 vCPU and 1 GB memory running continuously costs ~$18/month.
- Two tasks behind an ALB (for high availability) cost ~$36/month for compute + ~$18/month for the ALB itself.
Add Aurora Serverless v2 ($0.12/ACU-hour minimum) and ElastiCache Redis (t4g.small at ~$25/month) and a typical Stage 2 stack runs $100–$300/month depending on traffic.
This is significantly higher than Lambda but provides consistent performance, no cold starts, and no execution time limits.
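The per-task arithmetic from the prices above can be sketched directly (a simplified estimate; real bills also include ECR storage, data transfer, and ALB capacity units):

```python
def fargate_task_monthly_cost(vcpu: float, memory_gb: float,
                              hours: float = 730) -> float:
    """Monthly cost of one always-on Fargate task at the us-east-1
    prices quoted above (vCPU-hour and GB-hour rates)."""
    return (vcpu * 0.04048 + memory_gb * 0.004445) * hours


# One 0.5 vCPU / 1 GB task, running 24x7:
print(round(fargate_task_monthly_cost(0.5, 1.0), 2))   # ~ $18/month
```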
Limitations of Fargate
Fargate removes the operational burden of EC2 but introduces its own constraints:
- Multi-service orchestration becomes complex — services are spread across one or more ECS clusters, and service discovery requires AWS Cloud Map or a service mesh.
- No shared memory between tasks. Each task is isolated.
- Limited networking primitives compared to Kubernetes — no native pod networking, no admission webhooks, no custom resource definitions.
- Resource density is lower than Kubernetes — Fargate tasks cannot bin-pack as efficiently as Kubernetes pods on shared nodes.
When the application has multiple teams contributing services, when traffic patterns require sophisticated autoscaling, or when the engineering org needs a common platform for dozens of services, it’s time for Stage 3.
Stage 3: AWS EKS — Enterprise-Grade Kubernetes
Why EKS?
Amazon Elastic Kubernetes Service (EKS) is the AWS-managed Kubernetes control plane. It’s the most operationally complex option but also the most powerful — providing fine-grained control over networking, scheduling, resource isolation, security policies, and observability.
EKS is the right choice when:
- The engineering organization has multiple teams deploying dozens of services
- You need advanced autoscaling (Karpenter for nodes, KEDA for event-driven pod scaling)
- Service mesh capabilities (Istio, Linkerd, or AWS App Mesh) are required for traffic management and mTLS
- Custom operators and Kubernetes-native tooling are part of the platform strategy
- Compliance requirements mandate strict network policies and pod security standards
- The application runs stateful workloads (databases, message brokers, ML model servers) requiring persistent volumes
Backstage Template: EKS Application
```yaml
# templates/eks-service/template.yaml
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: eks-application
  title: EKS Application
  description: Deploy a production-grade application on AWS EKS
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Application Configuration
      required: [name, namespace, region]
      properties:
        name:
          title: Application Name
          type: string
        namespace:
          title: Kubernetes Namespace
          type: string
          default: default
        region:
          title: AWS Region
          type: string
          default: us-east-1
        replicas:
          title: Initial Replica Count
          type: integer
          default: 3
    - title: Resource Limits
      properties:
        cpuRequest:
          title: CPU Request
          type: string
          default: "250m"
        cpuLimit:
          title: CPU Limit
          type: string
          default: "1000m"
        memoryRequest:
          title: Memory Request
          type: string
          default: "256Mi"
        memoryLimit:
          title: Memory Limit
          type: string
          default: "1Gi"
    - title: Autoscaling
      properties:
        enableHpa:
          title: Enable Horizontal Pod Autoscaler
          type: boolean
          default: true
        enableKeda:
          title: Enable KEDA (event-driven scaling)
          type: boolean
          default: false
        enableKarpenter:
          title: Enable Karpenter Node Provisioning
          type: boolean
          default: true
        minReplicas:
          title: Minimum Replicas
          type: integer
          default: 2
        maxReplicas:
          title: Maximum Replicas
          type: integer
          default: 20
    - title: Networking
      properties:
        ingressClass:
          title: Ingress Class
          type: string
          enum: [aws-alb, nginx, traefik]
          default: aws-alb
        enableServiceMesh:
          title: Enable Istio Service Mesh
          type: boolean
          default: false
  steps:
    - id: fetch-template
      name: Fetch EKS Application Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          namespace: ${{ parameters.namespace }}
          replicas: ${{ parameters.replicas }}
    - id: create-repo
      name: Create GitHub Repository
      action: publish:github
      input:
        repoUrl: github.com?owner=my-org&repo=${{ parameters.name }}
    - id: create-argocd-app
      name: Register ArgoCD Application
      action: argocd:create-resources
      input:
        appName: ${{ parameters.name }}
        argoInstance: prod-argocd
        namespace: ${{ parameters.namespace }}
        repoUrl: ${{ steps['create-repo'].output.remoteUrl }}
    - id: register
      name: Register in Backstage Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
```
The generated Kubernetes manifests include a Deployment, Service, HorizontalPodAutoscaler, PodDisruptionBudget, NetworkPolicy, resource quotas, and an ArgoCD Application for GitOps-driven continuous delivery.
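The HorizontalPodAutoscaler in those manifests scales on the standard Kubernetes formula, desiredReplicas = ceil(currentReplicas × currentMetric ÷ targetMetric), clamped to the min/max bounds. A sketch of that rule using the template defaults above:

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float,
                         min_r: int = 2, max_r: int = 20) -> int:
    """Kubernetes HPA scaling rule: desired = ceil(current * metric/target),
    clamped to the minReplicas/maxReplicas bounds from the template."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_r, min(max_r, desired))


# 3 replicas averaging 90% CPU against a 60% target:
print(hpa_desired_replicas(3, 90, 60))   # -> 5
```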
Supporting Services at EKS Stage
At this stage, the infrastructure mirrors what large enterprises run in production.
Storage:
- Amazon Aurora PostgreSQL (Multi-AZ, Global Database) for transactional data with cross-region read replicas.
- Amazon S3 with intelligent tiering for cost-optimized object storage.
- Amazon EBS persistent volumes for stateful Kubernetes workloads.
- Amazon EFS for shared ReadWriteMany storage across pods.
Caching:
- Amazon ElastiCache for Redis (Cluster Mode) provides horizontal sharding across multiple Redis nodes, handling millions of operations per second. Combined with read replicas, it provides both high throughput and high availability.
- Amazon ElastiCache for Memcached for simple, high-speed caching when Redis data structures are not needed.
Messaging & Eventing:
- Amazon SQS FIFO queues for ordered, exactly-once message processing.
- Amazon SNS with SQS fanout for event-driven microservice communication.
- Amazon MSK (Managed Streaming for Apache Kafka) for high-throughput event streaming when SQS is insufficient.
- Amazon EventBridge as a fully managed event bus for routing events across services and AWS accounts.
Email & Notifications:
- Amazon SES with dedicated IP addresses for high-volume transactional and marketing email.
- Amazon SNS for SMS notifications and mobile push notifications.
- Amazon Pinpoint for sophisticated multi-channel customer engagement campaigns.
Security & Compliance:
- AWS IAM Roles for Service Accounts (IRSA) to grant Kubernetes pods fine-grained AWS permissions without storing credentials.
- AWS Secrets Manager or External Secrets Operator for injecting secrets into pods at runtime.
- Kyverno or OPA Gatekeeper for Kubernetes admission control and policy enforcement.
Cost Profile at Stage 3
EKS has a fixed control plane cost plus the underlying EC2 or Fargate compute:
- EKS control plane: $0.10/hour ($72/month) per cluster.
- Managed node groups (m5.xlarge): ~$175/month per node. A production cluster with 5 nodes costs ~$875/month.
- Karpenter can reduce this by right-sizing nodes and using Spot instances, potentially cutting compute costs by 60–70%.
- Aurora PostgreSQL Multi-AZ (db.r6g.large): ~$370/month.
- ElastiCache Redis Cluster (cache.r6g.large × 3): ~$555/month.
- MSK (kafka.m5.large × 3): ~$430/month.
A full production EKS stack runs $2,000–$10,000+/month depending on traffic, replication factor, and data volumes. This is justified when the application serves significant business value and reliability requirements cannot be met by simpler alternatives.
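Summing the line items above shows why roughly $2,000/month is the floor for this tier (illustrative figures carried over from the estimates in this section, before traffic-dependent costs):

```python
# Approximate monthly line items from the Stage 3 estimates above.
stack = {
    "eks_control_plane": 72,
    "nodes_m5_xlarge_x5": 875,
    "aurora_multi_az": 370,
    "redis_cluster_x3": 555,
    "msk_x3": 430,
}
print(sum(stack.values()))   # -> 2302
```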
Backstage as the Control Plane
The power of Backstage isn’t just the scaffolding — it’s the unified control plane that ties all three stages together. Once services are registered in the Backstage Software Catalog, teams get:
Software Catalog: Unified Visibility
Every Lambda function, Fargate service, and EKS application appears in a single, searchable catalog. Metadata like ownership, deployment stage, linked AWS resources, API definitions, and runbooks are co-located in catalog-info.yaml:
```yaml
# catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  title: Payment Service
  description: Processes customer payments via Stripe
  annotations:
    backstage.io/techdocs-ref: dir:.
    github.com/project-slug: my-org/payment-service
    aws.amazon.com/lambda-function-name: payment-service-prod
    aws.amazon.com/region: us-east-1
  tags:
    - payments
    - critical
    - stage-2
  links:
    - url: https://console.aws.amazon.com/lambda
      title: AWS Lambda Console
      icon: dashboard
    - url: https://grafana.example.com/d/payment-service
      title: Grafana Dashboard
      icon: dashboard
spec:
  type: service
  lifecycle: production
  owner: payments-team
  system: e-commerce-platform
  dependsOn:
    - resource:default/payments-db
    - resource:default/payments-sqs-queue
    - resource:default/ses-transactional
```
TechDocs: Living Documentation
Backstage’s TechDocs feature renders Markdown documentation directly in the portal, sourced from the same repository as the code. Architecture diagrams, runbooks, API references, and onboarding guides stay in sync with the service.
Service Upgrade Workflows
One of Backstage’s most powerful patterns is using Software Templates as upgrade workflows. When a team is ready to move from Lambda to Fargate, they invoke an upgrade template that:
- Reads the existing catalog-info.yaml for the service
- Provisions the Fargate task definition and ECS service
- Configures an ALB with gradual traffic shifting
- Updates the catalog entry to reflect the new deployment stage
- Archives the Lambda function after traffic is fully migrated
This makes infrastructure evolution a deliberate, tracked, self-service action rather than an ad-hoc operational task.
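The traffic-shifting step can be sketched as a simple weight schedule (a hypothetical linear ramp between two ALB weighted target groups; a production rollout would typically drive this through AWS CodeDeploy or listener rule updates, with health checks gating each step):

```python
def traffic_shift_schedule(steps: int = 5) -> list:
    """(old_weight, new_weight) pairs for a linear ramp from the old
    Lambda target group to the new Fargate target group."""
    return [(100 - w, w) for w in range(0, 101, 100 // steps)]


print(traffic_shift_schedule(4))
# -> [(100, 0), (75, 25), (50, 50), (25, 75), (0, 100)]
```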
Cost, Reliability, and Scalability Trade-offs
Cost
| Stage | Monthly Cost (Typical) | Cost Driver |
|---|---|---|
| Lambda | $5–$50 | Pay per invocation, near-zero at low volume |
| Fargate | $100–$500 | Always-on tasks, ALB, managed DB |
| EKS | $2,000–$10,000+ | Control plane, nodes, stateful services |
The key insight: don’t over-provision. Running EKS on day one for a prototype is a $2,000/month mistake. Running Lambda for a service handling 10 million daily active users is an architectural mistake. Backstage helps teams stay at the right tier by making transitions explicit and low-friction.
Reliability
| Stage | Availability Target | HA Mechanism |
|---|---|---|
| Lambda | 99.95% (AWS SLA) | Automatic multi-AZ, managed by AWS |
| Fargate | 99.99% (with Multi-AZ) | Multiple tasks across AZs behind ALB |
| EKS | 99.99%+ | PodDisruptionBudgets, Karpenter, multi-AZ node groups |
Each tier offers a higher reliability ceiling but requires more configuration to reach it. Backstage templates encode the right reliability defaults for each tier — developers don’t need to know the details of PodDisruptionBudgets or ALB health check tuning.
Scalability
| Stage | Scale Trigger | Scale Speed | Limits |
|---|---|---|---|
| Lambda | Automatic (invocation-based) | Milliseconds | 1,000 concurrent (soft limit) |
| Fargate | CPU/memory metrics via ECS Service Auto Scaling | 1–3 minutes | Task quotas per region |
| EKS + Karpenter | Pod pending state → node provisioning | 30–90 seconds | EC2 capacity limits |
| EKS + KEDA | External events (SQS depth, Kafka lag) | Seconds (pods) | Cluster node capacity |
Developer Experience: The Backstage Advantage
Without Backstage, moving from Lambda to Fargate to EKS requires navigating the AWS console, writing Terraform modules, configuring GitHub Actions pipelines, and updating scattered documentation. The cognitive load falls entirely on the developer or DevOps engineer.
With Backstage:
- A developer opens the portal and selects “New Service” or “Upgrade Service Tier”
- They fill in a form with service-specific parameters (name, runtime, scaling preferences)
- Backstage provisions the infrastructure, creates the repository, configures the CI/CD pipeline, and registers the service in the catalog — all automatically
- The developer starts writing business logic within minutes
This self-service model has measurable impact:
- Reduced time-to-production from days or weeks to hours
- Consistent infrastructure — no snowflake services built from tribal knowledge
- Lower cognitive load — developers focus on product, not platform
- Audit trail — every template invocation is logged, providing a history of infrastructure changes
Conclusion
Application infrastructure is not static. The right deployment strategy depends on the application’s lifecycle stage, traffic patterns, team size, and reliability requirements. Starting on AWS Lambda, graduating to Fargate, and eventually operating on EKS is a natural progression that mirrors the growth of the application itself.
Backstage makes this progression explicit, self-service, and safe. By encoding deployment tiers as Software Templates, registering services in the catalog, and linking TechDocs for every component, platform teams give developers a unified portal that grows with their applications — from a weekend prototype costing $5/month to a multi-region production platform costing thousands.
The investment in a Backstage-based internal developer platform pays dividends not just in infrastructure consistency, but in developer velocity, operational visibility, and the organizational confidence to evolve infrastructure without fear.
Start simple. Grow deliberately. Let Backstage guide the journey.