Comparing AWS ElastiCache Options: Redis vs Serverless Redis - Constraints, Costs, and Functionality
READER BEWARE: THE FOLLOWING WAS WRITTEN ENTIRELY BY AI WITHOUT HUMAN EDITING.
Introduction
AWS ElastiCache is a fully managed in-memory caching service that supports two open-source engines: Redis and Memcached. As applications scale and demand sub-millisecond response times, choosing the right caching strategy becomes critical. With the introduction of ElastiCache Serverless, AWS has expanded the options for deploying Redis, creating new considerations for architecture, cost, and operational complexity.
This comprehensive guide compares the different ElastiCache options available, with a focus on Redis implementations. We’ll examine the constraints, costs, and functionality of each option to help you make an informed decision for your use case.
Overview of ElastiCache Options
AWS ElastiCache offers several deployment options:
- ElastiCache for Redis (Node-based) - Traditional cluster deployment with manual node management
- ElastiCache Serverless for Redis - Fully serverless, automatically scaling Redis
- ElastiCache for Memcached - Simpler caching engine for specific use cases
This guide primarily focuses on Redis options, as Redis has become the de facto standard for in-memory data stores due to its rich feature set and versatility.
ElastiCache for Redis (Node-based)
Overview
ElastiCache for Redis is the traditional deployment model where you provision and manage specific node types. You have full control over node sizing, cluster configuration, and replication topology.
Architecture Options
Cluster Mode Disabled (Single Shard):
- Single primary node with optional read replicas
- Maximum 5 read replicas per primary
- Up to 250 GB of memory per node (depending on instance type)
- Best for workloads that fit within a single shard
Cluster Mode Enabled (Sharded):
- Horizontal scaling across multiple shards (up to 500 shards)
- Each shard has a primary and optional replicas
- Data is partitioned across shards using hash slots
- Better for large datasets and high throughput requirements
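Client libraries route each key to one of 16,384 hash slots using CRC16, and keys sharing a `{hash tag}` land on the same slot (and therefore the same shard). A minimal pure-Python sketch of the slot calculation, for illustration only:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-XMODEM (polynomial 0x1021), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 cluster hash slots.

    If the key contains a non-empty {hash tag}, only the tag is hashed,
    which lets you force related keys onto the same shard.
    """
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # ignore empty tags like "{}"
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

This is why multi-key operations in cluster mode require all keys to share a hash tag: commands can only touch keys in a single slot.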
Key Features
Data Structures:
- Strings, Lists, Sets, Sorted Sets, Hashes
- Bitmaps, HyperLogLogs, Geospatial indexes
- Streams (for event sourcing and messaging)
- JSON support (with Redis Stack)
Advanced Capabilities:
- Pub/Sub messaging
- Lua scripting
- Transactions
- Geospatial queries
- Time series data support
- Search and query capabilities (with Redis Stack)
High Availability:
- Automatic failover with Multi-AZ deployment
- Manual failover for planned maintenance
- Backup and restore (RDB snapshots, AOF logs)
- Point-in-time recovery
Security:
- Encryption at rest using KMS
- Encryption in transit (TLS)
- Redis AUTH for authentication
- RBAC (Role-Based Access Control) with Redis 6.0+
- VPC isolation
Constraints
Sizing Limitations:
- Must choose instance type upfront (cache.t3.micro to cache.r7g.16xlarge)
- Maximum memory depends on instance type (up to 317 GB for r7g.16xlarge)
- Cannot exceed 500 shards in cluster mode
- Limited to 5 read replicas per shard
Operational Overhead:
- Manual scaling requires changing instance types or adding shards
- Downtime may be required for some configuration changes
- Need to monitor memory usage and eviction policies
- Manual capacity planning required
Configuration Complexity:
- Must understand cluster mode implications
- Need to configure parameter groups
- Manual replication lag monitoring
- Complex migration between cluster modes
Network:
- Only accessible within VPC (no public endpoints)
- Cross-region replication requires Global Datastore (additional cost)
- Maximum 6.1 Gbps network throughput per node (varies by instance)
Versioning:
- Must manage Redis version upgrades manually
- Some features require specific Redis versions
- Backwards compatibility considerations
Cost Structure
Pricing Components:
Node Hours:
- Charged per node per hour
- Varies by instance type and region
- Example pricing (us-east-1):
  - cache.t3.micro: $0.017/hour (~$12.24/month)
  - cache.m7g.large: $0.149/hour (~$107.28/month)
  - cache.r7g.xlarge: $0.334/hour (~$240.48/month)
  - cache.r7g.16xlarge: $5.344/hour (~$3,847.68/month)
Backup Storage:
- $0.085/GB per month for automatic backups
- No charge for one active backup per cluster
- Additional backups charged at standard rate
Data Transfer:
- Data transfer IN: Free
- Data transfer OUT to internet: $0.09/GB (first 10 TB/month)
- Data transfer OUT to same region: Free
- Data transfer OUT cross-region: $0.02/GB
Global Datastore (Cross-Region Replication):
- Additional charge of ~30% of base node cost
- Cross-region data transfer charges apply
Example Monthly Costs:
Small Development Environment:
- Configuration: 1x cache.t3.micro (cluster mode disabled)
- Node cost: $12.24/month
- Backup storage (1 GB): $0.09/month
- Total: ~$12-15/month
Medium Production Environment:
- Configuration: 1 primary + 2 replicas, cache.m7g.large (Multi-AZ)
- Node cost: 3 × $107.28 = $321.84/month
- Backup storage (10 GB): $0.85/month
- Total: ~$325-350/month
Large Production Environment:
- Configuration: 10 shards × 3 nodes (primary + 2 replicas), cache.r7g.xlarge
- Node cost: 30 × $240.48 = $7,214.40/month
- Backup storage (100 GB): $8.50/month
- Total: ~$7,250-7,500/month
Enterprise Multi-Region:
- Configuration: 2 regions, 10 shards × 3 nodes, cache.r7g.4xlarge
- Primary region: 30 × $961.92 = $28,857.60/month
- Secondary region (Global Datastore): 30 × $961.92 × 1.3 = $37,514.88/month
- Cross-region transfer (1 TB/month): $20/month
- Total: ~$66,000-68,000/month
Best Use Cases
Ideal For:
- Predictable, consistent workload patterns
- Applications requiring maximum performance and low latency (<1ms)
- Scenarios where you need full control over Redis configuration
- Workloads requiring specific instance types for cost optimization
- Applications with steady-state traffic that you can capacity plan for
- Use cases requiring advanced Redis features (Lua scripts, complex data structures)
- High-throughput applications (>1M requests per second)
Not Ideal For:
- Highly variable or unpredictable traffic patterns
- Small or intermittent workloads with long idle periods
- Development/testing environments with sporadic usage
- Applications requiring instant, automatic scaling
- Teams without Redis expertise or capacity planning experience
ElastiCache Serverless for Redis
Overview
ElastiCache Serverless is a fully serverless deployment option introduced in 2023. It automatically scales capacity based on application traffic patterns, eliminating the need for manual capacity planning and node management.
Architecture
Serverless Model:
- No nodes to provision or manage
- Automatic scaling from minimal to maximum capacity
- Scales in 1 ECPU (ElastiCache Processing Unit) increments
- Storage automatically allocated based on data size
Capacity Units:
- ECPU (ElastiCache Processing Unit): Measures compute capacity
- Storage: Measured in GB, automatically provisioned
- Scales independently: compute and storage can scale separately
- Minimum: 1 ECPU, Maximum: configurable (up to 5,000 ECPUs)
Key Features
Automatic Scaling:
- Scales up within seconds based on traffic
- Scales down during low traffic periods
- No downtime during scaling operations
- Configurable maximum capacity limits
High Availability:
- Built-in Multi-AZ replication (always enabled)
- Automatic failover
- Continuous backups with point-in-time recovery
- 99.99% SLA for multi-AZ deployments
Data Structures (Supported Subset):
- Strings, Lists, Sets, Sorted Sets, Hashes
- Bitmaps, HyperLogLogs
- Streams (limited functionality)
- JSON support
Security:
- Encryption at rest (mandatory)
- Encryption in transit (mandatory)
- VPC isolation
- IAM authentication support
- RBAC support
Constraints
Feature Limitations:
- Lua scripting not supported
- Pub/Sub limited to basic functionality
- Some Redis commands restricted or limited
- Redis Stack features not available
- No MULTI/EXEC transactions
- Limited Streams functionality
Compatibility:
- Compatible with Redis 7.1+ API
- Not all Redis commands supported
- Some client libraries may require updates
- Module support is limited
Scaling:
- Cannot manually control specific node types
- Scaling is automatic but may not be instant for extreme spikes
- Cannot guarantee specific latency SLAs
- Cold start penalty for completely idle caches
Network:
- VPC-only access (like node-based)
- No Global Datastore support currently
- Cross-region replication not available
- Maximum throughput depends on ECPUs allocated
Operational:
- Less visibility into underlying infrastructure
- Limited control over Redis configuration parameters
- Cannot export/import RDB files directly
- Backup management is automatic (less control)
Size Limitations:
- Maximum 5,000 ECPUs per cache
- Maximum storage: 5 TB per cache
- Request size limits may apply
Cost Structure
Pricing Components:
ECPU Hours:
- Charged per ECPU-hour consumed
- Example pricing (us-east-1): ~$0.125/ECPU-hour
- Minimum 1 ECPU-hour per hour of operation
Storage:
- Charged per GB-hour
- Example pricing (us-east-1): $0.125/GB-hour (~$90/GB-month)
Data Transfer:
- Same as node-based ElastiCache
- Data transfer IN: Free
- Data transfer OUT: Standard AWS rates
Backup Storage:
- Included in base pricing (continuous backups)
- No additional charge for backups
Example Monthly Costs:
Small Variable Workload:
- Average: 2 ECPUs, 5 GB storage
- ECPU cost: 2 × $0.125 × 730 hours = $182.50/month
- Storage cost: 5 × $90 = $450/month
- Total: ~$630-650/month
Medium Variable Workload:
- Average: 10 ECPUs, 50 GB storage
- ECPU cost: 10 × $0.125 × 730 hours = $912.50/month
- Storage cost: 50 × $90 = $4,500/month
- Total: ~$5,400-5,500/month
Large Spiky Workload:
- Average: 50 ECPUs (spikes to 200), 200 GB storage
- ECPU cost: 50 × $0.125 × 730 hours = $4,562.50/month
- Storage cost: 200 × $90 = $18,000/month
- Total: ~$22,500-23,000/month
Important Note: The storage pricing for Serverless is significantly higher than node-based ElastiCache. A cache.r7g.xlarge instance with 26 GB costs ~$240/month, while 26 GB in Serverless costs ~$2,340/month in storage alone.
Best Use Cases
Ideal For:
- Variable, unpredictable traffic patterns
- Development and testing environments
- Applications with periodic spikes (hourly, daily, weekly patterns)
- Startups and small teams without Redis operations expertise
- Microservices architectures with many small caches
- Applications that can tolerate feature limitations
- Cost optimization for low-utilization environments
Not Ideal For:
- Consistent, high-throughput workloads (cost inefficient)
- Applications requiring advanced Redis features (Lua, Pub/Sub, transactions)
- Latency-critical applications needing <1ms guarantees
- Large datasets (>1 TB) due to storage costs
- Workloads requiring maximum performance at lowest cost
- Applications needing Global Datastore or cross-region replication
ElastiCache for Memcached (Brief Overview)
When to Consider Memcached
While this guide focuses on Redis, Memcached is still relevant for specific use cases:
Advantages:
- Simpler, more straightforward caching
- Multi-threaded architecture (better CPU utilization)
- Horizontally scalable (up to 40 nodes)
- Slightly lower latency for simple get/set operations
Limitations:
- No data persistence
- Limited data structures (only key-value)
- No replication or failover
- No backup and restore
- No transactions or Pub/Sub
Cost:
- Similar pricing to Redis node-based
- Example: cache.m7g.large at $0.149/hour (~$107.28/month)
Best For:
- Pure caching use cases (not data store)
- Applications where data loss on restart is acceptable
- Workloads benefiting from multi-threading
- Simple distributed caching without advanced features
Detailed Comparison Matrix
Functionality Comparison
| Feature | Node-based Redis | Serverless Redis | Memcached |
|---|---|---|---|
| Data Structures | | | |
| Strings, Lists, Sets, Hashes | ✅ Full support | ✅ Full support | ⚠️ Key-value only |
| Sorted Sets | ✅ Yes | ✅ Yes | ❌ No |
| Streams | ✅ Full support | ⚠️ Limited | ❌ No |
| JSON | ✅ Yes (Redis Stack) | ✅ Yes | ❌ No |
| Advanced Features | | | |
| Lua Scripting | ✅ Full support | ❌ Not supported | ❌ No |
| Pub/Sub | ✅ Full support | ⚠️ Limited | ❌ No |
| Transactions (MULTI/EXEC) | ✅ Yes | ❌ Not supported | ❌ No |
| Geospatial | ✅ Yes | ✅ Yes | ❌ No |
| Persistence & Reliability | | | |
| Data Persistence | ✅ RDB + AOF | ✅ Automatic | ❌ None |
| Automatic Backups | ✅ Yes | ✅ Continuous | ❌ No |
| Point-in-time Recovery | ✅ Yes | ✅ Yes | ❌ No |
| Multi-AZ | ✅ Optional | ✅ Always enabled | ❌ No |
| Automatic Failover | ✅ Yes (Multi-AZ) | ✅ Yes | ❌ No |
| Scaling | | | |
| Vertical Scaling | ⚠️ Manual | ✅ Automatic | ⚠️ Manual |
| Horizontal Scaling | ⚠️ Manual (cluster mode) | ✅ Automatic | ⚠️ Manual (up to 40 nodes) |
| Scale to Zero | ❌ No | ⚠️ Minimum 1 ECPU | ❌ No |
| Operations | | | |
| Capacity Planning | ⚠️ Manual required | ✅ Automatic | ⚠️ Manual required |
| Node Management | ⚠️ Manual | ✅ None | ⚠️ Manual |
| Version Upgrades | ⚠️ Manual | ✅ Automatic | ⚠️ Manual |
| Configuration Control | ✅ Full control | ⚠️ Limited | ✅ Full control |
| Performance | | | |
| Latency | ✅ <1ms typical | ✅ Low single-digit ms | ✅ <1ms typical |
| Max Throughput | ✅ Very high | ⚠️ High (depends on ECPUs) | ✅ Very high |
| Memory Efficiency | ✅ Excellent | ✅ Good | ✅ Excellent |
| Network & Regions | | | |
| Cross-Region Replication | ✅ Global Datastore | ❌ Not available | ❌ No |
| VPC Access | ✅ Yes | ✅ Yes | ✅ Yes |
| Public Endpoint | ❌ No | ❌ No | ❌ No |
| Security | | | |
| Encryption at Rest | ✅ Optional | ✅ Mandatory | ✅ Optional |
| Encryption in Transit | ✅ Optional | ✅ Mandatory | ✅ Optional |
| Authentication | ✅ AUTH + RBAC | ✅ AUTH + RBAC + IAM | ✅ SASL |
| Pricing Model | | | |
| Compute | Node-hours | ECPU-hours | Node-hours |
| Storage | ✅ Included | 💰 Separate charge | ✅ Included |
| Minimum Cost | ~$12/month | ~$200-300/month | ~$12/month |
Cost Comparison by Scenario
Scenario 1: Small Development Cache (5 GB, Low Traffic)
Node-based (cache.t3.micro):
- Compute: $12.24/month
- Storage: Included
- Total: ~$12-15/month ✅ Winner for small, consistent workloads
Serverless:
- Compute: 1 ECPU × $0.125 × 730 = $91.25/month
- Storage: 5 GB × $90 = $450/month
- Total: ~$540-550/month
Winner: Node-based (45x cheaper)
Scenario 2: Medium Production (50 GB, Moderate Traffic, 95% Time Low)
Node-based (cache.m7g.large with 1 replica):
- Compute: 2 × $107.28 = $214.56/month
- Storage: Included
- Total: ~$215-225/month ✅ Winner if traffic is consistent
Serverless (scales 1-10 ECPUs, avg 3):
- Compute: 3 ECPU × $0.125 × 730 = $273.75/month
- Storage: 50 GB × $90 = $4,500/month
- Total: ~$4,750-4,800/month
Winner: Node-based (21x cheaper)
Scenario 3: Large Spiky Workload (100 GB, High Spikes, Low Baseline)
Node-based (cache.r7g.xlarge, must size for peak):
- Compute: $240.48/month (single node)
- Or 3 nodes with replicas: $721.44/month
- Storage: Included
- Total: ~$240-750/month
Serverless (1-100 ECPUs, avg 20 ECPUs):
- Compute: 20 ECPU × $0.125 × 730 = $1,825/month
- Storage: 100 GB × $90 = $9,000/month
- Total: ~$10,800-11,000/month
Winner: Node-based (even with over-provisioning)
Scenario 4: Intermittent Development/Testing (10 GB, Used 8 Hours/Day)
Node-based (cache.t3.small, running 24/7):
- Compute: $24.48/month (cannot shut down)
- Storage: Included
- Total: ~$24-30/month
Serverless (1-5 ECPUs, avg 1.5 during the 8 active hours):
- Compute: 1.5 ECPU × $0.125 × 240 hours = $45/month
- Storage: 10 GB × $90 = $900/month (storage is billed continuously, even while compute is idle)
- Total: ~$945-950/month
Winner: Node-based (still cheaper despite 24/7 operation)
Key Insight: Serverless storage costs dominate the pricing equation, making it more expensive than node-based even for intermittent workloads.
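The comparisons above can be reproduced with a small calculator. The prices below are the illustrative us-east-1 figures quoted earlier in this guide, not authoritative AWS rates, and the 730-hour month matches the approximation used in the examples:

```python
HOURS_PER_MONTH = 730  # the approximation used in the scenarios above

def node_based_monthly(price_per_node_hour: float, node_count: int,
                       backup_gb: float = 0.0) -> float:
    """Monthly cost of a node-based cluster: node-hours plus backup storage."""
    return price_per_node_hour * HOURS_PER_MONTH * node_count + backup_gb * 0.085

def serverless_monthly(avg_ecpus: float, storage_gb: float,
                       ecpu_hour_price: float = 0.125,
                       storage_gb_month_price: float = 90.0) -> float:
    """Monthly cost of a Serverless cache: ECPU-hours plus per-GB storage."""
    return (avg_ecpus * ecpu_hour_price * HOURS_PER_MONTH
            + storage_gb * storage_gb_month_price)
```

Plugging in Scenario 2 (avg 3 ECPUs, 50 GB vs two m7g.large nodes) makes the storage-dominated gap obvious, since 50 GB alone contributes $4,500/month on the Serverless side.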
Performance Comparison
| Metric | Node-based Redis | Serverless Redis | Memcached |
|---|---|---|---|
| Latency (p50) | <1 ms | 1-3 ms | <1 ms |
| Latency (p99) | ~2 ms | 3-8 ms | ~2 ms |
| Max Throughput | 1M+ ops/sec (large instance) | 100K-500K ops/sec | 1M+ ops/sec |
| Cold Start | None (always warm) | 5-30 seconds | None |
| Scaling Speed | Minutes (manual) | Seconds (automatic) | Minutes (manual) |
| Memory Efficiency | 100% (you pay for it) | 85-95% (overhead) | 100% |
When to Choose Each Option
Choose Node-based Redis When:
✅ Performance is Critical:
- You need <1ms latency consistently
- High throughput requirements (>500K ops/sec)
- Maximum performance per dollar
✅ Advanced Features Required:
- Lua scripting for complex operations
- Full Pub/Sub implementation
- MULTI/EXEC transactions
- Redis Stack features (Search, JSON, Time Series)
✅ Cost Optimization:
- Predictable, consistent workload
- Large memory requirements (>100 GB)
- Long-running production workloads
- You can capacity plan effectively
✅ Global Requirements:
- Cross-region replication needed
- Global Datastore for multi-region active-active
✅ Full Control:
- Need specific Redis version
- Custom parameter configurations
- Specific instance types for workload
Choose Serverless Redis When:
✅ Variable Workloads:
- Traffic spikes at unpredictable times
- Seasonal or event-driven applications
- Development/testing with sporadic usage
✅ Operational Simplicity:
- Small team without Redis expertise
- Want to avoid capacity planning
- Prefer automatic scaling and management
✅ Small Memory Requirements:
- Dataset < 10 GB
- Storage costs are manageable
✅ Specific Constraints:
- Can work within feature limitations
- Don’t need Lua, complex Pub/Sub, or transactions
- Acceptable latency is 1-5ms
✅ Experimentation:
- Prototyping new applications
- Testing different caching strategies
- Proof of concept projects
Choose Memcached When:
✅ Pure Caching:
- No need for persistence
- Data loss on restart is acceptable
- Only need simple key-value operations
✅ Multi-threaded Workloads:
- Benefit from multi-core CPU utilization
- Large objects being cached
✅ Simplicity:
- Straightforward caching requirements
- No advanced features needed
Migration and Transition Strategies
Migrating from Node-based to Serverless
Preparation:
- Audit current Redis usage for unsupported features
- Remove or refactor Lua scripts
- Update applications to handle slightly higher latency
- Test with Redis 7.1+ compatible clients
Migration Steps:
- Create Serverless cache in parallel
- Dual-write to both caches (temporary)
- Gradually shift reads to Serverless
- Monitor performance and costs
- Decommission node-based cache after validation
Considerations:
- Cannot use replication to migrate (different architecture)
- May need application-level migration strategy
- Watch for storage costs
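The dual-write pattern described in the steps above can be sketched with plain dictionaries standing in for the two caches; in practice `old` and `new` would be Redis client objects, and the interface here is hypothetical:

```python
class DualWriteCache:
    """Temporary dual-write wrapper used during a cache migration.

    Writes go to both the old and new cache; reads come from whichever
    side the cutover flag points at. Here `old` and `new` are plain
    dicts for illustration -- real code would wrap Redis clients.
    """
    def __init__(self, old, new, read_from_new=False):
        self.old, self.new = old, new
        self.read_from_new = read_from_new

    def set(self, key, value):
        self.old[key] = value   # keep the old cache authoritative...
        self.new[key] = value   # ...while warming the new one

    def get(self, key):
        source = self.new if self.read_from_new else self.old
        return source.get(key)

old_cache, new_cache = {}, {}
cache = DualWriteCache(old_cache, new_cache)
cache.set("session:42", "alice")
cache.read_from_new = True  # flip reads once the new cache is warm and validated
```

Keeping the flag in application config makes the cutover (and any rollback) a deploy-free toggle.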
Migrating from Serverless to Node-based
Reasons to Migrate:
- Storage costs are prohibitive
- Need advanced features (Lua, transactions)
- Consistent high load makes node-based cheaper
- Require cross-region replication
Migration Steps:
- Provision node-based cluster
- Application-level dual-write pattern
- Warm up node-based cache
- Switch reads to node-based
- Decommission Serverless
Capacity Planning:
- Use Serverless metrics to size nodes
- Look at peak ECPU and storage usage
- Add 20-30% buffer for growth
Best Practices
For Node-based Redis
Capacity Planning:
# Monitor key metrics
- Evictions: Should be zero or very low
- Memory usage: Stay below 80% to allow for overhead
- CPU utilization: Below 70% for headroom
- Network throughput: Monitor for saturation
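These rules of thumb are easy to encode in a monitoring check. A small sketch using the thresholds listed above (the function name and signature are illustrative, not an AWS API):

```python
def capacity_warnings(memory_pct: float, cpu_pct: float,
                      evictions_per_min: float,
                      net_saturation_pct: float) -> list:
    """Flag any metric that breaches the rule-of-thumb thresholds above."""
    warnings = []
    if evictions_per_min > 0:
        warnings.append("evictions occurring -- add memory or revisit TTLs")
    if memory_pct > 80:
        warnings.append("memory above 80% -- scale up or add a shard")
    if cpu_pct > 70:
        warnings.append("CPU above 70% -- larger instance or more shards")
    if net_saturation_pct > 80:
        warnings.append("network near saturation -- check instance bandwidth")
    return warnings
```

Feeding it CloudWatch datapoints on a schedule gives an early scaling signal before users feel latency.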
High Availability:
- Always use Multi-AZ for production
- Use cluster mode for datasets > 100 GB
- Configure automatic backups
- Test failover scenarios regularly
Performance Optimization:
- Use read replicas for read-heavy workloads
- Enable cluster mode for horizontal scaling
- Choose instance types matching workload (memory vs compute)
- Use connection pooling in applications
Cost Optimization:
- Use Reserved Instances for long-term workloads (save up to 55%)
- Right-size instances based on actual usage
- Consider Graviton-based instances (r7g, m7g) for better price/performance
- Delete unused snapshots
- Use S3 for backup storage (cheaper than ElastiCache backup storage)
Security:
- Enable encryption at rest and in transit
- Use AUTH and RBAC for access control
- Rotate credentials regularly
- Enable VPC security groups and NACLs
- Use IAM roles where possible
For Serverless Redis
Capacity Planning:
- Set appropriate maximum ECPU limits
- Monitor scaling patterns
- Watch for cold start impacts
Cost Management:
# Monitor and alert on costs
- Set CloudWatch alarms for ECPU usage
- Track storage growth
- Consider node-based if costs exceed threshold
Performance:
- Use connection pooling (especially important with scaling)
- Implement retry logic for scaling events
- Cache warm-up strategies for predictable traffic
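The retry advice above can be sketched as a generic wrapper with exponential backoff and jitter. This is an illustrative pattern, not ElastiCache-specific API; `fn` stands in for any zero-argument callable wrapping a Redis command:

```python
import random
import time

def call_with_retry(fn, attempts=4, base_delay=0.05):
    """Retry a cache operation with exponential backoff and jitter.

    Brief connection errors can occur while a Serverless cache scales;
    retrying with backoff usually rides them out rather than surfacing
    an error to the user.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries -- surface the error
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

The jitter term spreads retries from many clients apart, so a scaling event does not trigger a synchronized retry storm.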
Feature Workarounds:
# Instead of Lua scripting
# Option 1: Application-level logic
# Option 2: DynamoDB Transactions for complex operations
# Option 3: Migrate to node-based if critical
# Instead of MULTI/EXEC
# Use application-level transactions
# Or consider Amazon DynamoDB for ACID requirements
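One hedged workaround for the missing MULTI/EXEC is a short-lived lock built from the single atomic SET-with-NX command, which basic Redis deployments support. The sketch below uses an in-memory dict standing in for the client (the `DictCache` interface is illustrative, not redis-py):

```python
import time
import uuid

class DictCache:
    """In-memory stand-in for a Redis client (illustration only)."""
    def __init__(self):
        self.data = {}
    def set_nx(self, key, value):
        """Mimics `SET key value NX` -- set only if the key is absent."""
        if key in self.data:
            return False
        self.data[key] = value
        return True
    def get(self, key):
        return self.data.get(key)
    def delete(self, key):
        self.data.pop(key, None)

def with_lock(cache, lock_key, critical_section, attempts=50, wait=0.01):
    """Run a multi-key update under a best-effort lock.

    Real code would use `SET lock_key token NX EX ttl` so a crashed
    client cannot hold the lock forever.
    """
    token = str(uuid.uuid4())
    for _ in range(attempts):
        if cache.set_nx(lock_key, token):
            try:
                return critical_section()
            finally:
                if cache.get(lock_key) == token:  # release only our own lock
                    cache.delete(lock_key)
        time.sleep(wait)
    raise TimeoutError("could not acquire lock")
```

This is best-effort mutual exclusion, not ACID; if you truly need atomic multi-key semantics, node-based Redis or DynamoDB transactions remain the safer options, as noted above.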
Real-World Cost Examples
E-commerce Application
Requirements:
- 200 GB cache
- Peak traffic: 500K requests/second
- 99.99% availability
- Cross-region failover
Node-based Solution:
- 10 shards × 3 nodes (primary + 2 replicas)
- Instance: cache.r7g.2xlarge (52 GB each)
- Global Datastore for cross-region
- Monthly Cost:
- Primary: 30 × $480.96 = $14,428.80
- Secondary: 30 × $480.96 × 1.3 = $18,757.44
- Total: ~$33,000/month
Serverless Solution:
- Not recommended: Storage cost alone would be 200 GB × $90 = $18,000/month
- ECPUs for 500K ops/sec would add ~$15,000+/month
- Total: ~$33,000+/month (without cross-region support)
Winner: Node-based (same cost but better performance and features)
Startup API Gateway Cache
Requirements:
- 5 GB cache
- Variable traffic: 100-10K requests/second
- 95% of time at low load
- Development + Production
Node-based Solution:
- Dev: 1x cache.t3.micro = $12/month
- Prod: 3x cache.t3.small = $73/month
- Total: ~$85/month
Serverless Solution:
- Dev: 1 ECPU + 5 GB storage = $541/month
- Prod: 5 ECPU + 5 GB storage = $906/month
- Total: ~$1,450/month
Winner: Node-based (17x cheaper)
Microservices Architecture (10 Services)
Requirements:
- 10 separate caches
- 2 GB each (20 GB total)
- Low-moderate traffic per service
Node-based Solution:
- Option 1: 10x cache.t3.micro = $122/month (separate caches)
- Option 2: 1x cache.m7g.large shared = $107/month
- Total: ~$107-122/month
Serverless Solution:
- 10 caches × (1 ECPU + 2 GB storage)
- 10 × ($91 + $180) = $2,710/month
- Total: ~$2,710/month
Winner: Node-based (22x cheaper)
Decision Framework
Questions to Guide Your Choice
1. What is your data size?
- < 5 GB: Consider Serverless or small node
- 5-50 GB: Node-based typically cheaper
- 50-200 GB: Node-based cluster mode
- Over 200 GB: Definitely node-based
2. What is your traffic pattern?
- Consistent 24/7: Node-based
- Variable with spikes: Evaluate both (likely node-based still cheaper)
- Intermittent (dev/test): Node-based (t3 instances)
- Truly unpredictable: Consider Serverless (but watch costs)
3. What features do you need?
- Lua scripting: Node-based only
- Transactions: Node-based only
- Basic data structures: Both work
- Pub/Sub (full): Node-based
- Cross-region: Node-based only
4. What is your performance requirement?
- <1ms latency: Node-based
- 1-5ms acceptable: Both work
- Over 100K ops/sec: Node-based
- <100K ops/sec: Both work
5. What is your team’s expertise?
- Redis experts: Node-based (more control)
- Small team: Serverless (less operations)
- DevOps team: Either works
- No Redis experience: Serverless (easier to start)
6. What is your budget?
- Cost-sensitive: Node-based (almost always cheaper)
- Willing to pay for simplicity: Serverless (if dataset is small)
- Enterprise: Node-based with Reserved Instances
Decision Tree
Start
├─ Need Lua, transactions, or cross-region? → Node-based
├─ Dataset > 50 GB? → Node-based
├─ Consistent high traffic (>100K ops/sec)? → Node-based
├─ Development/test environment?
│ ├─ Intermittent usage? → Node-based (t3.micro)
│ └─ Very sporadic? → Consider Serverless
├─ Small dataset (<10 GB) + variable traffic?
│ ├─ Can tolerate 2-5ms latency? → Evaluate costs (likely Node-based)
│ └─ Need <1ms latency? → Node-based
└─ Default: Node-based (better cost/performance ratio)
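The tree above can be encoded as a small helper for architecture reviews. Thresholds are taken directly from the tree; the function name and inputs are purely illustrative:

```python
def recommend_cache(needs_advanced_features: bool, dataset_gb: float,
                    sustained_ops_per_sec: int, is_dev_test: bool,
                    variable_traffic: bool, needs_sub_ms: bool) -> str:
    """Mirror the decision tree above.

    'Advanced features' means Lua scripting, MULTI/EXEC transactions,
    or cross-region replication -- any of which rules out Serverless.
    """
    if (needs_advanced_features or dataset_gb > 50
            or sustained_ops_per_sec > 100_000):
        return "node-based"
    if is_dev_test and variable_traffic:
        return "consider serverless"
    if dataset_gb < 10 and variable_traffic and not needs_sub_ms:
        return "evaluate both (likely node-based)"
    return "node-based"
```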
Monitoring and Observability
Key Metrics to Monitor
Node-based Redis:
- CPUUtilization: Keep below 70%
- DatabaseMemoryUsagePercentage: Keep below 80%
- EvictionCount: Should be minimal
- ReplicationLag: For read replicas
- NetworkBytesIn/Out: Track bandwidth
- CacheHits vs CacheMisses: Cache effectiveness
- CommandCount: Operations per second
Serverless Redis:
- ElastiCacheProcessingUnits: Current ECPU usage
- BytesUsedForCache: Storage consumption
- CacheHits vs CacheMisses: Cache effectiveness
- CommandCount: Operations per second
- SuccessfulCommandLatency: Performance tracking
CloudWatch Alarms (Examples)
Node-based:
# Memory usage alarm
Metric: DatabaseMemoryUsagePercentage
Threshold: > 80%
Action: SNS notification + consider scaling
# CPU utilization alarm
Metric: CPUUtilization
Threshold: > 70%
Action: SNS notification + evaluate instance size
# Eviction alarm
Metric: Evictions
Threshold: > 100 per minute
Action: SNS notification + investigate memory pressure
Serverless:
# ECPU usage alarm
Metric: ElastiCacheProcessingUnits
Threshold: > 80% of configured maximum
Action: SNS notification + consider increasing max limit
# Storage cost alarm
Metric: BytesUsedForCache
Threshold: Custom (e.g., > 100 GB)
Action: SNS notification + evaluate cost implications
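The alarm definitions above translate to a boto3 `put_metric_alarm` call. A hedged sketch for the node-based memory alarm, where the cluster ID and SNS topic ARN are placeholders you would substitute (this is configuration, so it requires valid AWS credentials to actually run):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical identifiers -- substitute your own cluster ID and SNS topic.
cloudwatch.put_metric_alarm(
    AlarmName="elasticache-memory-high",
    Namespace="AWS/ElastiCache",
    MetricName="DatabaseMemoryUsagePercentage",
    Dimensions=[{"Name": "CacheClusterId", "Value": "my-cache-001"}],
    Statistic="Average",
    Period=300,                 # 5-minute datapoints
    EvaluationPeriods=3,        # sustained for 15 minutes before alarming
    Threshold=80.0,             # matches the 80% guidance above
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:cache-alerts"],
)
```

The CPU, eviction, and ECPU alarms follow the same shape with the metric name, threshold, and dimensions swapped.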
Conclusion
AWS ElastiCache offers multiple deployment options, each with distinct tradeoffs:
Summary of Key Findings
Node-based Redis:
- ✅ Best cost-performance ratio for most workloads
- ✅ Full Redis feature set
- ✅ Predictable costs
- ✅ Maximum performance and control
- ⚠️ Requires capacity planning and management
- ⚠️ Manual scaling
Serverless Redis:
- ✅ Automatic scaling
- ✅ Simplified operations
- ✅ Good for truly variable workloads
- ⚠️ Storage costs are very high
- ⚠️ Feature limitations
- ⚠️ More expensive for most use cases
Memcached:
- ✅ Simple caching
- ✅ Multi-threaded performance
- ⚠️ No persistence or advanced features
- ⚠️ Limited use cases
Final Recommendations
For 90% of use cases: Start with node-based ElastiCache for Redis
- Better cost-performance ratio
- More features and flexibility
- Predictable costs
- Mature and battle-tested
For specific scenarios: Consider Serverless Redis
- True variable/unpredictable workloads
- Small datasets (<10 GB)
- Prototyping and experimentation
- Teams without Redis expertise
- When simplicity trumps cost
For simple caching: Consider Memcached
- No persistence needed
- Simple key-value caching only
- Multi-threaded benefits
Storage Cost Reality Check
The most surprising finding is that Serverless storage (~$90/GB/month) costs roughly 10x the effective per-GB price of a node-based deployment (for example, a cache.r7g.xlarge works out to about $9/GB/month). This makes Serverless impractical for anything beyond small datasets, even with its automatic scaling benefits.
Cost Optimization Tips
- Always start with t3 instances for dev/test environments
- Use Reserved Instances for production (save 30-55%)
- Consider Graviton instances (r7g, m7g) for 20-40% better price/performance
- Right-size regularly based on actual metrics
- Use read replicas instead of larger instances for read-heavy workloads
- Evaluate cluster mode for horizontal scaling vs vertical scaling
- Monitor evictions as a signal to scale up
- Clean up snapshots older than retention requirements
Looking Forward
AWS continues to enhance ElastiCache:
- Redis 7.x features: Improved performance and functionality
- Graviton3 instances: Better price-performance
- Enhanced Serverless: Potential feature additions and cost optimizations
- Integration improvements: Better AWS service integration
Choose based on your specific requirements, but for most production workloads, traditional node-based ElastiCache for Redis remains the most cost-effective and feature-rich option.
Resources and Further Reading
Official AWS Documentation
- ElastiCache for Redis Documentation
- ElastiCache Serverless Documentation
- ElastiCache Pricing
- Best Practices for ElastiCache
Redis Resources
Monitoring and Performance
Cost Optimization
By understanding the constraints, costs, and functionality of each ElastiCache option, you can make informed decisions that balance performance, operational complexity, and budget for your specific use case.