
AWS: Cloud Infrastructure for Production Workloads

A deep technical guide to building, securing, and optimizing production infrastructure on Amazon Web Services. Covers compute (EC2, ECS, EKS, Lambda), Aurora RDS, S3 + CloudFront, SES, VPC networking, IAM security, CloudWatch monitoring, Route 53 DNS, Secrets Manager, EventBridge, cost optimization, and Infrastructure as Code.

1. Compute: EC2, ECS, EKS, Lambda
2. RDS Aurora: MySQL-Compatible High Availability
3. S3 and CloudFront: Storage + CDN
4. SES: Email Service at Scale
5. VPC Networking: Subnets, Security Groups, NACLs
6. IAM: Roles, Policies, and MFA
7. Cost Optimization: Reserved, Spot, Savings Plans
8. Infrastructure as Code: CloudFormation & Terraform
9. CloudWatch Monitoring and Observability
10. Route 53: DNS and Traffic Management
11. Secrets Manager
12. EventBridge: Event-Driven Architecture
13. Amazon Bedrock: Managed AI/ML
14. Latest AWS Updates (June 2026)
15. Real-World Experience

1. Compute: EC2, ECS, EKS, Lambda

EC2 (Elastic Compute Cloud)

EC2 provides full control over virtual machines. You select instance types, configure networking, and manage the OS. Best for workloads that need persistent state, GPU access, or specific kernel configurations.

Instance families: General purpose (t3, m6i), compute-optimized (c6i), memory-optimized (r6i), GPU (p4d, g5)
Pricing models: On-Demand, Reserved Instances (1yr/3yr commits for 40-60% savings), Spot Instances (up to 90% savings for fault-tolerant workloads)
Auto Scaling Groups with launch templates, target tracking policies, and predictive scaling
Placement groups for low-latency inter-instance communication (cluster, spread, partition)
EBS volume types: gp3 (general), io2 (high IOPS), st1 (throughput), sc1 (cold storage)
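Target tracking keeps a metric such as average CPU near a target value by adding or removing instances. The sketch below is a rough, proportional model of that behavior, written as a hypothetical helper (`targetTrackingCapacity`). AWS's real algorithm also accounts for instance warm-up, cooldowns, and deliberately slower scale-in, so treat this as intuition rather than a simulator.

```typescript
// Hypothetical helper: a simplified, proportional model of target tracking.
// Not AWS's exact algorithm (no warm-up, cooldown, or conservative scale-in).
interface AsgBounds {
  min: number;
  max: number;
}

function targetTrackingCapacity(
  current: number,
  metricValue: number,
  targetValue: number,
  bounds: AsgBounds,
): number {
  if (current <= 0 || targetValue <= 0) {
    throw new Error("current capacity and target value must be positive");
  }
  // Scale capacity by how far the metric is from the target, rounding up so the
  // group is not left under-provisioned while the metric is above target.
  const proposed = Math.ceil((current * metricValue) / targetValue);
  // Never leave the Auto Scaling Group's configured min/max bounds.
  return Math.min(bounds.max, Math.max(bounds.min, proposed));
}

// 4 instances at 90% average CPU with a 50% CPU target: ceil(4 * 90 / 50) = 8
console.log(targetTrackingCapacity(4, 90, 50, { min: 2, max: 10 }));
```

The clamp matters in practice: a burst that would call for 16 instances still stops at the group's `max`, which is why the maximum should be sized for peak load rather than average load.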

ECS (Elastic Container Service)

ECS is AWS's native container orchestration platform. It runs Docker containers without managing Kubernetes complexity. Two launch types: EC2 (you manage instances) and Fargate (serverless).

Task definitions: CPU/memory limits, container images, environment variables, secrets from SSM Parameter Store or Secrets Manager
Services: Desired count, deployment strategies (rolling update, blue/green via CodeDeploy), circuit breaker for failed deployments
Fargate pricing: Pay per vCPU-second and GB-second. No EC2 instance management. Best for variable workloads
Service Connect and Cloud Map for service discovery between microservices
ALB integration with path-based and host-based routing, health checks, and sticky sessions

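# ECS task definition (key fields). The execution role pulls the image and resolves secrets;
# the task role is what your application code runs as. Pin an immutable image tag, not :latest.
{
  "family": "api-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/api-service-task",
  "containerDefinitions": [{
    "name": "api",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/api:1.4.2",
    "essential": true,
    "portMappings": [{"containerPort": 3000, "protocol": "tcp"}],
    "secrets": [
      {"name": "DB_PASSWORD", "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/db-password"}
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/api-service",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "api"
      }
    }
  }]
}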
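Fargate accepts only specific `cpu`/`memory` pairs, and a task definition with any other combination is rejected. A minimal validator, assuming the Linux size table as documented at the time of writing (verify it against the current ECS documentation before relying on it); `isValidFargateSize` and `FARGATE_MEMORY_BY_CPU` are hypothetical names.

```typescript
// Valid Fargate (Linux) task sizes as documented at the time of writing; verify against
// the current ECS docs. cpu is in CPU units (1024 = 1 vCPU), memory in MiB.
type MemoryRule =
  | { values: number[] }
  | { min: number; max: number; step: number };

const FARGATE_MEMORY_BY_CPU: Record<number, MemoryRule> = {
  256: { values: [512, 1024, 2048] },
  512: { min: 1024, max: 4096, step: 1024 },
  1024: { min: 2048, max: 8192, step: 1024 },
  2048: { min: 4096, max: 16384, step: 1024 },
  4096: { min: 8192, max: 30720, step: 1024 },
  8192: { min: 16384, max: 61440, step: 4096 },
  16384: { min: 32768, max: 122880, step: 8192 },
};

// Task definitions store cpu/memory as strings ("512", "1024"), so accept both forms.
function isValidFargateSize(cpu: string | number, memory: string | number): boolean {
  const rule: MemoryRule | undefined = FARGATE_MEMORY_BY_CPU[Number(cpu)];
  const mem = Number(memory);
  if (rule === undefined) return false;
  if ("values" in rule) return rule.values.includes(mem);
  // Range rules: memory must be inside the range and on a step boundary.
  return mem >= rule.min && mem <= rule.max && (mem - rule.min) % rule.step === 0;
}

console.log(isValidFargateSize("512", "1024")); // the size used in the task definition above
console.log(isValidFargateSize("512", "512")); // too little memory for 0.5 vCPU
```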

EKS (Elastic Kubernetes Service)

EKS is AWS's managed Kubernetes service. AWS handles the control plane (API server, etcd, scheduler) while you manage worker nodes or use Fargate for serverless pods. Best for teams already invested in the Kubernetes ecosystem or running multi-cloud workloads.

Managed node groups: AWS provisions and manages EC2 instances as worker nodes. AMI updates are started on demand (console, CLI, or API) and roll through the group with graceful node draining
Fargate profiles: Run pods without managing nodes. Define which pods run on Fargate by namespace/label selectors
Add-ons: CoreDNS, kube-proxy, VPC CNI, EBS CSI driver. Installed and versioned through EKS; you choose when to upgrade each add-on version
IAM Roles for Service Accounts (IRSA): Map Kubernetes service accounts to IAM roles. Fine-grained pod-level permissions
Cluster Autoscaler / Karpenter: Scale nodes based on pending pod resource requests. Karpenter is faster and more flexible
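Both Cluster Autoscaler and Karpenter react to pods that cannot be scheduled, and they reason about the pods' CPU and memory requests rather than actual usage. The sketch below estimates how many identically sized nodes a batch of pending pods needs, using first-fit-decreasing bin packing. It is a simplified stand-in rather than either tool's algorithm: it ignores DaemonSet overhead, taints, affinity, pods-per-node limits, and instance-type selection, and `nodesNeeded` is a hypothetical helper.

```typescript
// Hypothetical sketch: first-fit-decreasing bin packing on CPU and memory requests.
interface Resources {
  cpuMillis: number; // 1000 = 1 vCPU
  memoryMiB: number;
}

function nodesNeeded(pods: Resources[], allocatable: Resources): number {
  // Place the largest pods first so they are not stranded by fragmented capacity.
  const sorted = [...pods].sort(
    (a, b) => b.cpuMillis - a.cpuMillis || b.memoryMiB - a.memoryMiB,
  );
  const remaining: Resources[] = []; // free capacity on each node opened so far
  for (const pod of sorted) {
    if (pod.cpuMillis > allocatable.cpuMillis || pod.memoryMiB > allocatable.memoryMiB) {
      throw new Error("pod requests exceed a single node's allocatable capacity");
    }
    // First fit: reuse the first node with room for both CPU and memory.
    const node = remaining.find(
      (n) => n.cpuMillis >= pod.cpuMillis && n.memoryMiB >= pod.memoryMiB,
    );
    if (node) {
      node.cpuMillis -= pod.cpuMillis;
      node.memoryMiB -= pod.memoryMiB;
    } else {
      // No existing node fits: open a new one.
      remaining.push({
        cpuMillis: allocatable.cpuMillis - pod.cpuMillis,
        memoryMiB: allocatable.memoryMiB - pod.memoryMiB,
      });
    }
  }
  return remaining.length;
}

// Three 1-vCPU pods plus one 1.5-vCPU pod on nodes with 2 vCPU / 4 GiB allocatable
const pending: Resources[] = [
  { cpuMillis: 1000, memoryMiB: 1024 },
  { cpuMillis: 1000, memoryMiB: 1024 },
  { cpuMillis: 1000, memoryMiB: 1024 },
  { cpuMillis: 1500, memoryMiB: 2048 },
];
console.log(nodesNeeded(pending, { cpuMillis: 2000, memoryMiB: 4096 })); // 3
```

The example also shows why accurate requests matter: the same four pods fit on two nodes of twice the size, which is the kind of trade-off Karpenter weighs when it picks instance types.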

EKS costs $0.10/hour for the control plane plus worker node costs. For teams not already using Kubernetes, ECS is simpler and more cost-effective. Choose EKS when you need Kubernetes-specific features like Helm charts, custom operators, or multi-cloud portability.

Lambda (Serverless Functions)

Lambda runs code without provisioning servers. You pay only for execution time (billed per 1ms). Ideal for event-driven architectures, API endpoints with variable traffic, and scheduled tasks.

Triggers: API Gateway, S3 events, SQS, SNS, DynamoDB Streams, EventBridge (formerly CloudWatch Events)
Cold start mitigation: Provisioned Concurrency (keeps instances warm), SnapStart (Java 11+, Python 3.12+, .NET 8+ -- GA), smaller package sizes
Memory configuration: 128MB to 10,240MB. CPU scales proportionally with memory allocation
Layers: Share common code/dependencies across functions. Up to 5 layers per function
Limits: 15-minute max execution, 10GB max memory, 250MB deployment package (unzipped), 1000 concurrent executions (default)
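Lambda's compute charge is duration multiplied by configured memory (GB-seconds), plus a per-request fee. A small cost estimator, assuming example us-east-1 x86 on-demand rates ($0.0000166667 per GB-second, $0.20 per million requests) that should be checked against current pricing; the free tier is ignored and `monthlyLambdaCost` is a hypothetical helper.

```typescript
// Assumed example rates (us-east-1, x86, on-demand) at the time of writing;
// check current Lambda pricing for your Region and architecture. Free tier ignored.
const PRICE_PER_GB_SECOND = 0.0000166667;
const PRICE_PER_MILLION_REQUESTS = 0.2;

interface LambdaUsage {
  invocationsPerMonth: number;
  avgDurationMs: number; // billed duration is rounded up to the nearest 1 ms
  memoryMb: number;
}

function monthlyLambdaCost(u: LambdaUsage): number {
  // Compute is billed in GB-seconds: duration (s) x configured memory (GB).
  const gbSeconds = u.invocationsPerMonth * (u.avgDurationMs / 1000) * (u.memoryMb / 1024);
  const compute = gbSeconds * PRICE_PER_GB_SECOND;
  const requests = (u.invocationsPerMonth / 1e6) * PRICE_PER_MILLION_REQUESTS;
  return compute + requests;
}

// 10M invocations/month at 120 ms and 512 MB: 600,000 GB-s, roughly $12/month
console.log(monthlyLambdaCost({ invocationsPerMonth: 10e6, avgDurationMs: 120, memoryMb: 512 }).toFixed(2));
```

Because CPU scales with memory, a higher memory setting can shorten duration enough to offset its higher per-millisecond price, so it is worth measuring a few settings rather than assuming the smallest one is cheapest.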

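Cold starts add roughly 100ms-2s of latency whenever Lambda initializes a new execution environment: the first invocation, scale-out to handle more concurrent requests, or after an idle period. SnapStart (GA for Java, Python, and .NET) reduces cold starts by up to 90% (from 2s to under 200ms) with minimal code changes. For latency-sensitive APIs, use SnapStart or Provisioned Concurrency (the two cannot be combined on the same function version); for sustained traffic, consider ECS Fargate.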
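A cheap cold-start mitigation that needs no extra configuration is to do expensive setup once at module scope. That code runs during the init phase and is reused by every warm invocation in the same execution environment (with SnapStart, it is also captured in the snapshot). The sketch shows the pattern with a simplified handler signature; `expensiveInit` is a hypothetical stand-in for creating SDK clients, loading configuration, or importing heavy libraries.

```typescript
// Module scope: runs once per execution environment, during Lambda's init phase.
let initCount = 0;
let invocationCount = 0;

// Hypothetical stand-in for creating SDK clients, loading config, or heavy imports.
function expensiveInit(): Map<string, string> {
  initCount += 1;
  return new Map([["greeting", "hello"]]);
}

const config = expensiveInit(); // paid once per cold start, not once per request

// Handler body: runs on every invocation. Simplified signature; real handlers
// also receive a context object.
async function handler(event: { name: string }): Promise<string> {
  invocationCount += 1;
  return `${config.get("greeting") ?? "hi"} ${event.name}`;
}
```

Two invocations in the same warm environment run `expensiveInit` once and the handler twice.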

2. RDS Aurora: MySQL-Compatible High Availability

Aurora is a MySQL/PostgreSQL-compatible relational database built for the cloud. It separates compute from storage, replicates data 6 ways across 3 AZs, and delivers up to 5x the throughput of standard MySQL on the same hardware.

Storage: Auto-scales from 10GB to 128TB. No need to pre-provision. Data replicated 6 times across 3 AZs
Read replicas: Up to 15 Aurora Replicas sharing the cluster volume, with replica lag typically under 100ms. Automatic failover typically completes in under 30 seconds
Aurora Serverless v2: Scales from 0 to 256 ACUs in increments of 0.5 ACU. At 0 ACUs the instance auto-pauses during inactivity (resumes in ~15 seconds) so you pay nothing for idle compute. Ideal for variable workloads
Backtrack (Aurora MySQL only): Rewind the database to a point within the last 72 hours without restoring from backup. It must be enabled on the cluster, typically when the cluster is created
Global Database: Cross-region replication with <1 second lag. RPO of 1 second, RTO of <1 minute
Performance Insights: Identify top SQL queries, wait events, and bottlenecks. Free for 7 days of retention
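Serverless v2 bills per ACU-hour, where one ACU is roughly 2 GiB of memory with matching CPU and networking, and capacity moves in 0.5-ACU steps. A rough compute-cost sketch, assuming an example rate of $0.12 per ACU-hour (Aurora Standard, us-east-1, at the time of writing) and a 730-hour month; storage, I/O, and backups are billed separately and not included, and the helper names are hypothetical.

```typescript
// Assumed example rate: Aurora Serverless v2, Aurora Standard, us-east-1, at the time of
// writing. Verify current pricing. Storage, I/O, and backups are billed separately.
const PRICE_PER_ACU_HOUR = 0.12;
const HOURS_PER_MONTH = 730;

// Capacity moves in 0.5-ACU steps and never leaves the configured min/max range.
function clampAcu(requested: number, minAcu: number, maxAcu: number): number {
  const stepped = Math.ceil(requested * 2) / 2; // round up to the next 0.5 ACU
  return Math.min(maxAcu, Math.max(minAcu, stepped));
}

// Each entry is one hour's average capacity; 0 means the instance was auto-paused.
function computeCost(hourlyAcu: number[]): number {
  const acuHours = hourlyAcu.reduce((sum, acu) => sum + acu, 0);
  return acuHours * PRICE_PER_ACU_HOUR;
}

// A writer that sits at 2 ACUs all month: 2 x 730 x $0.12 = $175.20
const steadyMonth = Array.from({ length: HOURS_PER_MONTH }, () => 2);
console.log(computeCost(steadyMonth).toFixed(2));
```

The same arithmetic shows where auto-pause helps: a dev database that is idle 16 hours a day pays for only a third of the hours it would otherwise bill.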

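// Aurora cluster endpoint configuration (TypeORM)
import { DataSource } from 'typeorm';

export const AppDataSource = new DataSource({
  type: 'mysql',
  replication: {
    // Writes: the cluster endpoint always points at the current writer instance
    master: {
      host: 'mydb-cluster.cluster-xxxxx.us-east-1.rds.amazonaws.com',
      port: 3306,
      username: 'admin',
      password: process.env.DB_PASSWORD,
      database: 'myapp_prod'
    },
    // Reads: the reader (cluster-ro) endpoint spreads connections across replicas
    slaves: [{
      host: 'mydb-cluster.cluster-ro-xxxxx.us-east-1.rds.amazonaws.com',
      port: 3306,
      username: 'admin',
      password: process.env.DB_PASSWORD,
      database: 'myapp_prod'
    }]
  },
  // Passed through to the underlying MySQL driver's connection pool
  extra: {
    connectionLimit: 20,
    connectTimeout: 10000,
    waitForConnections: true
  }
});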

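Aurora's reader endpoint balances new connections (not individual queries) across the available Aurora Replicas via DNS, so a pooled connection stays on one replica until it reconnects. Use the cluster endpoint for writes and the reader endpoint for reads, and send reads that must see a just-committed write to the writer, since replicas can lag slightly.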
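Applications that manage connections themselves need a rule for which endpoint each statement uses. A conservative routing sketch: only plain, non-locking SELECTs outside a transaction go to the reader, and callers can force the writer when they must read their own writes. The hostnames and `endpointFor` are hypothetical, and the SQL check is deliberately simple (anything unusual, such as a statement starting with a CTE, is treated as a write).

```typescript
// Hypothetical hostnames with the same shape as the TypeORM example; use your own endpoints.
const WRITER_ENDPOINT = "mydb-cluster.cluster-xxxxx.us-east-1.rds.amazonaws.com";
const READER_ENDPOINT = "mydb-cluster.cluster-ro-xxxxx.us-east-1.rds.amazonaws.com";

// Conservative rule: only plain, non-locking reads outside a transaction go to replicas.
function endpointFor(sql: string, inTransaction: boolean, needsReadYourWrites = false): string {
  // A transaction must see its own writes, and replicas can lag the writer slightly.
  if (inTransaction || needsReadYourWrites) return WRITER_ENDPOINT;
  const verb = (sql.trim().split(/\s+/)[0] ?? "").toUpperCase();
  // SELECT ... FOR UPDATE / FOR SHARE / LOCK IN SHARE MODE take row locks on the writer.
  const locking = /\bFOR\s+(UPDATE|SHARE)\b|\bLOCK\s+IN\s+SHARE\s+MODE\b/i.test(sql);
  return verb === "SELECT" && !locking ? READER_ENDPOINT : WRITER_ENDPOINT;
}

console.log(endpointFor("SELECT * FROM bookings WHERE user_id = ?", false)); // cluster-ro endpoint
console.log(endpointFor("UPDATE bookings SET status = 'confirmed' WHERE id = ?", false)); // cluster endpoint
```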

3. S3 and CloudFront: Storage + CDN

S3 (Simple Storage Service)

S3 provides virtually unlimited object storage with 99.999999999% (11 nines) durability.

Storage classes: Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier Instant/Flexible/Deep Archive
Lifecycle policies: Automatically transition objects between classes based on age
Versioning: Keep every version of every object. Protect against accidental deletes
Server-side encryption: SSE-S3 (default), SSE-KMS (auditable), SSE-C (customer keys)
Pre-signed URLs: Grant temporary access (upload/download) without exposing credentials
S3 Event Notifications: Trigger Lambda, SQS, or SNS on object creation/deletion
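Lifecycle transitions are age-based: once an object is older than a rule's threshold, S3 moves it to that rule's storage class. A sketch of the mapping, assuming an example rule set for logs (Standard-IA at 30 days, Glacier Flexible Retrieval at 90, Deep Archive at 365). The rule values and `storageClassForAge` are illustrative, and S3 applies transitions asynchronously, so the real move can lag the threshold.

```typescript
// Storage class names as used in S3 lifecycle rules.
type StorageClass = "STANDARD" | "STANDARD_IA" | "GLACIER_IR" | "GLACIER" | "DEEP_ARCHIVE";

interface Transition {
  days: number; // days after object creation
  storageClass: StorageClass;
}

// Example rule set (an assumption for illustration only).
const logRules: Transition[] = [
  { days: 30, storageClass: "STANDARD_IA" },
  { days: 90, storageClass: "GLACIER" }, // Glacier Flexible Retrieval
  { days: 365, storageClass: "DEEP_ARCHIVE" },
];

// Which class an object of a given age should be in once lifecycle has caught up.
function storageClassForAge(ageDays: number, transitions: Transition[]): StorageClass {
  let current: StorageClass = "STANDARD";
  // Apply every transition whose threshold has been reached, oldest threshold first.
  for (const t of [...transitions].sort((a, b) => a.days - b.days)) {
    if (ageDays >= t.days) current = t.storageClass;
  }
  return current;
}

console.log(storageClassForAge(45, logRules)); // STANDARD_IA
```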

CloudFront (CDN)

CloudFront distributes content from 750+ points of presence worldwide with single-digit millisecond latency.

Origin Access Control (OAC): Secure S3 access so objects are only accessible via CloudFront
Cache behaviors: Different TTLs, headers, and compression settings per path pattern
Lambda@Edge and CloudFront Functions: Run code at edge locations for URL rewrites, A/B testing, auth
Real-time logs: Stream access logs to Kinesis for real-time analytics
Price classes: Restrict edge locations to reduce cost (PriceClass_100: North America, Europe, and Israel only)
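CloudFront Functions suit lightweight request rewrites. A common one for static sites served from S3 through OAC maps directory-style URLs onto `index.html` objects, since the S3 REST endpoint does not resolve index documents below the root. It is sketched in TypeScript for consistency with the rest of this guide; CloudFront Functions run JavaScript with an entry point named `handler`, so strip the type annotations before deploying. The minimal event types are an assumption covering only the fields used.

```typescript
// Minimal slice of the CloudFront Functions viewer-request event used here (an assumption
// for typing; the real event also carries headers, querystring, cookies, and more).
interface CfRequest {
  uri: string;
}
interface CfEvent {
  request: CfRequest;
}

// Map "pretty" URLs onto index.html objects in the S3 origin:
//   /docs/ -> /docs/index.html, /docs -> /docs/index.html, /app.js is left alone.
function handler(event: CfEvent): CfRequest {
  const request = event.request;
  const uri = request.uri;
  const lastSegment = uri.substring(uri.lastIndexOf("/") + 1);
  if (uri.endsWith("/")) {
    request.uri = uri + "index.html";
  } else if (!lastSegment.includes(".")) {
    request.uri = uri + "/index.html";
  }
  // Returning the (possibly modified) request sends it on to the cache and origin.
  return request;
}
```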

Production Pattern: S3 + CloudFront + OAC

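# CloudFormation: S3 bucket with CloudFront distribution (OAC)
Parameters:
  SSLCert:
    Type: String
    Description: ACM certificate ARN (CloudFront requires the certificate in us-east-1)
  AssetsDomain:
    Type: String

Resources:
  AssetsBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: app-static-assets
      PublicAccessBlockConfiguration:
        BlockPublicAcls: true
        BlockPublicPolicy: true
        IgnorePublicAcls: true
        RestrictPublicBuckets: true
      LifecycleConfiguration:
        Rules:
          - Id: TransitionToIA
            Status: Enabled
            Transitions:
              - StorageClass: STANDARD_IA
                TransitionInDays: 90

  OAC:
    Type: AWS::CloudFront::OriginAccessControl
    Properties:
      OriginAccessControlConfig:
        Name: app-static-assets-oac
        OriginAccessControlOriginType: s3
        SigningBehavior: always
        SigningProtocol: sigv4

  AssetsBucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref AssetsBucket
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: cloudfront.amazonaws.com
            Action: s3:GetObject
            Resource: !Sub '${AssetsBucket.Arn}/*'
            Condition:
              StringEquals:
                AWS:SourceArn: !Sub 'arn:aws:cloudfront::${AWS::AccountId}:distribution/${CDN}'

  CDN:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Enabled: true
        Aliases:
          - !Ref AssetsDomain
        Origins:
          - Id: S3Origin
            DomainName: !GetAtt AssetsBucket.RegionalDomainName
            OriginAccessControlId: !Ref OAC
            S3OriginConfig:
              OriginAccessIdentity: ''   # must be empty when using OAC
        DefaultCacheBehavior:
          TargetOriginId: S3Origin
          ViewerProtocolPolicy: redirect-to-https
          CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6  # CachingOptimized
          Compress: true
        PriceClass: PriceClass_100
        ViewerCertificate:
          AcmCertificateArn: !Ref SSLCert
          SslSupportMethod: sni-only
          MinimumProtocolVersion: TLSv1.2_2021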

4. SES: Email Service at Scale

Amazon SES (Simple Email Service) handles transactional and marketing email at $0.10 per 1,000 emails. It provides high deliverability when configured correctly with authentication records and reputation management.

Authentication: SPF (Sender Policy Framework), DKIM (DomainKeys Identified Mail), DMARC alignment
Sending modes: SMTP interface (port 587 with STARTTLS), AWS SDK (SendEmail in the SESv2 API; SendEmail/SendRawEmail in the v1 API), and bulk sending with SendBulkEmail (v2) or SendBulkTemplatedEmail (v1)
Configuration sets: Track delivery, bounces, complaints, opens, and clicks. Route events to SNS, Kinesis, or CloudWatch
Suppression list: Automatically stops sending to addresses that bounced or complained. Reduces bounce rate
Dedicated IPs: Isolate your sending reputation from other SES users. Worth considering for consistently high-volume senders; shared IPs are usually fine at lower volumes
Templates: Store email templates in SES. Use Handlebars-style placeholders for personalization

// Node.js: Send transactional email via SES SDK v3
import { SESv2Client, SendEmailCommand } from '@aws-sdk/client-sesv2';

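const ses = new SESv2Client({ region: 'us-east-1' });
const userEmail = 'member@example.com'; // placeholder recipient; must be verified while in sandbox mode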

await ses.send(new SendEmailCommand({
  FromEmailAddress: 'noreply@example.app',
  Destination: { ToAddresses: [userEmail] },
  Content: {
    Template: {
      TemplateName: 'BookingConfirmation',
      TemplateData: JSON.stringify({
        userName: 'Jose',
        className: 'CrossFit 7AM',
        date: '2026-03-17',
        location: 'Santiago Centro'
      })
    }
  },
  ConfigurationSetName: 'app-transactional'
}));

New SES accounts start in sandbox mode (can only send to verified addresses). Request production access early -- approval can take 24-48 hours. Always set up bounce/complaint handling before going live.
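Bounce and complaint notifications arrive as JSON, delivered through SNS or a configuration set's event destination. The sketch pulls out the addresses an application should stop mailing, assuming the field names SES uses in those notifications (`notificationType` or `eventType`, `bounce.bounceType`, `bouncedRecipients`, `complainedRecipients`); only the fields needed are typed, and `addressesToSuppress` is a hypothetical helper. It complements, rather than replaces, the account-level suppression list.

```typescript
// Subset of the SES notification JSON found in the SNS message body.
interface Recipient {
  emailAddress: string;
}
interface SesNotification {
  notificationType?: string; // identity feedback notifications: "Bounce", "Complaint", "Delivery"
  eventType?: string; // configuration-set event publishing uses this key instead
  bounce?: { bounceType: string; bouncedRecipients: Recipient[] };
  complaint?: { complainedRecipients: Recipient[] };
}

// Returns the addresses to stop mailing. Only permanent (hard) bounces and complaints
// are suppressed; transient (soft) bounces are worth retrying later.
function addressesToSuppress(messageBody: string): string[] {
  const n = JSON.parse(messageBody) as SesNotification;
  const kind = n.notificationType ?? n.eventType;
  const bounce = n.bounce;
  const complaint = n.complaint;
  if (kind === "Bounce" && bounce !== undefined && bounce.bounceType === "Permanent") {
    return bounce.bouncedRecipients.map((r) => r.emailAddress);
  }
  if (kind === "Complaint" && complaint !== undefined) {
    return complaint.complainedRecipients.map((r) => r.emailAddress);
  }
  return [];
}
```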

5. VPC Networking: Subnets, Security Groups, NACLs

VPC Architecture

A VPC (Virtual Private Cloud) is your isolated network within AWS. Every production workload runs inside a VPC. Proper network design is the foundation of AWS security -- it determines what can communicate with what.

CIDR planning: Use /16 for production VPCs (65,536 IPs). Avoid overlapping CIDRs if you need VPC peering or Transit Gateway
Public subnets: Contain ALBs and NAT Gateways. Route table points 0.0.0.0/0 to Internet Gateway
Private subnets: Contain EC2 instances, ECS tasks, RDS databases, ElastiCache. Route table points 0.0.0.0/0 to NAT Gateway for outbound internet
Multi-AZ: Deploy subnets across at least 2 Availability Zones for high availability. 3 AZs for production workloads
VPC Endpoints: Gateway endpoints (S3, DynamoDB) are free. Interface endpoints (most other services) cost ~$0.01/hour per AZ plus data processing charges. Both remove NAT Gateway data processing fees for traffic to the services they cover
VPC Flow Logs: Capture IP traffic metadata for all network interfaces. Send to CloudWatch Logs or S3 for security analysis
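CIDR planning is mostly arithmetic: splitting a /16 with 8 extra bits yields 256 /24 subnets, and AWS reserves 5 addresses in every subnet. The helper below reproduces the idea behind Terraform's `cidrsubnet` for IPv4 so a subnet layout can be checked before provisioning; `cidrSubnet` and `usableIps` are hypothetical names, and only IPv4 is handled.

```typescript
// Convert dotted-quad IPv4 to an unsigned integer and back. Plain arithmetic is used
// instead of bitwise operators, which work on signed 32-bit values in JavaScript.
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function intToIp(n: number): string {
  return [24, 16, 8, 0].map((shift) => Math.floor(n / 2 ** shift) % 256).join(".");
}

// Same idea as Terraform's cidrsubnet(prefix, newbits, netnum): extend the prefix by
// newBits and return the netNum-th subnet of that size inside the parent block.
function cidrSubnet(parent: string, newBits: number, netNum: number): string {
  const [base = "", lenText = ""] = parent.split("/");
  const prefixLen = Number(lenText);
  const newLen = prefixLen + newBits;
  if (newLen > 32 || netNum < 0 || netNum >= 2 ** newBits) {
    throw new Error(`cannot carve subnet ${netNum} of /${newLen} from ${parent}`);
  }
  const parentSize = 2 ** (32 - prefixLen);
  const network = Math.floor(ipToInt(base) / parentSize) * parentSize; // align to parent
  return `${intToIp(network + netNum * 2 ** (32 - newLen))}/${newLen}`;
}

// AWS reserves 5 addresses per subnet: network, VPC router, DNS, future use, broadcast.
function usableIps(cidr: string): number {
  return 2 ** (32 - Number(cidr.split("/")[1])) - 5;
}

console.log(cidrSubnet("10.0.0.0/16", 8, 10), usableIps("10.0.10.0/24")); // 10.0.10.0/24 251
```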

Security Groups

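Stateful: Return traffic for an allowed connection is automatically permitted, so responses need no matching rule. Outbound rules still exist (a new security group allows all outbound traffic by default)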
Reference other security groups as sources instead of CIDRs for internal traffic
ALB SG: Allow 443 from 0.0.0.0/0. App SG: Allow 3000 only from ALB SG. DB SG: Allow 3306 only from App SG
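Default deny: A new security group has no inbound rules, so all inbound traffic is denied until you add allow rules (security groups support allow rules only). Explicitly allow only what is needed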
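Referencing security groups instead of CIDRs means a rule admits any sender that has the referenced group attached, regardless of its IP address. A simplified model of the ALB -> App -> DB chain described above, with hypothetical group IDs, ingress rules only, and CIDR matching reduced to the `0.0.0.0/0` case:

```typescript
// Simplified security group model: ingress only, CIDR matching limited to 0.0.0.0/0.
interface IngressRule {
  port: number;
  fromCidr?: string; // e.g. "0.0.0.0/0"
  fromGroup?: string; // ID of a source security group
}

interface SecurityGroup {
  id: string;
  ingress: IngressRule[];
}

// Hypothetical group IDs for the ALB -> App -> DB chain.
const albSg: SecurityGroup = { id: "sg-alb", ingress: [{ port: 443, fromCidr: "0.0.0.0/0" }] };
const appSg: SecurityGroup = { id: "sg-app", ingress: [{ port: 3000, fromGroup: "sg-alb" }] };
const dbSg: SecurityGroup = { id: "sg-db", ingress: [{ port: 3306, fromGroup: "sg-app" }] };

// Allowed only if some rule matches (allow-only semantics). A group-referencing rule
// matches any sender that has that group attached.
function isAllowed(target: SecurityGroup, port: number, senderGroups: string[]): boolean {
  return target.ingress.some(
    (rule) =>
      rule.port === port &&
      (rule.fromCidr === "0.0.0.0/0" ||
        (rule.fromGroup !== undefined && senderGroups.includes(rule.fromGroup))),
  );
}

console.log(isAllowed(appSg, 3000, [albSg.id])); // true: ALB -> app
console.log(isAllowed(dbSg, 3306, [albSg.id])); // false: the ALB cannot reach the database
```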

Network ACLs

Stateless: You must define both inbound and outbound rules, including ephemeral port ranges
Processed in order by rule number. First matching rule wins. Explicit deny is possible
Use as defense-in-depth: Block known malicious CIDRs at the subnet level before traffic reaches instances
Default NACL allows all traffic. Custom NACLs deny all by default
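NACL evaluation can be modelled in a few lines: sort by rule number, return the first rule that matches, and fall through to the implicit deny. A sketch assuming example inbound rules and ignoring the protocol field; the helper names are hypothetical. Because NACLs are stateless, the subnet's outbound rules would also have to allow the ephemeral ports (commonly 1024-65535) that responses use.

```typescript
interface NaclRule {
  ruleNumber: number;
  action: "allow" | "deny";
  cidr: string;
  portFrom: number;
  portTo: number;
}

function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

function ipInCidr(ip: string, cidr: string): boolean {
  const [base = "", lenText = ""] = cidr.split("/");
  const blockSize = 2 ** (32 - Number(lenText));
  // Same block if both addresses share the network prefix.
  return Math.floor(ipToInt(ip) / blockSize) === Math.floor(ipToInt(base) / blockSize);
}

// Rules are evaluated from the lowest rule number up; the first match decides.
function evaluateNacl(rules: NaclRule[], srcIp: string, port: number): "allow" | "deny" {
  const ordered = [...rules].sort((a, b) => a.ruleNumber - b.ruleNumber);
  for (const rule of ordered) {
    if (port >= rule.portFrom && port <= rule.portTo && ipInCidr(srcIp, rule.cidr)) {
      return rule.action;
    }
  }
  return "deny"; // the catch-all "*" rule every NACL ends with
}

// Example inbound rules: block a known-bad range, then allow HTTPS from anywhere.
const inbound: NaclRule[] = [
  { ruleNumber: 100, action: "allow", cidr: "0.0.0.0/0", portFrom: 443, portTo: 443 },
  { ruleNumber: 90, action: "deny", cidr: "203.0.113.0/24", portFrom: 0, portTo: 65535 },
];

console.log(evaluateNacl(inbound, "203.0.113.9", 443)); // deny: rule 90 is checked before rule 100
```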

Production VPC Layout

# Terraform: Multi-AZ VPC with public and private subnets
data "aws_availability_zones" "az" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = { Name = "myapp-prod" }
}

resource "aws_subnet" "public" {
  count                   = 3
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)       # 10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24
  availability_zone       = data.aws_availability_zones.az.names[count.index]
  map_public_ip_on_launch = true
  tags = { Name = "public-${count.index}" }
}

resource "aws_subnet" "private" {
  count             = 3
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index + 10)  # 10.0.10.0/24, 10.0.11.0/24, 10.0.12.0/24
  availability_zone = data.aws_availability_zones.az.names[count.index]
  tags = { Name = "private-${count.index}" }
}

# Security group chain: ALB -> App -> DB
# HCL does not allow semicolon-separated arguments, so each rule is a multi-line block.
# Terraform removes AWS's default allow-all egress rule, so egress is declared explicitly.
resource "aws_security_group" "alb" {
  vpc_id = aws_vpc.main.id
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  egress {
    from_port   = 3000
    to_port     = 3000
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block] # to app tasks; referencing the app SG here would create a cycle
  }
}

resource "aws_security_group" "app" {
  vpc_id = aws_vpc.main.id
  ingress {
    from_port       = 3000
    to_port         = 3000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"] # database, AWS APIs, and the internet via NAT
  }
}

resource "aws_security_group" "db" {
  vpc_id = aws_vpc.main.id
  # No egress rules needed: replies to allowed inbound connections are permitted (stateful)
  ingress {
    from_port       = 3306
    to_port         = 3306
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id]
  }
}

6. IAM: Roles, Policies, and MFA

Principle of Least Privilege

IAM (Identity and Access Management) controls who can do what in your AWS account. Every API call is evaluated against IAM policies. A misconfigured policy is among the most common causes of AWS security breaches.

Never use root account for daily operations. Enable MFA on root and all IAM users
Use IAM roles (not long-lived access keys) for EC2 instances, ECS tasks, and Lambda functions
Policy types: Identity-based (attached to users/roles), Resource-based (attached to S3/SQS/etc.), Permission boundaries
Use AWS Organizations with SCPs (Service Control Policies) to restrict what member accounts can do
IAM Access Analyzer: Identifies resources shared with external accounts. Run continuously
CloudTrail: Log every API call across all AWS services. Enable in all regions. Send to centralized S3 bucket

// Least-privilege policy for an ECS task that reads from S3 and writes to SQS
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::app-static-assets/*"
    },
    {
      "Effect": "Allow",
      "Action": ["sqs:SendMessage"],
      "Resource": "arn:aws:sqs:us-east-1:123456789012:notification-queue"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter",
        "ssm:GetParameters"
      ],
      "Resource": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/*"
    }
  ]
}
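IAM's core decision logic is: an explicit Deny anywhere wins, otherwise any matching Allow grants access, otherwise the request is implicitly denied. The sketch evaluates a single identity-based policy that way, with `*`/`?` wildcard matching. It ignores conditions, SCPs, permission boundaries, session policies, and resource-based policies, and the helper names are hypothetical; for real checks, the IAM policy simulator and IAM Access Analyzer policy validation are the tools to use.

```typescript
interface Statement {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource: string | string[];
}
interface PolicyDocument {
  Version: string;
  Statement: Statement[];
}

// IAM wildcards: "*" matches any sequence of characters and "?" matches exactly one.
function wildcardMatch(pattern: string, value: string, ignoreCase: boolean): boolean {
  const source = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except * and ?
    .replace(/\*/g, ".*")
    .replace(/\?/g, ".");
  return new RegExp(`^${source}$`, ignoreCase ? "i" : "").test(value);
}

function matchesAny(patterns: string | string[], value: string, ignoreCase: boolean): boolean {
  const list = Array.isArray(patterns) ? patterns : [patterns];
  return list.some((p) => wildcardMatch(p, value, ignoreCase));
}

// Explicit Deny wins, then any Allow, otherwise implicit deny.
function evaluate(
  policy: PolicyDocument,
  action: string,
  resource: string,
): "Allow" | "ExplicitDeny" | "ImplicitDeny" {
  // Action names are case-insensitive in IAM; resource ARNs are matched case-sensitively here.
  const matching = policy.Statement.filter(
    (s) => matchesAny(s.Action, action, true) && matchesAny(s.Resource, resource, false),
  );
  if (matching.some((s) => s.Effect === "Deny")) return "ExplicitDeny";
  if (matching.some((s) => s.Effect === "Allow")) return "Allow";
  return "ImplicitDeny";
}

// The S3 and SQS statements from the task policy above (placeholder account ID).
const taskPolicy: PolicyDocument = {
  Version: "2012-10-17",
  Statement: [
    { Effect: "Allow", Action: ["s3:GetObject"], Resource: "arn:aws:s3:::app-static-assets/*" },
    { Effect: "Allow", Action: ["sqs:SendMessage"], Resource: "arn:aws:sqs:us-east-1:123456789012:notification-queue" },
  ],
};

console.log(evaluate(taskPolicy, "s3:GetObject", "arn:aws:s3:::app-static-assets/logo.png")); // Allow
console.log(evaluate(taskPolicy, "s3:PutObject", "arn:aws:s3:::app-static-assets/logo.png")); // ImplicitDeny
```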

Additional Security Services

AWS WAF: Protect ALBs and CloudFront from SQL injection, XSS, and rate-limit abusive IPs
Secrets Manager: Rotate database passwords automatically. Never hardcode secrets in code, container images, or task definitions; inject them at runtime from Secrets Manager
GuardDuty: ML-based threat detection. Monitors CloudTrail, VPC Flow Logs, and DNS logs
AWS Config: Track resource configuration history and evaluate compliance rules continuously
Security Hub: Aggregates findings from GuardDuty, Inspector, Macie, and third-party tools into a single dashboard

7. Cost Optimization: Reserved, Spot, Savings Plans

AWS costs can spiral quickly without active management. These strategies often cut spending by 30-60% on production workloads.

Right-sizing: Use AWS Compute Optimizer to analyze CPU/memory utilization and downsize over-provisioned instances. Over-provisioning is common, so this is often the quickest win
Reserved Instances / Savings Plans: Commit to 1-year or 3-year usage for 30-60% discount. Compute Savings Plans are the most flexible (apply across EC2, Fargate, Lambda)
Spot Instances: Use for stateless workloads, batch processing, CI/CD runners. Combine with On-Demand via mixed instance policies in ASGs
S3 Lifecycle Policies: Move infrequently accessed data to Standard-IA after 30 days, Glacier after 90 days. Archive old logs to Deep Archive
NAT Gateway costs: NAT Gateways charge $0.045/GB of data processed. Use VPC endpoints for S3, DynamoDB, and other AWS services to avoid NAT charges
Turn off dev/staging: Schedule non-production environments to shut down outside business hours. Use Instance Scheduler or Lambda-based automation
Data transfer: Keep traffic within the same AZ when possible. Use CloudFront to reduce origin data transfer costs. Avoid cross-region replication unless required for DR
Cost monitoring: Set up AWS Budgets with alerts at 50%, 80%, and 100% thresholds. Use Cost Explorer's daily granularity to catch anomalies early

The single highest-impact action for most teams: buy Compute Savings Plans for your baseline steady-state usage, and run everything above baseline on Spot or On-Demand.
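The baseline-commitment advice is easier to see with numbers. The model below assumes a single average Savings Plan discount (40% here, purely illustrative; real rates vary by service, instance family, Region, and term) and made-up hourly spend; the helper names are hypothetical. The commitment is paid every hour whether it is used or not, which is why committing above the steady-state floor can cost more than it saves.

```typescript
// Illustrative Savings Plan model. commitPerHour is the $/hour commitment (billed every
// hour, used or not); discount is an assumed average rate versus On-Demand.
function hourlyCost(onDemandSpend: number, commitPerHour: number, discount: number): number {
  // The commitment buys usage worth commit / (1 - discount) at On-Demand prices;
  // anything above that is billed at On-Demand rates.
  const coveredOnDemandValue = commitPerHour / (1 - discount);
  return commitPerHour + Math.max(0, onDemandSpend - coveredOnDemandValue);
}

function totalCost(hourlySpend: number[], commitPerHour: number, discount: number): number {
  return hourlySpend.reduce((sum, s) => sum + hourlyCost(s, commitPerHour, discount), 0);
}

// Commit to the steady-state floor: the lowest hourly On-Demand spend, in discounted dollars.
function baselineCommitment(hourlySpend: number[], discount: number): number {
  return Math.min(...hourlySpend) * (1 - discount);
}

// Made-up hourly On-Demand-equivalent spend ($/hour) with a floor of $10/hour.
const spend = [10, 10, 20, 30];
const commit = baselineCommitment(spend, 0.4); // $6/hour covers $10/hour of On-Demand usage
console.log(totalCost(spend, 0, 0.4)); // everything On-Demand
console.log(totalCost(spend, commit, 0.4)); // baseline on the plan, peaks On-Demand
console.log(totalCost(spend, 12, 0.4)); // over-committing pays for unused capacity
```

With these numbers the three totals are $70, $54, and $58: the baseline commitment saves the most, while doubling it gives back part of the saving during the quiet hours.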

8. Infrastructure as Code: CloudFormation & Terraform

CloudFormation

AWS-native IaC. YAML/JSON templates. Tight integration with every AWS service
Stacks and nested stacks for modular infrastructure
Change sets: Preview changes before applying. Drift detection to find manual changes
Stack policies: Prevent accidental deletion of critical resources (RDS, S3)
Rollback on failure: Automatically reverts to previous state if deployment fails

Terraform

Multi-cloud IaC by HashiCorp. HCL language. Provider ecosystem for AWS, GCP, Azure, Cloudflare, etc.
State management: Remote state in S3 with state locking to prevent concurrent modifications (native S3 locking via use_lockfile in Terraform 1.10+, or the older DynamoDB lock table)
Modules: Reusable infrastructure components. Terraform Registry has thousands of community modules
Plan/Apply workflow: Always review terraform plan before applying. CI/CD integration with plan as PR comment
Import: Bring existing resources under Terraform management without recreation

Terraform Example: Aurora Serverless v2 Cluster

resource "aws_rds_cluster" "aurora" {
  cluster_identifier     = "myapp-prod"
  engine                 = "aurora-mysql"
  engine_version         = "8.0.mysql_aurora.3.04.0"
  database_name          = "myapp"
  master_username        = "admin"
  master_password        = var.db_password
  db_subnet_group_name   = aws_db_subnet_group.private.name
  vpc_security_group_ids = [aws_security_group.aurora.id]
  backup_retention_period = 14
  preferred_backup_window = "03:00-04:00"
  deletion_protection     = true
  storage_encrypted       = true
  kms_key_id             = aws_kms_key.rds.arn

  serverlessv2_scaling_configuration {
    min_capacity = 0.5
    max_capacity = 16
  }
}

resource "aws_rds_cluster_instance" "writer" {
  identifier         = "myapp-prod-writer"
  cluster_identifier = aws_rds_cluster.aurora.id
  instance_class     = "db.serverless"
  engine             = aws_rds_cluster.aurora.engine
  engine_version     = aws_rds_cluster.aurora.engine_version
}

resource "aws_rds_cluster_instance" "reader" {
  identifier         = "myapp-prod-reader"
  cluster_identifier = aws_rds_cluster.aurora.id
  instance_class     = "db.serverless"
  engine             = aws_rds_cluster.aurora.engine
  engine_version     = aws_rds_cluster.aurora.engine_version
}

9. CloudWatch Monitoring and Observability

CloudWatch is the central monitoring and observability service for all AWS resources. It collects metrics, logs, and traces. Without proper CloudWatch configuration, production incidents go undetected until users report them.

Metrics: Every AWS service emits default metrics (CPU, network, errors). Custom metrics for application-level data (queue depth, active users, response times)
Alarms: Trigger notifications or Auto Scaling actions when metrics cross thresholds. Use composite alarms to combine multiple conditions
Logs: Centralize logs from ECS tasks, Lambda functions, EC2 instances, and API Gateway. Use Logs Insights to query with SQL-like syntax
Dashboards: Build real-time operational dashboards. Cross-account and cross-region dashboards for multi-account setups
Container Insights: Automatic metrics for ECS and EKS clusters: CPU, memory, network, and disk per task/pod
Anomaly Detection: ML-based anomaly detection on metrics. Automatically adjusts to seasonal patterns. Reduces alert noise
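Metric filters: Extract numeric values or match patterns in CloudWatch Logs data and publish them as metrics. Example: count ERROR lines per minute in an ECS service's application logs (ALB access logs are delivered to S3, and ALBs already publish 5xx counts as metrics)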

# CloudWatch alarm for high API error rate
resource "aws_cloudwatch_metric_alarm" "api_5xx" {
  alarm_name          = "app-api-5xx-rate"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 300
  statistic           = "Sum"
  threshold           = 50
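  treat_missing_data  = "notBreaching"  # ALB publishes no datapoint when there are no 5xx responses
  alarm_description   = "API 5xx errors exceeded 50 per 5 minutes for 2 consecutive periods"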
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    TargetGroup  = aws_lb_target_group.api.arn_suffix
    LoadBalancer = aws_lb.main.arn_suffix
  }
}

Set up CloudWatch alarms on day one, not after the first incident. Key alarms: CPU > 80%, memory > 85%, 5xx error rate > 1%, RDS connections > 80% of max, queue depth growing for more than 10 minutes.
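CloudWatch alarms evaluate the most recent datapoints rather than a single sample: with `evaluation_periods = 2`, the alarm above needs two consecutive breaching 5-minute periods. The sketch models that "M out of N" logic plus the percentage behind an error-rate alarm. Missing data is treated as not breaching, the INSUFFICIENT_DATA state is omitted, and the helper names are hypothetical.

```typescript
// Simplified "M out of N" evaluation: ALARM when at least datapointsToAlarm of the last
// evaluationPeriods datapoints breach. null means no data, treated as not breaching.
type Datapoint = number | null;

function alarmState(
  points: Datapoint[],
  threshold: number,
  evaluationPeriods: number,
  datapointsToAlarm: number = evaluationPeriods,
): "ALARM" | "OK" {
  const recent = points.slice(-evaluationPeriods);
  // GreaterThanThreshold comparison, as in the Terraform alarm above.
  const breaching = recent.filter((p) => p !== null && p > threshold).length;
  return breaching >= datapointsToAlarm ? "ALARM" : "OK";
}

// Error rate as a percentage, the kind of value a metric math expression would alarm on.
function errorRatePercent(errors5xx: number, requests: number): number {
  return requests === 0 ? 0 : (errors5xx / requests) * 100;
}

// Sums of 5xx per 5-minute period; the last two both exceed 50, so the alarm fires.
console.log(alarmState([12, 60, 75], 50, 2)); // ALARM
console.log(errorRatePercent(30, 2000)); // 1.5
```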

10. Route 53: DNS and Traffic Management

Route 53 is AWS's DNS service, backed by a 100% availability SLA. It handles domain registration, DNS resolution, and health-check-based routing. Supports public and private hosted zones.

Routing policies: Simple, weighted (A/B testing), latency-based (route to closest region), failover (active-passive DR), geolocation, multi-value answer
Alias records: AWS-specific record type that maps directly to ALBs, CloudFront, S3 website endpoints, and other AWS resources. No charge for alias queries to AWS resources
Health checks: Monitor endpoint availability from multiple global locations. Failover to standby when primary is unhealthy. Integrates with CloudWatch alarms
Private hosted zones: DNS resolution within VPCs. Internal service names (api.internal.example.com) that resolve to private IPs
DNSSEC: Sign public hosted zones to protect against DNS spoofing. You supply a customer managed asymmetric KMS key (in us-east-1) for the key-signing key; Route 53 manages the zone-signing key
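Weighted routing returns each record in proportion to its weight relative to the other eligible records, which is how a 90/10 canary split works. A simplified selector with an injected random number so it can be tested; it skips unhealthy and zero-weight records and does not model Route 53's special cases (every record weighted 0, or every record unhealthy). `pickWeighted` and the record identifiers are hypothetical.

```typescript
// Simplified weighted selection. rand is injected (0 <= rand < 1) for testability.
interface WeightedRecord {
  setIdentifier: string;
  weight: number;
  healthy: boolean;
}

function pickWeighted(records: WeightedRecord[], rand: number): string | null {
  const eligible = records.filter((r) => r.healthy && r.weight > 0);
  const total = eligible.reduce((sum, r) => sum + r.weight, 0);
  if (total === 0) return null;
  let point = rand * total; // a position along the cumulative weight line
  for (const r of eligible) {
    if (point < r.weight) return r.setIdentifier;
    point -= r.weight;
  }
  // Only reachable through floating-point rounding when rand is very close to 1.
  return eligible[eligible.length - 1]?.setIdentifier ?? null;
}

// A 90/10 canary split between two load balancers (hypothetical identifiers).
const records: WeightedRecord[] = [
  { setIdentifier: "blue", weight: 90, healthy: true },
  { setIdentifier: "green", weight: 10, healthy: true },
];
console.log(pickWeighted(records, 0.5)); // blue
console.log(pickWeighted(records, 0.95)); // green
```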

# Route 53: Latency-based routing with health checks
resource "aws_route53_record" "api" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  alias {
    name                   = aws_lb.main.dns_name
    zone_id                = aws_lb.main.zone_id
    evaluate_target_health = true
  }

  set_identifier = "us-east-1"
  latency_routing_policy {
    region = "us-east-1"
  }
}

11. Secrets Manager

Secrets Manager stores and rotates database credentials, API keys, and tokens. It eliminates hardcoded secrets and provides automatic rotation with zero-downtime credential updates.

Automatic rotation: Managed or Lambda-based rotation for RDS, Redshift, and DocumentDB credentials, and custom rotation functions for any other secret type. Rotation schedules are configurable (for example every 30, 60, or 90 days, or a cron expression)
Cross-account access: Share secrets across AWS accounts using resource-based policies. Useful for shared services in multi-account architectures
ECS/Lambda integration: ECS task definitions can reference secrets directly and inject them as environment variables when the container starts. Lambda functions fetch them at runtime through the SDK or the AWS Parameters and Secrets Lambda Extension
Versioning: Staging labels (AWSCURRENT, AWSPREVIOUS, and AWSPENDING during rotation) track versions. Applications usually fetch AWSCURRENT and can request a specific version or label when needed
Pricing: $0.40/secret/month + $0.05/10,000 API calls. Far cheaper than a credential leak incident

// Fetch secret from Secrets Manager in Node.js
import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager';

const client = new SecretsManagerClient({ region: 'us-east-1' });

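async function getDbCredentials() {
  const response = await client.send(
    new GetSecretValueCommand({ SecretId: 'myapp/prod/aurora-credentials' })
  );
  // SecretString is absent for binary secrets; fail loudly instead of parsing undefined
  if (!response.SecretString) {
    throw new Error('Secret myapp/prod/aurora-credentials has no SecretString');
  }
  // e.g. { username: "admin", password: "rotated-password-xyz", host: "...", port: 3306 }
  return JSON.parse(response.SecretString);
}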
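Calling GetSecretValue on every request adds latency and API cost. A small TTL cache around the fetch keeps calls low and still picks up a rotated value once the TTL expires, and `invalidate()` lets callers force a refresh after an authentication failure. `TtlCache` is a hypothetical helper that works with any async fetcher, for example the `getDbCredentials` function above; inside Lambda, the AWS Parameters and Secrets Lambda Extension provides similar caching.

```typescript
// Hypothetical TTL cache around any async fetcher (e.g. one that calls GetSecretValue).
class TtlCache<T> {
  private value: T | undefined;
  private fetchedAt = 0;
  private inFlight: Promise<T> | undefined;

  constructor(
    private readonly fetcher: () => Promise<T>,
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  get(): Promise<T> {
    const cached = this.value;
    if (cached !== undefined && this.now() - this.fetchedAt < this.ttlMs) {
      return Promise.resolve(cached);
    }
    // Concurrent callers share one in-flight request instead of each calling the API.
    if (this.inFlight === undefined) {
      this.inFlight = this.fetcher().then(
        (v) => {
          this.value = v;
          this.fetchedAt = this.now();
          this.inFlight = undefined;
          return v;
        },
        (err: unknown) => {
          this.inFlight = undefined;
          throw err;
        },
      );
    }
    return this.inFlight;
  }

  // Call after an authentication failure so the next get() fetches the rotated secret.
  invalidate(): void {
    this.value = undefined;
  }
}
```

A five-minute TTL, for example, bounds how long a rotated password can go unnoticed while cutting API calls to a handful per hour per process.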

Never hardcode secrets in source code or task definitions, and never keep them in plain String parameters in SSM Parameter Store. Use Secrets Manager for all credentials, inject them at runtime, enable automatic rotation, and audit access via CloudTrail.

12. EventBridge: Event-Driven Architecture

EventBridge is a serverless event bus that connects applications using events. It decouples producers from consumers, enabling scalable event-driven architectures. It replaces CloudWatch Events with richer filtering and more targets.

Event buses: Default bus receives AWS service events. Custom buses for application events. Partner buses for SaaS integrations (Datadog, PagerDuty, Stripe)
Rules and patterns: Filter events using content-based matching. Match on event source, detail-type, and any field in the event payload using prefix, suffix, numeric, and exists patterns
Targets: Route matched events to Lambda, SQS, SNS, Step Functions, API Gateway, Kinesis, ECS tasks, and 20+ other targets
Scheduler: EventBridge Scheduler for cron and rate-based schedules. Replaces CloudWatch Events for scheduled tasks. One-time schedules for deferred actions
Archive and replay: Archive events for replay during debugging or recovery. Filter by date range and event pattern
Schema registry: Automatically discovers event schemas from your bus. Generates code bindings for TypeScript, Python, Java
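Content-based filtering is the heart of EventBridge rules. The sketch implements a subset of the pattern syntax (exact values, `prefix`, `suffix`, `exists`, `numeric`) to show how a rule is evaluated against an event. It is not the service's implementation: real patterns support more operators (for example `anything-but`, `wildcard`, and `$or`), and the TestEventPattern API can confirm how EventBridge itself evaluates a pattern. The helper names and the event fields beyond those in the booking example are illustrative.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Subset of EventBridge pattern syntax: exact values, prefix, suffix, exists, numeric.
type Matcher =
  | string
  | number
  | boolean
  | null
  | { prefix: string }
  | { suffix: string }
  | { exists: boolean }
  | { numeric: (string | number)[] };

interface Pattern {
  [key: string]: Pattern | Matcher[];
}

function matchOne(m: Matcher, value: Json | undefined): boolean {
  if (m === null || typeof m !== "object") return value === m; // exact value
  if ("exists" in m) return m.exists === (value !== undefined);
  if ("prefix" in m) return typeof value === "string" && value.startsWith(m.prefix);
  if ("suffix" in m) return typeof value === "string" && value.endsWith(m.suffix);
  // numeric: operator/value pairs such as [">", 0, "<=", 5]; every pair must hold.
  if (typeof value !== "number") return false;
  for (let i = 0; i + 1 < m.numeric.length; i += 2) {
    const op = m.numeric[i];
    const bound = Number(m.numeric[i + 1]);
    const ok =
      op === "<" ? value < bound :
      op === "<=" ? value <= bound :
      op === ">" ? value > bound :
      op === ">=" ? value >= bound :
      op === "=" ? value === bound : false;
    if (!ok) return false;
  }
  return true;
}

// A field matches if any matcher matches; array-valued fields match if any element does.
function matchField(matchers: Matcher[], value: Json | undefined): boolean {
  return matchers.some((m) => {
    const isExists = m !== null && typeof m === "object" && "exists" in m;
    if (Array.isArray(value) && !isExists) return value.some((v) => matchOne(m, v));
    return matchOne(m, value);
  });
}

// Every key in the pattern must match; nested objects recurse into the event.
function matches(pattern: Pattern, event: { [key: string]: Json }): boolean {
  return Object.entries(pattern).every(([key, expected]) => {
    const actual: Json | undefined = event[key];
    if (Array.isArray(expected)) return matchField(expected, actual);
    return (
      actual !== null && typeof actual === "object" && !Array.isArray(actual) && matches(expected, actual)
    );
  });
}

const bookingEvent: { [key: string]: Json } = {
  source: "myapp.bookings",
  "detail-type": "BookingConfirmed",
  detail: { bookingId: "bk-12345", locationId: "loc-santiago-centro", seats: 3, tags: ["crossfit", "am"] },
};

const rule: Pattern = {
  source: ["myapp.bookings"],
  "detail-type": ["BookingConfirmed"],
  detail: {
    locationId: [{ prefix: "loc-santiago" }],
    seats: [{ numeric: [">", 0, "<=", 5] }],
    coupon: [{ exists: false }],
  },
};

console.log(matches(rule, bookingEvent)); // true
```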

// EventBridge: Send custom application event
import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';

const eb = new EventBridgeClient({ region: 'us-east-1' });

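const result = await eb.send(new PutEventsCommand({
  Entries: [{
    Source: 'myapp.bookings',
    DetailType: 'BookingConfirmed',
    Detail: JSON.stringify({
      bookingId: 'bk-12345',
      userId: 'usr-67890',
      className: 'CrossFit 7AM',
      locationId: 'loc-santiago-centro',
      timestamp: new Date().toISOString()
    }),
    EventBusName: 'app-events'
  }]
}));

// PutEvents can succeed as a call while rejecting individual entries, so check per-entry results
if (result.FailedEntryCount) {
  const errors = (result.Entries ?? []).filter((e) => e.ErrorCode).map((e) => `${e.ErrorCode}: ${e.ErrorMessage}`);
  throw new Error(`EventBridge rejected ${result.FailedEntryCount} event(s): ${errors.join('; ')}`);
}
// Rule targets: Lambda (send confirmation email via SES),
// SQS (update analytics), Step Functions (trigger post-booking workflow)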

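EventBridge is the backbone of event-driven architectures on AWS. Prefer it over direct Lambda-to-Lambda calls or hard-wired SQS queues, especially when several consumers need the same event. It gives you content-based filtering, retries, dead-letter queues, and archive/replay built in (archive storage and replays are billed separately).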

13. Amazon Bedrock: Managed AI/ML Platform

Amazon Bedrock is AWS's fully managed service for building generative AI applications. It provides access to foundation models from Anthropic (Claude), Amazon (Titan), Meta (Llama), Mistral, Cohere, and others through a unified API. Bedrock handles infrastructure, scaling, and security so you focus on application logic rather than model hosting.

Claude models on Bedrock: Claude Fable 5 (GA on Bedrock June 9, 2026 -- the first Mythos-class model and the most intelligent Claude), Claude Opus 4.8 (May 28, 2026), Claude Opus 4.7 (April 2026), Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5, and Claude Opus 4.1 are all available. Bedrock's next-generation inference engine with dynamic scheduling and scaling logic improves availability for steady-state workloads. Supports both 200K and 1M context windows for processing extensive documents and codebases. The Bedrock Marketplace now hosts nearly 100 serverless foundation models from 10+ providers
Bedrock Agents and AgentCore Runtime: Build autonomous AI agents that can reason, plan, and execute multi-step tasks. Define action groups specifying APIs the agent can call, connect knowledge bases for domain-specific RAG, and orchestrate complex workflows without writing agent orchestration code. The AgentCore Runtime adds a first-class A2A Protocol contract so agents built in Strands, LangGraph, OpenAI Agents SDK, or Google ADK can interoperate with agents in other clouds out of the box
Knowledge Bases (RAG): Fully managed Retrieval Augmented Generation. Ingest documents from S3, chunk and embed them automatically, store in a managed vector database, and query with automatic context injection. Eliminates custom RAG pipeline development
Guardrails: Configurable safety policies for AI applications. Content and word filters, prompt attack detection, denied topic classification, PII redaction, and hallucination detection with Automated Reasoning checks. Blocks up to 88% of harmful content with 99% accuracy on correct response identification
Security and privacy: Prompts and completions are not shared with model providers and are not used to train models. VPC isolation via PrivateLink endpoints, IAM role-based access, encryption in transit and at rest. All API calls logged in CloudTrail for compliance auditing
Claude Code integration: Set CLAUDE_CODE_USE_BEDROCK=1 to route Claude Code traffic through Bedrock. With a Bedrock VPC interface endpoint, traffic stays on the AWS network; costs appear on your AWS bill, and IAM policies control who can use AI services

// Invoke Claude on Bedrock (AWS SDK v3)
import { BedrockRuntimeClient, InvokeModelCommand }
  from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

const response = await client.send(new InvokeModelCommand({
  modelId: "anthropic.claude-sonnet-4-6-20260217-v1:0",
  contentType: "application/json",
  body: JSON.stringify({
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 4096,
    messages: [{
      role: "user",
      content: "Analyze this architecture for security risks."
    }]
  })
}));

const result = JSON.parse(
  new TextDecoder().decode(response.body)
);
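The decoded body follows the Anthropic Messages format, so the reply text lives in `content` blocks and `stop_reason` says why generation stopped. A small helper for the parsed `result`, assuming only the fields shown are needed; `extractText` and the sample response are illustrative, not real model output.

```typescript
// Fields of an Anthropic Messages response used here; non-text blocks such as
// tool_use are skipped.
interface ContentBlock {
  type: string;
  text?: string;
}
interface ClaudeMessageResponse {
  content: ContentBlock[];
  stop_reason: string | null;
  usage: { input_tokens: number; output_tokens: number };
}

function extractText(result: ClaudeMessageResponse): { text: string; truncated: boolean; totalTokens: number } {
  const text = result.content
    .filter((block) => block.type === "text")
    .map((block) => block.text ?? "")
    .join("");
  return {
    text,
    // "max_tokens" means the reply was cut off; raise max_tokens or ask the model to continue.
    truncated: result.stop_reason === "max_tokens",
    // Token counts drive Bedrock's on-demand billing, so they are worth logging.
    totalTokens: result.usage.input_tokens + result.usage.output_tokens,
  };
}

// A trimmed-down sample response (illustrative only).
const sample: ClaudeMessageResponse = {
  content: [{ type: "text", text: "Two findings: the database security group is open to 0.0.0.0/0." }],
  stop_reason: "end_turn",
  usage: { input_tokens: 42, output_tokens: 310 },
};
console.log(extractText(sample).truncated); // false
```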

For AI consulting engagements, Bedrock is the recommended starting point for AWS-centric organizations. It provides enterprise-grade security, predictable pricing, and seamless integration with existing AWS infrastructure -- no need to manage GPU instances or model deployments.

14. Latest AWS Updates (June 2026)

AWS DevOps Agent (GA): Investigates incidents, reduces time to resolution, and prevents issues. Preview customers report up to 75% lower MTTR and 3-5x faster resolution. Integrates with CloudWatch, X-Ray, and EventBridge for automated root-cause analysis.

AWS Security Agent (GA): Continuous, context-aware penetration testing integrated into the development lifecycle. Teams report 50%+ faster testing and ~30% lower costs with significantly fewer false positives compared to traditional scanning tools.

Database Savings Plans: Now supports Amazon OpenSearch Service and Neptune Analytics -- save up to 35% on eligible serverless and provisioned instance usage with a one-year commitment.

Elastic Beanstalk AI Analysis: When environment health is degraded, Beanstalk can collect events, instance health, and logs and send them to Amazon Bedrock for analysis, providing step-by-step troubleshooting recommendations.

VPC Encryption Controls: Transitioned from free preview to paid feature starting March 1, 2026.

Lambda SnapStart for Python and .NET (GA): SnapStart now supports Python 3.12+ and .NET 8+ runtimes (in addition to Java 11+), reducing cold starts by up to 90% -- from 2 seconds to sub-200ms. Available in 23+ AWS Regions. Particularly impactful for Python functions loading heavy ML libraries (LangChain, NumPy, Pandas) or web frameworks (Flask, Django). Use runtime hooks to run code before snapshot capture and after resume for proper initialization handling.

Bedrock Model Garden Expansion: Amazon Bedrock now hosts nearly 100 serverless foundation models from 10+ providers. Claude Opus 4.7 launched April 17, 2026, powered by Bedrock's next-generation inference engine with dynamic scheduling. Claude Opus 4.8 followed on May 28, 2026, and Claude Fable 5 -- the first Mythos-class model -- reached GA on Bedrock on June 9, 2026. Bedrock added 18 fully managed open-weight models (Mistral Large 3, Google Gemma, MiniMax, Moonshot, NVIDIA, Qwen) and now supports reinforcement fine-tuning with OpenAI-compatible APIs for open-weight models.

Lambda Managed Instances (Preview): Run Lambda functions on customer-selected EC2 instance types including GPUs (p4d with NVIDIA A100, g5 with A10G). Enables compute-intensive workloads like ML inference and HPC directly within the Lambda programming model, bridging the gap between serverless simplicity and GPU access.

15. Real-World Experience

I architected and managed production AWS infrastructure supporting 26 microservices across multiple Latin American countries. Key components:

Aurora RDS MySQL: Primary database for all services. Writer + reader endpoint separation, automated backups with 14-day retention, Performance Insights for query optimization
S3 + CloudFront: Static asset delivery (images, documents, exports) via CloudFront CDN with OAC. Pre-signed URLs for secure file uploads from mobile apps
SES: Transactional email (booking confirmations, password resets, invoices) with DKIM/SPF authentication and dedicated IP for deliverability
Cost management: Implemented Compute Savings Plans, S3 lifecycle policies, and VPC endpoints that reduced monthly AWS spend by ~35%
Security: Private subnets for all databases, IAM roles for ECS tasks, Secrets Manager for credential rotation, CloudTrail for audit logging
