This document provides a comprehensive overview of the AWS architecture for EKB services infrastructure.
AWS Architecture Overview
The architecture follows cloud-native patterns with AWS managed services, Kubernetes orchestration, and automated scaling.
Architecture Diagram
Self-hosted Supabase uses a CloudNativePG-managed HA PostgreSQL cluster (ha-supabase-db) with PgBouncer pooling, MinIO for object storage, and the full Supabase application stack deployed via helm-deployment/supabase-kubernetes-ha. Supabase Cloud is the alternative if self-hosting is not required.
Key Components
1. Networking Layer
DNS Provider
- Purpose: Domain name resolution; works with any provider (Route 53, Cloudflare, etc.)
- Domains (use your own domain, e.g. `example.com`):
  - `app.example.com` — Web Frontend
  - `api.example.com` — FastAPI Backend
  - `automations.example.com` — Automator Service
  - `supabase.example.com` — Supabase Kong (self-hosted only)
  - `signoz.example.com` — SigNoz observability (optional)
- SSL Validation: CNAME records required for ACM DNS validation
Application Load Balancer (ALB)
- Purpose: SSL termination, load balancing, and hostname-based routing
- Features:
  - SSL/TLS termination using ACM certificates (wildcard or per-service)
  - HTTP → HTTPS redirect
  - Health checks for all target groups
- Managed by: AWS Load Balancer Controller (Helm chart in `helm-deployment/infrastructure`)
VPC
- CIDR: Environment-specific (e.g. `10.x.0.0/16`)
- Availability Zones: 3 AZs in the chosen region
- Subnets: 3 public (NAT Gateways) + 3 private (EKS nodes)
- Outbound: NAT Gateway per AZ for node egress
2. Compute Layer
EKS Cluster
- Version: Kubernetes 1.33
- System Node Group: Managed node group running Karpenter controller (not on Karpenter-managed nodes, per AWS best practice)
- Add-ons: EBS CSI Driver, AWS Load Balancer Controller, CoreDNS, kube-proxy
Karpenter — Dynamic Node Provisioning
- Purpose: Just-in-time node provisioning and cost optimisation
- Node Classes:
  - General Purpose: Spot instances for most workloads
  - Compute Intensive: High-CPU instances for CPU-bound tasks
  - Memory Intensive: Memory-optimised instances for large datasets
  - Database: On-demand instances for stateful/database workloads
  - GPU: GPU instances for AI/ML workloads (optional)
- Features: Spot prioritisation, automatic consolidation, SQS-based interruption handling
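The node classes above map naturally onto Karpenter `NodePool` resources. The sketch below illustrates what a "General Purpose" pool with Spot prioritisation and automatic consolidation could look like; the pool name, node class reference, and CPU limit are illustrative assumptions, not the repository's actual manifests.

```yaml
# Hypothetical NodePool for the "General Purpose" class (Spot-first).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # Spot prioritised, On-Demand fallback
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # automatic consolidation
  limits:
    cpu: "100"   # cap total vCPUs this pool may provision
```

A Database pool would differ mainly in pinning `karpenter.sh/capacity-type` to `["on-demand"]` and adding a taint so only stateful workloads land there.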
KEDA — Kubernetes Event-Driven Autoscaling
- Purpose: Horizontal pod autoscaling based on resource metrics
- Targets:
| Service | Replicas | CPU Threshold | Memory Threshold |
|---|---|---|---|
| Web Frontend | 2–8 | 60% | 80% |
| FastAPI Backend | 2–10 | 70% | 80% |
| Celery Workers | 2–8 | 70% | 80% |
| Automator | 2–8 | 70% | 80% |
- Scale-down: 30s stabilisation window for fast response
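The table above translates directly into KEDA `ScaledObject` resources. A minimal sketch for the FastAPI Backend row (2–10 replicas, 70% CPU, 80% memory, 30s scale-down window) might look like the following; the Deployment name is an assumption.

```yaml
# Hypothetical ScaledObject matching the FastAPI Backend row above.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: fastapi-backend
spec:
  scaleTargetRef:
    name: fastapi-backend          # target Deployment (assumed name)
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"                # scale out above 70% CPU
    - type: memory
      metricType: Utilization
      metadata:
        value: "80"                # scale out above 80% memory
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 30   # the 30s window noted above
```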
3. Application Services
Web Frontend
- Port: 3000
- Replicas: 2–8 (KEDA-managed)
- Purpose: React application serving the user interface
FastAPI Backend
- Port: 8001
- Replicas: 2–10 (KEDA-managed)
- Purpose: REST API server handling business logic and data access
Celery Workers
- Replicas: 2–8 (KEDA-managed)
- Purpose: Background task processing (queued via RabbitMQ)
Automator Service
- Port: 80
- Replicas: 2–8 (KEDA-managed)
- Purpose: Workflow automation and orchestration
Supabase Kong
- Port: 8000 (internal cluster service)
- Purpose: API gateway for all Supabase services
- Routing: External traffic reaches Kong via the ALB ingress defined in `odin-services/main-ingress.yaml`
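Hostname-based routing through the ALB is expressed as a standard Ingress handled by the AWS Load Balancer Controller. The sketch below shows the general shape of such an ingress (in the spirit of `odin-services/main-ingress.yaml`, whose actual contents may differ); hostnames and service names are illustrative.

```yaml
# Illustrative ALB ingress with hostname routing and HTTPS redirect.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"   # HTTP -> HTTPS redirect
    alb.ingress.kubernetes.io/target-type: ip       # route straight to pod IPs
spec:
  ingressClassName: alb
  rules:
    - host: supabase.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: supabase-kong   # internal Kong service on port 8000
                port:
                  number: 8000
```

Additional `host` rules for `app.example.com`, `api.example.com`, and so on give the ALB one target group per service.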
SigNoz (optional)
- Namespace: `monitoring`
- Components: SigNoz platform + k8s-infra DaemonSet agent
- Purpose: Distributed tracing, metrics aggregation, log management
- Enabled by: `ENABLE_SIGNOZ=true`
4. Data Layer
ElastiCache Redis
- Purpose: Caching, session storage, and Celery broker/result backend
- Configuration:
  - Node type: configurable (e.g. `cache.t3.micro`)
  - Port: 6379
  - Encryption at-rest and in-transit
  - Multi-AZ for high availability
- Enabled by: `ENABLE_AWS_SERVICES=true`
Amazon MQ (RabbitMQ)
- Purpose: Message queuing for asynchronous task processing
- Configuration:
  - Engine: RabbitMQ
  - Ports: 5671 (AMQP/SSL), 15671 (Management/SSL)
  - Deployment mode: single-instance or active/standby
- Enabled by: `ENABLE_AWS_SERVICES=true`
Supabase — Option A: Cloud (managed)
- Purpose: External managed PostgreSQL, Auth, Storage, and Realtime
- Connection: Supabase project URL and service role key configured in `values/odin-services.yaml`
Supabase — Option B: Self-hosted on EKS
- Purpose: Full Supabase stack running inside the cluster
- Components:
  - CloudNativePG operator (`cnpg-system`) — manages the Postgres cluster lifecycle
  - HA Supabase DB (`ha-supabase-db`) — CloudNativePG Cluster resource with PgBouncer pooler
  - Supabase application (`supabase` namespace) — Kong, Auth, Storage (MinIO), Meta, Rest, Realtime, Studio
- Deployment order: CloudNativePG → HA Supabase DB → Supabase app
- Enabled by: `ENABLE_CNPG=true`, `ENABLE_HA_SUPABASE_DB=true`, `ENABLE_SUPABASE=true`
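For orientation, the HA database layer boils down to two CloudNativePG resources: a `Cluster` for Postgres itself and a `Pooler` for PgBouncer. This is a minimal sketch using the names from this document; instance counts, storage size, and pool mode are assumptions, and the real manifests live in `helm-deployment/supabase-kubernetes-ha`.

```yaml
# Minimal CloudNativePG Cluster: one primary plus two replicas.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: ha-supabase-db
  namespace: ha-supabase-db
spec:
  instances: 3          # spread across AZs; automatic failover on primary loss
  storage:
    size: 20Gi
---
# PgBouncer pooler fronting the read-write endpoint of the cluster above.
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: ha-supabase-db-pooler
  namespace: ha-supabase-db
spec:
  cluster:
    name: ha-supabase-db
  instances: 2
  type: rw
  pgbouncer:
    poolMode: transaction
```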
PostgreSQL Automator
- Purpose: Local PostgreSQL database for the Automator service
- Port: 5432
- Storage: EBS persistent volume
- Node affinity: Database-dedicated nodes
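Pinning the Automator database to the Database node class is done with standard node affinity plus a matching toleration. The label and taint key below (`workload-type: database`) are an assumption about how the database nodes are tagged in this setup.

```yaml
# Illustrative pod-spec fragment pinning a pod to database-dedicated nodes.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: workload-type        # hypothetical node label
              operator: In
              values: ["database"]
tolerations:
  - key: workload-type                  # hypothetical taint on database nodes
    value: database
    effect: NoSchedule
```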
5. Security & IAM
IAM Roles
| Role | Purpose |
|---|---|
| EKS Cluster Role | Cluster-level API permissions |
| Node Group Role | EC2 node permissions (ECR, SSM, networking) |
| Karpenter Controller Role | EC2 provisioning, SQS interruption queue |
| AWS Load Balancer Controller Role | ELBv2 and EC2 management |
| EBS CSI Driver Role | EBS volume lifecycle management |
Role names follow the pattern `<env-name>-<component>` and are created by the EKS Terraform module.
Security Groups
- ALB: Auto-created by AWS Load Balancer Controller (80/443 inbound)
- EKS Cluster: Node-to-node and pod communication
- Redis: Port 6379 from VPC CIDR only
- RabbitMQ: Ports 5671, 15671 from VPC CIDR only
SSL/TLS
- Termination: ALB level (pods see plain HTTP internally)
- Certificates: ACM certificates — either per-service or a single wildcard
- Validation: DNS CNAME validation via your DNS provider
- Minimum protocol: TLS 1.2
6. Infrastructure as Code
Terraform
- Module: `modules/eks` — EKS cluster, VPC, node groups, Karpenter, IAM, Helm releases
- State: S3 bucket with versioning, DynamoDB lock table
Terragrunt
- Environment isolation: One directory per environment under `terragrunt/environments/`
- Template: `env-template-folder` — copy and fill placeholders to create a new environment
- DRY configuration: Shared `root.hcl` with per-environment overrides
- Enable/disable flags: Services toggled via environment variables (`ENABLE_*`)
Helm Charts
| Chart | Namespace | Description |
|---|---|---|
| infrastructure | infrastructure | ALB Controller |
| odin-services | default | Web, API, Workers, Automator, Ingress |
| aws-ebs-csi-driver | kube-system | EBS volume provisioning |
| keda | keda | Pod autoscaling |
| cloudnative-pg | cnpg-system | PostgreSQL operator |
| ha-supabase-db | ha-supabase-db | HA Postgres cluster + PgBouncer |
| supabase-kubernetes-ha | supabase | Full Supabase stack |
| signoz | monitoring | Observability platform |
| k8s-infra | monitoring | Cluster metrics agent |
Data Flow
1. User Request Flow
User → DNS → ALB (SSL termination) → EKS Pod (Web Frontend)
→ EKS Pod (FastAPI Backend)
→ EKS Pod (Supabase Kong)
- User accesses `app.example.com`
- DNS resolves to the ALB
- ALB terminates SSL and routes by hostname to the correct target group
- Web Frontend serves the React app and makes API calls to `api.example.com`
- FastAPI Backend processes requests and reads/writes to data services
2. API Request Flow
Client → ALB → FastAPI Backend → Redis (cache) / RabbitMQ (queue) / Supabase (DB)
- Client calls `api.example.com`
- ALB routes to FastAPI pod
- Backend checks Redis cache; on miss, queries Supabase database
- Async tasks are enqueued in RabbitMQ and processed by Celery Workers
3. Background Processing Flow
FastAPI → RabbitMQ → Celery Worker → Supabase DB
- FastAPI enqueues a task in RabbitMQ
- Celery Worker dequeues and processes the task
- Results are written back to the Supabase database
4. Automator Workflow
Automator → PostgreSQL (local) → Redis → External APIs
- Automator receives a workflow request
- Workflow state is persisted in the local PostgreSQL instance
- Redis caches intermediate results
- External APIs are called as part of the automation
5. Scaling Flow
Metrics → KEDA → Pod scaling → Karpenter → Node provisioning
- KEDA evaluates CPU/Memory metrics against configured thresholds
- Pods are scaled horizontally within the configured replica range
- If cluster capacity is insufficient, Karpenter provisions new EC2 nodes (preferring Spot)
- When load drops, KEDA scales pods down; Karpenter consolidates and terminates idle nodes
6. Security Flow
Internet → ALB (TLS 1.2+, ACM) → Security Groups → Pods → IAM IRSA roles → AWS APIs
- All external traffic terminates TLS at the ALB
- Security groups enforce least-privilege network access
- Pods communicate with AWS services via IRSA (IAM Roles for Service Accounts)
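IRSA boils down to a Kubernetes ServiceAccount annotated with an IAM role ARN; pods using that account receive short-lived AWS credentials through the cluster's OIDC provider, so no long-lived keys are stored in the cluster. The account ID and role name below are placeholders.

```yaml
# Illustrative IRSA binding: pods using this ServiceAccount assume the IAM role.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fastapi-backend
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/<env-name>-fastapi-backend
```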
High Availability Summary
| Feature | Implementation |
|---|---|
| Multi-AZ deployment | 3 AZs for EKS nodes, Redis, subnets |
| Load balancing | ALB with multiple target groups |
| Pod redundancy | Minimum 2 replicas per service |
| Database HA | CloudNativePG cluster with PgBouncer (self-hosted) or Supabase Cloud |
| Cache redundancy | ElastiCache Multi-AZ |
| Node autoscaling | Karpenter with Spot + On-Demand mix |
| Pod autoscaling | KEDA CPU/Memory-based |
| Observability | SigNoz (optional) |
| State management | S3 with versioning + DynamoDB lock |
Cost Optimisation
- Spot Instances: Karpenter prioritises Spot for all non-database workloads
- Node consolidation: Karpenter automatically reclaims underutilised nodes
- Pod right-sizing: KEDA scales pods down during quiet periods
- On-Demand only where needed: Database node class uses On-Demand for stability
Maintenance & Operations
Deployment Process
```shell
cd terragrunt/environments/<your-env-name>
# Set required ENABLE_* and domain/certificate environment variables
terragrunt apply
# Rolling updates: re-apply after updating image tags or values files
```
See the Terragrunt Deployment Guide for the full deployment sequence.
Backup Strategy
- EBS Snapshots: Automated snapshots for persistent volumes (Automator DB, Supabase MinIO)
- CloudNativePG: Continuous WAL archiving + scheduled base backups (if configured)
- Supabase Cloud: Managed daily backups (cloud option)
- IaC state: S3 versioned bucket
Disaster Recovery
- Multi-AZ: All stateful services span multiple availability zones
- CloudNativePG HA: Automatic failover between Postgres primary and replicas
- Supabase Cloud: Cross-region redundancy (cloud option)
- Terraform state: S3 versioning allows rollback to any previous state
Additional Resources