This document provides a comprehensive overview of the AWS architecture for EKB services infrastructure.

AWS Architecture Overview

The architecture follows cloud-native patterns with AWS managed services, Kubernetes orchestration, and automated scaling.

Architecture Diagram

Self-hosted Supabase uses a CloudNativePG-managed HA PostgreSQL cluster (ha-supabase-db) with PgBouncer pooling, MinIO for object storage, and the full Supabase application stack deployed via helm-deployment/supabase-kubernetes-ha. Supabase Cloud is the alternative if self-hosting is not required.

Key Components

1. Networking Layer

DNS Provider

  • Purpose: Domain name resolution; works with any provider (Route 53, Cloudflare, etc.)
  • Domains (use your own domain, e.g. example.com):
    • app.example.com — Web Frontend
    • api.example.com — FastAPI Backend
    • automations.example.com — Automator Service
    • supabase.example.com — Supabase Kong (self-hosted only)
    • signoz.example.com — SigNoz observability (optional)
  • SSL Validation: CNAME records required for ACM DNS validation

Application Load Balancer (ALB)

  • Purpose: SSL termination, load balancing, and hostname-based routing
  • Features:
    • SSL/TLS termination using ACM certificates (wildcard or per-service)
    • HTTP → HTTPS redirect
    • Health checks for all target groups
  • Managed by: AWS Load Balancer Controller (Helm chart in helm-deployment/infrastructure)
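The hostname-based routing and HTTP → HTTPS redirect described above are typically expressed through AWS Load Balancer Controller annotations on an Ingress resource. The sketch below is illustrative only: the hostnames, certificate ARN, and service names are placeholders, not the values used by this deployment.

```yaml
# Illustrative ALB ingress: hostname routing, SSL termination via ACM,
# and HTTP -> HTTPS redirect. All names and the cert ARN are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/EXAMPLE
spec:
  ingressClassName: alb
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-frontend       # placeholder service name
                port:
                  number: 3000
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: fastapi-backend    # placeholder service name
                port:
                  number: 8001
```

With `ssl-redirect: "443"`, the controller configures the ALB's port-80 listener to issue a 301 redirect, so pods only ever receive plain HTTP on 443-terminated traffic.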

VPC

  • CIDR: Environment-specific (e.g. 10.x.0.0/16)
  • Availability Zones: 3 AZs in the chosen region
  • Subnets: 3 public (NAT Gateways) + 3 private (EKS nodes)
  • Outbound: NAT Gateway per AZ for node egress

2. Compute Layer

EKS Cluster

  • Version: Kubernetes 1.33
  • System Node Group: Managed node group running Karpenter controller (not on Karpenter-managed nodes, per AWS best practice)
  • Add-ons: EBS CSI Driver, AWS Load Balancer Controller, CoreDNS, kube-proxy

Karpenter — Dynamic Node Provisioning

  • Purpose: Just-in-time node provisioning and cost optimisation
  • Node Classes:
    • General Purpose: Spot instances for most workloads
    • Compute Intensive: High-CPU instances for CPU-bound tasks
    • Memory Intensive: Memory-optimised instances for large datasets
    • Database: On-demand instances for stateful/database workloads
    • GPU: GPU instances for AI/ML workloads (optional)
  • Features: Spot prioritisation, automatic consolidation, SQS-based interruption handling
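A Karpenter node class of the kind described above (General Purpose, Spot-first, with consolidation) might be sketched as a `NodePool` roughly like the following. The pool name, limits, and `EC2NodeClass` reference are assumptions for illustration, not the actual resources in this repository.

```yaml
# Sketch of a Spot-prioritised general-purpose NodePool (Karpenter v1 API).
# Names, limits, and the EC2NodeClass reference are placeholders.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general-purpose
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]   # Spot is preferred when both are allowed
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # reclaim idle/underused nodes
    consolidateAfter: 30s
  limits:
    cpu: "100"    # cap total provisioned vCPU for this pool
```

The Database node class would instead pin `karpenter.sh/capacity-type` to `on-demand` so stateful workloads are never placed on interruptible capacity.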

KEDA — Kubernetes Event-Driven Autoscaling

  • Purpose: Horizontal pod autoscaling based on resource metrics
  • Targets:
| Service | Replicas | CPU Threshold | Memory Threshold |
| --- | --- | --- | --- |
| Web Frontend | 2–8 | 60% | 80% |
| FastAPI Backend | 2–10 | 70% | 80% |
| Celery Workers | 2–8 | 70% | 80% |
| Automator | 2–8 | 70% | 80% |
  • Scale-down: 30s stabilisation window for fast response
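For one row of the table above (FastAPI Backend: 2–10 replicas, 70% CPU, 80% memory, 30s scale-down window), a KEDA `ScaledObject` could look roughly like this. The target Deployment name is an assumption.

```yaml
# Sketch of a KEDA ScaledObject using the built-in cpu and memory scalers.
# The Deployment name is assumed; thresholds mirror the table above.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: fastapi-backend
spec:
  scaleTargetRef:
    name: fastapi-backend          # assumed Deployment name
  minReplicaCount: 2
  maxReplicaCount: 10
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 30   # fast scale-down, per the table
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"                # scale out above 70% average CPU
    - type: memory
      metricType: Utilization
      metadata:
        value: "80"                # scale out above 80% average memory
```

KEDA renders this into an HPA under the hood, so the CPU/memory triggers behave like a standard resource-metrics HPA with KEDA managing the replica bounds.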

3. Application Services

Web Frontend

  • Port: 3000
  • Replicas: 2–8 (KEDA-managed)
  • Purpose: React application serving the user interface

FastAPI Backend

  • Port: 8001
  • Replicas: 2–10 (KEDA-managed)
  • Purpose: REST API server handling business logic and data access

Celery Workers

  • Replicas: 2–8 (KEDA-managed)
  • Purpose: Background task processing (queued via RabbitMQ)

Automator Service

  • Port: 80
  • Replicas: 2–8 (KEDA-managed)
  • Purpose: Workflow automation and orchestration

Supabase Kong

  • Port: 8000 (internal cluster service)
  • Purpose: API gateway for all Supabase services
  • Routing: External traffic reaches Kong via the ALB ingress defined in odin-services/main-ingress.yaml

SigNoz (optional)

  • Namespace: monitoring
  • Components: SigNoz platform + k8s-infra DaemonSet agent
  • Purpose: Distributed tracing, metrics aggregation, log management
  • Enabled by: ENABLE_SIGNOZ=true

4. Data Layer

ElastiCache Redis

  • Purpose: Caching, session storage, and Celery broker/result backend
  • Configuration:
    • Node type: configurable (e.g. cache.t3.micro)
    • Port: 6379
    • Encryption at-rest and in-transit
    • Multi-AZ for high availability
  • Enabled by: ENABLE_AWS_SERVICES=true

Amazon MQ (RabbitMQ)

  • Purpose: Message queuing for asynchronous task processing
  • Configuration:
    • Engine: RabbitMQ
    • Ports: 5671 (AMQP/SSL), 15671 (Management/SSL)
    • Deployment mode: single-instance or active/standby
  • Enabled by: ENABLE_AWS_SERVICES=true

Supabase — Option A: Cloud (managed)

  • Purpose: External managed PostgreSQL, Auth, Storage, and Realtime
  • Connection: Supabase project URL and service role key configured in values/odin-services.yaml

Supabase — Option B: Self-hosted on EKS

  • Purpose: Full Supabase stack running inside the cluster
  • Components:
    • CloudNativePG operator (cnpg-system) — manages the Postgres cluster lifecycle
    • HA Supabase DB (ha-supabase-db) — CloudNativePG Cluster resource with PgBouncer pooler
    • Supabase application (supabase namespace) — Kong, Auth, Storage (MinIO), Meta, Rest, Realtime, Studio
  • Deployment order: CloudNativePG → HA Supabase DB → Supabase app
  • Enabled by: ENABLE_CNPG=true, ENABLE_HA_SUPABASE_DB=true, ENABLE_SUPABASE=true
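The HA Postgres cluster and its PgBouncer pooler are CloudNativePG custom resources. A minimal sketch, assuming three instances and an EBS-backed `gp3` storage class (both illustrative, not the repository's actual values):

```yaml
# Sketch of a CloudNativePG HA cluster plus a PgBouncer Pooler.
# Instance counts, storage size, and storage class are assumptions.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: ha-supabase-db
  namespace: ha-supabase-db
spec:
  instances: 3                 # one primary + two streaming replicas (assumed)
  storage:
    size: 20Gi
    storageClass: gp3          # assumes an EBS CSI storage class
---
apiVersion: postgresql.cnpg.io/v1
kind: Pooler
metadata:
  name: ha-supabase-db-pooler
  namespace: ha-supabase-db
spec:
  cluster:
    name: ha-supabase-db
  instances: 2
  type: rw                     # pool connections to the current primary
  pgbouncer:
    poolMode: transaction
```

The operator handles failover between instances automatically; the Pooler tracks the primary, so Supabase services connect to the pooler service rather than to a specific Postgres pod.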

PostgreSQL Automator

  • Purpose: Local PostgreSQL database for the Automator service
  • Port: 5432
  • Storage: EBS persistent volume
  • Node affinity: Database-dedicated nodes

5. Security & IAM

IAM Roles

| Role | Purpose |
| --- | --- |
| EKS Cluster Role | Cluster-level API permissions |
| Node Group Role | EC2 node permissions (ECR, SSM, networking) |
| Karpenter Controller Role | EC2 provisioning, SQS interruption queue |
| AWS Load Balancer Controller Role | ELBv2 and EC2 management |
| EBS CSI Driver Role | EBS volume lifecycle management |
Role names follow the pattern <env-name>-<component> and are created by the EKS Terraform module.

Security Groups

  • ALB: Auto-created by AWS Load Balancer Controller (80/443 inbound)
  • EKS Cluster: Node-to-node and pod communication
  • Redis: Port 6379 from VPC CIDR only
  • RabbitMQ: Ports 5671, 15671 from VPC CIDR only

SSL/TLS

  • Termination: ALB level (pods see plain HTTP internally)
  • Certificates: ACM certificates — either per-service or a single wildcard
  • Validation: DNS CNAME validation via your DNS provider
  • Minimum protocol: TLS 1.2

6. Infrastructure as Code

Terraform Modules

  • modules/eks: EKS cluster, VPC, node groups, Karpenter, IAM, Helm releases
  • State: S3 bucket with versioning, DynamoDB lock table

Terragrunt

  • Environment isolation: One directory per environment under terragrunt/environments/
  • Template: env-template-folder — copy and fill placeholders to create a new environment
  • DRY configuration: Shared root.hcl with per-environment overrides
  • Enable/disable flags: Services toggled via environment variables (ENABLE_*)

Helm Charts

| Chart | Namespace | Description |
| --- | --- | --- |
| infrastructure | infrastructure | ALB Controller |
| odin-services | default | Web, API, Workers, Automator, Ingress |
| aws-ebs-csi-driver | kube-system | EBS volume provisioning |
| keda | keda | Pod autoscaling |
| cloudnative-pg | cnpg-system | PostgreSQL operator |
| ha-supabase-db | ha-supabase-db | HA Postgres cluster + PgBouncer |
| supabase-kubernetes-ha | supabase | Full Supabase stack |
| signoz | monitoring | Observability platform |
| k8s-infra | monitoring | Cluster metrics agent |

Data Flow

1. User Request Flow

User → DNS → ALB (SSL termination) → EKS Pod (Web Frontend)
                                   → EKS Pod (FastAPI Backend)
                                   → EKS Pod (Supabase Kong)
  1. User accesses app.example.com
  2. DNS resolves to the ALB
  3. ALB terminates SSL and routes by hostname to the correct target group
  4. Web Frontend serves the React app and makes API calls to api.example.com
  5. FastAPI Backend processes requests and reads/writes to data services

2. API Request Flow

Client → ALB → FastAPI Backend → Redis (cache) / RabbitMQ (queue) / Supabase (DB)
  1. Client calls api.example.com
  2. ALB routes to FastAPI pod
  3. Backend checks Redis cache; on miss, queries Supabase database
  4. Async tasks are enqueued in RabbitMQ and processed by Celery Workers

3. Background Processing Flow

FastAPI → RabbitMQ → Celery Worker → Supabase DB
  1. FastAPI enqueues a task in RabbitMQ
  2. Celery Worker dequeues and processes the task
  3. Results are written back to the Supabase database

4. Automator Workflow

Automator → PostgreSQL (local) → Redis → External APIs
  1. Automator receives a workflow request
  2. Workflow state is persisted in the local PostgreSQL instance
  3. Redis caches intermediate results
  4. External APIs are called as part of the automation

5. Scaling Flow

Metrics → KEDA → Pod scaling → Karpenter → Node provisioning
  1. KEDA evaluates CPU/Memory metrics against configured thresholds
  2. Pods are scaled horizontally within the configured replica range
  3. If cluster capacity is insufficient, Karpenter provisions new EC2 nodes (preferring Spot)
  4. When load drops, KEDA scales pods down; Karpenter consolidates and terminates idle nodes

6. Security Flow

Internet → ALB (TLS 1.2+, ACM) → Security Groups → Pods → IAM IRSA roles → AWS APIs
  1. All external traffic terminates TLS at the ALB
  2. Security groups enforce least-privilege network access
  3. Pods communicate with AWS services via IRSA (IAM Roles for Service Accounts)
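An IRSA binding is just an annotation on the pod's ServiceAccount pointing at the IAM role. A hypothetical example for the AWS Load Balancer Controller, using the `<env-name>-<component>` naming pattern noted above (the account ID and env name are placeholders):

```yaml
# Hypothetical IRSA binding: pods using this ServiceAccount receive
# temporary credentials for the annotated IAM role via the EKS OIDC provider.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-load-balancer-controller
  namespace: infrastructure
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-env-aws-load-balancer-controller
```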

High Availability Summary

| Feature | Implementation |
| --- | --- |
| Multi-AZ deployment | 3 AZs for EKS nodes, Redis, subnets |
| Load balancing | ALB with multiple target groups |
| Pod redundancy | Minimum 2 replicas per service |
| Database HA | CloudNativePG cluster with PgBouncer (self-hosted) or Supabase Cloud |
| Cache redundancy | ElastiCache Multi-AZ |
| Node autoscaling | Karpenter with Spot + On-Demand mix |
| Pod autoscaling | KEDA CPU/Memory-based |
| Observability | SigNoz (optional) |
| State management | S3 with versioning + DynamoDB lock |

Cost Optimisation

  • Spot Instances: Karpenter prioritises Spot for all non-database workloads
  • Node consolidation: Karpenter automatically reclaims underutilised nodes
  • Pod right-sizing: KEDA scales pods down during quiet periods
  • On-Demand only where needed: Database node class uses On-Demand for stability

Maintenance & Operations

Deployment Process

```shell
cd terragrunt/environments/<your-env-name>
# Set required ENABLE_* and domain/certificate environment variables
terragrunt apply
# Rolling updates: re-apply after updating image tags or values files
```
See Terragrunt Deployment Guide for the full deployment sequence.

Backup Strategy

  • EBS Snapshots: Automated snapshots for persistent volumes (Automator DB, Supabase MinIO)
  • CloudNativePG: Continuous WAL archiving + scheduled base backups (if configured)
  • Supabase Cloud: Managed daily backups (cloud option)
  • IaC state: S3 versioned bucket

Disaster Recovery

  • Multi-AZ: All stateful services span multiple availability zones
  • CloudNativePG HA: Automatic failover between Postgres primary and replicas
  • Supabase Cloud: Cross-region redundancy (cloud option)
  • Terraform state: S3 versioning allows rollback to any previous state

Additional Resources