This guide walks through the full deployment of the EKB EKS infrastructure on AWS using Terragrunt. It covers tool installation, environment setup, and a phased deployment sequence designed to ensure proper dependency ordering across all infrastructure components.
Deployments are organized into nine phases:
- State Management — Bootstraps the S3 bucket used to store Terraform state for the environment.
- EKS Infrastructure — Provisions the VPC, subnets, NAT gateways, IAM roles, and the EKS cluster and managed node groups.
- Storage & Load Balancing — Deploys the EBS CSI driver for persistent volumes and the AWS Load Balancer Controller for ALB ingress.
- Karpenter Autoscaling — Sets up dynamic node provisioning with Spot instance support and interruption handling via SQS and EventBridge.
- KEDA Autoscaling — Deploys KEDA for pod-level autoscaling based on CPU and memory thresholds.
- Data Services — Provisions Supabase (self-hosted or Cloud), ElastiCache Redis, and Amazon MQ RabbitMQ.
- Odin Services — Deploys the EKB application stack (Web, FastAPI, Celery, Automator) via Helm.
- SigNoz Observability — Deploys distributed tracing, metrics, and log aggregation via SigNoz and the k8s-infra agent.
- Final Deployment — Runs a full terragrunt apply to reconcile any remaining resources.
Before starting, complete the prerequisites checklist with the customer and ensure all <YOUR_*> placeholders in the environment template are filled in. Several values — including the VPC ID, EKS cluster endpoint, and Redis and RabbitMQ endpoints — are only available after specific phases complete, so the guide flags exactly when to capture and apply them.
Prerequisites
- AWS CLI configured with appropriate permissions
- Terraform (>= 1.0)
- Terragrunt (latest version)
- kubectl for Kubernetes management
- helm for Helm chart management
Installation Guide
Installing Terragrunt
macOS (Homebrew)
brew install terragrunt
Linux (apt)
# Add HashiCorp GPG key
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
# Add HashiCorp repository
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
# Update and install
sudo apt update
sudo apt install terragrunt
Windows (Chocolatey)
choco install terragrunt
Installing kubectl
macOS (Homebrew)
brew install kubectl
Linux
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
Windows (Chocolatey)
choco install kubernetes-cli
Installing Helm
macOS (Homebrew)
brew install helm
Linux
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
Windows (Chocolatey)
choco install kubernetes-helm
Verifying Installation
terragrunt --version
terraform --version
kubectl version --client
helm version
AWS CLI Configuration
# Install AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
# Configure AWS credentials
aws configure
# Verify configuration
aws sts get-caller-identity
Creating a New Environment
Step 1: Copy the Environment Template
The env-template-folder contains pre-structured files with <YOUR_*> placeholders ready to be filled in. Copy it entirely to create your new environment folder.
# Navigate to the terragrunt environments directory
cd terragrunt/environments
# Copy the full template folder to a new environment (replace 'your-env-name')
cp -r env-template-folder your-env-name
# The folder structure is ready:
# your-env-name/
# ├── terragrunt.hcl # Core cluster configuration
# ├── state/
# │ └── terragrunt.hcl # S3 state bucket configuration
# └── values/
# ├── infrastructure.yaml # AWS Load Balancer Controller
# ├── karpenter-values.yaml # Karpenter controller settings
# ├── karpenter-nodeclasses.yaml # EC2NodeClass definitions
# ├── karpenter.yaml # Karpenter NodePool definitions
# ├── keda.yaml # KEDA autoscaler
# ├── aws-ebs-csi-driver.yaml # EBS CSI driver
# ├── odin-services.yaml # Odin application services
# ├── supabase.yaml # Supabase (if self-hosting)
# ├── ha-supabase-db.yaml # Supabase HA DB (if self-hosting)
# ├── cloudnative-pg.yaml # CloudNativePG operator (if self-hosting)
# ├── signoz.yaml # SigNoz observability (optional)
# └── signoz-k8s-infra.yaml # SigNoz k8s metrics (optional)
Step 2: Verify All Placeholders Are Present
cd your-env-name
# List all placeholders that need to be filled in
grep -r "<YOUR_" . --include="*.hcl" --include="*.yaml" | sort
All placeholders follow the <YOUR_*> convention. The steps below walk through filling them in file by file.
Step 3: Provision SSL Certificates (AWS ACM)
Before setting environment variables you need the certificate ARNs. Use the AWS Console to request SSL certificates in AWS Certificate Manager (ACM) for all domains your environment will serve.
Option A: Single wildcard certificate (recommended)
A single wildcard certificate covers all the service hostnames with one ARN. Note that a wildcard matches exactly one DNS label, so for the hostnames below the certificate must be *.example.com (a *.app.example.com certificate would not cover api-app.example.com). A single *.example.com certificate covers:
| Service | Domain |
|---|---|
| Web | app.example.com |
| FastAPI | api-app.example.com |
| Automator | automations-app.example.com |
| Supabase | supabase-app.example.com |
| SigNoz | signoz-app.example.com |
Option B: Per-service certificates
Request one certificate per domain if you cannot use a wildcard. Repeat the steps below for each domain: <YOUR_WEB_DOMAIN>, <YOUR_API_DOMAIN>, <YOUR_AUTOMATOR_DOMAIN>, <YOUR_SUPABASE_DOMAIN> (only if ENABLE_SUPABASE=true), <YOUR_SIGNOZ_DOMAIN> (only if ENABLE_SIGNOZ=true).
Requesting a certificate in the AWS Console
- Open the AWS Certificate Manager console
- Switch to the correct region (top-right) — must match <YOUR_AWS_REGION>
- Click Request a certificate → Request a public certificate → Next
- Under Fully qualified domain name, enter the wildcard (e.g., *.example.com) or a specific domain
- Set Validation method to DNS validation
- Click Request — the certificate is created in the Pending validation state
Adding the DNS CNAME validation record
ACM generates a CNAME record that you must add to your DNS provider to prove domain ownership. Get the values from the ACM Console by opening the certificate and expanding the domain under Domains.
| DNS Field | Value |
|---|---|
| Record type | CNAME |
| Name / Host | e.g., _fa187f22ac17bce6f508bf3c56439c61.signoz-app.example.com. |
| Value / Points to | e.g., _c7c97325fe38061e168e232d122c7ff3.jkddzztszm.acm-validations.aws. |
Include the trailing dot (.) at the end of the CNAME values if your DNS provider requires it.
Cloudflare
- Log in to Cloudflare → select your domain → go to DNS → Records → Add record
- Set Type to CNAME
- Paste the ACM CNAME name into Name and the ACM CNAME value into Target
- Set Proxy status to DNS only (grey cloud icon) — the certificate will not validate through the Cloudflare proxy
- Click Save
Route 53
- Open the Route 53 console → Hosted zones → select your zone → Create record
- Set Record type to CNAME
- Paste the ACM CNAME name into Record name (subdomain portion only) and the value into Value
- Set TTL to 300 and click Create records
In ACM you can also click Create records in Route 53 to have ACM add the record automatically if the hosted zone is in the same account.
Once DNS propagates (typically 1–5 minutes), the certificate status changes to Issued. Copy the ARN from the top of the certificate — it looks like arn:aws:acm:<region>:<account-id>:certificate/<uuid>. Keep the ARN(s) handy for the next step.
Step 4: Set Environment Variables
Set these shell environment variables before running any Terragrunt commands. They are read directly by terragrunt.hcl via get_env().
export AWS_REGION="<YOUR_AWS_REGION>" # e.g., "eu-west-2", "us-east-2"
export CLUSTER_NAME="<env_folder_name>" # must match your environment folder name, e.g., "your-env-name"
# Domain configuration
export WEB_DOMAIN="<YOUR_WEB_DOMAIN>" # e.g., "app.example.com"
export FASTAPI_DOMAIN="<YOUR_API_DOMAIN>" # e.g., "api-app.example.com"
export AUTOMATOR_DOMAIN="<YOUR_AUTOMATOR_DOMAIN>" # e.g., "automations-app.example.com"
export SUPABASE_DOMAIN="<YOUR_SUPABASE_DOMAIN>" # e.g., "supabase-app.example.com"
export SIGNOZ_DOMAIN="<YOUR_SIGNOZ_DOMAIN>" # e.g., "signoz-app.example.com"
# SSL Certificate ARNs — Option A: Single wildcard certificate (recommended)
export WILDCARD_CERTIFICATE_ARN="arn:aws:acm:<YOUR_AWS_REGION>:<YOUR_AWS_ACCOUNT_ID>:certificate/<YOUR_WILDCARD_CERT_ID>"
# SSL Certificate ARNs — Option B: Per-service certificates
export WEB_CERTIFICATE_ARN="arn:aws:acm:<YOUR_AWS_REGION>:<YOUR_AWS_ACCOUNT_ID>:certificate/<YOUR_WEB_CERT_ID>"
export FASTAPI_CERTIFICATE_ARN="arn:aws:acm:<YOUR_AWS_REGION>:<YOUR_AWS_ACCOUNT_ID>:certificate/<YOUR_API_CERT_ID>"
export AUTOMATOR_CERTIFICATE_ARN="arn:aws:acm:<YOUR_AWS_REGION>:<YOUR_AWS_ACCOUNT_ID>:certificate/<YOUR_AUTOMATOR_CERT_ID>"
export SUPABASE_CERTIFICATE_ARN="arn:aws:acm:<YOUR_AWS_REGION>:<YOUR_AWS_ACCOUNT_ID>:certificate/<YOUR_SUPABASE_CERT_ID>"
export SIGNOZ_CERTIFICATE_ARN="arn:aws:acm:<YOUR_AWS_REGION>:<YOUR_AWS_ACCOUNT_ID>:certificate/<YOUR_SIGNOZ_CERT_ID>"
# Service enablement flags
export ENABLE_ALB_CONTROLLER="true"
export ENABLE_AWS_SERVICES="true" # Set to "true" to enable ElastiCache and AmazonMQ
# Supabase stack (self-hosted) — enable all three together if self-hosting Supabase
export ENABLE_CNPG="true" # CloudNativePG operator (namespace: cnpg-system)
export ENABLE_HA_SUPABASE_DB="true" # Supabase HA database (namespace: ha-supabase-db)
export ENABLE_SUPABASE="true" # Supabase application (namespace: supabase)
export ENABLE_SIGNOZ="true" # Set to "true" to enable SigNoz observability
export SSL_TERMINATION="alb"
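Before running any Terragrunt commands, it is worth sanity-checking the shell environment. A minimal preflight sketch (the check_env helper is illustrative, not part of the repo; the exported values are examples):

```shell
# Preflight sketch: fail fast if a required variable is unset or still
# contains a <YOUR_*> placeholder before invoking terragrunt.
check_env() {
  missing=""
  for v in "$@"; do
    # Look up the variable named in $v (POSIX-safe indirect expansion)
    val=$(eval "printf '%s' \"\${$v:-}\"")
    case "$val" in
      ""|"<YOUR_"*) missing="$missing $v" ;;
    esac
  done
  if [ -n "$missing" ]; then
    echo "unset or placeholder:$missing"
    return 1
  fi
  echo "preflight ok"
}

# Example usage (values illustrative)
export AWS_REGION="eu-west-2"
export CLUSTER_NAME="your-env-name"
check_env AWS_REGION CLUSTER_NAME
```

Extend the argument list with every variable your deployment actually reads via get_env().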
Spot Instances & Stateful Workloads
Spot instances are configured per NodePool in values/karpenter.yaml, not via environment variables. Each NodePool declares its own capacity strategy:
| NodePool | workload-type label | Capacity Type | Rationale |
|---|---|---|---|
| general | general | Spot → On-Demand fallback | Cost-optimised for stateless batch/background workloads |
| compute-intensive | compute-intensive | Spot → On-Demand fallback | Cost-optimised for CPU-bound workloads |
| memory-intensive | memory-intensive | On-Demand → Spot fallback | Stability prioritised for high-memory pods |
| gpu | gpu | Spot → On-Demand fallback | Cost-optimised for AI/ML batch workloads |
| application | application | On-Demand only | Stable user-facing services (Supabase, Kong, etc.) — no Spot interruptions |
| database | database / node-type: database-dedicated | On-Demand only | Stateful — Spot interruption is unsafe for databases |
The application NodePool uses m/c instance families (generation 5+) with On-Demand only. Supabase service pods are pinned here via nodeSelector: workload-type: "application" to guarantee they are never interrupted by a Spot reclamation event.
The database-dedicated NodePool never uses Spot. It uses consolidationPolicy: WhenEmpty so Karpenter will not evict a node that still has a running pod, making it safe for stateful workloads such as PostgreSQL and CloudNativePG replicas.
Guidelines for stateful applications on Spot:
- Do not schedule databases, persistent queues, or any pod with a PersistentVolumeClaim on Spot NodePools.
- Use a nodeSelector targeting node-type: database-dedicated with the matching database-workload: "true" toleration for database pods.
- Use nodeSelector: workload-type: "application" for user-facing stateless services that must remain available without interruption.
- For background workloads (Web, API, Celery, Automator), the general Spot NodePool is appropriate — Karpenter’s SQS interruption handler drains Spot nodes gracefully before AWS reclaims them, and KEDA’s minimum replica count (≥ 2) ensures availability during node replacement.
- To disable Spot globally, remove "spot" from the values list in every NodePool inside values/karpenter.yaml.
How Karpenter handles Spot interruption warnings:
AWS gives a 2-minute interruption notice before terminating a Spot instance. Karpenter uses EventBridge and SQS to act on this automatically:
AWS Spot Interruption Event
│
▼
Amazon EventBridge (CloudWatch Events)
Rule: EC2 Spot Instance Interruption Warning
│
▼
SQS Queue (Karpenter interruption queue)
│
▼
Karpenter Controller (polls SQS continuously)
│
├── Cordons the node (no new pods scheduled)
├── Drains existing pods (respects PodDisruptionBudgets)
├── Provisions a replacement node in parallel
└── Pods reschedule onto the new node before the 2-min window closes
This is configured in the karpenter block in terragrunt.hcl:
karpenter = {
spot_interruption_handling = true # creates the SQS queue and EventBridge rule
enable_spot_instances = true # allows Spot in NodePool capacity requirements
}
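Because the drain step respects PodDisruptionBudgets, stateless services benefit from an explicit PDB so at least one replica survives any node drain. A minimal sketch (names and labels illustrative, not from the repo):

```yaml
# Sketch: keep at least one replica of a worker available during drains
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: celery-worker-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: celery-worker
```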
Step 5: Update Environment-Specific File Values
Do a find-and-replace across all files in your new env folder for the following placeholders:
| Placeholder | Description | Example |
|---|---|---|
| <YOUR_ENV_NAME> | Unique environment identifier | app-eks-prod |
| <YOUR_AWS_REGION> | AWS region of the cluster | eu-west-2, us-east-2 |
| <YOUR_AWS_ACCOUNT_ID> | 12-digit AWS account ID | 123456789012 |
| <YOUR_ENVIRONMENT> | Environment tag value | prod, staging, dev |
| <YOUR_PROJECT> | Project tag value | odin, ekb |
# Run from your new env folder to find all remaining placeholders
grep -r "<YOUR_" .
5.1 terragrunt.hcl — Core cluster configuration
| Field | Placeholder | Notes |
|---|---|---|
| cluster_name | <YOUR_ENV_NAME> | Must match EKS cluster name |
| cluster_region | <YOUR_AWS_REGION> | AWS region |
| aws_account_id | <YOUR_AWS_ACCOUNT_ID> | 12-digit account ID |
| vpc_cidr | <YOUR_VPC_CIDR> | e.g., 192.168.0.0/16 |
| availability_zones | <YOUR_REGION>a/b/c | 3 AZs in your region |
| tags.Environment | <YOUR_ENVIRONMENT> | e.g., prod |
| tags.Project | <YOUR_PROJECT> | e.g., odin |
| aws_services.amazon_mq.rabbitmq.username | <YOUR_RABBITMQ_USERNAME> | RabbitMQ admin username (only when ENABLE_AWS_SERVICES=true) |
| aws_services.amazon_mq.rabbitmq.password | <YOUR_RABBITMQ_PASSWORD> | Min 12 chars; must include uppercase, lowercase, digits, and special characters |
5.2 state/terragrunt.hcl — S3 state bucket
nano state/terragrunt.hcl
| Field | Placeholder | Notes |
|---|---|---|
| bucket_name | odin-terraform-state-<YOUR_ENV_NAME> | Must be globally unique |
| region | <YOUR_AWS_REGION> | Same region as cluster |
5.3 values/infrastructure.yaml — AWS Load Balancer Controller
The VPC ID only exists after the EKS cluster infrastructure is created (Phase 2), so capture it then and fill it in before deploying the AWS Load Balancer Controller.
# Get VPC ID after EKS cluster is created
aws eks describe-cluster --name <YOUR_ENV_NAME> \
--query "cluster.resourcesVpcConfig.vpcId" --output text
nano values/infrastructure.yaml
| Field | Placeholder | Notes |
|---|---|---|
| clusterName | <YOUR_ENV_NAME> | EKS cluster name |
| region | <YOUR_AWS_REGION> | AWS region |
| vpcId | <YOUR_VPC_ID> | Required before ALB deploy |
| serviceAccount.annotations.eks.amazonaws.com/role-arn | <YOUR_AWS_ACCOUNT_ID>, <YOUR_ENV_NAME> | IAM role for ALB controller |
5.4 values/karpenter-values.yaml — Karpenter controller
The EKS cluster endpoint only exists after the cluster is created, so capture it then and fill it in before deploying Karpenter.
# Get cluster endpoint after EKS cluster is created
aws eks describe-cluster --name <YOUR_ENV_NAME> \
--query "cluster.endpoint" --output text
nano values/karpenter-values.yaml
| Field | Placeholder | Notes |
|---|---|---|
| serviceAccount.annotations.eks.amazonaws.com/role-arn | <YOUR_AWS_ACCOUNT_ID>, <YOUR_ENV_NAME> | IAM role for Karpenter |
| env.CLUSTER_NAME | <YOUR_ENV_NAME> | EKS cluster name |
| env.CLUSTER_ENDPOINT | <YOUR_EKS_CLUSTER_ENDPOINT> | Required before Karpenter deploy |
| settings.aws.defaultInstanceProfile | <YOUR_ENV_NAME> | Karpenter node instance profile |
5.5 values/karpenter-nodeclasses.yaml — Karpenter node classes
nano values/karpenter-nodeclasses.yaml
| Field | Placeholder | Notes |
|---|---|---|
| All kubernetes.io/cluster/<YOUR_ENV_NAME> tags | <YOUR_ENV_NAME> | Cluster tag for subnet/SG selectors |
| user_data bootstrap cluster name | <YOUR_ENV_NAME> | Node bootstrap script |
| tags.Environment | <YOUR_ENVIRONMENT> | e.g., prod |
| tags.Project | <YOUR_PROJECT> | e.g., odin |
5.6 values/aws-ebs-csi-driver.yaml — EBS CSI Driver
nano values/aws-ebs-csi-driver.yaml
| Field | Placeholder | Notes |
|---|---|---|
| controller.serviceAccount.annotations.eks.amazonaws.com/role-arn | <YOUR_AWS_ACCOUNT_ID>, <YOUR_ENV_NAME> | IAM role for EBS CSI controller |
| node.serviceAccount.annotations.eks.amazonaws.com/role-arn | <YOUR_AWS_ACCOUNT_ID>, <YOUR_ENV_NAME> | IAM role for EBS CSI node |
| controller.env.AWS_DEFAULT_REGION | <YOUR_AWS_REGION> | AWS region |
| controller.env.AWS_REGION | <YOUR_AWS_REGION> | AWS region |
| node.env.AWS_DEFAULT_REGION | <YOUR_AWS_REGION> | AWS region |
| node.env.AWS_REGION | <YOUR_AWS_REGION> | AWS region |
5.7 values/karpenter.yaml — Karpenter NodePools
nano values/karpenter.yaml
| Field | Placeholder | Notes |
|---|---|---|
| *.labels.Environment | <YOUR_ENVIRONMENT> | Applied to all NodePool labels |
| *.requirements topology.kubernetes.io/zone | ["<YOUR_REGION>a", "<YOUR_REGION>b", "<YOUR_REGION>c"] | AZs for all NodePools |
Node class names (general, compute-intensive, memory-intensive, gpu, database) must match entries in karpenter-nodeclasses.yaml.
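As a reference for that matching, each NodePool refers to its EC2NodeClass by name. A minimal sketch (the apiVersion shown is Karpenter's v1beta1 shape and may differ from the chart version in use):

```yaml
# Sketch: the nodeClassRef name must match an EC2NodeClass
# defined in karpenter-nodeclasses.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      nodeClassRef:
        name: general   # must exist in karpenter-nodeclasses.yaml
```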
5.8 values/keda.yaml — KEDA Autoscaler
No environment-specific placeholders required. Resource limits and replica counts are pre-configured with sensible defaults. Review and adjust if needed.
5.9 values/supabase.yaml — Supabase application (only if ENABLE_SUPABASE=true)
All keys below must be generated consistently and shared with ha-supabase-db.yaml. Generate them once and use the same values in both files.
# Generate JWT secret
openssl rand -hex 32
# Generate anon/service role JWTs (requires Supabase CLI)
brew install supabase/tap/supabase
supabase gen-keys
# Generate passwords and tokens
openssl rand -hex 24 # for passwords
openssl rand -base64 64 # for secretKeyBase
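The generated secrets can be sanity-checked for shape before pasting them into supabase.yaml and ha-supabase-db.yaml. A sketch (variable names illustrative); note that openssl rand -base64 64 wraps its output at 76 characters, so strip the newline:

```shell
# Generate and length-check the shared secrets
JWT_SECRET=$(openssl rand -hex 32)                     # 64 hex characters
DB_PASSWORD=$(openssl rand -hex 24)                    # 48 hex characters
SECRET_KEY_BASE=$(openssl rand -base64 64 | tr -d '\n')  # 88 base64 characters, one line

echo "jwt=${#JWT_SECRET} db=${#DB_PASSWORD} skb=${#SECRET_KEY_BASE}"
```

Generate each value once and reuse it everywhere the tables below say "must match".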
nano values/supabase.yaml
| Field | Placeholder | Notes |
|---|---|---|
| secret.jwt.anonKey | <YOUR_SUPABASE_ANON_KEY> | Must match ha-supabase-db.yaml anonKey |
| secret.jwt.serviceKey | <YOUR_SUPABASE_SERVICE_ROLE_KEY> | Must match ha-supabase-db.yaml serviceRoleKey |
| secret.jwt.secret | <YOUR_SUPABASE_JWT_SECRET> | Must match ha-supabase-db.yaml jwtSecret |
| secret.db.password | <YOUR_SUPABASE_DB_PASSWORD> | Must match ha-supabase-db.yaml postgresPassword |
| secret.analytics.publicAccessToken | <YOUR_SUPABASE_ANALYTICS_PUBLIC_TOKEN> | Internal Logflare token |
| secret.analytics.privateAccessToken | <YOUR_SUPABASE_ANALYTICS_PRIVATE_TOKEN> | Internal Logflare token |
| secret.dashboard.username | <YOUR_SUPABASE_DASHBOARD_USERNAME> | Studio UI login |
| secret.dashboard.password | <YOUR_SUPABASE_DASHBOARD_PASSWORD> | Studio UI login |
| secret.realtime.secretKeyBase | <YOUR_SUPABASE_REALTIME_SECRET_KEY_BASE> | Phoenix secret key |
| secret.meta.cryptoKey | <YOUR_SUPABASE_META_CRYPTO_KEY> | openssl rand -hex 32 |
| secret.s3.keyId | <YOUR_MINIO_KEY_ID> | Must match secret.minio.user (openssl rand -hex 16) |
| secret.s3.accessKey | <YOUR_MINIO_ACCESS_KEY> | Must match secret.minio.password (openssl rand -hex 32) |
| secret.minio.user | <YOUR_MINIO_KEY_ID> | Same value as secret.s3.keyId |
| secret.minio.password | <YOUR_MINIO_ACCESS_KEY> | Same value as secret.s3.accessKey |
5.10 values/ha-supabase-db.yaml — Supabase HA Database (only if ENABLE_HA_SUPABASE_DB=true)
Secrets here must match supabase.yaml. Use the same generated values for postgresPassword, jwtSecret, anonKey, and serviceRoleKey.
nano values/ha-supabase-db.yaml
| Field | Placeholder | Notes |
|---|---|---|
| secrets.inline.postgresPassword | <YOUR_SUPABASE_DB_PASSWORD> | Must match supabase.yaml secret.db.password |
| secrets.inline.authenticatorPassword | <YOUR_SUPABASE_DB_PASSWORD> | Must be identical to postgresPassword |
| secrets.inline.pgbouncerPassword | <YOUR_SUPABASE_DB_PASSWORD> | Must be identical to postgresPassword |
| secrets.inline.jwtSecret | <YOUR_SUPABASE_JWT_SECRET> | Must match supabase.yaml secret.jwt.secret |
| secrets.inline.anonKey | <YOUR_SUPABASE_ANON_KEY> | Must match supabase.yaml secret.jwt.anonKey |
| secrets.inline.serviceRoleKey | <YOUR_SUPABASE_SERVICE_ROLE_KEY> | Must match supabase.yaml secret.jwt.serviceKey |
Storage class (ebs-csi-gp2), instance counts, and resource limits are pre-configured. Adjust postgres.storage.size and postgres.walStorage.size for your expected data volume.
5.11 values/cloudnative-pg.yaml — CloudNativePG Operator (only if ENABLE_CNPG=true)
No environment-specific placeholders required. This deploys the CNPG operator controller only. Default settings (3 replicas, resource limits) are suitable for most environments.
5.12 values/odin-services.yaml — Odin application services
Redis and RabbitMQ endpoints are only available after Terraform creates those AWS resources. Certificate ARNs must be provisioned in ACM before deployment.
nano values/odin-services.yaml
General settings:
| Field | Placeholder | Notes |
|---|---|---|
| server | <YOUR_WEB_DOMAIN> | Main web domain |
| toolkitEncryptionKey | <YOUR_TOOLKIT_ENCRYPTION_KEY> | Generate: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())" |
Supabase (dataServiceConfig) — self-hosted (ENABLE_SUPABASE=true):
| Field | Placeholder | Source |
|---|---|---|
| supabase.projectUrl | http://supabase-kong:8000 | Fixed — internal Supabase Kong |
| supabase.key | <YOUR_SUPABASE_SERVICE_ROLE_KEY> | Same as secret.jwt.serviceKey in supabase.yaml |
| supabase.postgres.user | postgres | Fixed for self-hosted |
| supabase.postgres.host | ha-supabase-db-postgres-pooler-rw.ha-supabase-db.svc.cluster.local | Fixed — DB pooler service within the cluster |
| supabase.postgres.password | <YOUR_SUPABASE_DB_PASSWORD> | Same as secret.db.password in supabase.yaml |
| supabase.projectId | (leave empty) | Not used in self-hosted mode |
Supabase (dataServiceConfig) — Supabase Cloud (ENABLE_SUPABASE=false):
| Field | Placeholder | Source |
|---|---|---|
| supabase.projectUrl | <YOUR_SUPABASE_PROJECT_URL> | Supabase dashboard → Project Settings → API |
| supabase.key | <YOUR_SUPABASE_SERVICE_ROLE_KEY> | Supabase dashboard → API → service_role key |
| supabase.postgres.user | <YOUR_SUPABASE_DB_USER> | Supabase dashboard → Project Settings → Database |
| supabase.postgres.host | <YOUR_SUPABASE_DB_HOST> | Supabase dashboard → Database (e.g., aws-0-eu-west-2.pooler.supabase.com) |
| supabase.postgres.password | <YOUR_SUPABASE_DB_PASSWORD> | Supabase dashboard → Project Settings → Database |
| supabase.projectId | <YOUR_SUPABASE_PROJECT_ID> | From your Supabase project URL |
Redis:
| Field | Placeholder | Notes |
|---|---|---|
| redis.url | rediss://<YOUR_REDIS_HOST>:6379?ssl_cert_reqs=none | After Terraform creates ElastiCache |
| redis.host | <YOUR_REDIS_HOST> | ElastiCache primary endpoint |
# Get Redis endpoint after Terraform apply
aws elasticache describe-cache-clusters \
--show-cache-node-info \
--query "CacheClusters[?starts_with(CacheClusterId,'<YOUR_ENV_NAME>')].CacheNodes[0].Endpoint.Address" \
--output text
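Once the endpoint is known, the redis.url value is composed from it directly. A sketch with an illustrative host:

```shell
# Compose the redis.url value from the ElastiCache endpoint returned above
REDIS_HOST="master.your-env-name.abc123.euw2.cache.amazonaws.com"  # illustrative
REDIS_URL="rediss://${REDIS_HOST}:6379?ssl_cert_reqs=none"
echo "$REDIS_URL"
```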
RabbitMQ:
| Field | Placeholder | Notes |
|---|---|---|
| rabbitmq.url | amqps://<YOUR_RABBITMQ_USERNAME>:<YOUR_RABBITMQ_PASSWORD>@<YOUR_RABBITMQ_HOST>:5671 | After Terraform creates AmazonMQ |
| rabbitmq.host | <YOUR_RABBITMQ_HOST> | AmazonMQ broker endpoint |
| rabbitmq.username | <YOUR_RABBITMQ_USERNAME> | Set in terragrunt.hcl |
| rabbitmq.password | <YOUR_RABBITMQ_PASSWORD> | Set in terragrunt.hcl |
# Get RabbitMQ endpoint after Terraform apply
aws mq list-brokers \
--query "BrokerSummaries[?BrokerName=='odin-rabbitmq'].BrokerId" --output text | \
xargs -I{} aws mq describe-broker --broker-id {} \
--query "BrokerInstances[0].Endpoints[0]" --output text
SSL / Certificate ARNs:
| Field | Placeholder | Notes |
|---|---|---|
| ssl.services.web.domain | <YOUR_WEB_DOMAIN> | e.g., app.example.com |
| ssl.services.web.certificateArn | <YOUR_WEB_CERTIFICATE_ARN> | ACM certificate ARN |
| ssl.services.fastapiBackend.domain | <YOUR_API_DOMAIN> | e.g., api-app.example.com |
| ssl.services.fastapiBackend.certificateArn | <YOUR_API_CERTIFICATE_ARN> | ACM certificate ARN |
| ssl.services.automator.domain | <YOUR_AUTOMATOR_DOMAIN> | e.g., automations-app.example.com |
| ssl.services.automator.certificateArn | <YOUR_AUTOMATOR_CERTIFICATE_ARN> | ACM certificate ARN |
| ssl.services.supabase.domain | <YOUR_SUPABASE_DOMAIN> | e.g., supabase-app.example.com |
| ssl.services.supabase.certificateArn | <YOUR_SUPABASE_CERTIFICATE_ARN> | ACM certificate ARN |
# List ACM certificates in your region
aws acm list-certificates --region <YOUR_AWS_REGION> \
--query "CertificateSummaryList[*].[DomainName,CertificateArn]" --output table
Web frontend Supabase keys — self-hosted (ENABLE_SUPABASE=true):
| Field | Placeholder | Source |
|---|---|---|
| web.supabase.url | https://<YOUR_SUPABASE_DOMAIN> | External URL routed via ALB ingress |
| web.supabase.anonKey | <YOUR_SUPABASE_ANON_KEY> | Same as secret.jwt.anonKey in supabase.yaml |
| web.supabase.serviceRoleKey | <YOUR_SUPABASE_SERVICE_ROLE_KEY> | Same as secret.jwt.serviceKey in supabase.yaml |
| web.supabase.clientanonKey | <YOUR_SUPABASE_SERVICE_ROLE_KEY> | Same as secret.jwt.serviceKey in supabase.yaml |
Web frontend Supabase keys — Supabase Cloud (ENABLE_SUPABASE=false):
| Field | Placeholder | Source |
|---|---|---|
| web.supabase.url | <YOUR_SUPABASE_PROJECT_URL> | Supabase dashboard → Project Settings → API |
| web.supabase.anonKey | <YOUR_SUPABASE_ANON_KEY> | Supabase dashboard → API → anon key |
| web.supabase.serviceRoleKey | <YOUR_SUPABASE_SERVICE_ROLE_KEY> | Supabase dashboard → API → service_role key |
| web.supabase.clientanonKey | <YOUR_SUPABASE_CLIENT_ANON_KEY> | Same as service_role key |
5.13 values/signoz.yaml — SigNoz Observability (only if ENABLE_SIGNOZ=true)
| Field | Placeholder | Notes |
|---|---|---|
| global.clusterName | <YOUR_ENV_NAME> | EKS cluster name |
| signoz.ingress.annotations.alb.ingress.kubernetes.io/certificate-arn | <YOUR_AWS_REGION>, <YOUR_AWS_ACCOUNT_ID>, <YOUR_SIGNOZ_CERTIFICATE_ID> | ACM certificate for SigNoz |
| signoz.ingress.hosts[0].host | <YOUR_SIGNOZ_DOMAIN> | e.g., signoz-app.example.com |
5.14 values/signoz-k8s-infra.yaml — SigNoz K8s Metrics (only if ENABLE_SIGNOZ=true)
nano values/signoz-k8s-infra.yaml
| Field | Placeholder | Notes |
|---|---|---|
| global.clusterName | <YOUR_ENV_NAME> | EKS cluster name for metric labeling |
The OTel collector endpoint (signoz-otel-collector.monitoring.svc.cluster.local:4317) is pre-configured assuming both SigNoz and k8s-infra are deployed in the monitoring namespace. No change needed unless you use a custom release name.
Deployment Ordering Reminder
Some values are only available after certain infrastructure has been deployed. Follow this order:
- Before any deployment — set <YOUR_ENV_NAME>, <YOUR_AWS_REGION>, <YOUR_AWS_ACCOUNT_ID>, <YOUR_ENVIRONMENT>, <YOUR_PROJECT>, <YOUR_VPC_CIDR>, all domain names, all certificate ARNs, all Supabase values, <YOUR_TOOLKIT_ENCRYPTION_KEY>, and the RabbitMQ username/password
- After the EKS cluster is created — set <YOUR_VPC_ID> (infrastructure.yaml) and <YOUR_EKS_CLUSTER_ENDPOINT> (karpenter-values.yaml)
- After terraform apply for the AWS services — set <YOUR_REDIS_HOST> and <YOUR_RABBITMQ_HOST> (odin-services.yaml)
Step 6: Verify No Placeholders Remain
grep -r "<YOUR_" . --include="*.hcl" --include="*.yaml"
The output should be empty, or contain only placeholders whose values come from resources that have not been created yet (VPC ID, Redis host, RabbitMQ host, EKS endpoint). If any other placeholders remain, revisit the Step 5 sub-sections above.
Files checklist:
| File | Step | Required |
|---|---|---|
| terragrunt.hcl | 5.1 | Always |
| state/terragrunt.hcl | 5.2 | Always |
| values/infrastructure.yaml | 5.3 | Always |
| values/karpenter-values.yaml | 5.4 | Always |
| values/karpenter-nodeclasses.yaml | 5.5 | Always |
| values/aws-ebs-csi-driver.yaml | 5.6 | Always |
| values/karpenter.yaml | 5.7 | Always |
| values/keda.yaml | 5.8 | Always |
| values/odin-services.yaml | 5.12 | Always |
| values/supabase.yaml | 5.9 | Only if ENABLE_SUPABASE=true |
| values/ha-supabase-db.yaml | 5.10 | Only if ENABLE_HA_SUPABASE_DB=true |
| values/cloudnative-pg.yaml | 5.11 | Only if ENABLE_CNPG=true |
| values/signoz.yaml | 5.13 | Only if ENABLE_SIGNOZ=true |
| values/signoz-k8s-infra.yaml | 5.14 | Only if ENABLE_SIGNOZ=true |
Phase 1: State Management Setup
Purpose: S3 bucket creation for Terraform state.
Each environment’s state management module creates an S3 bucket with the pattern odin-terraform-state-{environment-name}, configures encryption, versioning, and public access blocking, and uses local state for the state module itself (bootstrap pattern).
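The bootstrap pattern is commonly expressed in state/terragrunt.hcl roughly like this. This is a sketch of the usual Terragrunt idiom only; the repo's actual file may differ:

```hcl
# Sketch: the state module keeps its own state locally, because the
# S3 bucket it creates cannot hold state before it exists.
remote_state {
  backend = "local"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    path = "terraform.tfstate"
  }
}
```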
cd terragrunt/environments/{your-env-name}/state
terragrunt init
terragrunt plan
terragrunt apply
Phase 2: EKS Infrastructure Deployment
Purpose: Core networking (VPC, subnets, NAT gateway), IAM roles and policies, EKS cluster and managed node groups.
2.1 Dry Run — EKS Infrastructure
Core Infrastructure
cd terragrunt/environments/your-env-name
terragrunt plan -target="aws_vpc.main" \
-target="aws_internet_gateway.main" \
-target="aws_subnet.public" \
-target="aws_subnet.private" \
-target="aws_eip.nat" \
-target="aws_nat_gateway.main" \
-target="aws_route_table.public" \
-target="aws_route_table.private" \
-target="aws_route_table_association.public" \
-target="aws_route_table_association.private"
IAM Roles and Policies
terragrunt plan -target="aws_iam_role.cluster" \
-target="aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy" \
-target="aws_iam_openid_connect_provider.eks" \
-target="aws_iam_role.node" \
-target="aws_iam_role_policy_attachment.node_AmazonEKSWorkerNodePolicy" \
-target="aws_iam_role_policy_attachment.node_AmazonEKS_CNI_Policy" \
-target="aws_iam_role_policy_attachment.node_AmazonEC2ContainerRegistryReadOnly"
EKS Cluster and Node Groups
terragrunt plan -target="aws_eks_cluster.main" \
-target="aws_eks_node_group.main" \
-target="kubernetes_secret.regcred"
Using a Custom / Private Docker Registry
By default, EKB images are pulled from Docker Hub using a secret named regcred. If the customer hosts images in a different registry, follow these steps before deploying odin-services.
Step 1 — Create the imagePullSecret in the target namespace
# Generic private registry (Docker Hub, Quay, self-hosted, etc.)
kubectl create secret docker-registry regcred \
--namespace default \
--docker-server=<YOUR_REGISTRY_HOST> \
--docker-username=<YOUR_REGISTRY_USERNAME> \
--docker-password=<YOUR_REGISTRY_PASSWORD> \
--docker-email=<YOUR_EMAIL>
# AWS ECR — the auth token expires every 12 hours; refresh via a CronJob or use an ECR pull-through cache.
# Note: kubectl create secret docker-registry has no --docker-password-stdin flag,
# so pass the token via command substitution instead of a pipe.
kubectl create secret docker-registry regcred \
--namespace default \
--docker-server=<YOUR_AWS_ACCOUNT_ID>.dkr.ecr.<YOUR_AWS_REGION>.amazonaws.com \
--docker-username=AWS \
--docker-password="$(aws ecr get-login-password --region <YOUR_AWS_REGION>)"
Step 2 — Set the secret name in values/odin-services.yaml
# values/odin-services.yaml
imagePullSecrets:
- name: regcred # must match the secret name created above
# - name: customer-registry-secret # add additional registries if needed
Step 3 — Update image references
web:
image: <YOUR_REGISTRY_HOST>/<YOUR_ORG>/web:<TAG>
fastapiBackend:
image: <YOUR_REGISTRY_HOST>/<YOUR_ORG>/server:<TAG>
Step 4 — Verify pull access before full deployment
kubectl run registry-test \
--image=<YOUR_REGISTRY_HOST>/<YOUR_ORG>/web:<TAG> \
--overrides='{"spec":{"imagePullSecrets":[{"name":"regcred"}]}}' \
--restart=Never --rm -it -- echo "Pull successful"
2.2 Deploy EKS Infrastructure
Step 1: Core Infrastructure
cd terragrunt/environments/your-env-name
terragrunt apply -target="aws_vpc.main" \
-target="aws_internet_gateway.main" \
-target="aws_subnet.public" \
-target="aws_subnet.private" \
-target="aws_eip.nat" \
-target="aws_nat_gateway.main" \
-target="aws_route_table.public" \
-target="aws_route_table.private" \
-target="aws_route_table_association.public" \
-target="aws_route_table_association.private"
After this step, update vpcId in values/infrastructure.yaml before deploying the AWS Load Balancer Controller.
Step 2: EKS Cluster and IAM Roles and Policies
terragrunt apply -target="aws_iam_role.cluster" \
-target="aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy" \
-target="aws_iam_openid_connect_provider.eks" \
-target="aws_iam_role.node" \
-target="aws_iam_role_policy_attachment.node_AmazonEKSWorkerNodePolicy" \
-target="aws_iam_role_policy_attachment.node_AmazonEKS_CNI_Policy" \
-target="aws_iam_role_policy_attachment.node_AmazonEC2ContainerRegistryReadOnly"
Step 3: Node Groups and Addons
terragrunt apply -target="aws_eks_cluster.main" \
-target="aws_eks_node_group.main" \
-target="kubernetes_secret.regcred"
After this step, update CLUSTER_ENDPOINT in values/karpenter-values.yaml before deploying Karpenter.
Check EKS Cluster Connectivity
aws eks update-kubeconfig --region $AWS_REGION --name $CLUSTER_NAME
kubectl cluster-info
kubectl get nodes
kubectl get secret regcred -n default
Phase 3: Storage and Load Balancing
Purpose: EBS CSI driver for persistent volumes and the AWS Load Balancer Controller, both running on the managed node group.
3.1 Dry Run — Storage and Load Balancing
EBS CSI Driver
cd terragrunt/environments/your-env-name
terragrunt plan -target="aws_iam_role.ebs_csi_driver" \
-target="aws_iam_role_policy_attachment.ebs_csi_driver" \
-target="helm_release.ebs_csi_driver"
AWS Load Balancer Controller
terragrunt plan -target="aws_iam_role.aws_load_balancer_controller" \
-target="aws_iam_role_policy_attachment.aws_load_balancer_controller" \
-target="aws_iam_policy.aws_load_balancer_controller" \
-target="helm_release.infrastructure"
3.2 Deploy Storage and Load Balancing
Step 1: EBS CSI Driver
cd terragrunt/environments/your-env-name
terragrunt apply -target="aws_iam_role.ebs_csi_driver" \
-target="aws_iam_role_policy_attachment.ebs_csi_driver" \
-target="helm_release.ebs_csi_driver"
Verification
helm list -n kube-system | grep ebs
kubectl get pods -n kube-system | grep ebs-csi
kubectl get storageclass
aws iam get-role --role-name $CLUSTER_NAME-ebs-csi-driver-role --region $AWS_REGION
kubectl get sa -n kube-system | grep ebs-csi
kubectl describe sa ebs-csi-controller-sa -n kube-system
Step 2: AWS Load Balancer Controller
terragrunt apply -target="aws_iam_role.aws_load_balancer_controller" \
-target="aws_iam_role_policy_attachment.aws_load_balancer_controller" \
-target="aws_iam_policy.aws_load_balancer_controller" \
-target="helm_release.infrastructure"
Verification
helm list -n infrastructure
kubectl get pods -n infrastructure | grep aws-load-balancer-controller
kubectl get sa -n infrastructure
kubectl describe sa aws-load-balancer-controller -n infrastructure
aws iam get-role --role-name $CLUSTER_NAME-aws-load-balancer-controller --region $AWS_REGION
kubectl logs -n infrastructure -l app.kubernetes.io/name=aws-load-balancer-controller
kubectl get ingressclass
Phase 4: Karpenter Autoscaling
Purpose: IAM roles for Karpenter, Spot interruption handling, Karpenter controller and node pools.
4.1 Dry Run — Karpenter
Karpenter IAM Resources
cd terragrunt/environments/your-env-name
terragrunt plan -target="aws_iam_role.karpenter_controller" \
-target="aws_iam_policy.karpenter_controller" \
-target="aws_iam_role_policy_attachment.karpenter_controller" \
-target="aws_iam_role.karpenter_node" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEKSWorkerNodePolicy" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEKS_CNI_Policy" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEC2ContainerRegistryReadOnly" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEBSCSIDriverPolicy" \
-target="aws_iam_instance_profile.karpenter_node"
EC2 Spot Service-Linked Role (if spot instances are enabled)
terragrunt plan -target="aws_iam_service_linked_role.ec2_spot[0]"
Karpenter Spot Interruption (if enabled in terragrunt.hcl)
terragrunt plan -target="aws_sqs_queue.karpenter_interruption_queue" \
-target="aws_sqs_queue_policy.karpenter_interruption_queue" \
-target="aws_cloudwatch_event_rule.karpenter_interruption" \
-target="aws_cloudwatch_event_target.karpenter_interruption"
Karpenter Helm Charts
terragrunt plan -target="helm_release.karpenter"
Karpenter NodePools and EC2NodeClasses
terragrunt plan -target="kubernetes_manifest.karpenter_nodepool" \
-target="kubernetes_manifest.karpenter_nodeclass" \
-target="kubernetes_config_map.aws_auth"
An expected error may appear during the plan: API did not recognize GroupVersionKind from manifest (CRD may not be installed). This is safe to ignore — the Terraform kubernetes provider validates kubernetes_manifest resources against the live cluster API at plan time, and the Karpenter CRDs are only installed by the Helm chart in the preceding step.
4.2 Deploy Karpenter
Step 1: Karpenter IAM Resources
cd terragrunt/environments/your-env-name
terragrunt apply -target="aws_iam_role.karpenter_controller" \
-target="aws_iam_policy.karpenter_controller" \
-target="aws_iam_role_policy_attachment.karpenter_controller" \
-target="aws_iam_role.karpenter_node" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEKSWorkerNodePolicy" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEKS_CNI_Policy" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEC2ContainerRegistryReadOnly" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEBSCSIDriverPolicy" \
-target="aws_iam_instance_profile.karpenter_node"
Verification
aws iam get-role --role-name $CLUSTER_NAME-karpenter-controller --region $AWS_REGION
aws iam get-role --role-name $CLUSTER_NAME-karpenter-node --region $AWS_REGION
aws iam get-instance-profile --instance-profile-name $CLUSTER_NAME-karpenter-node --region $AWS_REGION
aws iam list-attached-role-policies --role-name $CLUSTER_NAME-karpenter-node --region $AWS_REGION
Step 2: EC2 Spot Service-Linked Role (if spot instances are enabled)
The EC2 Spot service-linked role is account-wide (only one per AWS account) and must exist before Karpenter can launch Spot instances.
Option A: Let Terraform create it (recommended for new deployments)
terragrunt apply -target="aws_iam_service_linked_role.ec2_spot[0]"
Option B: Import if the role already exists
# Check if the role exists
aws iam get-role --role-name AWSServiceRoleForEC2Spot --region $AWS_REGION
# If it doesn't exist, create it manually
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com --region $AWS_REGION
# Import into Terraform (replace ACCOUNT_ID with your 12-digit AWS account ID)
terragrunt import 'aws_iam_service_linked_role.ec2_spot[0]' \
arn:aws:iam::ACCOUNT_ID:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot
Step 3: Karpenter Spot Interruption (if enabled in terragrunt.hcl)
terragrunt apply -target="aws_sqs_queue.karpenter_interruption_queue" \
-target="aws_sqs_queue_policy.karpenter_interruption_queue" \
-target="aws_cloudwatch_event_rule.karpenter_interruption" \
-target="aws_cloudwatch_event_target.karpenter_interruption"
Verification
aws events describe-rule --name $CLUSTER_NAME-karpenter-interruption --region $AWS_REGION
aws events list-targets-by-rule --rule $CLUSTER_NAME-karpenter-interruption --region $AWS_REGION
aws sqs get-queue-url --queue-name $CLUSTER_NAME-karpenter-interruption-queue --region $AWS_REGION
Step 4: Karpenter Helm Chart
terragrunt apply -target="helm_release.karpenter"
Verification
helm list -n kube-system | grep karpenter
kubectl get pods -n kube-system | grep karpenter
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter
kubectl describe sa karpenter -n kube-system
Step 5: Karpenter Kubernetes Manifests
terragrunt apply -target="kubernetes_manifest.karpenter_nodepool" \
-target="kubernetes_manifest.karpenter_nodeclass"
# Import the existing aws-auth ConfigMap
# Note: Use quotes to prevent zsh from interpreting brackets as glob patterns
terragrunt import 'kubernetes_config_map.aws_auth[0]' kube-system/aws-auth
# Then apply
terragrunt apply -target='kubernetes_config_map.aws_auth[0]'
Verification
kubectl get nodepools -o wide
kubectl describe nodepool general
kubectl describe nodepool application
kubectl describe nodepool database
kubectl get ec2nodeclasses -o wide
kubectl get configmap aws-auth -n kube-system -o jsonpath='{.data.mapRoles}' | grep karpenter-node
kubectl get nodepools -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'
kubectl get nodes -l karpenter.sh/nodepool --show-labels
kubectl get events -n kube-system --field-selector involvedObject.name=karpenter --sort-by='.lastTimestamp'
Phase 5: KEDA Autoscaling
Purpose: KEDA for application-level autoscaling.
5.1 Dry Run — KEDA
cd terragrunt/environments/your-env-name
terragrunt plan -target="helm_release.keda"
5.2 Deploy KEDA
cd terragrunt/environments/your-env-name
terragrunt apply -target="helm_release.keda"
Verification
helm list -n keda
kubectl get pods -n keda
kubectl get deployment -n keda
kubectl get crd | grep keda
kubectl get validatingwebhookconfigurations | grep keda
kubectl get svc -n keda
Phase 6: Data Services
Purpose: Supabase (database), ElastiCache (Redis), RabbitMQ (message queue).
Deploy CloudNativePG operator first, then the HA Supabase DB cluster, then the Supabase application. The DB cluster must be ready before Supabase starts.
6.1 Dry Run — Data Services
Step 1: CloudNativePG operator (if enabled)
cd terragrunt/environments/your-env-name
ENABLE_CNPG=true terragrunt plan \
--target='helm_release.additional_charts["cloudnative-pg"]'
Step 2: HA Supabase DB (if enabled)
ENABLE_HA_SUPABASE_DB=true terragrunt plan \
--target='helm_release.additional_charts["ha-supabase-db"]'
Step 3: Supabase application (if enabled)
if [ "${ENABLE_SUPABASE:-false}" = "true" ]; then
ENABLE_SUPABASE=true terragrunt plan \
--target='helm_release.supabase[0]'
fi
Step 4: AWS Services — ElastiCache and RabbitMQ (if enabled)
if [ "${ENABLE_AWS_SERVICES:-false}" = "true" ]; then
terragrunt plan -target="aws_elasticache_subnet_group.redis" \
-target="aws_security_group.redis" \
-target="aws_elasticache_replication_group.redis" \
-target="aws_security_group.rabbitmq" \
-target="aws_mq_broker.rabbitmq"
fi
6.2 Deploy Data Services
Step 1: CloudNativePG operator (if enabled)
cd terragrunt/environments/your-env-name
ENABLE_CNPG=true terragrunt apply --auto-approve \
--target='helm_release.additional_charts["cloudnative-pg"]'
Step 2: HA Supabase DB (if enabled)
ENABLE_HA_SUPABASE_DB=true terragrunt apply --auto-approve \
--target='helm_release.additional_charts["ha-supabase-db"]'
Verify PgBouncer pooler and credentials after deployment:
kubectl get svc -n ha-supabase-db | grep pooler
kubectl get secrets -n ha-supabase-db
kubectl get secret ha-supabase-db-authenticator-credentials -n ha-supabase-db \
-o jsonpath='{.data.username}' | base64 -d && echo ""
Use the pooler ClusterIP (or EXTERNAL-IP if LoadBalancer) as the SUPABASE_POSTGRES_HOST value in values/odin-services.yaml and as secret.db.postgresHost in values/supabase.yaml.
Step 3: Supabase application (if enabled)
All Supabase service pods run exclusively on the Karpenter application NodePool (On-Demand only) to prevent Spot interruptions.
if [ "${ENABLE_SUPABASE:-false}" = "true" ]; then
ENABLE_SUPABASE=true terragrunt apply --auto-approve \
--target='helm_release.supabase[0]'
fi
Step 4: AWS Services — ElastiCache and RabbitMQ (if enabled)
if [ "${ENABLE_AWS_SERVICES:-false}" = "true" ]; then
terragrunt apply \
-target="aws_elasticache_subnet_group.redis" \
-target="aws_security_group.redis" \
-target="aws_elasticache_replication_group.redis" \
-target="aws_security_group.rabbitmq" \
-target="aws_mq_broker.rabbitmq"
fi
Verification
# Get connection details from Terraform outputs
terragrunt output elasticache_endpoint
terragrunt output elasticache_port
terragrunt output rabbitmq_endpoint
terragrunt output rabbitmq_port
# Test Redis connectivity from EKS cluster
kubectl run redis-test --image=redis:7-alpine --restart=Never -- \
sh -c "redis-cli -h <redis-endpoint> -p 6379 --tls --insecure ping && echo 'Redis connection successful'"
kubectl logs redis-test
kubectl delete pod redis-test
# Check Redis encryption status
aws elasticache describe-replication-groups \
--replication-group-id $CLUSTER_NAME-redis \
--region $AWS_REGION \
--query 'ReplicationGroups[0].{AtRestEncryption:AtRestEncryptionEnabled,TransitEncryption:TransitEncryptionEnabled}'
Before deploying Odin Services, update values/odin-services.yaml with the Redis endpoint, RabbitMQ endpoint, and all certificate ARNs obtained in this phase.
Phase 7: Odin Services
Purpose: Application deployment via Helm.
Before deploying, temporarily scale down fastapiBackend to a single replica for the initial database migration run — set replicaCount: 1, workers: 1, and keda.minReplicas: 1. Once the migration completes successfully, revert these values to their production defaults before re-deploying.
7.1 Dry Run — Odin Services
cd terragrunt/environments/your-env-name
terragrunt plan -target="helm_release.odin_services"
7.2 Deploy Odin Services
cd terragrunt/environments/your-env-name
terragrunt apply -target="helm_release.odin_services"
Verification
kubectl get pods
kubectl get ingress # Add the ALB endpoints to your DNS provider
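To confirm every deployment actually finished rolling out rather than just listing pods, loop over the namespace (the default namespace is assumed from the checks above; the timeout is a suggestion):

```shell
# Wait for each Odin deployment to complete its rollout, failing fast on stalls
for d in $(kubectl get deploy -n default -o name); do
  kubectl rollout status "$d" -n default --timeout=300s
done
```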
Phase 8: SigNoz Observability
Purpose: Logs and metrics monitoring.
8.1 Dry Run — SigNoz Charts
cd terragrunt/environments/your-env-name
terragrunt plan -target='helm_release.additional_charts["signoz"]'
terragrunt plan -target='helm_release.additional_charts["k8s-infra"]'
8.2 Deploy SigNoz Charts
cd terragrunt/environments/your-env-name
terragrunt apply -target='helm_release.additional_charts["signoz"]'
terragrunt apply -target='helm_release.additional_charts["k8s-infra"]'
Verification
kubectl get pods -n monitoring
kubectl get ingress -n monitoring # Add the ALB endpoints to your DNS provider
Phase 9: Final Deployment
9.1 Complete Deployment
cd terragrunt/environments/your-env-name
terragrunt apply
This final apply handles any remaining resources not explicitly targeted in previous phases.
9.2 Verify Deployment
# Update kubeconfig
aws eks update-kubeconfig --region $AWS_REGION --name $CLUSTER_NAME
# Check cluster status
kubectl get nodes
kubectl get pods --all-namespaces
# Check Karpenter
kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter
# Check AWS Load Balancer Controller (deployed to the infrastructure namespace in Phase 3)
kubectl get pods -n infrastructure -l app.kubernetes.io/name=aws-load-balancer-controller
# Check KEDA
kubectl get pods -n keda
# Check Odin Services
kubectl get pods -n default
kubectl get services -n default
kubectl get ingress -n default
# Check all Helm releases
helm list --all-namespaces
Troubleshooting
State lock issues
terragrunt force-unlock <lock-id>
Karpenter not working
kubectl describe nodes
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter
Load Balancer issues
kubectl describe ingress -n default
kubectl logs -n infrastructure -l app.kubernetes.io/name=aws-load-balancer-controller
Helm chart issues
helm status <release-name> -n <namespace>
helm rollback <release-name> <revision> -n <namespace>
Cleanup
# Destroy infrastructure
cd terragrunt/environments/your-env-name
terragrunt destroy -auto-approve
# Destroy state bucket (use with caution)
cd terragrunt/environments/your-env-name/state
terragrunt destroy -auto-approve
Monitoring and Logging
# AWS Resources
aws eks describe-cluster --name $CLUSTER_NAME --region $AWS_REGION
aws ec2 describe-instances --filters "Name=tag:kubernetes.io/cluster/$CLUSTER_NAME,Values=owned"
# Kubernetes Resources
kubectl top nodes
kubectl top pods --all-namespaces
kubectl get events --sort-by=.metadata.creationTimestamp
Quick Reference — All Deployment Commands
# Phase 1: State Management
cd terragrunt/environments/your-env-name/state
terragrunt apply
# Phase 2: EKS Infrastructure
cd terragrunt/environments/your-env-name
terragrunt apply -target="aws_vpc.main" -target="aws_internet_gateway.main" \
-target="aws_subnet.public" -target="aws_subnet.private" -target="aws_eip.nat" \
-target="aws_nat_gateway.main" -target="aws_route_table.public" \
-target="aws_route_table.private" -target="aws_route_table_association.public" \
-target="aws_route_table_association.private" -auto-approve
terragrunt apply -target="aws_iam_role.cluster" \
-target="aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy" \
-target="aws_iam_openid_connect_provider.eks" -target="aws_iam_role.node" \
-target="aws_iam_role_policy_attachment.node_AmazonEKSWorkerNodePolicy" \
-target="aws_iam_role_policy_attachment.node_AmazonEKS_CNI_Policy" \
-target="aws_iam_role_policy_attachment.node_AmazonEC2ContainerRegistryReadOnly" \
-auto-approve
terragrunt apply -target="aws_eks_cluster.main" \
-target="aws_eks_node_group.main" -target="kubernetes_secret.regcred" -auto-approve
# Phase 3: Storage and Load Balancing
terragrunt apply -target="aws_iam_role.ebs_csi_driver" \
-target="aws_iam_role_policy_attachment.ebs_csi_driver" \
-target="helm_release.ebs_csi_driver" -auto-approve
terragrunt apply -target="aws_iam_role.aws_load_balancer_controller" \
-target="aws_iam_role_policy_attachment.aws_load_balancer_controller" \
-target="aws_iam_policy.aws_load_balancer_controller" \
-target="helm_release.infrastructure" -auto-approve
# Phase 4: Karpenter Autoscaling
terragrunt apply -target="aws_iam_role.karpenter_controller" \
-target="aws_iam_policy.karpenter_controller" \
-target="aws_iam_role_policy_attachment.karpenter_controller" \
-target="aws_iam_role.karpenter_node" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEKSWorkerNodePolicy" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEKS_CNI_Policy" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEC2ContainerRegistryReadOnly" \
-target="aws_iam_role_policy_attachment.karpenter_node_AmazonEBSCSIDriverPolicy" \
-target="aws_iam_instance_profile.karpenter_node" -auto-approve
# Spot interruption handling (if spot_interruption_handling = true)
terragrunt apply -target="aws_sqs_queue.karpenter_interruption_queue" \
-target="aws_sqs_queue_policy.karpenter_interruption_queue" \
-target="aws_cloudwatch_event_rule.karpenter_interruption" \
-target="aws_cloudwatch_event_target.karpenter_interruption" -auto-approve
terragrunt apply -target="helm_release.karpenter" -auto-approve
terragrunt apply -target="kubernetes_manifest.karpenter_nodepool" \
-target="kubernetes_manifest.karpenter_nodeclass" \
-target='kubernetes_config_map.aws_auth[0]' -auto-approve
# Phase 5: KEDA Autoscaling
terragrunt apply -target="helm_release.keda" -auto-approve
# Phase 6: Data Services
ENABLE_CNPG=true terragrunt apply --target='helm_release.additional_charts["cloudnative-pg"]' -auto-approve
ENABLE_HA_SUPABASE_DB=true terragrunt apply --target='helm_release.additional_charts["ha-supabase-db"]' -auto-approve
if [ "${ENABLE_SUPABASE:-false}" = "true" ]; then
ENABLE_SUPABASE=true terragrunt apply --target='helm_release.supabase[0]' -auto-approve
fi
if [ "${ENABLE_AWS_SERVICES:-false}" = "true" ]; then
terragrunt apply -target="aws_elasticache_subnet_group.redis" \
-target="aws_security_group.redis" \
-target="aws_elasticache_replication_group.redis" \
-target="aws_security_group.rabbitmq" \
-target="aws_mq_broker.rabbitmq" -auto-approve
fi
# Phase 7: Odin Services
terragrunt apply -target="helm_release.odin_services" -auto-approve
# Phase 8: SigNoz (if enabled)
terragrunt apply -target='helm_release.additional_charts["signoz"]' -auto-approve
terragrunt apply -target='helm_release.additional_charts["k8s-infra"]' -auto-approve
# Phase 9: Final Deployment
terragrunt apply -auto-approve
Replace your-env-name with your actual environment name throughout. Always run dry runs (terragrunt plan) first to validate your configuration before applying changes.