Learn how to deploy a complete AI-powered caption generator application on AWS using modern infrastructure practices. Part 1 covers architecture design, AWS service selection, and Terraform infrastructure setup.
Building and deploying AI applications in the cloud requires careful planning, especially when dealing with file uploads, AI processing, and scalable architecture. In this comprehensive two-part series, I’ll walk you through deploying a real-world AI application - a caption generator that processes images and videos using Google’s Gemini AI and OpenAI Whisper.
Our application consists of two components: a frontend and a backend. The frontend allows users to upload images or videos, customize caption tone and length, and receive AI-generated captions. The backend handles file processing, AI integration, and response formatting.
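To make the flow concrete, here is a sketch of what a caption request could look like; the endpoint, host, and field names are illustrative assumptions, not the app's actual API:

```bash
# Hypothetical request: upload a file along with caption preferences
curl -X POST https://api.example.com/api/captions \
  -F "file=@beach-photo.jpg" \
  -F "tone=casual" \
  -F "length=short"

# Hypothetical response
# {"caption": "Golden hour, zero complaints."}
```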
AWS provides several advantages for AI workloads:

- Managed services for containers, databases, and caching, so there is less infrastructure to operate yourself
- Elastic scaling to absorb bursty upload and processing traffic
- Pay-as-you-go pricing that matches variable AI workloads
- Mature security primitives (IAM, VPCs, encryption, Parameter Store) for handling user uploads safely
Our production architecture leverages multiple AWS services: ECS Fargate runs the frontend and backend containers, an Application Load Balancer routes traffic to them, Aurora Serverless v2 stores application data, ElastiCache Redis provides caching, S3 holds uploaded files, and CloudFront serves users from the edge.
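A simplified view of how the pieces fit together:

```
Users → CloudFront → Application Load Balancer
                            │
              ┌─────────────┴──────────────┐
      ECS Fargate (frontend)       ECS Fargate (backend)
                                           │
                    ┌──────────────────────┼───────────────┐
           Aurora Serverless v2    ElastiCache Redis    S3 (uploads)
```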
Why ECS Fargate:

- No EC2 instances to provision, patch, or manage; you define tasks and AWS runs them
- Per-second billing for the vCPU and memory each task actually uses
- Built-in autoscaling and native ALB integration
- Task-level IAM roles, which we use later for scoped S3 access
Cost comparison: with per-second, per-task billing there is no idle EC2 capacity to pay for; see the monthly estimates table below for concrete numbers.
Why Aurora Serverless v2:

- Capacity scales automatically in fine-grained ACU increments as load changes
- You pay for the capacity actually consumed rather than a fixed instance size
- A good fit for workloads with idle periods and unpredictable spikes
Cost example: at an average of 0.5-4 ACUs, the database lands in roughly the $45-350/month band shown below, tracking real usage instead of peak provisioning.
Why ALB:

- Layer-7 routing, so a single load balancer can front both the frontend (port 3000) and backend (port 5000) services
- Health checks that automatically pull unhealthy tasks out of rotation
- TLS termination for HTTPS traffic
Monthly Estimates (us-east-1):

| Service | Configuration | Monthly Cost |
|---|---|---|
| ECS Fargate (Frontend) | 2-5 tasks, 0.25 vCPU, 0.5 GB | $15-40 |
| ECS Fargate (Backend) | 2-10 tasks, 1 vCPU, 2 GB | $60-300 |
| Aurora Serverless v2 | 0.5-4 ACUs average | $45-350 |
| ElastiCache Redis | cache.t3.micro | $15 |
| Application Load Balancer | Standard usage | $18 |
| CloudFront | 1TB transfer | $85 |
| S3 Storage | 100GB + requests | $5 |
| Total Estimated | Low-Medium Traffic | $243-813/month |
Terraform provides:

- Version-controlled, reviewable infrastructure
- Repeatable deployments across dev, staging, and prod
- A plan step that previews every change before it is applied
- Modules, which we use below to keep each piece of the stack reusable
First, let’s create our Terraform project structure:
```bash
# Create the project directory
mkdir caption-generator-infrastructure
cd caption-generator-infrastructure

# Create the directory structure
mkdir -p {modules/{networking,ecs,rds,elasticache,s3,cloudfront},environments/{dev,staging,prod}}

# Create main files
touch {main.tf,variables.tf,outputs.tf,terraform.tfvars}
```
Your structure should look like:
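```
caption-generator-infrastructure/
├── main.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars
├── modules/
│   ├── networking/
│   ├── ecs/
│   ├── rds/
│   ├── elasticache/
│   ├── s3/
│   └── cloudfront/
└── environments/
    ├── dev/
    ├── staging/
    └── prod/
```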
Before we start with Terraform, ensure you have:

- An AWS account with credentials allowed to create VPC, ECS, RDS, ElastiCache, S3, IAM, and DynamoDB resources
- The AWS CLI installed
- Terraform 1.0 or later (our `main.tf` pins `required_version = ">= 1.0"`)

First, configure the AWS CLI:
```bash
aws configure
# Enter your AWS Access Key ID
# Enter your AWS Secret Access Key
# Default region: us-east-1
# Default output format: json
```
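To confirm the credentials work, ask AWS who you are:

```bash
# Prints the account ID and IAM identity in use
aws sts get-caller-identity
```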
Next, install Terraform:

```bash
# On macOS
brew install terraform

# On Ubuntu/Debian
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform

# Verify installation
terraform --version
```
Then create an S3 bucket and a DynamoDB table for remote state storage and locking:

```bash
# Create bucket for state storage
aws s3 mb s3://your-terraform-state-bucket-unique-name --region us-east-1

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket your-terraform-state-bucket-unique-name \
  --versioning-configuration Status=Enabled

# Create DynamoDB table for state locking
aws dynamodb create-table \
  --table-name terraform-state-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-1
```
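State locking prevents two people (or two CI jobs) from running an apply at the same time; Terraform writes a lock item into this table for the duration of each run. You can confirm both pieces exist before moving on:

```bash
# head-bucket succeeds silently; describe-table should return "ACTIVE"
aws s3api head-bucket --bucket your-terraform-state-bucket-unique-name
aws dynamodb describe-table \
  --table-name terraform-state-locks \
  --query "Table.TableStatus"
```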
Create `main.tf`:

```hcl
# Terraform settings and remote state backend
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
# Backend configuration for state storage
backend "s3" {
bucket = "your-terraform-state-bucket-unique-name"
key = "caption-generator/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-locks"
}
}
# Configure the AWS Provider
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Project = "caption-generator"
Environment = var.environment
ManagedBy = "terraform"
}
}
}
# Data sources for availability zones
data "aws_availability_zones" "available" {
state = "available"
}
# Module declarations
module "networking" {
source = "./modules/networking"
project_name = var.project_name
environment = var.environment
vpc_cidr = var.vpc_cidr
availability_zones = slice(data.aws_availability_zones.available.names, 0, 3)
}
module "s3" {
source = "./modules/s3"
project_name = var.project_name
environment = var.environment
}
module "rds" {
source = "./modules/rds"
project_name = var.project_name
environment = var.environment
vpc_id = module.networking.vpc_id
private_subnets = module.networking.private_subnet_ids
depends_on = [module.networking]
}
module "elasticache" {
source = "./modules/elasticache"
project_name = var.project_name
environment = var.environment
vpc_id = module.networking.vpc_id
private_subnets = module.networking.private_subnet_ids
depends_on = [module.networking]
}
```
Create `variables.tf`:

```hcl
variable "aws_region" {
description = "AWS region for resources"
type = string
default = "us-east-1"
}
variable "project_name" {
description = "Name of the project"
type = string
default = "caption-generator"
}
variable "environment" {
description = "Environment name (dev, staging, prod)"
type = string
}
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
default = "10.0.0.0/16"
}
variable "domain_name" {
description = "Domain name for the application"
type = string
default = ""
}
```
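Since environment has no default, Terraform prompts for it unless it comes from terraform.tfvars or a -var flag. Optionally, a validation block (an addition not in the original file) catches typos before they reach AWS:

```hcl
variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string

  # Reject anything outside the three known environments
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```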
Create `terraform.tfvars`:

```hcl
aws_region = "us-east-1"
project_name = "caption-generator"
environment = "dev"
vpc_cidr = "10.0.0.0/16"
domain_name = "your-domain.com" # Optional
```
Create `modules/networking/main.tf`:

```hcl
# VPC
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.project_name}-${var.environment}-vpc"
}
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.project_name}-${var.environment}-igw"
}
}
# Public Subnets (for Load Balancer)
resource "aws_subnet" "public" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.project_name}-${var.environment}-public-${count.index + 1}"
Type = "public"
}
}
# Private Subnets (for ECS tasks and databases)
resource "aws_subnet" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.project_name}-${var.environment}-private-${count.index + 1}"
Type = "private"
}
}
# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
count = length(var.availability_zones)
domain = "vpc"
tags = {
Name = "${var.project_name}-${var.environment}-nat-eip-${count.index + 1}"
}
depends_on = [aws_internet_gateway.main]
}
# NAT Gateways (for private subnet internet access)
resource "aws_nat_gateway" "main" {
count = length(var.availability_zones)
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = {
Name = "${var.project_name}-${var.environment}-nat-${count.index + 1}"
}
depends_on = [aws_internet_gateway.main]
}
# Route Tables
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${var.project_name}-${var.environment}-public-rt"
}
}
resource "aws_route_table" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main[count.index].id
}
tags = {
Name = "${var.project_name}-${var.environment}-private-rt-${count.index + 1}"
}
}
# Route Table Associations
resource "aws_route_table_association" "public" {
count = length(aws_subnet.public)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = length(aws_subnet.private)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}
# Security Groups
resource "aws_security_group" "alb" {
name_prefix = "${var.project_name}-${var.environment}-alb-"
vpc_id = aws_vpc.main.id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-${var.environment}-alb-sg"
}
}
resource "aws_security_group" "ecs_tasks" {
name_prefix = "${var.project_name}-${var.environment}-ecs-tasks-"
vpc_id = aws_vpc.main.id
ingress {
from_port = 3000
to_port = 3000
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
ingress {
from_port = 5000
to_port = 5000
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-${var.environment}-ecs-tasks-sg"
}
}
```
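The cidrsubnet() calls carve /24 subnets out of the /16 VPC: public subnets use offsets 0-2 and private subnets use offsets 10-12, so the two ranges never collide. You can check the math in terraform console:

```bash
$ terraform console
> cidrsubnet("10.0.0.0/16", 8, 0)
"10.0.0.0/24"
> cidrsubnet("10.0.0.0/16", 8, 10)
"10.0.10.0/24"
```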
Create `modules/networking/variables.tf`:

```hcl
variable "project_name" {
description = "Name of the project"
type = string
}
variable "environment" {
description = "Environment name"
type = string
}
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
}
variable "availability_zones" {
description = "List of availability zones"
type = list(string)
}
```
Create `modules/networking/outputs.tf`:

```hcl
output "vpc_id" {
description = "ID of the VPC"
value = aws_vpc.main.id
}
output "public_subnet_ids" {
description = "IDs of the public subnets"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "IDs of the private subnets"
value = aws_subnet.private[*].id
}
output "alb_security_group_id" {
description = "ID of the ALB security group"
value = aws_security_group.alb.id
}
output "ecs_security_group_id" {
description = "ID of the ECS security group"
value = aws_security_group.ecs_tasks.id
}
```
Create `modules/s3/main.tf`:

```hcl
# S3 bucket for file uploads (temporary storage)
resource "aws_s3_bucket" "uploads" {
bucket = "${var.project_name}-${var.environment}-uploads-${random_id.bucket_suffix.hex}"
}
# Random suffix to ensure bucket name uniqueness
resource "random_id" "bucket_suffix" {
byte_length = 4
}
# S3 bucket versioning
resource "aws_s3_bucket_versioning" "uploads" {
bucket = aws_s3_bucket.uploads.id
versioning_configuration {
status = "Enabled"
}
}
# S3 bucket encryption
resource "aws_s3_bucket_server_side_encryption_configuration" "uploads" {
bucket = aws_s3_bucket.uploads.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
# S3 bucket public access block (security)
resource "aws_s3_bucket_public_access_block" "uploads" {
bucket = aws_s3_bucket.uploads.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
# S3 lifecycle configuration for automatic cleanup
resource "aws_s3_bucket_lifecycle_configuration" "uploads" {
bucket = aws_s3_bucket.uploads.id
rule {
id = "cleanup_temp_files"
status = "Enabled"
# Delete temporary upload files after 1 day
expiration {
days = 1
}
# Delete incomplete multipart uploads after 1 day
abort_incomplete_multipart_upload {
days_after_initiation = 1
}
}
}
# CORS configuration for frontend uploads
resource "aws_s3_bucket_cors_configuration" "uploads" {
bucket = aws_s3_bucket.uploads.id
cors_rule {
allowed_headers = ["*"]
allowed_methods = ["GET", "PUT", "POST", "DELETE"]
allowed_origins = ["http://localhost:3000"] # Update for production
expose_headers = ["ETag"]
max_age_seconds = 3000
}
}
# IAM role for ECS tasks to access S3
resource "aws_iam_role" "ecs_s3_access" {
name = "${var.project_name}-${var.environment}-ecs-s3-access"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ecs-tasks.amazonaws.com"
}
}
]
})
}
# IAM policy for S3 access
resource "aws_iam_policy" "s3_access" {
name = "${var.project_name}-${var.environment}-s3-access"
description = "IAM policy for S3 access from ECS tasks"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
]
Resource = [
"${aws_s3_bucket.uploads.arn}/*"
]
},
{
Effect = "Allow"
Action = [
"s3:ListBucket"
]
Resource = [
aws_s3_bucket.uploads.arn
]
}
]
})
}
# Attach policy to role
resource "aws_iam_role_policy_attachment" "ecs_s3_access" {
role = aws_iam_role.ecs_s3_access.name
policy_arn = aws_iam_policy.s3_access.arn
}
```
Create the corresponding variables and outputs files for the S3 module; a minimal version of each is sketched below.
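These files only need to accept the naming inputs the module already uses and expose values other modules will consume. Exactly which outputs you expose is a design choice; this sketch covers the obvious ones:

```hcl
# modules/s3/variables.tf
variable "project_name" {
  description = "Name of the project"
  type        = string
}

variable "environment" {
  description = "Environment name"
  type        = string
}

# modules/s3/outputs.tf
output "bucket_name" {
  description = "Name of the uploads bucket"
  value       = aws_s3_bucket.uploads.bucket
}

output "bucket_arn" {
  description = "ARN of the uploads bucket"
  value       = aws_s3_bucket.uploads.arn
}

output "ecs_s3_access_role_arn" {
  description = "ARN of the IAM role ECS tasks assume for S3 access"
  value       = aws_iam_role.ecs_s3_access.arn
}
```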
When handling file uploads for AI processing, security is paramount. Here's how the module above implements it:

- Uploads land in a private bucket: all four public access blocks are enabled
- Objects are encrypted at rest with AES256
- Files are temporary by design: the lifecycle rule deletes them (and incomplete multipart uploads) after one day
- ECS tasks get a narrowly scoped IAM role that can only touch this one bucket
- CORS restricts browser uploads to known origins
The other half is secrets management. Store sensitive information such as API keys and the database password in SSM Parameter Store as SecureStrings rather than in Terraform files:
```bash
# Store Google API key
aws ssm put-parameter \
  --name "/caption-generator/dev/google-api-key" \
  --value "your-api-key-here" \
  --type "SecureString" \
  --description "Google API key for Gemini"

# Store database password
aws ssm put-parameter \
  --name "/caption-generator/dev/db-password" \
  --value "$(openssl rand -base64 32)" \
  --type "SecureString" \
  --description "Database password"
```
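The application's ECS tasks can read these values at container start. As a preview of Part 2, an ECS container definition injects an SSM parameter through its secrets list; the ARN below is a placeholder, so substitute your own account ID and parameter names:

```hcl
# Hypothetical fragment of an ECS container definition (covered in Part 2).
# "secrets" injects the decrypted parameter as an environment variable.
secrets = [
  {
    name      = "GOOGLE_API_KEY"
    valueFrom = "arn:aws:ssm:us-east-1:123456789012:parameter/caption-generator/dev/google-api-key"
  }
]
```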
With the foundation modules in place, initialize and deploy:

```bash
cd caption-generator-infrastructure

# Download providers and configure the S3 backend
terraform init

# Check syntax and internal consistency
terraform validate

# Preview the changes, then create the resources
terraform plan -var="environment=dev"
terraform apply -var="environment=dev"
```
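The root outputs.tf we touched earlier is still empty. If you re-export the module outputs there, terraform output hands you the IDs you'll need in Part 2. A minimal sketch, assuming the S3 module outputs shown above:

```hcl
# outputs.tf (root module): re-export what later steps will need
output "vpc_id" {
  value = module.networking.vpc_id
}

output "private_subnet_ids" {
  value = module.networking.private_subnet_ids
}

output "uploads_bucket" {
  value = module.s3.bucket_name
}
```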
Estimated deployment time: 15-20 minutes. Expected costs for the dev environment: $50-100/month.
In Part 2, we'll cover the pieces this foundation is waiting for: the ECS and CloudFront modules, container builds for the frontend and backend, and the CI/CD pipeline that automates deployments.
This foundation provides a secure, scalable infrastructure for your AI application. The modular structure makes it easy to replicate across environments and scale as needed.
Next: Part 2 - Complete DevOps Pipeline for AI Applications →