Deploying AI-Powered Applications on AWS: Architecture & Infrastructure Setup (Part 1)

Learn how to deploy a complete AI-powered caption generator application on AWS using modern infrastructure practices. Part 1 covers architecture design, AWS service selection, and Terraform infrastructure setup.


Building and deploying AI applications in the cloud requires careful planning, especially when dealing with file uploads, AI processing, and scalable architecture. In this comprehensive two-part series, I’ll walk you through deploying a real-world AI application - a caption generator that processes images and videos using Google’s Gemini AI and OpenAI Whisper.

What We’re Building

Our application consists of:

  • Backend: Python Flask API that processes images/videos and generates captions using AI
  • Frontend: Next.js React application with drag-and-drop file upload
  • AI Processing: Google Gemini for image analysis, OpenAI Whisper for video transcription

Application Architecture Overview


The frontend allows users to upload images or videos, customize caption tone and length, and receive AI-generated captions. The backend handles file processing, AI integration, and response formatting.

Why AWS for AI Applications?

AWS provides several advantages for AI workloads:

  1. Scalable Compute: ECS Fargate auto-scales based on demand
  2. Global Reach: CloudFront reduces latency for file uploads
  3. AI Integration: Easy integration with various AI services
  4. Cost Optimization: Pay-per-use pricing for variable workloads
  5. Security: Comprehensive security services and compliance

AWS Architecture Design

High-Level Architecture

Our production architecture leverages multiple AWS services:

AWS Infrastructure Architecture

Service Selection Rationale

1. Amazon ECS with Fargate vs. EKS

Why ECS Fargate:

  • Serverless: No EC2 instances to manage
  • Cost-Effective: Pay only for running tasks
  • Simpler: Less operational overhead than Kubernetes
  • Quick Setup: Faster deployment for smaller teams

Cost Comparison:

  • ECS Fargate: ~$0.04048 per vCPU per hour + $0.004445 per GB per hour
  • EKS: $0.10 per hour for control plane + EC2 costs
  • Verdict: ECS saves ~60% for our use case

2. Aurora Serverless v2 vs. RDS

Why Aurora Serverless v2:

  • Auto-scaling: Scales from 0.5 to 128 ACUs automatically
  • Cost-Effective: Pay only for actual usage
  • High Availability: Multi-AZ by default
  • Performance: Better for variable AI workloads

Cost Example:

  • Aurora Serverless v2: $0.12 per ACU-hour (minimum 0.5 ACU = $43.2/month)
  • RDS db.t3.micro: $0.017 per hour = $12.24/month
  • Trade-off: Higher minimum cost but better scalability
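As a rough sanity check on those figures (assuming a 720-hour month and the on-demand prices quoted above), the math works out like this:

```python
# Rough monthly cost check, assuming a 720-hour month and the
# on-demand prices quoted above (us-east-1).
ACU_PRICE = 0.12   # $ per ACU-hour, Aurora Serverless v2
RDS_PRICE = 0.017  # $ per hour, RDS db.t3.micro
HOURS = 720

# Aurora's floor: 0.5 ACU is always provisioned, even when idle.
aurora_min = 0.5 * ACU_PRICE * HOURS
rds_micro = RDS_PRICE * HOURS

print(f"Aurora Serverless v2 floor: ${aurora_min:.2f}/month")
print(f"RDS db.t3.micro:            ${rds_micro:.2f}/month")
```

The gap is the price of elasticity: Aurora's floor is higher, but it can absorb AI-driven traffic spikes without a manual instance resize.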

3. Application Load Balancer vs. Network Load Balancer

Why ALB:

  • Layer 7 Routing: Path-based routing (/api/* → backend)
  • SSL Termination: Handles certificates automatically
  • Health Checks: Application-level health monitoring
  • WebSocket Support: For real-time features
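The path-based routing bullet above can be sketched as a listener rule. This is a minimal illustration, assuming an HTTPS listener and two target groups (`aws_lb_listener.https`, `aws_lb_target_group.backend`) defined in the ECS module covered in Part 2:

```
# Sketch: forward /api/* to the backend target group; everything
# else falls through to the listener's default action (frontend).
resource "aws_lb_listener_rule" "api" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.backend.arn
  }

  condition {
    path_pattern {
      values = ["/api/*"]
    }
  }
}
```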

Cost Breakdown Analysis

Monthly Estimates (us-east-1):

| Service | Configuration | Monthly Cost |
|---|---|---|
| ECS Fargate (Frontend) | 2-5 tasks, 0.25 vCPU, 0.5 GB | $15-40 |
| ECS Fargate (Backend) | 2-10 tasks, 1 vCPU, 2 GB | $60-300 |
| Aurora Serverless v2 | 0.5-4 ACUs average | $45-350 |
| ElastiCache Redis | cache.t3.micro | $15 |
| Application Load Balancer | Standard usage | $18 |
| CloudFront | 1 TB transfer | $85 |
| S3 Storage | 100 GB + requests | $5 |
| **Total Estimated** | Low-Medium Traffic | **$243-813/month** |

Infrastructure as Code with Terraform

Why Terraform?

Terraform provides:

  • Version Control: Infrastructure changes tracked in Git
  • Reproducibility: Identical environments across dev/staging/prod
  • State Management: Tracks resource dependencies
  • Multi-Cloud: Can extend to other providers if needed

Project Structure Setup

First, let’s create our Terraform project structure:

# Create the project directory
mkdir caption-generator-infrastructure
cd caption-generator-infrastructure

# Create the directory structure
mkdir -p {modules/{networking,ecs,rds,elasticache,s3,cloudfront},environments/{dev,staging,prod}}

# Create main files
touch {main.tf,variables.tf,outputs.tf,terraform.tfvars}

Your structure should look like:

Terraform Project Structure
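In plain text, the layout produced by the commands above is:

```
caption-generator-infrastructure/
├── main.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars
├── modules/
│   ├── networking/
│   ├── ecs/
│   ├── rds/
│   ├── elasticache/
│   ├── s3/
│   └── cloudfront/
└── environments/
    ├── dev/
    ├── staging/
    └── prod/
```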

Step 1: AWS Setup and Prerequisites

Before we start with Terraform, ensure you have:

  1. AWS CLI configured:
aws configure
# Enter your AWS Access Key ID
# Enter your AWS Secret Access Key
# Default region: us-east-1
# Default output format: json
  2. Terraform installed:
# On macOS
brew install terraform

# On Ubuntu/Debian
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform

# Verify installation
terraform --version
  3. Create S3 bucket for Terraform state (one-time setup):
# Create bucket for state storage
aws s3 mb s3://your-terraform-state-bucket-unique-name --region us-east-1

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket your-terraform-state-bucket-unique-name \
  --versioning-configuration Status=Enabled

# Create DynamoDB table for state locking
aws dynamodb create-table \
  --table-name terraform-state-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-1

Step 2: Provider Configuration

Create main.tf:

# Configure the AWS Provider
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Backend configuration for state storage
  backend "s3" {
    bucket         = "your-terraform-state-bucket-unique-name"
    key            = "caption-generator/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}

# Configure the AWS Provider
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Project     = "caption-generator"
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }
}

# Data sources for availability zones
data "aws_availability_zones" "available" {
  state = "available"
}

# Module declarations
module "networking" {
  source = "./modules/networking"

  project_name        = var.project_name
  environment         = var.environment
  vpc_cidr           = var.vpc_cidr
  availability_zones = slice(data.aws_availability_zones.available.names, 0, 3)
}

module "s3" {
  source = "./modules/s3"

  project_name = var.project_name
  environment  = var.environment
}

module "rds" {
  source = "./modules/rds"

  project_name    = var.project_name
  environment     = var.environment
  vpc_id          = module.networking.vpc_id
  private_subnets = module.networking.private_subnet_ids

  depends_on = [module.networking]
}

module "elasticache" {
  source = "./modules/elasticache"

  project_name    = var.project_name
  environment     = var.environment
  vpc_id          = module.networking.vpc_id
  private_subnets = module.networking.private_subnet_ids

  depends_on = [module.networking]
}

Create variables.tf:

variable "aws_region" {
  description = "AWS region for resources"
  type        = string
  default     = "us-east-1"
}

variable "project_name" {
  description = "Name of the project"
  type        = string
  default     = "caption-generator"
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "domain_name" {
  description = "Domain name for the application"
  type        = string
  default     = ""
}

Create terraform.tfvars:

aws_region   = "us-east-1"
project_name = "caption-generator"
environment  = "dev"
vpc_cidr     = "10.0.0.0/16"
domain_name  = "your-domain.com"  # Optional

Step 3: Networking Module (VPC, Subnets, Security Groups)

Create modules/networking/main.tf:

# VPC
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-${var.environment}-vpc"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-${var.environment}-igw"
  }
}

# Public Subnets (for Load Balancer)
resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-${var.environment}-public-${count.index + 1}"
    Type = "public"
  }
}

# Private Subnets (for ECS tasks and databases)
resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.project_name}-${var.environment}-private-${count.index + 1}"
    Type = "private"
  }
}

# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
  count = length(var.availability_zones)

  domain = "vpc"

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-eip-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.main]
}

# NAT Gateways (for private subnet internet access)
resource "aws_nat_gateway" "main" {
  count = length(var.availability_zones)

  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.main]
}

# Route Tables
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-public-rt"
  }
}

resource "aws_route_table" "private" {
  count = length(var.availability_zones)

  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-private-rt-${count.index + 1}"
  }
}

# Route Table Associations
resource "aws_route_table_association" "public" {
  count = length(aws_subnet.public)

  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count = length(aws_subnet.private)

  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# Security Groups
resource "aws_security_group" "alb" {
  name_prefix = "${var.project_name}-${var.environment}-alb-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-alb-sg"
  }
}

resource "aws_security_group" "ecs_tasks" {
  name_prefix = "${var.project_name}-${var.environment}-ecs-tasks-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 3000
    to_port         = 3000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  ingress {
    from_port       = 5000
    to_port         = 5000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-ecs-tasks-sg"
  }
}
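The `cidrsubnet()` calls above carve /24 networks out of the /16, with the private subnets offset by 10 to avoid colliding with the public ones. A quick way to check the math, using Python's `ipaddress` module to mirror Terraform's function:

```python
import ipaddress

def cidrsubnet(prefix, newbits, netnum):
    """Mirror Terraform's cidrsubnet(): extend the prefix by
    `newbits` bits and return the `netnum`-th resulting subnet."""
    net = ipaddress.ip_network(prefix)
    return str(list(net.subnets(prefixlen_diff=newbits))[netnum])

# Public subnets: offsets 0..2
print(cidrsubnet("10.0.0.0/16", 8, 0))   # 10.0.0.0/24
# Private subnets: offsets 10..12
print(cidrsubnet("10.0.0.0/16", 8, 10))  # 10.0.10.0/24
```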

Create modules/networking/variables.tf:

variable "project_name" {
  description = "Name of the project"
  type        = string
}

variable "environment" {
  description = "Environment name"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
}

Create modules/networking/outputs.tf:

output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "IDs of the public subnets"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets"
  value       = aws_subnet.private[*].id
}

output "alb_security_group_id" {
  description = "ID of the ALB security group"
  value       = aws_security_group.alb.id
}

output "ecs_security_group_id" {
  description = "ID of the ECS security group"
  value       = aws_security_group.ecs_tasks.id
}

Step 4: S3 Module for File Storage

Create modules/s3/main.tf:

# S3 bucket for file uploads (temporary storage)
resource "aws_s3_bucket" "uploads" {
  bucket = "${var.project_name}-${var.environment}-uploads-${random_id.bucket_suffix.hex}"
}

# Random suffix to ensure bucket name uniqueness
resource "random_id" "bucket_suffix" {
  byte_length = 4
}

# S3 bucket versioning
resource "aws_s3_bucket_versioning" "uploads" {
  bucket = aws_s3_bucket.uploads.id
  versioning_configuration {
    status = "Enabled"
  }
}

# S3 bucket encryption
resource "aws_s3_bucket_server_side_encryption_configuration" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# S3 bucket public access block (security)
resource "aws_s3_bucket_public_access_block" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# S3 lifecycle configuration for automatic cleanup
resource "aws_s3_bucket_lifecycle_configuration" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  rule {
    id     = "cleanup_temp_files"
    status = "Enabled"

    # Delete temporary upload files after 1 day
    expiration {
      days = 1
    }

    # Delete incomplete multipart uploads after 1 day
    abort_incomplete_multipart_upload {
      days_after_initiation = 1
    }
  }
}

# CORS configuration for frontend uploads
resource "aws_s3_bucket_cors_configuration" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  cors_rule {
    allowed_headers = ["*"]
    allowed_methods = ["GET", "PUT", "POST", "DELETE"]
    allowed_origins = ["http://localhost:3000"] # Update for production
    expose_headers  = ["ETag"]
    max_age_seconds = 3000
  }
}

# IAM role for ECS tasks to access S3
resource "aws_iam_role" "ecs_s3_access" {
  name = "${var.project_name}-${var.environment}-ecs-s3-access"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })
}

# IAM policy for S3 access
resource "aws_iam_policy" "s3_access" {
  name        = "${var.project_name}-${var.environment}-s3-access"
  description = "IAM policy for S3 access from ECS tasks"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject"
        ]
        Resource = [
          "${aws_s3_bucket.uploads.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "s3:ListBucket"
        ]
        Resource = [
          aws_s3_bucket.uploads.arn
        ]
      }
    ]
  })
}

# Attach policy to role
resource "aws_iam_role_policy_attachment" "ecs_s3_access" {
  role       = aws_iam_role.ecs_s3_access.name
  policy_arn = aws_iam_policy.s3_access.arn
}
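The CORS rule above exists so the browser can PUT files directly to S3. One common pattern (a sketch, not the app's confirmed flow) is for the Flask backend to hand out a short-lived presigned URL; the bucket and key below are illustrative, and the client is injectable for testing:

```python
def presigned_upload_url(bucket, key, expires=300, s3=None):
    """Return a time-limited URL the frontend can PUT a file to.

    `s3` is injectable for testing; by default a real boto3 client
    is created (which requires AWS credentials).
    """
    if s3 is None:
        import boto3  # imported lazily so tests can stub the client
        s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )
```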

Create the corresponding variables and outputs files for the S3 module.
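A minimal version of those two files might look like this; the outputs shown are the values later modules are most likely to need, so adjust as required:

```
# modules/s3/variables.tf
variable "project_name" {
  description = "Name of the project"
  type        = string
}

variable "environment" {
  description = "Environment name"
  type        = string
}

# modules/s3/outputs.tf
output "uploads_bucket_name" {
  description = "Name of the uploads bucket"
  value       = aws_s3_bucket.uploads.bucket
}

output "uploads_bucket_arn" {
  description = "ARN of the uploads bucket"
  value       = aws_s3_bucket.uploads.arn
}

output "ecs_s3_access_role_arn" {
  description = "ARN of the IAM role granting ECS tasks S3 access"
  value       = aws_iam_role.ecs_s3_access.arn
}
```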

Security Implementation

File Upload Security for AI Applications

When handling file uploads for AI processing, security is paramount. Here’s how we implement it:

  1. File Size Limits: Prevent large files from overwhelming the system
  2. File Type Validation: Only allow specific image/video formats
  3. Virus Scanning: Integrate with AWS S3 virus scanning
  4. Temporary Storage: Auto-delete files after processing
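Items 1 and 2 can be enforced before a single byte reaches the AI pipeline. A sketch of the checks the Flask backend might run; the size cap and extension list here are illustrative, not values from the application:

```python
# Illustrative limits -- tune for your workload.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50 MB cap
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "mp4", "mov"}

def validate_upload(filename, size_bytes):
    """Return (ok, reason); reject oversized or unexpected files."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: .{ext or '?'}"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds 50 MB limit"
    return True, "ok"
```

Extension checks are a first filter only; for defense in depth, also verify content type server-side (e.g. by sniffing magic bytes) before handing the file to the AI services.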

Secrets Management

Store sensitive information securely:

# Store Google API key
aws ssm put-parameter \
  --name "/caption-generator/dev/google-api-key" \
  --value "your-api-key-here" \
  --type "SecureString" \
  --description "Google API key for Gemini"

# Store database password
aws ssm put-parameter \
  --name "/caption-generator/dev/db-password" \
  --value "$(openssl rand -base64 32)" \
  --type "SecureString" \
  --description "Database password"
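At runtime the backend reads these back with `GetParameter` and decryption enabled. A sketch of that lookup; the parameter name matches the one stored above, and the SSM client is injectable for testing:

```python
def get_secret(name, ssm=None):
    """Fetch and decrypt a SecureString parameter from SSM."""
    if ssm is None:
        import boto3  # imported lazily so tests can stub the client
        ssm = boto3.client("ssm")
    resp = ssm.get_parameter(Name=name, WithDecryption=True)
    return resp["Parameter"]["Value"]

# e.g. google_key = get_secret("/caption-generator/dev/google-api-key")
```

The ECS task role will also need `ssm:GetParameter` permission on these paths, which we wire up alongside the task definitions in Part 2.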

Deploying Your Infrastructure

Step-by-Step Deployment

  1. Initialize Terraform:
cd caption-generator-infrastructure
terraform init
  2. Validate Configuration:
terraform validate
  3. Plan Deployment:
terraform plan -var="environment=dev"
  4. Apply Infrastructure:
terraform apply -var="environment=dev"

Estimated deployment time: 15-20 minutes
Expected costs for dev environment: $50-100/month

What’s Next?

In Part 2, we’ll cover:

  • Complete ECS service setup with Docker containers
  • CI/CD pipeline with GitHub Actions
  • Monitoring and logging implementation
  • Blue/Green deployment strategies
  • Production optimization techniques

This foundation provides a secure, scalable infrastructure for your AI application. The modular structure makes it easy to replicate across environments and scale as needed.


Next: Part 2 - Complete DevOps Pipeline for AI Applications →
