Deploying AI-Powered Applications on AWS: Architecture & Infrastructure Setup (Part 1)

Learn how to deploy a complete AI-powered caption generator application on AWS using modern infrastructure practices. Part 1 covers architecture design, AWS service selection, and Terraform infrastructure setup.


Building and deploying AI applications in the cloud requires careful planning, especially when dealing with file uploads, AI processing, and scalable architecture. In this comprehensive two-part series, I’ll walk you through deploying a real-world AI application - a caption generator that processes images and videos using Google’s Gemini AI and OpenAI Whisper.

What We’re Building

Our application consists of:

  • Backend: Python Flask API that processes images/videos and generates captions using AI
  • Frontend: Next.js React application with drag-and-drop file upload
  • AI Processing: Google Gemini for image analysis, OpenAI Whisper for video transcription

Application Architecture Overview


The frontend allows users to upload images or videos, customize caption tone and length, and receive AI-generated captions. The backend handles file processing, AI integration, and response formatting.

Why AWS for AI Applications?

AWS provides several advantages for AI workloads:

  1. Scalable Compute: ECS Fargate auto-scales based on demand
  2. Global Reach: CloudFront reduces latency for file uploads
  3. AI Integration: Easy integration with various AI services
  4. Cost Optimization: Pay-per-use pricing for variable workloads
  5. Security: Comprehensive security services and compliance

AWS Architecture Design

High-Level Architecture

Our production architecture leverages multiple AWS services:

AWS Infrastructure Architecture

Service Selection Rationale

1. Amazon ECS with Fargate vs. EKS

Why ECS Fargate:

  • Serverless: No EC2 instances to manage
  • Cost-Effective: Pay only for running tasks
  • Simpler: Less operational overhead than Kubernetes
  • Quick Setup: Faster deployment for smaller teams

Cost Comparison:

  • ECS Fargate: ~$0.04048 per vCPU per hour + $0.004445 per GB per hour
  • EKS: $0.10 per hour for control plane + EC2 costs
  • Verdict: ECS saves ~60% for our use case

2. Aurora Serverless v2 vs. RDS

Why Aurora Serverless v2:

  • Auto-scaling: Scales from 0.5 to 128 ACUs automatically
  • Cost-Effective: Pay only for actual usage
  • High Availability: Multi-AZ by default
  • Performance: Better for variable AI workloads

Cost Example:

  • Aurora Serverless v2: $0.12 per ACU-hour (minimum 0.5 ACU = $43.2/month)
  • RDS db.t3.micro: $0.017 per hour = $12.24/month
  • Trade-off: Higher minimum cost but better scalability
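As a rough sanity check on those figures (assuming a 720-hour month and the on-demand prices quoted above), the math works out like this:

```python
# Rough monthly cost check, assuming a 720-hour month and the
# on-demand prices quoted above (us-east-1).
ACU_PRICE = 0.12   # $ per ACU-hour, Aurora Serverless v2
RDS_PRICE = 0.017  # $ per hour, RDS db.t3.micro
HOURS = 720

# Aurora's floor: 0.5 ACU is always provisioned, even when idle.
aurora_min = 0.5 * ACU_PRICE * HOURS
rds_micro = RDS_PRICE * HOURS

print(f"Aurora Serverless v2 floor: ${aurora_min:.2f}/month")
print(f"RDS db.t3.micro:            ${rds_micro:.2f}/month")
```

The gap is the price of elasticity: Aurora's floor is higher, but it can absorb AI-driven traffic spikes without a manual instance resize.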

3. Application Load Balancer vs. Network Load Balancer

Why ALB:

  • Layer 7 Routing: Path-based routing (/api/* → backend)
  • SSL Termination: Handles certificates automatically
  • Health Checks: Application-level health monitoring
  • WebSocket Support: For real-time features
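The path-based routing bullet above can be sketched as a listener rule. This is a minimal illustration, assuming an HTTPS listener and two target groups (`aws_lb_listener.https`, `aws_lb_target_group.backend`) defined in the ECS module covered in Part 2:

```
# Sketch: forward /api/* to the backend target group; everything
# else falls through to the listener's default action (frontend).
resource "aws_lb_listener_rule" "api" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 100

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.backend.arn
  }

  condition {
    path_pattern {
      values = ["/api/*"]
    }
  }
}
```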

Cost Breakdown Analysis

Monthly Estimates (us-east-1):

| Service | Configuration | Monthly Cost |
|---|---|---|
| ECS Fargate (Frontend) | 2-5 tasks, 0.25 vCPU, 0.5 GB | $15-40 |
| ECS Fargate (Backend) | 2-10 tasks, 1 vCPU, 2 GB | $60-300 |
| Aurora Serverless v2 | 0.5-4 ACUs average | $45-350 |
| ElastiCache Redis | cache.t3.micro | $15 |
| Application Load Balancer | Standard usage | $18 |
| CloudFront | 1 TB transfer | $85 |
| S3 Storage | 100 GB + requests | $5 |
| **Total Estimated** | Low-Medium Traffic | **$243-813/month** |

Infrastructure as Code with Terraform

Why Terraform?

Terraform provides:

  • Version Control: Infrastructure changes tracked in Git
  • Reproducibility: Identical environments across dev/staging/prod
  • State Management: Tracks resource dependencies
  • Multi-Cloud: Can extend to other providers if needed

Project Structure Setup

First, let’s create our Terraform project structure:

# Create the project directory
mkdir caption-generator-infrastructure
cd caption-generator-infrastructure

# Create the directory structure
mkdir -p {modules/{networking,ecs,rds,elasticache,s3,cloudfront},environments/{dev,staging,prod}}

# Create main files
touch {main.tf,variables.tf,outputs.tf,terraform.tfvars}

Your structure should look like:

Terraform Project Structure
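In plain text, the layout produced by the commands above is:

```
caption-generator-infrastructure/
├── main.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars
├── modules/
│   ├── networking/
│   ├── ecs/
│   ├── rds/
│   ├── elasticache/
│   ├── s3/
│   └── cloudfront/
└── environments/
    ├── dev/
    ├── staging/
    └── prod/
```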

Step 1: AWS Setup and Prerequisites

Before we start with Terraform, ensure you have:

  1. AWS CLI configured:
aws configure
# Enter your AWS Access Key ID
# Enter your AWS Secret Access Key
# Default region: us-east-1
# Default output format: json
  2. Terraform installed:
# On macOS
brew install terraform

# On Ubuntu/Debian
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform

# Verify installation
terraform --version
  3. Create S3 bucket for Terraform state (one-time setup):
# Create bucket for state storage
aws s3 mb s3://your-terraform-state-bucket-unique-name --region us-east-1

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket your-terraform-state-bucket-unique-name \
  --versioning-configuration Status=Enabled

# Create DynamoDB table for state locking
aws dynamodb create-table \
  --table-name terraform-state-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-1

Step 2: Provider Configuration

Create main.tf:

# Configure the AWS Provider
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  # Backend configuration for state storage
  backend "s3" {
    bucket         = "your-terraform-state-bucket-unique-name"
    key            = "caption-generator/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}

# Configure the AWS Provider
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Project     = "caption-generator"
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }
}

# Data sources for availability zones
data "aws_availability_zones" "available" {
  state = "available"
}

# Module declarations
module "networking" {
  source = "./modules/networking"

  project_name        = var.project_name
  environment         = var.environment
  vpc_cidr           = var.vpc_cidr
  availability_zones = slice(data.aws_availability_zones.available.names, 0, 3)
}

module "s3" {
  source = "./modules/s3"

  project_name = var.project_name
  environment  = var.environment
}

module "rds" {
  source = "./modules/rds"

  project_name    = var.project_name
  environment     = var.environment
  vpc_id          = module.networking.vpc_id
  private_subnets = module.networking.private_subnet_ids

  depends_on = [module.networking]
}

module "elasticache" {
  source = "./modules/elasticache"

  project_name    = var.project_name
  environment     = var.environment
  vpc_id          = module.networking.vpc_id
  private_subnets = module.networking.private_subnet_ids

  depends_on = [module.networking]
}

Create variables.tf:

variable "aws_region" {
  description = "AWS region for resources"
  type        = string
  default     = "us-east-1"
}

variable "project_name" {
  description = "Name of the project"
  type        = string
  default     = "caption-generator"
}

variable "environment" {
  description = "Environment name (dev, staging, prod)"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "domain_name" {
  description = "Domain name for the application"
  type        = string
  default     = ""
}

Create terraform.tfvars:

aws_region   = "us-east-1"
project_name = "caption-generator"
environment  = "dev"
vpc_cidr     = "10.0.0.0/16"
domain_name  = "your-domain.com"  # Optional

Step 3: Networking Module (VPC, Subnets, Security Groups)

Create modules/networking/main.tf:

# VPC
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-${var.environment}-vpc"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-${var.environment}-igw"
  }
}

# Public Subnets (for Load Balancer)
resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "${var.project_name}-${var.environment}-public-${count.index + 1}"
    Type = "public"
  }
}

# Private Subnets (for ECS tasks and databases)
resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "${var.project_name}-${var.environment}-private-${count.index + 1}"
    Type = "private"
  }
}

# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
  count = length(var.availability_zones)

  domain = "vpc"

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-eip-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.main]
}

# NAT Gateways (for private subnet internet access)
resource "aws_nat_gateway" "main" {
  count = length(var.availability_zones)

  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "${var.project_name}-${var.environment}-nat-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.main]
}

# Route Tables
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-public-rt"
  }
}

resource "aws_route_table" "private" {
  count = length(var.availability_zones)

  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-private-rt-${count.index + 1}"
  }
}

# Route Table Associations
resource "aws_route_table_association" "public" {
  count = length(aws_subnet.public)

  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count = length(aws_subnet.private)

  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# Security Groups
resource "aws_security_group" "alb" {
  name_prefix = "${var.project_name}-${var.environment}-alb-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-alb-sg"
  }
}

resource "aws_security_group" "ecs_tasks" {
  name_prefix = "${var.project_name}-${var.environment}-ecs-tasks-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 3000
    to_port         = 3000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  ingress {
    from_port       = 5000
    to_port         = 5000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "${var.project_name}-${var.environment}-ecs-tasks-sg"
  }
}
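The `cidrsubnet()` calls above carve /24 networks out of the /16, with the private subnets offset by 10 to avoid colliding with the public ones. A quick way to check the math, using Python's `ipaddress` module to mirror Terraform's function:

```python
import ipaddress

def cidrsubnet(prefix, newbits, netnum):
    """Mirror Terraform's cidrsubnet(): extend the prefix by
    `newbits` bits and return the `netnum`-th resulting subnet."""
    net = ipaddress.ip_network(prefix)
    return str(list(net.subnets(prefixlen_diff=newbits))[netnum])

# Public subnets: offsets 0..2
print(cidrsubnet("10.0.0.0/16", 8, 0))   # 10.0.0.0/24
# Private subnets: offsets 10..12
print(cidrsubnet("10.0.0.0/16", 8, 10))  # 10.0.10.0/24
```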

Create modules/networking/variables.tf:

variable "project_name" {
  description = "Name of the project"
  type        = string
}

variable "environment" {
  description = "Environment name"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
}

Create modules/networking/outputs.tf:

output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

output "public_subnet_ids" {
  description = "IDs of the public subnets"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets"
  value       = aws_subnet.private[*].id
}

output "alb_security_group_id" {
  description = "ID of the ALB security group"
  value       = aws_security_group.alb.id
}

output "ecs_security_group_id" {
  description = "ID of the ECS security group"
  value       = aws_security_group.ecs_tasks.id
}

Step 4: S3 Module for File Storage

Create modules/s3/main.tf:

# S3 bucket for file uploads (temporary storage)
resource "aws_s3_bucket" "uploads" {
  bucket = "${var.project_name}-${var.environment}-uploads-${random_id.bucket_suffix.hex}"
}

# Random suffix to ensure bucket name uniqueness
resource "random_id" "bucket_suffix" {
  byte_length = 4
}

# S3 bucket versioning
resource "aws_s3_bucket_versioning" "uploads" {
  bucket = aws_s3_bucket.uploads.id
  versioning_configuration {
    status = "Enabled"
  }
}

# S3 bucket encryption
resource "aws_s3_bucket_server_side_encryption_configuration" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# S3 bucket public access block (security)
resource "aws_s3_bucket_public_access_block" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# S3 lifecycle configuration for automatic cleanup
resource "aws_s3_bucket_lifecycle_configuration" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  rule {
    id     = "cleanup_temp_files"
    status = "Enabled"

    # Delete temporary upload files after 1 day
    expiration {
      days = 1
    }

    # Delete incomplete multipart uploads after 1 day
    abort_incomplete_multipart_upload {
      days_after_initiation = 1
    }
  }
}

# CORS configuration for frontend uploads
resource "aws_s3_bucket_cors_configuration" "uploads" {
  bucket = aws_s3_bucket.uploads.id

  cors_rule {
    allowed_headers = ["*"]
    allowed_methods = ["GET", "PUT", "POST", "DELETE"]
    allowed_origins = ["http://localhost:3000"] # Update for production
    expose_headers  = ["ETag"]
    max_age_seconds = 3000
  }
}

# IAM role for ECS tasks to access S3
resource "aws_iam_role" "ecs_s3_access" {
  name = "${var.project_name}-${var.environment}-ecs-s3-access"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })
}

# IAM policy for S3 access
resource "aws_iam_policy" "s3_access" {
  name        = "${var.project_name}-${var.environment}-s3-access"
  description = "IAM policy for S3 access from ECS tasks"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject"
        ]
        Resource = [
          "${aws_s3_bucket.uploads.arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "s3:ListBucket"
        ]
        Resource = [
          aws_s3_bucket.uploads.arn
        ]
      }
    ]
  })
}

# Attach policy to role
resource "aws_iam_role_policy_attachment" "ecs_s3_access" {
  role       = aws_iam_role.ecs_s3_access.name
  policy_arn = aws_iam_policy.s3_access.arn
}
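The CORS rule above exists so the browser can PUT files directly to S3. One common pattern (a sketch, not the app's confirmed flow) is for the Flask backend to hand out a short-lived presigned URL; the bucket and key below are illustrative, and the client is injectable for testing:

```python
def presigned_upload_url(bucket, key, expires=300, s3=None):
    """Return a time-limited URL the frontend can PUT a file to.

    `s3` is injectable for testing; by default a real boto3 client
    is created (which requires AWS credentials).
    """
    if s3 is None:
        import boto3  # imported lazily so tests can stub the client
        s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )
```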

Create the corresponding variables and outputs files for the S3 module.
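A minimal version of those two files might look like this; the outputs shown are the values later modules are most likely to need, so adjust as required:

```
# modules/s3/variables.tf
variable "project_name" {
  description = "Name of the project"
  type        = string
}

variable "environment" {
  description = "Environment name"
  type        = string
}

# modules/s3/outputs.tf
output "uploads_bucket_name" {
  description = "Name of the uploads bucket"
  value       = aws_s3_bucket.uploads.bucket
}

output "uploads_bucket_arn" {
  description = "ARN of the uploads bucket"
  value       = aws_s3_bucket.uploads.arn
}

output "ecs_s3_access_role_arn" {
  description = "ARN of the IAM role granting ECS tasks S3 access"
  value       = aws_iam_role.ecs_s3_access.arn
}
```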

Security Implementation

File Upload Security for AI Applications

When handling file uploads for AI processing, security is paramount. Here’s how we implement it:

  1. File Size Limits: Prevent large files from overwhelming the system
  2. File Type Validation: Only allow specific image/video formats
  3. Virus Scanning: Integrate with AWS S3 virus scanning
  4. Temporary Storage: Auto-delete files after processing
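Items 1 and 2 can be enforced before a single byte reaches the AI pipeline. A sketch of the checks the Flask backend might run; the size cap and extension list here are illustrative, not values from the application:

```python
# Illustrative limits -- tune for your workload.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # 50 MB cap
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "mp4", "mov"}

def validate_upload(filename, size_bytes):
    """Return (ok, reason); reject oversized or unexpected files."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: .{ext or '?'}"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds 50 MB limit"
    return True, "ok"
```

Extension checks are a first filter only; for defense in depth, also verify content type server-side (e.g. by sniffing magic bytes) before handing the file to the AI services.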

Secrets Management

Store sensitive information securely:

# Store Google API key
aws ssm put-parameter \
  --name "/caption-generator/dev/google-api-key" \
  --value "your-api-key-here" \
  --type "SecureString" \
  --description "Google API key for Gemini"

# Store database password
aws ssm put-parameter \
  --name "/caption-generator/dev/db-password" \
  --value "$(openssl rand -base64 32)" \
  --type "SecureString" \
  --description "Database password"
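At runtime the backend reads these back with `GetParameter` and decryption enabled. A sketch of that lookup; the parameter name matches the one stored above, and the SSM client is injectable for testing:

```python
def get_secret(name, ssm=None):
    """Fetch and decrypt a SecureString parameter from SSM."""
    if ssm is None:
        import boto3  # imported lazily so tests can stub the client
        ssm = boto3.client("ssm")
    resp = ssm.get_parameter(Name=name, WithDecryption=True)
    return resp["Parameter"]["Value"]

# e.g. google_key = get_secret("/caption-generator/dev/google-api-key")
```

The ECS task role will also need `ssm:GetParameter` permission on these paths, which we wire up alongside the task definitions in Part 2.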

Deploying Your Infrastructure

Step-by-Step Deployment

  1. Initialize Terraform:
cd caption-generator-infrastructure
terraform init
  2. Validate Configuration:
terraform validate
  3. Plan Deployment:
terraform plan -var="environment=dev"
  4. Apply Infrastructure:
terraform apply -var="environment=dev"

Estimated deployment time: 15-20 minutes
Expected costs for dev environment: $50-100/month

What’s Next?

In Part 2, we’ll cover:

  • Complete ECS service setup with Docker containers
  • CI/CD pipeline with GitHub Actions
  • Monitoring and logging implementation
  • Blue/Green deployment strategies
  • Production optimization techniques

This foundation provides a secure, scalable infrastructure for your AI application. The modular structure makes it easy to replicate across environments and scale as needed.


Next: Part 2 - Complete DevOps Pipeline for AI Applications →
