Building Enterprise NFS Shared Storage on AWS: A Complete Guide with EFS, EC2, and Terraform

Learn how to design and deploy a secure, scalable NFS shared storage solution on AWS using EFS and EC2, automated with Terraform for enterprise applications

Introduction

The Challenge of Legacy Application Storage

In today’s cloud-first world, many organizations still operate legacy internal applications that require traditional Network File System (NFS) storage for seamless operation. These applications often depend on shared file systems for configuration files, logs, application data, and inter-service communication.

Common Challenges Include:

  • Legacy applications designed for on-premises NFS storage
  • Need for high availability across multiple data centers
  • Compliance requirements for data protection and backup
  • Manual infrastructure management leading to operational overhead
  • Scaling limitations with traditional storage solutions

The Solution: AWS EFS + EC2 NFS Architecture

This guide demonstrates how to build a robust, scalable NFS shared storage solution using:

  • AWS Elastic File System (EFS) for managed NFS storage
  • EC2-based NFS servers for legacy application compatibility
  • Terraform for infrastructure automation
  • IAM for fine-grained access control
  • AWS Backup for automated data protection
  • CloudWatch for comprehensive monitoring

Why This Architecture?

Enterprise Benefits:

  1. High Availability: Multi-AZ deployment keeps storage available through an Availability Zone failure
  2. Scalability: Automatic scaling without capacity planning
  3. Security: IAM-based access controls and encryption
  4. Compliance: Automated backup and audit trails
  5. Cost-Effective: Pay-as-you-use model with lifecycle policies
  6. Automation: Infrastructure as Code reduces manual errors

Real-World Use Cases

Scenario 1: Financial services company with compliance applications requiring shared configuration storage across multiple servers.

Scenario 2: Media company with legacy video processing applications needing shared storage for large media files.

Scenario 3: Manufacturing company with ERP systems requiring shared file storage for production data.

Architecture Overview

High-Level Architecture Components

Our enterprise NFS solution consists of several key components working together:

┌─────────────────────────────────────────────────────────────┐
│                    AWS Cloud Environment                   │
│                                                            │
│  ┌─────────────────┐           ┌─────────────────┐        │
│  │   Availability  │           │   Availability  │        │
│  │     Zone A      │           │     Zone B      │        │
│  │                 │           │                 │        │
│  │  ┌───────────┐  │           │  ┌───────────┐  │        │
│  │  │    EC2    │  │           │  │    EC2    │  │        │
│  │  │ NFS Server│  │           │  │ NFS Server│  │        │
│  │  └───────────┘  │           │  └───────────┘  │        │
│  │        │        │           │        │        │        │
│  └────────┼────────┘           └────────┼────────┘        │
│           │                             │                 │
│           └─────────────┬───────────────┘                 │
│                         │                                 │
│                  ┌─────────────┐                          │
│                  │   AWS EFS   │                          │
│                  │ File System │                          │
│                  └─────────────┘                          │
│                         │                                 │
│                  ┌─────────────┐                          │
│                  │ AWS Backup  │                          │
│                  │   Service   │                          │
│                  └─────────────┘                          │
└─────────────────────────────────────────────────────────────┘

Component Relationships

1. Amazon EFS (Elastic File System)

  • Acts as the central storage repository
  • Provides NFS v4.1 protocol support
  • Automatically scales storage capacity
  • Replicates data across multiple AZs

2. EC2 NFS Servers

  • Bridge between legacy applications and EFS
  • Provide traditional NFS interface
  • Handle protocol translation and caching
  • Deployed in multiple AZs for redundancy

3. IAM (Identity and Access Management)

  • Controls access to EFS resources
  • Manages EC2 instance permissions
  • Implements least-privilege security model

4. AWS Backup

  • Automates EFS backup scheduling
  • Manages retention policies
  • Provides point-in-time recovery

5. CloudWatch

  • Monitors system performance
  • Tracks storage metrics
  • Triggers automated responses to issues

Multi-AZ High Availability Design

The architecture spans multiple Availability Zones to ensure:

  • No single point of failure
  • Automatic failover capabilities
  • Disaster recovery across data centers
  • Load distribution for performance

AWS EFS Fundamentals

Understanding Elastic File System

Amazon EFS is a fully managed, scalable file storage service designed for use with AWS services and on-premises resources. It provides a traditional file system interface and file system semantics.

Key EFS Characteristics

1. NFS Protocol Support

  • Compatible with NFSv4.1 and NFSv4.0
  • POSIX-compliant file system
  • Supports standard file operations (read, write, append)

2. Elastic Scaling

  • Automatically grows and shrinks as files are added/removed
  • No capacity planning required
  • Scales to petabytes of storage

3. Multi-AZ Durability

  • Data stored redundantly across multiple AZs
  • 99.999999999% (11 nines) durability
  • Regional availability and resilience

EFS vs Other AWS Storage Options

Feature    | EFS                 | EBS             | S3
---------- | ------------------- | --------------- | --------------
Protocol   | NFS                 | Block           | REST API
Access     | Multiple instances  | Single instance | Internet
Scaling    | Automatic           | Manual          | Automatic
Use Case   | Shared file storage | Boot volumes    | Object storage
Durability | Multi-AZ            | Single AZ       | Multi-AZ

EFS Performance Modes

AWS EFS offers two performance modes to optimize for different workloads:

1. General Purpose Mode

  • Best for: Most use cases
  • Latency: Lowest latency per operation
  • Throughput: Up to 7,000 file operations per second
  • Ideal for: Web serving, content management

2. Max I/O Mode

  • Best for: High-performance applications
  • Latency: Slightly higher latency
  • Throughput: Higher levels of aggregate throughput
  • Ideal for: Big data analytics, media processing

EFS Throughput Options

1. Bursting Throughput (Default)

  • Throughput scales with file system size
  • Baseline: 50 MiB/s per TiB of data stored
  • Burst: up to 100 MiB/s, even for file systems smaller than 1 TiB

2. Provisioned Throughput

  • Independent of storage size
  • Pay for provisioned throughput
  • Consistent performance regardless of file system size
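The bursting baseline above lends itself to a quick back-of-the-envelope check. This is illustrative arithmetic only, using an assumed 4 TiB of stored data:

```shell
# Baseline bursting throughput scales at roughly 50 MiB/s per TiB stored
STORED_TIB=4                        # assumed amount of data on the file system
BASELINE_MIBPS=$((STORED_TIB * 50))
echo "Baseline throughput: ${BASELINE_MIBPS} MiB/s"   # Baseline throughput: 200 MiB/s
```

If your workload needs more than the size-derived baseline, that is the signal to switch to Provisioned Throughput mode.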

Network File System (NFS) Protocol Basics

Understanding NFS is crucial for implementing our solution:

NFS Operation Flow:

  1. Mount Request: Client requests to mount file system
  2. Authentication: Server validates client credentials
  3. File Operations: Read, write, create, delete operations
  4. Caching: Local caching for performance optimization

NFS Benefits for Enterprise:

  • Transparent Access: Applications see standard file system
  • Centralized Management: Single point of storage administration
  • Shared Access: Multiple clients can access same files
  • Consistency: File locking ensures data integrity
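To make the mount step concrete, the snippet below builds the command a Linux client would run against an EFS mount target. The file system ID and region are hypothetical placeholders; the mount options follow AWS's commonly recommended NFSv4.1 settings:

```shell
# Hypothetical values for illustration only
EFS_ID="fs-0123456789abcdef0"
REGION="us-east-1"
MOUNT_POINT="/mnt/efs"

# NFSv4.1 with 1 MiB read/write buffers and hard retries
MOUNT_OPTS="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"

echo "sudo mount -t nfs4 -o ${MOUNT_OPTS} ${EFS_ID}.efs.${REGION}.amazonaws.com:/ ${MOUNT_POINT}"
```

The `<fs-id>.efs.<region>.amazonaws.com` DNS name resolves to the mount target in the client's own Availability Zone, which keeps traffic local and avoids cross-AZ data charges.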

Infrastructure as Code with Terraform

Terraform Configuration Overview

Our Terraform implementation automates the entire infrastructure deployment, ensuring consistency, repeatability, and version control of our NFS storage solution.

Project Structure

terraform-efs-nfs/
├── main.tf              # Main configuration
├── variables.tf         # Input variables
├── outputs.tf          # Output values
├── data.tf             # Data sources
├── vpc.tf              # VPC configuration
├── security-groups.tf  # Security group rules
├── efs.tf              # EFS configuration
├── ec2.tf              # EC2 instances
├── iam.tf              # IAM roles and policies
├── backup.tf           # AWS Backup configuration
└── monitoring.tf       # CloudWatch setup
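The snippets throughout this guide reference a number of input variables. A minimal variables.tf consistent with those references might look like the following; the defaults are illustrative assumptions, not values from any particular deployment:

```hcl
# variables.tf (illustrative defaults; adjust for your environment)
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "environment" {
  type    = string
  default = "production"
}

variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

variable "availability_zones" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b"]
}

variable "public_subnet_cidrs" {
  type    = list(string)
  default = ["10.0.1.0/24", "10.0.2.0/24"]
}

variable "private_subnet_cidrs" {
  type    = list(string)
  default = ["10.0.11.0/24", "10.0.12.0/24"]
}

variable "efs_performance_mode" {
  type    = string
  default = "generalPurpose"
}

variable "efs_throughput_mode" {
  type    = string
  default = "bursting"
}

variable "efs_provisioned_throughput" {
  type    = number
  default = 128 # MiB/s; only used when throughput mode is "provisioned"
}

variable "efs_ia_transition" {
  type    = string
  default = "AFTER_30_DAYS"
}

variable "nfs_instance_type" {
  type    = string
  default = "t3.medium"
}

variable "nfs_min_instances" {
  type    = number
  default = 2
}

variable "nfs_max_instances" {
  type    = number
  default = 4
}

variable "nfs_desired_instances" {
  type    = number
  default = 2
}

variable "alert_email" {
  type = string
}
```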

Main Terraform Configuration

main.tf

# Configure the AWS Provider
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      Project     = "enterprise-nfs-storage"
      ManagedBy   = "terraform"
    }
  }
}

# Data source for availability zones
data "aws_availability_zones" "available" {
  state = "available"
}

# Random ID for unique resource naming
resource "random_id" "suffix" {
  byte_length = 4
}

EFS Configuration

efs.tf

# EFS File System
resource "aws_efs_file_system" "enterprise_storage" {
  creation_token = "enterprise-nfs-${random_id.suffix.hex}"

  performance_mode = var.efs_performance_mode
  throughput_mode  = var.efs_throughput_mode

  # Provisioned throughput (if using provisioned mode)
  provisioned_throughput_in_mibps = var.efs_throughput_mode == "provisioned" ? var.efs_provisioned_throughput : null

  # Encryption at rest
  encrypted = true
  kms_key_id = aws_kms_key.efs_key.arn

  # Lifecycle policy
  lifecycle_policy {
    transition_to_ia = var.efs_ia_transition
  }

  lifecycle_policy {
    transition_to_primary_storage_class = "AFTER_1_ACCESS"
  }

  tags = {
    Name = "Enterprise-NFS-Storage"
    Type = "SharedStorage"
  }
}

# EFS Mount Targets for each AZ
resource "aws_efs_mount_target" "enterprise_storage" {
  count = length(var.availability_zones)

  file_system_id  = aws_efs_file_system.enterprise_storage.id
  subnet_id       = aws_subnet.private[count.index].id
  security_groups = [aws_security_group.efs.id]
}

# EFS Access Points
resource "aws_efs_access_point" "application_data" {
  file_system_id = aws_efs_file_system.enterprise_storage.id

  path = "/application-data"
  creation_info {
    owner_gid   = 1000
    owner_uid   = 1000
    permissions = "755"
  }

  posix_user {
    gid = 1000
    uid = 1000
  }

  tags = {
    Name = "Application-Data-Access-Point"
  }
}

# KMS Key for EFS encryption (with automatic annual key rotation)
resource "aws_kms_key" "efs_key" {
  description             = "KMS key for EFS encryption"
  deletion_window_in_days = 7
  enable_key_rotation     = true

  tags = {
    Name = "EFS-Encryption-Key"
  }
}

resource "aws_kms_alias" "efs_key_alias" {
  name          = "alias/efs-enterprise-storage"
  target_key_id = aws_kms_key.efs_key.key_id
}

EC2 NFS Server Configuration

ec2.tf

# Launch Template for NFS Servers
resource "aws_launch_template" "nfs_server" {
  name_prefix   = "nfs-server-"
  image_id      = data.aws_ami.amazon_linux.id
  instance_type = var.nfs_instance_type

  vpc_security_group_ids = [aws_security_group.nfs_server.id]

  iam_instance_profile {
    name = aws_iam_instance_profile.nfs_server.name
  }

  user_data = base64encode(templatefile("${path.module}/user-data/nfs-server.sh", {
    efs_id = aws_efs_file_system.enterprise_storage.id
    region = var.aws_region
  }))

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "NFS-Server"
      Role = "NFSServer"
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Auto Scaling Group for NFS Servers
resource "aws_autoscaling_group" "nfs_servers" {
  name                = "nfs-servers-asg"
  vpc_zone_identifier = aws_subnet.private[*].id
  target_group_arns   = [aws_lb_target_group.nfs_servers.arn]
  health_check_type   = "ELB"
  health_check_grace_period = 300

  min_size         = var.nfs_min_instances
  max_size         = var.nfs_max_instances
  desired_capacity = var.nfs_desired_instances

  launch_template {
    id      = aws_launch_template.nfs_server.id
    version = "$Latest"
  }

  tag {
    key                 = "Name"
    value               = "NFS-Server-ASG"
    propagate_at_launch = false
  }
}

# Network Load Balancer for NFS Servers
# NFS is plain TCP on port 2049, which an Application Load Balancer cannot
# forward (ALBs handle HTTP/HTTPS only), so a Network Load Balancer is used.
resource "aws_lb" "nfs_servers" {
  name               = "nfs-servers-nlb"
  internal           = true
  load_balancer_type = "network"
  security_groups    = [aws_security_group.alb.id]
  subnets            = aws_subnet.private[*].id

  enable_deletion_protection = false

  tags = {
    Name = "NFS-Servers-NLB"
  }
}

# TCP target group for NFS, with an HTTP health check against a lightweight
# endpoint on port 8080. (The `matcher` argument applies only to HTTP/HTTPS
# target groups and must be omitted for TCP.)
resource "aws_lb_target_group" "nfs_servers" {
  name     = "nfs-servers-tg"
  port     = 2049
  protocol = "TCP"
  vpc_id   = aws_vpc.main.id

  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 30
    path                = "/health"
    port                = "8080"
    protocol            = "HTTP"
    timeout             = 5
    unhealthy_threshold = 2
  }

  tags = {
    Name = "NFS-Servers-Target-Group"
  }
}

# Data source for latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
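The launch template above renders user-data/nfs-server.sh with templatefile(), but that script is not shown elsewhere in the guide. Below is a minimal bootstrap sketch; the package set, file paths, and the Python-based health endpoint are assumptions (Amazon Linux 2), and `${efs_id}`/`${region}` are substituted by Terraform:

```shell
#!/bin/bash
# user-data/nfs-server.sh -- minimal bootstrap sketch (assumes Amazon Linux 2).
# ${efs_id} and ${region} are substituted by Terraform's templatefile().
set -euxo pipefail

yum install -y nfs-utils amazon-efs-utils python3

# Mount EFS and persist the mount across reboots
mkdir -p /mnt/efs
echo "${efs_id}.efs.${region}.amazonaws.com:/ /mnt/efs nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,_netdev 0 0" >> /etc/fstab
mount -a -t nfs4

# Re-export the EFS mount to legacy NFS clients
echo "/mnt/efs *(rw,sync,no_root_squash,no_subtree_check)" > /etc/exports
systemctl enable --now nfs-server
exportfs -ra

# Lightweight /health endpoint on 8080 for load balancer health checks
mkdir -p /var/lib/health && echo OK > /var/lib/health/health
nohup python3 -m http.server 8080 --directory /var/lib/health >/var/log/health-endpoint.log 2>&1 &
```

In production you would replace the ad-hoc health endpoint with a proper service unit, but the sketch shows the three responsibilities the instance must cover: mount EFS, export it over NFS, and answer health checks.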

VPC and Network Configuration

vpc.tf

# VPC
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "Enterprise-NFS-VPC"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "Enterprise-NFS-IGW"
  }
}

# Public Subnets
resource "aws_subnet" "public" {
  count = length(var.availability_zones)

  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidrs[count.index]
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name = "Public-Subnet-${count.index + 1}"
    Type = "Public"
  }
}

# Private Subnets
resource "aws_subnet" "private" {
  count = length(var.availability_zones)

  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name = "Private-Subnet-${count.index + 1}"
    Type = "Private"
  }
}

# NAT Gateways
resource "aws_eip" "nat" {
  count = length(var.availability_zones)

  domain = "vpc"

  tags = {
    Name = "NAT-Gateway-EIP-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.main]
}

resource "aws_nat_gateway" "main" {
  count = length(var.availability_zones)

  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = {
    Name = "NAT-Gateway-${count.index + 1}"
  }

  depends_on = [aws_internet_gateway.main]
}

# Route Tables
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = {
    Name = "Public-Route-Table"
  }
}

resource "aws_route_table" "private" {
  count = length(var.availability_zones)

  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[count.index].id
  }

  tags = {
    Name = "Private-Route-Table-${count.index + 1}"
  }
}

# Route Table Associations
resource "aws_route_table_association" "public" {
  count = length(var.availability_zones)

  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count = length(var.availability_zones)

  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

Security Implementation

IAM-Based Access Controls

Security is paramount in enterprise environments. Our implementation uses multiple layers of security to protect data and control access.

IAM Roles and Policies

iam.tf

# IAM Role for EC2 NFS Servers
resource "aws_iam_role" "nfs_server" {
  name = "NFS-Server-Role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })

  tags = {
    Name = "NFS-Server-IAM-Role"
  }
}

# IAM Policy for EFS Access
resource "aws_iam_policy" "efs_access" {
  name        = "EFS-Access-Policy"
  description = "Policy for NFS servers to access EFS"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        # Read-only describe actions only (least privilege: the NFS
        # servers never need to create file systems)
        Action = [
          "elasticfilesystem:DescribeFileSystems",
          "elasticfilesystem:DescribeMountTargets",
          "elasticfilesystem:DescribeAccessPoints"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "elasticfilesystem:ClientMount",
          "elasticfilesystem:ClientWrite",
          "elasticfilesystem:ClientRootAccess"
        ]
        Resource = aws_efs_file_system.enterprise_storage.arn
        Condition = {
          Bool = {
            "aws:SecureTransport" = "true"
          }
        }
      }
    ]
  })
}

# IAM Policy for CloudWatch Monitoring
resource "aws_iam_policy" "cloudwatch_access" {
  name        = "CloudWatch-Monitoring-Policy"
  description = "Policy for CloudWatch monitoring and logging"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "cloudwatch:PutMetricData",
          "cloudwatch:GetMetricStatistics",
          "cloudwatch:ListMetrics",
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents",
          "logs:DescribeLogStreams"
        ]
        Resource = "*"
      }
    ]
  })
}

# Attach policies to role
resource "aws_iam_role_policy_attachment" "nfs_server_efs" {
  role       = aws_iam_role.nfs_server.name
  policy_arn = aws_iam_policy.efs_access.arn
}

resource "aws_iam_role_policy_attachment" "nfs_server_cloudwatch" {
  role       = aws_iam_role.nfs_server.name
  policy_arn = aws_iam_policy.cloudwatch_access.arn
}

resource "aws_iam_role_policy_attachment" "nfs_server_ssm" {
  role       = aws_iam_role.nfs_server.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

# Instance Profile for EC2
resource "aws_iam_instance_profile" "nfs_server" {
  name = "NFS-Server-Instance-Profile"
  role = aws_iam_role.nfs_server.name
}

EFS Access Points and File System Policies

EFS File System Policy provides additional access control:

# EFS File System Policy
resource "aws_efs_file_system_policy" "enterprise_storage" {
  file_system_id = aws_efs_file_system.enterprise_storage.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          AWS = "*"
        }
        Action = [
          "elasticfilesystem:ClientMount",
          "elasticfilesystem:ClientWrite"
        ]
        Resource = aws_efs_file_system.enterprise_storage.arn
        Condition = {
          Bool = {
            "aws:SecureTransport" = "true"
          }
          StringEquals = {
            "elasticfilesystem:AccessedViaMountTarget" = "true"
          }
        }
      }
    ]
  })
}

Network Security with Security Groups

security-groups.tf

# Security Group for EFS
resource "aws_security_group" "efs" {
  name_prefix = "efs-security-group"
  description = "Security group for EFS mount targets"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 2049
    to_port         = 2049
    protocol        = "tcp"
    security_groups = [aws_security_group.nfs_server.id]
    description     = "NFS traffic from NFS servers"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "All outbound traffic"
  }

  tags = {
    Name = "EFS-Security-Group"
  }
}

# Security Group for NFS Servers
resource "aws_security_group" "nfs_server" {
  name_prefix = "nfs-server-security-group"
  description = "Security group for NFS servers"
  vpc_id      = aws_vpc.main.id

  # NFS traffic
  ingress {
    from_port   = 2049
    to_port     = 2049
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
    description = "NFS traffic from VPC"
  }

  # SSH access (for maintenance)
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
    description = "SSH access from VPC"
  }

  # Health check endpoint
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
    description     = "Health check from ALB"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "All outbound traffic"
  }

  tags = {
    Name = "NFS-Server-Security-Group"
  }
}

# Security Group for the load balancer in front of the NFS servers
resource "aws_security_group" "alb" {
  name_prefix = "alb-security-group"
  description = "Security group for the NFS load balancer"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 2049
    to_port     = 2049
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
    description = "NFS traffic from VPC"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "All outbound traffic"
  }

  tags = {
    Name = "ALB-Security-Group"
  }
}

Encryption Implementation

Our solution implements encryption at multiple levels:

1. Encryption at Rest

  • EFS uses AWS KMS for encryption
  • Custom KMS key with rotation enabled
  • Encrypted EBS volumes for EC2 instances

2. Encryption in Transit

  • TLS encryption for EFS communication
  • Stunnel for legacy NFS clients
  • VPC endpoints for AWS service communication
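Encryption in transit is easiest with the amazon-efs-utils mount helper, which sets up the stunnel TLS tunnel automatically. The snippet below builds the command for illustration; the file system ID is a hypothetical placeholder:

```shell
# Hypothetical file system ID for illustration
EFS_ID="fs-0123456789abcdef0"

# The "efs" mount type (from amazon-efs-utils) with the tls option wraps
# NFS traffic in TLS via a local stunnel process on the client
echo "sudo mount -t efs -o tls ${EFS_ID}:/ /mnt/efs"
```

Combined with the `aws:SecureTransport` condition in the file system policy shown earlier, this lets you reject any client that attempts an unencrypted mount.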

Backup and Disaster Recovery

AWS Backup Service Integration

backup.tf

# AWS Backup Vault
resource "aws_backup_vault" "enterprise_storage" {
  name        = "enterprise-nfs-backup-vault"
  kms_key_arn = aws_kms_key.backup_key.arn

  tags = {
    Name = "Enterprise-NFS-Backup-Vault"
  }
}

# KMS Key for Backup Encryption
resource "aws_kms_key" "backup_key" {
  description             = "KMS key for backup encryption"
  deletion_window_in_days = 7

  tags = {
    Name = "Backup-Encryption-Key"
  }
}

resource "aws_kms_alias" "backup_key_alias" {
  name          = "alias/backup-enterprise-storage"
  target_key_id = aws_kms_key.backup_key.key_id
}

# Backup Plan
resource "aws_backup_plan" "enterprise_storage" {
  name = "enterprise-nfs-backup-plan"

  rule {
    rule_name         = "daily_backup"
    target_vault_name = aws_backup_vault.enterprise_storage.name
    schedule          = "cron(0 2 * * ? *)"  # Daily at 2 AM

    start_window = 60    # 1 hour
    completion_window = 300  # 5 hours

    lifecycle {
      cold_storage_after = 30
      delete_after       = 365
    }

    recovery_point_tags = {
      BackupType = "Automated"
      Frequency  = "Daily"
    }
  }

  rule {
    rule_name         = "weekly_backup"
    target_vault_name = aws_backup_vault.enterprise_storage.name
    schedule          = "cron(0 3 ? * SUN *)"  # Weekly on Sunday at 3 AM

    lifecycle {
      cold_storage_after = 30
      delete_after       = 2555  # 7 years
    }

    recovery_point_tags = {
      BackupType = "Automated"
      Frequency  = "Weekly"
    }
  }

  tags = {
    Name = "Enterprise-NFS-Backup-Plan"
  }
}

# IAM Role for AWS Backup
resource "aws_iam_role" "backup_role" {
  name = "AWS-Backup-Service-Role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "backup.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "backup_policy" {
  role       = aws_iam_role.backup_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForBackup"
}

# Backup Selection
resource "aws_backup_selection" "enterprise_storage" {
  iam_role_arn = aws_iam_role.backup_role.arn
  name         = "enterprise-nfs-backup-selection"
  plan_id      = aws_backup_plan.enterprise_storage.id

  # The file system is selected explicitly by ARN. (A tag-based condition
  # block would additionally require the matching tag on the resource; the
  # EFS file system above is not tagged BackupEnabled, so adding such a
  # condition here would silently exclude it from backups.)
  resources = [
    aws_efs_file_system.enterprise_storage.arn
  ]
}

Cross-Region Backup Strategy

For disaster recovery, implement cross-region backup:

# Cross-region backup vault
resource "aws_backup_vault" "dr_vault" {
  provider    = aws.dr_region
  name        = "enterprise-nfs-dr-vault"
  kms_key_arn = aws_kms_key.dr_backup_key.arn
}

# Copy action for cross-region backup
resource "aws_backup_plan" "enterprise_storage_with_copy" {
  name = "enterprise-nfs-backup-plan-with-dr"

  rule {
    rule_name         = "daily_backup_with_copy"
    target_vault_name = aws_backup_vault.enterprise_storage.name
    schedule          = "cron(0 2 * * ? *)"

    copy_action {
      destination_vault_arn = aws_backup_vault.dr_vault.arn

      lifecycle {
        cold_storage_after = 30
        delete_after       = 365
      }
    }
  }
}
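The dr_vault above references a provider alias (aws.dr_region) and a DR-region KMS key that are not defined elsewhere in the guide. Sketches of both follow; the choice of us-west-2 as the DR region is an assumption:

```hcl
# Provider alias for the disaster recovery region (illustrative region choice)
provider "aws" {
  alias  = "dr_region"
  region = "us-west-2"
}

# KMS key in the DR region for the DR backup vault
resource "aws_kms_key" "dr_backup_key" {
  provider                = aws.dr_region
  description             = "KMS key for cross-region backup encryption"
  deletion_window_in_days = 7
  enable_key_rotation     = true
}
```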

Recovery Testing Procedures

Automated Recovery Testing:

#!/bin/bash
# recovery-test.sh

# Create test EFS from backup
BACKUP_ARN="arn:aws:backup:us-east-1:123456789012:recovery-point:..."
TEST_EFS_NAME="test-recovery-$(date +%Y%m%d)"

# Restore from backup (AWS CLI option names use hyphens, not underscores)
aws backup start-restore-job \
  --recovery-point-arn "$BACKUP_ARN" \
  --metadata file-system-id="$TEST_EFS_NAME" \
  --iam-role-arn arn:aws:iam::123456789012:role/AWS-Backup-Service-Role

# Validate restored data
echo "Recovery test initiated for $TEST_EFS_NAME"

Monitoring and Observability

CloudWatch Metrics and Alarms

monitoring.tf

# CloudWatch Log Group for NFS Servers
resource "aws_cloudwatch_log_group" "nfs_servers" {
  name              = "/aws/ec2/nfs-servers"
  retention_in_days = 30

  tags = {
    Name = "NFS-Servers-Log-Group"
  }
}

# CloudWatch Dashboard
resource "aws_cloudwatch_dashboard" "nfs_monitoring" {
  dashboard_name = "Enterprise-NFS-Storage-Dashboard"

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6

        properties = {
          metrics = [
            ["AWS/EFS", "DataReadIOBytes", "FileSystemId", aws_efs_file_system.enterprise_storage.id],
            [".", "DataWriteIOBytes", ".", "."],
            [".", "MetadataIOBytes", ".", "."]
          ]
          view    = "timeSeries"
          stacked = false
          region  = var.aws_region
          title   = "EFS I/O Operations"
          period  = 300
        }
      },
      {
        type   = "metric"
        x      = 0
        y      = 6
        width  = 12
        height = 6

        properties = {
          metrics = [
            ["AWS/EFS", "StorageBytes", "FileSystemId", aws_efs_file_system.enterprise_storage.id, "StorageClass", "Standard"],
            ["...", "IA"]
          ]
          view   = "timeSeries"
          region = var.aws_region
          title  = "EFS Storage Usage"
          period = 300
        }
      }
    ]
  })
}

# CloudWatch Alarms
resource "aws_cloudwatch_metric_alarm" "efs_high_io" {
  alarm_name          = "EFS-High-IO-Utilization"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "PercentIOLimit"
  namespace           = "AWS/EFS"
  period              = "300"
  statistic           = "Average"
  threshold           = "80"
  alarm_description   = "Alert when the file system approaches its General Purpose I/O limit"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    FileSystemId = aws_efs_file_system.enterprise_storage.id
  }

  tags = {
    Name = "EFS-High-IO-Alarm"
  }
}

resource "aws_cloudwatch_metric_alarm" "nfs_server_unhealthy" {
  alarm_name          = "NFS-Server-Unhealthy-Targets"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = "2"
  metric_name         = "HealthyHostCount"
  namespace           = "AWS/ApplicationELB"
  period              = "60"
  statistic           = "Average"
  threshold           = "1"
  alarm_description   = "Alert when NFS servers are unhealthy"
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    TargetGroup  = aws_lb_target_group.nfs_servers.arn_suffix
    LoadBalancer = aws_lb.nfs_servers.arn_suffix
  }
}

# SNS Topic for Alerts
resource "aws_sns_topic" "alerts" {
  name = "enterprise-nfs-alerts"

  tags = {
    Name = "Enterprise-NFS-Alerts"
  }
}

resource "aws_sns_topic_subscription" "email_alerts" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}

Performance Monitoring

Custom CloudWatch Metrics:

#!/bin/bash
# custom-metrics.sh - Deployed on NFS servers

# Use IMDSv2 (token-based) to read instance metadata
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)
REGION=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/region)

# Monitor NFS connection count
NFS_CONNECTIONS=$(netstat -an | grep ':2049' | grep ESTABLISHED | wc -l)
aws cloudwatch put-metric-data \
  --region $REGION \
  --namespace "Custom/NFS" \
  --metric-data MetricName=ActiveConnections,Value=$NFS_CONNECTIONS,Unit=Count,Dimensions=InstanceId=$INSTANCE_ID

# Monitor EFS mount health
EFS_MOUNT_STATUS=$(mount | grep "type nfs4" | wc -l)
aws cloudwatch put-metric-data \
  --region $REGION \
  --namespace "Custom/NFS" \
  --metric-data MetricName=EFSMountStatus,Value=$EFS_MOUNT_STATUS,Unit=Count,Dimensions=InstanceId=$INSTANCE_ID
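The metrics script only has value if it runs on a schedule. A cron entry such as the following publishes both metrics every minute; the install path is an assumption:

```
# /etc/cron.d/nfs-custom-metrics (hypothetical install path for the script)
* * * * * root /usr/local/bin/custom-metrics.sh
```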

Best Practices & Production Considerations

Cost Optimization Strategies

1. EFS Storage Classes

  • Use Standard-IA for infrequently accessed data
  • Configure lifecycle policies for automatic transitions
  • Monitor access patterns with EFS Intelligent Tiering

2. Instance Right-Sizing

  • Use t3.medium or t3.large for NFS servers
  • Implement Auto Scaling based on CPU and network metrics
  • Use Spot Instances for non-critical workloads

3. Data Transfer Optimization

  • Place EFS and EC2 in same Availability Zone when possible
  • Use Regional optimization for cross-AZ access
  • Implement caching strategies at application level

Performance Tuning

NFS Server Configuration:

# /etc/nfs.conf optimization
[nfsd]
threads=32
grace-time=10
lease-time=10

[exportfs]
cache-time=300

# Kernel network buffer tuning (reload with `sysctl -p` to apply)
echo 'net.core.rmem_default = 262144' >> /etc/sysctl.conf
echo 'net.core.rmem_max = 16777216' >> /etc/sysctl.conf
echo 'net.core.wmem_default = 262144' >> /etc/sysctl.conf
echo 'net.core.wmem_max = 16777216' >> /etc/sysctl.conf
sysctl -p

Scalability Considerations

Horizontal Scaling:

  • Add more NFS server instances for increased throughput
  • Distribute clients across multiple mount targets
  • Use connection pooling for client applications

Vertical Scaling:

  • Upgrade to enhanced networking instance types
  • Use Provisioned Throughput mode for EFS
  • Implement SR-IOV for better network performance

Maintenance Procedures

Regular Maintenance Tasks:

#!/bin/bash
# maintenance.sh

# Update NFS server packages
yum update -y nfs-utils

# Check EFS mount status (EFS_ID and REGION are expected in the environment,
# e.g. exported by the instance bootstrap)
if ! mountpoint -q /mnt/efs; then
    mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576 \
        "$EFS_ID.efs.$REGION.amazonaws.com:/" /mnt/efs
fi

# Clean up old log files
find /var/log -name "*.log" -mtime +30 -delete

# Restart NFS services if needed
systemctl status nfs-server || systemctl restart nfs-server

Conclusion

Summary of Benefits

Our Enterprise NFS Shared Storage solution on AWS provides:

✅ High Availability: Multi-AZ deployment with automatic failover
✅ Scalability: Elastic storage that grows with your needs
✅ Security: IAM-based controls with encryption at rest and in transit
✅ Automation: Infrastructure as Code with Terraform
✅ Reliability: Automated backups with cross-region disaster recovery
✅ Monitoring: Comprehensive observability with CloudWatch
✅ Cost-Effective: Pay-as-you-use with intelligent tiering

Real-World Impact

This solution has proven effective for:

  • Legacy application modernization without code changes
  • Compliance requirements with automated audit trails
  • Operational efficiency through reduced manual intervention
  • Business continuity with robust disaster recovery

Future Enhancements

Potential Improvements:

  1. Multi-Region Active-Active Setup: Deploy across multiple AWS regions
  2. Container Integration: Add Kubernetes storage classes for EFS
  3. Advanced Monitoring: Implement distributed tracing for performance analysis
  4. AI/ML Integration: Use AWS services for predictive capacity planning

Getting Started

Ready to implement this solution? Start with:

  1. Clone the Terraform code from the examples above
  2. Customize variables for your environment
  3. Deploy incrementally starting with VPC and EFS
  4. Test thoroughly in a development environment
  5. Implement monitoring before going to production
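Step 3, deploying incrementally, can be done with targeted applies. A typical sequence (the resource addresses match the snippets in this guide):

```shell
terraform init
terraform plan -out=tfplan

# Networking first, then storage, then everything else
terraform apply -target=aws_vpc.main -target=aws_subnet.public -target=aws_subnet.private
terraform apply -target=aws_efs_file_system.enterprise_storage -target=aws_efs_mount_target.enterprise_storage
terraform apply
```

Targeted applies are a bring-up convenience, not a steady-state workflow; once the stack exists, plan and apply the whole configuration together.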

Remember: Enterprise storage is critical infrastructure. Always test disaster recovery procedures and maintain comprehensive documentation.


Have questions about implementing enterprise NFS storage on AWS? Connect with me on LinkedIn or check out my other AWS guides for more cloud infrastructure insights.
