Background and Context
I am working on a Terraform project that has an end goal of an EKS cluster with the following properties:
- Private to the outside internet
- Accessible via a bastion host
- Uses worker groups
- Resources (deployments, cron jobs, etc.) configurable via the Terraform Kubernetes provider
To accomplish this, I've modified the Terraform EKS example slightly (code at the bottom of the question). The problem I am encountering is that after SSH-ing into the bastion, I cannot ping the cluster, and commands like kubectl get pods time out after about 60 seconds.
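For concreteness, this is roughly what I try once I am on the bastion (the endpoint placeholder stands in for my cluster's API server URL):
# Run from the bastion after SSH-ing in
ping <cluster-api-endpoint>   # no response
kubectl get pods              # hangs, then times out after roughly 60 seconds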
Here are the facts/things I know to be true:
- I have (for the time being) switched the cluster to a public cluster for testing purposes. Previously, when I had cluster_endpoint_public_access set to false, the terraform apply command would not even complete as it could not access the /healthz endpoint on the cluster.
- The bastion configuration works in the sense that the user data runs successfully and installs kubectl and the kubeconfig file.
- I am able to SSH into the bastion via my static IP (that's the var.company_vpn_ips in the code).
- It's entirely possible this is fully a networking problem and not an EKS/Terraform problem, as my understanding of how the VPC and its security groups fit into this picture is not entirely mature.
Code
Here is the VPC configuration:
locals {
  vpc_name            = "my-vpc"
  vpc_cidr            = "10.0.0.0/16"
  public_subnet_cidr  = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
  private_subnet_cidr = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}
# The definition of the VPC to create
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "3.2.0"
name = local.vpc_name
cidr = local.vpc_cidr
azs = data.aws_availability_zones.available.names
private_subnets = local.private_subnet_cidr
public_subnets = local.public_subnet_cidr
enable_nat_gateway = true
single_nat_gateway = true
enable_dns_hostnames = true
tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
}
public_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/elb" = "1"
}
private_subnet_tags = {
"kubernetes.io/cluster/${var.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = "1"
}
}
data "aws_availability_zones" "available" {}
Then the security groups I create for the cluster:
resource "aws_security_group" "ssh_sg" {
name_prefix = "ssh-sg"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [
"10.0.0.0/8",
]
}
}
resource "aws_security_group" "all_worker_mgmt" {
name_prefix = "all_worker_management"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [
"10.0.0.0/8",
"172.16.0.0/12",
"192.168.0.0/16",
]
}
}
Here's the cluster configuration:
locals {
  cluster_version = "1.21"
}
# Create the EKS resource that will set up the EKS cluster
module "eks_cluster" {
  source = "terraform-aws-modules/eks/aws"

  # The name of the cluster to create
  cluster_name = var.cluster_name

  # Enable public access to the cluster API endpoint (temporarily, for testing)
  cluster_endpoint_public_access = true

  # Enable private access to the cluster API endpoint
  cluster_endpoint_private_access = true

  # The version of the cluster to create
  cluster_version = local.cluster_version

  # The VPC ID to create the cluster in
  vpc_id = var.vpc_id

  # The subnets to add the cluster to
  subnets = var.private_subnets

  # Default information on the workers
  workers_group_defaults = {
    root_volume_type = "gp2"
  }

  worker_additional_security_group_ids = [var.all_worker_mgmt_id]

  # Specify the worker groups
  worker_groups = [
    {
      # The name of this worker group
      name = "default-workers"

      # The instance type for this worker group
      instance_type = var.eks_worker_instance_type

      # The number of instances to bring up
      asg_desired_capacity = var.eks_num_workers
      asg_max_size         = var.eks_num_workers
      asg_min_size         = var.eks_num_workers

      # The security group IDs for these instances
      additional_security_group_ids = [var.ssh_sg_id]
    }
  ]
}
data "aws_eks_cluster" "cluster" {
name = module.eks_cluster.cluster_id
}
data "aws_eks_cluster_auth" "cluster" {
name = module.eks_cluster.cluster_id
}
output "worker_iam_role_name" {
value = module.eks_cluster.worker_iam_role_name
}
And finally, the bastion:
locals {
  ami           = "ami-0f19d220602031aed" # Amazon Linux 2 AMI (us-east-2)
  instance_type = "t3.small"
  key_name      = "bastion-kp"
}

resource "aws_iam_instance_profile" "bastion" {
  name = "bastion"
  role = var.role_name
}
resource "aws_instance" "bastion" {
ami = local.ami
instance_type = local.instance_type
key_name = local.key_name
associate_public_ip_address = true
subnet_id = var.public_subnet
iam_instance_profile = aws_iam_instance_profile.bastion.name
security_groups = [aws_security_group.bastion-sg.id]
tags = {
Name = "K8s Bastion"
}
lifecycle {
ignore_changes = all
}
user_data = <<EOF
#! /bin/bash
# Install Kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client
# Install Helm
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm version
# Install AWS
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install
aws --version
# Install aws-iam-authenticator
curl -o aws-iam-authenticator https://amazon-eks.s3.us-west-2.amazonaws.com/1.21.2/2021-07-05/bin/linux/amd64/aws-iam-authenticator
chmod +x ./aws-iam-authenticator
mkdir -p $HOME/bin && cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$PATH:$HOME/bin
echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
aws-iam-authenticator help
# Add the kube config file
mkdir ~/.kube
echo "${var.kubectl_config}" >> ~/.kube/config
EOF
}
resource "aws_security_group" "bastion-sg" {
name = "bastion-sg"
vpc_id = var.vpc_id
}
resource "aws_security_group_rule" "sg-rule-ssh" {
security_group_id = aws_security_group.bastion-sg.id
from_port = 22
protocol = "tcp"
to_port = 22
type = "ingress"
cidr_blocks = var.company_vpn_ips
depends_on = [aws_security_group.bastion-sg]
}
resource "aws_security_group_rule" "sg-rule-egress" {
security_group_id = aws_security_group.bastion-sg.id
type = "egress"
from_port = 0
protocol = "all"
to_port = 0
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
depends_on = [aws_security_group.bastion-sg]
}
Ask
The most pressing issue for me is finding a way to interact with the cluster from the bastion so that the rest of the Terraform code can run (the resources to spin up in the cluster itself). I am also hoping to understand how to set up a private cluster when its endpoint ends up being inaccessible to the terraform apply command. Thank you in advance for any help you can provide!