Building a Hybrid LLM Platform on EKS, Part 4: Platform Add-ons, the Load Balancer Controller, and Karpenter
In Part 3 we added worker nodes: a CPU system pool for add-ons and the hybrid router, and a GPU pool for vLLM model servers. The cluster now has capacity. What it does not yet have is the infrastructure layer that sits above node groups and makes the cluster usable at production scale — a way to expose services to the internet, and a way to grow and shrink capacity automatically in response to workload demand.
Part 4 installs two add-ons that every serious EKS cluster needs. The first is the AWS Load Balancer Controller, which watches Kubernetes Ingress and Service objects and provisions real AWS Application Load Balancers (ALBs) and Network Load Balancers (NLBs) to back them. Without it, a Kubernetes Service of type LoadBalancer is handled by the default cloud-provider integration, which creates a Classic Load Balancer with no path-based routing. That is not what you want for a multi-service platform. The second is Karpenter, the node lifecycle controller that replaces Cluster Autoscaler for this cluster. Karpenter launches and terminates EC2 instances directly in response to pending pods, instead of adjusting the desired size of pre-defined Auto Scaling groups, and its NodePool model maps precisely to the "GPU pool scales to zero overnight" requirement we have been building toward.
Both add-ons use IRSA for AWS permissions. This is where the OIDC provider we created in Part 2 starts earning its keep — each add-on gets a purpose-scoped IAM role that no other pod on the cluster can assume.
A Fourth CDK Stack: Add-ons
We add a fourth stack that holds the IRSA roles and Helm chart installations for both add-ons. We keep this separate from the node group stack because the add-on lifecycle (version upgrades, config tuning) is independent of the node group lifecycle (instance type changes, AMI updates). A version bump to the load balancer controller should not touch anything that could trigger a node group replacement.
// lib/addons-stack.ts
import * as cdk from "aws-cdk-lib";
import { Construct } from "constructs";
import * as eks from "aws-cdk-lib/aws-eks";
import * as iam from "aws-cdk-lib/aws-iam";
import { config } from "./config";

interface AddonsStackProps extends cdk.StackProps {
  cluster: eks.Cluster;
  oidcProvider: iam.OpenIdConnectProvider;
}

// IAM role that Karpenter-launched nodes run as. This stack does not create
// it: the role must already exist and be allowed to join the cluster
// (an aws-auth mapping or an EKS access entry).
const KARPENTER_NODE_ROLE = `KarpenterNodeRole-${config.clusterName}`;

export class AddonsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props: AddonsStackProps) {
    super(scope, id, props);
    // All Kubernetes resources below use `this` (the add-ons stack) as scope.
    // cluster.addHelmChart / addManifest / addServiceAccount would place them
    // in the cluster's stack instead, where their references to the IAM roles
    // defined here would create a circular cross-stack dependency.
    const lbcRole = this.createLoadBalancerControllerRole(props.oidcProvider);
    this.installLoadBalancerController(props.cluster, lbcRole);
    const karpenterRole = this.createKarpenterRole(props.cluster, props.oidcProvider);
    const karpenterChart = this.installKarpenter(props.cluster, karpenterRole);
    this.createKarpenterNodePools(props.cluster, karpenterChart);
  }

  // ─── IRSA helpers ────────────────────────────────────────────────────────────
  // Trust policy shared by both roles: only the named service account, using
  // a token issued for the STS audience, may assume the role.
  private irsaPrincipal(
    id: string,
    oidcProvider: iam.IOpenIdConnectProvider,
    namespace: string,
    serviceAccount: string
  ): iam.IPrincipal {
    // The issuer is a deploy-time token, so .replace("https://", "") at synth
    // time is a no-op, and CloudFormation does not accept tokens as JSON keys.
    // CfnJson builds the condition object at deploy time, and
    // openIdConnectProviderIssuer is the issuer without the https:// prefix.
    const issuer = oidcProvider.openIdConnectProviderIssuer;
    const conditions = new cdk.CfnJson(this, `${id}OidcConditions`, {
      value: {
        [`${issuer}:sub`]: `system:serviceaccount:${namespace}:${serviceAccount}`,
        [`${issuer}:aud`]: "sts.amazonaws.com",
      },
    });
    return new iam.OpenIdConnectPrincipal(oidcProvider).withConditions({
      StringEquals: conditions,
    });
  }

  // ServiceAccount annotated with the IRSA role ARN. We create it ourselves
  // (cluster.addServiceAccount would also generate a second, unused IAM role)
  // and tell each chart not to create its own.
  private irsaServiceAccount(
    id: string,
    cluster: eks.Cluster,
    namespace: string,
    name: string,
    role: iam.Role
  ): eks.KubernetesManifest {
    return new eks.KubernetesManifest(this, id, {
      cluster,
      manifest: [
        {
          apiVersion: "v1",
          kind: "ServiceAccount",
          metadata: {
            name,
            namespace,
            annotations: { "eks.amazonaws.com/role-arn": role.roleArn },
          },
        },
      ],
    });
  }

  // ─── AWS Load Balancer Controller ────────────────────────────────────────────
  private createLoadBalancerControllerRole(
    oidcProvider: iam.OpenIdConnectProvider
  ): iam.Role {
    const role = new iam.Role(this, "LoadBalancerControllerRole", {
      assumedBy: this.irsaPrincipal(
        "LoadBalancerController",
        oidcProvider,
        "kube-system",
        "aws-load-balancer-controller"
      ),
      description: "IRSA role for the AWS Load Balancer Controller",
    });
    // The AWS-managed ElasticLoadBalancingFullAccess policy covers the
    // elasticloadbalancing:* actions used to manage ALBs, NLBs, target groups,
    // and listener rules. It is broader than the controller's own recommended
    // policy (iam_policy.json in the controller's GitHub repository).
    role.addManagedPolicy(
      iam.ManagedPolicy.fromAwsManagedPolicyName("ElasticLoadBalancingFullAccess")
    );
    // Inline statement for the remaining calls: EC2 describe/tag calls for
    // subnet discovery and security group management, certificate lookup,
    // and WAF / Shield integration.
    role.addToPolicy(
      new iam.PolicyStatement({
        actions: [
          "ec2:DescribeAccountAttributes",
          "ec2:DescribeAddresses",
          "ec2:DescribeAvailabilityZones",
          "ec2:DescribeInternetGateways",
          "ec2:DescribeVpcs",
          "ec2:DescribeVpcPeeringConnections",
          "ec2:DescribeSubnets",
          "ec2:DescribeSecurityGroups",
          "ec2:DescribeInstances",
          "ec2:DescribeNetworkInterfaces",
          "ec2:DescribeRouteTables",
          "ec2:DescribeTags",
          "ec2:GetSecurityGroupsForVpc",
          "ec2:GetCoipPoolUsage",
          "ec2:DescribeCoipPools",
          "ec2:CreateSecurityGroup",
          "ec2:CreateTags",
          "ec2:DeleteTags",
          "ec2:AuthorizeSecurityGroupIngress",
          "ec2:RevokeSecurityGroupIngress",
          "ec2:DeleteSecurityGroup",
          "cognito-idp:DescribeUserPoolClient",
          "acm:ListCertificates",
          "acm:DescribeCertificate",
          "iam:ListServerCertificates",
          "iam:GetServerCertificate",
          "waf-regional:GetWebACL",
          "waf-regional:GetWebACLForResource",
          "waf-regional:AssociateWebACL",
          "waf-regional:DisassociateWebACL",
          "wafv2:GetWebACL",
          "wafv2:GetWebACLForResource",
          "wafv2:AssociateWebACL",
          "wafv2:DisassociateWebACL",
          "shield:GetSubscriptionState",
          "shield:DescribeProtection",
          "shield:CreateProtection",
          "shield:DeleteProtection",
        ],
        resources: ["*"],
      })
    );
    return role;
  }

  private installLoadBalancerController(cluster: eks.Cluster, role: iam.Role): void {
    const serviceAccount = this.irsaServiceAccount(
      "LoadBalancerControllerSA",
      cluster,
      "kube-system",
      "aws-load-balancer-controller",
      role
    );
    const chart = new eks.HelmChart(this, "AwsLoadBalancerController", {
      cluster,
      chart: "aws-load-balancer-controller",
      repository: "https://aws.github.io/eks-charts",
      namespace: "kube-system",
      release: "aws-load-balancer-controller",
      version: "1.11.0",
      values: {
        clusterName: config.clusterName,
        serviceAccount: {
          create: false, // created above with the IRSA annotation
          name: "aws-load-balancer-controller",
        },
        region: config.region,
        vpcId: cluster.vpc.vpcId,
        replicaCount: 2,
        resources: {
          requests: { cpu: "100m", memory: "128Mi" },
          limits: { cpu: "500m", memory: "256Mi" },
        },
      },
    });
    // IRSA credentials are injected when a pod is admitted, so the annotated
    // ServiceAccount must exist before the chart's pods are created.
    chart.node.addDependency(serviceAccount);
  }

  // ─── Karpenter ───────────────────────────────────────────────────────────────
  private createKarpenterRole(
    cluster: eks.Cluster,
    oidcProvider: iam.OpenIdConnectProvider
  ): iam.Role {
    const role = new iam.Role(this, "KarpenterControllerRole", {
      assumedBy: this.irsaPrincipal("Karpenter", oidcProvider, "karpenter", "karpenter"),
      description: "IRSA role for the Karpenter controller",
    });
    // Karpenter launches and terminates EC2 instances directly (CreateFleet and
    // launch templates, no Auto Scaling groups), reads SSM for EKS-optimized AMI
    // IDs, manages instance profiles for the node role and passes that role to
    // new instances, and calls eks:DescribeCluster for cluster details. These
    // statements follow Karpenter's reference controller policy without its
    // tag-based scoping; diff them against the policy for your Karpenter version.
    role.addToPolicy(
      new iam.PolicyStatement({
        sid: "KarpenterEC2",
        actions: [
          "ec2:CreateFleet",
          "ec2:CreateLaunchTemplate",
          "ec2:CreateTags",
          "ec2:DeleteLaunchTemplate",
          "ec2:DescribeAvailabilityZones",
          "ec2:DescribeImages",
          "ec2:DescribeInstances",
          "ec2:DescribeInstanceTypeOfferings",
          "ec2:DescribeInstanceTypes",
          "ec2:DescribeLaunchTemplates",
          "ec2:DescribeSecurityGroups",
          "ec2:DescribeSpotPriceHistory",
          "ec2:DescribeSubnets",
          "ec2:RunInstances",
          "ec2:TerminateInstances",
        ],
        resources: ["*"],
      })
    );
    role.addToPolicy(
      new iam.PolicyStatement({
        sid: "KarpenterPassNodeRole",
        actions: ["iam:PassRole"],
        // Only the node role referenced by the EC2NodeClasses below.
        resources: [`arn:aws:iam::${this.account}:role/${KARPENTER_NODE_ROLE}`],
        conditions: {
          StringEquals: { "iam:PassedToService": "ec2.amazonaws.com" },
        },
      })
    );
    role.addToPolicy(
      new iam.PolicyStatement({
        sid: "KarpenterInstanceProfiles",
        // EC2NodeClass `role` makes Karpenter create and manage the instance
        // profiles that wrap the node role.
        actions: [
          "iam:AddRoleToInstanceProfile",
          "iam:CreateInstanceProfile",
          "iam:DeleteInstanceProfile",
          "iam:GetInstanceProfile",
          "iam:RemoveRoleFromInstanceProfile",
          "iam:TagInstanceProfile",
        ],
        resources: [`arn:aws:iam::${this.account}:instance-profile/*`],
      })
    );
    role.addToPolicy(
      new iam.PolicyStatement({
        sid: "KarpenterSSM",
        actions: ["ssm:GetParameter"],
        // SSM parameter paths where EKS-optimized AMI IDs are published.
        resources: [
          `arn:aws:ssm:${this.region}::parameter/aws/service/eks/optimized-ami/*`,
        ],
      })
    );
    role.addToPolicy(
      new iam.PolicyStatement({
        sid: "KarpenterPricing",
        actions: ["pricing:GetProducts"],
        resources: ["*"],
      })
    );
    role.addToPolicy(
      new iam.PolicyStatement({
        sid: "KarpenterSQS",
        actions: [
          "sqs:DeleteMessage",
          "sqs:GetQueueAttributes",
          "sqs:GetQueueUrl",
          "sqs:ReceiveMessage",
        ],
        // Karpenter reads EC2 interruption events (Spot interruption warnings,
        // scheduled maintenance) from this queue to drain nodes ahead of time.
        resources: [
          `arn:aws:sqs:${this.region}:${this.account}:karpenter-interruption-${config.clusterName}`,
        ],
      })
    );
    role.addToPolicy(
      new iam.PolicyStatement({
        sid: "KarpenterEKS",
        actions: ["eks:DescribeCluster"],
        resources: [cluster.clusterArn],
      })
    );
    return role;
  }

  private installKarpenter(cluster: eks.Cluster, role: iam.Role): eks.HelmChart {
    // The karpenter namespace must exist before the service account.
    const ns = new eks.KubernetesManifest(this, "KarpenterNamespace", {
      cluster,
      manifest: [{ apiVersion: "v1", kind: "Namespace", metadata: { name: "karpenter" } }],
    });
    const serviceAccount = this.irsaServiceAccount(
      "KarpenterSA",
      cluster,
      "karpenter",
      "karpenter",
      role
    );
    serviceAccount.node.addDependency(ns);
    const chart = new eks.HelmChart(this, "Karpenter", {
      cluster,
      chart: "karpenter",
      // CDK's Helm handler pulls an oci:// repository as the chart reference,
      // so the URL includes the chart name.
      repository: "oci://public.ecr.aws/karpenter/karpenter",
      namespace: "karpenter",
      release: "karpenter",
      version: "1.3.3",
      values: {
        serviceAccount: {
          create: false,
          name: "karpenter",
        },
        settings: {
          clusterName: config.clusterName,
          clusterEndpoint: cluster.clusterEndpoint,
          // Must match the queue created in the interruption-queue section.
          interruptionQueue: `karpenter-interruption-${config.clusterName}`,
        },
        controller: {
          resources: {
            requests: { cpu: "250m", memory: "256Mi" },
            limits: { cpu: "1", memory: "512Mi" },
          },
        },
      },
    });
    chart.node.addDependency(serviceAccount);
    return chart;
  }

  private createKarpenterNodePools(cluster: eks.Cluster, karpenterChart: eks.HelmChart): void {
    // EC2NodeClass and NodePool are CRDs installed by the Karpenter chart, so
    // every manifest here is applied after the chart.
    const apply = (id: string, manifest: Record<string, unknown>): void => {
      const resource = new eks.KubernetesManifest(this, id, { cluster, manifest: [manifest] });
      resource.node.addDependency(karpenterChart);
    };
    const subnetSelectorTerms = [{ tags: { "kubernetes.io/role/internal-elb": "1" } }];
    const securityGroupSelectorTerms = [
      { tags: { [`kubernetes.io/cluster/${config.clusterName}`]: "owned" } },
    ];

    // EC2NodeClass: describes *how* to launch instances (AMI, subnets,
    // security groups, node role). One class per AMI type.
    apply("SystemNodeClass", {
      apiVersion: "karpenter.k8s.aws/v1",
      kind: "EC2NodeClass",
      metadata: { name: "system" },
      spec: {
        // Karpenter v1 requires amiSelectorTerms; the alias selects the
        // EKS-optimized AL2023 AMI and implies the AMI family. Pin a specific
        // version instead of @latest in production so new AMI releases do not
        // trigger drift replacements.
        amiSelectorTerms: [{ alias: "al2023@latest" }],
        role: KARPENTER_NODE_ROLE,
        subnetSelectorTerms,
        securityGroupSelectorTerms,
        tags: { "karpenter.sh/discovery": config.clusterName },
      },
    });
    apply("GpuNodeClass", {
      apiVersion: "karpenter.k8s.aws/v1",
      kind: "EC2NodeClass",
      metadata: { name: "gpu" },
      spec: {
        // For GPU instance types the al2 alias resolves to the EKS-optimized
        // accelerated (NVIDIA) AL2 AMI for the cluster's Kubernetes version.
        amiSelectorTerms: [{ alias: "al2@latest" }],
        role: KARPENTER_NODE_ROLE,
        subnetSelectorTerms,
        securityGroupSelectorTerms,
        tags: { "karpenter.sh/discovery": config.clusterName },
        // Stripe the local NVMe instance-store volumes into one RAID0 array
        // and use it for the node's ephemeral storage.
        instanceStorePolicy: "RAID0",
      },
    });

    // NodePool: describes *what* to launch (instance families, capacity type,
    // limits, consolidation policy). Separate pools for system and GPU.
    apply("SystemNodePool", {
      apiVersion: "karpenter.sh/v1",
      kind: "NodePool",
      metadata: { name: "system" },
      spec: {
        template: {
          metadata: {
            labels: { "node.kubernetes.io/purpose": "system" },
          },
          spec: {
            nodeClassRef: { group: "karpenter.k8s.aws", kind: "EC2NodeClass", name: "system" },
            requirements: [
              { key: "karpenter.sh/capacity-type", operator: "In", values: ["on-demand"] },
              { key: "kubernetes.io/arch", operator: "In", values: ["amd64"] },
              { key: "karpenter.k8s.aws/instance-family", operator: "In", values: ["m7i", "m6i"] },
              { key: "karpenter.k8s.aws/instance-size", operator: "In", values: ["xlarge", "2xlarge"] },
            ],
          },
        },
        limits: { cpu: "32", memory: "128Gi" },
        disruption: {
          consolidationPolicy: "WhenEmptyOrUnderutilized",
          consolidateAfter: "30s",
        },
      },
    });
    apply("GpuNodePool", {
      apiVersion: "karpenter.sh/v1",
      kind: "NodePool",
      metadata: { name: "gpu" },
      spec: {
        template: {
          metadata: {
            labels: {
              "node.kubernetes.io/purpose": "gpu-inference",
              "nvidia.com/gpu-present": "true",
            },
          },
          spec: {
            nodeClassRef: { group: "karpenter.k8s.aws", kind: "EC2NodeClass", name: "gpu" },
            requirements: [
              { key: "karpenter.sh/capacity-type", operator: "In", values: ["spot", "on-demand"] },
              { key: "kubernetes.io/arch", operator: "In", values: ["amd64"] },
              { key: "karpenter.k8s.aws/instance-family", operator: "In", values: ["g5", "g4dn"] },
            ],
            taints: [{ key: "nvidia.com/gpu", value: "present", effect: "NoSchedule" }],
          },
        },
        limits: { "nvidia.com/gpu": "8" },
        disruption: {
          // Only consolidate when a node is empty: a GPU node running even one
          // vLLM replica should not be disrupted for bin-packing.
          consolidationPolicy: "WhenEmpty",
          consolidateAfter: "1m",
        },
      },
    });
  }
}
Update bin/app.ts to add the fourth stack:
// bin/app.ts
import * as cdk from "aws-cdk-lib";
import { NetworkStack } from "../lib/network-stack";
import { ClusterStack } from "../lib/cluster-stack";
import { NodeGroupStack } from "../lib/node-group-stack";
import { AddonsStack } from "../lib/addons-stack";
import { config } from "../lib/config";

const app = new cdk.App();
const env = { region: config.region };

const network = new NetworkStack(app, "HybridLlmNetwork", { env });
const cluster = new ClusterStack(app, "HybridLlmCluster", {
  env,
  vpc: network.vpc,
});
const nodeGroups = new NodeGroupStack(app, "HybridLlmNodeGroups", {
  env,
  cluster: cluster.cluster,
  nodeRole: cluster.nodeRole,
});
const addons = new AddonsStack(app, "HybridLlmAddons", {
  env,
  cluster: cluster.cluster,
  oidcProvider: cluster.oidcProvider,
});
// The controllers schedule onto the system node group, so deploy the node
// groups first (and destroy the add-ons before them).
addons.addDependency(nodeGroups);
Walking Through the Decisions
Why two add-ons, not more
There are a dozen Kubernetes add-ons that commonly appear in "production EKS" checklists — ExternalDNS, cert-manager, Secrets Store CSI, Velero, and more. We are not installing all of them here. The two we install are the ones the platform cannot function without: something to provision load balancers (or vLLM never gets traffic), and something to autoscale nodes (or you pay for idle GPU capacity around the clock). Everything else is optional or gets introduced with the workload that needs it.
AWS Load Balancer Controller IRSA
The IRSA trust policy for the load balancer controller locks the role to a single service account: system:serviceaccount:kube-system:aws-load-balancer-controller. The StringEquals condition means no other service account on the cluster — not even one in the same namespace with a similar name — can assume this role. This is the IRSA precision Part 2 described.
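To make the exact-match behavior concrete, here is a small self-contained sketch that builds the same shape of trust policy as plain JSON and checks a token's claims against it. The helper names (irsaTrustPolicy, tokenSatisfies), the example issuer, and the example account ID are hypothetical, and the check is a simplified model of what STS evaluates, not its implementation.

```typescript
// Hypothetical helpers: build an IRSA trust policy and check token claims
// against its StringEquals condition (a simplified model of the STS check).
interface TrustStatement {
  Effect: "Allow";
  Principal: { Federated: string };
  Action: "sts:AssumeRoleWithWebIdentity";
  Condition: { StringEquals: Record<string, string> };
}

interface TrustPolicy {
  Version: "2012-10-17";
  Statement: TrustStatement[];
}

// issuerHostPath is the OIDC issuer URL without the "https://" prefix.
function irsaTrustPolicy(
  providerArn: string,
  issuerHostPath: string,
  namespace: string,
  serviceAccount: string
): TrustPolicy {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Principal: { Federated: providerArn },
        Action: "sts:AssumeRoleWithWebIdentity",
        Condition: {
          StringEquals: {
            [`${issuerHostPath}:sub`]: `system:serviceaccount:${namespace}:${serviceAccount}`,
            [`${issuerHostPath}:aud`]: "sts.amazonaws.com",
          },
        },
      },
    ],
  };
}

// Every StringEquals key must equal the matching token claim exactly:
// no prefix matching, no wildcards.
function tokenSatisfies(
  policy: TrustPolicy,
  issuerHostPath: string,
  claims: Record<string, string>
): boolean {
  return policy.Statement.some((stmt) =>
    Object.entries(stmt.Condition.StringEquals).every(([key, expected]) => {
      const claimName = key.slice(issuerHostPath.length + 1); // "sub" or "aud"
      return claims[claimName] === expected;
    })
  );
}

const exampleIssuer = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE";
const lbcTrust = irsaTrustPolicy(
  `arn:aws:iam::111122223333:oidc-provider/${exampleIssuer}`,
  exampleIssuer,
  "kube-system",
  "aws-load-balancer-controller"
);
console.log(
  tokenSatisfies(lbcTrust, exampleIssuer, {
    sub: "system:serviceaccount:kube-system:aws-load-balancer-controller",
    aud: "sts.amazonaws.com",
  })
); // true
console.log(
  tokenSatisfies(lbcTrust, exampleIssuer, {
    sub: "system:serviceaccount:kube-system:aws-load-balancer-controller-v2",
    aud: "sts.amazonaws.com",
  })
); // false
```

The second call fails even though the service account name shares a prefix with the allowed one, which is the property the StringEquals condition buys you over StringLike.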
The serviceAccount.create: false Helm value is deliberate. We create the Kubernetes ServiceAccount ourselves in CDK, with the eks.amazonaws.com/role-arn annotation set to the ARN of the role we just built, rather than letting the chart create it from Helm values. Passing the annotation through Helm values would also work, but keeping the ServiceAccount as its own CDK resource puts the IRSA binding right next to the role definition, and chart.node.addDependency(serviceAccount) guarantees the annotated service account exists before the chart's pods are created. That ordering matters because the IRSA mutating webhook injects credentials only when a pod is admitted.
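The IAM permissions for the load balancer controller are broad. AWS publishes the controller's recommended policy as a JSON document (iam_policy.json in the aws-load-balancer-controller GitHub repository), but it is not an AWS-managed policy you can attach by name. Rather than loading that file, we attach the AWS-managed ElasticLoadBalancingFullAccess policy for the elasticloadbalancing:* actions and inline the remaining EC2, ACM, WAF, and Shield permissions in TypeScript, so cdk diff shows the exact permission set instead of an opaque document. The trade-offs: ElasticLoadBalancingFullAccess is wider than the controller strictly needs, and the published policy does change between controller releases, so re-check the inline list against it whenever you bump the chart version.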
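One way to do that re-check is to compare action names mechanically. The sketch below is a hypothetical helper, not an AWS tool: you pass in the already-parsed policy document, it treats `*` and `?` in granted actions as IAM-style wildcards (case-insensitive, as IAM action names are), and it ignores resources and conditions entirely, so it only answers "is this action granted anywhere".

```typescript
// Hypothetical helper: which Allow actions in a published IAM policy document
// (e.g. a parsed iam_policy.json) are not covered by the actions we grant?
// Resources and conditions are ignored; only action names are compared.
interface PolicyStatementJson {
  Effect: string;
  Action?: string | string[];
}

interface PolicyDocumentJson {
  Version: string;
  Statement: PolicyStatementJson[];
}

// Convert an IAM action pattern (* and ? wildcards) into a case-insensitive regex.
function actionPatternToRegex(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$", "i");
}

function uncoveredActions(published: PolicyDocumentJson, granted: string[]): string[] {
  const grantedPatterns = granted.map(actionPatternToRegex);
  const wanted = new Set<string>();
  for (const stmt of published.Statement) {
    if (stmt.Effect !== "Allow" || stmt.Action === undefined) continue;
    const actions = Array.isArray(stmt.Action) ? stmt.Action : [stmt.Action];
    actions.forEach((a) => wanted.add(a));
  }
  return Array.from(wanted)
    .filter((action) => !grantedPatterns.some((re) => re.test(action)))
    .sort();
}

const examplePublished: PolicyDocumentJson = {
  Version: "2012-10-17",
  Statement: [
    { Effect: "Allow", Action: ["ec2:DescribeVpcs", "ec2:DescribeRouteTables"] },
    { Effect: "Allow", Action: "elasticloadbalancing:DescribeListeners" },
  ],
};
console.log(uncoveredActions(examplePublished, ["elasticloadbalancing:*", "ec2:DescribeVpcs"]));
// [ 'ec2:DescribeRouteTables' ]
```

In practice you would feed it the union of the managed policy's actions and the inline list; anything it reports is a permission the controller may ask for and be denied.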
Karpenter vs. Cluster Autoscaler
Cluster Autoscaler (CAS) scales node groups based on pending pod counts and node group min/max. It works, but it has a design constraint that matters for this platform: it operates on EC2 Auto Scaling Groups, and scaling decisions are bounded by the instance types and configuration of the ASG you have already defined. If you want to add a new instance type, you change the ASG and redeploy.
Karpenter operates differently. Its NodePool is a policy, not a group: it describes constraints (instance families, capacity type, architecture), and at launch time Karpenter chooses from every EC2 instance type that satisfies them. For a GPU inference platform this is valuable. The GPU NodePool allows both g5 and g4dn, so if g5.xlarge Spot capacity is tight in a zone, Karpenter can fall back to g4dn.xlarge (a different family that the same NodePool also permits) without any change to the cluster configuration, instead of you editing a node group's instance-type list and redeploying.
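A toy model shows why requirement-based selection degrades gracefully. The sketch below filters hypothetical offerings by NodePool requirements that use the In operator, then by zone and availability. The availability flags are made up for illustration, and real Karpenter does more: it hands a set of acceptable instance types to EC2 CreateFleet and weighs price, which this sketch ignores.

```typescript
// Simplified model (not Karpenter's scheduler): filter offerings by NodePool
// requirements that use the "In" operator, then by zone and availability.
interface Requirement {
  key: string;
  operator: "In";
  values: string[];
}

interface Offering {
  instanceType: string;
  zone: string;
  available: boolean; // hypothetical: false when capacity is exhausted
  labels: Record<string, string>;
}

function satisfies(offering: Offering, requirements: Requirement[]): boolean {
  return requirements.every((r) => r.values.includes(offering.labels[r.key]));
}

function launchCandidates(offerings: Offering[], requirements: Requirement[], zone: string): string[] {
  return offerings
    .filter((o) => o.zone === zone && o.available && satisfies(o, requirements))
    .map((o) => `${o.instanceType} (${o.labels["karpenter.sh/capacity-type"]})`);
}

const gpuRequirements: Requirement[] = [
  { key: "karpenter.sh/capacity-type", operator: "In", values: ["spot", "on-demand"] },
  { key: "karpenter.k8s.aws/instance-family", operator: "In", values: ["g5", "g4dn"] },
];

// Build a hypothetical offering in us-east-1a; the family label is derived
// from the instance type name.
const offer = (instanceType: string, capacityType: string, available: boolean): Offering => ({
  instanceType,
  zone: "us-east-1a",
  available,
  labels: {
    "karpenter.sh/capacity-type": capacityType,
    "karpenter.k8s.aws/instance-family": instanceType.split(".")[0],
  },
});

// g5 Spot is exhausted in this zone; g4dn Spot and g5 on-demand remain.
const exampleOfferings = [
  offer("g5.xlarge", "spot", false),
  offer("g4dn.xlarge", "spot", true),
  offer("g5.xlarge", "on-demand", true),
  offer("m6i.xlarge", "spot", true),
];
console.log(launchCandidates(exampleOfferings, gpuRequirements, "us-east-1a"));
// [ 'g4dn.xlarge (spot)', 'g5.xlarge (on-demand)' ]
```

An ASG pinned to g5.xlarge Spot would have no candidates at all in this situation; the requirement-based pool still has two.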
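The bigger advantage is consolidation. Cluster Autoscaler can also remove nodes: it scales down empty or underutilized nodes (below 50% of requested resources by default) when their pods can be rescheduled elsewhere, but it can only shrink the Auto Scaling groups you already defined. Karpenter evaluates the same question, whether a node's pods could fit on other existing nodes, and can additionally replace a node with a cheaper instance type. For the GPU pool with consolidationPolicy: WhenEmpty, Karpenter terminates a GPU node once its last workload pod exits (DaemonSet pods do not count) and consolidateAfter has elapsed, which is exactly the "scales to zero overnight" behavior we need. CAS can take a minSize: 0 group to zero as well, but scaling back up from zero needs extra configuration, typically node-template tags on the Auto Scaling group, so CAS knows which labels, taints, and GPU resources a not-yet-existing node would have. Karpenter derives that from the NodePool and its instance-type data.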
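The "could these pods fit elsewhere" question at the heart of consolidation can be sketched as a bin-packing check. This is a simplified, hypothetical model using first-fit decreasing on CPU requests only; Karpenter's real simulation runs its scheduler against every constraint (memory, taints, affinity, topology spread, PodDisruptionBudgets), none of which appear here.

```typescript
// Simplified consolidation check (CPU requests only; ignores memory, taints,
// affinity, PodDisruptionBudgets): could every pod on `candidate` be placed
// on the other nodes' free CPU? Uses first-fit decreasing.
interface NodeState {
  name: string;
  allocatableMilliCpu: number;
  podRequestsMilliCpu: number[];
}

function freeMilliCpu(node: NodeState): number {
  return node.allocatableMilliCpu - node.podRequestsMilliCpu.reduce((sum, r) => sum + r, 0);
}

function podsFitElsewhere(candidate: NodeState, others: NodeState[]): boolean {
  const free = others.map(freeMilliCpu);
  // Largest pods first, so a big pod is not stranded by earlier small ones.
  const pods = candidate.podRequestsMilliCpu.slice().sort((a, b) => b - a);
  for (const pod of pods) {
    const target = free.findIndex((f) => f >= pod);
    if (target === -1) return false;
    free[target] -= pod;
  }
  return true;
}

// Two 4-vCPU nodes at 40% each: one can absorb the other's pods.
const nodeA: NodeState = { name: "a", allocatableMilliCpu: 4000, podRequestsMilliCpu: [1000, 600] };
const nodeB: NodeState = { name: "b", allocatableMilliCpu: 4000, podRequestsMilliCpu: [1600] };
console.log(podsFitElsewhere(nodeA, [nodeB])); // true
```

When the check passes, the candidate node can be drained and terminated; when it fails, both nodes stay.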
NodePool and EC2NodeClass split
Karpenter v1 separates how to launch a node (EC2NodeClass) from which nodes to launch and what may run on them (NodePool). The EC2NodeClass is the AWS-specific half: AMI selection, subnet and security group selectors, and the node IAM role. The NodePool belongs to Karpenter's provider-neutral API group (karpenter.sh): instance requirements, labels, taints, limits, and consolidation policy. Each NodePool points at its EC2NodeClass through nodeClassRef.
We define separate classes and pools for system and GPU nodes because their AMIs differ: AL2023 for system nodes, and the EKS-optimized accelerated AL2 AMI (which ships the NVIDIA driver) for GPU nodes. A single EC2NodeClass would force one AMI choice on every node. Note that AWS now also publishes NVIDIA variants of the AL2023 EKS AMI, and Kubernetes 1.32 is the last EKS version with AL2 AMIs, so plan to move the GPU class to AL2023 before upgrading past 1.32. Two classes, two pools, clean separation.
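Because of the split, a pool's AMI is determined indirectly: NodePool → nodeClassRef → EC2NodeClass → amiSelectorTerms. The hypothetical helper below follows that chain over plain objects shaped like the manifests in this part and fails loudly on a dangling reference, which is a cheap pre-deploy check. The aliases use the Karpenter v1 amiSelectorTerms alias form; the helper and factory names are made up for illustration.

```typescript
// Hypothetical check: follow a NodePool's nodeClassRef to its EC2NodeClass
// and report the AMI alias the pool will launch with.
interface KarpenterManifest {
  kind: "NodePool" | "EC2NodeClass";
  metadata: { name: string };
  spec: {
    template?: { spec: { nodeClassRef: { name: string } } };
    amiSelectorTerms?: { alias?: string }[];
  };
}

function amiAliasForPool(manifests: KarpenterManifest[], pool: string): string {
  const nodePool = manifests.find((m) => m.kind === "NodePool" && m.metadata.name === pool);
  if (!nodePool || !nodePool.spec.template) throw new Error(`NodePool ${pool} not found`);
  const className = nodePool.spec.template.spec.nodeClassRef.name;
  const nodeClass = manifests.find((m) => m.kind === "EC2NodeClass" && m.metadata.name === className);
  if (!nodeClass) throw new Error(`NodePool ${pool} references missing EC2NodeClass ${className}`);
  const alias = nodeClass.spec.amiSelectorTerms?.find((t) => t.alias !== undefined)?.alias;
  return alias ?? "(no alias)";
}

const makePool = (name: string, className: string): KarpenterManifest => ({
  kind: "NodePool",
  metadata: { name },
  spec: { template: { spec: { nodeClassRef: { name: className } } } },
});
const makeNodeClass = (name: string, alias: string): KarpenterManifest => ({
  kind: "EC2NodeClass",
  metadata: { name },
  spec: { amiSelectorTerms: [{ alias }] },
});

const karpenterManifests = [
  makeNodeClass("system", "al2023@latest"),
  makeNodeClass("gpu", "al2@latest"),
  makePool("system", "system"),
  makePool("gpu", "gpu"),
];
console.log(amiAliasForPool(karpenterManifests, "gpu")); // al2@latest
```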
GPU NodePool consolidation policy
The GPU pool uses consolidationPolicy: WhenEmpty. System nodes use WhenEmptyOrUnderutilized. The difference matters.
WhenEmptyOrUnderutilized means Karpenter will actively bin-pack: if two system nodes are each 40% utilized, it may consolidate to one 80% node, terminate the other, and save money. That is desirable for CPU workloads — pods migrate, there is a brief disruption, and you save on instance cost.
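WhenEmpty means Karpenter will only terminate a GPU node once no workload pods remain on it. vLLM model servers hold large models in GPU memory; reloading a model after an eviction-driven migration takes 1–5 minutes depending on model size. Bin-packing GPU nodes aggressively to save part of an instance-hour, at the cost of multi-minute model reload delays, is not a good trade. Each GPU node stays up while any vLLM replica runs on it and terminates only once it is empty. Keep in mind that WhenEmpty governs consolidation only: drift (for example, a new AMI behind an @latest alias) and the NodePool's node expiry can still replace busy nodes, so pin AMI versions if that matters.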
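The GPU pool's rule can be modelled in a few lines. In this simplified, hypothetical sketch a node is a consolidation candidate only when every remaining pod belongs to a DaemonSet and the node has been in that state for at least consolidateAfter. Karpenter's real timer restarts whenever pods are scheduled to or removed from the node; the `emptySinceSec` bookkeeping here stands in for that, and only single-unit durations such as "30s" or "1m" are parsed.

```typescript
// Simplified model of consolidationPolicy: WhenEmpty. A node becomes a
// candidate only after it has had no non-DaemonSet pods for consolidateAfter.
// Handles single-unit durations ("30s", "1m", "2h") only.
function durationSeconds(d: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(d);
  if (!match) throw new Error(`unsupported duration: ${d}`);
  const n = Number(match[1]);
  const unit = match[2];
  return unit === "s" ? n : unit === "m" ? n * 60 : n * 3600;
}

interface PodInfo {
  name: string;
  ownerKind: string; // e.g. "DaemonSet", "ReplicaSet"
}

function isEmptyNode(pods: PodInfo[]): boolean {
  return pods.every((p) => p.ownerKind === "DaemonSet");
}

function whenEmptyCandidate(
  pods: PodInfo[],
  emptySinceSec: number | null,
  nowSec: number,
  consolidateAfter: string
): boolean {
  if (!isEmptyNode(pods) || emptySinceSec === null) return false;
  return nowSec - emptySinceSec >= durationSeconds(consolidateAfter);
}

// Hypothetical GPU node whose only remaining pod is a DaemonSet pod.
const gpuNodePods: PodInfo[] = [{ name: "device-plugin-pod", ownerKind: "DaemonSet" }];
console.log(whenEmptyCandidate(gpuNodePods, 0, 59, "1m")); // false: empty for only 59s
console.log(whenEmptyCandidate(gpuNodePods, 0, 60, "1m")); // true
```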
The Spot interruption queue
The Karpenter IAM policy includes permissions on an SQS queue named karpenter-interruption-<cluster-name>. Karpenter watches this queue for EC2 and AWS Health events: Spot two-minute interruption warnings, scheduled maintenance events, and instance stopping or terminating state changes. When it receives one, it taints and drains the affected node before the instance disappears, so running pods get their terminationGracePeriodSeconds (bounded, for Spot, by the two-minute notice) to shut down cleanly while replacements are scheduled. Rebalance recommendations are delivered too, but Karpenter only records them as Kubernetes events; it does not drain on them.
Without this queue, nothing at the Kubernetes layer reacts to the two-minute Spot notice. The notice exists at the EC2 level, but unless something consumes it, the instance is simply reclaimed and its pods are not drained or rescheduled ahead of time. Creating the SQS queue and routing the relevant EventBridge events to it is a prerequisite for well-behaved Spot usage; we create it in the next section.
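To make the event handling concrete, here is a hypothetical handler (not Karpenter's code) that maps an aws.ec2 EventBridge message from the queue to a node action, including the drain deadline implied by a Spot warning. It covers EC2 events only; AWS Health scheduled-change events have a different shape and are left out. The event field names follow the EventBridge envelope (`source`, `detail-type`, `time`, `detail`); the instance ID is a placeholder.

```typescript
// Hypothetical handler, not Karpenter's code: map an aws.ec2 EventBridge
// message from the interruption queue to a node action.
interface Ec2Event {
  source: string;
  "detail-type": string;
  time: string; // ISO-8601 timestamp set by EventBridge
  detail: Record<string, string>;
}

type NodeAction =
  | { kind: "drain"; instanceId: string; deadline?: string }
  | { kind: "record"; instanceId: string }
  | { kind: "ignore" };

const DRAIN_STATES = ["stopping", "stopped", "shutting-down", "terminated"];

function planAction(evt: Ec2Event): NodeAction {
  if (evt.source !== "aws.ec2") return { kind: "ignore" };
  const instanceId = evt.detail["instance-id"] ?? "";
  switch (evt["detail-type"]) {
    case "EC2 Spot Instance Interruption Warning":
      // EC2 reclaims the instance two minutes after the warning is issued.
      return {
        kind: "drain",
        instanceId,
        deadline: new Date(Date.parse(evt.time) + 2 * 60 * 1000).toISOString(),
      };
    case "EC2 Instance State-change Notification":
      return DRAIN_STATES.includes(evt.detail["state"])
        ? { kind: "drain", instanceId }
        : { kind: "ignore" };
    case "EC2 Instance Rebalance Recommendation":
      // Surfaced as a Kubernetes event only; no drain.
      return { kind: "record", instanceId };
    default:
      return { kind: "ignore" };
  }
}

const spotWarning: Ec2Event = {
  source: "aws.ec2",
  "detail-type": "EC2 Spot Instance Interruption Warning",
  time: "2025-01-01T00:00:00Z",
  detail: { "instance-id": "i-0123456789abcdef0", "instance-action": "terminate" },
};
console.log(planAction(spotWarning));
```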
The interruption SQS queue
The queue itself is not in the CDK snippet above — add it to AddonsStack or to a shared config stack:
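// Add to AddonsStack and call this.createInterruptionQueue() from its constructor.
private createInterruptionQueue(): void {
  const queue = new cdk.aws_sqs.Queue(this, "KarpenterInterruptionQueue", {
    // Must match settings.interruptionQueue in the Karpenter Helm values and
    // the queue ARN in the controller's KarpenterSQS policy statement.
    queueName: `karpenter-interruption-${config.clusterName}`,
    // Interruption events are only actionable for a few minutes.
    retentionPeriod: cdk.Duration.seconds(300),
  });
  // EC2 events: Spot interruption warnings, rebalance recommendations, and
  // instance state changes. The SqsQueue target also adds a queue policy that
  // lets EventBridge send messages to the queue.
  new cdk.aws_events.Rule(this, "KarpenterEc2EventsRule", {
    eventPattern: {
      source: ["aws.ec2"],
      detailType: [
        "EC2 Spot Instance Interruption Warning",
        "EC2 Instance Rebalance Recommendation",
        "EC2 Instance State-change Notification",
      ],
    },
    targets: [new cdk.aws_events_targets.SqsQueue(queue)],
  });
  // Scheduled maintenance arrives as an AWS Health event, not an aws.ec2 event.
  new cdk.aws_events.Rule(this, "KarpenterScheduledChangeRule", {
    eventPattern: {
      source: ["aws.health"],
      detailType: ["AWS Health Event"],
    },
    targets: [new cdk.aws_events_targets.SqsQueue(queue)],
  });
}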
Deploy the Add-ons
cdk deploy HybridLlmAddons
The deploy installs both Helm charts through CDK's eks.HelmChart construct, which runs helm upgrade --install from the cluster's Lambda-backed kubectl provider as part of the CloudFormation stack update. It also applies the four Karpenter manifests (two EC2NodeClass and two NodePool objects) with kubectl apply. Total time is 2–4 minutes.
Verify the Add-ons
Load Balancer Controller
kubectl -n kube-system get deployment aws-load-balancer-controller
# Expected: READY 2/2
kubectl -n kube-system logs deployment/aws-load-balancer-controller | head -5
# Should show controller startup, no errors about missing credentials
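If the controller pods are crash-looping (CrashLoopBackOff) or logging credential errors, check IRSA first. Pods stuck in Pending point to scheduling instead: no node has enough free capacity.
kubectl -n kube-system describe pod -l app.kubernetes.io/name=aws-load-balancer-controller \
  | grep -A10 "Environment"
# Look for AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN. If they are absent, the
# service account annotation is missing or was added after the pods started;
# fix it, then: kubectl -n kube-system rollout restart deployment aws-load-balancer-controller
# If they are present but AWS calls fail with AccessDenied, check the role's
# trust policy and permissions.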
Karpenter
kubectl -n karpenter get deployment karpenter
# Expected: READY 2/2 (the chart's default replica count)
kubectl get nodepools
# NAME NODECLASS NODES ...
# system system ...
# gpu gpu ...
kubectl get ec2nodeclasses
# NAME ...
# system ...
# gpu ...
Smoke test: let Karpenter provision a node
Create a pod that requests more CPU than the current system nodes can provide, wait for Karpenter to provision a new node, and then clean up:
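cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: karpenter-test
spec:
  restartPolicy: Never
  containers:
    - name: sleep
      image: public.ecr.aws/amazonlinux/amazonlinux:2023
      command: ["sleep", "60"]
      resources:
        requests:
          cpu: "4"
EOF
# Watch Karpenter logs from both replicas (only the elected leader acts). You
# should see a NodeClaim created and launched within ~15s.
kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter -f | grep -i "launched\|nodeclaim"
# Or, in another terminal: kubectl get nodeclaims -w
# Once the pod has run to completion, delete it. The node is then empty, and
# Karpenter removes it after the system pool's consolidateAfter (30s).
kubectl delete pod karpenter-test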
Test the Load Balancer Controller End-to-End
A minimal test Ingress confirms the controller can talk to the AWS API and provision a real ALB:
# alb-test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
spec:
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
        - name: echo
          image: hashicorp/http-echo
          args: ["-text=hello from eks"]
          ports:
            - containerPort: 5678
---
apiVersion: v1
kind: Service
metadata:
  name: echo
spec:
  selector:
    app: echo
  ports:
    - port: 80
      targetPort: 5678
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echo
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  # The "alb" IngressClass is created by the controller's Helm chart; this
  # replaces the deprecated kubernetes.io/ingress.class annotation.
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo
                port:
                  number: 80
kubectl apply -f alb-test.yaml
# Wait ~60s for the ALB to provision.
kubectl get ingress echo
# ADDRESS column should show an AWS ALB hostname: *.elb.amazonaws.com
ALB=$(kubectl get ingress echo -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
curl http://${ALB}
# Expected: hello from eks
kubectl delete -f alb-test.yaml
If the ADDRESS column stays empty after two minutes, check the controller logs for IAM errors (not authorized to perform: elasticloadbalancing:*) or missing subnet tags — the kubernetes.io/role/elb tag on public subnets from Part 1 is what the controller uses for ALB subnet discovery.
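Subnet discovery failures are easier to reason about with the rule spelled out. The sketch below is a simplified, hypothetical model, not the controller's code: it keeps subnets whose role tag matches the Ingress scheme (the controller accepts a value of "1" or an empty value), takes one subnet per Availability Zone, and rejects the result if it spans fewer than two AZs, since an ALB must span at least two. The real controller has its own tie-breaking when several subnets share an AZ and can also consider cluster tags; the first match wins here.

```typescript
// Simplified sketch of role-tag subnet discovery (not the controller's code).
interface SubnetInfo {
  id: string;
  az: string;
  tags: Record<string, string>;
}

function discoverAlbSubnets(
  subnets: SubnetInfo[],
  scheme: "internet-facing" | "internal"
): string[] {
  const roleTag =
    scheme === "internet-facing" ? "kubernetes.io/role/elb" : "kubernetes.io/role/internal-elb";
  const byAz = new Map<string, string>();
  for (const s of subnets) {
    const value = s.tags[roleTag];
    // Accept "1" or an empty tag value; keep the first match per AZ.
    if ((value === "1" || value === "") && !byAz.has(s.az)) byAz.set(s.az, s.id);
  }
  if (byAz.size < 2) {
    throw new Error(`${scheme}: tagged subnets in ${byAz.size} AZ(s); an ALB needs at least 2`);
  }
  return Array.from(byAz.values());
}

// Hypothetical subnets: two public (tagged for internet-facing) and one private.
const exampleSubnets: SubnetInfo[] = [
  { id: "subnet-pub-a", az: "us-east-1a", tags: { "kubernetes.io/role/elb": "1" } },
  { id: "subnet-pub-b", az: "us-east-1b", tags: { "kubernetes.io/role/elb": "1" } },
  { id: "subnet-priv-a", az: "us-east-1a", tags: { "kubernetes.io/role/internal-elb": "1" } },
];
console.log(discoverAlbSubnets(exampleSubnets, "internet-facing")); // [ 'subnet-pub-a', 'subnet-pub-b' ]
```

With the public-subnet tags from Part 1 missing, the internet-facing call would throw, which mirrors the empty ADDRESS column you see in practice.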
Tearing Down
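Before destroying HybridLlmAddons, delete any Ingress objects that have provisioned ALBs: the controller creates real AWS resources that CDK does not track, and a live ALB will block VPC deletion with a dependency error. Karpenter-launched instances are in the same position, so delete the NodePools while Karpenter is still running and wait until no NodeClaims remain.
kubectl delete ingress --all --all-namespaces
kubectl delete nodepools --all
kubectl get nodeclaims
# Wait until this reports no resources before continuing.
cdk destroy HybridLlmAddons
cdk destroy HybridLlmNodeGroups
cdk destroy HybridLlmCluster
cdk destroy HybridLlmNetwork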
What's Next
The cluster is now fully operational: network, control plane, node pools, and the two add-ons that make it production-ready. Karpenter will provision GPU nodes when vLLM pods are pending and terminate them when they are gone. The load balancer controller will provision ALBs for anything with an Ingress.
In Part 5 we deploy the inference layer: vLLM model servers on the GPU pool, loading open-source weights, and KEDA-based autoscaling that adjusts the replica count based on the request queue depth — so GPU capacity and model server replicas both track real traffic.