Blog

Insights and practical guidance on AI infrastructure, GPU optimization, Kubernetes, and platform engineering.

Featured

July 20, 2026

What It Takes to Own Your Agent Platform: The Firecracker Fleet Series in One Read

A capstone to the seven-part series on running eve.dev agents on a fleet of Firecracker microVMs in AWS. The whole platform in one picture, a guided tour of the parts, and the design threads — bare-metal constraints, desired-vs-actual state, event-driven placement, and packing-as-economics — that hold it together.

aws aws-cdk firecracker eve agents typescript platform-engineering ai-infrastructure

Featured

May 24, 2026

The Cost-Efficient AI Stack: Ship AI Features Without the Runaway Bill

Most teams overpay for AI by routing every request to a frontier model. This is the architecture we build instead — hybrid cloud+local routing, self-hosted inference, agent orchestration, and cost-per-request observability — and the single principle that ties it together: send each unit of work to the cheapest model that can do it well.

ai llm cost-optimization hybrid infrastructure finops

July 26, 2026

Run Your Own Buzz: A Local Community on Hardware You Control

Buzz is a self-hostable workspace where people and AI agents share the same rooms, built as a Nostr relay. A hands-on walkthrough: standing up a local community, the host-binding gotcha that will stop you cold, and adding an agent backed by OpenRouter, a local Hermes model, or Claude Code.

buzz nostr self-hosting agents rust ai-infrastructure

July 19, 2026

Running a Fleet of Firecracker microVMs for eve.dev Agents, Part 3: Packaging an Agent as a microVM Image

Part 3 of the hands-on series. We turn an ordinary eve agent directory into a bootable ext4 rootfs with a build-rootfs.sh script, store it in a versioned S3 artifact bucket the hosts pull from, inject secrets per-microVM with MMDS, and use Firecracker snapshots to turn a cold multi-second boot into a warm sub-second resume.

aws aws-cdk firecracker eve agents typescript s3 ai-infrastructure

July 19, 2026

Running a Fleet of Firecracker microVMs for eve.dev Agents, Part 4: The Control Plane

Part 4 of the hands-on series. We build the scheduler that turns a deploy request into a running agent — API Gateway and Lambda over a DynamoDB registry of hosts, agents, and placements, a race-safe bin-packing placement algorithm, and EventBridge decoupling the API from the work, all in AWS CDK.

aws aws-cdk firecracker eve agents typescript lambda dynamodb ai-infrastructure

July 19, 2026

Running a Fleet of Firecracker microVMs for eve.dev Agents, Part 6: The Deploy Workflow

Part 6 of the hands-on series. We turn the moving parts into one command — a deploy CLI that chains build, upload, place, and wait-for-URL; IAM auth on the control-plane API; and a GitHub Actions pipeline that ships an agent on merge and gives every pull request its own preview agent.

aws aws-cdk firecracker eve agents typescript github-actions cicd ai-infrastructure

July 19, 2026

Running a Fleet of Firecracker microVMs for eve.dev Agents, Part 2: The Host Fleet

Part 2 of the hands-on series. We put the first machines into the network from Part 1 — an Auto Scaling Group of bare-metal EC2 hosts, a launch template whose user data installs Firecracker and a host-agent daemon, the IAM role each host runs under, and the capacity model that decides how many agents a host can hold.

aws aws-cdk firecracker eve agents typescript ec2 autoscaling ai-infrastructure

July 19, 2026

Running a Fleet of Firecracker microVMs for eve.dev Agents, Part 7: Fleet Operations

The final part of the series. We make the fleet operable — event-driven host autoscaling with lifecycle-hook draining, a reconciliation loop that reschedules agents off a dead host, per-agent logs and metrics rolled up across the fleet, and the FinOps view that turns packing density into cost per agent.

aws aws-cdk firecracker eve agents typescript autoscaling observability finops ai-infrastructure

July 19, 2026

Running a Fleet of Firecracker microVMs for eve.dev Agents, Part 5: Networking & Routing

Part 5 of the hands-on series. We make a placed agent reachable — per-microVM tap devices and NAT on each host, an internet-facing ALB with a wildcard certificate, a per-agent subdomain scheme, and a host-local front proxy that resolves an agent to its microVM wherever the control plane placed it.

aws aws-cdk firecracker eve agents typescript alb networking ai-infrastructure

July 18, 2026

A 101 Guide: Running an Eve Agent in a Firecracker microVM on AWS

A beginner's walkthrough of provisioning bare-metal AWS infrastructure with AWS CDK in TypeScript, then booting a Firecracker microVM to self-host a Vercel eve agent.

aws aws-cdk firecracker eve agents typescript ai-development observability opentelemetry finops

July 18, 2026

Running a Fleet of Firecracker microVMs for eve.dev Agents, Part 1: Architecture & the Network Foundation

Part 1 of a hands-on series turning a single Firecracker host into a fleet that hosts eve.dev agents on demand. We map the whole platform — a control plane that places agents onto bare-metal hosts, and a routing layer that gets requests back to them — then provision the VPC, subnets, and security groups in AWS CDK.

aws aws-cdk firecracker eve agents typescript ai-infrastructure

July 9, 2026

A Quick Example: Building an Agent with Vercel's Eve Framework

A short, hands-on walkthrough of scaffolding, running, and deploying an AI agent with Vercel's open-source eve framework.

ai-development vercel eve agents

June 19, 2026

The Local AI Inflection Point: What the Next Three Years Actually Look Like

Local AI is crossing a threshold where on-device and self-hosted models stop being cost-cutting compromises and start being the default choice. Here's what's driving that shift and what it means for how you build software.

ai local-models llm inference edge-computing

June 7, 2026

Building a Hybrid LLM Platform on EKS, Part 5: Serving Local Models with vLLM and KEDA

Part 5 of our hands-on EKS series. We deploy vLLM model servers on the GPU pool from Part 4, load Qwen2.5-7B model weights from Amazon S3 via an init container, and wire KEDA autoscaling that scales replicas with live queue depth and drives GPU nodes to zero overnight.

eks kubernetes aws-cdk vllm keda gpu autoscaling llm ai-infrastructure typescript

June 7, 2026

Building a Hybrid LLM Platform on EKS, Part 7: Observability and Cost Telemetry

Part 7 of our hands-on EKS series. We instrument the TypeScript router with OpenTelemetry, upgrade Prometheus to kube-prometheus-stack for GPU and vLLM metrics, add Grafana Tempo for distributed traces, and wire Langfuse so every request shows its backend, token count, and dollar cost.

eks kubernetes aws-cdk opentelemetry prometheus grafana langfuse observability typescript ai-infrastructure

June 7, 2026

Building a Hybrid LLM Platform on EKS, Part 6: The Hybrid Router

Part 6 of our hands-on EKS series. We build a TypeScript/Hono router that sits in front of both vLLM and the Anthropic API, routes each request to the right backend based on model name and complexity heuristics, and falls back to cloud when the local model is cold-starting.

eks kubernetes aws-cdk hono typescript llm routing hybrid-ai ai-infrastructure

June 7, 2026

Building a Hybrid LLM Platform on EKS, Part 8: Testing, Load, and Examples

The final part of our EKS series. We write integration tests with Vitest, load-test the ALB with k6, build three real-world TypeScript workloads that prove the hybrid routing works, and use the Grafana and Langfuse dashboards from Part 7 to verify the platform under traffic.

eks kubernetes aws-cdk vitest k6 testing typescript llm ai-infrastructure

June 6, 2026

Building a Hybrid LLM Platform on EKS, Part 4: Platform Add-ons, the Load Balancer Controller, and Karpenter

Part 4 of our hands-on EKS series. We install the two add-ons every production EKS cluster needs: the AWS Load Balancer Controller so Kubernetes Ingress objects provision real ALBs, and Karpenter for cost-aware autoscaling — including the GPU NodePool that scales to zero between inference workloads.

eks kubernetes aws-cdk karpenter load-balancer-controller autoscaling irsa ai-infrastructure typescript

June 6, 2026

Building a Hybrid LLM Platform on EKS, Part 3: Node Groups, GPU AMIs, and the NVIDIA Device Plugin

Part 3 of our hands-on EKS series. We add worker nodes to the empty cluster from Part 2: a CPU system pool for add-ons and the hybrid router, a GPU pool for vLLM model servers, the NVIDIA device plugin DaemonSet, and the taints and labels that make scheduling predictable.

eks kubernetes aws-cdk gpu nvidia node-groups ai-infrastructure typescript

May 30, 2026

Building a Hybrid LLM Platform on EKS, Part 2: The Control Plane, IAM, and IRSA

Part 2 of our hands-on EKS series. We provision the EKS cluster into the VPC from Part 1, wire up OIDC federation and IRSA so pods authenticate without static credentials, and end with a working kubectl connection to a real cluster.

eks kubernetes aws-cdk iam irsa oidc ai-infrastructure typescript

May 29, 2026

Securing Self-Hosted LLMs and AI Agents on Kubernetes

Harden self-hosted vLLM and AI agents on Kubernetes: an auth/rate-limit gateway, gVisor tool sandboxing, prompt-injection guardrails, scoped secrets, and signed model weights — mapped to the OWASP LLM Top 10.

security ai agents kubernetes llm prompt-injection supply-chain

May 24, 2026

Building a Hybrid LLM Platform on EKS, Part 1: Architecture and the Network Foundation

Part 1 of a hands-on series building the EKS-based hybrid LLM platform referenced throughout this blog. We map out the full architecture, then provision the VPC, subnets, NAT, and VPC endpoints with AWS CDK — the network foundation every later part builds on.

eks kubernetes aws-cdk llm ai-infrastructure hybrid-ai vpc typescript

May 23, 2026

Build a Personal AI Dev Environment: Hybrid Models, Local Inference, and a Workflow That Costs Almost Nothing

The production patterns we deploy for teams — hybrid cloud/local routing, self-hosted models, agent orchestration — scaled down to a single developer's workstation. A practical guide to building a personal AI dev environment with Ollama, Claude Code, and a local router that keeps your token bill near zero.

ai llm local-models ollama claude-code developer-tools

May 22, 2026

The Agent Control Plane: Frontier Models Plan, Your Kubernetes Fleet Executes

How to orchestrate a fleet of AI agents using a shared task queue — frontier models like Claude handle planning and decomposition, while a local Kubernetes worker pool runs the high-volume execution tasks. Covers the task ledger, dynamic task creation, lane-based routing, and KEDA autoscaling.

ai agents orchestration kubernetes llm hybrid

May 21, 2026

Observability for LLM Applications on Kubernetes: Tokens, Traces, and Cost per Request

How to instrument self-hosted and hybrid LLM workloads with OpenTelemetry, Prometheus, and Langfuse — tracking time-to-first-token, tokens per second, GPU utilization, and unit economics down to the individual request.

kubernetes llm observability opentelemetry finops ai-infrastructure

May 14, 2026

The Hybrid AI Playbook: Cloud Models for Thinking, Local Models for Doing

How to cut your AI costs by 60-80% using a hybrid approach — Claude or GPT for planning and complex reasoning, local models like Llama and Qwen for execution tasks like code generation, summarization, and data extraction.

ai llm cost-optimization local-models ollama

April 3, 2026

Self-Hosting LLMs on Kubernetes: A Practical Guide

How to deploy, serve, and autoscale open-source large language models on Kubernetes with vLLM — from GPU node pools and deployment manifests to KEDA-based autoscaling and production guardrails.

kubernetes llm gpu ai-infrastructure self-hosting

February 8, 2026

Container Security on Kubernetes: A Practical Guide with Trivy, Falco, and Kyverno

Most Kubernetes clusters are running containers with known vulnerabilities, no runtime monitoring, and no policy enforcement. Here is how to fix that with three open-source tools.

kubernetes security trivy falco kyverno containers

February 8, 2026

How to Cut Your AWS Bill in Half Without Changing Your Architecture

Most growing teams are overpaying on AWS by 30-50%. Here is the exact checklist we use in every infrastructure audit to find and eliminate wasted spend — no migrations, no rearchitecting.

aws cost-optimization infrastructure cloud

February 7, 2025

Using AI to Monitor Kubernetes Clusters and Make Dynamic Scaling Decisions

How to move beyond static thresholds and use AI-driven observability to detect anomalies, predict traffic patterns, and automate scaling decisions across your Kubernetes infrastructure.

kubernetes ai monitoring autoscaling observability

February 6, 2025

A Practical Guide to AI for Small and Mid-Size Businesses

No hype, no jargon — a straightforward guide for business owners evaluating where AI actually makes sense and how to adopt it without wasting money.

ai small-business strategy automation

January 29, 2025

Building a CI/CD Pipeline with Dagger That Deploys to Kubernetes

A practical guide to building a containerized CI/CD pipeline using Dagger's TypeScript SDK — from local Kind clusters to production EKS with GitHub Actions, AWS CDK, and multi-environment promotion.

dagger kubernetes cicd helm eks github-actions aws-cdk

January 22, 2025

Building a Production Feature Flag Service with Claude Code

How we built FlagSignals, a full-stack feature flag platform with A/B testing and billing, using AI-assisted development.

feature-flags ai-development nextjs supabase

January 15, 2025

GPU Cost Optimization on Kubernetes: A Practical Guide

Learn how to reduce GPU infrastructure costs by up to 60% with proper Kubernetes scheduling, time-slicing, and right-sizing strategies.

kubernetes gpu cost-optimization infrastructure

January 8, 2025

Platform Engineering for AI/ML Teams: Building the Foundation

How platform engineering principles transform AI/ML infrastructure from artisanal setups to scalable, self-service platforms.

platform-engineering ai-ml infrastructure devops

January 1, 2025

FinOps for AI Infrastructure: Beyond Cloud Cost Tags

Traditional FinOps practices fall short for AI workloads. Here's how to build a cost management strategy that accounts for GPU economics.

finops cost-management ai-infrastructure cloud