Context
The platform that hosts everything else in this portfolio. Two mini PCs in a 4U rack draw about 50W idle, run 15 LXC containers between them, and serve every in-house project I ship: the FB-Media news pipeline, the ContentForge video stack, the live monitoring dashboard, and the homelab notification service.
It exists because cloud bills add up fast for projects that aren’t done yet, and because a one-person studio is its own most-demanding client. The platform has to be boring: predictable, reproducible, recoverable from a single command.
Brief
- Two-node cluster on consumer hardware (no enterprise gear).
- All services in LXC containers, no per-VM kernel overhead.
- GPU passthrough to the inference container, local LLM serving without a separate GPU host.
- Single-command rebuild: lose a node, run one playbook, get the node back.
- Public services fronted by Cloudflare; private services on a flat LAN.
- Idle power budget under 60W combined.
Architecture
Two Proxmox nodes:
- Services node (pve-desktop), 15 containers across the services subnet. Hosts the dashboard (CT 208), the notification fan-out (CT 209), the in-house Git-style toolchain, and shared infrastructure (Pi-hole DNS, WireGuard, Traefik, Prometheus, Grafana).
- Lab node (pve-asus), fewer, beefier containers for inference and experimentation. Hosts the GPU-passthrough Ollama container with an NVIDIA card for local LLM serving.
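On Proxmox, GPU passthrough to an LXC container comes down to a handful of device-allow and bind-mount lines in the container's config file. A sketch of how that fragment could be Ansible-managed; the CT number is illustrative, and the device major numbers (195 for the core NVIDIA nodes, 511 for nvidia-uvm here) vary by host and driver, so check `ls -l /dev/nvidia*` first:

```yaml
# Hypothetical Ansible task pinning the NVIDIA device nodes into the
# Ollama container's Proxmox config. CT id and device numbers are
# assumptions, not the actual values from this cluster.
- name: Expose NVIDIA devices to the Ollama CT
  ansible.builtin.blockinfile:
    path: /etc/pve/lxc/210.conf
    marker: "# {mark} ANSIBLE MANAGED GPU PASSTHROUGH"
    block: |
      lxc.cgroup2.devices.allow: c 195:* rwm
      lxc.cgroup2.devices.allow: c 511:* rwm
      lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
      lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
      lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
```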
Both nodes are Ansible-provisioned end-to-end. The playbook tree covers: kernel + Proxmox configuration, container provisioning per host, service-specific role files (one per CT), firewall rules at the host level (nftables), TLS certificate management, and Cloudflare DNS sync.
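That tree maps roughly to one role per concern. A condensed sketch of the top-level play ordering, with role and group names as assumptions rather than the actual repo layout:

```yaml
# site.yml -- illustrative play structure; role names are hypothetical.
- hosts: proxmox_nodes
  roles:
    - kernel_and_pve        # kernel params + Proxmox host configuration
    - containers            # per-host CT provisioning
    - nftables_firewall     # host-level firewall rules
    - tls_certs             # certificate issuance and renewal
    - cloudflare_dns        # DNS record sync

- hosts: service_cts
  roles:
    - "{{ service_role }}"  # one service role per CT, set in host_vars
```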
A Terraform migration is underway; the goal is to replace the Ansible provisioning playbooks with Terraform modules for cloud-shape primitives (containers, networks, certificates) while keeping Ansible for in-container service configuration. The pattern is honest: declarative infrastructure where Terraform fits, imperative configuration where Ansible already wins.
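Under that split, the Ansible side shrinks to in-container work only. A sketch of what the post-migration entry play could look like, assuming Terraform has already created the containers and populated the inventory (group and role names are hypothetical):

```yaml
# Post-migration site.yml (illustrative): no provisioning plays left,
# only configuration of services inside CTs that Terraform created.
- hosts: service_cts
  gather_facts: true
  roles:
    - common_baseline   # users, packages, logging
    - service_config    # per-CT service role, selected via host_vars
```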
Outcomes
- 15 LXC containers running across two Proxmox nodes.
- GPU passthrough to the inference container, on the same workstation that runs the dashboard sidecar.
- Idle power: ~50W combined, measured at the wall.
- Single-command provisioning: `ansible-playbook site.yml` rebuilds either node from bare metal in under 20 minutes.
- Five public services fronted by Cloudflare with `*.bytecraftmedia.eu` certificates auto-renewed.
- Zero unplanned outages since the Ansible playbooks reached parity.
Screens
[FILL: replace with anonymized screenshots of the Proxmox web UI showing both nodes, the Ansible inventory tree, and a Grafana panel showing host-level metrics. Avoid screenshots of any client-facing service running on the platform.]
What’s next
Three items on the next-iteration list:
- Lock in the Terraform module structure before bootstrapping any third node. Adapting the Ansible inventory mid-migration was a week of avoidable rework; the second node would have come up faster if the Terraform/Ansible split had been decided up front.
- Promote the firewall rules from host-level nftables to a centrally managed source of truth. Currently each node has its own ruleset; a small Ansible role plus a Terraform variable would reduce drift.
- Off-rack snapshot replication: currently the platform's container snapshots live on the same nodes that produce them. A small monthly job that ships compressed snapshots to a Cloudflare R2 bucket would close the disaster-recovery gap without adding hardware.
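For the firewall item, one low-effort shape for the source of truth is to hold the ruleset in group_vars and template it out, so both nodes render from the same data. A sketch; the variable, template, and handler names are assumptions:

```yaml
# group_vars/proxmox_nodes.yml (illustrative): one shared ruleset definition.
nftables_allowed_tcp_ports: [22, 80, 443, 8006]

# Role task: render the shared ruleset, syntax-checking before install.
- name: Deploy shared nftables ruleset
  ansible.builtin.template:
    src: nftables.conf.j2        # hypothetical Jinja2 template
    dest: /etc/nftables.conf
    validate: "nft -c -f %s"     # dry-run parse; abort on syntax errors
  notify: reload nftables
```

The `validate` step matters on a two-node setup with no out-of-band access: a ruleset that fails to parse never replaces the live file.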
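For the replication item, the monthly job could be as small as a cron entry that copies the newest vzdump archives to R2 with rclone, which speaks S3-compatible endpoints. A sketch, assuming an rclone remote named `r2` is already configured with R2 credentials; the bucket name, dump path, and age window are illustrative:

```yaml
# Hypothetical Ansible task: ship last month's compressed vzdump
# snapshots to a Cloudflare R2 bucket on a monthly schedule.
- name: Monthly off-rack snapshot replication to R2
  ansible.builtin.cron:
    name: "snapshot-replication-r2"
    special_time: monthly
    job: "rclone copy /var/lib/vz/dump r2:pve-snapshots --max-age 32d --include '*.zst'"
```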