Adding a Worker Node to Single-Node OpenShift (SNO)

Author
Ifesinachi Osude
Writing about infrastructure, automation, observability, networking, security, and homelab engineering.
OKD Homelab Series - This article is part of a series.
Part 2: This Article

Install OKD on Bare Metal

Single-Node OpenShift gets you a working cluster on one machine. The moment you actually start deploying things, two limits show up:

  • Anti-affinity blocks replicas. A PerconaXtraDBCluster with size: 2 will sit Pending forever, because its anti-affinity rule wants each pod on a different host and there is only one.
  • Maintenance is scary. Reboot the SNO node and the whole cluster goes dark.

Both go away as soon as you add a worker. Here’s the actual flow on OKD 4.22 SCOS, no Assisted Installer, no agent ISO.

The shape of the trick

SNO doesn’t ship a worker-ignition by default — there’s no machine-api operator wired to spit out a worker bootstrap image. What you do have:

  1. A working API server at https://api.okd.example.com:6443
  2. A machine-config-server (MCS) at https://api-int.okd.example.com:22623
  3. An existing worker MachineConfigPool with all the desired config baked in

So the recipe is:

  1. Boot the new host on a plain SCOS live ISO.
  2. From the live shell, run coreos-installer install with an ignition URL whose config points at the running MCS’s worker endpoint.
  3. The new host pulls config, pivots, joins the cluster as a worker.

That’s it. Three commands, mostly.
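
Before touching the new host, it can be worth confirming the MCS actually answers. A minimal sketch, assuming the okd.example.com domain used above and that the MCS exposes a /healthz path the way the upstream machine-config-server does; the function names are made up for illustration.

```shell
# Build the worker config URL served by the machine-config-server (MCS).
mcs_worker_url() {
  domain="$1"
  printf 'https://api-int.%s:22623/config/worker\n' "$domain"
}

# Probe the MCS health endpoint (assumption: /healthz is served on 22623).
# -k skips TLS verification because the MCS cert is signed by the cluster's own CA.
mcs_healthy() {
  domain="$1"
  curl -ksf --max-time 5 -o /dev/null "https://api-int.${domain}:22623/healthz"
}

# Usage, from any machine on the same network as the new node:
#   mcs_worker_url okd.example.com
#   mcs_healthy okd.example.com && echo "MCS reachable"
```

If the probe fails, fix DNS or routing first; nothing later in the flow works without that port.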

Pre-flight on the new box

Anything that boots SCOS will work — bare metal, Proxmox VM, NUC. Mine is a Beelink mini PC.

  • 4+ cores, 16 GB RAM, 250 GB SSD minimum
  • Can reach the SNO over the network (the new node will pull its config from api-int.okd.example.com:22623)
  • DNS resolves api, api-int, and *.apps of your cluster (use the same Technitium server the SNO uses as the resolver)
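
The DNS item is easy to check from the live shell before installing anything. A rough sketch, assuming the okd.example.com domain from above and using the console route as a stand-in for the *.apps wildcard; cluster_names and check_dns are hypothetical helper names.

```shell
# List the names a new worker needs to resolve for a given cluster domain.
# The console route is used as one concrete example of the *.apps wildcard.
cluster_names() {
  domain="$1"
  printf '%s\n' \
    "api.${domain}" \
    "api-int.${domain}" \
    "console-openshift-console.apps.${domain}"
}

# Resolve each name with getent and report which ones fail.
check_dns() {
  domain="$1"
  rc=0
  for name in $(cluster_names "$domain"); do
    if getent hosts "$name" >/dev/null; then
      echo "ok    $name"
    else
      echo "FAIL  $name"
      rc=1
    fi
  done
  return "$rc"
}

# Usage: check_dns okd.example.com
```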

Step 1: ignition file from the running cluster

From your workstation with the SNO kubeconfig:

oc -n openshift-machine-api get secret \
   worker-user-data -o jsonpath='{.data.userData}' \
   | base64 -d > worker.ign

That single JSON file is what makes the new node “a worker for this cluster.” It is a small pointer config: it points at the MCS and embeds the cluster CA, so Ignition on first boot can fetch the full worker config over TLS and pivot to it.
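
A quick way to convince yourself the file points where you expect is to pull the MCS URL back out of it. This is a sketch that assumes worker.ign is the usual pointer config, where the only https:// URL is the merge source (the embedded CA is a data: URL); ignition_merge_url is a hypothetical helper, not an oc or ignition command.

```shell
# Print the first https:// URL found in an ignition file.
# In a pointer config this is the MCS endpoint the node will merge from.
ignition_merge_url() {
  grep -o 'https://[^"]*' "$1" | head -n 1
}

# Usage: ignition_merge_url worker.ign
```

If the hostname it prints does not resolve from the new node, the install will finish but first boot will stall on the config fetch.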

Step 2: serve the ignition
#

The new node will fetch the ignition during install. Easiest way: run python3 -m http.server on your workstation, from the directory that holds worker.ign:

python3 -m http.server 8080 --bind 10.10.0.40

Now http://10.10.0.40:8080/worker.ign is reachable from the new node.

Step 3: install

Boot the new node on the SCOS live ISO (same one you used for SNO). At the live shell:

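sudo coreos-installer install /dev/sda \
   --ignition-url=http://10.10.0.40:8080/worker.ign \
   --insecure-ignition \
   --copy-network

--insecure-ignition is needed because the ignition URL is plain HTTP with no hash to verify it; coreos-installer refuses to fetch it otherwise. --copy-network carries your live-session network config (DNS, static IP if you set one) into the installed disk so the box comes back with the same network identity.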
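
coreos-installer only accepts a plain-HTTP ignition URL when it is given either --insecure-ignition or an --ignition-hash digest. A minimal sketch of the digest route, assuming sha256sum is available on the workstation; ignition_hash is a hypothetical helper name.

```shell
# Print a digest in the "<type>-<hex>" form that --ignition-hash takes.
ignition_hash() {
  printf 'sha256-%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

# Usage on the workstation:
#   ignition_hash worker.ign
#
# Then on the new node, pass the printed value:
#   sudo coreos-installer install /dev/sda \
#      --ignition-url=http://10.10.0.40:8080/worker.ign \
#      --ignition-hash=<value printed above> \
#      --copy-network
```

For a homelab on a trusted LAN the insecure flag is fine; the hash just guarantees the node installs exactly the file you extracted.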

Reboot. Pull the ISO. Wait.

Step 4: approve the CSR

The new node will phone home, request a kubelet certificate, and sit at Pending until you approve the CSR:

oc get csr | grep Pending
oc adm certificate approve <csr-name>
# wait 30s, second CSR appears for kubelet-serving
oc get csr | grep Pending
oc adm certificate approve <csr-name>
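
If copying CSR names by hand gets old, the two rounds can be scripted. A sketch under the assumption that the last column of oc get csr is CONDITION, as in current releases; pending_csr_names and approve_pending_csrs are made-up names, and xargs -r is the GNU flag for "do nothing on empty input".

```shell
# Read `oc get csr --no-headers` output on stdin and print the names
# of requests whose last column (CONDITION) is exactly "Pending".
pending_csr_names() {
  awk '$NF == "Pending" { print $1 }'
}

# Approve every pending CSR in one go.
approve_pending_csrs() {
  oc get csr --no-headers | pending_csr_names | xargs -r oc adm certificate approve
}

# Usage: run once for the client CSR, wait ~30s, run again for kubelet-serving.
#   approve_pending_csrs
```

This approves everything that is pending, which is fine when you know the only new machine is yours. Anywhere less trusted, check the requestor column first.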

Within 2 minutes:

$ oc get nodes
NAME                         STATUS   ROLES                         AGE
master-0.okd.mikeosude.com   Ready    control-plane,master,worker   2d
node6                        Ready    worker                        90s

Removing the master’s worker role (optional)

SNO masters are both control-plane and worker so they can run pods at all. Once you have a real worker you can drop the worker role from the master:

oc label node master-0.okd.mikeosude.com node-role.kubernetes.io/worker-

I keep mine dual-roled — homelab, no real reason to be ascetic about it.

Things that bit me

  • Wrong DNS — the new node needs api-int.okd.example.com to resolve to the SNO. Without it, ignition fetch hangs.
  • api-int and api point at the same SNO. There’s only one. The split exists because in a multi-node OCP cluster they go to different load balancers; for SNO they’re the same A record.
  • The worker shows “Provisioning”, then “SchedulingDisabled”, for a few minutes while the MCO renders the config. That’s normal. Leave it alone and it joins.
  • CSR approval is a separate step. If you blink past oc get csr you’ll wonder why your node never goes Ready.

Now anti-affinity works, replicas: 2 schedules, and you can reboot one host without losing the cluster.
