marco/veda

Marco van Zijl 594be97e44 Add hostpath configuration to talosctl patches for local storage

2025-05-09 18:57:49 +02:00


Veda

The new setup of my homelab will be based on Kubernetes, which will prevent all of my services from going down when I need to do physical maintenance on a host.

Services

Core

Ceph for all storage: cephfs, object storage and block storage
Nextcloud: file storage interface for the entire family
Jellyfin: Web-based media streaming
Authentik: Central identification and authentication server
Nginx reverse proxy
ACME client: SSL certificate handling
ArgoCD: GitOps deployment and revision control for all Kubernetes configuration
Home Assistant + Zigbee2MQTT
Prometheus
Grafana
Grafana Loki + Fluentd
Cilium
Harbor: Container image storage

Nice-to-have

Jellyseerr: Nice interface to request movies and series
Sonarr: Automated downloading and handling of series
Radarr: Automated downloading and handling of movies
Flaresolverr: Fetching data hidden behind CAPTCHAs
Torrent client (qBittorrent): To download all the Linux ISOs
ExternalDNS
Paperless-ngx

Look-into-later

Mastodon: federated social platform
Forgejo: Git platform. Maybe this should not be hosted on the cluster, as the cluster itself will depend on it.
CloudNativePG: K8s operator for PostgreSQL

Installing

Configuration

export CLUSTER_NAME="veda"
export API_ENDPOINT="https://192.168.0.1:6443"

talosctl gen secrets --output-file secrets.yaml

talosctl gen config             \
    --with-secrets secrets.yaml \
    --output-types talosconfig  \
    --output talosconfig        \
    $CLUSTER_NAME               \
    $API_ENDPOINT

talosctl config merge ./talosconfig

Then correct the endpoint in the Talos client configuration, either by hand or with talosctl config endpoint 192.168.0.1:

# ~/.talos/config
context: veda
contexts:
    veda:
        endpoints: 
            - 192.168.0.1
# (...)

For controlplane nodes:

talosctl gen config \
        --output rendered/master1.yaml                            \
        --output-types controlplane                               \
        --with-secrets secrets.yaml                               \
        --config-patch @nodes/master1.yaml                        \
        --config-patch @patches/argocd.yaml                       \
        --config-patch @patches/cilium.yaml                       \
        --config-patch @patches/scheduling.yaml                   \
        --config-patch @patches/discovery.yaml                    \
        --config-patch @patches/disk.yaml                         \
        --config-patch @patches/vip.yaml                          \
        --config-patch @patches/metrics.yaml                      \
        --config-patch @patches/hostpath.yaml                     \
        $CLUSTER_NAME                                             \
        $API_ENDPOINT

For worker nodes:

talosctl gen config \
        --output rendered/worker1.yaml                            \
        --output-types worker                                     \
        --with-secrets secrets.yaml                               \
        --config-patch @nodes/worker1.yaml                        \
        --config-patch @patches/argocd.yaml                       \
        --config-patch @patches/cilium.yaml                       \
        --config-patch @patches/scheduling.yaml                   \
        --config-patch @patches/discovery.yaml                    \
        --config-patch @patches/diskselector.yaml                 \
        --config-patch @patches/metrics.yaml                      \
        --config-patch @patches/hostpath.yaml                     \
        $CLUSTER_NAME                                             \
        $API_ENDPOINT
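The controlplane and worker invocations above differ only in the node name, the output type, and the disk/VIP patches, so they could be wrapped in a small helper. The following is a minimal sketch and not part of this repository: the function names patches_for and gen_node_config and the DRY_RUN variable are made up for illustration. It assumes it is run from the directory that contains nodes/, patches/, rendered/ and secrets.yaml. By default it only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
CLUSTER_NAME="${CLUSTER_NAME:-veda}"
API_ENDPOINT="${API_ENDPOINT:-https://192.168.0.1:6443}"

# Print the patch names for a role, in the same order as the commands above.
patches_for() {
    case "$1" in
        controlplane) echo "argocd cilium scheduling discovery disk vip metrics hostpath" ;;
        worker)       echo "argocd cilium scheduling discovery diskselector metrics hostpath" ;;
        *)            echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Build the talosctl command for one node.
# Prints it when DRY_RUN=1 (the default), executes it otherwise.
gen_node_config() {
    local node="$1" role="$2" patch patches
    patches="$(patches_for "$role")" || return 1
    set -- talosctl gen config \
        --output "rendered/${node}.yaml" \
        --output-types "$role" \
        --with-secrets secrets.yaml \
        --config-patch "@nodes/${node}.yaml"
    for patch in $patches; do
        set -- "$@" --config-patch "@patches/${patch}.yaml"
    done
    set -- "$@" "$CLUSTER_NAME" "$API_ENDPOINT"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

For example, gen_node_config master1 controlplane prints the controlplane command shown above on a single line, and DRY_RUN=0 gen_node_config worker1 worker actually renders rendered/worker1.yaml.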

Bootstrapping

Apply the configuration to each node:

talosctl apply-config --insecure --file rendered/master1.yaml  --nodes 192.168.0.10
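With more than one node, the same command can be run in a loop. This is a rough sketch under assumptions: the NODES list format, the apply_all function and the DRY_RUN variable are hypothetical, and only the master1 / 192.168.0.10 pair comes from this README. Note that --insecure only works while a node is still in maintenance mode, that is, before it has received its first configuration.

```shell
#!/usr/bin/env bash
# Hypothetical node list as space-separated name=ip pairs.
# Only master1=192.168.0.10 comes from this README; add further nodes as needed.
NODES="${NODES:-master1=192.168.0.10}"

# Apply rendered/<name>.yaml to every node in NODES.
# Prints the commands when DRY_RUN=1 (the default), executes them otherwise.
apply_all() {
    local entry node ip
    for entry in $NODES; do
        node="${entry%%=*}"   # part before the first '='
        ip="${entry#*=}"      # part after the first '='
        set -- talosctl apply-config --insecure \
            --file "rendered/${node}.yaml" --nodes "$ip"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$@"
        else
            "$@"
        fi
    done
}
```

Calling apply_all prints the commands for review; DRY_RUN=0 apply_all runs them.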

Optionally, check the status. Point the Talos API endpoint directly at the node, since etcd, and thereby kube-vip, is not up yet.

talosctl -n 192.168.0.10 -e 192.168.0.10 dashboard

To start the cluster, we need to bootstrap the etcd cluster. This only has to be done for a single node.

talosctl -n 192.168.0.10 -e 192.168.0.10 bootstrap

Finally, retrieve the kubeconfig. It will be merged into ~/.kube/config if that file exists.

talosctl -n 192.168.0.10 kubeconfig

Check nodes:

kubectl get nodes

TODO

Remove secrets from config

Misc

Applying patches

talosctl patch machineconfig -p @argocd.yaml -n 192.168.0.0

Reset node

talosctl reset --system-labels-to-wipe EPHEMERAL,STATE --reboot -n 192.168.0.0

ArgoCD: user admin. The password can be retrieved with the command below (a '%' printed at the end of the output is added by the shell and is not part of the password):

kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

Ceph dashboard: user admin on http://ceph.noxxos.nl. The password can be retrieved with:

kubectl -n ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo

Wiping disks for Ceph

Start a temporary pod on each node where the disks are:

kubectl run -it --rm \
    -n ceph                \
    --image quay.io/ceph/ceph:v19.2.2       \
    --privileged                \
    --overrides='{"spec": { "nodeSelector": {"kubernetes.io/hostname": "master3"}}}' fix

Search for the correct disk with blkid, set DISK=/dev/sdX, then run (some of) the following commands:

ceph-volume lvm zap $DISK --destroy
wipefs -a $DISK
# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
sgdisk --zap-all $DISK

# Wipe portions of the disk to remove more LVM metadata that may be present
dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=0 # Clear at offset 0
dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((1 * 1024**2)) # Clear at offset 1GB
dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((10 * 1024**2)) # Clear at offset 10GB
dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((100 * 1024**2)) # Clear at offset 100GB
dd if=/dev/zero of="$DISK" bs=1K count=200 oflag=direct,dsync seek=$((1000 * 1024**2)) # Clear at offset 1000GB

# SSDs may be better cleaned with blkdiscard instead of dd
blkdiscard $DISK

# Inform the OS of partition table changes
partprobe $DISK
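The seek values in the dd commands above are counted in blocks of bs (1 KiB), not in bytes, which is why 1 * 1024**2 lands at the 1 GiB mark (the "GB" in the comments is really GiB). A small sketch to check the arithmetic; the offset_bytes function is only for illustration:

```shell
#!/usr/bin/env bash
# dd's seek= counts output blocks of size bs, so the byte offset is seek * bs.
BS=1024   # bs=1K

# Print the byte offset for a given offset in GiB (illustrative helper).
offset_bytes() {
    local gib="$1"
    local seek=$((gib * 1024 * 1024))   # same value as $((gib * 1024**2))
    echo $((seek * BS))
}

for gib in 0 1 10 100 1000; do
    echo "${gib} GiB: seek=$((gib * 1024 * 1024)) blocks, byte offset $(offset_bytes "$gib")"
done
```

A disk smaller than one of these offsets will make the corresponding dd command fail with a "No space left on device" style error, which is harmless here.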

Certificate lifetimes

Talos Linux automatically manages and rotates all server side certificates for etcd, Kubernetes, and the Talos API. Note however that the kubelet needs to be restarted at least once a year in order for the certificates to be rotated. Any upgrade/reboot of the node will suffice for this effect.

You can check the Kubernetes certificates with the command talosctl get KubernetesDynamicCerts -o yaml on the controlplane.

Client certificates (talosconfig and kubeconfig) are the user’s responsibility. Each time you download the kubeconfig file from a Talos Linux cluster, the client certificate is regenerated giving you a kubeconfig which is valid for a year.

The talosconfig file should be renewed at least once a year, using the talosctl config new command.
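To see when the client certificate in the current kubeconfig expires, something like the following can be used. It is a sketch under assumptions: the function name kubeconfig_cert_enddate is made up, it assumes the certificate is embedded in the kubeconfig (as it is in the file produced by talosctl kubeconfig), and it simply takes the first user entry, which may not be the Talos one if ~/.kube/config contains several clusters.

```shell
#!/usr/bin/env bash
# Print the expiry date (notAfter) of the client certificate of the
# first user entry in the current kubeconfig.
# Assumption: the certificate is embedded as client-certificate-data.
kubeconfig_cert_enddate() {
    kubectl config view --raw \
        -o jsonpath='{.users[0].user.client-certificate-data}' \
        | base64 -d \
        | openssl x509 -noout -enddate
}
```

Run kubeconfig_cert_enddate after fetching a new kubeconfig to confirm that the expiry date has moved about a year ahead.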

Ceph host networking

For some reason the Ceph object gateway is not properly configured in the dashboard.

See this issue for similar symptoms.
