* Fedora CoreOS is beginning to switch from cgroups v1 to
cgroups v2 by default, which changes the sysfs hierarchy
* This will be needed when using a Fedora Coreos OS image
that enables cgroups v2 (`next` stream as of this writing)
Rel: https://github.com/coreos/fedora-coreos-tracker/issues/292
* Remove Kubelet `/etc/iscsi` and `iscsiadm` host mounts that
were added on bare-metal, since these no longer work on either
Fedora CoreOS or Flatcar Linux with newer `iscsiadm`
* These special mounts on bare-metal date back to #350 which
added them to provide a way to use iSCSI in Kubernetes v1.10
* Today, storage should be handled by external CSI providers
which handle different storage systems, which doesn't rely
on Kubelet storage utils
Close #907
* Change control plane static pods to mount `/etc/kubernetes/pki`,
instead of `/etc/kubernetes/bootstrap-secrets` to better reflect
their purpose and match some loose conventions upstream
* Place control plane and bootstrap TLS assets and kubeconfig's
in `/etc/kubernetes/pki`
* Mount to `/etc/kubernetes/pki` (rather than `/etc/kubernetes/secrets`)
to match the host location (less surprise)
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/233
* Generate TLS client certificates for `kube-scheduler` and
`kube-controller-manager` with `system:kube-scheduler` and
`system:kube-controller-manager` CNs
* Template separate kubeconfigs for kube-scheduler and
kube-controller manager (`scheduler.conf` and
`controller-manager.conf`). Rename admin for clarity
* Before v1.16.0, Typhoon scheduled a self-hosted control
plane, which allowed the steady-state kube-scheduler and
kube-controller-manager to use a scoped ServiceAccount.
With a static pod control plane, separate CN TLS client
certificates are the nearest equiv.
* https://kubernetes.io/docs/setup/best-practices/certificates/
* Remove unused Kubelet certificate, TLS bootstrap is used
instead
* Allow a snippet with a systemd dropin to set an alternate
image via `ETCD_IMAGE`, for consistency across Fedora CoreOS
and Flatcar Linux
* Drop comments about integrating system containers with
systemd-notify
* Fix race condition for bootstrap-secrets SELinux context on non-bootstrap controllers in multi-controller FCOS clusters
* On first boot from disk on non-bootstrap controllers, adding bootstrap-secrets races with kubelet.service starting, which can cause the secrets assets to have the wrong label until kubelet.service restarts (service, reboot, auto-update)
* This can manifest as `kube-apiserver`, `kube-controller-manager`, and `kube-scheduler` pods crashlooping on spare controllers on first cluster creation
* Fedora CoreOS now ships systemd-udev's `default.link` while
Flannel relies on being able to pick its own MAC address for
the `flannel.1` link for tunneled traffic to reach cni0 on
the destination side, without being dropped
* This change first appeared in FCOS testing-devel 32.20200624.20.1
and is the behavior going forward in FCOS since it was added
to align FCOS network naming / configs with the rest of Fedora
and address issues related to the default being missing
* Flatcar Linux (and Container Linux) has a specific flannel.link
configuration builtin, so it was not affected
* https://github.com/coreos/fedora-coreos-tracker/issues/574#issuecomment-665487296
Note: Typhoon's recommended and default CNI provider is Calico,
unless `networking` is set to flannel directly.
* Accept experimental CNI `networking` mode "cilium"
* Run Cilium v1.8.0-rc4 with overlay vxlan tunnels and a
minimal set of features. We're interested in:
* IPAM: Divide pod_cidr into /24 subnets per node
* CNI networking pod-to-pod, pod-to-external
* BPF masquerade
* NetworkPolicy as defined by Kubernetes (no L7 Policy)
* Continue using kube-proxy with Cilium probe mode
* Firewall changes:
* Require UDP 8472 for vxlan (Linux kernel default) between nodes
* Optional ICMP echo(8) between nodes for host reachability
(health)
* Optional TCP 4240 between nodes for endpoint reachability (health)
Known Issues:
* Containers with `hostPort` don't listen on all host addresses,
these workloads must use `hostNetwork` for now
https://github.com/cilium/cilium/issues/12116
* Erroneous warning on Fedora CoreOS
https://github.com/cilium/cilium/issues/10256
Note: This is experimental. It is not listed in docs and may be
changed or removed without a deprecation notice
Related:
* https://github.com/poseidon/terraform-render-bootstrap/pull/192
* https://github.com/cilium/cilium/issues/12217
* Remove node label `node.kubernetes.io/master` from controller nodes
* Use `node.kubernetes.io/controller` (present since v1.9.5,
[#160](https://github.com/poseidon/typhoon/pull/160)) to node select controllers
* Rename controller NoSchedule taint from `node-role.kubernetes.io/master` to
`node-role.kubernetes.io/controller`
* Tolerate the new taint name for workloads that may run on controller nodes
and stop tolerating `node-role.kubernetes.io/master` taint
* Kubelet `--lock-file` and `--exit-on-lock-contention` date
back to usage of bootkube and at one point running Kubelet
in a "self-hosted" style whereby an on-host Kubelet (rkt)
started pods, but then a Kubelet DaemonSet was scheduled
and able to take over (hence self-hosted). `lock-file` and
`exit-on-lock-contention` flags supported this pivot. The
pattern has been out of favor (in bootkube too) for years
because of dueling Kubelet complexity
* Typhoon runs Kubelet as a container via an on-host systemd
unit using podman (Fedora CoreOS) or rkt (Flatcar Linux). In
fact, Typhoon no longer uses bootkube or control plane pivot
(let alone Kubelet pivot) and uses static pods since v1.16.0
* https://github.com/poseidon/typhoon/pull/536
* Build Kubelet container images internally and publish
to Quay and Dockerhub (new) as an alternative in case of
registry outage or breach
* Use our infra to provide single and multi-arch (default)
Kublet images for possible future use
* Docs: Show how to use alternative Kubelet images via
snippets and a systemd dropin (builds on #737)
Changes:
* Update docs with changes to Kubelet image building
* If you prefer to trust images built by Quay/Dockerhub,
automated image builds are still available with unique
tags (albeit with some limitations):
* Quay automated builds are tagged `build-{short_sha}`
(limit: only amd64)
* Dockerhub automated builts are tagged `build-{tag}`
and `build-master` (limit: only amd64, no shas)
Links:
* Kubelet: https://github.com/poseidon/kubelet
* Docs: https://typhoon.psdn.io/topics/security/#container-images
* Registries:
* quay.io/poseidon/kubelet
* docker.io/psdn/kubelet
* Write the systemd kubelet.service to use `KUBELET_IMAGE`
as the Kubelet. This provides a nice way to use systemd
dropins to temporarily override the image (e.g. during a
registry outage)
Note: Only Typhoon Kubelet images and registries are supported.