* With spot instances, deleting an instance actually lowers the
desired number of nodes in the VMSS, so the node is not
replaced
* Restore the auto-scale setting needed to maintain a consistent
desired number of workers while spot instances come and go. This
was mistakenly removed during a refactor
* Azure Load Balancers include 5 rules (3 LB rules, 2 outbound) whether used or not
* [#1468](https://github.com/poseidon/typhoon/pull/1468) added 3 LB rules to support IPv6 load balancing,
raising the rule count from 5 to 8 and adding ~$21/mo to the cost of the load balancer. If you use an edge
(e.g. Cloudflare), a cluster does not need to load balance IPv6, so this additional cost can be avoided
* I noticed this because my load balancing costs were up for the last
few months. The gotcha is that outbound rules count toward the 5 rules
included with the base cost of the LB (~$18/mo)
Docs: https://azure.microsoft.com/en-us/pricing/details/load-balancer/
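A rough sketch of where the ~$18/mo and ~$21/mo figures come from. It assumes Standard Load Balancer rule rates of $0.025/hour covering the first 5 rules and $0.01/hour for each additional rule, plus 730 hours per month. These rates are assumptions to check against the pricing page above, and data processing charges are ignored.

```python
# Assumed Azure Standard Load Balancer rule pricing (USD). Illustrative only;
# verify against the current pricing page before relying on these numbers.
BASE_RATE_PER_HOUR = 0.025       # covers the first INCLUDED_RULES rules
EXTRA_RULE_RATE_PER_HOUR = 0.01  # charged per rule beyond INCLUDED_RULES
INCLUDED_RULES = 5
HOURS_PER_MONTH = 730


def monthly_lb_cost(lb_rules: int, outbound_rules: int) -> float:
    """Estimate monthly rule charges. Outbound rules count toward the total."""
    total_rules = lb_rules + outbound_rules
    extra_rules = max(0, total_rules - INCLUDED_RULES)
    hourly = BASE_RATE_PER_HOUR + extra_rules * EXTRA_RULE_RATE_PER_HOUR
    return hourly * HOURS_PER_MONTH


before = monthly_lb_cost(lb_rules=3, outbound_rules=2)  # 5 rules
after = monthly_lb_cost(lb_rules=6, outbound_rules=2)   # 8 rules with IPv6
print(f"before: ${before:.2f}/mo, after: ${after:.2f}/mo, delta: ${after - before:.2f}/mo")
```

Under these assumptions the base covers 5 rules at about $18.25/mo, and 3 extra rules add about $21.90/mo. Unused rule capacity is still billed, because any rule count up to 5 costs the same.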
* flannel and Cilium default to UDP 8472 for VXLAN traffic to
avoid conflicts with other VXLAN usage (e.g. Open vSwitch)
* Aligning flannel and Cilium to use the same VXLAN port makes
firewall rules and security policies simpler across clouds
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/403
* Explicitly load the `nf_conntrack` and `br_netfilter` kernel
modules that are needed for flannel CNI setups
* Specifically, flannel needs `br_netfilter` and kube-proxy (used
in flannel setups) needs `nf_conntrack`. Previously these kernel
modules were loaded by default but no longer seem to be
* Cilium has been the default for about 3 years and is the de facto
standard CNI choice. flannel is supported as a simple alternative
* Remove various historical options that were only needed for
Calico
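A minimal sketch for checking whether a node has the `br_netfilter` and `nf_conntrack` modules loaded, by parsing text in the `/proc/modules` format (module name is the first field). The helper name `missing_modules` is hypothetical. Modules built directly into the kernel do not appear in `/proc/modules`, so an empty result means "loaded", while a non-empty result may still be a false positive on such kernels.

```python
from typing import Iterable, Set

REQUIRED_MODULES = ("br_netfilter", "nf_conntrack")


def missing_modules(proc_modules: str, required: Iterable[str] = REQUIRED_MODULES) -> Set[str]:
    """Return required module names absent from /proc/modules-formatted text."""
    # Each line looks like: "<name> <size> <refcount> <deps> <state> <offset>"
    loaded = {line.split()[0] for line in proc_modules.splitlines() if line.strip()}
    return set(required) - loaded


# Example with sample text; on a Linux node, pass open("/proc/modules").read()
sample = (
    "br_netfilter 32768 0 - Live 0x0000000000000000\n"
    "bridge 311296 1 br_netfilter, Live 0x0000000000000000\n"
)
print(sorted(missing_modules(sample)))
```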
* By default, Kubelet pulls container images one at a time
(serially), mostly because of Docker-era bugs with parallel
image pulls. These days we use containerd, so parallel pulls
should be fine
* Serial image pulls are undesirable because one slow registry
or image can make other image pulls wait. With parallel pulls,
only the large images or slow registries themselves see that delay
Docs: https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
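In the KubeletConfiguration linked above, this behavior is controlled by the `serializeImagePulls` field (default `true`). The idealized sketch below shows why serial pulls hurt. It assumes hypothetical pull durations and, for the parallel case, that pulls do not contend for bandwidth, which real nodes only approximate.

```python
from itertools import accumulate
from typing import List


def serial_completion_times(pull_seconds: List[float]) -> List[float]:
    """Each pull starts only after the previous one finishes."""
    return list(accumulate(pull_seconds))


def parallel_completion_times(pull_seconds: List[float]) -> List[float]:
    """Idealized: all pulls start together and don't slow each other down."""
    return list(pull_seconds)


# Hypothetical durations: one slow image queued ahead of two small ones
pulls = [120.0, 2.0, 3.0]
print("serial:  ", serial_completion_times(pulls))
print("parallel:", parallel_completion_times(pulls))
```

With serial pulls, the two small images finish only after the slow one (at 122s and 125s). With parallel pulls they finish in 2s and 3s, and only the slow image itself takes 120s.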
* Change the default Pod CIDR from 10.2.0.0/16 to 10.20.0.0/14
(10.20.0.0 - 10.23.255.255) to support 1024 nodes by default
* Most CNI providers divide the Pod CIDR so that each node has
a /24 (256 addresses) to allocate to local pods. The previous
`10.2.0.0/16` default only fits 256 /24s, so only 256 nodes were
supported without customizing the `pod_cidr`
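The node counts follow from how many /24s fit in each Pod CIDR, i.e. 2^(24 - prefix length). A small sketch using Python's standard `ipaddress` module, assuming one /24 per node as described above (`per_node_capacity` is a hypothetical helper name):

```python
import ipaddress


def per_node_capacity(pod_cidr: str, node_prefix: int = 24) -> int:
    """Count the per-node subnets (e.g. /24s) that fit inside the Pod CIDR."""
    net = ipaddress.ip_network(pod_cidr)
    return sum(1 for _ in net.subnets(new_prefix=node_prefix))


for cidr in ("10.2.0.0/16", "10.20.0.0/14"):
    net = ipaddress.ip_network(cidr)
    print(cidr, net.network_address, "-", net.broadcast_address,
          "nodes:", per_node_capacity(cidr))
```

For `10.20.0.0/14` this reports the range 10.20.0.0 - 10.23.255.255 and 1024 /24s, versus 256 for `10.2.0.0/16`.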
* Configure the regional worker instance templates with the
region of the cluster. Otherwise the region defaults to the provider's
region, which isn't always what you want, and leaving it unset causes an error
* Close #1512