typhoon

mirror of https://github.com/poseidon/typhoon synced 2025-01-10 04:48:03 +01:00

Author	SHA1	Message	Date
Dalton Hubble	111b1206ba	azure: Add `enable_ipv6_load_balancing` variable and default false * Azure Load Balancers include 5 rules (3 LB rules, 2 outbound) whether used or not * [#1468](https://github.com/poseidon/typhoon/pull/1468) added 3 LB rules to support IPv6 load balancing, raising the rules count from 5 to 8 and added ~$21/mo to the cost of the load balancer. If you use an edge (e.g. Cloudflare) a cluster does not need to load balance IPv6, so this additional cost can be avoided * I noticed this because my load balancing costs were up for the last few months. The gotcha is that outbound rules count toward the 5 rules included with the base cost of the LB (~$18/mo) Docs: https://azure.microsoft.com/en-us/pricing/details/load-balancer/	2024-12-30 16:22:41 -08:00
Dalton Hubble	1955b23819	Change flannel port from 4789 to 8472 * flannel and Cilium default to UDP 8472 for VXLAN traffic to avoid conflicts with other VXLAN usage (e.g. Open vSwitch) * Aligning flannel and Cilium to use the same VXLAN port makes firewall rules or security policies simpler across clouds Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/403	2024-12-30 15:10:02 -08:00
Dalton Hubble	ec1d9bc415	Remove Calico BGP and IPIP firewall/security rules * These rules are no longer needed since Calico is no longer supported	2024-12-30 14:53:33 -08:00
Dalton Hubble	cc790bfc45	Fix Fedora CoreOS support for flannel CNI * Explicitly load the `nf_conntrack` and `br_netfilter` kernel modules that are needed for flannel CNI setups * Specifically, flannel needs `br_netfilter` and kube-proxy (used in flannel setups) needs `nf_conntrack`. Previously these kernel modules were loaded by default but no longer seem to be	2024-12-29 20:31:00 -08:00
Dalton Hubble	8059eb9f0c	Remove support for Calico CNI * Cilium has been the default for about 3 years and is the de facto standard CNI choice. flannel is supported as a simple alternative * Remove various historical options that were specific to Calico	2024-12-28 20:45:28 -08:00
Dalton Hubble	a8eae32b53	Configure Kubelets for parallel image pulls * By default, Kubelet will pull container images one by one (in series), which is mostly related to Docker-era bugs in parallel image pulls. These days we use containerd so parallel pulls should be fine * Serial image pulls are undesirable because one slow registry or image can cause other image pulls to wait. Parallel image pulls ensure only large images / slow registries see that impact Docs: https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/	2024-12-27 20:03:18 -08:00
Dalton Hubble	e1072283c5	Update Kubernetes from v1.31.4 to v1.32.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.32.md#v1320	2024-12-20 17:00:20 -08:00
Dalton Hubble	cbedda4b28	Update Kubernetes from v1.31.3 to v1.31.4 * Update flannel from v0.26.0 to v0.26.2 * Update Cilium from v1.16.4 to v1.16.5	2024-12-20 15:10:51 -08:00
Dalton Hubble	bc59d5153e	Update Kubernetes from v1.31.2 to v1.31.3 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1313 * Update CoreDNS from v1.11.3 to v1.11.4 * Update Cilium from v1.16.3 to v1.16.4 * Plan to drop support for using Calico CNI, recommend everyone use the Cilium default	2024-11-24 08:43:54 -08:00
Dalton Hubble	61ffc0bc19	Update Kubernetes from v1.31.1 to v1.31.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1312 * Update Cilium from v1.16.1 to v1.16.3 * Update flannel from v0.25.6 to v0.26.0	2024-10-26 08:33:43 -07:00
Dalton Hubble	6a5b808b17	Add region to gcp instance template resource * Configure the regional worker instance templates with the region of the cluster. This defaults to the provider's region which isn't always what you want and if left off causes an error * Close #1512	2024-10-08 21:28:29 -07:00
Dalton Hubble	598f707cbd	Update Kubernetes from v1.31.0 to v1.31.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1311	2024-09-20 14:43:39 -07:00
Jordan Pittier	3f844e3c57	google: Add controller_disk_type and worker_disk_type variables (#1513) * Add controller_disk_type and worker_disk_type variables * Properly pass disk_type to worker nodes	2024-09-20 14:31:17 -07:00
Dalton Hubble	7d2d8e16e5	google: Use regional instance templates for workers * Use regional instance templates for the worker node regional managed instance groups. Regional instance templates are kept in the associated region, whereas the older "global" instance templates were kept in a particular region (regardless of the MIG's region), so an outage in region X could affect clusters in region Y, which is undesired	2024-08-27 21:35:02 -07:00
Dalton Hubble	3412060c3c	Use Cilium kube-proxy replacement when Cilium CNI is used * When using the Cilium component, disable bootstrapping the kube-proxy DaemonSet. Instead, configure Cilium to provide its kube-proxy replacement with BPF * Update the self-managed Cilium component to use kube-proxy replacement as well	2024-08-23 12:33:32 -07:00
Dalton Hubble	808b8a948f	aws: Switch EC2 instances to use resource-based hostnames * Use EC2 resource-based hostnames instead of IP-based hostnames. The Amazon DNS server can resolve A and AAAA queries to IPv4 and IPv6 node addresses * For example, nodes used to be named like `ip-10-11-12-13.us-east-1.compute.internal` but going forward use the instance id `i-0123456789abcdef.us-east-1.compute.internal` * Tag controller node EBS volumes with a name based on the controller node name	2024-08-22 20:02:53 -07:00
Dalton Hubble	effa13c141	Fix flannel-cni container image * Close #1496	2024-08-22 19:26:19 -07:00
Dalton Hubble	10be34daa2	Update Kubernetes from v1.30.4 to v1.31.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1310	2024-08-17 08:32:35 -07:00
Dalton Hubble	320d76c934	Update Kubernetes from v1.30.3 to v1.30.4 * Update Cilium from v1.16.0 to v1.16.1	2024-08-16 08:27:07 -07:00
Dalton Hubble	2daa23be50	Update default Cilium and CoreDNS components * Update the CoreDNS and Cilium versions used by default when folks aren't managing the components themselves	2024-08-05 08:47:06 -07:00
Dalton Hubble	6e2daded02	Remove some seldom used variables and set reasonable * Set reasonable values and remove some variable clutter * enable_reporting is only used with Calico and we can just default to false, I doubt anyone uses Calico and cares much about reporting metrics to upstream Calico	2024-08-02 20:45:37 -07:00
Dalton Hubble	83f1bd2373	Update ARM64 cluster and hybrid cluster docs * Typhoon now supports arbitrary combinations of controller, worker, and worker pool architectures so we can drop the specific details of full-cluster vs hybrid cluster. Just pick the architecture for each group of nodes accordingly. * However, if a custom node taint is set, continue to configure the cluster's daemonsets accordingly with `daemonset_tolerations`	2024-08-02 20:34:23 -07:00
Dalton Hubble	0120b9f38d	Remove the cluster_domain_suffix variable * Drop support for `cluster_domain_suffix` customization and always use `cluster.local`. Many components in the Kubernetes ecosystem assume this default suffix and it's very rare to set a special value here these days * Clean up a few variables that are seldom used	2024-08-02 15:05:25 -07:00
Dalton Hubble	af27661432	Configure controller and worker node architecture separately * On platforms that support ARM64 instances, configure controller and worker node host architectures separately * For example, you can run arm64 controllers and amd64 workers * Add `controller_arch` and `worker_arch` variables * Remove `arch` variable	2024-08-02 15:04:57 -07:00
Dalton Hubble	516786d7bb	google: Configure controller and worker disk sizes * Add `controller_disk_size` and `worker_disk_size` variables * Remove `disk_size` variable	2024-08-02 13:07:41 -07:00
Dalton Hubble	1104b4bf28	AWS: Add CPU pricing mode and controller/worker disk variables * Add `controller_disk_type`, `controller_disk_size`, and `controller_disk_iops` variables * Add `worker_disk_type`, `worker_disk_size`, and `worker_disk_iops` variables and fix propagation to worker nodes * Remove `disk_type`, `disk_size`, and `disk_iops` variables * Add `controller_cpu_credits` and `worker_cpu_credits` variables to set CPU pricing mode for burstable instance types	2024-07-31 15:02:28 -07:00
Dalton Hubble	0669d44026	Update Kubernetes from v1.30.2 to v1.30.3 * Update builtin Cilium manifests from v1.15.6 to v1.15.7 * Update builtin flannel manifests from v0.25.4 to v0.25.5	2024-07-20 11:04:32 -07:00
Dalton Hubble	0d10d180f8	Change worker node pools from uniform to flexible orchestration mode * Use flexible orchestration mode. Azure has started to recommend this mode because it allows interacting with VMSS instances like regular VMs via the CLI or via the Azure Portal * Add options to allow worker nodes to use ephemeral local disks * Add `controller_disk_type` and `controller_disk_size` variables * Add `worker_disk_type`, `worker_disk_size`, and `worker_ephemeral_disk` variables	2024-07-14 11:58:15 -07:00
Dalton Hubble	a4fab61066	Remove an IPv4 address from Azure clusters * Consolidate load balancer frontend IPs to just the minimal IPv4 and IPv6 addresses that are needed per load balancer. apiserver and ingress use separate ports, so there is not a true need for a separate public IPv4 address just for apiserver * Some might prefer a separate IP just because it slightly hides the apiserver, but these are public hosted endpoints that can be discovered * Reduce the cost of an Azure cluster since IPv4 public IPs are billed ($3.60/mo/cluster)	2024-07-10 22:29:43 -07:00
Dalton Hubble	24b7f31c55	Rename Azure cluster region variable to location * Rename the region variable to location to align with Azure platform conventions, where resources are created within an Azure location, which are themselves part of broader geographical regions	2024-07-09 07:56:58 -07:00
Dalton Hubble	48d4973957	Add IPv6 support for Typhoon Azure clusters * Define a dual-stack virtual network with both IPv4 and IPv6 private address space. Change `host_cidr` variable (string) to a `network_cidr` variable (object) with "ipv4" and "ipv6" fields that list CIDR strings. * Define dual-stack controller and worker subnets. Disable Azure default outbound access (a deprecated fallback mechanism) * Enable dual-stack load balancing to Kubernetes Ingress by adding a public IPv6 frontend IP and LB rule to the load balancer. * Enable worker outbound IPv6 connectivity through load balancer SNAT by adding an IPv6 frontend IP and outbound rule * Configure controller nodes with a public IPv6 address to provide direct outbound IPv6 connectivity * Add an IPv6 worker backend pool. Azure requires separate IPv4 and IPv6 backend pools, though the health probe can be shared * Extend network security group rules for IPv6 source/destinations Checklist: Access to controller and worker nodes via IPv6 addresses: * SSH access to controller nodes via public IPv6 address * SSH access to worker nodes via (private) IPv6 address (via controller) Outbound IPv6 connectivity from controller and worker nodes: ``` nc -6 -zv ipv6.google.com 80 Ncat: Version 7.94 ( https://nmap.org/ncat ) Ncat: Connected to [2607:f8b0:4001:c16::66]:80. Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds. ``` Serving Ingress traffic via IPv4 or IPv6 just requires setting up A and AAAA records and running the ingress controller with `hostNetwork: true`, since hostPort only forwards IPv4 traffic	2024-07-09 07:55:00 -07:00
Dalton Hubble	7b8a51070f	Add Terraform modules for CoreDNS, Cilium, and flannel * With the new component system, these components can be managed independent from the cluster and rolled or edited in advanced ways	2024-05-19 17:00:10 -07:00
Dalton Hubble	533ace7011	Update Cilium from v1.15.4 to v1.15.5 * https://github.com/cilium/cilium/releases/tag/v1.15.5	2024-05-19 16:38:08 -07:00
Dalton Hubble	b3c384fbc0	Introduce the component system for managing pre-installed addons * Previously: Typhoon provisions clusters with kube-system components like CoreDNS, kube-proxy, and a chosen CNI provider (among flannel, Calico, or Cilium) pre-installed. This is convenient since clusters come with "batteries included". But it also means upgrading these components is generally done in lock-step, by upgrading to a new Typhoon / Kubernetes release * It can be valuable to manage these components with a separate plan/apply process or through automations and deploy systems. For example, this allows managing CoreDNS separately from the cluster's lifecycle. * These "components" will continue to be pre-installed by default, but a new `components` variable allows them to be disabled and managed as "addons", components you apply after cluster creation and manage on a rolling basis. For some of these, we may provide Terraform modules to aid in managing these components. ``` module "cluster" { # defaults components = { enable = true coredns = { enable = true } kube_proxy = { enable = true } # Only the CNI set in var.networking will be installed flannel = { enable = true } calico = { enable = true } cilium = { enable = true } } } ``` An earlier variable `install_container_networking = true/false` has been removed, since it can now be achieved with this more extensible and general components mechanism by setting the chosen networking provider enable field to false.	2024-05-19 16:33:57 -07:00
Dalton Hubble	563feacd29	Update Kubernetes from v1.30.0 to v1.30.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1301	2024-05-15 21:59:00 -07:00
Dalton Hubble	3f34e047f1	azure: Add controller security group and subnet outputs * Output the network security group name and address prefixes for controller nodes, to allow adding custom network security rules that apply specifically to controller nodes	2024-05-14 21:34:31 -07:00
Dalton Hubble	cc80ec9b98	Add firewall and security rules for Cilium/Hubble metrics * Add firewall or security rules to allow node-to-node traffic on ports 9962-9965 for Cilium and Hubble metrics. Cilium runs with host network, so these require cloud firewall changes	2024-05-13 21:27:38 -07:00
Dalton Hubble	d08cd317d9	Allow CoreDNS and kube-proxy to be optional components * Allow for more minimal base cluster setups that manage CoreDNS or kube-proxy as applications, with rolling updates or deploy systems. Or in the case of kube-proxy, it's becoming more common to not install it and instead use Cilium * Add a `components` pass-through variable to configure pre-installed components like kube-proxy and CoreDNS. These components can be disabled (individually or together) to allow for managing components with separate plan/apply processes or automations * terraform-render-bootstrap manifest assets are now structured as manifests/{coredns,kube-proxy,network} so adapt the controller layout scripts accordingly * This is similar to some changes in v1.29.2 that allowed for the container networking provider manifests to be skipped Related: https://github.com/poseidon/typhoon/pull/1419, https://github.com/poseidon/typhoon/pull/1421	2024-05-12 21:20:27 -07:00
Dalton Hubble	78d5100181	Update Cilium and flannel container images * Update Cilium from v1.15.3 to v1.15.4 * Update flannel from v0.24.4 to v0.25.1	2024-05-12 08:27:27 -07:00
Dalton Hubble	6ac5a0222b	Update Kubernetes from v1.29.3 to v1.30.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1300	2024-04-23 20:51:54 -07:00
Dalton Hubble	cafcdbc3e7	Update etcd from v3.5.12 to v3.5.13 and bump Calico/Cilium * Update Cilium from v1.15.2 to v1.15.3 * Update Calico from v3.27.2 to v3.27.3	2024-04-03 22:51:07 -07:00
Dalton Hubble	fbe36b8b16	Update Cilium and flannel container image versions * https://github.com/cilium/cilium/releases/tag/v1.15.2 * https://github.com/flannel-io/flannel/releases/tag/v0.24.4	2024-03-22 11:19:49 -07:00
Dalton Hubble	41907a0ba6	Update Calico from v3.26.3 to v3.27.2 * Update fixes Calico incompatibility with Fedora CoreOS Rel: https://github.com/projectcalico/calico/issues/8372	2024-02-25 12:11:56 -08:00
Dalton Hubble	2325a503e1	Add an `install_container_networking` variable (default `true`) * When `true`, the chosen container `networking` provider is installed during cluster bootstrap * Set `false` to self-manage the container networking provider. This allows flannel, Calico, or Cilium to be managed via Terraform (like any other Kubernetes resources). Nodes will be NotReady until you apply the self-managed container networking provider. This may become the default in future.	2024-02-24 18:49:38 -08:00
Dalton Hubble	7a46eb03ae	Update Cilium from v1.14.3 to v1.15.1 * https://github.com/cilium/cilium/releases/tag/v1.15.1	2024-02-23 22:59:31 -08:00
Dalton Hubble	0e7977694f	Allow CNI networking to be set to none * Set CNI networking to "none" to skip installing any CNI provider (i.e. no flannel, Calico, or Cilium). In this mode, cluster nodes will be NotReady until you add your own CNI stack * Motivation: I now tend to manage CNI components as addon modules just like other applications overlaid onto a cluster. It allows for faster iteration and may eventually become the recommendation	2024-02-23 22:57:47 -08:00
Dalton Hubble	f2f625984e	Update Kubernetes from v1.29.1 to v1.29.2 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1292	2024-02-18 18:31:31 -08:00
Dalton Hubble	e247673a20	Update Kubernetes from v1.29.0 to v1.29.1 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1291	2024-02-04 10:47:42 -08:00
Dalton Hubble	84e4f02917	Update Kubernetes from v1.28.4 to v1.29.0 * https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md	2023-12-22 10:27:24 -08:00
Dalton Hubble	0d997def31	Add release note for v1.28.4	2023-12-10 21:02:21 -08:00
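
As a sketch of the `components` mechanism described in b3c384fbc0 and d08cd317d9, the block below keeps the other defaults but disables the pre-installed CoreDNS so it can be managed as an addon. It mirrors the snippet in that commit message. The module `source` and other required inputs are left out, disabling CoreDNS is only an illustrative choice, and the `calico` entry is dropped on the assumption that the later Calico removal (8059eb9f0c) also removed that field.

```hcl
module "cluster" {
  # source and other required inputs omitted (not shown in the commit message)

  components = {
    enable = true

    # Illustrative choice: skip the pre-installed CoreDNS and apply it
    # separately after cluster creation.
    coredns    = { enable = false }
    kube_proxy = { enable = true }

    # Only the CNI set in var.networking will be installed
    flannel = { enable = true }
    cilium  = { enable = true }
  }
}
```

Per 2325a503e1, nodes stay NotReady until a self-managed networking provider is applied, so disabling a CNI entry this way would need a follow-up apply of that provider.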


992 Commits
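
The per-role variables added in af27661432 and 1104b4bf28 might be combined as in the following sketch for an AWS cluster. The variable names come from those commit messages and the architecture pairing follows the commit's own example (arm64 controllers, amd64 workers). The disk and CPU credit values are illustrative assumptions about the accepted formats, and the module `source`, instance types, and other required inputs are omitted.

```hcl
module "cluster" {
  # source, instance types, and other required inputs omitted;
  # an arm64 controller also needs an ARM-based instance type.

  # Host architecture per node role (replaces the removed `arch` variable)
  controller_arch = "arm64"
  worker_arch     = "amd64"

  # Disk settings per node role (replace `disk_type` and `disk_size`).
  # Values below are assumed examples, not documented defaults.
  controller_disk_type = "gp3"
  controller_disk_size = 30
  worker_disk_type     = "gp3"
  worker_disk_size     = 30

  # CPU pricing mode for burstable instance types (assumed value format)
  controller_cpu_credits = "standard"
  worker_cpu_credits     = "standard"
}
```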