1
0
mirror of https://github.com/poseidon/typhoon synced 2025-01-10 04:48:03 +01:00
Commit Graph

992 Commits

Author SHA1 Message Date
Dalton Hubble
111b1206ba azure: Add enable_ipv6_load_balancing variable and default false
* Azure Load Balancers include 5 rules (3 LB rules, 2 outbound) whether used or not
* [#1468](https://github.com/poseidon/typhoon/pull/1468) added 3 LB rules to support IPv6 load balancing,
raising the rules count from 5 to 8 and added ~$21/mo to the cost of the load balancer. If you use an edge
(e.g. Cloudflare) a cluster does not need to load balance IPv6, so this additional cost can be avoided
* I noticed this because my load balancing costs were up for the last
few months. The gotcha is that outbound rules count toward the 5 rules
included with the base cost of the LB (~$18/mo)

Docs: https://azure.microsoft.com/en-us/pricing/details/load-balancer/
2024-12-30 16:22:41 -08:00
Dalton Hubble
1955b23819 Change flannel port from 4789 to 8472
* flannel and Cilium default to UDP 8472 for VXLAN traffic to
avoid conflicts with other VXLAN usage (e.g. Open vSwith)
* Aligning flannel and Cilium to use the same vxlan port makes
firewall rules or security policies simpler across clouds

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/403
2024-12-30 15:10:02 -08:00
Dalton Hubble
ec1d9bc415 Remove Calico BGP and IPIP firewall/security rules
* These rules are no longer needed since Calico is no longer
supported
2024-12-30 14:53:33 -08:00
Dalton Hubble
cc790bfc45 Fix Fedora CoreOS support for flannel CNI
* Explicitly load the `nf_conntrack` and `br_netfilter` kernel
modules that are needed for flannel CNI setups
* Specifically, flannel needs `br_netfilter` and kube-proxy (used
in flannel setups) needs `nf_conntrack`. Previously these kernel
modules were loaded by default but no longer seem to be
2024-12-29 20:31:00 -08:00
Dalton Hubble
8059eb9f0c Remove support for Calico CNI
* Cilium has been the default for about 3 years and is the defacto
standard CNI choice. flannel is supported as a simple alternative
* Remove various historical options that were needed that are
specific to Calico
2024-12-28 20:45:28 -08:00
Dalton Hubble
a8eae32b53 Configure Kubelets for parallel image pulls
* By default, Kubelet will pull container images one by one
(in series), which is mostly related to Docker-era bugs in
parallel image pulls. These days we use containerd so parallel
pulls should be fine
* Serial image pulls are undesirable because one slow registry
or image can cause other image pulls to wait. Parallel image
pulls ensure only large images / slow registries see that impact

Docs: https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
2024-12-27 20:03:18 -08:00
Dalton Hubble
e1072283c5 Update Kubernetes from v1.31.4 to v1.32.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.32.md#v1320
2024-12-20 17:00:20 -08:00
Dalton Hubble
cbedda4b28 Update Kubernets from v1.31.3 to v1.31.4
* Update flannel from v0.26.0 to v0.26.2
* Update Cilium from v1.16.4 to v1.16.5
2024-12-20 15:10:51 -08:00
Dalton Hubble
bc59d5153e Update Kubernetes from v1.31.2 to v1.31.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1313
* Update CoreDNS from v1.11.3 to v1.11.4
* Update Cilium from v1.16.3 to v1.16.4
* Plan to drop support for using Calico CNI, recommend everyone use the Cilium default
2024-11-24 08:43:54 -08:00
Dalton Hubble
61ffc0bc19
Update Kubernetes from v1.31.1 to v1.31.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1312
* Update Cilium from v1.16.1 to v1.16.3
* Update flannel from v0.25.6 to v0.26.0
2024-10-26 08:33:43 -07:00
Dalton Hubble
6a5b808b17
Add region to gcp instance template resource
* Configure the regional worker instance templates with the
region of the cluster. This defaults to the provider's region
which isn't always what you want and if left off causes an error
* Close #1512
2024-10-08 21:28:29 -07:00
Dalton Hubble
598f707cbd
Update Kubernetes from v1.31.0 to v1.31.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1311
2024-09-20 14:43:39 -07:00
Jordan Pittier
3f844e3c57
google: Add controller_disk_type and worker_disk_type variables (#1513)
* Add controller_disk_type and worker_disk_type variables
* Properly pass disk_type to worker nodes
2024-09-20 14:31:17 -07:00
Dalton Hubble
7d2d8e16e5
google: Use regional instance templates for workers
* Use regional instance templates for the worker node regional
managed instance groups. Regional instance templates are kept in
the associated region, whereas the older "global" instance templates
were kept in a particular region (regardless of where the MIG region)
so outages in a region X could affect clusters in a region Y which
is undesired
2024-08-27 21:35:02 -07:00
Dalton Hubble
3412060c3c
Use Cilium kube-proxy replacement when Cilium CNI is used
* When using the Cilium component, disable bootstrapping the
kube-proxy DaemonSet. Instead, configure Cilium to provide its
kube-proxy replacement with BPF
* Update the self-managed Cilium component to use kube-proxy
replacement as well
2024-08-23 12:33:32 -07:00
Dalton Hubble
808b8a948f
aws: Switch EC2 instances to use resource-based hostnames
* Use EC2 resource-based hostnames instead of IP-based hostnames. The Amazon
DNS server can resolve A and AAAA queries to IPv4 and IPv6 node addresses
* For example, nodes used to be named like `ip-10-11-12-13.us-east-1.compute.internal`
but going forward use the instance id `i-0123456789abcdef.us-east-1.compute.internal`
* Tag controller node EBS volumes with a name based on the controller node name
2024-08-22 20:02:53 -07:00
Dalton Hubble
effa13c141
Fix flannel-cni container image
* Close #1496
2024-08-22 19:26:19 -07:00
Dalton Hubble
10be34daa2
Update Kubernetes from v1.30.4 to v1.31.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.31.md#v1310
2024-08-17 08:32:35 -07:00
Dalton Hubble
320d76c934
Update Kubernetes from v1.30.3 to v1.30.4
* Update Cilium from v1.16.0 to v1.16.1
2024-08-16 08:27:07 -07:00
Dalton Hubble
2daa23be50
Update default Cilium and CoreDNS components
* Update the CoreDNS and Cilium versons used by default when
folks aren't managing the components themselves
2024-08-05 08:47:06 -07:00
Dalton Hubble
6e2daded02
Remove some seldom used variables and set reasonable
* Set reasonable values and remove some variable clutter
* enable_reporting is only used with Calico and we can just default
to false, I doubt anyone uses Calico and cares much about reporting
metrics to upstream Calico
2024-08-02 20:45:37 -07:00
Dalton Hubble
83f1bd2373
Update ARM64 cluster and hybrid cluster docs
* Typhoon now supports arbitrary combinations of controller, worker,
and worker pool architectures so we can drop the specific details of
full-cluster vs hybrid cluster. Just pick the architecture for each
group of nodes accordingly.
* However, if a custom node taint is set, continue to configure the
cluster's daemonsets accordingly with `daemonset_tolerations`
2024-08-02 20:34:23 -07:00
Dalton Hubble
0120b9f38d
Remove the cluster_domain_suffix variable
* Drop support for `cluster_domain_suffix` customization and
always use `cluster.local`. Many components in the Kubernetes
ecosystem assume this default suffix and its very rare to be
setting a special value here these days
* Cleanup a few variables that are seldom used
2024-08-02 15:05:25 -07:00
Dalton Hubble
af27661432 Configure controller and worker node architecture separately
* On platforms that support ARM64 instances, configure controller
and worker node host architectures separately
* For example, you can run arm64 controllers and amd64 workers
* Add `controller_arch` and `worker_arch` variables
* Remove `arch` variable
2024-08-02 15:04:57 -07:00
Dalton Hubble
516786d7bb
google: Configure controller and worker disk sizes
* Add `controller_disk_size` and `worker_disk_size` variables
* Remove `disk_size` variable
2024-08-02 13:07:41 -07:00
Dalton Hubble
1104b4bf28 AWS: Add CPU pricing mode and controller/worker disk variables
* Add `controller_disk_type`, `controller_disk_size`, and `controller_disk_iops`
variables
* Add `worker_disk_type`, `worker_disk_size`, and `worker_disk_iops` variables
and fix propagation to worker nodes
* Remove `disk_type`, `disk_size`, and `disk_iops` variables
* Add `controller_cpu_credits` and `worker_cpu_credits` variables to set CPU
pricing mode for burstable instance types
2024-07-31 15:02:28 -07:00
Dalton Hubble
0669d44026
Update Kubernetes from v1.30.2 to v1.30.3
* Update builtin Cilium manifests from v1.15.6 to v1.15.7
* Update builtin flannel manifests from v0.25.4 to v0.25.5
2024-07-20 11:04:32 -07:00
Dalton Hubble
0d10d180f8
Change worker node pools from uniform to flexible orchestration mode
* Use flexible orchestration mode. Azure has started to recommend this
mode because it allows interacting with VMSS instances like regular VMs
via the CLI or via the Azure Portal
* Add options to allow workers nodes to use ephemeral local disks
  * Add `controller_disk_type` and `controller_disk_size` variables
  * Add `worker_disk_type`, `worker_disk_size`, and `worker_ephemeral_disk` variables
2024-07-14 11:58:15 -07:00
Dalton Hubble
a4fab61066
Remove an IPv4 address from Azure clusters
* Consolidate load balancer frontend IPs to just the minimal IPv4
and IPv6 addresses that are needed per load balancer. apiserver and
ingress use separate ports, so there is not a true need for a separate
public IPv4 address just for apiserver
* Some might prefer a separate IP just because it slightly hides the
apiserver, but these are public hosted endpoints that can be discovered
* Reduce the cost of an Azure cluster since IPv4 public IPs are billed
($3.60/mo/cluster)
2024-07-10 22:29:43 -07:00
Dalton Hubble
24b7f31c55
Rename Azure cluster region variable to location
* Rename the region variable to location to align with Azure
platform conventions, where resources are created within an
Azure location, which are themselves part of broader geographical
regions
2024-07-09 07:56:58 -07:00
Dalton Hubble
48d4973957
Add IPv6 support for Typhoon Azure clusters
* Define a dual-stack virtual network with both IPv4 and IPv6 private
address space. Change `host_cidr` variable (string) to a `network_cidr`
variable (object) with "ipv4" and "ipv6" fields that list CIDR strings.
* Define dual-stack controller and worker subnets. Disable Azure
default outbound access (a deprecated fallback mechanism)
* Enable dual-stack load balancing to Kubernetes Ingress by adding
a public IPv6 frontend IP and LB rule to the load balancer.
* Enable worker outbound IPv6 connectivity through load balancer
SNAT by adding an IPv6 frontend IP and outbound rule
* Configure controller nodes with a public IPv6 address to provide
direct outbound IPv6 connectivity
* Add an IPv6 worker backend pool. Azure requires separate IPv4 and
IPv6 backend pools, though the health probe can be shared
* Extend network security group rules for IPv6 source/destinations

Checklist:

Access to controller and worker nodes via IPv6 addresses:

  * SSH access to controller nodes via public IPv6 address
  * SSH access to worker nodes via (private) IPv6 address (via
    controller)

Outbound IPv6 connectivity from controller and worker nodes:

```
nc -6 -zv ipv6.google.com 80
Ncat: Version 7.94 ( https://nmap.org/ncat )
Ncat: Connected to [2607:f8b0:4001:c16::66]:80.
Ncat: 0 bytes sent, 0 bytes received in 0.02 seconds.
```

Serve Ingress traffic via IPv4 or IPv6 just requires setting
up A and AAAA records and running the ingress controller with
`hostNetwork: true` since, hostPort only forwards IPv4 traffic
2024-07-09 07:55:00 -07:00
Dalton Hubble
7b8a51070f Add Terraform modules for CoreDNS, Cilium, and flannel
* With the new component system, these components can be managed
independent from the cluster and rolled or edited in advanced
ways
2024-05-19 17:00:10 -07:00
Dalton Hubble
533ace7011 Update Cilium from v1.15.4 to v1.15.5
* https://github.com/cilium/cilium/releases/tag/v1.15.5
2024-05-19 16:38:08 -07:00
Dalton Hubble
b3c384fbc0 Introduce the component system for managing pre-installed addons
* Previously: Typhoon provisions clusters with kube-system components
like CoreDNS, kube-proxy, and a chosen CNI provider (among flannel,
Calico, or Cilium) pre-installed. This is convenient since clusters
come with "batteries included". But it also means upgrading these
components is generally done in lock-step, by upgrading to a new
Typhoon / Kubernetes release
* It can be valuable to manage these components with a separate
plan/apply process or through automations and deploy systems. For
example, this allows managing CoreDNS separately from the cluster's
lifecycle.
* These "components" will continue to be pre-installed by default,
but a new `components` variable allows them to be disabled and
managed as "addons", components you apply after cluster creation
and manage on a rolling basis. For some of these, we may provide
Terraform modules to aide in managing these components.

```
module "cluster" {
  # defaults
  components = {
    enable = true
    coredns = {
      enable = true
    }
    kube_proxy = {
      enable = true
    }
    # Only the CNI set in var.networking will be installed
    flannel = {
      enable = true
    }
    calico = {
      enable = true
    }
    cilium = {
      enable = true
    }
  }
}
```

An earlier variable `install_container_networking = true/false` has
been removed, since it can now be achieved with this more extensible
and general components mechanism by setting the chosen networking
provider enable field to false.
2024-05-19 16:33:57 -07:00
Dalton Hubble
563feacd29 Update Kubernetes from v1.30.0 to v1.30.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1301
2024-05-15 21:59:00 -07:00
Dalton Hubble
3f34e047f1 azure: Add controller security group and subnet outputs
* Output the network security group name and address prefixes
for controller nodes, to allow adding custom network security
rules that apply specifically to controller nodes
2024-05-14 21:34:31 -07:00
Dalton Hubble
cc80ec9b98 Add firewall and security rules for Cilium/Hubble metrics
* Add firewall or security riles to allow node-to-node traffic
on ports 9962-9965 for Cilium and Hubble metrics. Cilium runs
with host network, so these require cloud firewall changes
2024-05-13 21:27:38 -07:00
Dalton Hubble
d08cd317d9 Allow CoreDNS and kube-proxy to be optional components
* Allow for more minimal base cluster setups, that manage CoreDNS or
kube-proxy as applications, with rolling updates, or deploy systems.
Or in the case of kube-proxy, its becoming more common to not install
it and instead use Cilium
* Add a `components` pass-through variable to configure pre-installed
components like kube-proxy and CoreDNS. These components can be
disabled (individually or together) to allow for managing components
with separate plan/apply processes or automations
* terraform-render-bootstrap manifest assets are now structured as
manifests/{coredns,kube-proxy,network} so adapt the controller
layout scripts accordingly
* This is similar to some changes in v1.29.2 that allowed for the
container networking provider manifests to be skipped

Related: https://github.com/poseidon/typhoon/pull/1419, https://github.com/poseidon/typhoon/pull/1421
2024-05-12 21:20:27 -07:00
Dalton Hubble
78d5100181 Update Cilium and flannel container images
* Update Cilium from v1.15.3 to v1.25.4
* Update flannel from v0.24.4 to v0.25.1
2024-05-12 08:27:27 -07:00
Dalton Hubble
6ac5a0222b
Update Kubernetes from v1.29.3 to v1.30.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.30.md#v1300
2024-04-23 20:51:54 -07:00
Dalton Hubble
cafcdbc3e7
Update etcd from v3.5.12 to v3.5.13 and bump Calico/Cilium
* Update Cilium from v1.15.2 to v1.15.3
* Update Calico from v3.27.2 to v3.27.3
2024-04-03 22:51:07 -07:00
Dalton Hubble
fbe36b8b16 Update Cilium and flannel container image versions
* https://github.com/cilium/cilium/releases/tag/v1.15.2
* https://github.com/flannel-io/flannel/releases/tag/v0.24.4
2024-03-22 11:19:49 -07:00
Dalton Hubble
41907a0ba6 Update Calico from v3.26.3 to v3.27.2
* Update fixes Calico incompatibility with Fedora CoreOS

Rel: https://github.com/projectcalico/calico/issues/8372
2024-02-25 12:11:56 -08:00
Dalton Hubble
2325a503e1 Add an install_container_networking variable (default true)
* When `true`, the chosen container `networking` provider is installed during cluster bootstrap
* Set `false` to self-manage the container networking provider. This allows flannel, Calico, or Cilium
to be managed via Terraform (like any other Kubernetes resources). Nodes will be NotReady until you
apply the self-managed container networking provider. This may become the default in future.
2024-02-24 18:49:38 -08:00
Dalton Hubble
7a46eb03ae Update Cilium from v1.14.3 to v1.15.1
* https://github.com/cilium/cilium/releases/tag/v1.15.1
2024-02-23 22:59:31 -08:00
Dalton Hubble
0e7977694f Allow CNI networking to be set to none
* Set CNI networking to "none" to skip installing any CNI provider
(i.e. no flannel, Calico, or Cilium). In this mode, cluster nodes
will be NotReady until you add your own CNI stack
* Motivation: I now tend to manage CNI components as addon modules
just like other applications overlaid onto a cluster. It allows for
faster iteration and may eventually become the recommendation
2024-02-23 22:57:47 -08:00
Dalton Hubble
f2f625984e Update Kubernetes from v1.29.1 to v1.29.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1292
2024-02-18 18:31:31 -08:00
Dalton Hubble
e247673a20 Update Kubernetes from v1.29.0 to v1.29.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md#v1291
2024-02-04 10:47:42 -08:00
Dalton Hubble
84e4f02917 Update Kubernetes from v1.28.4 to v1.29.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.29.md
2023-12-22 10:27:24 -08:00
Dalton Hubble
0d997def31 Add release note for v1.28.4 2023-12-10 21:02:21 -08:00