typhoon/search/search_index.json

{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Typhoon","text":"<p>Typhoon is a minimal and free Kubernetes distribution.</p> <ul> <li>Minimal, stable base Kubernetes distribution</li> <li>Declarative infrastructure and configuration</li> <li>Free (freedom and cost) and privacy-respecting</li> <li>Practical for labs, datacenters, and clouds</li> </ul> <p>Typhoon distributes upstream Kubernetes, architectural conventions, and cluster addons, much like a GNU/Linux distribution provides the Linux kernel and userspace components.</p>"},{"location":"#features","title":"Features","text":"<ul> <li>Kubernetes v1.29.3 (upstream)</li> <li>Single or multi-master, Calico or Cilium or flannel networking</li> <li>On-cluster etcd with TLS, RBAC-enabled, network policy, SELinux enforcing</li> <li>Advanced features like worker pools, preemptible workers, and snippets customization</li> <li>Ready for Ingress, Prometheus, Grafana, CSI, or other addons</li> </ul>"},{"location":"#modules","title":"Modules","text":"<p>Typhoon provides a Terraform Module for each supported operating system and platform.</p> <p>Typhoon is available for Fedora CoreOS.</p> Platform Operating System Terraform Module Status AWS Fedora CoreOS aws/fedora-coreos/kubernetes stable Azure Fedora CoreOS azure/fedora-coreos/kubernetes alpha Bare-Metal Fedora CoreOS bare-metal/fedora-coreos/kubernetes stable DigitalOcean Fedora CoreOS digital-ocean/fedora-coreos/kubernetes beta Google Cloud Fedora CoreOS google-cloud/fedora-coreos/kubernetes stable Platform Operating System Terraform Module Status AWS Fedora CoreOS (ARM64) aws/fedora-coreos/kubernetes alpha <p>Typhoon is available for Flatcar Linux.</p> Platform Operating System Terraform Module Status AWS Flatcar Linux aws/flatcar-linux/kubernetes stable Azure Flatcar Linux azure/flatcar-linux/kubernetes alpha Bare-Metal Flatcar Linux bare-metal/flatcar-linux/kubernetes stable DigitalOcean Flatcar Linux digital-ocean/flatcar-linux/kubernetes beta Google Cloud Flatcar Linux google-cloud/flatcar-linux/kubernetes stable Platform Operating System Terraform Module Status AWS Flatcar Linux (ARM64) aws/flatcar-linux/kubernetes alpha Azure Flatcar Linux (ARM64) azure/flatcar-linux/kubernetes alpha"},{"location":"#documentation","title":"Documentation","text":"<ul> <li>Architecture concepts and operating-systems</li> <li>Fedora CoreOS tutorials for AWS, Azure, Bare-Metal, DigitalOcean, and Google Cloud</li> <li>Flatcar Linux tutorials for AWS, Azure, Bare-Metal, DigitalOcean, and Google Cloud</li> </ul>"},{"location":"#example","title":"Example","text":"<p>Define a Kubernetes cluster by using the Terraform module for your chosen platform and operating system. 
Here's a minimal example.</p> <pre><code>module \"yavin\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.29.3\"\n\n # Google Cloud\n cluster_name = \"yavin\"\n region = \"us-central1\"\n dns_zone = \"example.com\"\n dns_zone_name = \"example-zone\"\n\n # configuration\n ssh_authorized_key = \"ssh-ed25519 AAAAB3Nz...\"\n\n # optional\n worker_count = 2\n}\n\n# Obtain cluster kubeconfig\nresource \"local_file\" \"kubeconfig-yavin\" {\n content = module.yavin.kubeconfig-admin\n filename = \"/home/user/.kube/configs/yavin-config\"\n}\n</code></pre> <p>Initialize modules, plan the changes to be made, and apply the changes.</p> <pre><code>$ terraform init\n$ terraform plan\nPlan: 62 to add, 0 to change, 0 to destroy.\n$ terraform apply\nApply complete! Resources: 62 added, 0 changed, 0 destroyed.\n</code></pre> <p>In 4-8 minutes (varies by platform), the cluster will be ready. This Google Cloud example creates a <code>yavin.example.com</code> DNS record to resolve to a network load balancer across controller nodes.</p> <pre><code>$ export KUBECONFIG=/home/user/.kube/configs/yavin-config\n$ kubectl get nodes\nNAME ROLES STATUS AGE VERSION\nyavin-controller-0.c.example-com.internal &lt;none&gt; Ready 6m v1.29.3\nyavin-worker-jrbf.c.example-com.internal &lt;none&gt; Ready 5m v1.29.3\nyavin-worker-mzdm.c.example-com.internal &lt;none&gt; Ready 5m v1.29.3\n</code></pre> <p>List the pods.</p> <pre><code>$ kubectl get pods --all-namespaces\nNAMESPACE NAME READY STATUS RESTARTS AGE\nkube-system calico-node-1cs8z 2/2 Running 0 6m\nkube-system calico-node-d1l5b 2/2 Running 0 6m\nkube-system calico-node-sp9ps 2/2 Running 0 6m\nkube-system coredns-1187388186-dkh3o 1/1 Running 0 6m\nkube-system coredns-1187388186-zj5dl 1/1 Running 0 6m\nkube-system kube-apiserver-controller-0 1/1 Running 0 6m\nkube-system kube-controller-manager-controller-0 1/1 Running 0 6m\nkube-system kube-proxy-117v6 1/1 Running 0 6m\nkube-system kube-proxy-9886n 1/1 Running 0 6m\nkube-system kube-proxy-njn47 1/1 Running 0 6m\nkube-system kube-scheduler-controller-0 1/1 Running 0 6m\n</code></pre>"},{"location":"#help","title":"Help","text":"<p>Schedule a meeting via Github Sponsors to discuss your use case.</p>"},{"location":"#motivation","title":"Motivation","text":"<p>Typhoon powers the author's cloud and colocation clusters. The project has evolved through operational experience and Kubernetes changes. Typhoon is shared under a free license to allow others to use the work freely and contribute to its upkeep.</p> <p>Typhoon addresses real world needs, which you may share. It is honest about limitations or areas that aren't mature yet. It avoids buzzword bingo and hype. It does not aim to be the one-solution-fits-all distro. An ecosystem of Kubernetes distributions is healthy.</p>"},{"location":"#social-contract","title":"Social Contract","text":"<p>Typhoon is not a product, trial, or free-tier. Typhoon does not offer support, services, or charge money. And Typhoon is independent of operating system or platform vendors.</p> <p>Typhoon clusters will contain only free components. 
Cluster components will not collect data on users without their permission.</p>"},{"location":"#sponsors","title":"Sponsors","text":"<p>Poseidon's Github Sponsors support the infrastructure and operational costs of providing Typhoon.</p> <p> </p> <p>If you'd like your company here, please contact dghubble at psdn.io.</p>"},{"location":"announce/","title":"Announce","text":""},{"location":"announce/#jan-23-2020","title":"Jan 23, 2020","text":"<p>Typhoon for Fedora CoreOS promoted to alpha!</p> <p>Last summer, Typhoon released the first preview of Kubernetes on Fedora CoreOS for bare-metal and AWS, developing many ideas and patterns from Typhoon for Container Linux and Fedora Atomic. Since then, Typhoon for Fedora CoreOS has evolved and gained features alongside Typhoon, while Fedora CoreOS itself has evolved and improved too.</p> <p>Fedora recently announced that Fedora CoreOS is available for general use. To align with that change and to better indicate the maturing status, Typhoon for Fedora CoreOS has been promoted to alpha. Many thanks to folks who have worked to make this possible!</p> <p>About: For newcomers, Typhoon is a minimal and free (cost and freedom) Kubernetes distribution providing upstream Kubernetes, declarative configuration via Terraform, and support for AWS, Azure, Google Cloud, DigitalOcean, and bare-metal. It is run by former CoreOS engineer @dghubble to power his clusters, with freedom motivations.</p>"},{"location":"announce/#jul-18-2019","title":"Jul 18, 2019","text":"<p>Introducing a preview of Typhoon Kubernetes clusters with Fedora CoreOS!</p> <p>Fedora recently announced the first preview release of Fedora CoreOS, aiming to blend the best of CoreOS and Fedora for containerized workloads. To spur testing, Typhoon is sharing preview modules for Kubernetes v1.15 on AWS and bare-metal using the new Fedora CoreOS preview. What better way to test drive than by running Kubernetes?</p> <p>While Typhoon uses Container Linux (or Flatcar Linux) for stable modules, the project hasn't been a stranger to Fedora ideas, once developing a Fedora Atomic variant in 2018. That makes the Fedora CoreOS fusion both exciting and familiar. Typhoon with Fedora CoreOS uses Ignition v3 for provisioning, uses rpm-ostree for layering and updates, tries swapping system containers for podman, and brings SELinux enforcement (table). This is an early preview (don't go to prod), but do try it out and help identify and solve issues (getting started links above).</p>"},{"location":"announce/#march-27-2019","title":"March 27, 2019","text":"<p>Last April, Typhoon introduced alpha support for creating Kubernetes clusters with Fedora Atomic on AWS, Google Cloud, DigitalOcean, and bare-metal. Fedora Atomic shared many of Container Linux's aims for a container-optimized operating system, introduced novel ideas, and provided technical diversification for an uncertain future. However, Project Atomic efforts were merged into Fedora CoreOS and future Fedora Atomic releases are not expected. Typhoon modules for Fedora Atomic will not be updated much beyond Kubernetes v1.13. They may later be removed.</p> <p>Typhoon for Fedora Atomic fell short of goals to provide a consistent, practical experience across operating systems and platforms. The modules have remained alpha, despite improvements. Features like coordinated OS updates and boot-time declarative customization were not realized. Inelegance of Cloud-Init/kickstart loomed large. 
With that brief but obligatory summary, I'd like to change gears and celebrate the many positives.</p> <p>Fedora Atomic showcased rpm-ostree as a different approach to Container Linux's AB update scheme. It provided a viable route toward CRI-O to replace Docker as the container engine. And Fedora Atomic devised system containers as a way to package and run raw OCI images through runc for host-level containers<sup>1</sup>. Many of these ideas will live on in Fedora CoreOS, which is exciting!</p> <p>For Typhoon, Fedora Atomic brought fresh ideas and broader perspectives about different container-optimized base operating systems and related tools. It's sad to let go of so much work, but I think it's time. Many of the concepts and technologies that were explored will surface again and Typhoon is better positioned as a result.</p> <p>Thank you Project Atomic team members for your work! - dghubble</p>"},{"location":"announce/#may-23-2018","title":"May 23, 2018","text":"<p>Starting in v1.10.3, Typhoon AWS and bare-metal <code>container-linux</code> modules allow picking between the Red Hat Container Linux (formerly CoreOS Container Linux) and Kinvolk Flatcar Linux operating system. Flatcar Linux serves as a drop-in compatible \"friendly fork\" of Container Linux. Flatcar Linux publishes the same channels and versions as Container Linux and gets provisioned, managed, and operated in an identical way (e.g. login as user \"core\").</p> <p>On AWS, pick the Container Linux derivative channel by setting <code>os_image</code> to coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, or flatcar-alpha.</p> <p>On bare-metal, pick the Container Linux derivative channel by setting <code>os_channel</code> to coreos-stable, coreos-beta, coreos-alpha, flatcar-stable, flatcar-beta, or flatcar-alpha. Set the <code>os_version</code> number to PXE boot and install. Variables <code>container_linux_channel</code> and <code>container_linux_version</code> have been dropped.</p> <p>Flatcar Linux provides a familiar Container Linux experience, with support from Kinvolk as an alternative to Red Hat. Typhoon offers the choice of Container Linux vendor to satisfy differing preferences and to diversify technology underpinnings, while providing a consistent Kubernetes experience across operating systems, clouds, and on-premise.</p>"},{"location":"announce/#april-26-2018","title":"April 26, 2018","text":"<p>Introducing Typhoon Kubernetes clusters for Fedora Atomic!</p> <p>Fedora Atomic is a container-optimized operating system designed for large-scale clustered operation, immutable infrastructure, and atomic operating system upgrades. It's part of Fedora and Project Atomic, a Red Hat sponsored project working on rpm-ostree, buildah, skopeo, CRI-O, and the related CentOS/RHEL Atomic.</p> <p>For newcomers, Typhoon is a free (cost and freedom) Kubernetes distribution providing upstream Kubernetes, declarative configuration via Terraform, and support for AWS, Google Cloud, DigitalOcean, and bare-metal. Typhoon clusters use a self-hosted control plane, support Calico and flannel CNI networking, and enable etcd TLS, RBAC, and network policy.</p> <p>Typhoon for Fedora Atomic reflects many of the same principles that created Typhoon for Container Linux. Clusters are declared using plain Terraform configs that can be versioned. In lieu of Ignition, instances are declaratively provisioned with Cloud-Init and kickstart (bare-metal only). TLS assets are generated. 
Hosts run only a kubelet service; other components are scheduled (i.e. self-hosted). The upstream hyperkube is used directly<sup>2</sup>. And clusters are kept minimal by offering optional addons for Ingress, Prometheus, and Grafana. Typhoon complements and enhances Fedora Atomic as a choice of operating system for Kubernetes.</p> <p>Meanwhile, Fedora Atomic adds some promising new low-level technologies:</p> <ul> <li> <p>ostree &amp; rpm-ostree - a hybrid, layered, image and package system that lets you perform atomic updates and rollbacks, layer on packages, \"rebase\" your system, or manage a remote tree repo. See Dusty Mabe's great intro.</p> </li> <li> <p>system containers - OCI container images that embed systemd and runc metadata for starting low-level host services before container runtimes are ready. Typhoon uses system containers under runc for <code>etcd</code>, <code>kubelet</code>, and <code>bootkube</code> on Fedora Atomic (instead of rkt-fly).</p> </li> <li> <p>CRI-O - CRI-O is a kubernetes-incubator implementation of the Kubernetes Container Runtime Interface. Typhoon uses Docker as the container runtime today, but it's a goal to gradually introduce CRI-O as an alternative runtime as it matures.</p> </li> </ul> <p>Typhoon has long aspired to add a dissimilar operating system to complement Container Linux. Operating Typhoon clusters across colocations and multiple clouds was driven by our own real need and has provided healthy perspective and clear direction. Adding Fedora Atomic is exciting for the same reasons. Fedora Atomic diversifies Typhoon's technology underpinnings, uniting the Container Linux and Fedora Atomic ecosystems to provide a consistent Kubernetes experience across operating systems, clouds, and on-premise.</p> <p>Get started with the basics or read the OS comparison. If you're familiar with Terraform, follow the new tutorials for Fedora Atomic on AWS, Google Cloud, DigitalOcean, and bare-metal.</p> <p>Typhoon is not affiliated with Red Hat or Project Atomic.</p> <p>Warning</p> <p>Heed the warnings. Typhoon for Fedora Atomic is still alpha. Container Linux continues to be the recommended flavor for production clusters. Atomic is not meant to detract from efforts on Container Linux or its derivatives.</p> <p>Tip</p> <p>For bare-metal, you may continue to use your v0.7+ Matchbox service and <code>terraform-provider-matchbox</code> plugin to provision both Container Linux and Fedora Atomic clusters. No changes needed.</p> <ol> <li> <p>Container Linux's own primordial rkt-fly shim dates back to the pre-OCI era. In some ways, rkt drove the OCI standards that made newer ideas, like system containers, appealing.\u00a0\u21a9</p> </li> <li> <p>Using <code>etcd</code>, <code>kubelet</code>, and <code>bootkube</code> as system containers required metadata files be added in system-containers \u21a9</p> </li> </ol>"},{"location":"addons/fleetlock/","title":"fleetlock","text":""},{"location":"addons/fleetlock/#fleetlock","title":"fleetlock","text":"<p>fleetlock is a reboot coordinator for Fedora CoreOS nodes. 
It implements the FleetLock protocol for use as a Zincati lock strategy backend.</p> <p>Declare a Zincati <code>fleet_lock</code> strategy when provisioning Fedora CoreOS nodes via snippets.</p> <pre><code>variant: fcos\nversion: 1.5.0\nstorage:\n files:\n - path: /etc/zincati/config.d/55-update-strategy.toml\n contents:\n inline: |\n [updates]\n strategy = \"fleet_lock\"\n [updates.fleet_lock]\n base_url = \"http://10.3.0.15/\"\n</code></pre> <pre><code>module \"nemo\" {\n ...\n controller_snippets = [\n file(\"./snippets/zincati-strategy.yaml\"),\n ]\n worker_snippets = [\n file(\"./snippets/zincati-strategy.yaml\"),\n ]\n}\n</code></pre> <p>Apply fleetlock based on the example manifests.</p> <pre><code>git clone git@github.com:poseidon/fleetlock.git\nkubectl apply -f examples/k8s\n</code></pre>"},{"location":"addons/grafana/","title":"Grafana","text":""},{"location":"addons/grafana/#grafana","title":"Grafana","text":"<p>Grafana can be used to build dashboards and visualizations that use Prometheus as the datasource. Create the grafana deployment and service.</p> <pre><code>kubectl apply -f addons/grafana -R\n</code></pre> <p>Use <code>kubectl</code> to authenticate to the apiserver and create a local port-forward to the Grafana pod.</p> <pre><code>kubectl port-forward grafana-POD-ID 8080 -n monitoring\n</code></pre> <p>Visit 127.0.0.1:8080 to view the bundled dashboards.</p> <p> </p>"},{"location":"addons/ingress/","title":"Nginx Ingress Controller","text":"<p>Nginx Ingress controller pods accept and demultiplex HTTP, HTTPS, TCP, or UDP traffic to backend services. Ingress controllers watch the Kubernetes API for Ingress resources and update their configuration accordingly. Ingress resources for HTTP(S) applications support virtual hosts (FQDNs), path rules, TLS termination, and SNI.</p>"},{"location":"addons/ingress/#aws","title":"AWS","text":"<p>On AWS, a network load balancer (NLB) distributes TCP traffic across two target groups (ports 80 and 443) of worker nodes running an Ingress controller deployment. Security group rules allow traffic to ports 80 and 443. Health checks ensure only workers with a healthy Ingress controller receive traffic.</p> <p>Create the Ingress controller deployment, service, RBAC roles, RBAC bindings, and namespace.</p> <pre><code>kubectl apply -R -f addons/nginx-ingress/aws\n</code></pre> <p>For each application, add a DNS CNAME resolving to the NLB's DNS record.</p> <pre><code>app1.example.com -&gt; tempest-ingress.123456.us-west2.elb.amazonaws.com\napp2.example.com -&gt; tempest-ingress.123456.us-west2.elb.amazonaws.com\napp3.example.com -&gt; tempest-ingress.123456.us-west2.elb.amazonaws.com\n</code></pre> <p>Find the NLB's DNS name through the console or use the Typhoon module's output <code>ingress_dns_name</code>. For example, you might use Terraform to manage a Google Cloud DNS record:</p> <pre><code>resource \"google_dns_record_set\" \"some-application\" {\n # DNS zone name\n managed_zone = \"example-zone\"\n\n # DNS record\n name = \"app.example.com.\"\n type = \"CNAME\"\n ttl = 300\n rrdatas = [\"${module.tempest.ingress_dns_name}.\"]\n}\n</code></pre>"},{"location":"addons/ingress/#azure","title":"Azure","text":"<p>On Azure, a load balancer distributes traffic across a backend address pool of worker nodes running an Ingress controller deployment. Security group rules allow traffic to ports 80 and 443. 
Health probes ensure only workers with a healthy Ingress controller receive traffic.</p> <p>Create the Ingress controller deployment, service, RBAC roles, RBAC bindings, and namespace.</p> <pre><code>kubectl apply -R -f addons/nginx-ingress/azure\n</code></pre> <p>For each application, add a DNS record resolving to the load balancer's IPv4 address.</p> <pre><code>app1.example.com -&gt; 11.22.33.44\napp2.example.com -&gt; 11.22.33.44\napp3.example.com -&gt; 11.22.33.44\n</code></pre> <p>Find the load balancer's IPv4 address with the Azure console or use the Typhoon module's output <code>ingress_static_ipv4</code>. For example, you might use Terraform to manage a Google Cloud DNS record:</p> <pre><code>resource \"google_dns_record_set\" \"some-application\" {\n # DNS zone name\n managed_zone = \"example-zone\"\n\n # DNS record\n name = \"app.example.com.\"\n type = \"A\"\n ttl = 300\n rrdatas = [module.ramius.ingress_static_ipv4]\n}\n</code></pre>"},{"location":"addons/ingress/#bare-metal","title":"Bare-Metal","text":"<p>On bare-metal, routing traffic to Ingress controller pods can be done in a number of ways.</p>"},{"location":"addons/ingress/#equal-cost-multi-path","title":"Equal-Cost Multi-Path","text":"<p>Create the Ingress controller deployment, service, RBAC roles, and RBAC bindings. The service should use a fixed ClusterIP (e.g. 10.3.0.12) in the Kubernetes service IPv4 CIDR range.</p> <pre><code>kubectl apply -R -f addons/nginx-ingress/bare-metal\n</code></pre> <p>There is no need for pods to use host networking or for the ingress service to use NodePort or LoadBalancer. Nodes already proxy packets destined for the service's ClusterIP to node(s) with a pod endpoint.</p> <p>Configure the network router or load balancer with a static route for the Kubernetes service range and set the next hop to a node. Repeat for each node, as desired, and set the metric (i.e. cost) of each. Finally, DNAT traffic destined for the WAN on ports 80 or 443 to the service's fixed ClusterIP.</p> <p>For each application, add a DNS record resolving to the WAN(s).</p> <pre><code>resource \"google_dns_record_set\" \"some-application\" {\n # Managed DNS Zone name\n managed_zone = \"zone-name\"\n\n # Name of the DNS record\n name = \"app.example.com.\"\n type = \"A\"\n ttl = 300\n rrdatas = [\"SOME-WAN-IP\"]\n}\n</code></pre>"},{"location":"addons/ingress/#digital-ocean","title":"Digital Ocean","text":"<p>On DigitalOcean, DNS A and AAAA records (e.g. FQDN <code>nemo-workers.example.com</code>) resolve to each worker<sup>1</sup> running an Ingress controller DaemonSet on host ports 80 and 443. Firewall rules allow IPv4 and IPv6 traffic to ports 80 and 443.</p> <p>Create the Ingress controller daemonset, service, RBAC roles, RBAC bindings, and namespace.</p> <pre><code>kubectl apply -R -f addons/nginx-ingress/digital-ocean\n</code></pre> <p>For each application, add a CNAME record resolving to the worker(s) DNS record. Use the Typhoon module's output <code>workers_dns</code> to find the worker DNS value. 
For example, you might use Terraform to manage a Google Cloud DNS record:</p> <pre><code>resource \"google_dns_record_set\" \"some-application\" {\n # DNS zone name\n managed_zone = \"example-zone\"\n\n # DNS record\n name = \"app.example.com.\"\n type = \"CNAME\"\n ttl = 300\n rrdatas = [\"${module.nemo.workers_dns}.\"]\n}\n</code></pre> <p>Note</p> <p>Hosting IPv6 apps is possible, but requires editing the nginx-ingress addon to use <code>hostNetwork: true</code>.</p>"},{"location":"addons/ingress/#google-cloud","title":"Google Cloud","text":"<p>On Google Cloud, a TCP Proxy load balancer distributes IPv4 and IPv6 TCP traffic across a backend service of worker nodes running an Ingress controller deployment. Firewall rules allow traffic to ports 80 and 443. Health check rules ensure only workers with a healthy Ingress controller receive traffic.</p> <p>Create the Ingress controller deployment, service, RBAC roles, RBAC bindings, and namespace.</p> <pre><code>kubectl apply -R -f addons/nginx-ingress/google-cloud\n</code></pre> <p>For each application, add DNS A records resolving to the load balancer's IPv4 address and DNS AAAA records resolving to the load balancer's IPv6 address.</p> <pre><code>app1.example.com -&gt; 11.22.33.44\napp2.example.com -&gt; 11.22.33.44\napp3.example.com -&gt; 11.22.33.44\n</code></pre> <p>Find the IPv4 address with <code>gcloud compute addresses list</code> or use the Typhoon module's outputs <code>ingress_static_ipv4</code> and <code>ingress_static_ipv6</code>. For example, you might use Terraform to manage a Google Cloud DNS record:</p> <pre><code>resource \"google_dns_record_set\" \"app-record-a\" {\n # DNS zone name\n managed_zone = \"example-zone\"\n\n # DNS record\n name = \"app.example.com.\"\n type = \"A\"\n ttl = 300\n rrdatas = [module.yavin.ingress_static_ipv4]\n}\n\nresource \"google_dns_record_set\" \"app-record-aaaa\" {\n # DNS zone name\n managed_zone = \"example-zone\"\n\n # DNS record\n name = \"app.example.com.\"\n type = \"AAAA\"\n ttl = 300\n rrdatas = [module.yavin.ingress_static_ipv6]\n}\n</code></pre> <ol> <li> <p>DigitalOcean does offer load balancers. We've opted not to use them to keep the DigitalOcean cluster cheap for developers.\u00a0\u21a9</p> </li> </ol>"},{"location":"addons/overview/","title":"Addons","text":"<p>Typhoon clusters are verified to work well with several post-install addons.</p> <ul> <li>Nginx Ingress Controller</li> <li>Prometheus</li> <li>Grafana</li> <li>fleetlock</li> </ul>"},{"location":"addons/prometheus/","title":"Prometheus","text":"<p>Prometheus collects metrics (e.g. <code>node_memory_usage_bytes</code>) from targets by scraping their HTTP metrics endpoints. Targets are organized into jobs, defined in the Prometheus config. Targets may expose counter, gauge, histogram, or summary metrics.</p> <p>Here's a simple config from the Prometheus tutorial.</p> <pre><code>global:\n scrape_interval: 15s\nscrape_configs:\n - job_name: 'prometheus'\n scrape_interval: 5s\n static_configs:\n - targets: ['localhost:9090']\n</code></pre> <p>On Kubernetes clusters, Prometheus is run as a Deployment, configured with a ConfigMap, and accessed via a Service or Ingress.</p> <pre><code>kubectl apply -f addons/prometheus -R\n</code></pre> <p>The ConfigMap configures Prometheus to discover apiservers, kubelets, cAdvisor, services, endpoints, and exporters. 
By default, data is kept in an <code>emptyDir</code> so it is persisted until the pod is rescheduled.</p>"},{"location":"addons/prometheus/#exporters","title":"Exporters","text":"<p>Exporters expose metrics for 3<sup>rd</sup>-party systems that don't natively expose Prometheus metrics.</p> <ul> <li>node_exporter - DaemonSet that exposes a machine's hardware and OS metrics</li> <li>kube-state-metrics - Deployment that exposes Kubernetes object metrics</li> <li>blackbox_exporter - Scrapes HTTP, HTTPS, DNS, TCP, or ICMP endpoints and exposes availability as metrics</li> </ul>"},{"location":"addons/prometheus/#queries-and-alerts","title":"Queries and Alerts","text":"<p>Prometheus provides a basic UI for querying metrics and viewing alerts. Use <code>kubectl</code> to authenticate to the apiserver and create a local port-forward to the Prometheus pod.</p> <pre><code>kubectl get pods -n monitoring\nkubectl port-forward prometheus-POD-ID 9090 -n monitoring\n</code></pre> <p>Visit 127.0.0.1:9090 to query expressions, view targets, or check alerts.</p> <p> </p> <p>Use Grafana to view or build dashboards that use Prometheus as the datasource.</p>"},{"location":"advanced/arm64/","title":"ARM64","text":"<p>Typhoon supports ARM64 Kubernetes clusters with ARM64 controller and worker nodes (full-cluster) or adding worker pools of ARM64 nodes to clusters with an x86/amd64 control plane for a hybrid (mixed-arch) cluster.</p> <p>Typhoon ARM64 clusters (full-cluster or mixed-arch) are available on:</p> <ul> <li>AWS with Fedora CoreOS or Flatcar Linux</li> <li>Azure with Flatcar Linux</li> </ul>"},{"location":"advanced/arm64/#cluster","title":"Cluster","text":"<p>Create a cluster on AWS with ARM64 controller and worker nodes. Container workloads must be <code>arm64</code> compatible and use <code>arm64</code> (or multi-arch) container images.</p> Fedora CoreOS Cluster (arm64)Flatcar Linux Cluster (arm64) <pre><code>module \"gravitas\" {\n source = \"git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=v1.29.3\"\n\n # AWS\n cluster_name = \"gravitas\"\n dns_zone = \"aws.example.com\"\n dns_zone_id = \"Z3PAABBCFAKEC0\"\n\n # configuration\n ssh_authorized_key = \"ssh-ed25519 AAAAB3Nz...\"\n\n # optional\n arch = \"arm64\"\n networking = \"cilium\"\n worker_count = 2\n worker_price = \"0.0168\"\n\n controller_type = \"t4g.small\"\n worker_type = \"t4g.small\"\n}\n</code></pre> <pre><code>module \"gravitas\" {\n source = \"git::https://github.com/poseidon/typhoon//aws/flatcar-linux/kubernetes?ref=v1.29.3\"\n\n # AWS\n cluster_name = \"gravitas\"\n dns_zone = \"aws.example.com\"\n dns_zone_id = \"Z3PAABBCFAKEC0\"\n\n # configuration\n ssh_authorized_key = \"ssh-ed25519 AAAAB3Nz...\"\n\n # optional\n arch = \"arm64\"\n networking = \"cilium\"\n worker_count = 2\n worker_price = \"0.0168\"\n\n controller_type = \"t4g.small\"\n worker_type = \"t4g.small\"\n}\n</code></pre> <p>Verify the cluster has only arm64 (<code>aarch64</code>) nodes. 
For Flatcar Linux, describe nodes.</p> <pre><code>$ kubectl get nodes -o wide\nNAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME\nip-10-0-21-119 Ready &lt;none&gt; 77s v1.29.3 10.0.21.119 &lt;none&gt; Fedora CoreOS 35.20211215.3.0 5.15.7-200.fc35.aarch64 containerd://1.5.8\nip-10-0-32-166 Ready &lt;none&gt; 80s v1.29.3 10.0.32.166 &lt;none&gt; Fedora CoreOS 35.20211215.3.0 5.15.7-200.fc35.aarch64 containerd://1.5.8\nip-10-0-5-79 Ready &lt;none&gt; 77s v1.29.3 10.0.5.79 &lt;none&gt; Fedora CoreOS 35.20211215.3.0 5.15.7-200.fc35.aarch64 containerd://1.5.8\n</code></pre>"},{"location":"advanced/arm64/#hybrid","title":"Hybrid","text":"<p>Create a hybrid/mixed arch cluster by defining an AWS cluster. Then define a worker pool with ARM64 workers. Optional taints are added to aid in scheduling.</p> FCOS ClusterFlatcar ClusterFCOS ARM64 WorkersFlatcar ARM64 Workers <pre><code>module \"gravitas\" {\n source = \"git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=v1.29.3\"\n\n # AWS\n cluster_name = \"gravitas\"\n dns_zone = \"aws.example.com\"\n dns_zone_id = \"Z3PAABBCFAKEC0\"\n\n # configuration\n ssh_authorized_key = \"ssh-ed25519 AAAAB3Nz...\"\n\n # optional\n networking = \"cilium\"\n worker_count = 2\n worker_price = \"0.021\"\n\n daemonset_tolerations = [\"arch\"] # important\n}\n</code></pre> <pre><code>module \"gravitas\" {\n source = \"git::https://github.com/poseidon/typhoon//aws/flatcar-linux/kubernetes?ref=v1.29.3\"\n\n # AWS\n cluster_name = \"gravitas\"\n dns_zone = \"aws.example.com\"\n dns_zone_id = \"Z3PAABBCFAKEC0\"\n\n # configuration\n ssh_authorized_key = \"ssh-ed25519 AAAAB3Nz...\"\n\n # optional\n networking = \"cilium\"\n worker_count = 2\n worker_price = \"0.021\"\n\n daemonset_tolerations = [\"arch\"] # important\n}\n</code></pre> <pre><code>module \"gravitas-arm64\" {\n source = \"git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes/workers?ref=v1.29.3\"\n\n # AWS\n vpc_id = module.gravitas.vpc_id\n subnet_ids = module.gravitas.subnet_ids\n security_groups = module.gravitas.worker_security_groups\n\n # configuration\n name = \"gravitas-arm64\"\n kubeconfig = module.gravitas.kubeconfig\n ssh_authorized_key = var.ssh_authorized_key\n\n # optional\n arch = \"arm64\"\n instance_type = \"t4g.small\"\n spot_price = \"0.0168\"\n node_taints = [\"arch=arm64:NoSchedule\"]\n}\n</code></pre> <pre><code>module \"gravitas-arm64\" {\n source = \"git::https://github.com/poseidon/typhoon//aws/flatcar-linux/kubernetes/workers?ref=v1.29.3\"\n\n # AWS\n vpc_id = module.gravitas.vpc_id\n subnet_ids = module.gravitas.subnet_ids\n security_groups = module.gravitas.worker_security_groups\n\n # configuration\n name = \"gravitas-arm64\"\n kubeconfig = module.gravitas.kubeconfig\n ssh_authorized_key = var.ssh_authorized_key\n\n # optional\n arch = \"arm64\"\n instance_type = \"t4g.small\"\n spot_price = \"0.0168\"\n node_taints = [\"arch=arm64:NoSchedule\"]\n}\n</code></pre> <p>Verify amd64 (x86_64) and arm64 (aarch64) nodes are present.</p> <pre><code>$ kubectl get nodes -o wide\nNAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME\nip-10-0-1-73 Ready &lt;none&gt; 111m v1.29.3 10.0.1.73 &lt;none&gt; Fedora CoreOS 35.20211215.3.0 5.15.7-200.fc35.x86_64 containerd://1.5.8\nip-10-0-22-79... 
Ready &lt;none&gt; 111m v1.29.3 10.0.22.79 &lt;none&gt; Flatcar Container Linux by Kinvolk 3033.2.0 (Oklo) 5.10.84-flatcar containerd://1.5.8\nip-10-0-24-130 Ready &lt;none&gt; 111m v1.29.3 10.0.24.130 &lt;none&gt; Fedora CoreOS 35.20211215.3.0 5.15.7-200.fc35.x86_64 containerd://1.5.8\nip-10-0-39-19 Ready &lt;none&gt; 111m v1.29.3 10.0.39.19 &lt;none&gt; Fedora CoreOS 35.20211215.3.0 5.15.7-200.fc35.x86_64 containerd://1.5.8\n</code></pre>"},{"location":"advanced/arm64/#azure","title":"Azure","text":"<p>Create a cluster on Azure with ARM64 controller and worker nodes. Container workloads must be <code>arm64</code> compatible and use <code>arm64</code> (or multi-arch) container images.</p> <pre><code>module \"ramius\" {\n source = \"git::https://github.com/poseidon/typhoon//azure/flatcar-linux/kubernetes?ref=v1.29.3\"\n\n # Azure\n cluster_name = \"ramius\"\n region = \"centralus\"\n dns_zone = \"azure.example.com\"\n dns_zone_group = \"example-group\"\n\n # configuration\n ssh_authorized_key = \"ssh-rsa AAAAB3Nz...\"\n\n # optional\n arch = \"arm64\"\n controller_type = \"Standard_D2pls_v5\"\n worker_type = \"Standard_D2pls_v5\"\n worker_count = 2\n host_cidr = \"10.0.0.0/20\"\n}\n</code></pre>"},{"location":"advanced/customization/","title":"Customization","text":"<p>Typhoon provides Kubernetes clusters with defaults recommended for production. Terraform variables expose supported customization options. Advanced options are available for customizing the architecture or hosts as well.</p>"},{"location":"advanced/customization/#variables","title":"Variables","text":"<p>Typhoon modules accept Terraform input variables for customizing clusters in meritorious ways (e.g. <code>worker_count</code>, etc). Variables are carefully considered to provide essentials, while limiting complexity and test matrix burden. See each platform's tutorial for options.</p>"},{"location":"advanced/customization/#addons","title":"Addons","text":"<p>Clusters are kept to a minimal Kubernetes control plane by offering components like Nginx Ingress Controller, Prometheus, and Grafana as optional post-install addons. Customize addons by modifying a copy of our addon manifests.</p>"},{"location":"advanced/customization/#hosts","title":"Hosts","text":""},{"location":"advanced/customization/#background","title":"Background","text":"<p>Typhoon uses the Ignition system of Fedora CoreOS and Flatcar Linux to immutably declare a system via first-boot disk provisioning. Human-friendly Butane Configs define disk partitions, filesystems, systemd units, dropins, config files, mount units, raid arrays, users, and more before being converted to Ignition.</p> <p>Controller and worker instances form a minimal and secure Kubernetes cluster on each platform. Typhoon provides the snippets feature to accept custom Butane Configs that are merged with instance declarations. This allows advanced host customization and experimentation.</p> <p>Note</p> <p>Snippets cannot be used to modify an already existing instance, the antithesis of immutable provisioning. Ignition fully declares a system on first boot only.</p> <p>Danger</p> <p>Snippets provide the powerful host customization abilities of Ignition. You are responsible for additional units, configs, files, and conflicts.</p> <p>Danger</p> <p>Edits to snippets for controller instances can (correctly) cause Terraform to observe a diff (if not otherwise suppressed) and propose destroying and recreating controller(s). 
Recognize that this is destructive since controllers run etcd and are stateful. See blue/green clusters.</p>"},{"location":"advanced/customization/#usage","title":"Usage","text":"<p>Define a Butane Config (docs, config) in version control near your Terraform workspace directory (e.g. perhaps in a <code>snippets</code> subdirectory). You may organize snippets into multiple files, if desired.</p> <p>For example, ensure an <code>/opt/hello</code> file is created with permissions 0644 before boot.</p> Fedora CoreOSFlatcar Linux <pre><code># custom-files.yaml\nvariant: fcos\nversion: 1.5.0\nstorage:\n files:\n - path: /opt/hello\n contents:\n inline: |\n Hello World\n mode: 0644\n</code></pre> <pre><code># custom-files.yaml\nvariant: flatcar\nversion: 1.0.0\nstorage:\n files:\n - path: /opt/hello\n contents:\n inline: |\n Hello World\n mode: 0644\n</code></pre> <p>Or ensure a systemd unit <code>hello.service</code> is created.</p> Fedora CoreOSFlatcar Linux <pre><code># custom-units.yaml\nvariant: fcos\nversion: 1.5.0\nsystemd:\n units:\n - name: hello.service\n enabled: true\n contents: |\n [Unit]\n Description=Hello World\n [Service]\n Type=oneshot\n ExecStart=/usr/bin/echo Hello World!\n [Install]\n WantedBy=multi-user.target\n</code></pre> <pre><code># custom-units.yaml\nvariant: flatcar\nversion: 1.0.0\nsystemd:\n units:\n - name: hello.service\n enabled: true\n contents: |\n [Unit]\n Description=Hello World\n [Service]\n Type=oneshot\n ExecStart=/usr/bin/echo Hello World!\n [Install]\n WantedBy=multi-user.target\n</code></pre> <p>Reference the Butane contents by location (e.g. <code>file(\"./custom-units.yaml\")</code>). On AWS, Azure, DigitalOcean, or Google Cloud, extend the <code>controller_snippets</code> or <code>worker_snippets</code> list variables.</p> <pre><code>module \"nemo\" {\n ...\n worker_count = 2\n controller_snippets = [\n file(\"./custom-files.yaml\"),\n file(\"./custom-units.yaml\"),\n ]\n worker_snippets = [\n file(\"./custom-files.yaml\"),\n file(\"./custom-units.yaml\"),\n ]\n ...\n}\n</code></pre> <p>On Bare-Metal, different Butane configs may be used for each node (since hardware may be heterogeneous). Extend the <code>snippets</code> map variable by mapping a controller or worker name key to a list of snippets.</p> <pre><code>module \"mercury\" {\n ...\n snippets = {\n \"node2\" = [file(\"./units/hello.yaml\")]\n \"node3\" = [\n file(\"./units/world.yaml\"),\n file(\"./units/hello.yaml\"),\n ]\n }\n ...\n}\n</code></pre>"},{"location":"advanced/customization/#architecture","title":"Architecture","text":"<p>Typhoon chooses variables to expose with purpose. If you must customize clusters in ways that aren't supported by input variables, fork Typhoon and maintain a repository with customizations. Reference the repository by changing the username.</p> <pre><code>module \"nemo\" {\n source = \"git::https://github.com/USERNAME/typhoon//digital-ocean/flatcar-linux/kubernetes?ref=myspecialcase\"\n ...\n}\n</code></pre> <p>To customize low-level Kubernetes control plane bootstrapping, see the poseidon/terraform-render-bootstrap Terraform module.</p>"},{"location":"advanced/customization/#system-images","title":"System Images","text":"<p>Typhoon publishes Kubelet container images to Quay.io (default) and to Dockerhub (in case of a Quay outage or breach). 
Quay automated builds also provide the option for fully verifiable tagged images (<code>build-{short_sha}</code>).</p> <p>To set an alternative etcd image or Kubelet image, use a snippet to set a systemd dropin.</p> Kubeletetcd <pre><code># kubelet-image-override.yaml\nvariant: fcos &lt;- remove for Flatcar Linux\nversion: 1.5.0 &lt;- remove for Flatcar Linux\nsystemd:\n units:\n - name: kubelet.service\n dropins:\n - name: 10-image-override.conf\n contents: |\n [Service]\n Environment=KUBELET_IMAGE=docker.io/psdn/kubelet:v1.18.3\n</code></pre> <pre><code># etcd-image-override.yaml\nvariant: fcos &lt;- remove for Flatcar Linux\nversion: 1.5.0 &lt;- remove for Flatcar Linux\nsystemd:\n units:\n - name: etcd-member.service\n dropins:\n - name: 10-image-override.conf\n contents: |\n [Service]\n Environment=ETCD_IMAGE=quay.io/mymirror/etcd:v3.4.12\n</code></pre> <p>Then reference the snippet in the cluster or worker pool definition.</p> <pre><code>module \"nemo\" {\n ...\n\n worker_snippets = [\n file(\"./snippets/kubelet-image-override.yaml\")\n ]\n ...\n}\n</code></pre>"},{"location":"advanced/nodes/","title":"Nodes","text":"<p>Typhoon clusters consist of controller node(s) and a (default) set of worker nodes.</p>"},{"location":"advanced/nodes/#overview","title":"Overview","text":"<p>Typhoon nodes use the standard set of Kubernetes node labels.</p> <pre><code>Labels: kubernetes.io/arch=amd64\n kubernetes.io/hostname=node-name\n kubernetes.io/os=linux\n</code></pre> <p>Controller node(s) are labeled to allow node selection (for rare components that run on controllers) and tainted to prevent ordinary workloads running on controllers.</p> <pre><code>Labels: node.kubernetes.io/controller=true\nTaints: node-role.kubernetes.io/controller:NoSchedule\n</code></pre> <p>Worker nodes are labeled to allow node selection and untainted. Workloads will schedule on worker nodes by default, barring any contraindications.</p> <pre><code>Labels: node.kubernetes.io/node=\nTaints: &lt;none&gt;\n</code></pre> <p>On auto-scaling cloud platforms, you may add worker pools with different groups of nodes with their own labels and taints. 
On platforms like bare-metal, with heterogeneous machines, you may manage node labels and taints per node.</p>"},{"location":"advanced/nodes/#node-labels","title":"Node Labels","text":"<p>Add custom initial worker node labels to default workers or worker pool nodes to allow workloads to select among nodes that differ.</p> ClusterWorker Pool <pre><code>module \"yavin\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.29.3\"\n\n # Google Cloud\n cluster_name = \"yavin\"\n region = \"us-central1\"\n dns_zone = \"example.com\"\n dns_zone_name = \"example-zone\"\n\n # configuration\n ssh_authorized_key = local.ssh_key\n\n # optional\n worker_count = 2\n worker_node_labels = [\"pool=default\"]\n}\n</code></pre> <pre><code>module \"yavin-pool\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes/workers?ref=v1.29.3\"\n\n # Google Cloud\n cluster_name = \"yavin\"\n region = \"europe-west2\"\n network = module.yavin.network_name\n\n # configuration\n name = \"yavin-16x\"\n kubeconfig = module.yavin.kubeconfig\n ssh_authorized_key = local.ssh_key\n\n # optional\n worker_count = 1\n machine_type = \"n1-standard-16\"\n node_labels = [\"pool=big\"]\n}\n</code></pre> <p>In the example above, the two default workers would be labeled <code>pool: default</code> and the additional worker would be labeled <code>pool: big</code>.</p>"},{"location":"advanced/nodes/#node-taints","title":"Node Taints","text":"<p>Add custom initial taints on worker pool nodes to indicate a node is unique and should only schedule workloads that explicitly tolerate a given taint key.</p> <p>Warning</p> <p>Since taints prevent workloads scheduling onto a node, you must decide whether <code>kube-system</code> DaemonSets (e.g. flannel, Calico, Cilium) should tolerate your custom taint by setting <code>daemonset_tolerations</code>. 
If you don't list your custom taint(s), important components won't run on these nodes.</p> ClusterWorker Pool <pre><code>module \"yavin\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.29.3\"\n\n # Google Cloud\n cluster_name = \"yavin\"\n region = \"us-central1\"\n dns_zone = \"example.com\"\n dns_zone_name = \"example-zone\"\n\n # configuration\n ssh_authorized_key = local.ssh_key\n\n # optional\n worker_count = 2\n daemonset_tolerations = [\"role\"]\n}\n</code></pre> <pre><code>module \"yavin-pool\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes/workers?ref=v1.29.3\"\n\n # Google Cloud\n cluster_name = \"yavin\"\n region = \"europe-west2\"\n network = module.yavin.network_name\n\n # configuration\n name = \"yavin-16x\"\n kubeconfig = module.yavin.kubeconfig\n ssh_authorized_key = local.ssh_key\n\n # optional\n worker_count = 1\n accelerator_type = \"nvidia-tesla-p100\"\n accelerator_count = 1\n node_taints = [\"role=gpu:NoSchedule\"]\n}\n</code></pre> <p>In the example above, the additional worker would be tainted with <code>role=gpu:NoSchedule</code> to prevent workloads scheduling, but <code>kube-system</code> components like flannel, Calico, or Cilium would tolerate that custom taint to run there.</p>"},{"location":"advanced/overview/","title":"Advanced","text":"<p>Typhoon clusters offer several advanced features for skilled users.</p> <ul> <li>ARM64</li> <li>Customization</li> <li>Nodes</li> <li>Worker Pools</li> </ul>"},{"location":"advanced/worker-pools/","title":"Worker Pools","text":"<p>Typhoon AWS, Azure, and Google Cloud allow additional groups of workers to be defined and joined to a cluster. For example, add worker pools of instances with different types, disk sizes, Container Linux channels, or preemptibility modes.</p> <p>Internal Terraform Modules:</p> <ul> <li><code>aws/flatcar-linux/kubernetes/workers</code></li> <li><code>aws/fedora-coreos/kubernetes/workers</code></li> <li><code>azure/flatcar-linux/kubernetes/workers</code></li> <li><code>azure/fedora-coreos/kubernetes/workers</code></li> <li><code>google-cloud/flatcar-linux/kubernetes/workers</code></li> <li><code>google-cloud/fedora-coreos/kubernetes/workers</code></li> </ul>"},{"location":"advanced/worker-pools/#aws","title":"AWS","text":"<p>Create a cluster following the AWS tutorial. 
Define a worker pool using the AWS internal <code>workers</code> module.</p> Fedora CoreOSFlatcar Linux <pre><code>module \"tempest-worker-pool\" {\n source = \"git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes/workers?ref=v1.29.3\"\n\n # AWS\n vpc_id = module.tempest.vpc_id\n subnet_ids = module.tempest.subnet_ids\n security_groups = module.tempest.worker_security_groups\n\n # configuration\n name = \"tempest-pool\"\n kubeconfig = module.tempest.kubeconfig\n ssh_authorized_key = var.ssh_authorized_key\n\n # optional\n worker_count = 2\n instance_type = \"m5.large\"\n os_stream = \"next\"\n}\n</code></pre> <pre><code>module \"tempest-worker-pool\" {\n source = \"git::https://github.com/poseidon/typhoon//aws/flatcar-linux/kubernetes/workers?ref=v1.29.3\"\n\n # AWS\n vpc_id = module.tempest.vpc_id\n subnet_ids = module.tempest.subnet_ids\n security_groups = module.tempest.worker_security_groups\n\n # configuration\n name = \"tempest-pool\"\n kubeconfig = module.tempest.kubeconfig\n ssh_authorized_key = var.ssh_authorized_key\n\n # optional\n worker_count = 2\n instance_type = \"m5.large\"\n os_image = \"flatcar-beta\"\n}\n</code></pre> <p>Apply the change.</p> <pre><code>terraform apply\n</code></pre> <p>Verify an auto-scaling group of workers joins the cluster within a few minutes.</p>"},{"location":"advanced/worker-pools/#variables","title":"Variables","text":"<p>The AWS internal <code>workers</code> module supports a number of variables.</p>"},{"location":"advanced/worker-pools/#required","title":"Required","text":"Name Description Example name Unique name (distinct from cluster name) \"tempest-m5s\" vpc_id Must be set to <code>vpc_id</code> output by cluster module.cluster.vpc_id subnet_ids Must be set to <code>subnet_ids</code> output by cluster module.cluster.subnet_ids security_groups Must be set to <code>worker_security_groups</code> output by cluster module.cluster.worker_security_groups kubeconfig Must be set to <code>kubeconfig</code> output by cluster module.cluster.kubeconfig ssh_authorized_key SSH public key for user 'core' \"ssh-ed25519 AAAAB3NZ...\""},{"location":"advanced/worker-pools/#optional","title":"Optional","text":"Name Description Default Example worker_count Number of instances 1 3 instance_type EC2 instance type \"t3.small\" \"t3.medium\" os_image AMI channel for a Container Linux derivative \"flatcar-stable\" flatcar-stable, flatcar-beta, flatcar-alpha os_stream Fedora CoreOS stream for compute instances \"stable\" \"testing\", \"next\" disk_size Size of the EBS volume in GB 40 100 disk_type Type of the EBS volume \"gp3\" standard, gp2, gp3, io1 disk_iops IOPS of the EBS volume 0 (i.e. auto) 400 spot_price Spot price in USD for worker instances or 0 to use on-demand instances 0 0.10 snippets Fedora CoreOS or Container Linux Config snippets [] examples service_cidr Must match <code>service_cidr</code> of cluster \"10.3.0.0/16\" \"10.3.0.0/24\" node_labels List of initial node labels [] [\"worker-pool=foo\"] node_taints List of initial node taints [] [\"role=gpu:NoSchedule\"] <p>Check the list of valid instance types or per-region and per-type spot prices.</p>"},{"location":"advanced/worker-pools/#azure","title":"Azure","text":"<p>Create a cluster following the Azure tutorial. 
Define a worker pool using the Azure internal <code>workers</code> module.</p> Fedora CoreOSFlatcar Linux <pre><code>module \"ramius-worker-pool\" {\n source = \"git::https://github.com/poseidon/typhoon//azure/fedora-coreos/kubernetes/workers?ref=v1.29.3\"\n\n # Azure\n region = module.ramius.region\n resource_group_name = module.ramius.resource_group_name\n subnet_id = module.ramius.subnet_id\n security_group_id = module.ramius.security_group_id\n backend_address_pool_id = module.ramius.backend_address_pool_id\n\n # configuration\n name = \"ramius-spot\"\n kubeconfig = module.ramius.kubeconfig\n ssh_authorized_key = var.ssh_authorized_key\n\n # optional\n worker_count = 2\n vm_type = \"Standard_F4\"\n priority = \"Spot\"\n os_image = \"/subscriptions/some/path/Microsoft.Compute/images/fedora-coreos-31.20200323.3.2\"\n}\n</code></pre> <pre><code>module \"ramius-worker-pool\" {\n source = \"git::https://github.com/poseidon/typhoon//azure/flatcar-linux/kubernetes/workers?ref=v1.29.3\"\n\n # Azure\n region = module.ramius.region\n resource_group_name = module.ramius.resource_group_name\n subnet_id = module.ramius.subnet_id\n security_group_id = module.ramius.security_group_id\n backend_address_pool_id = module.ramius.backend_address_pool_id\n\n # configuration\n name = \"ramius-spot\"\n kubeconfig = module.ramius.kubeconfig\n ssh_authorized_key = var.ssh_authorized_key\n\n # optional\n worker_count = 2\n vm_type = \"Standard_F4\"\n priority = \"Spot\"\n os_image = \"flatcar-beta\"\n}\n</code></pre> <p>Apply the change.</p> <pre><code>terraform apply\n</code></pre> <p>Verify a scale set of workers joins the cluster within a few minutes.</p>"},{"location":"advanced/worker-pools/#variables_1","title":"Variables","text":"<p>The Azure internal <code>workers</code> module supports a number of variables.</p>"},{"location":"advanced/worker-pools/#required_1","title":"Required","text":"Name Description Example name Unique name (distinct from cluster name) \"ramius-f4\" region Must be set to <code>region</code> output by cluster module.cluster.region resource_group_name Must be set to <code>resource_group_name</code> output by cluster module.cluster.resource_group_name subnet_id Must be set to <code>subnet_id</code> output by cluster module.cluster.subnet_id security_group_id Must be set to <code>security_group_id</code> output by cluster module.cluster.security_group_id backend_address_pool_id Must be set to <code>backend_address_pool_id</code> output by cluster module.cluster.backend_address_pool_id kubeconfig Must be set to <code>kubeconfig</code> output by cluster module.cluster.kubeconfig ssh_authorized_key SSH public key for user 'core' \"ssh-ed25519 AAAAB3NZ...\""},{"location":"advanced/worker-pools/#optional_1","title":"Optional","text":"Name Description Default Example worker_count Number of instances 1 3 vm_type Machine type for instances \"Standard_D2as_v5\" See below os_image Channel for a Container Linux derivative \"flatcar-stable\" flatcar-stable, flatcar-beta, flatcar-alpha priority Set priority to Spot to use reduced cost surplus capacity, with the tradeoff that instances can be deallocated at any time \"Regular\" \"Spot\" snippets Container Linux Config snippets [] examples service_cidr CIDR IPv4 range to assign to Kubernetes services \"10.3.0.0/16\" \"10.3.0.0/24\" node_labels List of initial node labels [] [\"worker-pool=foo\"] node_taints List of initial node taints [] [\"role=gpu:NoSchedule\"] <p>Check the list of valid machine types and their specs. 
Use <code>az vm list-skus</code> to get the identifier.</p>"},{"location":"advanced/worker-pools/#google-cloud","title":"Google Cloud","text":"<p>Create a cluster following the Google Cloud tutorial. Define a worker pool using the Google Cloud internal <code>workers</code> module.</p> Fedora CoreOSFlatcar Linux <pre><code>module \"yavin-worker-pool\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes/workers?ref=v1.29.3\"\n\n # Google Cloud\n region = \"europe-west2\"\n network = module.yavin.network_name\n cluster_name = \"yavin\"\n\n # configuration\n name = \"yavin-16x\"\n kubeconfig = module.yavin.kubeconfig\n ssh_authorized_key = var.ssh_authorized_key\n\n # optional\n worker_count = 2\n machine_type = \"n1-standard-16\"\n os_stream = \"testing\"\n preemptible = true\n}\n</code></pre> <pre><code>module \"yavin-worker-pool\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/flatcar-linux/kubernetes/workers?ref=v1.29.3\"\n\n # Google Cloud\n region = \"europe-west2\"\n network = module.yavin.network_name\n cluster_name = \"yavin\"\n\n # configuration\n name = \"yavin-16x\"\n kubeconfig = module.yavin.kubeconfig\n ssh_authorized_key = var.ssh_authorized_key\n\n # optional\n worker_count = 2\n machine_type = \"n1-standard-16\"\n os_image = \"flatcar-stable\"\n preemptible = true\n}\n</code></pre> <p>Apply the change.</p> <pre><code>terraform apply\n</code></pre> <p>Verify a managed instance group of workers joins the cluster within a few minutes.</p> <pre><code>$ kubectl get nodes\nNAME STATUS AGE VERSION\nyavin-controller-0.c.example-com.internal Ready 6m v1.29.3\nyavin-worker-jrbf.c.example-com.internal Ready 5m v1.29.3\nyavin-worker-mzdm.c.example-com.internal Ready 5m v1.29.3\nyavin-16x-worker-jrbf.c.example-com.internal Ready 3m v1.29.3\nyavin-16x-worker-mzdm.c.example-com.internal Ready 3m v1.29.3\n</code></pre>"},{"location":"advanced/worker-pools/#variables_2","title":"Variables","text":"<p>The Google Cloud internal <code>workers</code> module supports a number of variables.</p>"},{"location":"advanced/worker-pools/#required_2","title":"Required","text":"Name Description Example name Unique name (distinct from cluster name) \"yavin-16x\" cluster_name Must be set to <code>cluster_name</code> of cluster \"yavin\" region Region for the worker pool instances. 
May differ from the cluster's region \"europe-west2\" network Must be set to <code>network_name</code> output by cluster module.cluster.network_name kubeconfig Must be set to <code>kubeconfig</code> output by cluster module.cluster.kubeconfig os_image Container Linux image for compute instances \"uploaded-flatcar-image\" ssh_authorized_key SSH public key for user 'core' \"ssh-ed25519 AAAAB3NZ...\" <p>Check the list of regions docs or with <code>gcloud compute regions list</code>.</p>"},{"location":"advanced/worker-pools/#optional_2","title":"Optional","text":"Name Description Default Example worker_count Number of instances 1 3 machine_type Compute instance machine type \"n1-standard-1\" See below os_stream Fedora CoreOS stream for compute instances \"stable\" \"testing\", \"next\" disk_size Size of the disk in GB 40 100 preemptible If true, Compute Engine will terminate instances randomly within 24 hours false true snippets Container Linux Config snippets [] examples service_cidr Must match <code>service_cidr</code> of cluster \"10.3.0.0/16\" \"10.3.0.0/24\" node_labels List of initial node labels [] [\"worker-pool=foo\"] node_taints List of initial node taints [] [\"role=gpu:NoSchedule\"] <p>Check the list of valid machine types.</p>"},{"location":"architecture/aws/","title":"AWS","text":""},{"location":"architecture/aws/#load-balancing","title":"Load Balancing","text":""},{"location":"architecture/aws/#kube-apiserver","title":"kube-apiserver","text":"<p>A network load balancer (NLB) distributes IPv4 TCP/6443 traffic across a target group of controller nodes with a healthy <code>kube-apiserver</code>. Clusters with multiple controllers span zones in a region to tolerate zone outages.</p>"},{"location":"architecture/aws/#httphttps-ingress","title":"HTTP/HTTPS Ingress","text":"<p>A network load balancer (NLB) distributes IPv4 TCP/80 and TCP/443 traffic across two target groups of worker nodes with a healthy Ingress controller. Workers span the zones in a region to tolerate zone outages.</p> <p>The AWS NLB has a DNS alias record (regional) resolving to 3 zonal IPv4 addresses. The alias record is output as <code>ingress_dns_name</code> for use in application DNS CNAME records. See Ingress on AWS.</p>"},{"location":"architecture/aws/#tcp-services","title":"TCP Services","text":"<p>Load balance TCP applications by adding a listener and target group. 
A listener and target group may map different ports (e.g. 3333 external, 30333 internal).</p> <pre><code># Forward TCP traffic to a target group\nresource \"aws_lb_listener\" \"some-app\" {\n load_balancer_arn = module.tempest.nlb_id\n protocol = \"TCP\"\n port = \"3333\"\n\n default_action {\n type = \"forward\"\n target_group_arn = aws_lb_target_group.some-app.arn\n }\n}\n\n# Target group of workers for some-app\nresource \"aws_lb_target_group\" \"some-app\" {\n name = \"some-app\"\n vpc_id = module.tempest.vpc_id\n target_type = \"instance\"\n\n protocol = \"TCP\"\n port = 3333\n\n health_check {\n protocol = \"TCP\"\n port = 30333\n }\n}\n</code></pre> <p>Pass <code>worker_target_groups</code> to the cluster to register worker instances into custom target groups.</p> <pre><code>module \"tempest\" {\n...\n worker_target_groups = [\n aws_lb_target_group.some-app.id,\n ]\n}\n</code></pre> <p>Notes:</p> <ul> <li>AWS NLBs and target groups do not support UDP</li> <li>Global Accelerator does support UDP, but it's expensive</li> </ul>"},{"location":"architecture/aws/#firewalls","title":"Firewalls","text":"<p>Add firewall rules to the worker security group.</p> <pre><code>resource \"aws_security_group_rule\" \"some-app\" {\n security_group_id = module.tempest.worker_security_groups[0]\n\n type = \"ingress\"\n protocol = \"tcp\"\n from_port = 3333\n to_port = 30333\n cidr_blocks = [\"0.0.0.0/0\"]\n}\n</code></pre>"},{"location":"architecture/aws/#routes","title":"Routes","text":"<p>Add a custom route to the VPC route table.</p> <pre><code>data \"aws_route_table\" \"default\" {\n vpc_id = module.tempest.vpc_id\n subnet_id = module.tempest.subnet_ids[0]\n}\n\nresource \"aws_route\" \"peering\" {\n route_table_id = data.aws_route_table.default.id\n destination_cidr_block = \"192.168.4.0/24\"\n ...\n}\n</code></pre>"},{"location":"architecture/aws/#ipv6","title":"IPv6","text":"IPv6 Feature Supported Node IPv6 address Yes Node Outbound IPv6 Yes Kubernetes Ingress IPv6 Yes"},{"location":"architecture/azure/","title":"Azure","text":""},{"location":"architecture/azure/#load-balancing","title":"Load Balancing","text":""},{"location":"architecture/azure/#kube-apiserver","title":"kube-apiserver","text":"<p>A load balancer distributes IPv4 TCP/6443 traffic across a backend address pool of controllers with a healthy <code>kube-apiserver</code>. Clusters with multiple controllers use an availability set with 2 fault domains to tolerate hardware failures within Azure.</p>"},{"location":"architecture/azure/#httphttps-ingress","title":"HTTP/HTTPS Ingress","text":"<p>A load balancer distributes IPv4 TCP/80 and TCP/443 traffic across a backend address pool of workers with a healthy Ingress controller.</p> <p>The Azure LB IPv4 address is output as <code>ingress_static_ipv4</code> for use in DNS A records. See Ingress on Azure.</p>"},{"location":"architecture/azure/#tcpudp-services","title":"TCP/UDP Services","text":"<p>Load balance TCP/UDP applications by adding rules to the Azure LB (output). A rule may map different ports (e.g. 
3333 external, 30333 internal).</p> <pre><code># Forward traffic to the worker backend address pool\nresource \"azurerm_lb_rule\" \"some-app-tcp\" {\n resource_group_name = module.ramius.resource_group_name\n\n name = \"some-app-tcp\"\n loadbalancer_id = module.ramius.loadbalancer_id\n frontend_ip_configuration_name = \"ingress\"\n\n protocol = \"Tcp\"\n frontend_port = 3333\n backend_port = 30333\n backend_address_pool_id = module.ramius.backend_address_pool_id\n probe_id = azurerm_lb_probe.some-app.id\n}\n\n# Health check some-app\nresource \"azurerm_lb_probe\" \"some-app\" {\n resource_group_name = module.ramius.resource_group_name\n\n name = \"some-app\"\n loadbalancer_id = module.ramius.loadbalancer_id\n protocol = \"Tcp\"\n port = 30333\n}\n</code></pre>"},{"location":"architecture/azure/#firewalls","title":"Firewalls","text":"<p>Add firewall rules to the worker security group.</p> <pre><code>resource \"azurerm_network_security_rule\" \"some-app\" {\n resource_group_name = \"${module.ramius.resource_group_name}\"\n\n name = \"some-app\"\n network_security_group_name = module.ramius.worker_security_group_name\n priority = \"3001\"\n access = \"Allow\"\n direction = \"Inbound\"\n protocol = \"Tcp\"\n source_port_range = \"*\"\n destination_port_range = \"30333\"\n source_address_prefix = \"*\"\n destination_address_prefixes = module.ramius.worker_address_prefixes\n}\n</code></pre>"},{"location":"architecture/azure/#ipv6","title":"IPv6","text":"<p>Azure does not provide public IPv6 addresses at the standard SKU.</p> IPv6 Feature Supported Node IPv6 address No Node Outbound IPv6 No Kubernetes Ingress IPv6 No"},{"location":"architecture/bare-metal/","title":"Bare-Metal","text":""},{"location":"architecture/bare-metal/#load-balancing","title":"Load Balancing","text":""},{"location":"architecture/bare-metal/#kube-apiserver","title":"kube-apiserver","text":"<p>Load balancing across controller nodes with a healthy <code>kube-apiserver</code> is determined by your unique bare-metal environment and its capabilities.</p>"},{"location":"architecture/bare-metal/#httphttps-ingress","title":"HTTP/HTTPS Ingress","text":"<p>Load balancing across worker nodes with a healthy Ingress Controller is determined by your unique bare-metal environment and its capabilities.</p> <p>See the <code>nginx-ingress</code> addon to run Nginx as the Ingress Controller for bare-metal.</p>"},{"location":"architecture/bare-metal/#tcpudp-services","title":"TCP/UDP Services","text":"<p>Load balancing across worker nodes with TCP/UDP services is determined by your unique bare-metal environment and its capabilities.</p>"},{"location":"architecture/bare-metal/#ipv6","title":"IPv6","text":"<p>Status of IPv6 on Typhoon bare-metal clusters.</p> IPv6 Feature Supported Node IPv6 address Yes Node Outbound IPv6 Yes Kubernetes Ingress IPv6 Possible <p>IPv6 support depends upon the bare-metal network environment.</p>"},{"location":"architecture/concepts/","title":"Concepts","text":"<p>Let's cover the concepts you'll need to get started.</p>"},{"location":"architecture/concepts/#kubernetes","title":"Kubernetes","text":"<p>Kubernetes is an open-source cluster system for deploying, scaling, and managing containerized applications across a pool of compute nodes (bare-metal, droplets, instances).</p>"},{"location":"architecture/concepts/#nodes","title":"Nodes","text":"<p>All cluster nodes provision themselves from a declarative configuration upfront. 
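For example, optional Butane snippets can be passed to a cluster module to customize that declarative configuration (a minimal sketch; the snippet file paths are hypothetical):</p> <pre><code>module \"yavin\" {\n # ...\n\n # Butane snippets appended to the generated controller/worker configs\n controller_snippets = [file(\"./snippets/controller-custom.yaml\")]\n worker_snippets = [file(\"./snippets/worker-custom.yaml\")]\n}\n</code></pre> <p>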
Nodes run a <code>kubelet</code> service and register themselves with the control plane to join the cluster. All nodes run <code>kube-proxy</code> and <code>calico</code> or <code>flannel</code> pods.</p>"},{"location":"architecture/concepts/#controllers","title":"Controllers","text":"<p>Controller nodes are scheduled to run the Kubernetes <code>apiserver</code>, <code>scheduler</code>, <code>controller-manager</code>, <code>coredns</code>, and <code>kube-proxy</code>. A fully qualified domain name (e.g. cluster_name.domain.com) resolving to a network load balancer or round-robin DNS (depends on platform) is used to refer to the control plane.</p>"},{"location":"architecture/concepts/#workers","title":"Workers","text":"<p>Worker nodes register with the control plane and run application workloads.</p>"},{"location":"architecture/concepts/#terraform","title":"Terraform","text":"<p>Terraform config files declare resources that Terraform should manage. Resources include infrastructure components created through a provider API (e.g. Compute instances, DNS records) or local assets like TLS certificates and config files.</p> <pre><code># Declare an instance\nresource \"google_compute_instance\" \"pet\" {\n # ...\n}\n</code></pre> <p>The <code>terraform</code> tool parses configs, reconciles the desired state with the actual state, and updates resources to reach the desired state.</p> <pre><code>$ terraform plan\nPlan: 4 to add, 0 to change, 0 to destroy.\n$ terraform apply\nApply complete! Resources: 4 added, 0 changed, 0 destroyed.\n</code></pre> <p>With Typhoon, you'll be able to manage clusters with Terraform.</p>"},{"location":"architecture/concepts/#modules","title":"Modules","text":"<p>Terraform modules allow a collection of resources to be configured and managed together. Typhoon provides a Kubernetes cluster Terraform module for each supported platform and operating system.</p> <p>Clusters are declared in Terraform by referencing the module.</p> <pre><code>module \"yavin\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes\"\n cluster_name = \"yavin\"\n ...\n}\n</code></pre>"},{"location":"architecture/concepts/#versioning","title":"Versioning","text":"<p>Modules are updated regularly; set the version to a release tag or commit hash.</p> <pre><code>...\nsource = \"git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=hash\"\n</code></pre> <p>Module versioning ensures <code>terraform get --update</code> only fetches the desired version, so plan and apply don't change cluster resources unless the version is altered.</p>"},{"location":"architecture/concepts/#organize","title":"Organize","text":"<p>Maintain Terraform configs for \"live\" infrastructure in a versioned repository. Seek to organize configs to reflect resources that should be managed together in a <code>terraform apply</code> invocation.</p> <p>You may choose to organize resources all together, by team, by project, or some other scheme. 
Here's an example that manages clusters together:</p> <pre><code>.git/\ninfra/\n\u2514\u2500\u2500 terraform\n \u2514\u2500\u2500 clusters\n \u251c\u2500\u2500 aws-tempest.tf\n \u251c\u2500\u2500 azure-ramius.tf\n \u251c\u2500\u2500 bare-metal-mercury.tf\n \u251c\u2500\u2500 google-cloud-yavin.tf\n \u251c\u2500\u2500 digital-ocean-nemo.tf\n \u251c\u2500\u2500 providers.tf\n \u251c\u2500\u2500 terraform.tfvars\n \u2514\u2500\u2500 remote-backend.tf\n</code></pre> <p>By convention, <code>providers.tf</code> registers provider APIs, <code>terraform.tfvars</code> stores shared values, and state is written to a remote backend.</p>"},{"location":"architecture/concepts/#state","title":"State","text":"<p>Terraform syncs its state with provider APIs to plan changes that reconcile resources to the desired state. By default, Terraform writes state data (including secrets!) to a <code>terraform.tfstate</code> file. At a minimum, add a <code>.gitignore</code> file (or equivalent) to prevent state from being committed to your infrastructure repository.</p> <pre><code># .gitignore\n*.tfstate\n*.tfstate.backup\n.terraform/\n</code></pre>"},{"location":"architecture/concepts/#remote-backend","title":"Remote Backend","text":"<p>Later, you may wish to check out Terraform remote backends, which store state in a remote bucket like Google Storage or S3.</p> <pre><code>terraform {\n backend \"gcs\" {\n credentials = \"/path/to/credentials.json\"\n project = \"project-id\"\n bucket = \"bucket-id\"\n path = \"metal.tfstate\"\n }\n}\n</code></pre>"},{"location":"architecture/digitalocean/","title":"DigitalOcean","text":""},{"location":"architecture/digitalocean/#load-balancing","title":"Load Balancing","text":""},{"location":"architecture/digitalocean/#kube-apiserver","title":"kube-apiserver","text":"<p>DNS A records round-robin<sup>1</sup> resolve IPv4 TCP/6443 traffic to controller droplets (regardless of whether their <code>kube-apiserver</code> is healthy). Clusters with multiple controllers are supported, but with round-robin DNS, \u2153 of controllers being down causes ~\u2153 of apiserver requests to fail.</p>"},{"location":"architecture/digitalocean/#httphttps-ingress","title":"HTTP/HTTPS Ingress","text":"<p>DNS records (A and AAAA) round-robin<sup>1</sup> resolve the <code>workers_dns</code> name (e.g. <code>nemo-workers.example.com</code>) to a worker droplet's IPv4 and IPv6 address. This allows running an Ingress controller DaemonSet across workers (records resolve regardless of whether the Ingress controller is healthy).</p> <p>The DNS record name is output as <code>workers_dns</code> for use in application DNS CNAME records. See Ingress on DigitalOcean.</p>"},{"location":"architecture/digitalocean/#tcpudp-services","title":"TCP/UDP Services","text":"<p>DNS records (A and AAAA) round-robin<sup>1</sup> resolve the <code>workers_dns</code> name (e.g. <code>nemo-workers.example.com</code>) to a worker droplet's IPv4 and IPv6 address. The DNS record name is output as <code>workers_dns</code> for use in application DNS CNAME records.</p> <p>With round-robin as \"load balancing\", TCP/UDP services can be served via the same CNAME. Don't forget to add a firewall rule for the application.</p>"},{"location":"architecture/digitalocean/#custom-load-balancer","title":"Custom Load Balancer","text":"<p>Add a DigitalOcean load balancer to distribute IPv4 TCP traffic (HTTP/HTTPS Ingress or TCP service) across worker droplets (tagged with <code>worker_tag</code>) with a healthy Ingress controller. 
A load balancer adds cost, but adds redundancy against worker failures (closer to Typhoon clusters on other platforms).</p> <pre><code>resource \"digitalocean_loadbalancer\" \"ingress\" {\n name = \"ingress\"\n region = \"fra1\"\n vpc_uuid = module.nemo.vpc_id\n droplet_tag = module.nemo.worker_tag\n\n healthcheck {\n protocol = \"http\"\n port = \"10254\"\n path = \"/healthz\"\n healthy_threshold = 2\n }\n\n forwarding_rule {\n entry_protocol = \"tcp\"\n entry_port = 80\n target_protocol = \"tcp\"\n target_port = 80\n }\n\n forwarding_rule {\n entry_protocol = \"tcp\"\n entry_port = 443\n target_protocol = \"tcp\"\n target_port = 443\n }\n\n forwarding_rule {\n entry_protocol = \"tcp\"\n entry_port = 3333\n target_protocol = \"tcp\"\n target_port = 30300\n }\n}\n</code></pre> <p>Define DNS A records to <code>digitalocean_loadbalancer.ingress.ip</code> instead of CNAMEs.</p>"},{"location":"architecture/digitalocean/#firewalls","title":"Firewalls","text":"<p>Add firewall rules matching worker droplets with <code>worker_tag</code>.</p> <pre><code>resource \"digitalocean_firewall\" \"some-app\" {\n name = \"some-app\"\n tags = [module.nemo.worker_tag]\n inbound_rule {\n protocol = \"tcp\"\n port_range = \"30300\"\n source_addresses = [\"0.0.0.0/0\"]\n }\n}\n</code></pre>"},{"location":"architecture/digitalocean/#ipv6","title":"IPv6","text":"<p>DigitalOcean load balancers do not have an IPv6 address. Resolving individual droplets' IPv6 addresses and using an Ingress controller with <code>hostNetwork: true</code> is a possible way to serve IPv6 traffic, if one must.</p> IPv6 Feature Supported Node IPv6 address Yes Node Outbound IPv6 Yes Kubernetes Ingress IPv6 Possible <ol> <li> <p>DigitalOcean does offer load balancers. We've opted not to use them to keep the DigitalOcean cluster cheap for developers.\u00a0\u21a9\u21a9\u21a9</p> </li> </ol>"},{"location":"architecture/google-cloud/","title":"Google Cloud","text":""},{"location":"architecture/google-cloud/#load-balancing","title":"Load Balancing","text":""},{"location":"architecture/google-cloud/#kube-apiserver","title":"kube-apiserver","text":"<p>A global forwarding rule (IPv4 anycast) and TCP Proxy distribute IPv4 TCP/443 traffic across a backend service with zonal instance groups of controller(s) with a healthy <code>kube-apiserver</code> (TCP/6443). Clusters with multiple controllers span zones in a region to tolerate zone outages.</p> <p>Notes:</p> <ul> <li>GCP TCP Proxy limits external port options (e.g. must use 443, not 6443)</li> <li>A regional NLB cannot be used for multi-controller (see #190)</li> </ul>"},{"location":"architecture/google-cloud/#httphttp-ingress","title":"HTTP/HTTP Ingress","text":"<p>Global forwarding rules and a TCP Proxy distribute IPv4/IPv6 TCP/80 and TCP/443 traffic across a managed instance group of workers with a healthy Ingress Controller. Workers span zones in a region to tolerate zone outages.</p> <p>The IPv4 and IPv6 anycast addresses are output as <code>ingress_static_ipv4</code> and <code>ingress_static_ipv6</code> for use in DNS A and AAAA records. 
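For example, DNS records for an application could reference these outputs (a minimal sketch; the managed zone <code>example-zone</code> and hostname <code>app.example.com</code> are assumptions):</p> <pre><code># IPv4 and IPv6 records for an application behind the Ingress controller\nresource \"google_dns_record_set\" \"app-ipv4\" {\n managed_zone = \"example-zone\"\n name = \"app.example.com.\"\n type = \"A\"\n ttl = 300\n rrdatas = [module.yavin.ingress_static_ipv4]\n}\n\nresource \"google_dns_record_set\" \"app-ipv6\" {\n managed_zone = \"example-zone\"\n name = \"app.example.com.\"\n type = \"AAAA\"\n ttl = 300\n rrdatas = [module.yavin.ingress_static_ipv6]\n}\n</code></pre> <p>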
See Ingress on Google Cloud.</p>"},{"location":"architecture/google-cloud/#tcpudp-services","title":"TCP/UDP Services","text":"<p>Load balance TCP/UDP applications by adding a forwarding rule to the worker target pool (output).</p> <pre><code># Static IPv4 address for some-app Load Balancing\nresource \"google_compute_address\" \"some-app-ipv4\" {\n name = \"some-app-ipv4\"\n}\n\n# Forward IPv4 TCP traffic to the target pool\nresource \"google_compute_forwarding_rule\" \"some-app-tcp\" {\n name = \"some-app-tcp\"\n ip_address = google_compute_address.some-app-ipv4.address\n ip_protocol = \"TCP\"\n port_range = \"3333\"\n target = module.yavin.worker_target_pool\n}\n\n\n# Forward IPv4 UDP traffic to the target pool\nresource \"google_compute_forwarding_rule\" \"some-app-udp\" {\n name = \"some-app-udp\"\n ip_address = google_compute_address.some-app-ipv4.address\n ip_protocol = \"UDP\"\n port_range = \"3333\"\n target = module.yavin.worker_target_pool\n}\n</code></pre> <p>Notes:</p> <ul> <li>GCP Global Load Balancers aren't appropriate for custom TCP/UDP.<ul> <li>Backend Services require a named port corresponding to an instance group (output by Typhoon) port. Typhoon shouldn't accept a list of every TCP/UDP service that may later be hosted on the cluster.</li> <li>Backend Services don't support UDP (i.e. rules out global load balancers)</li> </ul> </li> <li>IPv4 Only: Regional Load Balancers use a regional IPv4 address (e.g. <code>google_compute_address</code>), no IPv6.</li> <li>Forward rules don't support differing external and internal ports. Some Ingress controllers (e.g. nginx) can proxy TCP/UDP traffic to achieve this.</li> <li>Worker target pool health checks workers <code>HTTP:10254/healthz</code> (i.e. <code>nginx-ingress</code>)</li> </ul>"},{"location":"architecture/google-cloud/#firewalls","title":"Firewalls","text":"<p>Add firewall rules to the cluster's network.</p> <pre><code>resource \"google_compute_firewall\" \"some-app\" {\n name = \"some-app\"\n network = module.yavin.network_self_link\n\n allow {\n protocol = \"tcp\"\n ports = [3333]\n }\n\n allow {\n protocol = \"udp\"\n ports = [3333]\n }\n\n source_ranges = [\"0.0.0.0/0\"]\n target_tags = [\"yavin-worker\"]\n}\n</code></pre>"},{"location":"architecture/google-cloud/#ipv6","title":"IPv6","text":"<p>Applications exposed via HTTP/HTTPS Ingress can be served over IPv6.</p> IPv6 Feature Supported Node IPv6 address No Node Outbound IPv6 No Kubernetes Ingress IPv6 Yes"},{"location":"architecture/operating-systems/","title":"Operating Systems","text":"<p>Typhoon supports Fedora CoreOS and Flatcar Linux. 
These operating systems were chosen because they offer:</p> <ul> <li>Minimalism and focus on clustered operation</li> <li>Automated and atomic operating system upgrades</li> <li>Declarative and immutable configuration</li> <li>Optimization for containerized applications</li> </ul> <p>Together, they diversify Typhoon to support a range of container technologies.</p> <ul> <li>Fedora CoreOS: rpm-ostree, podman, containerd</li> <li>Flatcar Linux: Gentoo core, docker, containerd</li> </ul>"},{"location":"architecture/operating-systems/#host-properties","title":"Host Properties","text":"Property Flatcar Linux Fedora CoreOS Kernel ~5.15.x ~6.5.x systemd 252 254 Username core core Ignition system Ignition v3.x spec Ignition v3.x spec storage driver overlay2 (extfs) overlay2 (xfs) logging driver json-file journald cgroup driver systemd systemd cgroup version v2 v2 Networking systemd-networkd NetworkManager Resolver systemd-resolved systemd-resolved"},{"location":"architecture/operating-systems/#kubernetes-properties","title":"Kubernetes Properties","text":"Property Flatcar Linux Fedora CoreOS single-master all platforms all platforms multi-master all platforms all platforms control plane static pods static pods Container Runtime containerd 1.5.9 containerd 1.6.0 kubelet image kubelet image with upstream binary kubelet image with upstream binary control plane images upstream images upstream images on-host etcd docker podman on-host kubelet docker podman CNI plugins calico, cilium, flannel calico, cilium, flannel coordinated drain &amp; OS update FLUO addon fleetlock"},{"location":"architecture/operating-systems/#directory-locations","title":"Directory Locations","text":"<p>Typhoon conventional directories.</p> Kubelet setting Host location cni-conf-dir /etc/cni/net.d pod-manifest-path /etc/kubernetes/manifests volume-plugin-dir /var/lib/kubelet/volumeplugins"},{"location":"fedora-coreos/aws/","title":"AWS","text":"<p>In this tutorial, we'll create a Kubernetes v1.29.3 cluster on AWS with Fedora CoreOS.</p> <p>We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a VPC, gateway, subnets, security groups, controller instances, worker auto-scaling group, network load balancer, and TLS assets.</p> <p>Controller hosts are provisioned to run an <code>etcd-member</code> peer and a <code>kubelet</code> service. Worker hosts run a <code>kubelet</code> service. Controller nodes run <code>kube-apiserver</code>, <code>kube-scheduler</code>, <code>kube-controller-manager</code>, and <code>coredns</code>, while <code>kube-proxy</code> and <code>calico</code> (or <code>flannel</code>) run on every node. A generated <code>kubeconfig</code> provides <code>kubectl</code> access to the cluster.</p>"},{"location":"fedora-coreos/aws/#requirements","title":"Requirements","text":"<ul> <li>AWS Account and IAM credentials</li> <li>AWS Route53 DNS Zone (registered Domain Name or delegated subdomain)</li> <li>Terraform v0.13.0+</li> </ul>"},{"location":"fedora-coreos/aws/#terraform-setup","title":"Terraform Setup","text":"<p>Install Terraform v0.13.0+ on your system.</p> <pre><code>$ terraform version\nTerraform v1.0.0\n</code></pre> <p>Read concepts to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. <code>infra</code>).</p> <pre><code>cd infra/clusters\n</code></pre>"},{"location":"fedora-coreos/aws/#provider","title":"Provider","text":"<p>Login to your AWS IAM dashboard and find your IAM user. 
Select \"Security Credentials\" and create an access key. Save the id and secret to a file that can be referenced in configs.</p> <pre><code>[default]\naws_access_key_id = xxx\naws_secret_access_key = yyy\n</code></pre> <p>Configure the AWS provider to use your access key credentials in a <code>providers.tf</code> file.</p> <pre><code>provider \"aws\" {\n region = \"eu-central-1\"\n shared_credentials_file = \"/home/user/.config/aws/credentials\"\n}\n\nprovider \"ct\" {}\n\nterraform {\n required_providers {\n ct = {\n source = \"poseidon/ct\"\n version = \"0.13.0\"\n }\n aws = {\n source = \"hashicorp/aws\"\n version = \"4.61.0\"\n }\n }\n}\n</code></pre> <p>Additional configuration options are described in the <code>aws</code> provider docs.</p> <p>Tip</p> <p>Regions are listed in docs or with <code>aws ec2 describe-regions</code>.</p>"},{"location":"fedora-coreos/aws/#cluster","title":"Cluster","text":"<p>Define a Kubernetes cluster using the module <code>aws/fedora-coreos/kubernetes</code>.</p> <pre><code>module \"tempest\" {\n source = \"git::https://github.com/poseidon/typhoon//aws/fedora-coreos/kubernetes?ref=v1.29.3\"\n\n # AWS\n cluster_name = \"tempest\"\n dns_zone = \"aws.example.com\"\n dns_zone_id = \"Z3PAABBCFAKEC0\"\n\n # configuration\n ssh_authorized_key = \"ssh-ed25519 AAAAB3Nz...\"\n\n # optional\n worker_count = 2\n worker_type = \"t3.small\"\n}\n</code></pre> <p>Reference the variables docs or the variables.tf source.</p>"},{"location":"fedora-coreos/aws/#ssh-agent","title":"ssh-agent","text":"<p>Initial bootstrapping requires <code>bootstrap.service</code> be started on one controller node. Terraform uses <code>ssh-agent</code> to automate this step. Add your SSH private key to <code>ssh-agent</code>.</p> <pre><code>ssh-add ~/.ssh/id_ed25519\nssh-add -L\n</code></pre>"},{"location":"fedora-coreos/aws/#apply","title":"Apply","text":"<p>Initialize the config directory if this is the first use with Terraform.</p> <pre><code>terraform init\n</code></pre> <p>Plan the resources to be created.</p> <pre><code>$ terraform plan\nPlan: 109 to add, 0 to change, 0 to destroy.\n</code></pre> <p>Apply the changes to create the cluster.</p> <pre><code>$ terraform apply\n...\nmodule.tempest.null_resource.bootstrap: Still creating... (4m50s elapsed)\nmodule.tempest.null_resource.bootstrap: Still creating... (5m0s elapsed)\nmodule.tempest.null_resource.bootstrap: Creation complete after 5m8s (ID: 3961816482286168143)\n\nApply complete! Resources: 109 added, 0 changed, 0 destroyed.\n</code></pre> <p>In 4-8 minutes, the Kubernetes cluster will be ready.</p>"},{"location":"fedora-coreos/aws/#verify","title":"Verify","text":"<p>Install kubectl on your system. Obtain the generated cluster <code>kubeconfig</code> from module outputs (e.g. 
write to a local file).</p> <pre><code>resource \"local_file\" \"kubeconfig-tempest\" {\n content = module.tempest.kubeconfig-admin\n filename = \"/home/user/.kube/configs/tempest-config\"\n}\n</code></pre> <p>List nodes in the cluster.</p> <pre><code>$ export KUBECONFIG=/home/user/.kube/configs/tempest-config\n$ kubectl get nodes\nNAME STATUS ROLES AGE VERSION\nip-10-0-3-155 Ready &lt;none&gt; 10m v1.29.3\nip-10-0-26-65 Ready &lt;none&gt; 10m v1.29.3\nip-10-0-41-21 Ready &lt;none&gt; 10m v1.29.3\n</code></pre> <p>List the pods.</p> <pre><code>$ kubectl get pods --all-namespaces\nNAMESPACE NAME READY STATUS RESTARTS AGE\nkube-system calico-node-1m5bf 2/2 Running 0 34m\nkube-system calico-node-7jmr1 2/2 Running 0 34m\nkube-system calico-node-bknc8 2/2 Running 0 34m\nkube-system coredns-1187388186-wx1lg 1/1 Running 0 34m\nkube-system coredns-1187388186-qjnvp 1/1 Running 0 34m\nkube-system kube-apiserver-ip-10-0-3-155 1/1 Running 0 34m\nkube-system kube-controller-manager-ip-10-0-3-155 1/1 Running 0 34m\nkube-system kube-proxy-14wxv 1/1 Running 0 34m\nkube-system kube-proxy-9vxh2 1/1 Running 0 34m\nkube-system kube-proxy-sbbsh 1/1 Running 0 34m\nkube-system kube-scheduler-ip-10-0-3-155 1/1 Running 1 34m\n</code></pre>"},{"location":"fedora-coreos/aws/#going-further","title":"Going Further","text":"<p>Learn about maintenance and addons.</p>"},{"location":"fedora-coreos/aws/#variables","title":"Variables","text":"<p>Check the variables.tf source.</p>"},{"location":"fedora-coreos/aws/#required","title":"Required","text":"Name Description Example cluster_name Unique cluster name (prepended to dns_zone) \"tempest\" dns_zone AWS Route53 DNS zone \"aws.example.com\" dns_zone_id AWS Route53 DNS zone id \"Z3PAABBCFAKEC0\" ssh_authorized_key SSH public key for user 'core' \"ssh-ed25519 AAAAB3NZ...\""},{"location":"fedora-coreos/aws/#dns-zone","title":"DNS Zone","text":"<p>Clusters create a DNS A record <code>${cluster_name}.${dns_zone}</code> to resolve a network load balancer backed by controller instances. This FQDN is used by workers and <code>kubectl</code> to access the apiserver(s). In this example, the cluster's apiserver would be accessible at <code>tempest.aws.example.com</code>.</p> <p>You'll need a registered domain name or delegated subdomain on AWS Route53. You can set this up once and create many clusters with unique names.</p> <pre><code>resource \"aws_route53_zone\" \"zone-for-clusters\" {\n name = \"aws.example.com.\"\n}\n</code></pre> <p>Reference the DNS zone id with <code>aws_route53_zone.zone-for-clusters.zone_id</code>.</p> <p>If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on Route53 (e.g. aws.mydomain.com) and update nameservers.</p>"},{"location":"fedora-coreos/aws/#optional","title":"Optional","text":"Name Description Default Example controller_count Number of controllers (i.e. masters) 1 1 worker_count Number of workers 1 3 controller_type EC2 instance type for controllers \"t3.small\" See below worker_type EC2 instance type for workers \"t3.small\" See below os_stream Fedora CoreOS stream for compute instances \"stable\" \"testing\", \"next\" disk_size Size of the EBS volume in GB 30 100 disk_type Type of the EBS volume \"gp3\" standard, gp2, gp3, io1 disk_iops IOPS of the EBS volume 0 (i.e. 
auto) 400 worker_target_groups Target group ARNs to which worker instances should be added [] [aws_lb_target_group.app.id] worker_price Spot price in USD for worker instances or 0 to use on-demand instances 0 0.10 controller_snippets Controller Butane snippets [] examples worker_snippets Worker Butane snippets [] examples networking Choice of networking provider \"cilium\" \"calico\" or \"cilium\" or \"flannel\" network_mtu CNI interface MTU (calico only) 1480 8981 host_cidr CIDR IPv4 range to assign to EC2 instances \"10.0.0.0/16\" \"10.1.0.0/16\" pod_cidr CIDR IPv4 range to assign to Kubernetes pods \"10.2.0.0/16\" \"10.22.0.0/16\" service_cidr CIDR IPv4 range to assign to Kubernetes services \"10.3.0.0/16\" \"10.3.0.0/24\" worker_node_labels List of initial worker node labels [] [\"worker-pool=default\"] <p>Check the list of valid instance types.</p> <p>Warning</p> <p>Do not choose a <code>controller_type</code> smaller than <code>t2.small</code>. Smaller instances are not sufficient for running a controller.</p> <p>MTU</p> <p>If your EC2 instance type supports Jumbo frames (most do), we recommend you change the <code>network_mtu</code> to 8981! You will get better pod-to-pod bandwidth.</p>"},{"location":"fedora-coreos/aws/#spot","title":"Spot","text":"<p>Add <code>worker_price = \"0.10\"</code> to use spot instance workers (instead of \"on-demand\") and set a maximum spot price in USD. Clusters can tolerate spot market interuptions fairly well (reschedules pods, but cannot drain) to save money, with the tradeoff that requests for workers may go unfulfilled.</p>"},{"location":"fedora-coreos/azure/","title":"Azure","text":"<p>In this tutorial, we'll create a Kubernetes v1.29.3 cluster on Azure with Fedora CoreOS.</p> <p>We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a resource group, virtual network, subnets, security groups, controller availability set, worker scale set, load balancer, and TLS assets.</p> <p>Controller hosts are provisioned to run an <code>etcd-member</code> peer and a <code>kubelet</code> service. Worker hosts run a <code>kubelet</code> service. Controller nodes run <code>kube-apiserver</code>, <code>kube-scheduler</code>, <code>kube-controller-manager</code>, and <code>coredns</code>, while <code>kube-proxy</code> and <code>calico</code> (or <code>flannel</code>) run on every node. A generated <code>kubeconfig</code> provides <code>kubectl</code> access to the cluster.</p>"},{"location":"fedora-coreos/azure/#requirements","title":"Requirements","text":"<ul> <li>Azure account</li> <li>Azure DNS Zone (registered Domain Name or delegated subdomain)</li> <li>Terraform v0.13.0+</li> </ul>"},{"location":"fedora-coreos/azure/#terraform-setup","title":"Terraform Setup","text":"<p>Install Terraform v0.13.0+ on your system.</p> <pre><code>$ terraform version\nTerraform v1.0.0\n</code></pre> <p>Read concepts to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. 
<code>infra</code>).</p> <pre><code>cd infra/clusters\n</code></pre>"},{"location":"fedora-coreos/azure/#provider","title":"Provider","text":"<p>Install the Azure <code>az</code> command line tool to authenticate with Azure.</p> <pre><code>az login\n</code></pre> <p>Configure the Azure provider in a <code>providers.tf</code> file.</p> <pre><code>provider \"azurerm\" {\n features {}\n}\n\nprovider \"ct\" {}\n\nterraform {\n required_providers {\n ct = {\n source = \"poseidon/ct\"\n version = \"0.13.0\"\n }\n azurerm = {\n source = \"hashicorp/azurerm\"\n version = \"3.50.0\"\n }\n }\n}\n</code></pre> <p>Additional configuration options are described in the <code>azurerm</code> provider docs.</p>"},{"location":"fedora-coreos/azure/#fedora-coreos-images","title":"Fedora CoreOS Images","text":"<p>Fedora CoreOS publishes images for Azure, but does not yet upload them. Azure allows custom images to be uploaded to a storage account bucket and imported.</p> <p>Download a Fedora CoreOS Azure VHD image, decompress it, and upload it to an Azure storage account container (i.e. bucket) via the UI (quite slow).</p> <pre><code>xz -d fedora-coreos-36.20220716.3.1-azure.x86_64.vhd.xz\n</code></pre> <p>Create an Azure disk (note disk ID) and create an Azure image from it (note image ID).</p> <pre><code>az disk create --name fedora-coreos-36.20220716.3.1 -g GROUP --source https://BUCKET.blob.core.windows.net/fedora-coreos/fedora-coreos-36.20220716.3.1-azure.x86_64.vhd\n\naz image create --name fedora-coreos-36.20220716.3.1 -g GROUP --os-type=linux --source /subscriptions/some/path/providers/Microsoft.Compute/disks/fedora-coreos-36.20220716.3.1\n</code></pre> <p>Set the os_image in the next step.</p>"},{"location":"fedora-coreos/azure/#cluster","title":"Cluster","text":"<p>Define a Kubernetes cluster using the module <code>azure/fedora-coreos/kubernetes</code>.</p> <pre><code>module \"ramius\" {\n source = \"git::https://github.com/poseidon/typhoon//azure/fedora-coreos/kubernetes?ref=v1.29.3\"\n\n # Azure\n cluster_name = \"ramius\"\n region = \"centralus\"\n dns_zone = \"azure.example.com\"\n dns_zone_group = \"example-group\"\n\n # configuration\n os_image = \"/subscriptions/some/path/Microsoft.Compute/images/fedora-coreos-36.20220716.3.1\"\n ssh_authorized_key = \"ssh-ed25519 AAAAB3Nz...\"\n\n # optional\n worker_count = 2\n host_cidr = \"10.0.0.0/20\"\n}\n</code></pre> <p>Reference the variables docs or the variables.tf source.</p>"},{"location":"fedora-coreos/azure/#ssh-agent","title":"ssh-agent","text":"<p>Initial bootstrapping requires <code>bootstrap.service</code> be started on one controller node. Terraform uses <code>ssh-agent</code> to automate this step. Add your SSH private key to <code>ssh-agent</code>.</p> <pre><code>ssh-add ~/.ssh/id_ed25519\nssh-add -L\n</code></pre>"},{"location":"fedora-coreos/azure/#apply","title":"Apply","text":"<p>Initialize the config directory if this is the first use with Terraform.</p> <pre><code>terraform init\n</code></pre> <p>Plan the resources to be created.</p> <pre><code>$ terraform plan\nPlan: 86 to add, 0 to change, 0 to destroy.\n</code></pre> <p>Apply the changes to create the cluster.</p> <pre><code>$ terraform apply\n...\nmodule.ramius.null_resource.bootstrap: Still creating... (6m50s elapsed)\nmodule.ramius.null_resource.bootstrap: Still creating... (7m0s elapsed)\nmodule.ramius.null_resource.bootstrap: Creation complete after 7m8s (ID: 3961816482286168143)\n\nApply complete! 
Resources: 69 added, 0 changed, 0 destroyed.\n</code></pre> <p>In 4-8 minutes, the Kubernetes cluster will be ready.</p>"},{"location":"fedora-coreos/azure/#verify","title":"Verify","text":"<p>Install kubectl on your system. Obtain the generated cluster <code>kubeconfig</code> from module outputs (e.g. write to a local file).</p> <pre><code>resource \"local_file\" \"kubeconfig-ramius\" {\n content = module.ramius.kubeconfig-admin\n filename = \"/home/user/.kube/configs/ramius-config\"\n}\n</code></pre> <p>List nodes in the cluster.</p> <pre><code>$ export KUBECONFIG=/home/user/.kube/configs/ramius-config\n$ kubectl get nodes\nNAME STATUS ROLES AGE VERSION\nramius-controller-0 Ready &lt;none&gt; 24m v1.29.3\nramius-worker-000001 Ready &lt;none&gt; 25m v1.29.3\nramius-worker-000002 Ready &lt;none&gt; 24m v1.29.3\n</code></pre> <p>List the pods.</p> <pre><code>$ kubectl get pods --all-namespaces\nNAMESPACE NAME READY STATUS RESTARTS AGE\nkube-system coredns-7c6fbb4f4b-b6qzx 1/1 Running 0 26m\nkube-system coredns-7c6fbb4f4b-j2k3d 1/1 Running 0 26m\nkube-system calico-node-1m5bf 2/2 Running 0 26m\nkube-system calico-node-7jmr1 2/2 Running 0 26m\nkube-system calico-node-bknc8 2/2 Running 0 26m\nkube-system kube-apiserver-ramius-controller-0 1/1 Running 0 26m\nkube-system kube-controller-manager-ramius-controller-0 1/1 Running 0 26m\nkube-system kube-proxy-j4vpq 1/1 Running 0 26m\nkube-system kube-proxy-jxr5d 1/1 Running 0 26m\nkube-system kube-proxy-lbdw5 1/1 Running 0 26m\nkube-system kube-scheduler-ramius-controller-0 1/1 Running 0 26m\n</code></pre>"},{"location":"fedora-coreos/azure/#going-further","title":"Going Further","text":"<p>Learn about maintenance and addons.</p>"},{"location":"fedora-coreos/azure/#variables","title":"Variables","text":"<p>Check the variables.tf source.</p>"},{"location":"fedora-coreos/azure/#required","title":"Required","text":"Name Description Example cluster_name Unique cluster name (prepended to dns_zone) \"ramius\" region Azure region \"centralus\" dns_zone Azure DNS zone \"azure.example.com\" dns_zone_group Resource group where the Azure DNS zone resides \"global\" os_image Fedora CoreOS image for instances \"/subscriptions/..../custom-image\" ssh_authorized_key SSH public key for user 'core' \"ssh-ed25519 AAAAB3NZ...\" <p>Tip</p> <p>Regions are shown in docs or with <code>az account list-locations --output table</code>.</p>"},{"location":"fedora-coreos/azure/#dns-zone","title":"DNS Zone","text":"<p>Clusters create a DNS A record <code>${cluster_name}.${dns_zone}</code> to resolve a load balancer backed by controller instances. This FQDN is used by workers and <code>kubectl</code> to access the apiserver(s). In this example, the cluster's apiserver would be accessible at <code>ramius.azure.example.com</code>.</p> <p>You'll need a registered domain name or delegated subdomain on Azure DNS. 
You can set this up once and create many clusters with unique names.</p> <pre><code># Azure resource group for DNS zone\nresource \"azurerm_resource_group\" \"global\" {\n name = \"global\"\n location = \"centralus\"\n}\n\n# DNS zone for clusters\nresource \"azurerm_dns_zone\" \"clusters\" {\n resource_group_name = azurerm_resource_group.global.name\n\n name = \"azure.example.com\"\n zone_type = \"Public\"\n}\n</code></pre> <p>Reference the DNS zone with <code>azurerm_dns_zone.clusters.name</code> and its resource group with <code>\"azurerm_resource_group.global.name</code>.</p> <p>If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on Azure DNS (e.g. azure.mydomain.com) and update nameservers.</p>"},{"location":"fedora-coreos/azure/#optional","title":"Optional","text":"Name Description Default Example controller_count Number of controllers (i.e. masters) 1 1 worker_count Number of workers 1 3 controller_type Machine type for controllers \"Standard_B2s\" See below worker_type Machine type for workers \"Standard_D2as_v5\" See below disk_size Size of the disk in GB 30 100 worker_priority Set priority to Spot to use reduced cost surplus capacity, with the tradeoff that instances can be deallocated at any time Regular Spot controller_snippets Controller Butane snippets [] example worker_snippets Worker Butane snippets [] example networking Choice of networking provider \"cilium\" \"calico\" or \"cilium\" or \"flannel\" host_cidr CIDR IPv4 range to assign to instances \"10.0.0.0/16\" \"10.0.0.0/20\" pod_cidr CIDR IPv4 range to assign to Kubernetes pods \"10.2.0.0/16\" \"10.22.0.0/16\" service_cidr CIDR IPv4 range to assign to Kubernetes services \"10.3.0.0/16\" \"10.3.0.0/24\" worker_node_labels List of initial worker node labels [] [\"worker-pool=default\"] <p>Check the list of valid machine types and their specs. Use <code>az vm list-skus</code> to get the identifier.</p> <p>Warning</p> <p>Unlike AWS and GCP, Azure requires its virtual networks to have non-overlapping IPv4 CIDRs (yeah, go figure). Instead of each cluster just using <code>10.0.0.0/16</code> for instances, each Azure cluster's <code>host_cidr</code> must be non-overlapping (e.g. 10.0.0.0/20 for the 1<sup>st</sup> cluster, 10.0.16.0/20 for the 2<sup>nd</sup> cluster, etc).</p> <p>Warning</p> <p>Do not choose a <code>controller_type</code> smaller than <code>Standard_B2s</code>. Smaller instances are not sufficient for running a controller.</p>"},{"location":"fedora-coreos/azure/#spot-priority","title":"Spot Priority","text":"<p>Add <code>worker_priority=Spot</code> to use Spot Priority workers that run on Azure's surplus capacity at lower cost, but with the tradeoff that they can be deallocated at random. Spot priority VMs are Azure's analog to AWS spot instances or GCP premptible instances.</p>"},{"location":"fedora-coreos/bare-metal/","title":"Bare-Metal","text":"<p>In this tutorial, we'll network boot and provision a Kubernetes v1.29.3 cluster on bare-metal with Fedora CoreOS.</p> <p>First, we'll deploy a Matchbox service and setup a network boot environment. Then, we'll declare a Kubernetes cluster using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Fedora CoreOS to disk, reboot into the disk install, and provision themselves as Kubernetes controllers or workers via Ignition.</p> <p>Controller hosts are provisioned to run an <code>etcd-member</code> peer and a <code>kubelet</code> service. 
Worker hosts run a <code>kubelet</code> service. Controller nodes run <code>kube-apiserver</code>, <code>kube-scheduler</code>, <code>kube-controller-manager</code>, and <code>coredns</code>, while <code>kube-proxy</code> and <code>calico</code> (or <code>flannel</code>) run on every node. A generated <code>kubeconfig</code> provides <code>kubectl</code> access to the cluster.</p>"},{"location":"fedora-coreos/bare-metal/#requirements","title":"Requirements","text":"<ul> <li>Machines with 2GB RAM, 30GB disk, PXE-enabled NIC, IPMI</li> <li>PXE-enabled network boot environment (with HTTPS support)</li> <li>Matchbox v0.6+ deployment with API enabled</li> <li>Matchbox credentials <code>client.crt</code>, <code>client.key</code>, <code>ca.crt</code></li> <li>Terraform v0.13.0+</li> </ul>"},{"location":"fedora-coreos/bare-metal/#machines","title":"Machines","text":"<p>Collect a MAC address from each machine. For machines with multiple PXE-enabled NICs, pick one of the MAC addresses. MAC addresses will be used to match machines to profiles during network boot.</p> <ul> <li>52:54:00:a1:9c:ae (node1)</li> <li>52:54:00:b2:2f:86 (node2)</li> <li>52:54:00:c3:61:77 (node3)</li> </ul> <p>Configure each machine to boot from the disk through IPMI or the BIOS menu.</p> <pre><code>ipmitool -H node1 -U USER -P PASS chassis bootdev disk options=persistent\n</code></pre> <p>During provisioning, you'll explicitly set the boot device to <code>pxe</code> for the next boot only. Machines will install (overwrite) the operating system to disk on PXE boot and reboot into the disk install.</p> <p>Ask your hardware vendor to provide MACs and preconfigure IPMI, if possible. With it, you can rack new servers, <code>terraform apply</code> with new info, and power on machines that network boot and provision into clusters.</p>"},{"location":"fedora-coreos/bare-metal/#dns","title":"DNS","text":"<p>Create a DNS A (or AAAA) record for each node's default interface. Create a record that resolves to each controller node (or re-use the node record if there's one controller).</p> <ul> <li>node1.example.com (node1)</li> <li>node2.example.com (node2)</li> <li>node3.example.com (node3)</li> <li>myk8s.example.com (node1)</li> </ul> <p>Cluster nodes will be configured to refer to the control plane and themselves by these fully qualified names and they'll be used in generated TLS certificates.</p>"},{"location":"fedora-coreos/bare-metal/#matchbox","title":"Matchbox","text":"<p>Matchbox is an open-source app that matches network-booted bare-metal machines (based on labels like MAC, UUID, etc.) to profiles to automate cluster provisioning.</p> <p>Install Matchbox on a Kubernetes cluster or dedicated server.</p> <ul> <li>Installing on Kubernetes (recommended)</li> <li>Installing on a server</li> </ul> <p>Tip</p> <p>Deploy Matchbox as service that can be accessed by all of your bare-metal machines globally. This provides a single endpoint to use Terraform to manage bare-metal clusters at different sites. Typhoon will never include secrets in provisioning user-data so you may even deploy matchbox publicly.</p> <p>Matchbox provides a TLS client-authenticated API that clients, like Terraform, can use to manage machine matching and profiles. Think of it like a cloud provider API, but for creating bare-metal instances.</p> <p>Generate TLS client credentials. 
Save the <code>ca.crt</code>, <code>client.crt</code>, and <code>client.key</code> where they can be referenced in Terraform configs.</p> <pre><code>mv ca.crt client.crt client.key ~/.config/matchbox/\n</code></pre> <p>Verify the matchbox read-only HTTP endpoints are accessible (port is configurable).</p> <pre><code>$ curl http://matchbox.example.com:8080\nmatchbox\n</code></pre> <p>Verify your TLS client certificate and key can be used to access the Matchbox API (port is configurable).</p> <pre><code>$ openssl s_client -connect matchbox.example.com:8081 \\\n -CAfile ~/.config/matchbox/ca.crt \\\n -cert ~/.config/matchbox/client.crt \\\n -key ~/.config/matchbox/client.key\n</code></pre>"},{"location":"fedora-coreos/bare-metal/#pxe-environment","title":"PXE Environment","text":"<p>Create an iPXE-enabled network boot environment. Configure PXE clients to chainload iPXE firmware compiled to support HTTPS downloads. Instruct iPXE clients to chainload from your Matchbox service's <code>/boot.ipxe</code> endpoint.</p> <p>For networks already supporting iPXE clients, you can add a <code>default.ipxe</code> config.</p> <pre><code># /var/www/html/ipxe/default.ipxe\nchain http://matchbox.foo:8080/boot.ipxe\n</code></pre> <p>For networks with Ubiquiti Routers, you can configure the router itself to chainload machines to iPXE and Matchbox.</p> <p>Read about the many ways to setup a compliant iPXE-enabled network. There is quite a bit of flexibility:</p> <ul> <li>Continue using existing DHCP, TFTP, or DNS services</li> <li>Configure specific machines, subnets, or architectures to chainload from Matchbox</li> <li>Place Matchbox behind a menu entry (timeout and default to Matchbox)</li> </ul> <p>TFTP chainloading to modern boot firmware, like iPXE, avoids issues with old NICs and allows faster transfer protocols like HTTP to be used.</p> <p>Warning</p> <p>Compile iPXE from source with support for HTTPS downloads. iPXE's pre-built firmware binaries do not enable this. Fedora CoreOS downloads are HTTPS-only.</p>"},{"location":"fedora-coreos/bare-metal/#terraform-setup","title":"Terraform Setup","text":"<p>Install Terraform v0.13.0+ on your system.</p> <pre><code>$ terraform version\nTerraform v1.0.0\n</code></pre> <p>Read concepts to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. 
<code>infra</code>).</p> <pre><code>cd infra/clusters\n</code></pre>"},{"location":"fedora-coreos/bare-metal/#provider","title":"Provider","text":"<p>Configure the Matchbox provider to use your Matchbox API endpoint and client certificate in a <code>providers.tf</code> file.</p> <pre><code>provider \"matchbox\" {\n endpoint = \"matchbox.example.com:8081\"\n client_cert = file(\"~/.config/matchbox/client.crt\")\n client_key = file(\"~/.config/matchbox/client.key\")\n ca = file(\"~/.config/matchbox/ca.crt\")\n}\n\nprovider \"ct\" {}\n\nterraform {\n required_providers {\n ct = {\n source = \"poseidon/ct\"\n version = \"0.13.0\"\n }\n matchbox = {\n source = \"poseidon/matchbox\"\n version = \"0.5.2\"\n }\n }\n}\n</code></pre>"},{"location":"fedora-coreos/bare-metal/#cluster","title":"Cluster","text":"<p>Define a Kubernetes cluster using the module <code>bare-metal/fedora-coreos/kubernetes</code>.</p> <pre><code>module \"mercury\" {\n source = \"git::https://github.com/poseidon/typhoon//bare-metal/fedora-coreos/kubernetes?ref=v1.29.3\"\n\n # bare-metal\n cluster_name = \"mercury\"\n matchbox_http_endpoint = \"http://matchbox.example.com\"\n os_stream = \"stable\"\n os_version = \"32.20201104.3.0\"\n\n # configuration\n k8s_domain_name = \"node1.example.com\"\n ssh_authorized_key = \"ssh-ed25519 AAAAB3Nz...\"\n\n # machines\n controllers = [{\n name = \"node1\"\n mac = \"52:54:00:a1:9c:ae\"\n domain = \"node1.example.com\"\n }]\n workers = [\n {\n name = \"node2\",\n mac = \"52:54:00:b2:2f:86\"\n domain = \"node2.example.com\"\n },\n {\n name = \"node3\",\n mac = \"52:54:00:c3:61:77\"\n domain = \"node3.example.com\"\n }\n ]\n}\n</code></pre> <p>Workers with similar features can be defined inline using the <code>workers</code> field as shown above. It's also possible to define discrete workers that attach to the cluster. Discrete workers are more advanced, but more verbose.</p> <pre><code>module \"mercury-node1\" {\n source = \"git::https://github.com/poseidon/typhoon//bare-metal/fedora-coreos/kubernetes/worker?ref=v1.29.3\"\n\n # bare-metal\n cluster_name = \"mercury\"\n matchbox_http_endpoint = \"http://matchbox.example.com\"\n os_stream = \"stable\"\n os_version = \"32.20201104.3.0\"\n\n # configuration\n name = \"node2\"\n mac = \"52:54:00:b2:2f:86\"\n domain = \"node2.example.com\"\n kubeconfig = module.mercury.kubeconfig\n ssh_authorized_key = \"ssh-ed25519 AAAAB3Nz...\"\n\n # optional\n snippets = []\n node_labels = []\n node_taints = []\n install_disk = \"/dev/vda\"\n cached_install = false\n}\n\n...\n</code></pre> <p>Reference the variables docs or the variables.tf source.</p>"},{"location":"fedora-coreos/bare-metal/#ssh-agent","title":"ssh-agent","text":"<p>Initial bootstrapping requires <code>bootstrap.service</code> be started on one controller node. Terraform uses <code>ssh-agent</code> to automate this step. Add your SSH private key to <code>ssh-agent</code>.</p> <pre><code>ssh-add ~/.ssh/id_ed25519\nssh-add -L\n</code></pre>"},{"location":"fedora-coreos/bare-metal/#apply","title":"Apply","text":"<p>Initialize the config directory if this is the first use with Terraform.</p> <pre><code>terraform init\n</code></pre> <p>Plan the resources to be created.</p> <pre><code>$ terraform plan\nPlan: 55 to add, 0 to change, 0 to destroy.\n</code></pre> <p>Apply the changes. Terraform will generate bootstrap assets and create Matchbox profiles (e.g. 
controller, worker) and matching rules via the Matchbox API.</p> <pre><code>$ terraform apply\nmodule.mercury.null_resource.copy-kubeconfig.0: Provisioning with 'file'...\nmodule.mercury.null_resource.copy-etcd-secrets.0: Provisioning with 'file'...\nmodule.mercury.null_resource.copy-kubeconfig.0: Still creating... (10s elapsed)\nmodule.mercury.null_resource.copy-etcd-secrets.0: Still creating... (10s elapsed)\n...\n</code></pre> <p>Apply will then loop until it can successfully copy credentials to each machine and start the one-time Kubernetes bootstrap service. Proceed to the next step while this loops.</p>"},{"location":"fedora-coreos/bare-metal/#power","title":"Power","text":"<p>Power on each machine with the boot device set to <code>pxe</code> for the next boot only.</p> <pre><code>ipmitool -H node1.example.com -U USER -P PASS chassis bootdev pxe\nipmitool -H node1.example.com -U USER -P PASS power on\n</code></pre> <p>Machines will network boot, install Fedora CoreOS to disk, reboot into the disk install, and provision themselves as controllers or workers.</p> <p>If this is the first test of your PXE-enabled network boot environment, watch the SOL console of a machine to spot any misconfigurations.</p>"},{"location":"fedora-coreos/bare-metal/#bootstrap","title":"Bootstrap","text":"<p>Wait for the <code>bootstrap</code> step to finish bootstrapping the Kubernetes control plane. This may take 5-15 minutes depending on your network.</p> <pre><code>module.mercury.null_resource.bootstrap: Still creating... (6m10s elapsed)\nmodule.mercury.null_resource.bootstrap: Still creating... (6m20s elapsed)\nmodule.mercury.null_resource.bootstrap: Still creating... (6m30s elapsed)\nmodule.mercury.null_resource.bootstrap: Still creating... (6m40s elapsed)\nmodule.mercury.null_resource.bootstrap: Creation complete (ID: 5441741360626669024)\n\nApply complete! Resources: 55 added, 0 changed, 0 destroyed.\n</code></pre> <p>To watch the bootstrap process in detail, SSH to the first controller and journal the logs.</p> <pre><code>$ ssh core@node1.example.com\n$ journalctl -f -u bootstrap\npodman[1750]: The connection to the server cluster.example.com:6443 was refused - did you specify the right host or port?\npodman[1750]: Waiting for static pod control plane\n...\npodman[1750]: serviceaccount/calico-node unchanged\nsystemd[1]: Started Kubernetes control plane.\n</code></pre>"},{"location":"fedora-coreos/bare-metal/#verify","title":"Verify","text":"<p>Install kubectl on your system. Obtain the generated cluster <code>kubeconfig</code> from module outputs (e.g. 
write to a local file).</p> <pre><code>resource \"local_file\" \"kubeconfig-mercury\" {\n content = module.mercury.kubeconfig-admin\n filename = \"/home/user/.kube/configs/mercury-config\"\n}\n</code></pre> <p>List nodes in the cluster.</p> <pre><code>$ export KUBECONFIG=/home/user/.kube/configs/mercury-config\n$ kubectl get nodes\nNAME STATUS ROLES AGE VERSION\nnode1.example.com Ready &lt;none&gt; 10m v1.29.3\nnode2.example.com Ready &lt;none&gt; 10m v1.29.3\nnode3.example.com Ready &lt;none&gt; 10m v1.29.3\n</code></pre> <p>List the pods.</p> <pre><code>$ kubectl get pods --all-namespaces\nNAMESPACE NAME READY STATUS RESTARTS AGE\nkube-system calico-node-6qp7f 2/2 Running 1 11m\nkube-system calico-node-gnjrm 2/2 Running 0 11m\nkube-system calico-node-llbgt 2/2 Running 0 11m\nkube-system coredns-1187388186-dj3pd 1/1 Running 0 11m\nkube-system coredns-1187388186-mx9rt 1/1 Running 0 11m\nkube-system kube-apiserver-node1.example.com 1/1 Running 0 11m\nkube-system kube-controller-manager-node1.example.com 1/1 Running 1 11m\nkube-system kube-proxy-50sd4 1/1 Running 0 11m\nkube-system kube-proxy-bczhp 1/1 Running 0 11m\nkube-system kube-proxy-mp2fw 1/1 Running 0 11m\nkube-system kube-scheduler-node1.example.com 1/1 Running 0 11m\n</code></pre>"},{"location":"fedora-coreos/bare-metal/#going-further","title":"Going Further","text":"<p>Learn about maintenance and addons.</p>"},{"location":"fedora-coreos/bare-metal/#variables","title":"Variables","text":"<p>Check the variables.tf source.</p>"},{"location":"fedora-coreos/bare-metal/#required","title":"Required","text":"Name Description Example cluster_name Unique cluster name \"mercury\" matchbox_http_endpoint Matchbox HTTP read-only endpoint \"http://matchbox.example.com:port\" os_stream Fedora CoreOS release stream \"stable\" os_version Fedora CoreOS version to PXE and install \"32.20201104.3.0\" k8s_domain_name FQDN resolving to the controller(s) nodes. Workers and kubectl will communicate with this endpoint \"myk8s.example.com\" ssh_authorized_key SSH public key for user 'core' \"ssh-ed25519 AAAAB3Nz...\" controllers List of controller machine detail objects (unique name, identifying MAC address, FQDN) <code>[{name=\"node1\", mac=\"52:54:00:a1:9c:ae\", domain=\"node1.example.com\"}]</code>"},{"location":"fedora-coreos/bare-metal/#optional","title":"Optional","text":"Name Description Default Example workers List of worker machine detail objects (unique name, identifying MAC address, FQDN) [] <code>[{name=\"node2\", mac=\"52:54:00:b2:2f:86\", domain=\"node2.example.com\"}, {name=\"node3\", mac=\"52:54:00:c3:61:77\", domain=\"node3.example.com\"}]</code> cached_install PXE boot and install from the Matchbox <code>/assets</code> cache. 
Admin MUST have downloaded Fedora CoreOS images into the cache false true install_disk Disk device where Fedora CoreOS should be installed \"sda\" (not \"/dev/sda\" like Container Linux) \"sdb\" networking Choice of networking provider \"cilium\" \"calico\" or \"cilium\" or \"flannel\" network_mtu CNI interface MTU (calico-only) 1480 - snippets Map from machine names to lists of Butane snippets {} examples network_ip_autodetection_method Method to detect host IPv4 address (calico-only) \"first-found\" \"can-reach=10.0.0.1\" pod_cidr CIDR IPv4 range to assign to Kubernetes pods \"10.2.0.0/16\" \"10.22.0.0/16\" service_cidr CIDR IPv4 range to assign to Kubernetes services \"10.3.0.0/16\" \"10.3.0.0/24\" kernel_args Additional kernel args to provide at PXE boot [] [\"kvm-intel.nested=1\"] worker_node_labels Map from worker name to list of initial node labels {} {\"node2\" = [\"role=special\"]} worker_node_taints Map from worker name to list of initial node taints {} {\"node2\" = [\"role=special:NoSchedule\"]}"},{"location":"fedora-coreos/digitalocean/","title":"DigitalOcean","text":"<p>In this tutorial, we'll create a Kubernetes v1.29.3 cluster on DigitalOcean with Fedora CoreOS.</p> <p>We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create controller droplets, worker droplets, DNS records, tags, and TLS assets.</p> <p>Controller hosts are provisioned to run an <code>etcd-member</code> peer and a <code>kubelet</code> service. Worker hosts run a <code>kubelet</code> service. Controller nodes run <code>kube-apiserver</code>, <code>kube-scheduler</code>, <code>kube-controller-manager</code>, and <code>coredns</code>, while <code>kube-proxy</code> and <code>calico</code> (or <code>flannel</code>) run on every node. A generated <code>kubeconfig</code> provides <code>kubectl</code> access to the cluster.</p>"},{"location":"fedora-coreos/digitalocean/#requirements","title":"Requirements","text":"<ul> <li>Digital Ocean Account and Token</li> <li>Digital Ocean Domain (registered Domain Name or delegated subdomain)</li> <li>Terraform v0.13.0+</li> </ul>"},{"location":"fedora-coreos/digitalocean/#terraform-setup","title":"Terraform Setup","text":"<p>Install Terraform v0.13.0+ on your system.</p> <pre><code>$ terraform version\nTerraform v1.0.0\n</code></pre> <p>Read concepts to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. <code>infra</code>).</p> <pre><code>cd infra/clusters\n</code></pre>"},{"location":"fedora-coreos/digitalocean/#provider","title":"Provider","text":"<p>Login to DigitalOcean. Or if you don't have one, create an account with our referral link to get free credits.</p> <p>Generate a Personal Access Token with read/write scope from the API tab. 
Write the token to a file that can be referenced in configs.</p> <pre><code>mkdir -p ~/.config/digital-ocean\necho \"TOKEN\" &gt; ~/.config/digital-ocean/token\n</code></pre> <p>Configure the DigitalOcean provider to use your token in a <code>providers.tf</code> file.</p> <pre><code>provider \"digitalocean\" {\n token = \"${chomp(file(\"~/.config/digital-ocean/token\"))}\"\n}\n\nprovider \"ct\" {}\n\nterraform {\n required_providers {\n ct = {\n source = \"poseidon/ct\"\n version = \"0.13.0\"\n }\n digitalocean = {\n source = \"digitalocean/digitalocean\"\n version = \"2.27.1\"\n }\n }\n}\n</code></pre>"},{"location":"fedora-coreos/digitalocean/#fedora-coreos-images","title":"Fedora CoreOS Images","text":"<p>Fedora CoreOS publishes images for DigitalOcean, but does not yet upload them. DigitalOcean allows custom images to be uploaded via URL or file.</p> <p>Import a Fedora CoreOS image via URL to the desired region(s).</p> <pre><code>data \"digitalocean_image\" \"fedora-coreos-31-20200323-3-2\" {\n name = \"fedora-coreos-31.20200323.3.2-digitalocean.x86_64.qcow2.gz\"\n}\n</code></pre> <p>Set the os_image in the next step.</p>"},{"location":"fedora-coreos/digitalocean/#cluster","title":"Cluster","text":"<p>Define a Kubernetes cluster using the module <code>digital-ocean/fedora-coreos/kubernetes</code>.</p> <pre><code>module \"nemo\" {\n source = \"git::https://github.com/poseidon/typhoon//digital-ocean/fedora-coreos/kubernetes?ref=v1.29.3\"\n\n # Digital Ocean\n cluster_name = \"nemo\"\n region = \"nyc3\"\n dns_zone = \"digital-ocean.example.com\"\n\n # configuration\n os_image = data.digitalocean_image.fedora-coreos-31-20200323-3-2.id\n ssh_fingerprints = [\"d7:9d:79:ae:56:32:73:79:95:88:e3:a2:ab:5d:45:e7\"]\n\n # optional\n worker_count = 2\n}\n</code></pre> <p>Reference the variables docs or the variables.tf source.</p>"},{"location":"fedora-coreos/digitalocean/#ssh-agent","title":"ssh-agent","text":"<p>Initial bootstrapping requires <code>bootstrap.service</code> be started on one controller node. Terraform uses <code>ssh-agent</code> to automate this step. Add your SSH private key to <code>ssh-agent</code>.</p> <pre><code>ssh-add ~/.ssh/id_ed25519\nssh-add -L\n</code></pre>"},{"location":"fedora-coreos/digitalocean/#apply","title":"Apply","text":"<p>Initialize the config directory if this is the first use with Terraform.</p> <pre><code>terraform init\n</code></pre> <p>Plan the resources to be created.</p> <pre><code>$ terraform plan\nPlan: 54 to add, 0 to change, 0 to destroy.\n</code></pre> <p>Apply the changes to create the cluster.</p> <pre><code>$ terraform apply\nmodule.nemo.null_resource.bootstrap: Still creating... (30s elapsed)\nmodule.nemo.null_resource.bootstrap: Provisioning with 'remote-exec'...\n...\nmodule.nemo.null_resource.bootstrap: Still creating... (6m20s elapsed)\nmodule.nemo.null_resource.bootstrap: Creation complete (ID: 7599298447329218468)\n\nApply complete! Resources: 42 added, 0 changed, 0 destroyed.\n</code></pre> <p>In 3-6 minutes, the Kubernetes cluster will be ready.</p>"},{"location":"fedora-coreos/digitalocean/#verify","title":"Verify","text":"<p>Install kubectl on your system. Obtain the generated cluster <code>kubeconfig</code> from module outputs (e.g. 
write to a local file).</p> <pre><code>resource \"local_file\" \"kubeconfig-nemo\" {\n content = module.nemo.kubeconfig-admin\n filename = \"/home/user/.kube/configs/nemo-config\"\n}\n</code></pre> <p>List nodes in the cluster.</p> <pre><code>$ export KUBECONFIG=/home/user/.kube/configs/nemo-config\n$ kubectl get nodes\nNAME STATUS ROLES AGE VERSION\n10.132.110.130 Ready &lt;none&gt; 10m v1.29.3\n10.132.115.81 Ready &lt;none&gt; 10m v1.29.3\n10.132.124.107 Ready &lt;none&gt; 10m v1.29.3\n</code></pre> <p>List the pods.</p> <pre><code>NAMESPACE NAME READY STATUS RESTARTS AGE\nkube-system coredns-1187388186-ld1j7 1/1 Running 0 11m\nkube-system coredns-1187388186-rdhf7 1/1 Running 0 11m\nkube-system calico-node-1m5bf 2/2 Running 0 11m\nkube-system calico-node-7jmr1 2/2 Running 0 11m\nkube-system calico-node-bknc8 2/2 Running 0 11m\nkube-system kube-apiserver-ip-10.132.115.81 1/1 Running 0 11m\nkube-system kube-controller-manager-ip-10.132.115.81 1/1 Running 0 11m\nkube-system kube-proxy-6kxjf 1/1 Running 0 11m\nkube-system kube-proxy-fh3td 1/1 Running 0 11m\nkube-system kube-proxy-k35rc 1/1 Running 0 11m\nkube-system kube-scheduler-ip-10.132.115.81 1/1 Running 0 11m\n</code></pre>"},{"location":"fedora-coreos/digitalocean/#going-further","title":"Going Further","text":"<p>Learn about maintenance and addons.</p>"},{"location":"fedora-coreos/digitalocean/#variables","title":"Variables","text":"<p>Check the variables.tf source.</p>"},{"location":"fedora-coreos/digitalocean/#required","title":"Required","text":"Name Description Example cluster_name Unique cluster name (prepended to dns_zone) \"nemo\" region Digital Ocean region \"nyc1\", \"sfo2\", \"fra1\", tor1\" dns_zone Digital Ocean domain (i.e. DNS zone) \"do.example.com\" os_image Fedora CoreOS image for instances \"custom-image-id\" ssh_fingerprints SSH public key fingerprints [\"d7:9d...\"]"},{"location":"fedora-coreos/digitalocean/#dns-zone","title":"DNS Zone","text":"<p>Clusters create DNS A records <code>${cluster_name}.${dns_zone}</code> to resolve to controller droplets (round robin). This FQDN is used by workers and <code>kubectl</code> to access the apiserver(s). In this example, the cluster's apiserver would be accessible at <code>nemo.do.example.com</code>.</p> <p>You'll need a registered domain name or delegated subdomain in DigitalOcean Domains (i.e. DNS zones). You can set this up once and create many clusters with unique names.</p> <pre><code># Declare a DigitalOcean record to also create a zone file\nresource \"digitalocean_domain\" \"zone-for-clusters\" {\n name = \"do.example.com\"\n ip_address = \"8.8.8.8\"\n}\n</code></pre> <p>If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on DigitalOcean (e.g. do.mydomain.com) and update nameservers.</p>"},{"location":"fedora-coreos/digitalocean/#ssh-fingerprints","title":"SSH Fingerprints","text":"<p>DigitalOcean droplets are created with your SSH public key \"fingerprint\" (i.e. MD5 hash) to allow access. If your SSH public key is at <code>~/.ssh/id_ed25519.pub</code>, find the fingerprint with,</p> <pre><code>ssh-keygen -E md5 -lf ~/.ssh/id_ed25519.pub | awk '{print $2}'\nMD5:d7:9d:79:ae:56:32:73:79:95:88:e3:a2:ab:5d:45:e7\n</code></pre> <p>If you use <code>ssh-agent</code> (e.g. 
Yubikey for SSH), find the fingerprint with,</p> <pre><code>ssh-add -l -E md5\n256 MD5:20:d0:eb:ad:50:b0:09:6d:4b:ba:ad:7c:9c:c1:39:24 foo@xample.com (ED25519)\n</code></pre> <p>Digital Ocean requires the SSH public key be uploaded to your account, so you may also find the fingerprint under Settings -&gt; Security. Finally, if you don't have an SSH key, create one now.</p>"},{"location":"fedora-coreos/digitalocean/#optional","title":"Optional","text":"Name Description Default Example controller_count Number of controllers (i.e. masters) 1 1 worker_count Number of workers 1 3 controller_type Droplet type for controllers \"s-2vcpu-2gb\" s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb, ... worker_type Droplet type for workers \"s-1vcpu-2gb\" s-1vcpu-2gb, s-2vcpu-2gb, ... controller_snippets Controller Butane snippets [] example worker_snippets Worker Butane snippets [] example networking Choice of networking provider \"cilium\" \"calico\" or \"cilium\" or \"flannel\" pod_cidr CIDR IPv4 range to assign to Kubernetes pods \"10.2.0.0/16\" \"10.22.0.0/16\" service_cidr CIDR IPv4 range to assign to Kubernetes services \"10.3.0.0/16\" \"10.3.0.0/24\" <p>Check the list of valid droplet types or use <code>doctl compute size list</code>.</p> <p>Warning</p> <p>Do not choose a <code>controller_type</code> smaller than 2GB. Smaller droplets are not sufficient for running a controller and bootstrapping will fail.</p>"},{"location":"fedora-coreos/google-cloud/","title":"Google Cloud","text":"<p>In this tutorial, we'll create a Kubernetes v1.29.3 cluster on Google Compute Engine with Fedora CoreOS.</p> <p>We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a network, firewall rules, health checks, controller instances, worker managed instance group, load balancers, and TLS assets.</p> <p>Controller hosts are provisioned to run an <code>etcd-member</code> peer and a <code>kubelet</code> service. Worker hosts run a <code>kubelet</code> service. Controller nodes run <code>kube-apiserver</code>, <code>kube-scheduler</code>, <code>kube-controller-manager</code>, and <code>coredns</code>, while <code>kube-proxy</code> and <code>calico</code> (or <code>flannel</code>) run on every node. A generated <code>kubeconfig</code> provides <code>kubectl</code> access to the cluster.</p>"},{"location":"fedora-coreos/google-cloud/#requirements","title":"Requirements","text":"<ul> <li>Google Cloud Account and Service Account</li> <li>Google Cloud DNS Zone (registered Domain Name or delegated subdomain)</li> <li>Terraform v0.13.0+</li> </ul>"},{"location":"fedora-coreos/google-cloud/#terraform-setup","title":"Terraform Setup","text":"<p>Install Terraform v0.13.0+ on your system.</p> <pre><code>$ terraform version\nTerraform v1.0.0\n</code></pre> <p>Read concepts to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. <code>infra</code>).</p> <pre><code>cd infra/clusters\n</code></pre>"},{"location":"fedora-coreos/google-cloud/#provider","title":"Provider","text":"<p>Login to your Google Console API Manager and select a project, or signup if you don't have an account.</p> <p>Select \"Credentials\" and create a service account key. 
Choose the \"Compute Engine Admin\" and \"DNS Administrator\" roles and save the JSON private key to a file that can be referenced in configs.</p> <pre><code>mv ~/Downloads/project-id-43048204.json ~/.config/google-cloud/terraform.json\n</code></pre> <p>Configure the Google Cloud provider to use your service account key, project-id, and region in a <code>providers.tf</code> file.</p> <pre><code>provider \"google\" {\n project = \"project-id\"\n region = \"us-central1\"\n credentials = file(\"~/.config/google-cloud/terraform.json\")\n}\n\nprovider \"ct\" {}\n\nterraform {\n required_providers {\n ct = {\n source = \"poseidon/ct\"\n version = \"0.13.0\"\n }\n google = {\n source = \"hashicorp/google\"\n version = \"4.59.0\"\n }\n }\n}\n</code></pre> <p>Additional configuration options are described in the <code>google</code> provider docs.</p> <p>Tip</p> <p>Regions are listed in docs or with <code>gcloud compute regions list</code>. A project may contain multiple clusters across different regions.</p>"},{"location":"fedora-coreos/google-cloud/#cluster","title":"Cluster","text":"<p>Define a Kubernetes cluster using the module <code>google-cloud/fedora-coreos/kubernetes</code>.</p> <pre><code>module \"yavin\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.29.3\"\n\n # Google Cloud\n cluster_name = \"yavin\"\n region = \"us-central1\"\n dns_zone = \"example.com\"\n dns_zone_name = \"example-zone\"\n\n # configuration\n ssh_authorized_key = \"ssh-ed25519 AAAAB3Nz...\"\n\n # optional\n worker_count = 2\n}\n</code></pre> <p>Reference the variables docs or the variables.tf source.</p>"},{"location":"fedora-coreos/google-cloud/#ssh-agent","title":"ssh-agent","text":"<p>Initial bootstrapping requires <code>bootstrap.service</code> be started on one controller node. Terraform uses <code>ssh-agent</code> to automate this step. Add your SSH private key to <code>ssh-agent</code>.</p> <pre><code>ssh-add ~/.ssh/id_ed25519\nssh-add -L\n</code></pre>"},{"location":"fedora-coreos/google-cloud/#apply","title":"Apply","text":"<p>Initialize the config directory if this is the first use with Terraform.</p> <pre><code>terraform init\n</code></pre> <p>Plan the resources to be created.</p> <pre><code>$ terraform plan\nPlan: 78 to add, 0 to change, 0 to destroy.\n</code></pre> <p>Apply the changes to create the cluster.</p> <pre><code>$ terraform apply\nmodule.yavin.null_resource.bootstrap: Still creating... (10s elapsed)\n...\nmodule.yavin.null_resource.bootstrap: Still creating... (5m30s elapsed)\nmodule.yavin.null_resource.bootstrap: Still creating... (5m40s elapsed)\nmodule.yavin.null_resource.bootstrap: Creation complete (ID: 5768638456220583358)\n\nApply complete! Resources: 78 added, 0 changed, 0 destroyed.\n</code></pre> <p>In 4-8 minutes, the Kubernetes cluster will be ready.</p>"},{"location":"fedora-coreos/google-cloud/#verify","title":"Verify","text":"<p>Install kubectl on your system. Obtain the generated cluster <code>kubeconfig</code> from module outputs (e.g. 
write to a local file).</p> <pre><code>resource \"local_file\" \"kubeconfig-yavin\" {\n content = module.yavin.kubeconfig-admin\n filename = \"/home/user/.kube/configs/yavin-config\"\n}\n</code></pre> <p>List nodes in the cluster.</p> <pre><code>$ export KUBECONFIG=/home/user/.kube/configs/yavin-config\n$ kubectl get nodes\nNAME ROLES STATUS AGE VERSION\nyavin-controller-0.c.example-com.internal &lt;none&gt; Ready 6m v1.29.3\nyavin-worker-jrbf.c.example-com.internal &lt;none&gt; Ready 5m v1.29.3\nyavin-worker-mzdm.c.example-com.internal &lt;none&gt; Ready 5m v1.29.3\n</code></pre> <p>List the pods.</p> <pre><code>$ kubectl get pods --all-namespaces\nNAMESPACE NAME READY STATUS RESTARTS AGE\nkube-system calico-node-1cs8z 2/2 Running 0 6m\nkube-system calico-node-d1l5b 2/2 Running 0 6m\nkube-system calico-node-sp9ps 2/2 Running 0 6m\nkube-system coredns-1187388186-dkh3o 1/1 Running 0 6m\nkube-system coredns-1187388186-zj5dl 1/1 Running 0 6m\nkube-system kube-apiserver-controller-0 1/1 Running 0 6m\nkube-system kube-controller-manager-controller-0 1/1 Running 0 6m\nkube-system kube-proxy-117v6 1/1 Running 0 6m\nkube-system kube-proxy-9886n 1/1 Running 0 6m\nkube-system kube-proxy-njn47 1/1 Running 0 6m\nkube-system kube-scheduler-controller-0 1/1 Running 0 6m\n</code></pre>"},{"location":"fedora-coreos/google-cloud/#going-further","title":"Going Further","text":"<p>Learn about maintenance and addons.</p>"},{"location":"fedora-coreos/google-cloud/#variables","title":"Variables","text":"<p>Check the variables.tf source.</p>"},{"location":"fedora-coreos/google-cloud/#required","title":"Required","text":"Name Description Example cluster_name Unique cluster name (prepended to dns_zone) \"yavin\" region Google Cloud region \"us-central1\" dns_zone Google Cloud DNS zone \"google-cloud.example.com\" dns_zone_name Google Cloud DNS zone name \"example-zone\" ssh_authorized_key SSH public key for user 'core' \"ssh-ed25519 AAAAB3NZ...\" <p>Check the list of valid regions and list Fedora CoreOS images with <code>gcloud compute images list | grep fedora-coreos</code>.</p>"},{"location":"fedora-coreos/google-cloud/#dns-zone","title":"DNS Zone","text":"<p>Clusters create a DNS A record <code>${cluster_name}.${dns_zone}</code> to resolve a TCP proxy load balancer backed by controller instances. This FQDN is used by workers and <code>kubectl</code> to access the apiserver(s). In this example, the cluster's apiserver would be accessible at <code>yavin.google-cloud.example.com</code>.</p> <p>You'll need a registered domain name or delegated subdomain on Google Cloud DNS. You can set this up once and create many clusters with unique names.</p> <pre><code>resource \"google_dns_managed_zone\" \"zone-for-clusters\" {\n dns_name = \"google-cloud.example.com.\"\n name = \"example-zone\"\n description = \"Production DNS zone\"\n}\n</code></pre> <p>If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on Google Cloud (e.g. google-cloud.mydomain.com) and update nameservers.</p>"},{"location":"fedora-coreos/google-cloud/#optional","title":"Optional","text":"Name Description Default Example controller_count Number of controllers (i.e. 
masters) 1 3 worker_count Number of workers 1 3 controller_type Machine type for controllers \"n1-standard-1\" See below worker_type Machine type for workers \"n1-standard-1\" See below os_stream Fedora CoreOS stream for compute instances \"stable\" \"stable\", \"testing\", \"next\" disk_size Size of the disk in GB 30 100 worker_preemptible If enabled, Compute Engine will terminate workers randomly within 24 hours false true controller_snippets Controller Butane snippets [] examples worker_snippets Worker Butane snippets [] examples networking Choice of networking provider \"cilium\" \"calico\" or \"cilium\" or \"flannel\" pod_cidr CIDR IPv4 range to assign to Kubernetes pods \"10.2.0.0/16\" \"10.22.0.0/16\" service_cidr CIDR IPv4 range to assign to Kubernetes services \"10.3.0.0/16\" \"10.3.0.0/24\" worker_node_labels List of initial worker node labels [] [\"worker-pool=default\"] <p>Check the list of valid machine types.</p>"},{"location":"fedora-coreos/google-cloud/#preemption","title":"Preemption","text":"<p>Add <code>worker_preemptible = \"true\"</code> to allow worker nodes to be preempted at random, but pay significantly less. Clusters tolerate stopping instances fairly well (reschedules pods, but cannot drain) and preemption provides a nice reward for running fault-tolerant cluster systems.`</p>"},{"location":"flatcar-linux/aws/","title":"AWS","text":"<p>In this tutorial, we'll create a Kubernetes v1.29.3 cluster on AWS with Flatcar Linux.</p> <p>We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a VPC, gateway, subnets, security groups, controller instances, worker auto-scaling group, network load balancer, and TLS assets.</p> <p>Controller hosts are provisioned to run an <code>etcd-member</code> peer and a <code>kubelet</code> service. Worker hosts run a <code>kubelet</code> service. Controller nodes run <code>kube-apiserver</code>, <code>kube-scheduler</code>, <code>kube-controller-manager</code>, and <code>coredns</code>, while <code>kube-proxy</code> and <code>calico</code> (or <code>flannel</code>) run on every node. A generated <code>kubeconfig</code> provides <code>kubectl</code> access to the cluster.</p>"},{"location":"flatcar-linux/aws/#requirements","title":"Requirements","text":"<ul> <li>AWS Account and IAM credentials</li> <li>AWS Route53 DNS Zone (registered Domain Name or delegated subdomain)</li> <li>Terraform v0.13.0+</li> </ul>"},{"location":"flatcar-linux/aws/#terraform-setup","title":"Terraform Setup","text":"<p>Install Terraform v0.13.0+ on your system.</p> <pre><code>$ terraform version\nTerraform v1.0.0\n</code></pre> <p>Read concepts to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. <code>infra</code>).</p> <pre><code>cd infra/clusters\n</code></pre>"},{"location":"flatcar-linux/aws/#provider","title":"Provider","text":"<p>Login to your AWS IAM dashboard and find your IAM user. Select \"Security Credentials\" and create an access key. 
Save the id and secret to a file that can be referenced in configs.</p> <pre><code>[default]\naws_access_key_id = xxx\naws_secret_access_key = yyy\n</code></pre> <p>Configure the AWS provider to use your access key credentials in a <code>providers.tf</code> file.</p> <pre><code>provider \"aws\" {\n region = \"eu-central-1\"\n shared_credentials_file = \"/home/user/.config/aws/credentials\"\n}\n\nprovider \"ct\" {}\n\nterraform {\n required_providers {\n ct = {\n source = \"poseidon/ct\"\n version = \"0.11.0\"\n }\n aws = {\n source = \"hashicorp/aws\"\n version = \"4.61.0\"\n }\n }\n}\n</code></pre> <p>Additional configuration options are described in the <code>aws</code> provider docs.</p> <p>Tip</p> <p>Regions are listed in docs or with <code>aws ec2 describe-regions</code>.</p>"},{"location":"flatcar-linux/aws/#cluster","title":"Cluster","text":"<p>Define a Kubernetes cluster using the module <code>aws/flatcar-linux/kubernetes</code>.</p> <pre><code>module \"tempest\" {\n source = \"git::https://github.com/poseidon/typhoon//aws/flatcar-linux/kubernetes?ref=v1.29.3\"\n\n # AWS\n cluster_name = \"tempest\"\n dns_zone = \"aws.example.com\"\n dns_zone_id = \"Z3PAABBCFAKEC0\"\n\n # configuration\n ssh_authorized_key = \"ssh-rsa AAAAB3Nz...\"\n\n # optional\n worker_count = 2\n worker_type = \"t3.small\"\n}\n</code></pre> <p>Reference the variables docs or the variables.tf source.</p>"},{"location":"flatcar-linux/aws/#ssh-agent","title":"ssh-agent","text":"<p>Initial bootstrapping requires <code>bootstrap.service</code> be started on one controller node. Terraform uses <code>ssh-agent</code> to automate this step. Add your SSH private key to <code>ssh-agent</code>.</p> <pre><code>ssh-add ~/.ssh/id_rsa\nssh-add -L\n</code></pre>"},{"location":"flatcar-linux/aws/#apply","title":"Apply","text":"<p>Initialize the config directory if this is the first use with Terraform.</p> <pre><code>terraform init\n</code></pre> <p>Plan the resources to be created.</p> <pre><code>$ terraform plan\nPlan: 109 to add, 0 to change, 0 to destroy.\n</code></pre> <p>Apply the changes to create the cluster.</p> <pre><code>$ terraform apply\n...\nmodule.tempest.null_resource.bootstrap: Still creating... (4m50s elapsed)\nmodule.tempest.null_resource.bootstrap: Still creating... (5m0s elapsed)\nmodule.tempest.null_resource.bootstrap: Creation complete after 11m8s (ID: 3961816482286168143)\n\nApply complete! Resources: 109 added, 0 changed, 0 destroyed.\n</code></pre> <p>In 4-8 minutes, the Kubernetes cluster will be ready.</p>"},{"location":"flatcar-linux/aws/#verify","title":"Verify","text":"<p>Install kubectl on your system. Obtain the generated cluster <code>kubeconfig</code> from module outputs (e.g. 
write to a local file).</p> <pre><code>resource \"local_file\" \"kubeconfig-tempest\" {\n content = module.tempest.kubeconfig-admin\n filename = \"/home/user/.kube/configs/tempest-config\"\n}\n</code></pre> <p>List nodes in the cluster.</p> <pre><code>$ export KUBECONFIG=/home/user/.kube/configs/tempest-config\n$ kubectl get nodes\nNAME STATUS ROLES AGE VERSION\nip-10-0-3-155 Ready &lt;none&gt; 10m v1.29.3\nip-10-0-26-65 Ready &lt;none&gt; 10m v1.29.3\nip-10-0-41-21 Ready &lt;none&gt; 10m v1.29.3\n</code></pre> <p>List the pods.</p> <pre><code>$ kubectl get pods --all-namespaces\nNAMESPACE NAME READY STATUS RESTARTS AGE\nkube-system calico-node-1m5bf 2/2 Running 0 34m\nkube-system calico-node-7jmr1 2/2 Running 0 34m\nkube-system calico-node-bknc8 2/2 Running 0 34m\nkube-system coredns-1187388186-wx1lg 1/1 Running 0 34m\nkube-system coredns-1187388186-qjnvp 1/1 Running 0 34m\nkube-system kube-apiserver-ip-10-0-3-155 1/1 Running 0 34m\nkube-system kube-controller-manager-ip-10-0-3-155 1/1 Running 0 34m\nkube-system kube-proxy-14wxv 1/1 Running 0 34m\nkube-system kube-proxy-9vxh2 1/1 Running 0 34m\nkube-system kube-proxy-sbbsh 1/1 Running 0 34m\nkube-system kube-scheduler-ip-10-0-3-155 1/1 Running 1 34m\n</code></pre>"},{"location":"flatcar-linux/aws/#going-further","title":"Going Further","text":"<p>Learn about maintenance and addons.</p>"},{"location":"flatcar-linux/aws/#variables","title":"Variables","text":"<p>Check the variables.tf source.</p>"},{"location":"flatcar-linux/aws/#required","title":"Required","text":"Name Description Example cluster_name Unique cluster name (prepended to dns_zone) \"tempest\" dns_zone AWS Route53 DNS zone \"aws.example.com\" dns_zone_id AWS Route53 DNS zone id \"Z3PAABBCFAKEC0\" ssh_authorized_key SSH public key for user 'core' \"ssh-rsa AAAAB3NZ...\""},{"location":"flatcar-linux/aws/#dns-zone","title":"DNS Zone","text":"<p>Clusters create a DNS A record <code>${cluster_name}.${dns_zone}</code> to resolve a network load balancer backed by controller instances. This FQDN is used by workers and <code>kubectl</code> to access the apiserver(s). In this example, the cluster's apiserver would be accessible at <code>tempest.aws.example.com</code>.</p> <p>You'll need a registered domain name or delegated subdomain on AWS Route53. You can set this up once and create many clusters with unique names.</p> <pre><code>resource \"aws_route53_zone\" \"zone-for-clusters\" {\n name = \"aws.example.com.\"\n}\n</code></pre> <p>Reference the DNS zone id with <code>aws_route53_zone.zone-for-clusters.zone_id</code>.</p> <p>If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on Route53 (e.g. aws.mydomain.com) and update nameservers.</p>"},{"location":"flatcar-linux/aws/#optional","title":"Optional","text":"Name Description Default Example controller_count Number of controllers (i.e. masters) 1 1 worker_count Number of workers 1 3 controller_type EC2 instance type for controllers \"t3.small\" See below worker_type EC2 instance type for workers \"t3.small\" See below os_image AMI channel for a Container Linux derivative \"flatcar-stable\" flatcar-stable, flatcar-beta, flatcar-alpha disk_size Size of the EBS volume in GB 30 100 disk_type Type of the EBS volume \"gp3\" standard, gp2, gp3, io1 disk_iops IOPS of the EBS volume 0 (i.e. 
auto) 400 worker_target_groups Target group ARNs to which worker instances should be added [] [aws_lb_target_group.app.id] worker_price Spot price in USD for worker instances or 0 to use on-demand instances 0/null 0.10 controller_snippets Controller Container Linux Config snippets [] example worker_snippets Worker Container Linux Config snippets [] example networking Choice of networking provider \"cilium\" \"calico\" or \"cilium\" or \"flannel\" network_mtu CNI interface MTU (calico only) 1480 8981 host_cidr CIDR IPv4 range to assign to EC2 instances \"10.0.0.0/16\" \"10.1.0.0/16\" pod_cidr CIDR IPv4 range to assign to Kubernetes pods \"10.2.0.0/16\" \"10.22.0.0/16\" service_cidr CIDR IPv4 range to assign to Kubernetes services \"10.3.0.0/16\" \"10.3.0.0/24\" worker_node_labels List of initial worker node labels [] [\"worker-pool=default\"] <p>Check the list of valid instance types.</p> <p>Warning</p> <p>Do not choose a <code>controller_type</code> smaller than <code>t2.small</code>. Smaller instances are not sufficient for running a controller.</p> <p>MTU</p> <p>If your EC2 instance type supports Jumbo frames (most do), we recommend you change the <code>network_mtu</code> to 8981! You will get better pod-to-pod bandwidth.</p>"},{"location":"flatcar-linux/aws/#spot","title":"Spot","text":"<p>Add <code>worker_price = \"0.10\"</code> to use spot instance workers (instead of \"on-demand\") and set a maximum spot price in USD. Clusters can tolerate spot market interuptions fairly well (reschedules pods, but cannot drain) to save money, with the tradeoff that requests for workers may go unfulfilled.</p>"},{"location":"flatcar-linux/azure/","title":"Azure","text":"<p>In this tutorial, we'll create a Kubernetes v1.29.3 cluster on Azure with Flatcar Linux.</p> <p>We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a resource group, virtual network, subnets, security groups, controller availability set, worker scale set, load balancer, and TLS assets.</p> <p>Controller hosts are provisioned to run an <code>etcd-member</code> peer and a <code>kubelet</code> service. Worker hosts run a <code>kubelet</code> service. Controller nodes run <code>kube-apiserver</code>, <code>kube-scheduler</code>, <code>kube-controller-manager</code>, and <code>coredns</code>, while <code>kube-proxy</code> and <code>calico</code> (or <code>flannel</code>) run on every node. A generated <code>kubeconfig</code> provides <code>kubectl</code> access to the cluster.</p>"},{"location":"flatcar-linux/azure/#requirements","title":"Requirements","text":"<ul> <li>Azure account</li> <li>Azure DNS Zone (registered Domain Name or delegated subdomain)</li> <li>Terraform v0.13.0+</li> </ul>"},{"location":"flatcar-linux/azure/#terraform-setup","title":"Terraform Setup","text":"<p>Install Terraform v0.13.0+ on your system.</p> <pre><code>$ terraform version\nTerraform v1.0.0\n</code></pre> <p>Read concepts to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. 
<code>infra</code>).</p> <pre><code>cd infra/clusters\n</code></pre>"},{"location":"flatcar-linux/azure/#provider","title":"Provider","text":"<p>Install the Azure <code>az</code> command line tool to authenticate with Azure.</p> <pre><code>az login\n</code></pre> <p>Configure the Azure provider in a <code>providers.tf</code> file.</p> <pre><code>provider \"azurerm\" {\n features {}\n}\n\nprovider \"ct\" {}\n\nterraform {\n required_providers {\n ct = {\n source = \"poseidon/ct\"\n version = \"0.11.0\"\n }\n azurerm = {\n source = \"hashicorp/azurerm\"\n version = \"3.50.0\"\n }\n }\n}\n</code></pre> <p>Additional configuration options are described in the <code>azurerm</code> provider docs.</p>"},{"location":"flatcar-linux/azure/#flatcar-linux-images","title":"Flatcar Linux Images","text":"<p>Flatcar Linux publishes images to the Azure Marketplace and requires accepting terms.</p> <pre><code>az vm image terms accept --publish kinvolk --offer flatcar-container-linux-free --plan stable\naz vm image terms accept --publish kinvolk --offer flatcar-container-linux-free --plan stable-gen2\n</code></pre>"},{"location":"flatcar-linux/azure/#cluster","title":"Cluster","text":"<p>Define a Kubernetes cluster using the module <code>azure/flatcar-linux/kubernetes</code>.</p> <pre><code>module \"ramius\" {\n source = \"git::https://github.com/poseidon/typhoon//azure/flatcar-linux/kubernetes?ref=v1.29.3\"\n\n # Azure\n cluster_name = \"ramius\"\n region = \"centralus\"\n dns_zone = \"azure.example.com\"\n dns_zone_group = \"example-group\"\n\n # configuration\n ssh_authorized_key = \"ssh-rsa AAAAB3Nz...\"\n\n # optional\n worker_count = 2\n host_cidr = \"10.0.0.0/20\"\n}\n</code></pre> <p>Reference the variables docs or the variables.tf source.</p>"},{"location":"flatcar-linux/azure/#ssh-agent","title":"ssh-agent","text":"<p>Initial bootstrapping requires <code>bootstrap.service</code> be started on one controller node. Terraform uses <code>ssh-agent</code> to automate this step. Add your SSH private key to <code>ssh-agent</code>.</p> <pre><code>ssh-add ~/.ssh/id_rsa\nssh-add -L\n</code></pre>"},{"location":"flatcar-linux/azure/#apply","title":"Apply","text":"<p>Initialize the config directory if this is the first use with Terraform.</p> <pre><code>terraform init\n</code></pre> <p>Plan the resources to be created.</p> <pre><code>$ terraform plan\nPlan: 86 to add, 0 to change, 0 to destroy.\n</code></pre> <p>Apply the changes to create the cluster.</p> <pre><code>$ terraform apply\n...\nmodule.ramius.null_resource.bootstrap: Still creating... (6m50s elapsed)\nmodule.ramius.null_resource.bootstrap: Still creating... (7m0s elapsed)\nmodule.ramius.null_resource.bootstrap: Creation complete after 7m8s (ID: 3961816482286168143)\n\nApply complete! Resources: 69 added, 0 changed, 0 destroyed.\n</code></pre> <p>In 4-8 minutes, the Kubernetes cluster will be ready.</p>"},{"location":"flatcar-linux/azure/#verify","title":"Verify","text":"<p>Install kubectl on your system. Obtain the generated cluster <code>kubeconfig</code> from module outputs (e.g. 
write to a local file).</p> <pre><code>resource \"local_file\" \"kubeconfig-ramius\" {\n content = module.ramius.kubeconfig-admin\n filename = \"/home/user/.kube/configs/ramius-config\"\n}\n</code></pre> <p>List nodes in the cluster.</p> <pre><code>$ export KUBECONFIG=/home/user/.kube/configs/ramius-config\n$ kubectl get nodes\nNAME STATUS ROLES AGE VERSION\nramius-controller-0 Ready &lt;none&gt; 24m v1.29.3\nramius-worker-000001 Ready &lt;none&gt; 25m v1.29.3\nramius-worker-000002 Ready &lt;none&gt; 24m v1.29.3\n</code></pre> <p>List the pods.</p> <pre><code>$ kubectl get pods --all-namespaces\nNAMESPACE NAME READY STATUS RESTARTS AGE\nkube-system coredns-7c6fbb4f4b-b6qzx 1/1 Running 0 26m\nkube-system coredns-7c6fbb4f4b-j2k3d 1/1 Running 0 26m\nkube-system calico-node-1m5bf 2/2 Running 0 26m\nkube-system calico-node-7jmr1 2/2 Running 0 26m\nkube-system calico-node-bknc8 2/2 Running 0 26m\nkube-system kube-apiserver-ramius-controller-0 1/1 Running 0 26m\nkube-system kube-controller-manager-ramius-controller-0 1/1 Running 0 26m\nkube-system kube-proxy-j4vpq 1/1 Running 0 26m\nkube-system kube-proxy-jxr5d 1/1 Running 0 26m\nkube-system kube-proxy-lbdw5 1/1 Running 0 26m\nkube-system kube-scheduler-ramius-controller-0 1/1 Running 0 26m\n</code></pre>"},{"location":"flatcar-linux/azure/#going-further","title":"Going Further","text":"<p>Learn about maintenance and addons.</p>"},{"location":"flatcar-linux/azure/#variables","title":"Variables","text":"<p>Check the variables.tf source.</p>"},{"location":"flatcar-linux/azure/#required","title":"Required","text":"Name Description Example cluster_name Unique cluster name (prepended to dns_zone) \"ramius\" region Azure region \"centralus\" dns_zone Azure DNS zone \"azure.example.com\" dns_zone_group Resource group where the Azure DNS zone resides \"global\" ssh_authorized_key SSH public key for user 'core' \"ssh-rsa AAAAB3NZ...\" <p>Tip</p> <p>Regions are shown in docs or with <code>az account list-locations --output table</code>.</p>"},{"location":"flatcar-linux/azure/#dns-zone","title":"DNS Zone","text":"<p>Clusters create a DNS A record <code>${cluster_name}.${dns_zone}</code> to resolve a load balancer backed by controller instances. This FQDN is used by workers and <code>kubectl</code> to access the apiserver(s). In this example, the cluster's apiserver would be accessible at <code>ramius.azure.example.com</code>.</p> <p>You'll need a registered domain name or delegated subdomain on Azure DNS. You can set this up once and create many clusters with unique names.</p> <pre><code># Azure resource group for DNS zone\nresource \"azurerm_resource_group\" \"global\" {\n name = \"global\"\n location = \"centralus\"\n}\n\n# DNS zone for clusters\nresource \"azurerm_dns_zone\" \"clusters\" {\n resource_group_name = azurerm_resource_group.global.name\n\n name = \"azure.example.com\"\n}\n</code></pre> <p>Reference the DNS zone with <code>azurerm_dns_zone.clusters.name</code> and its resource group with <code>azurerm_resource_group.global.name</code>.</p> <p>If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on Azure DNS (e.g. azure.mydomain.com) and update nameservers.</p>"},{"location":"flatcar-linux/azure/#optional","title":"Optional","text":"Name Description Default Example controller_count Number of controllers (i.e. 
masters) 1 1 worker_count Number of workers 1 3 controller_type Machine type for controllers \"Standard_B2s\" See below worker_type Machine type for workers \"Standard_D2as_v5\" See below os_image Channel for a Container Linux derivative \"flatcar-stable\" flatcar-stable, flatcar-beta, flatcar-alpha disk_size Size of the disk in GB 30 100 worker_priority Set priority to Spot to use reduced cost surplus capacity, with the tradeoff that instances can be deallocated at any time Regular Spot controller_snippets Controller Container Linux Config snippets [] example worker_snippets Worker Container Linux Config snippets [] example networking Choice of networking provider \"cilium\" \"calico\" or \"cilium\" or \"flannel\" host_cidr CIDR IPv4 range to assign to instances \"10.0.0.0/16\" \"10.0.0.0/20\" pod_cidr CIDR IPv4 range to assign to Kubernetes pods \"10.2.0.0/16\" \"10.22.0.0/16\" service_cidr CIDR IPv4 range to assign to Kubernetes services \"10.3.0.0/16\" \"10.3.0.0/24\" worker_node_labels List of initial worker node labels [] [\"worker-pool=default\"] <p>Check the list of valid machine types and their specs. Use <code>az vm list-skus</code> to get the identifier.</p> <p>Warning</p> <p>Unlike AWS and GCP, Azure requires its virtual networks to have non-overlapping IPv4 CIDRs (yeah, go figure). Instead of each cluster just using <code>10.0.0.0/16</code> for instances, each Azure cluster's <code>host_cidr</code> must be non-overlapping (e.g. 10.0.0.0/20 for the 1<sup>st</sup> cluster, 10.0.16.0/20 for the 2<sup>nd</sup> cluster, etc).</p> <p>Warning</p> <p>Do not choose a <code>controller_type</code> smaller than <code>Standard_B2s</code>. Smaller instances are not sufficient for running a controller.</p>"},{"location":"flatcar-linux/azure/#spot-priority","title":"Spot Priority","text":"<p>Add <code>worker_priority=Spot</code> to use Spot Priority workers that run on Azure's surplus capacity at lower cost, but with the tradeoff that they can be deallocated at random. Spot priority VMs are Azure's analog to AWS spot instances or GCP premptible instances.</p>"},{"location":"flatcar-linux/bare-metal/","title":"Bare-Metal","text":"<p>In this tutorial, we'll network boot and provision a Kubernetes v1.29.3 cluster on bare-metal with Flatcar Linux.</p> <p>First, we'll deploy a Matchbox service and setup a network boot environment. Then, we'll declare a Kubernetes cluster using the Typhoon Terraform module and power on machines. On PXE boot, machines will install Container Linux to disk, reboot into the disk install, and provision themselves as Kubernetes controllers or workers via Ignition.</p> <p>Controller hosts are provisioned to run an <code>etcd-member</code> peer and a <code>kubelet</code> service. Worker hosts run a <code>kubelet</code> service. Controller nodes run <code>kube-apiserver</code>, <code>kube-scheduler</code>, <code>kube-controller-manager</code>, and <code>coredns</code> while <code>kube-proxy</code> and <code>calico</code> (or <code>flannel</code>) run on every node. 
A generated <code>kubeconfig</code> provides <code>kubectl</code> access to the cluster.</p>"},{"location":"flatcar-linux/bare-metal/#requirements","title":"Requirements","text":"<ul> <li>Machines with 2GB RAM, 30GB disk, PXE-enabled NIC, IPMI</li> <li>PXE-enabled network boot environment (with HTTPS support)</li> <li>Matchbox v0.6+ deployment with API enabled</li> <li>Matchbox credentials <code>client.crt</code>, <code>client.key</code>, <code>ca.crt</code></li> <li>Terraform v0.13.0+</li> </ul>"},{"location":"flatcar-linux/bare-metal/#machines","title":"Machines","text":"<p>Collect a MAC address from each machine. For machines with multiple PXE-enabled NICs, pick one of the MAC addresses. MAC addresses will be used to match machines to profiles during network boot.</p> <ul> <li>52:54:00:a1:9c:ae (node1)</li> <li>52:54:00:b2:2f:86 (node2)</li> <li>52:54:00:c3:61:77 (node3)</li> </ul> <p>Configure each machine to boot from the disk through IPMI or the BIOS menu.</p> <pre><code>ipmitool -H node1 -U USER -P PASS chassis bootdev disk options=persistent\n</code></pre> <p>During provisioning, you'll explicitly set the boot device to <code>pxe</code> for the next boot only. Machines will install (overwrite) the operating system to disk on PXE boot and reboot into the disk install.</p> <p>Ask your hardware vendor to provide MACs and preconfigure IPMI, if possible. With it, you can rack new servers, <code>terraform apply</code> with new info, and power on machines that network boot and provision into clusters.</p>"},{"location":"flatcar-linux/bare-metal/#dns","title":"DNS","text":"<p>Create a DNS A (or AAAA) record for each node's default interface. Create a record that resolves to each controller node (or re-use the node record if there's one controller).</p> <ul> <li>node1.example.com (node1)</li> <li>node2.example.com (node2)</li> <li>node3.example.com (node3)</li> <li>myk8s.example.com (node1)</li> </ul> <p>Cluster nodes will be configured to refer to the control plane and themselves by these fully qualified names and they'll be used in generated TLS certificates.</p>"},{"location":"flatcar-linux/bare-metal/#matchbox","title":"Matchbox","text":"<p>Matchbox is an open-source app that matches network-booted bare-metal machines (based on labels like MAC, UUID, etc.) to profiles to automate cluster provisioning.</p> <p>Install Matchbox on a Kubernetes cluster or dedicated server.</p> <ul> <li>Installing on Kubernetes (recommended)</li> <li>Installing on a server</li> </ul> <p>Tip</p> <p>Deploy Matchbox as service that can be accessed by all of your bare-metal machines globally. This provides a single endpoint to use Terraform to manage bare-metal clusters at different sites. Typhoon will never include secrets in provisioning user-data so you may even deploy matchbox publicly.</p> <p>Matchbox provides a TLS client-authenticated API that clients, like Terraform, can use to manage machine matching and profiles. Think of it like a cloud provider API, but for creating bare-metal instances.</p> <p>Generate TLS client credentials. 
Save the <code>ca.crt</code>, <code>client.crt</code>, and <code>client.key</code> where they can be referenced in Terraform configs.</p> <pre><code>mv ca.crt client.crt client.key ~/.config/matchbox/\n</code></pre> <p>Verify the matchbox read-only HTTP endpoints are accessible (port is configurable).</p> <pre><code>$ curl http://matchbox.example.com:8080\nmatchbox\n</code></pre> <p>Verify your TLS client certificate and key can be used to access the Matchbox API (port is configurable).</p> <pre><code>$ openssl s_client -connect matchbox.example.com:8081 \\\n -CAfile ~/.config/matchbox/ca.crt \\\n -cert ~/.config/matchbox/client.crt \\\n -key ~/.config/matchbox/client.key\n</code></pre>"},{"location":"flatcar-linux/bare-metal/#pxe-environment","title":"PXE Environment","text":"<p>Create an iPXE-enabled network boot environment. Configure PXE clients to chainload iPXE firmware compiled to support HTTPS downloads. Instruct iPXE clients to chainload from your Matchbox service's <code>/boot.ipxe</code> endpoint.</p> <p>For networks already supporting iPXE clients, you can add a <code>default.ipxe</code> config.</p> <pre><code># /var/www/html/ipxe/default.ipxe\nchain http://matchbox.foo:8080/boot.ipxe\n</code></pre> <p>For networks with Ubiquiti Routers, you can configure the router itself to chainload machines to iPXE and Matchbox.</p> <p>Read about the many ways to setup a compliant iPXE-enabled network. There is quite a bit of flexibility:</p> <ul> <li>Continue using existing DHCP, TFTP, or DNS services</li> <li>Configure specific machines, subnets, or architectures to chainload from Matchbox</li> <li>Place Matchbox behind a menu entry (timeout and default to Matchbox)</li> </ul> <p>TFTP chainloading to modern boot firmware, like iPXE, avoids issues with old NICs and allows faster transfer protocols like HTTP to be used.</p> <p>Warning</p> <p>Compile iPXE from source with support for HTTPS downloads. iPXE's pre-built firmware binaries do not enable this. If you cannot enable HTTPS downloads, set <code>download_protocol = \"http\"</code> (discouraged).</p>"},{"location":"flatcar-linux/bare-metal/#terraform-setup","title":"Terraform Setup","text":"<p>Install Terraform v0.13.0+ on your system.</p> <pre><code>$ terraform version\nTerraform v1.0.0\n</code></pre> <p>Read concepts to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. 
<code>infra</code>).</p> <pre><code>cd infra/clusters\n</code></pre>"},{"location":"flatcar-linux/bare-metal/#provider","title":"Provider","text":"<p>Configure the Matchbox provider to use your Matchbox API endpoint and client certificate in a <code>providers.tf</code> file.</p> <pre><code>provider \"matchbox\" {\n endpoint = \"matchbox.example.com:8081\"\n client_cert = file(\"~/.config/matchbox/client.crt\")\n client_key = file(\"~/.config/matchbox/client.key\")\n ca = file(\"~/.config/matchbox/ca.crt\")\n}\n\nprovider \"ct\" {}\n\nterraform {\n required_providers {\n ct = {\n source = \"poseidon/ct\"\n version = \"0.11.0\"\n }\n matchbox = {\n source = \"poseidon/matchbox\"\n version = \"0.5.2\"\n }\n }\n}\n</code></pre>"},{"location":"flatcar-linux/bare-metal/#cluster","title":"Cluster","text":"<p>Define a Kubernetes cluster using the module <code>bare-metal/flatcar-linux/kubernetes</code>.</p> <pre><code>module \"mercury\" {\n source = \"git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes?ref=v1.29.3\"\n\n # bare-metal\n cluster_name = \"mercury\"\n matchbox_http_endpoint = \"http://matchbox.example.com\"\n os_channel = \"flatcar-stable\"\n os_version = \"2345.3.1\"\n\n # configuration\n k8s_domain_name = \"node1.example.com\"\n ssh_authorized_key = \"ssh-rsa AAAAB3Nz...\"\n\n # machines\n controllers = [{\n name = \"node1\"\n mac = \"52:54:00:a1:9c:ae\"\n domain = \"node1.example.com\"\n }]\n workers = [\n {\n name = \"node2\",\n mac = \"52:54:00:b2:2f:86\"\n domain = \"node2.example.com\"\n },\n {\n name = \"node3\",\n mac = \"52:54:00:c3:61:77\"\n domain = \"node3.example.com\"\n }\n ]\n\n # set to http only if you cannot chainload to iPXE firmware with https support\n # download_protocol = \"http\"\n}\n</code></pre> <p>Workers with similar features can be defined inline using the <code>workers</code> field as shown above. It's also possible to define discrete workers that attach to the cluster. Discrete workers are more advanced, but more verbose.</p> <pre><code>module \"mercury-node2\" {\n source = \"git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes/worker?ref=v1.29.3\"\n\n # bare-metal\n cluster_name = \"mercury\"\n matchbox_http_endpoint = \"http://matchbox.example.com\"\n os_channel = \"flatcar-stable\"\n os_version = \"2345.3.1\"\n\n # configuration\n name = \"node2\"\n mac = \"52:54:00:b2:2f:86\"\n domain = \"node2.example.com\"\n kubeconfig = module.mercury.kubeconfig\n ssh_authorized_key = \"ssh-rsa AAAAB3Nz...\"\n\n # optional\n snippets = []\n node_labels = []\n node_taints = []\n install_disk = \"/dev/vda\"\n cached_install = false\n}\n\n...\n</code></pre> <p>Reference the variables docs or the variables.tf source.</p>"},{"location":"flatcar-linux/bare-metal/#ssh-agent","title":"ssh-agent","text":"<p>Initial bootstrapping requires <code>bootstrap.service</code> be started on one controller node. Terraform uses <code>ssh-agent</code> to automate this step. Add your SSH private key to <code>ssh-agent</code>.</p> <pre><code>ssh-add ~/.ssh/id_rsa\nssh-add -L\n</code></pre>"},{"location":"flatcar-linux/bare-metal/#apply","title":"Apply","text":"<p>Initialize the config directory if this is the first use with Terraform.</p> <pre><code>terraform init\n</code></pre> <p>Plan the resources to be created.</p> <pre><code>$ terraform plan\nPlan: 55 to add, 0 to change, 0 to destroy.\n</code></pre> <p>Apply the changes. Terraform will generate bootstrap assets and create Matchbox profiles (e.g. 
controller, worker) and matching rules via the Matchbox API.</p> <pre><code>$ terraform apply\nmodule.mercury.null_resource.copy-controller-secrets.0: Still creating... (10s elapsed)\nmodule.mercury.null_resource.copy-worker-secrets.0: Still creating... (10s elapsed)\n...\n</code></pre> <p>Apply will then loop until it can successfully copy credentials to each machine and start the one-time Kubernetes bootstrap service. Proceed to the next step while this loops.</p>"},{"location":"flatcar-linux/bare-metal/#power","title":"Power","text":"<p>Power on each machine with the boot device set to <code>pxe</code> for the next boot only.</p> <pre><code>ipmitool -H node1.example.com -U USER -P PASS chassis bootdev pxe\nipmitool -H node1.example.com -U USER -P PASS power on\n</code></pre> <p>Machines will network boot, install Container Linux to disk, reboot into the disk install, and provision themselves as controllers or workers.</p> <p>If this is the first test of your PXE-enabled network boot environment, watch the SOL console of a machine to spot any misconfigurations.</p>"},{"location":"flatcar-linux/bare-metal/#bootstrap","title":"Bootstrap","text":"<p>Wait for the <code>bootstrap</code> step to finish bootstrapping the Kubernetes control plane. This may take 5-15 minutes depending on your network.</p> <pre><code>module.mercury.null_resource.bootstrap: Still creating... (6m10s elapsed)\nmodule.mercury.null_resource.bootstrap: Still creating... (6m20s elapsed)\nmodule.mercury.null_resource.bootstrap: Still creating... (6m30s elapsed)\nmodule.mercury.null_resource.bootstrap: Still creating... (6m40s elapsed)\nmodule.mercury.null_resource.bootstrap: Creation complete (ID: 5441741360626669024)\n\nApply complete! Resources: 55 added, 0 changed, 0 destroyed.\n</code></pre> <p>To watch the install to disk (until machines reboot from disk), SSH to port 2222.</p> <pre><code># before v1.10.1\n$ ssh debug@node1.example.com\n# after v1.10.1\n$ ssh -p 2222 core@node1.example.com\n</code></pre> <p>To watch the bootstrap process in detail, SSH to the first controller and journal the logs.</p> <pre><code>$ ssh core@node1.example.com\n$ journalctl -f -u bootstrap\nThe connection to the server cluster.example.com:6443 was refused - did you specify the right host or port?\nWaiting for static pod control plane\n...\nserviceaccount/calico-node unchanged\nsystemd[1]: Started Kubernetes control plane.\n</code></pre>"},{"location":"flatcar-linux/bare-metal/#verify","title":"Verify","text":"<p>Install kubectl on your system. Obtain the generated cluster <code>kubeconfig</code> from module outputs (e.g. 
write to a local file).</p> <pre><code>resource \"local_file\" \"kubeconfig-mercury\" {\n content = module.mercury.kubeconfig-admin\n filename = \"/home/user/.kube/configs/mercury-config\"\n}\n</code></pre> <p>List nodes in the cluster.</p> <pre><code>$ export KUBECONFIG=/home/user/.kube/configs/mercury-config\n$ kubectl get nodes\nNAME STATUS ROLES AGE VERSION\nnode1.example.com Ready &lt;none&gt; 10m v1.29.3\nnode2.example.com Ready &lt;none&gt; 10m v1.29.3\nnode3.example.com Ready &lt;none&gt; 10m v1.29.3\n</code></pre> <p>List the pods.</p> <pre><code>$ kubectl get pods --all-namespaces\nNAMESPACE NAME READY STATUS RESTARTS AGE\nkube-system calico-node-6qp7f 2/2 Running 1 11m\nkube-system calico-node-gnjrm 2/2 Running 0 11m\nkube-system calico-node-llbgt 2/2 Running 0 11m\nkube-system coredns-1187388186-dj3pd 1/1 Running 0 11m\nkube-system coredns-1187388186-mx9rt 1/1 Running 0 11m\nkube-system kube-apiserver-node1.example.com 1/1 Running 0 11m\nkube-system kube-controller-node1.example.com 1/1 Running 1 11m\nkube-system kube-proxy-50sd4 1/1 Running 0 11m\nkube-system kube-proxy-bczhp 1/1 Running 0 11m\nkube-system kube-proxy-mp2fw 1/1 Running 0 11m\nkube-system kube-scheduler-node1.example.com 1/1 Running 0 11m\n</code></pre>"},{"location":"flatcar-linux/bare-metal/#going-further","title":"Going Further","text":"<p>Learn about maintenance and addons.</p>"},{"location":"flatcar-linux/bare-metal/#variables","title":"Variables","text":"<p>Check the variables.tf source.</p>"},{"location":"flatcar-linux/bare-metal/#required","title":"Required","text":"Name Description Example cluster_name Unique cluster name \"mercury\" matchbox_http_endpoint Matchbox HTTP read-only endpoint \"http://matchbox.example.com:port\" os_channel Channel for a Container Linux derivative flatcar-stable, flatcar-beta, flatcar-alpha os_version Version for a Container Linux derivative to PXE and install \"2345.3.1\" k8s_domain_name FQDN resolving to the controller(s) nodes. Workers and kubectl will communicate with this endpoint \"myk8s.example.com\" ssh_authorized_key SSH public key for user 'core' \"ssh-rsa AAAAB3Nz...\" controllers List of controller machine detail objects (unique name, identifying MAC address, FQDN) <code>[{name=\"node1\", mac=\"52:54:00:a1:9c:ae\", domain=\"node1.example.com\"}]</code>"},{"location":"flatcar-linux/bare-metal/#optional","title":"Optional","text":"Name Description Default Example workers List of worker machine detail objects (unique name, identifying MAC address, FQDN) [] <code>[{name=\"node2\", mac=\"52:54:00:b2:2f:86\", domain=\"node2.example.com\"}, {name=\"node3\", mac=\"52:54:00:c3:61:77\", domain=\"node3.example.com\"}]</code> download_protocol Protocol iPXE uses to download the kernel and initrd. iPXE must be compiled with crypto support for https. Unused if cached_install is true \"https\" \"http\" cached_install PXE boot and install from the Matchbox <code>/assets</code> cache. 
Admin MUST have downloaded Container Linux or Flatcar images into the cache false true install_disk Disk device where Container Linux should be installed \"/dev/sda\" \"/dev/sdb\" networking Choice of networking provider \"cilium\" \"calico\" or \"cilium\" or \"flannel\" network_mtu CNI interface MTU (calico-only) 1480 - snippets Map from machine names to lists of Container Linux Config snippets {} examples network_ip_autodetection_method Method to detect host IPv4 address (calico-only) \"first-found\" \"can-reach=10.0.0.1\" pod_cidr CIDR IPv4 range to assign to Kubernetes pods \"10.2.0.0/16\" \"10.22.0.0/16\" service_cidr CIDR IPv4 range to assign to Kubernetes services \"10.3.0.0/16\" \"10.3.0.0/24\" kernel_args Additional kernel args to provide at PXE boot [] [\"kvm-intel.nested=1\"] worker_node_labels Map from worker name to list of initial node labels {} {\"node2\" = [\"role=special\"]} worker_node_taints Map from worker name to list of initial node taints {} {\"node2\" = [\"role=special:NoSchedule\"]} oem_type An OEM type to install with <code>flatcar-install</code>. \"\" \"vmware_raw\""},{"location":"flatcar-linux/digitalocean/","title":"DigitalOcean","text":"<p>In this tutorial, we'll create a Kubernetes v1.29.3 cluster on DigitalOcean with Flatcar Linux.</p> <p>We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create controller droplets, worker droplets, DNS records, tags, and TLS assets.</p> <p>Controller hosts are provisioned to run an <code>etcd-member</code> peer and a <code>kubelet</code> service. Worker hosts run a <code>kubelet</code> service. Controller nodes run <code>kube-apiserver</code>, <code>kube-scheduler</code>, <code>kube-controller-manager</code>, and <code>coredns</code>, while <code>kube-proxy</code> and <code>calico</code> (or <code>flannel</code>) run on every node. A generated <code>kubeconfig</code> provides <code>kubectl</code> access to the cluster.</p>"},{"location":"flatcar-linux/digitalocean/#requirements","title":"Requirements","text":"<ul> <li>Digital Ocean Account and Token</li> <li>Digital Ocean Domain (registered Domain Name or delegated subdomain)</li> <li>Terraform v0.13.0+</li> </ul>"},{"location":"flatcar-linux/digitalocean/#terraform-setup","title":"Terraform Setup","text":"<p>Install Terraform v0.13.0+ on your system.</p> <pre><code>$ terraform version\nTerraform v1.0.0\n</code></pre> <p>Read concepts to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. <code>infra</code>).</p> <pre><code>cd infra/clusters\n</code></pre>"},{"location":"flatcar-linux/digitalocean/#provider","title":"Provider","text":"<p>Login to DigitalOcean. Or if you don't have one, create an account with our referral link to get free credits.</p> <p>Generate a Personal Access Token with read/write scope from the API tab. 
Write the token to a file that can be referenced in configs.</p> <pre><code>mkdir -p ~/.config/digital-ocean\necho \"TOKEN\" &gt; ~/.config/digital-ocean/token\n</code></pre> <p>Configure the DigitalOcean provider to use your token in a <code>providers.tf</code> file.</p> <pre><code>provider \"digitalocean\" {\n token = \"${chomp(file(\"~/.config/digital-ocean/token\"))}\"\n}\n\nprovider \"ct\" {}\n\nterraform {\n required_providers {\n ct = {\n source = \"poseidon/ct\"\n version = \"0.11.0\"\n }\n digitalocean = {\n source = \"digitalocean/digitalocean\"\n version = \"2.27.1\"\n }\n }\n}\n</code></pre>"},{"location":"flatcar-linux/digitalocean/#flatcar-linux-images","title":"Flatcar Linux Images","text":"<p>Flatcar Linux publishes DigitalOcean images, but does not yet upload them. DigitalOcean allows custom images to be uploaded via a URL or file.</p> <p>Choose a Flatcar Linux release from Flatcar's file server. Copy the URL to the <code>flatcar_production_digitalocean_image.bin.bz2</code>, import it into DigitalOcean, and name it as a custom image. Add a data reference to the image in Terraform:</p> <pre><code>data \"digitalocean_image\" \"flatcar-stable-3227-2-0\" {\n name = \"flatcar-stable-3227.2.0.bin.bz2\"\n}\n</code></pre> <p>Set the os_image in the next step.</p>"},{"location":"flatcar-linux/digitalocean/#cluster","title":"Cluster","text":"<p>Define a Kubernetes cluster using the module <code>digital-ocean/flatcar-linux/kubernetes</code>.</p> <pre><code>module \"nemo\" {\n source = \"git::https://github.com/poseidon/typhoon//digital-ocean/flatcar-linux/kubernetes?ref=v1.29.3\"\n\n # Digital Ocean\n cluster_name = \"nemo\"\n region = \"nyc3\"\n dns_zone = \"digital-ocean.example.com\"\n\n # configuration\n os_image = data.digitalocean_image.flatcar-stable-3227-2-0.id\n ssh_fingerprints = [\"d7:9d:79:ae:56:32:73:79:95:88:e3:a2:ab:5d:45:e7\"]\n\n # optional\n worker_count = 2\n}\n</code></pre> <p>Reference the variables docs or the variables.tf source.</p>"},{"location":"flatcar-linux/digitalocean/#ssh-agent","title":"ssh-agent","text":"<p>Initial bootstrapping requires <code>bootstrap.service</code> be started on one controller node. Terraform uses <code>ssh-agent</code> to automate this step. Add your SSH private key to <code>ssh-agent</code>.</p> <pre><code>ssh-add ~/.ssh/id_rsa\nssh-add -L\n</code></pre>"},{"location":"flatcar-linux/digitalocean/#apply","title":"Apply","text":"<p>Initialize the config directory if this is the first use with Terraform.</p> <pre><code>terraform init\n</code></pre> <p>Plan the resources to be created.</p> <pre><code>$ terraform plan\nPlan: 54 to add, 0 to change, 0 to destroy.\n</code></pre> <p>Apply the changes to create the cluster.</p> <pre><code>$ terraform apply\nmodule.nemo.null_resource.bootstrap: Still creating... (30s elapsed)\nmodule.nemo.null_resource.bootstrap: Provisioning with 'remote-exec'...\n...\nmodule.nemo.null_resource.bootstrap: Still creating... (6m20s elapsed)\nmodule.nemo.null_resource.bootstrap: Creation complete (ID: 7599298447329218468)\n\nApply complete! Resources: 42 added, 0 changed, 0 destroyed.\n</code></pre> <p>In 3-6 minutes, the Kubernetes cluster will be ready.</p>"},{"location":"flatcar-linux/digitalocean/#verify","title":"Verify","text":"<p>Install kubectl on your system. Obtain the generated cluster <code>kubeconfig</code> from module outputs (e.g. 
write to a local file).</p> <pre><code>resource \"local_file\" \"kubeconfig-nemo\" {\n content = module.nemo.kubeconfig-admin\n filename = \"/home/user/.kube/configs/nemo-config\"\n}\n</code></pre> <p>List nodes in the cluster.</p> <pre><code>$ export KUBECONFIG=/home/user/.kube/configs/nemo-config\n$ kubectl get nodes\nNAME STATUS ROLES AGE VERSION\n10.132.110.130 Ready &lt;none&gt; 10m v1.29.3\n10.132.115.81 Ready &lt;none&gt; 10m v1.29.3\n10.132.124.107 Ready &lt;none&gt; 10m v1.29.3\n</code></pre> <p>List the pods.</p> <pre><code>NAMESPACE NAME READY STATUS RESTARTS AGE\nkube-system coredns-1187388186-ld1j7 1/1 Running 0 11m\nkube-system coredns-1187388186-rdhf7 1/1 Running 0 11m\nkube-system calico-node-1m5bf 2/2 Running 0 11m\nkube-system calico-node-7jmr1 2/2 Running 0 11m\nkube-system calico-node-bknc8 2/2 Running 0 11m\nkube-system kube-apiserver-ip-10.132.115.81 1/1 Running 0 11m\nkube-system kube-controller-manager-ip-10.132.115.81 1/1 Running 0 11m\nkube-system kube-proxy-6kxjf 1/1 Running 0 11m\nkube-system kube-proxy-fh3td 1/1 Running 0 11m\nkube-system kube-proxy-k35rc 1/1 Running 0 11m\nkube-system kube-scheduler-ip-10.132.115.81 1/1 Running 0 11m\n</code></pre>"},{"location":"flatcar-linux/digitalocean/#going-further","title":"Going Further","text":"<p>Learn about maintenance and addons.</p>"},{"location":"flatcar-linux/digitalocean/#variables","title":"Variables","text":"<p>Check the variables.tf source.</p>"},{"location":"flatcar-linux/digitalocean/#required","title":"Required","text":"Name Description Example cluster_name Unique cluster name (prepended to dns_zone) \"nemo\" region Digital Ocean region \"nyc1\", \"sfo2\", \"fra1\", tor1\" dns_zone Digital Ocean domain (i.e. DNS zone) \"do.example.com\" os_image Container Linux image for instances \"uploaded-flatcar-image-id\" ssh_fingerprints SSH public key fingerprints [\"d7:9d...\"]"},{"location":"flatcar-linux/digitalocean/#dns-zone","title":"DNS Zone","text":"<p>Clusters create DNS A records <code>${cluster_name}.${dns_zone}</code> to resolve to controller droplets (round robin). This FQDN is used by workers and <code>kubectl</code> to access the apiserver(s). In this example, the cluster's apiserver would be accessible at <code>nemo.do.example.com</code>.</p> <p>You'll need a registered domain name or delegated subdomain in DigitalOcean Domains (i.e. DNS zones). You can set this up once and create many clusters with unique names.</p> <pre><code># Declare a DigitalOcean record to also create a zone file\nresource \"digitalocean_domain\" \"zone-for-clusters\" {\n name = \"do.example.com\"\n ip_address = \"8.8.8.8\"\n}\n</code></pre> <p>If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on DigitalOcean (e.g. do.mydomain.com) and update nameservers.</p>"},{"location":"flatcar-linux/digitalocean/#ssh-fingerprints","title":"SSH Fingerprints","text":"<p>DigitalOcean droplets are created with your SSH public key \"fingerprint\" (i.e. MD5 hash) to allow access. If your SSH public key is at <code>~/.ssh/id_rsa</code>, find the fingerprint with,</p> <pre><code>ssh-keygen -E md5 -lf ~/.ssh/id_rsa.pub | awk '{print $2}'\nMD5:d7:9d:79:ae:56:32:73:79:95:88:e3:a2:ab:5d:45:e7\n</code></pre> <p>If you use <code>ssh-agent</code> (e.g. 
Yubikey for SSH), find the fingerprint with,</p> <pre><code>ssh-add -l -E md5\n2048 MD5:d7:9d:79:ae:56:32:73:79:95:88:e3:a2:ab:5d:45:e7 cardno:000603633110 (RSA)\n</code></pre> <p>Digital Ocean requires the SSH public key be uploaded to your account, so you may also find the fingerprint under Settings -&gt; Security. Finally, if you don't have an SSH key, create one now.</p>"},{"location":"flatcar-linux/digitalocean/#optional","title":"Optional","text":"Name Description Default Example controller_count Number of controllers (i.e. masters) 1 1 worker_count Number of workers 1 3 controller_type Droplet type for controllers \"s-2vcpu-2gb\" s-2vcpu-2gb, s-2vcpu-4gb, s-4vcpu-8gb, ... worker_type Droplet type for workers \"s-1vcpu-2gb\" s-1vcpu-2gb, s-2vcpu-2gb, ... controller_snippets Controller Container Linux Config snippets [] example worker_snippets Worker Container Linux Config snippets [] example networking Choice of networking provider \"cilium\" \"calico\" or \"cilium\" or \"flannel\" pod_cidr CIDR IPv4 range to assign to Kubernetes pods \"10.2.0.0/16\" \"10.22.0.0/16\" service_cidr CIDR IPv4 range to assign to Kubernetes services \"10.3.0.0/16\" \"10.3.0.0/24\" <p>Check the list of valid droplet types or use <code>doctl compute size list</code>.</p> <p>Warning</p> <p>Do not choose a <code>controller_type</code> smaller than 2GB. Smaller droplets are not sufficient for running a controller and bootstrapping will fail.</p>"},{"location":"flatcar-linux/google-cloud/","title":"Google Cloud","text":"<p>In this tutorial, we'll create a Kubernetes v1.29.3 cluster on Google Compute Engine with Flatcar Linux.</p> <p>We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a network, firewall rules, health checks, controller instances, worker managed instance group, load balancers, and TLS assets.</p> <p>Controller hosts are provisioned to run an <code>etcd-member</code> peer and a <code>kubelet</code> service. Worker hosts run a <code>kubelet</code> service. Controller nodes run <code>kube-apiserver</code>, <code>kube-scheduler</code>, <code>kube-controller-manager</code>, and <code>coredns</code>, while <code>kube-proxy</code> and <code>calico</code> (or <code>flannel</code>) run on every node. A generated <code>kubeconfig</code> provides <code>kubectl</code> access to the cluster.</p>"},{"location":"flatcar-linux/google-cloud/#requirements","title":"Requirements","text":"<ul> <li>Google Cloud Account and Service Account</li> <li>Google Cloud DNS Zone (registered Domain Name or delegated subdomain)</li> <li>Terraform v0.13.0+</li> </ul>"},{"location":"flatcar-linux/google-cloud/#terraform-setup","title":"Terraform Setup","text":"<p>Install Terraform v0.13.0+ on your system.</p> <pre><code>$ terraform version\nTerraform v1.0.0\n</code></pre> <p>Read concepts to learn about Terraform, modules, and organizing resources. Change to your infrastructure repository (e.g. <code>infra</code>).</p> <pre><code>cd infra/clusters\n</code></pre>"},{"location":"flatcar-linux/google-cloud/#provider","title":"Provider","text":"<p>Login to your Google Console API Manager and select a project, or signup if you don't have an account.</p> <p>Select \"Credentials\" and create a service account key. 
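If you prefer the CLI, a roughly equivalent sketch with <code>gcloud</code> is shown below (an illustrative service account named <code>terraform</code>, the placeholder project <code>project-id</code>, and assumed role IDs <code>roles/compute.admin</code> and <code>roles/dns.admin</code>); the console flow continues afterward.</p> <pre><code># sketch: create a service account, grant roles, and download a JSON key\ngcloud iam service-accounts create terraform\ngcloud projects add-iam-policy-binding project-id --member=\"serviceAccount:terraform@project-id.iam.gserviceaccount.com\" --role=\"roles/compute.admin\"\ngcloud projects add-iam-policy-binding project-id --member=\"serviceAccount:terraform@project-id.iam.gserviceaccount.com\" --role=\"roles/dns.admin\"\nmkdir -p ~/.config/google-cloud\ngcloud iam service-accounts keys create ~/.config/google-cloud/terraform.json --iam-account=terraform@project-id.iam.gserviceaccount.com\n</code></pre> <p>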
Choose the \"Compute Engine Admin\" and \"DNS Administrator\" roles and save the JSON private key to a file that can be referenced in configs.</p> <pre><code>mv ~/Downloads/project-id-43048204.json ~/.config/google-cloud/terraform.json\n</code></pre> <p>Configure the Google Cloud provider to use your service account key, project-id, and region in a <code>providers.tf</code> file.</p> <pre><code>provider \"google\" {\n project = \"project-id\"\n region = \"us-central1\"\n credentials = file(\"~/.config/google-cloud/terraform.json\")\n}\n\nprovider \"ct\" {}\n\nterraform {\n required_providers {\n ct = {\n source = \"poseidon/ct\"\n version = \"0.11.0\"\n }\n google = {\n source = \"hashicorp/google\"\n version = \"4.59.0\"\n }\n }\n}\n</code></pre> <p>Additional configuration options are described in the <code>google</code> provider docs.</p> <p>Tip</p> <p>Regions are listed in docs or with <code>gcloud compute regions list</code>. A project may contain multiple clusters across different regions.</p>"},{"location":"flatcar-linux/google-cloud/#cluster","title":"Cluster","text":"<p>Define a Kubernetes cluster using the module <code>google-cloud/flatcar-linux/kubernetes</code>.</p> <pre><code>module \"yavin\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/flatcar-linux/kubernetes?ref=v1.29.3\"\n\n # Google Cloud\n cluster_name = \"yavin\"\n region = \"us-central1\"\n dns_zone = \"example.com\"\n dns_zone_name = \"example-zone\"\n\n # configuration\n ssh_authorized_key = \"ssh-rsa AAAAB3Nz...\"\n\n # optional\n worker_count = 2\n}\n</code></pre> <p>Reference the variables docs or the variables.tf source.</p>"},{"location":"flatcar-linux/google-cloud/#ssh-agent","title":"ssh-agent","text":"<p>Initial bootstrapping requires <code>bootstrap.service</code> be started on one controller node. Terraform uses <code>ssh-agent</code> to automate this step. Add your SSH private key to <code>ssh-agent</code>.</p> <pre><code>ssh-add ~/.ssh/id_rsa\nssh-add -L\n</code></pre>"},{"location":"flatcar-linux/google-cloud/#apply","title":"Apply","text":"<p>Initialize the config directory if this is the first use with Terraform.</p> <pre><code>terraform init\n</code></pre> <p>Plan the resources to be created.</p> <pre><code>$ terraform plan\nPlan: 78 to add, 0 to change, 0 to destroy.\n</code></pre> <p>Apply the changes to create the cluster.</p> <pre><code>$ terraform apply\nmodule.yavin.null_resource.bootstrap: Still creating... (10s elapsed)\n...\nmodule.yavin.null_resource.bootstrap: Still creating... (5m30s elapsed)\nmodule.yavin.null_resource.bootstrap: Still creating... (5m40s elapsed)\nmodule.yavin.null_resource.bootstrap: Creation complete (ID: 5768638456220583358)\n\nApply complete! Resources: 78 added, 0 changed, 0 destroyed.\n</code></pre> <p>In 4-8 minutes, the Kubernetes cluster will be ready.</p>"},{"location":"flatcar-linux/google-cloud/#verify","title":"Verify","text":"<p>Install kubectl on your system. Obtain the generated cluster <code>kubeconfig</code> from module outputs (e.g. 
write to a local file).</p> <pre><code>resource \"local_file\" \"kubeconfig-yavin\" {\n content = module.yavin.kubeconfig-admin\n filename = \"/home/user/.kube/configs/yavin-config\"\n}\n</code></pre> <p>List nodes in the cluster.</p> <pre><code>$ export KUBECONFIG=/home/user/.kube/configs/yavin-config\n$ kubectl get nodes\nNAME ROLES STATUS AGE VERSION\nyavin-controller-0.c.example-com.internal &lt;none&gt; Ready 6m v1.29.3\nyavin-worker-jrbf.c.example-com.internal &lt;none&gt; Ready 5m v1.29.3\nyavin-worker-mzdm.c.example-com.internal &lt;none&gt; Ready 5m v1.29.3\n</code></pre> <p>List the pods.</p> <pre><code>$ kubectl get pods --all-namespaces\nNAMESPACE NAME READY STATUS RESTARTS AGE\nkube-system calico-node-1cs8z 2/2 Running 0 6m\nkube-system calico-node-d1l5b 2/2 Running 0 6m\nkube-system calico-node-sp9ps 2/2 Running 0 6m\nkube-system coredns-1187388186-dkh3o 1/1 Running 0 6m\nkube-system coredns-1187388186-zj5dl 1/1 Running 0 6m\nkube-system kube-apiserver-controller-0 1/1 Running 0 6m\nkube-system kube-controller-manager-controller-0 1/1 Running 0 6m\nkube-system kube-proxy-117v6 1/1 Running 0 6m\nkube-system kube-proxy-9886n 1/1 Running 0 6m\nkube-system kube-proxy-njn47 1/1 Running 0 6m\nkube-system kube-scheduler-controller-0 1/1 Running 0 6m\n</code></pre>"},{"location":"flatcar-linux/google-cloud/#going-further","title":"Going Further","text":"<p>Learn about maintenance and addons.</p>"},{"location":"flatcar-linux/google-cloud/#variables","title":"Variables","text":"<p>Check the variables.tf source.</p>"},{"location":"flatcar-linux/google-cloud/#required","title":"Required","text":"Name Description Example cluster_name Unique cluster name (prepended to dns_zone) \"yavin\" region Google Cloud region \"us-central1\" dns_zone Google Cloud DNS zone \"google-cloud.example.com\" dns_zone_name Google Cloud DNS zone name \"example-zone\" ssh_authorized_key SSH public key for user 'core' \"ssh-rsa AAAAB3NZ...\" <p>Check the list of valid regions and list Container Linux images with <code>gcloud compute images list | grep coreos</code>.</p>"},{"location":"flatcar-linux/google-cloud/#dns-zone","title":"DNS Zone","text":"<p>Clusters create a DNS A record <code>${cluster_name}.${dns_zone}</code> to resolve a TCP proxy load balancer backed by controller instances. This FQDN is used by workers and <code>kubectl</code> to access the apiserver(s). In this example, the cluster's apiserver would be accessible at <code>yavin.google-cloud.example.com</code>.</p> <p>You'll need a registered domain name or delegated subdomain on Google Cloud DNS. You can set this up once and create many clusters with unique names.</p> <pre><code>resource \"google_dns_managed_zone\" \"zone-for-clusters\" {\n dns_name = \"google-cloud.example.com.\"\n name = \"example-zone\"\n description = \"Production DNS zone\"\n}\n</code></pre> <p>If you have an existing domain name with a zone file elsewhere, just delegate a subdomain that can be managed on Google Cloud (e.g. google-cloud.mydomain.com) and update nameservers.</p>"},{"location":"flatcar-linux/google-cloud/#optional","title":"Optional","text":"Name Description Default Example controller_count Number of controllers (i.e. 
masters) 1 3 worker_count Number of workers 1 3 controller_type Machine type for controllers \"n1-standard-1\" See below worker_type Machine type for workers \"n1-standard-1\" See below os_image Flatcar Linux image for compute instances \"flatcar-stable\" flatcar-stable, flatcar-beta, flatcar-alpha disk_size Size of the disk in GB 30 100 worker_preemptible If enabled, Compute Engine will terminate workers randomly within 24 hours false true controller_snippets Controller Container Linux Config snippets [] example worker_snippets Worker Container Linux Config snippets [] example networking Choice of networking provider \"cilium\" \"calico\" or \"cilium\" or \"flannel\" pod_cidr CIDR IPv4 range to assign to Kubernetes pods \"10.2.0.0/16\" \"10.22.0.0/16\" service_cidr CIDR IPv4 range to assign to Kubernetes services \"10.3.0.0/16\" \"10.3.0.0/24\" worker_node_labels List of initial worker node labels [] [\"worker-pool=default\"] <p>Check the list of valid machine types.</p>"},{"location":"flatcar-linux/google-cloud/#preemption","title":"Preemption","text":"<p>Add <code>worker_preemptible = \"true\"</code> to allow worker nodes to be preempted at random in exchange for a significantly lower price. Clusters tolerate stopping instances fairly well (pods are rescheduled, but nodes cannot be drained) and preemption provides a nice reward for running fault-tolerant cluster systems.</p>"},{"location":"topics/faq/","title":"FAQ","text":""},{"location":"topics/faq/#terraform","title":"Terraform","text":"<p>Typhoon provides a Terraform Module for each supported operating system and platform. Terraform is considered a format detail, much like a Linux distro might provide images in the qcow2 or ISO format. It is a mechanism for sharing Typhoon in a way that works for many users.</p> <p>Formats rise and evolve. Typhoon may choose to adapt the format over time (with lots of forewarning). However, the authors have built several Kubernetes \"distros\" before and learned from those mistakes - Terraform modules are the right format for now.</p>"},{"location":"topics/faq/#security-issues","title":"Security Issues","text":"<p>If you find security issues, please see security disclosures.</p>"},{"location":"topics/faq/#maintainers","title":"Maintainers","text":"<p>Typhoon clusters are Kubernetes clusters the maintainers use in real-world, production settings.</p> <ul> <li>Maintainers must personally operate a bare-metal and cloud provider cluster and strive to exercise it in real-world scenarios</li> </ul> <p>We merge features that are along the \"blessed path\". We minimize options to reduce complexity and matrix size. We remove outdated materials to reduce sprawl. \"Skate where the puck is going\", but also \"wait until the fit is right\". No is temporary, yes is forever.</p>"},{"location":"topics/hardware/","title":"Hardware","text":"<p>Typhoon ensures certain networking hardware integrates well with bare-metal Kubernetes.</p>"},{"location":"topics/hardware/#ubiquiti","title":"Ubiquiti","text":"<p>Ubiquiti EdgeRouters and EdgeOS work well with bare-metal Kubernetes clusters. Familiarity with EdgeRouter setup and CLI usage is required.</p>"},{"location":"topics/hardware/#dhcp","title":"DHCP","text":"<p>Assign static IPs to clients with known MAC addresses. This is called a static mapping by EdgeOS.
Configure the router with the commands based on region inventory.</p> <pre><code>configure\nshow service dhcp-server shared-network\nset service dhcp-server shared-network-name LAN subnet SUBNET static-mapping NAME mac-address MACADDR\nset service dhcp-server shared-network-name LAN subnet SUBNET static-mapping NAME ip-address 10.0.0.20\n</code></pre>"},{"location":"topics/hardware/#dns","title":"DNS","text":"<p>Add DNS A records to static IPs as <code>dnsmasq</code> host-records.</p> <pre><code>configure\nset service dns forwarding options host-record=node.example.com,10.0.0.20\n</code></pre> <p>Forward <code>*.svc.cluster.local</code> queries to the CoreDNS Kubernetes service IP to allow clients to resolve Kubernetes services.</p> <pre><code>set service dns forwarding options server=/svc.cluster.local/10.3.0.10\ncommit-confirm\n</code></pre> <p>Restart <code>dnsmasq</code>.</p> <pre><code>sudo /etc/init.d/dnsmasq restart\n</code></pre>"},{"location":"topics/hardware/#pxe","title":"PXE","text":"<p>Ubiquiti EdgeRouters can provide a PXE-enabled network boot environment for client machines.</p>"},{"location":"topics/hardware/#isc-dhcp","title":"ISC DHCP","text":"<p>With ISC DHCP, add a subnet parameter to the LAN DHCP server to include an ISC DHCP config file.</p> <pre><code>configure\nshow service dhcp-server shared-network-name NAME subnet SUBNET\nset service dhcp-server shared-network-name NAME subnet SUBNET subnet-parameters \"include &amp;quot;/config/scripts/ipxe.conf&amp;quot;;\"\ncommit-confirm\n</code></pre> <p>Switch to root (i.e. <code>sudo -i</code>) and write the ISC DHCP config <code>/config/scripts/ipxe.conf</code>. iPXE client machines will chainload to <code>matchbox.example.com</code>, while non-iPXE clients will chainload to <code>undionly.kpxe</code> (requires TFTP).</p> <pre><code>allow bootp;\nallow booting;\nnext-server ADD_ROUTER_IP_HERE;\n\nif exists user-class and option user-class = \"iPXE\" {\n filename \"http://matchbox.example.com/boot.ipxe\";\n} else {\n filename \"undionly.kpxe\";\n}\n</code></pre>"},{"location":"topics/hardware/#dnsmasq","title":"dnsmasq","text":"<p>With dnsmasq for DHCP, add options to chainload PXE clients to iPXE <code>undionly.kpxe</code> (requires TFTP), tag iPXE clients, and chainload iPXE clients to <code>matchbox.example.com</code>.</p> <pre><code>set service dns forwarding options 'dhcp-userclass=set:ipxe,iPXE'\nset service dns forwarding options 'pxe-service=tag:#ipxe,x86PC,PXE chainload to iPXE,undionly.kpxe'\nset service dns forwarding options 'pxe-service=tag:ipxe,x86PC,iPXE,http://matchbox.example.com/boot.ipxe'\n</code></pre>"},{"location":"topics/hardware/#tftp","title":"TFTP","text":"<p>Use <code>dnsmasq</code> as a TFTP server to serve <code>undionly.kpxe</code>. Compiling from source with TLS support is strongly recommended. 
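A minimal sketch of such a build (assuming a Linux host with a C toolchain; per the iPXE docs, HTTPS support is enabled by defining <code>DOWNLOAD_PROTO_HTTPS</code> in <code>src/config/general.h</code> before compiling):</p> <pre><code>git clone https://github.com/ipxe/ipxe.git\ncd ipxe/src\n# enable DOWNLOAD_PROTO_HTTPS in config/general.h, then build\nmake bin/undionly.kpxe\n</code></pre> <p>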
If you use a pre-compiled copy, you must set <code>download_protocol = \"http\"</code> in your cluster definition (discouraged).</p> <pre><code>sudo -i\nmkdir /config/tftpboot &amp;&amp; cd /config/tftpboot\ncurl http://boot.ipxe.org/undionly.kpxe -o undionly.kpxe\n</code></pre> <p>Add <code>dnsmasq</code> command line options to enable the TFTP file server.</p> <pre><code>configure\nshow service dns forwarding\nset service dns forwarding options enable-tftp\nset service dns forwarding options tftp-root=/config/tftpboot\ncommit-confirm\n</code></pre>"},{"location":"topics/hardware/#routing","title":"Routing","text":""},{"location":"topics/hardware/#static-routes","title":"Static Routes","text":"<p>Add static route(s) to Kubernetes node(s) that can route to Kubernetes service IPs (default: 10.3.0.0/16). Kubernetes service IPs will become routeable on the LAN.</p> <pre><code>configure\nshow protocols static route\nset protocols static route 10.3.0.0/16 next-hop NODE_IP\ncommit-confirm\n</code></pre> <p>Note</p> <p>Adding multiple next-hop nodes provides equal-cost multi-path (ECMP) routing. EdgeOS v2.0+ is required. The kernel in prior versions used flow-hash to balanced packets, whereas with v2.0, round-robin sessions are used.</p>"},{"location":"topics/hardware/#bgp","title":"BGP","text":"<p>EdgeRouter can exchange routes with other autonomous systems, including a cluster's Calico AS. Peers will exchange <code>podCIDR</code> routes to make individual pods routeable on the LAN.</p> <p>Define the EdgeRouter AS (if undefined).</p> <pre><code>configure\nshow protocols bgp 1\nset protocols bgp 1 parameters router-id ROUTER_IP\n</code></pre> <p>Peer with node(s) in another AS (eg. Calico default 64512)</p> <pre><code>set protocols bgp 1 neighbor NODE1_IP remote-as 64512\nset protocols bgp 1 neighbor NODE2_IP remote-as 64512\nset protocols bgp 1 neighbor NODE3_IP remote-as 64512\ncommit-confirm\n</code></pre> <p>Configure Calico node(s) as to peer with the EdgeRouter.</p> <pre><code>apiVersion: crd.projectcalico.org/v1\nkind: BGPPeer\nmetadata:\n name: NODE_NAME-to-edgerouter\nspec:\n peerIP: ROUTER_IP\n asNumber: 1\n node: NODE_NAME\n</code></pre> <p>Or, if every node is to be peered (i.e. full mesh), define a global BGPPeer.</p> <pre><code>apiVersion: crd.projectcalico.org/v1\nkind: BGPPeer\nmetadata:\n name: global\nspec:\n peerIP: ROUTER_IP\n asNumber: 1\n</code></pre> <p>If Calico nodes should advertise Kubernetes Service IPs (i.e. ClusterIPs) as well, add a <code>BGPConfiguration</code>.</p> <pre><code>apiVersion: crd.projectcalico.org/v1\nkind: BGPConfiguration\nmetadata:\n name: default\nspec:\n logSeverityScreen: Info\n nodeToNodeMeshEnabled: true\n serviceClusterIPs:\n - cidr: 10.3.0.0/16\n</code></pre> <p>Show a summary of peers and exchanged routes.</p> <pre><code>show ip bgp summary\nshow ip route bgp\n</code></pre>"},{"location":"topics/hardware/#port-forwarding","title":"Port Forwarding","text":"<p>Expose the Ingress Controller by adding <code>port-forward</code> rules that DNAT a port on the router's WAN interface to an internal IP and port. By convention, a public Ingress controller is assigned a fixed service IP (e.g. 
10.3.0.12).</p> <pre><code>configure\nset port-forward wan-interface eth0\nset port-forward lan-interface eth1\nset port-forward auto-firewall enable\nset port-forward hairpin-nat enable\nset port-forward rule 1 description 'ingress http'\nset port-forward rule 1 forward-to address 10.3.0.12\nset port-forward rule 1 forward-to port 80\nset port-forward rule 1 original-port 80\nset port-forward rule 1 protocol tcp_udp\nset port-forward rule 2 description 'ingress https'\nset port-forward rule 2 forward-to address 10.3.0.12\nset port-forward rule 2 forward-to port 443\nset port-forward rule 2 original-port 443\nset port-forward rule 2 protocol tcp_udp\ncommit-confirm\n</code></pre>"},{"location":"topics/hardware/#web-ui","title":"Web UI","text":"<p>The web UI is often accessible from the LAN on ports 80/443 by default. Edit the ports to 8080 and 4443 to avoid a conflict.</p> <pre><code>configure\nshow service gui\nset service gui http-port 8080\nset service gui https-port 4443\ncommit-confirm\n</code></pre>"},{"location":"topics/maintenance/","title":"Maintenance","text":""},{"location":"topics/maintenance/#best-practices","title":"Best Practices","text":"<ul> <li>Run multiple Kubernetes clusters. Run across platforms. Plan for regional and cloud outages.</li> <li>Require applications be platform agnostic. Moving an application between a Kubernetes AWS cluster and a Kubernetes bare-metal cluster should be normal.</li> <li>Strive to make single-cluster outages tolerable. Practice performing failovers.</li> <li>Strive to make single-cluster outages a non-event. Load balance applications between multiple clusters, automate failover behaviors, and adjust alerting behaviors.</li> </ul>"},{"location":"topics/maintenance/#versioning","title":"Versioning","text":"<p>Typhoon provides tagged releases to allow clusters to be versioned using ordinary Terraform configs.</p> <pre><code>module \"yavin\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/fedora-coreos/kubernetes?ref=v1.29.3\"\n ...\n}\n\nmodule \"mercury\" {\n source = \"git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes?ref=v1.29.3\"\n ...\n}\n</code></pre> <p>Main is updated regularly, so it is recommended to pin modules to a release tag or commit hash. Pinning ensures <code>terraform get --update</code> only fetches the desired version.</p>"},{"location":"topics/maintenance/#terraform-versions","title":"Terraform Versions","text":"<p>Typhoon modules support Terraform v0.13.x and higher. Poseidon publishes providers to the Terraform Provider Registry for automatic install via <code>terraform init</code>.</p> Typhoon Release Terraform version v1.21.2 - ? 
v0.13.x, v0.14.4+, v0.15.x, v1.0.x v1.21.1 - v1.21.1 v0.13.x, v0.14.4+, v0.15.x v1.20.2 - v1.21.0 v0.13.x, v0.14.4+ v1.20.0 - v1.20.2 v0.13.x v1.18.8 - v1.19.4 v0.12.26+, v0.13.x v1.15.0 - v1.18.8 v0.12.x v1.10.3 - v1.15.0 v0.11.x v1.9.2 - v1.10.2 v0.10.4+ or v0.11.x v1.7.3 - v1.9.1 v0.10.x v1.6.4 - v1.7.2 v0.9.x"},{"location":"topics/maintenance/#cluster-upgrades","title":"Cluster Upgrades","text":"<p>Typhoon recommends upgrading clusters using a blue-green replacement strategy and migrating workloads.</p> <ol> <li>Launch new (candidate) clusters from tagged releases</li> <li>Apply workloads from existing cluster(s)</li> <li>Evaluate application health and performance</li> <li>Migrate application traffic to the new cluster</li> <li>Compare metrics and delete old cluster when ready</li> </ol> <p>Blue-green replacement reduces risk for clusters running critical applications. Candidate clusters allow baseline properties of clusters to be assessed (e.g. pod-to-pod bandwidth). Applying application workloads allows health to be assessed before being subjected to traffic (e.g. detect any changes in Kubernetes behavior between versions). Migration to the new cluster can be controlled according to requirements. Migration may mean updating DNS records to resolve the new cluster's ingress or may involve a load balancer gradually shifting traffic to the new cluster \"backend\". Retain the old cluster for a time to compare metrics or for fallback if issues arise.</p> <p>Blue-green replacement provides some subtler benefits as well:</p> <ul> <li>Encourages investment in tooling for traffic migration and failovers. When a cluster incident arises, shifting applications to a healthy cluster will be second nature.</li> <li>Discourages reliance on in-place opaque state. Retain confidence in your ability to create infrastructure from scratch.</li> <li>Allows Typhoon to make architecture changes between releases and eases the burden on Typhoon maintainers. By contrast, distros promising in-place upgrades get stuck with their mistakes or require complex and error-prone migrations.</li> </ul>"},{"location":"topics/maintenance/#bare-metal","title":"Bare-Metal","text":"<p>Typhoon bare-metal clusters are provisioned by a PXE-enabled network boot environment and a Matchbox service. To upgrade, re-provision machines into a new cluster.</p> <p>Failover application workloads to another cluster (varies).</p> <pre><code>kubectl config use-context other-context\nkubectl apply -f mercury -R\n# DNS or load balancer changes\n</code></pre> <p>Power off bare-metal machines and set their next boot device to PXE.</p> <pre><code>ipmitool -H node1.example.com -U USER -P PASS power off\nipmitool -H node1.example.com -U USER -P PASS chassis bootdev pxe\n</code></pre> <p>Delete or comment the Terraform config for the cluster.</p> <pre><code>- module \"mercury\" {\n- source = \"git::https://github.com/poseidon/typhoon//bare-metal/flatcar-linux/kubernetes\"\n- ...\n-}\n</code></pre> <p>Apply to delete old provisioning configs from Matchbox.</p> <pre><code>$ terraform apply\nApply complete! Resources: 0 added, 0 changed, 55 destroyed.\n</code></pre> <p>Re-provision a new cluster by following the bare-metal tutorial.</p>"},{"location":"topics/maintenance/#cloud","title":"Cloud","text":"<p>Create a new cluster following the tutorials. 
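For example, the old and new clusters can coexist in the same Terraform config during the transition (a sketch with illustrative module names and version tags):</p> <pre><code>module \"yavin\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/flatcar-linux/kubernetes?ref=OLD_VERSION\"\n ...\n}\n\n# candidate cluster for the blue-green transition\nmodule \"yavin2\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/flatcar-linux/kubernetes?ref=NEW_VERSION\"\n ...\n}\n</code></pre> <p>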
Failover application workloads to the new cluster (varies).</p> <pre><code>kubectl config use-context other-context\nkubectl apply -f mercury -R\n# DNS or load balancer changes\n</code></pre> <p>Once you're confident in the new cluster, delete the Terraform config for the old cluster.</p> <pre><code>- module \"yavin\" {\n- source = \"git::https://github.com/poseidon/typhoon//google-cloud/flatcar-linux/kubernetes\"\n- ...\n-}\n</code></pre> <p>Apply to delete the cluster.</p> <pre><code>$ terraform apply\nApply complete! Resources: 0 added, 0 changed, 55 destroyed.\n</code></pre>"},{"location":"topics/maintenance/#alternatives","title":"Alternatives","text":""},{"location":"topics/maintenance/#in-place-edits","title":"In-place Edits","text":"<p>Typhoon uses a static pod Kubernetes control plane which allows certain manifest upgrades to be performed in-place. Components like <code>kube-apiserver</code>, <code>kube-controller-manager</code>, and <code>kube-scheduler</code> are run as static pods. Components <code>flannel</code>/<code>calico</code>, <code>coredns</code>, and <code>kube-proxy</code> are scheduled on Kubernetes and can be edited via <code>kubectl</code>.</p> <p>In certain scenarios, in-place edits can be useful for quickly rolling out security patches (e.g. bumping <code>coredns</code>) or prioritizing speed over the safety of a proper cluster re-provision and transition.</p> <p>Note</p> <p>Rarely, we may test certain security in-place edits and mention them as an option in release notes.</p> <p>Warning</p> <p>Typhoon does not support or document in-place edits as an upgrade strategy. They involve inherent risks and we choose not to make recommendations or guarantees about the safety of different in-place upgrades. It's explicitly a non-goal.</p>"},{"location":"topics/maintenance/#node-replacement","title":"Node Replacement","text":"<p>Typhoon supports multi-controller clusters, so it is possible to upgrade a cluster by deleting and replacing nodes one by one.</p> <p>Warning</p> <p>Typhoon does not support or document node replacement as an upgrade strategy. It limits Typhoon's ability to make infrastructure and architectural changes between tagged releases.</p>"},{"location":"topics/maintenance/#node-configuration-updates","title":"Node Configuration Updates","text":"<p>Typhoon worker instance groups (default workers and worker pools) on AWS and Google Cloud gradually replace worker instances (a rolling replacement) when configuration changes are applied.</p>"},{"location":"topics/maintenance/#aws","title":"AWS","text":"<p>On AWS, worker instances belong to an auto-scaling group. When an auto-scaling group's launch configuration changes, an AWS Instance Refresh gradually replaces worker instances.</p> <p>Instance refresh creates surge instances, waits for a warm-up period, then deletes old instances.</p> <pre><code>module \"tempest\" {\n source = \"git::https://github.com/poseidon/typhoon//aws/VARIANT/kubernetes?ref=VERSION\"\n\n # AWS\n cluster_name = \"tempest\"\n ...\n\n # optional\n worker_count = 2\n- worker_type = \"t3.small\"\n+ worker_type = \"t3a.small\"\n\n # change from on-demand to spot\n+ worker_price = \"0.0309\"\n\n # default is 30GB\n+ disk_size = 50\n\n # change worker snippets\n+ worker_snippets = [\n+ file(\"butane/feature.yaml\"),\n+ ]\n}\n</code></pre> <p>Applying edits to most worker fields will start an instance refresh:</p> <ul> <li><code>worker_type</code></li> <li><code>disk_*</code></li> <li><code>worker_price</code> (i.e.
spot)</li> <li><code>worker_target_groups</code></li> <li><code>worker_snippets</code></li> </ul> <p>However, changing <code>os_stream</code>/<code>os_channel</code> or new AMIs becoming available will NOT change the launch configuration or trigger an Instance Refresh. This allows Fedora CoreOS or Flatcar Linux to auto-update themselves via reboots and avoids unexpected terraform diffs for new AMIs.</p> <p>Note</p> <p>Before Typhoon v1.29.3, worker nodes only used new launch configurations when replaced manually (or due to failure). If you must change node configuration manually, it's still possible. Create a new worker pool, then scale down the old worker pool as desired.</p>"},{"location":"topics/maintenance/#google-cloud","title":"Google Cloud","text":"<p>On Google Cloud, worker instances belong to a managed instance group. When a group's launch template changes, a rolling update gradually replaces worker instances.</p> <p>The rolling update creates surge instances, waits for instances to be healthy, then deletes old instances.</p> <pre><code>module \"yavin\" {\n source = \"git::https://github.com/poseidon/typhoon//google-cloud/VARIANT/kubernetes?ref=VERSION\"\n\n # Google Cloud\n cluster_name = \"yavin\"\n ...\n\n # optional\n worker_count = 2\n+ worker_type = \"n2-standard-2\"\n+ worker_preemptible = true\n\n # default is 30GB\n+ disk_size = 50\n\n # change worker snippets\n+ worker_snippets = [\n+ file(\"butane/feature.yaml\"),\n+ ]\n}\n</code></pre> <p>Applying edits to most worker fields will start a rolling update:</p> <ul> <li><code>worker_type</code></li> <li><code>disk_*</code></li> <li><code>worker_preemptible</code> (i.e. spot)</li> <li><code>worker_snippets</code></li> </ul> <p>However, changing <code>os_stream</code>/<code>os_channel</code> or new compute images becoming available will NOT change the launch template or update instances. This allows Fedora CoreOS or Flatcar Linux to auto-update themselves via reboots and avoids unexpected terraform diffs for new images.</p> <p>Note</p> <p>Before Typhoon v1.29.3, worker nodes only used new launch templates when replaced manually (or due to failure). If you must change node configuration manually, it's still possible. Create a new worker pool, then scale down the old worker pool as desired.</p>"},{"location":"topics/maintenance/#upgrade-poseidonct","title":"Upgrade poseidon/ct","text":"<p>The poseidon/ct Terraform provider plugin parses, validates, and converts Butane Configs to Ignition user-data for provisioning instances. Since Typhoon v1.12.2, the plugin can be updated in-place so that on apply, only workers will be replaced.</p> <p>Update the version of the <code>ct</code> plugin in each Terraform working directory. Typhoon clusters managed in the working directory must be v1.12.2 or higher.</p> <pre><code>provider \"ct\" {}\n\nterraform {\n required_providers {\n ct = {\n source = \"poseidon/ct\"\n- version = \"0.10.0\"\n+ version = \"0.11.0\"\n }\n ...\n }\n}\n</code></pre> <p>Run init and plan to check that no diff is proposed for the controller nodes (a diff would destroy cluster state).</p> <pre><code>terraform init\nterraform plan\n</code></pre> <p>Apply the change. If worker nodes' user-data has changed, workers will be replaced. Rollout happens slightly differently on each platform:</p>"},{"location":"topics/maintenance/#aws_1","title":"AWS","text":"<p>See AWS node config updates.</p>"},{"location":"topics/maintenance/#azure","title":"Azure","text":"<p>Azure edits the worker scale set in-place instantly.
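One hedged way to perform the manual termination described next is with the Azure CLI (illustrative resource group and scale set names; the scale set then creates a replacement instance with the new user-data):</p> <pre><code>az vmss delete-instances --resource-group ramius --name ramius-worker --instance-ids 0\n</code></pre> <p>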
Manually terminate workers to create replacement workers using the new user-data.</p>"},{"location":"topics/maintenance/#bare-metal_1","title":"Bare-Metal","text":"<p>No action is needed. Bare-Metal machines do not re-PXE unless explicitly made to do so.</p>"},{"location":"topics/maintenance/#digitalocean","title":"DigitalOcean","text":"<p>DigitalOcean destroys existing worker nodes and DNS records, then creates new workers and DNS records. DigitalOcean lacks a \"managed group\" notion. For worker droplets to join the cluster, you must taint the secret copying step to indicate it must be repeated to add the kubeconfig to new workers.</p> <pre><code># old workers destroyed, new workers created\nterraform apply\n\n# add kubeconfig to new workers\nterraform state list | grep null_resource\nterraform taint module.nemo.null_resource.copy-worker-secrets[N]\nterraform apply\n</code></pre> <p>Expect downtime.</p>"},{"location":"topics/maintenance/#google-cloud_1","title":"Google Cloud","text":"<p>See Google Cloud node config updates.</p>"},{"location":"topics/performance/","title":"Performance","text":""},{"location":"topics/performance/#provision-time","title":"Provision Time","text":"<p>Provisioning times vary based on the operating system and platform. Sampling the time to create (apply) and destroy clusters with 1 controller and 2 workers shows (roughly) what to expect.</p> Platform Apply Destroy AWS 5 min 3 min Azure 10 min 7 min Bare-Metal 10-15 min NA Digital Ocean 3 min 30 sec 20 sec Google Cloud 8 min 5 min <p>Notes:</p> <ul> <li>SOA TTL and NXDOMAIN caching can have a large impact on provision time</li> <li>Platforms with auto-scaling take more time to provision (AWS, Azure, Google)</li> <li>Bare-metal POST times and network bandwidth will affect provision times</li> </ul>"},{"location":"topics/performance/#network-performance","title":"Network Performance","text":"<p>Network performance varies based on the platform and CNI plugin. <code>iperf</code> was used to measure the bandwidth between different hosts and different pods. Host-to-host shows typical bandwidth between host machines. Pod-to-pod shows the bandwidth between two <code>iperf</code> containers.</p> Platform / Plugin Theory Host to Host Pod to Pod AWS (flannel) 5 Gb/s 4.94 Gb/s 4.89 Gb/s AWS (calico, MTU 1480) 5 Gb/s 4.94 Gb/s 4.42 Gb/s AWS (calico, MTU 8981) 5 Gb/s 4.94 Gb/s 4.90 Gb/s Azure (flannel) Varies 749 Mb/s 650 Mb/s Azure (calico) Varies 749 Mb/s 650 Mb/s Bare-Metal (flannel) 1 Gb/s 940 Mb/s 903 Mb/s Bare-Metal (calico) 1 Gb/s 940 Mb/s 931 Mb/s Digital Ocean (flannel) Varies 1.97 Gb/s 1.20 Gb/s Digital Ocean (calico) Varies 1.97 Gb/s 1.20 Gb/s Google Cloud (flannel) 2 Gb/s 1.94 Gb/s 1.76 Gb/s Google Cloud (calico) 2 Gb/s 1.94 Gb/s 1.81 Gb/s <p>Notes:</p> <ul> <li>Calico, Cilium, and Flannel have comparable performance. Platform and configuration differences dominate.</li> <li>Azure and DigitalOcean network performance can be quite variable or depend on machine type</li> <li>Only certain AWS EC2 instance types allow jumbo frames. This is why the default MTU on AWS must be 1480.</li> </ul>"},{"location":"topics/security/","title":"Security","text":"<p>Typhoon aims to be minimal and secure. 
We're running it ourselves after all.</p>"},{"location":"topics/security/#overview","title":"Overview","text":"<p>Kubernetes</p> <ul> <li>etcd with peer-to-peer and client-auth TLS</li> <li>Kubelets TLS bootstrap certificates (72 hours)</li> <li>Generated TLS certificate (365 days) for admin <code>kubeconfig</code></li> <li>NodeRestriction is enabled to limit Kubelet authorization</li> <li>Role-Based Access Control is enabled. Apps must define RBAC policies for API access</li> <li>Workloads run on worker nodes only, unless they tolerate the master taint</li> <li>Kubernetes Network Policy and Calico NetworkPolicy support <sup>1</sup></li> </ul> <p>Hosts</p> <ul> <li>Container Linux auto-updates are enabled</li> <li>Hosts limit logins to SSH key-based auth (user \"core\")</li> <li>SELinux enforcing mode <sup>2</sup></li> </ul> <p>Platform</p> <ul> <li>Cloud firewalls limit access to ssh, kube-apiserver, and ingress</li> <li>No cluster credentials are stored in Matchbox (used for bare-metal)</li> <li>No cluster credentials are stored in Digital Ocean metadata</li> <li>Cluster credentials are stored in AWS metadata (for ASGs)</li> <li>Cluster credentials are stored in Azure metadata (for scale sets)</li> <li>Cluster credentials are stored in Google Cloud metadata (for managed instance groups)</li> <li>No account credentials are available to Digital Ocean droplets</li> <li>No account credentials are available to AWS EC2 instances (no IAM permissions)</li> <li>No account credentials are available to Azure instances (no IAM permissions)</li> <li>No account credentials are available to Google Cloud instances (no IAM permissions)</li> </ul>"},{"location":"topics/security/#precautions","title":"Precautions","text":"<p>Typhoon limits exposure to many security threats, but it is not a silver bullet. As usual,</p> <ul> <li>Do not run untrusted images or accept manifests from strangers</li> <li>Do not give untrusted users a shell behind your firewall</li> <li>Define network policies for your namespaces</li> </ul>"},{"location":"topics/security/#container-images","title":"Container Images","text":"<p>Typhoon uses upstream container images (where possible) and upstream binaries.</p> <p>Note</p> <p>Kubernetes releases <code>kubelet</code> as a binary for distros to package, either as a DEB/RPM on traditional distros or as a container image for container-optimized operating systems.</p> <p>Typhoon packages the upstream Kubelet and its dependencies as a container image. Builds fetch the upstream Kubelet binary and verify its checksum.</p> <p>The Kubelet image is published to Quay.io and Dockerhub.</p> <ul> <li>quay.io/poseidon/kubelet (official)</li> <li>docker.io/psdn/kubelet (fallback)</li> </ul> <p>Two tag styles indicate the build strategy used.</p> <ul> <li>Typhoon internal infra publishes single and multi-arch images (e.g. <code>v1.18.4</code>, <code>v1.18.4-amd64</code>, <code>v1.18.4-arm64</code>, <code>v1.18.4-2-g23228e6-amd64</code>, <code>v1.18.4-2-g23228e6-arm64</code>)</li> <li>Quay automated builds publish verifiable images (e.g. <code>build-SHA</code> on Quay)</li> </ul> <p>The Typhoon-built Kubelet image is used as the official image. Automated builds provide an alternative image for those preferring to trust images built by Quay (albeit lacking multi-arch). 
To use the fallback registry or an alternative tag, see customization.</p>"},{"location":"topics/security/#flannel-cni","title":"flannel-cni","text":"<p>Typhoon packages the flannel-cni container image to provide security patches.</p> <ul> <li>quay.io/poseidon/flannel-cni (official)</li> </ul>"},{"location":"topics/security/#terraform-providers","title":"Terraform Providers","text":"<p>Typhoon publishes Terraform providers to the Terraform Registry, GPG signed by 0x8F515AD1602065C8.</p> Name Source Registry ct github poseidon/ct matchbox github poseidon/matchbox"},{"location":"topics/security/#kube-system","title":"kube-system","text":"Name user hostNet privileged kube-apiserver nobody true false kube-controller-manager nobody true false kube-scheduler nobody true false coredns NA false false kube-proxy root true true cilium root true true calico root true true flannel root true true Name priorityClassName kube-apiserver system-cluster-critical kube-controller-manager system-cluster-critical kube-scheduler system-cluster-critical coredns system-cluster-critical kube-proxy system-node-critical cilium system-node-critical calico system-node-critical flannel system-node-critical"},{"location":"topics/security/#disclosures","title":"Disclosures","text":"<p>If you find security issues, please email <code>security@psdn.io</code>. If the issue lies in upstream Kubernetes, please inform upstream Kubernetes as well.</p> <ol> <li> <p>Requires <code>networking = \"calico\"</code>. Calico is the default on all platforms (AWS, Azure, bare-metal, DigitalOcean, and Google Cloud).\u00a0\u21a9</p> </li> <li> <p>SELinux is enforcing on Fedora CoreOS, permissive on Flatcar Linux.\u00a0\u21a9</p> </li> </ol>"}]}