mirror of
https://github.com/poseidon/typhoon
synced 2024-11-18 14:17:50 +01:00
ad2e4311d1
* Allow multi-controller clusters on Google Cloud
* GCP regional network load balancers have a long-open bug in which requests originating from a backend instance are routed to the instance itself, regardless of whether the health check passes. As a result, only the 0th controller node registers. We've recommended using single-master GCP clusters for a while
* https://issuetracker.google.com/issues/67366622
* Work around the issue by switching to a GCP TCP Proxy load balancer. The TCP proxy LB routes traffic to a (global) backend service of instance group backends. In our case, spread controllers across 3 zones (all regions have 3+ zones) and organize them in 3 zonal unmanaged instance groups that serve as backends. This allows multi-controller cluster creation
* GCP network load balancers only allowed legacy HTTP health checks, so kubelet 10255 was checked as an approximation of controller health. Replace it with TCP apiserver health checks to detect unhealthy or unresponsive apiservers.
* Drawbacks: GCP provision time increases, tailed logs now time out (similar tradeoff in AWS), and controllers only span 3 zones instead of the exact number in the region
* The workaround in Typhoon has been known and posted for 5 months, but there still appears to be no better alternative. It's probably time to support multi-master and accept the downsides
Typhoon
Notable changes between versions.
Latest
Google Cloud
- Add support for multi-controller clusters (i.e. multi-master) (#54, #190)
- Switch from Google Cloud network load balancer to a TCP proxy load balancer. Avoid a bug in Google network load balancers that limited clusters to only bootstrapping one controller node.
- Add TCP health check for apiserver pods on controllers. Replace kubelet check approximation.
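The idea behind a TCP health check can be sketched as follows: a backend counts as healthy when a TCP connection to the checked port opens within a timeout. This is a minimal illustration, not how Google Cloud implements its health checks. The function name `tcp_healthy` and the default timeout are hypothetical, and the apiserver port is left as a parameter because this changelog does not state it.

```python
import socket


def tcp_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration only; the real check is performed
    by the cloud load balancer, not by code like this.
    """
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

Unlike the earlier kubelet check, a probe of this kind targets the apiserver's own port, so a controller whose kubelet is running but whose apiserver is not accepting connections is reported as unhealthy.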
Addons
- Update kube-state-metrics from v1.3.0 to v1.3.1
v1.10.1
- Kubernetes v1.10.1
- Enable etcd v3.3 metrics endpoint (#175)
- Use `k8s.gcr.io` instead of `gcr.io/google_containers` (#180)
- Kubernetes recommends using the alias to pull from the nearest regional mirror and to abstract the backing container registry
- Update kube-dns from v1.14.8 to v1.14.9
- Update etcd from v3.3.2 to v3.3.3
- Use kubernetes-incubator/bootkube v0.12.0
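For readers updating their own manifests, the registry change in this release amounts to a prefix swap on image references. The sketch below assumes images are plain strings; the helper name `use_registry_alias` and the example tag are illustrative and not part of Typhoon.

```python
OLD_PREFIX = "gcr.io/google_containers/"
NEW_PREFIX = "k8s.gcr.io/"


def use_registry_alias(image: str) -> str:
    """Rewrite an image reference to pull via the k8s.gcr.io alias.

    Hypothetical helper: references outside the old registry path are
    returned unchanged.
    """
    if image.startswith(OLD_PREFIX):
        return NEW_PREFIX + image[len(OLD_PREFIX):]
    return image


# Example with an illustrative tag:
print(use_registry_alias("gcr.io/google_containers/hyperkube:v1.10.1"))
```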
Bare-Metal
- Fix need for multiple `terraform apply` runs to create a cluster with Terraform v0.11.4 (#181)
- To SSH during a disk install for debugging, SSH as user "core" with port 2222
- Remove the old trick of using a user "debug" during disk install
Google Cloud
- Refactor out the `controller` internal module
Addons
- Add Prometheus discovery for etcd peers on controller nodes (#175)
- Scrape etcd v3.3 `--listen-metrics-urls` for metrics
- Enable etcd alerts and populate the etcd Grafana dashboard
- Update kube-state-metrics from v1.2.0 to v1.3.0
v1.10.0
- Kubernetes v1.10.0
- Remove unused, unmaintained `pxe-worker` internal module
AWS
- Add `disk_type` optional variable for setting the EBS volume type (#176)
- Change default type from `standard` to `gp2`. Prometheus etcd alerts are tuned for fast disks.
Digital Ocean
- Ensure etcd secrets are only distributed to controller hosts, not workers.
- Remove `networking` optional variable. Only flannel works on Digital Ocean.
Google Cloud
- Add `disk_size` optional variable for setting instance disk size in GB
- Add `controller_type` optional variable for setting machine type for controllers
- Add `worker_type` optional variable for setting machine type for workers
- Remove `machine_type` optional variable. Use `controller_type` and `worker_type`.
Addons
v1.9.6
- Kubernetes v1.9.6
- Update Calico from v3.0.3 to v3.0.4
Addons
- Update heapster from v1.5.1 to v1.5.2
v1.9.5
- Kubernetes v1.9.5
- Fix `subPath` volume mounts regression (kubernetes#61076)
- Introduce Container Linux Config snippets on cloud platforms (#145)
- Validate and additively merge custom Container Linux Configs during `terraform plan`
- Define files, systemd units, dropins, networkd configs, mounts, users, and more
- Require updating `terraform-provider-ct` plugin from v0.2.0 to v0.2.1
- Add `node-role.kubernetes.io/controller="true"` node label to controllers (#160)
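What "additively merge" means for Container Linux Config snippets can be pictured with a small sketch: entries from a snippet are appended to the base config rather than replacing it. This is only an illustration of the idea using plain dictionaries. It is not the algorithm used by `terraform-provider-ct`, and the function name, the sample keys, and the choice to raise an error on conflicting scalar values are all assumptions.

```python
from copy import deepcopy


def merge_additively(base: dict, snippet: dict) -> dict:
    """Return a new config with `snippet` merged into `base`.

    Assumed rules (illustrative, not the provider's behavior):
    nested dicts merge recursively, lists are concatenated, and
    conflicting scalar values raise instead of silently overriding.
    """
    merged = deepcopy(base)
    for key, value in snippet.items():
        if key not in merged:
            merged[key] = deepcopy(value)
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_additively(merged[key], value)
        elif isinstance(merged[key], list) and isinstance(value, list):
            merged[key] = merged[key] + deepcopy(value)
        else:
            raise ValueError(f"conflicting values for {key!r}")
    return merged


# Hypothetical base config and user snippet.
base = {"systemd": {"units": [{"name": "kubelet.service"}]}}
snippet = {
    "systemd": {"units": [{"name": "hello.service"}]},
    "storage": {"files": [{"path": "/opt/hello"}]},
}
merged = merge_additively(base, snippet)
```

In this sketch the merged config carries both systemd units and the added file, while `base` itself is left unmodified.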
AWS
Digital Ocean
Google Cloud
- Require updating `terraform-provider-ct` plugin from v0.2.0 to v0.2.1 (action required!)
- Relax `os_image` to optional. Default to "coreos-stable".
Addons
- Update nginx-ingress from 0.11.0 to 0.12.0
- Update Prometheus from 2.2.0 to 2.2.1
v1.9.4
- Kubernetes v1.9.4
- Secret, configMap, downward API, and projected volumes now read-only (breaking, kubernetes#58720)
- Regressed `subPath` volume mounts (regression, kubernetes#61076)
- Mitigated `subPath` CVE-2017-1002101
- Introduce worker pools for AWS and Google Cloud for joining heterogeneous workers to existing clusters.
- Use new Network Load Balancers and cross zone load balancing on AWS
- Allow flexvolume plugins to be used on any Typhoon cluster (not just bare-metal)
- Upgrade etcd from v3.2.15 to v3.3.2
- Update Calico from v3.0.2 to v3.0.3
- Use kubernetes-incubator/bootkube v0.11.0
- Recommend updating `terraform-provider-ct` plugin from v0.2.0 to v0.2.1 (action recommended)
AWS
- Promote AWS platform to stable
- Allow groups of workers to be defined and joined to a cluster (i.e. worker pools) (#150)
- Replace the apiserver elastic load balancer with a network load balancer (#136)
- Replace the Ingress elastic load balancer with a network load balancer (#141)
- AWS NLBs can handle millions of RPS with high throughput and low latency.
- Require `terraform-provider-aws` 1.7.0 or higher
- Enable NLB cross-zone load balancing (#159)
- Requests are automatically evenly distributed to targets regardless of AZ
- Require `terraform-provider-aws` 1.11.0 or higher
- Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins (#142)
- Fix controller and worker launch configs to ignore AMI changes (#126, #158)
Digital Ocean
- Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins (#142)
- Fix to pass `ssh_fingerprints` as a list to droplets (#143)
Google Cloud
- Allow groups of workers to be defined and joined to a cluster (i.e. worker pools) (#148)
- Add kubelet `--volume-plugin-dir` flag to allow flexvolume plugins (#142)
- Add `kubeconfig` variable to `controllers` and `workers` submodules (#147)
- Remove `kubeconfig_*` variables from `controllers` and `workers` submodules (#147)
- Allow initial experimentation with accelerators (i.e. GPUs) on workers (#161) (unofficial)
- Require `terraform-provider-google` v1.6.0
Addons
- Update Prometheus from 2.1.0 to 2.2.0 (#153)
- Scrape Prometheus itself to enable alerts about Prometheus itself
- Adjust KubeletDown rule to fire when 10% of kubelets are down
- Update heapster from v1.5.0 to v1.5.1 (#131)
- Use separate service account
- Update nginx-ingress from 0.10.2 to 0.11.0
v1.9.3
- Kubernetes v1.9.3
- Network improvements and fixes (#104)
- Switch from Calico v2.6.6 to v3.0.2
- Add Calico GlobalNetworkSet CRD
- Update flannel from v0.9.0 to v0.10.0
- Use separate service account for flannel
- Update etcd from v3.2.14 to v3.2.15
Digital Ocean
- Use new Droplet types which offer more CPU/memory, at lower cost. (#105)
- A small Digital Ocean cluster costs less than $25 a month!
Addons
- Update Prometheus from v2.0.0 to v2.1.0 (#113)
- Improve alerting rules
- Relabel discovered kubelet, endpoint, service, and apiserver scrapes
- Use separate service accounts
- Update node-exporter and kube-state-metrics
- Include Grafana dashboards for Kubernetes admins (#113)
- Add grafana-watcher to load bundled upstream dashboards
- Update nginx-ingress from 0.9.0 to 0.10.2
- Update CLUO from v0.5.0 to v0.6.0
- Switch manifests to use `apps/v1` Deployments and Daemonsets (#120)
- Remove Kubernetes Dashboard manifests (#121)
v1.9.2
- Kubernetes v1.9.2
- Add Terraform v0.11.x support
- Add explicit "providers" section to modules for Terraform v0.11.x
- Retain support for Terraform v0.10.4+
- Add migration guide from Terraform v0.10.x to v0.11.x (action required!)
- Update etcd from 3.2.13 to 3.2.14
- Update calico from 2.6.5 to 2.6.6
- Update kube-dns from v1.14.7 to v1.14.8
- Use separate service account for kube-dns
- Use kubernetes-incubator/bootkube v0.10.0
Bare-Metal
- Use per-node Container Linux install profiles (#97)
- Allow Container Linux channel/version to be chosen per-cluster
- Fix issue where cluster deletion could require `terraform apply` multiple times
Digital Ocean
- Relax `digitalocean` provider version constraint
- Fix bug with `terraform plan` always showing a firewall diff to be applied (#3)
Addons
- Update CLUO to v0.5.0 to fix compatibility with Kubernetes 1.9 (important)
- Earlier versions can't roll out Container Linux updates on Kubernetes 1.9 nodes (cluo#163)
- Update kube-state-metrics from v1.1.0 to v1.2.0
- Fix RBAC cluster role for kube-state-metrics
v1.9.1
- Kubernetes v1.9.1
- Update kube-dns from 1.14.5 to v1.14.7
- Update etcd from 3.2.0 to 3.2.13
- Update Calico from v2.6.4 to v2.6.5
- Enable portmap to fix hostPort with Calico
- Use separate service account for controller-manager
v1.8.6
- Kubernetes v1.8.6
- Update Calico from v2.6.3 to v2.6.4
v1.8.5
- Kubernetes v1.8.5
- Recommend Container Linux images with Docker 17.09
- Container Linux stable, beta, and alpha now provide Docker 17.09 (instead of 1.12)
- Older clusters (with CLUO addon) auto-update Container Linux version to begin using Docker 17.09
- Fix race where `etcd-member.service` could fail to resolve peers (#69)
- Add optional `cluster_domain_suffix` variable (#74)
- Use kubernetes-incubator/bootkube v0.9.1
Bare-Metal
- Add kubelet `--volume-plugin-dir` flag to allow flexvolume providers (#61)
Addons
- Discourage deploying the Kubernetes Dashboard (security)
v1.8.4
- Kubernetes v1.8.4
- Calico related bug fixes
- Update Calico from v2.6.1 to v2.6.3
- Update flannel from v0.9.0 to v0.9.1
- Service accounts for kube-proxy and pod-checkpointer
- Use kubernetes-incubator/bootkube v0.9.0
v1.8.3
- Kubernetes v1.8.3
- Run etcd on-host, across controllers
- Promote AWS platform to beta
- Use kubernetes-incubator/bootkube v0.8.2
Google Cloud
- Add required variable `region` (e.g. "us-central1")
- Reduce time to bootstrap a cluster
- Change etcd to run on-host, across controllers (etcd-member.service)
- Change controller instances to automatically span zones in the region
- Change worker managed instance group to automatically span zones in the region
- Improve internal firewall rules and use tag-based firewall policies
- Remove support for self-hosted etcd
- Remove the `zone` required variable
- Remove the `controller_preemptible` optional variable
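One way to picture instances "automatically spanning zones in the region" is a round-robin assignment of controllers to the region's zones. The sketch below assumes round-robin placement and uses illustrative zone names; the changelog does not describe the exact placement logic, and `assign_zones` is a hypothetical helper.

```python
from typing import List


def assign_zones(controller_count: int, zones: List[str]) -> List[str]:
    """Assign each controller index to a zone, wrapping around the zone list.

    Round-robin placement is an assumption made for illustration.
    """
    if not zones:
        raise ValueError("at least one zone is required")
    return [zones[i % len(zones)] for i in range(controller_count)]


# Illustrative zone names for a region such as "us-central1".
placement = assign_zones(3, ["us-central1-a", "us-central1-b", "us-central1-c"])
```

With three controllers and three zones, each controller lands in a different zone; with more controllers than zones, the assignment wraps back to the first zone.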
AWS
- Promote AWS platform to beta
- Reduce time to bootstrap a cluster
- Change etcd to run on-host, across controllers (etcd-member.service)
- Fix firewall rules for multi-controller kubelet scraping and node-exporter
- Remove support for self-hosted etcd
Addons
- Add Prometheus 2.0 addon with alerting rules
- Add Grafana dashboard for observing metrics
v1.8.2
- Kubernetes v1.8.2
- Fixes a memory leak in the v1.8.1 apiserver (kubernetes#53485)
- Switch to using the `gcr.io/google_containers/hyperkube` image
- Update flannel from v0.8.0 to v0.9.0
- Add `hairpinMode` to flannel CNI config
- Add `--no-negcache` to kube-dns dnsmasq
- Use kubernetes-incubator/bootkube v0.8.1
v1.8.1
- Kubernetes v1.8.1
- Use kubernetes-incubator/bootkube v0.8.0
Digital Ocean
- Run etcd cluster across controller nodes (etcd-member.service)
- Remove support for self-hosted etcd
- Reduce time to bootstrap a cluster
v1.7.7
- Kubernetes v1.7.7
- Use kubernetes-incubator/bootkube v0.7.0
- Update kube-dns to 1.14.5 to fix dnsmasq vulnerability
- Calico v2.6.1
- flannel-cni v0.3.0
- Update flannel CNI config to fix hostPort
v1.7.5
- Kubernetes v1.7.5
- Use kubernetes-incubator/bootkube v0.6.2
- Add AWS Terraform module (alpha)
- Add support for Calico networking (bare-metal, Google Cloud, AWS)
- Change networking default from "flannel" to "calico"
AWS
- Add `network_mtu` to allow CNI interface MTU customization
Bare-Metal
- Add `network_mtu` to allow CNI interface MTU customization
- Remove support for `experimental_self_hosted_etcd`
v1.7.3
- Kubernetes v1.7.3
- Use kubernetes-incubator/bootkube v0.6.1
Digital Ocean
- Add cloud firewall rules (requires Terraform v0.10)
- Change node tags from strings to DO tags
v1.7.1
- Kubernetes v1.7.1
- Use kubernetes-incubator/bootkube v0.6.0
- Add Bare-Metal Terraform module (stable)
- Add Digital Ocean Terraform module (beta)
Google Cloud
- Remove `k8s_domain_name` variable, `cluster_name` + `dns_zone` resolves to controllers
- Rename `dns_base_zone` to `dns_zone`
- Rename `dns_base_zone_name` to `dns_zone_name`
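The replacement for `k8s_domain_name` can be read as a simple composition of the two remaining variables. The sketch below assumes the name is formed by joining `cluster_name` and `dns_zone` with a dot. The helper name and the sample values are hypothetical.

```python
def controller_dns_name(cluster_name: str, dns_zone: str) -> str:
    """Compose the DNS name that resolves to the controllers.

    Assumes a plain "<cluster_name>.<dns_zone>" join, for illustration.
    """
    return f"{cluster_name}.{dns_zone}"


# Hypothetical values for the two variables.
print(controller_dns_name("yavin", "example.com"))
```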
v1.6.7
- Kubernetes v1.6.7
- Use kubernetes-incubator/bootkube v0.5.1
v1.6.6
- Kubernetes v1.6.6
- Use kubernetes-incubator/bootkube v0.4.5
- Disable locksmithd on hosts, in favor of CLUO.
v1.6.4
- Kubernetes v1.6.4
- Add Google Cloud Terraform module (stable)
Earlier
Earlier versions, back to v1.3.0, used different designs and mechanisms.