

Typhoon

Notable changes between versions.

Latest

AWS

  • Switch kube-apiserver port from 443 to 6443 (#248)
  • Combine apiserver and ingress NLBs (#249)
    • Simplify clusters to come with one NLB. Reduce cost by ~$18/month per cluster.
    • Users may keep using CNAME records to ingress_dns_name and the nginx-ingress addon for Ingress (up to a few million RPS)
    • Users with heavy traffic (many million RPS) should create their own separate NLB(s) for Ingress instead and use the new worker target group outputs
    • Listen for apiserver traffic on port 6443 and forward to controllers (with healthy apiserver)
    • Listen for ingress traffic on ports 80/443 and forward to workers (with healthy ingress controller)
  • Worker pools (advanced) no longer include an extraneous load balancer
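The combined NLB chooses its targets by listener port. The Python model below sketches that forwarding: port 6443 goes to controllers and ports 80/443 go to workers, and only healthy targets receive traffic. The target names and the `healthy` flag are hypothetical stand-ins for AWS target group health checks. This is not the AWS API.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    healthy: bool  # stand-in for passing the target group's health check


# Listener port -> target group role, as described in this changelog entry.
LISTENERS = {6443: "controllers", 80: "workers", 443: "workers"}


def forward_targets(port, groups):
    """Return the names of healthy targets that a listener on `port` forwards to."""
    role = LISTENERS.get(port)
    if role is None:
        raise ValueError(f"no listener on port {port}")
    return [t.name for t in groups[role] if t.healthy]


groups = {
    "controllers": [Target("controller-0", True), Target("controller-1", False)],
    "workers": [Target("worker-0", True), Target("worker-1", True)],
}
print(forward_targets(6443, groups))  # ['controller-0']
print(forward_targets(443, groups))   # ['worker-0', 'worker-1']
```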

Bare-Metal

  • Switch kube-apiserver port from 443 to 6443 (#248)
    • Users who exposed kube-apiserver on a WAN via their router/load-balancer will need to adjust its configuration (e.g. DNAT 6443). Most apiservers are on a LAN (internal, VPN-only, etc.), so if you didn't specially configure network gear for 443, no change is needed. (possible action required)
  • Fix possible deadlock when provisioning clusters larger than 10 nodes (#244)
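After you adjust router or load-balancer forwarding, you may want to confirm that the apiserver answers on the new port. The sketch below is a plain TCP reachability check using only the Python standard library. The address in the usage comment is a hypothetical documentation IP. A successful connection shows only that the port is open; it does not show that TLS or authentication work.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves host and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Hypothetical usage against a router's WAN address (203.0.113.10 is a documentation IP):
#   port_open("203.0.113.10", 6443)
```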

DigitalOcean

  • Switch kube-apiserver port from 443 to 6443 (#248)
    • Update firewall rules and generated kubeconfigs

Addons

  • Update CLUO from v0.6.0 to v0.7.0 (#242)

v1.10.4

  • Kubernetes v1.10.4
  • Update etcd from v3.3.5 to v3.3.6
  • Update Calico from v3.1.2 to v3.1.3

Addons

  • Update Prometheus from v2.2.1 to v2.3.1
  • Add Prometheus liveness and readiness probes
  • Update Grafana from 5.1.3 to 5.1.4
  • Annotate Grafana service so Prometheus scrapes metrics
  • Label namespaces to ease writing Network Policies

v1.10.3

  • Kubernetes v1.10.3
  • Add Flatcar Linux (Container Linux derivative) as an option for AWS and bare-metal (thanks @kinvolk folks)
  • Allow bearer token authentication to the Kubelet (#216)
    • Require Webhook authorization to the Kubelet
    • Switch apiserver X509 client cert org to satisfy new authorization requirement
  • Require Terraform v0.11.x and drop support for v0.10.x (migration guide)
  • Update etcd from v3.3.4 to v3.3.5 (#213)
  • Update Calico from v3.1.1 to v3.1.2

AWS

  • Allow Flatcar Linux by setting os_image to flatcar-stable (default), flatcar-beta, flatcar-alpha (#211)
  • Replace os_channel variable with os_image to align naming across clouds
    • Please change values stable, beta, or alpha to coreos-stable, coreos-beta, coreos-alpha (action required!)
  • Allow preemptible workers via spot instances (#202)
    • Add worker_price to allow worker spot instances. Defaults to an empty string, so the worker autoscaling group uses regular on-demand instances
    • Add spot_price to internal workers module for spot worker pools
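The required value change is a fixed mapping: the old channel names gain a "coreos-" prefix, and the Flatcar values are new. The sketch below applies that mapping and rejects anything else. The function name is hypothetical. The same mapping applies to the bare-metal os_channel variable.

```python
# Old channel values map to prefixed names, per the "action required" note above.
LEGACY_CHANNELS = {
    "stable": "coreos-stable",
    "beta": "coreos-beta",
    "alpha": "coreos-alpha",
}
VALID_VALUES = set(LEGACY_CHANNELS.values()) | {
    "flatcar-stable",
    "flatcar-beta",
    "flatcar-alpha",
}


def migrate_os_value(value):
    """Translate a legacy channel value to its new os_image / os_channel form."""
    new = LEGACY_CHANNELS.get(value, value)  # already-migrated values pass through
    if new not in VALID_VALUES:
        raise ValueError(f"unrecognized value: {value!r}")
    return new
```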

Bare-Metal

  • Allow Flatcar Linux by setting os_channel to flatcar-stable, flatcar-beta, flatcar-alpha (#220)
  • Replace container_linux_channel variable with os_channel
    • Please change values stable, beta, or alpha to coreos-stable, coreos-beta, coreos-alpha (action required!)
  • Replace container_linux_version variable with os_version
  • Add network_ip_autodetection_method variable for Calico host IPv4 address detection
    • Use Calico's default "first-found" to support single NIC and bonded NIC nodes
    • Allow alternative methods for multi NIC nodes, like can-reach=IP or interface=REGEX
  • Deprecate container_linux_oem variable
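The network_ip_autodetection_method values follow a "kind" or "kind=argument" shape. The sketch below parses only the three forms named above. It assumes can-reach takes an IP address and interface takes a regular expression. Calico itself may accept more forms (for example, a hostname for can-reach), so treat this as a sanity check rather than Calico's own parser.

```python
import ipaddress
import re


def parse_autodetection_method(method):
    """Split a Calico IP autodetection method into (kind, argument)."""
    if method == "first-found":
        return ("first-found", None)
    kind, sep, arg = method.partition("=")
    if not sep or not arg:
        raise ValueError(f"malformed method: {method!r}")
    if kind == "can-reach":
        ipaddress.ip_address(arg)  # raises ValueError if arg is not an IP address
        return (kind, arg)
    if kind == "interface":
        re.compile(arg)  # raises re.error if arg is not a valid regular expression
        return (kind, arg)
    raise ValueError(f"unknown method kind: {kind!r}")
```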

DigitalOcean

  • Update Fedora Atomic module to use Fedora Atomic 28 (#225)
    • Fedora Atomic 27 images disappeared from DigitalOcean and forced this early update

Addons

  • Fix Prometheus data directory location (#203)
  • Configure Prometheus to scrape Kubelets directly with bearer token auth instead of proxying through the apiserver (#217)
    • Security improvement: Narrow RBAC permission from nodes/proxy to nodes/metrics
    • Scale: Remove per-node proxied scrape load from the apiserver
  • Update Grafana from v5.0.4 to v5.1.3 (#208)
    • Disable Grafana Google Analytics by default (#214)
  • Update nginx-ingress from 0.14.0 to 0.15.0
  • Annotate nginx-ingress service so Prometheus auto-discovers and scrapes service endpoints (#222)
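The scrape change alters which URL Prometheus requests for each node. In the sketch below, the proxied form is the standard apiserver node-proxy path, which needs nodes/proxy. The direct form targets the kubelet's default secure port 10250, which the kubelet authorizes as nodes/metrics. The function names and example hosts are hypothetical.

```python
def proxied_metrics_url(apiserver, node):
    """Before: scrape a kubelet through the apiserver's node proxy."""
    return f"{apiserver}/api/v1/nodes/{node}/proxy/metrics"


def direct_metrics_url(node_address, port=10250):
    """After: scrape the kubelet's HTTPS endpoint directly (10250 is the kubelet default)."""
    return f"https://{node_address}:{port}/metrics"


def bearer_headers(token):
    """Direct scrapes authenticate with a service account bearer token."""
    return {"Authorization": f"Bearer {token}"}
```

Because each direct scrape goes straight to a node, the apiserver no longer carries one proxied request per node per scrape interval.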

v1.10.2

Google Cloud

  • Add support for multi-controller clusters (i.e. multi-master) (#54, #190)
    • Switch from Google Cloud network load balancer to a TCP proxy load balancer. Avoid a bug in Google network load balancers that limited clusters to only bootstrapping one controller node.
    • Add TCP health check for apiserver pods on controllers. Replace kubelet check approximation.

Addons

  • Update nginx-ingress from 0.12.0 to 0.14.0
  • Update kube-state-metrics from v1.3.0 to v1.3.1

v1.10.1

  • Kubernetes v1.10.1
  • Enable etcd v3.3 metrics endpoint (#175)
  • Use k8s.gcr.io instead of gcr.io/google_containers (#180)
    • Kubernetes recommends using the alias to pull from the nearest regional mirror and to abstract the backing container registry
  • Update etcd from v3.3.2 to v3.3.3
  • Update kube-dns from v1.14.8 to v1.14.9
  • Use kubernetes-incubator/bootkube v0.12.0
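Switching registries amounts to rewriting an image-name prefix. The sketch below rewrites gcr.io/google_containers references to the k8s.gcr.io alias and leaves other registries untouched. The function name is hypothetical.

```python
OLD_PREFIX = "gcr.io/google_containers/"
NEW_PREFIX = "k8s.gcr.io/"


def to_k8s_gcr(image):
    """Rewrite a gcr.io/google_containers image reference to the k8s.gcr.io alias."""
    if image.startswith(OLD_PREFIX):
        return NEW_PREFIX + image[len(OLD_PREFIX):]
    return image  # other registries (e.g. quay.io) are left as-is
```

For example, `to_k8s_gcr("gcr.io/google_containers/hyperkube:v1.10.1")` yields `k8s.gcr.io/hyperkube:v1.10.1`.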

Bare-Metal

  • Fix need for multiple terraform apply runs to create a cluster with Terraform v0.11.4 (#181)
    • To SSH during a disk install for debugging, SSH as user "core" with port 2222
    • Remove the old trick of using a user "debug" during disk install

Google Cloud

  • Refactor out the controller internal module

Addons

  • Add Prometheus discovery for etcd peers on controller nodes (#175)
    • Scrape etcd v3.3 --listen-metrics-urls for metrics
    • Enable etcd alerts and populate the etcd Grafana dashboard
  • Update kube-state-metrics from v1.2.0 to v1.3.0

v1.10.0

  • Kubernetes v1.10.0
  • Remove unused, unmaintained pxe-worker internal module

AWS

  • Add disk_type optional variable for setting the EBS volume type (#176)
    • Change default type from standard to gp2. Prometheus etcd alerts are tuned for fast disks.

Digital Ocean

  • Ensure etcd secrets are only distributed to controller hosts, not workers.
  • Remove networking optional variable. Only flannel works on Digital Ocean.

Google Cloud

  • Add disk_size optional variable for setting instance disk size in GB
  • Add controller_type optional variable for setting machine type for controllers
  • Add worker_type optional variable for setting machine type for workers
  • Remove machine_type optional variable. Use controller_type and worker_type.

Addons

  • Update Grafana from v4.6.3 to v5.0.4 (#153, #174)
    • Restrict dashboard organization role to Viewer

v1.9.6

  • Kubernetes v1.9.6
  • Update Calico from v3.0.3 to v3.0.4

Addons

  • Update heapster from v1.5.1 to v1.5.2

v1.9.5

  • Kubernetes v1.9.5
  • Introduce Container Linux Config snippets on cloud platforms (#145)
    • Validate and additively merge custom Container Linux Configs during terraform plan
    • Define files, systemd units, dropins, networkd configs, mounts, users, and more
    • Require updating terraform-provider-ct plugin from v0.2.0 to v0.2.1
  • Add node-role.kubernetes.io/controller="true" node label to controllers (#160)

AWS

  • Require updating terraform-provider-ct plugin from v0.2.0 to v0.2.1 (action required!)

Digital Ocean

  • Require updating terraform-provider-ct plugin from v0.2.0 to v0.2.1 (action required!)

Google Cloud

  • Require updating terraform-provider-ct plugin from v0.2.0 to v0.2.1 (action required!)
  • Relax os_image to optional. Default to "coreos-stable".

Addons

  • Update nginx-ingress from 0.11.0 to 0.12.0
  • Update Prometheus from 2.2.0 to 2.2.1

v1.9.4

  • Kubernetes v1.9.4
  • Introduce worker pools for AWS and Google Cloud for joining heterogeneous workers to existing clusters.
  • Use new Network Load Balancers and cross zone load balancing on AWS
  • Allow flexvolume plugins to be used on any Typhoon cluster (not just bare-metal)
  • Upgrade etcd from v3.2.15 to v3.3.2
  • Update Calico from v3.0.2 to v3.0.3
  • Use kubernetes-incubator/bootkube v0.11.0
  • Recommend updating terraform-provider-ct plugin from v0.2.0 to v0.2.1 (action recommended)

AWS

  • Promote AWS platform to stable
  • Allow groups of workers to be defined and joined to a cluster (i.e. worker pools) (#150)
  • Replace the apiserver elastic load balancer with a network load balancer (#136)
  • Replace the Ingress elastic load balancer with a network load balancer (#141)
    • AWS NLBs can handle millions of RPS with high throughput and low latency.
    • Require terraform-provider-aws 1.7.0 or higher
  • Enable NLB cross-zone load balancing (#159)
    • Requests are automatically evenly distributed to targets regardless of AZ
    • Require terraform-provider-aws 1.11.0 or higher
  • Add kubelet --volume-plugin-dir flag to allow flexvolume plugins (#142)
  • Fix controller and worker launch configs to ignore AMI changes (#126, #158)
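The effect of cross-zone load balancing can be shown with simple arithmetic. The sketch assumes clients split traffic evenly across the NLB's per-AZ nodes, one node per AZ that has targets. Without cross-zone balancing, each AZ's share is divided only among that AZ's targets. With it, every target gets an equal share. The target and AZ names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def per_target_share(target_azs, cross_zone):
    """Fraction of total traffic each target receives.

    target_azs maps target name -> availability zone.
    """
    if cross_zone:
        n = len(target_azs)
        return {t: Fraction(1, n) for t in target_azs}
    # Without cross-zone: each AZ gets 1/n_azs, split among that AZ's targets only.
    targets_per_az = Counter(target_azs.values())
    n_azs = len(targets_per_az)
    return {t: Fraction(1, n_azs * targets_per_az[az]) for t, az in target_azs.items()}


azs = {"w0": "a", "w1": "a", "w2": "a", "w3": "b"}
print({t: str(s) for t, s in per_target_share(azs, cross_zone=False).items()})
# {'w0': '1/6', 'w1': '1/6', 'w2': '1/6', 'w3': '1/2'}
print({t: str(s) for t, s in per_target_share(azs, cross_zone=True).items()})
# {'w0': '1/4', 'w1': '1/4', 'w2': '1/4', 'w3': '1/4'}
```

With uneven AZ placement, the lone worker in AZ "b" would otherwise absorb half of all requests.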

Digital Ocean

  • Add kubelet --volume-plugin-dir flag to allow flexvolume plugins (#142)
  • Fix to pass ssh_fingerprints as a list to droplets (#143)

Google Cloud

  • Allow groups of workers to be defined and joined to a cluster (i.e. worker pools) (#148)
  • Add kubelet --volume-plugin-dir flag to allow flexvolume plugins (#142)
  • Add kubeconfig variable to controllers and workers submodules (#147)
  • Remove kubeconfig_* variables from controllers and workers submodules (#147)
  • Allow initial experimentation with accelerators (i.e. GPUs) on workers (#161) (unofficial)
    • Require terraform-provider-google v1.6.0

Addons

  • Update Prometheus from 2.1.0 to 2.2.0 (#153)
    • Scrape Prometheus itself to enable alerts about Prometheus
    • Adjust KubeletDown rule to fire when 10% of kubelets are down
  • Update heapster from v1.5.0 to v1.5.1 (#131)
    • Use separate service account
  • Update nginx-ingress from 0.10.2 to 0.11.0
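The KubeletDown threshold is a ratio check. The sketch below assumes "10% of kubelets are down" means at least 10%. It does not reproduce the addon's actual PromQL rule. Integer arithmetic keeps the boundary case exact.

```python
def kubelet_down_alert(down, total, threshold_percent=10):
    """Return True when at least threshold_percent of kubelets are down."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Compare down/total >= threshold/100 without floating-point rounding.
    return down * 100 >= threshold_percent * total
```

For example, 1 of 10 kubelets down would fire, while 1 of 11 would not.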

v1.9.3

  • Kubernetes v1.9.3
  • Network improvements and fixes (#104)
    • Switch from Calico v2.6.6 to v3.0.2
    • Add Calico GlobalNetworkSet CRD
    • Update flannel from v0.9.0 to v0.10.0
    • Use separate service account for flannel
  • Update etcd from v3.2.14 to v3.2.15

Digital Ocean

  • Use new Droplet types which offer more CPU/memory, at lower cost. (#105)
    • A small Digital Ocean cluster costs less than $25 a month!

Addons

  • Update Prometheus from v2.0.0 to v2.1.0 (#113)
    • Improve alerting rules
    • Relabel discovered kubelet, endpoint, service, and apiserver scrapes
    • Use separate service accounts
    • Update node-exporter and kube-state-metrics
  • Include Grafana dashboards for Kubernetes admins (#113)
    • Add grafana-watcher to load bundled upstream dashboards
  • Update nginx-ingress from 0.9.0 to 0.10.2
  • Update CLUO from v0.5.0 to v0.6.0
  • Switch manifests to use apps/v1 Deployments and DaemonSets (#120)
  • Remove Kubernetes Dashboard manifests (#121)
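One practical difference in apps/v1 is that spec.selector is required. Older API groups defaulted it from the pod template labels. The sketch below converts a manifest held as a Python dict and derives the selector the same way. It handles only Deployments and DaemonSets, and the function name is hypothetical.

```python
import copy


def to_apps_v1(manifest):
    """Return a copy of a Deployment/DaemonSet manifest converted to apps/v1."""
    out = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    if out.get("kind") not in ("Deployment", "DaemonSet"):
        raise ValueError("only Deployment and DaemonSet are handled")
    out["apiVersion"] = "apps/v1"
    spec = out.setdefault("spec", {})
    if "selector" not in spec:
        # apps/v1 requires an explicit selector; derive it from the template labels.
        labels = spec.get("template", {}).get("metadata", {}).get("labels")
        if not labels:
            raise ValueError("pod template needs labels to derive a selector")
        spec["selector"] = {"matchLabels": dict(labels)}
    return out
```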

v1.9.2

  • Kubernetes v1.9.2
  • Add Terraform v0.11.x support
    • Add explicit "providers" section to modules for Terraform v0.11.x
    • Retain support for Terraform v0.10.4+
  • Add migration guide from Terraform v0.10.x to v0.11.x (action required!)
  • Update etcd from 3.2.13 to 3.2.14
  • Update Calico from 2.6.5 to 2.6.6
  • Update kube-dns from v1.14.7 to v1.14.8
  • Use separate service account for kube-dns
  • Use kubernetes-incubator/bootkube v0.10.0

Bare-Metal

  • Use per-node Container Linux install profiles (#97)
    • Allow Container Linux channel/version to be chosen per-cluster
    • Fix issue where cluster deletion could require terraform apply multiple times

Digital Ocean

  • Relax digitalocean provider version constraint
  • Fix bug with terraform plan always showing a firewall diff to be applied (#3)

Addons

  • Update CLUO to v0.5.0 to fix compatibility with Kubernetes 1.9 (important)
    • Earlier versions can't roll out Container Linux updates on Kubernetes 1.9 nodes (cluo#163)
  • Update kube-state-metrics from v1.1.0 to v1.2.0
  • Fix RBAC cluster role for kube-state-metrics

v1.9.1

  • Kubernetes v1.9.1
  • Update kube-dns from 1.14.5 to v1.14.7
  • Update etcd from 3.2.0 to 3.2.13
  • Update Calico from v2.6.4 to v2.6.5
  • Enable portmap to fix hostPort with Calico
  • Use separate service account for controller-manager

v1.8.6

  • Kubernetes v1.8.6
  • Update Calico from v2.6.3 to v2.6.4

v1.8.5

  • Kubernetes v1.8.5
  • Recommend Container Linux images with Docker 17.09
    • Container Linux stable, beta, and alpha now provide Docker 17.09 (instead of 1.12)
    • Older clusters (with CLUO addon) auto-update Container Linux version to begin using Docker 17.09
  • Fix race where etcd-member.service could fail to resolve peers (#69)
  • Add optional cluster_domain_suffix variable (#74)
  • Use kubernetes-incubator/bootkube v0.9.1
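The cluster_domain_suffix variable changes the tail of in-cluster DNS names. The sketch below builds a Service's standard Kubernetes DNS name. It assumes the variable defaults to "cluster.local" and uses a hypothetical function name.

```python
def service_fqdn(service, namespace, cluster_domain_suffix="cluster.local"):
    """Build the in-cluster DNS name of a Service: <service>.<namespace>.svc.<suffix>."""
    return f"{service}.{namespace}.svc.{cluster_domain_suffix}"
```

For example, `service_fqdn("kube-dns", "kube-system")` yields `kube-dns.kube-system.svc.cluster.local`.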

Bare-Metal

  • Add kubelet --volume-plugin-dir flag to allow flexvolume providers (#61)

Addons

  • Discourage deploying the Kubernetes Dashboard (security)

v1.8.4

  • Kubernetes v1.8.4
  • Calico-related bug fixes
  • Update Calico from v2.6.1 to v2.6.3
  • Update flannel from v0.9.0 to v0.9.1
  • Add service accounts for kube-proxy and pod-checkpointer
  • Use kubernetes-incubator/bootkube v0.9.0

v1.8.3

  • Kubernetes v1.8.3
  • Run etcd on-host, across controllers
  • Promote AWS platform to beta
  • Use kubernetes-incubator/bootkube v0.8.2

Google Cloud

  • Add required variable region (e.g. "us-central1")
  • Reduce time to bootstrap a cluster
  • Change etcd to run on-host, across controllers (etcd-member.service)
  • Change controller instances to automatically span zones in the region
  • Change worker managed instance group to automatically span zones in the region
  • Improve internal firewall rules and use tag-based firewall policies
  • Remove support for self-hosted etcd
  • Remove the zone required variable
  • Remove the controller_preemptible optional variable

AWS

  • Promote AWS platform to beta
  • Reduce time to bootstrap a cluster
  • Change etcd to run on-host, across controllers (etcd-member.service)
  • Fix firewall rules for multi-controller kubelet scraping and node-exporter
  • Remove support for self-hosted etcd

Addons

  • Add Prometheus 2.0 addon with alerting rules
  • Add Grafana dashboard for observing metrics

v1.8.2

  • Kubernetes v1.8.2
  • Switch to using the gcr.io/google_containers/hyperkube image
  • Update flannel from v0.8.0 to v0.9.0
  • Add hairpinMode to flannel CNI config
  • Add --no-negcache to kube-dns dnsmasq
  • Use kubernetes-incubator/bootkube v0.8.1

v1.8.1

  • Kubernetes v1.8.1
  • Use kubernetes-incubator/bootkube v0.8.0

Digital Ocean

  • Run etcd cluster across controller nodes (etcd-member.service)
  • Remove support for self-hosted etcd
  • Reduce time to bootstrap a cluster

v1.7.7

  • Kubernetes v1.7.7
  • Use kubernetes-incubator/bootkube v0.7.0
  • Update kube-dns to 1.14.5 to fix a dnsmasq vulnerability
  • Calico v2.6.1
  • flannel-cni v0.3.0
    • Update flannel CNI config to fix hostPort

v1.7.5

  • Kubernetes v1.7.5
  • Use kubernetes-incubator/bootkube v0.6.2
  • Add AWS Terraform module (alpha)
  • Add support for Calico networking (bare-metal, Google Cloud, AWS)
  • Change networking default from "flannel" to "calico"

AWS

  • Add network_mtu to allow CNI interface MTU customization
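Choosing a network_mtu value is simple arithmetic. The sketch assumes Calico's IP-in-IP encapsulation, which adds one 20-byte IPv4 header per packet, so the CNI interface MTU should be the host MTU minus 20. For example, hosts with 1500-byte interfaces give 1480, and AWS instance types with 9001-byte jumbo frames give 8981. Other encapsulations have different overheads.

```python
IPIP_OVERHEAD = 20  # bytes: one extra IPv4 header per IP-in-IP encapsulated packet


def calico_ipip_mtu(host_mtu):
    """CNI interface MTU that leaves room for IP-in-IP encapsulation."""
    if host_mtu <= IPIP_OVERHEAD:
        raise ValueError("host MTU too small")
    return host_mtu - IPIP_OVERHEAD
```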

Bare-Metal

  • Add network_mtu to allow CNI interface MTU customization
  • Remove support for experimental_self_hosted_etcd

v1.7.3

  • Kubernetes v1.7.3
  • Use kubernetes-incubator/bootkube v0.6.1

Digital Ocean

  • Add cloud firewall rules (requires Terraform v0.10)
  • Change node tags from strings to Digital Ocean tags

v1.7.1

  • Kubernetes v1.7.1
  • Use kubernetes-incubator/bootkube v0.6.0
  • Add Bare-Metal Terraform module (stable)
  • Add Digital Ocean Terraform module (beta)

Google Cloud

  • Remove k8s_domain_name variable; cluster_name + dns_zone now resolves to controllers
  • Rename dns_base_zone to dns_zone
  • Rename dns_base_zone_name to dns_zone_name
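With k8s_domain_name removed, the apiserver address is built from cluster_name and dns_zone. The sketch below assembles the resulting URL. The port defaults to 443, as in this release, and can be set to 6443 to match the Latest changes. Both the function and the example names are hypothetical. It trims a trailing dot, since DNS zone names are sometimes written fully qualified.

```python
def apiserver_url(cluster_name, dns_zone, port=443):
    """Return the apiserver URL formed from cluster_name and dns_zone."""
    host = f"{cluster_name}.{dns_zone.rstrip('.')}"  # "example.com." -> "example.com"
    return f"https://{host}:{port}"
```

For example, `apiserver_url("example-cluster", "example.com.", port=6443)` yields `https://example-cluster.example.com:6443`.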

v1.6.7

  • Kubernetes v1.6.7
  • Use kubernetes-incubator/bootkube v0.5.1

v1.6.6

  • Kubernetes v1.6.6
  • Use kubernetes-incubator/bootkube v0.4.5
  • Disable locksmithd on hosts, in favor of CLUO.

v1.6.4

  • Kubernetes v1.6.4
  • Add Google Cloud Terraform module (stable)

Earlier

Earlier versions, back to v1.3.0, used different designs and mechanisms.