1
0
Fork 0
mirror of https://github.com/poseidon/typhoon synced 2024-06-01 20:36:08 +02:00
Commit Graph

209 Commits

Author SHA1 Message Date
Dalton Hubble cbef202eec Update Prometheus discovery of kube components
* Kubernetes v1.22.0 disabled kube-controller-manager insecure
port, which was used internally for Prometheus metrics scraping
* Configure Prometheus to discover and scrape endpoints for
kube-scheduler and kube-controller-manager via the authenticated
https ports, via bearer token
* Change firewall ports to allow Prometheus (on worker nodes)
to scrape kube-scheduler and kube-controller-manager targets
that run on controller(s) with hostNetwork
* Disable the insecure port on kube-scheduler
2021-08-10 21:25:19 -07:00
Dalton Hubble 9bac641511 Update Kubernetes from v1.21.3 to v1.22.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#v1220
2021-08-04 22:09:19 -07:00
Dalton Hubble f03045f0dc Update Cilium for cgroups v2 support
* On Fedora CoreOS, Cilium cross-node service IP load balancing
stopped working for a time (first observable as CoreDNS pods
located on worker nodes not being able to reach the kubernetes
API service 10.3.0.1). This turned out to have two parts:
* Fedora CoreOS switched to cgroups v2 by default. In our early
testing with cgroups v2, Calico (default) was used. With the
cgroups v2 change, SELinux policy denied some eBPF operations.
Since fixed in all Fedora CoreOS channels
* Cilium requires new mounts to support cgroups v2, which are
added here

* https://github.com/coreos/fedora-coreos-tracker/issues/292
* https://github.com/coreos/fedora-coreos-tracker/issues/881
* https://github.com/cilium/cilium/pull/16259
2021-07-24 10:36:47 -07:00
Dalton Hubble 171fd2c998 Update Kubernetes from v1.21.2 to v1.21.3
* https://github.com/kubernetes/kubernetes/releases/tag/v1.21.3
2021-07-17 18:22:24 -07:00
Dalton Hubble 3a71b2ccb1 Update Cilium from v1.10.1 to v1.10.2
* https://github.com/cilium/cilium/releases/tag/v1.10.2
2021-07-04 10:11:21 -07:00
Dalton Hubble d0e73b8174 Bump terraform-render-bootstrap 2021-06-27 18:11:43 -07:00
Dalton Hubble 485feb82c4 Update CoreDNS from v1.8.0 to v1.8.4
* https://coredns.io/2021/01/20/coredns-1.8.1-release/
* https://coredns.io/2021/02/23/coredns-1.8.2-release/
* https://coredns.io/2021/02/24/coredns-1.8.3-release/
* https://coredns.io/2021/05/28/coredns-1.8.4-release/
2021-06-23 23:31:25 -07:00
Dalton Hubble 0b276b6b7e Update Kubernetes from v1.21.1 to v1.21.2
* https://github.com/kubernetes/kubernetes/releases/tag/v1.21.2
2021-06-17 16:15:20 -07:00
Dalton Hubble e8513e58bb Add support for Terraform v1.0.0
* https://github.com/hashicorp/terraform/releases/tag/v1.0.0
2021-06-17 13:32:56 -07:00
Dalton Hubble 996bdd9112 Update Calico from v3.19.0 to v3.19.1
* https://docs.projectcalico.org/archive/v3.19/release-notes/
2021-06-02 14:51:15 -07:00
Dalton Hubble 966fd280b0 Update Cilium from v0.10.0-rc1 to v0.10.0
* https://github.com/cilium/cilium/releases/tag/v1.10.0
2021-05-24 11:16:51 -07:00
Dalton Hubble e4e074c894 Update Cilium from v1.9.6 to v1.10.0-rc1
* Add multi-arch container images and arm64 support
* https://github.com/cilium/cilium/releases/tag/v1.10.0-rc1
2021-05-14 14:24:52 -07:00
Dalton Hubble 2076a779a3 Update Kubernetes from v1.21.0 to v1.21.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1211
2021-05-13 11:23:26 -07:00
Dalton Hubble 9c842395a8 Update Cilium from v1.9.5 to v1.9.6
* https://github.com/cilium/cilium/releases/tag/v1.9.6
2021-04-26 10:55:23 -07:00
Dalton Hubble 67047ead08 Update Terraform version to allow v0.15.0
* Require Terraform version v0.13 <= x < v0.16
2021-04-16 09:46:01 -07:00
Dalton Hubble 34e8db7aae Update static Pod manifests for Kubernetes v1.21.0
* https://github.com/poseidon/terraform-render-bootstrap/pull/257
2021-04-11 15:05:46 -07:00
Dalton Hubble d73621c838 Update Kubernetes from v1.20.5 to v1.21.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md#v1210
2021-04-08 21:44:31 -07:00
Dalton Hubble 798ec9a92f Change CNI config directory to /etc/cni/net.d
* Change CNI config directory from `/etc/kubernetes/cni/net.d`
to `/etc/cni/net.d` (Kubelet default)
* https://github.com/poseidon/terraform-render-bootstrap/pull/255
2021-04-02 00:03:48 -07:00
Dalton Hubble 597ca4acce Update CoreDNS from v1.7.0 to v1.8.0
* https://github.com/poseidon/terraform-render-bootstrap/pull/254
2021-03-20 16:47:25 -07:00
Dalton Hubble 796149d122 Update Kubernetes from v1.20.4 to v1.20.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1205
2021-03-19 11:27:31 -07:00
Dalton Hubble a66bccd590 Update Cilium from v1.9.4 to v1.9.5
* https://github.com/cilium/cilium/releases/tag/v1.9.5
2021-03-14 11:48:22 -07:00
Dalton Hubble 30b1edfcc6 Mark bootstrap token as sensitive in plan/apply
* Mark the bootstrap token as sensitive, which is useful when
Terraform is run in automated CI/CD systems to avoid showing
the token
* https://github.com/poseidon/terraform-render-bootstrap/pull/251
2021-03-14 11:32:35 -07:00
Dalton Hubble a4afe06b64 Update Calico from v3.17.3 to v3.18.1
* https://docs.projectcalico.org/archive/v3.18/release-notes/
2021-03-14 10:35:24 -07:00
Dalton Hubble e76fe80b45 Update Kubernetes from v1.20.3 to v1.20.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1204
2021-02-19 00:02:07 -08:00
Dalton Hubble 32853aaa7b Update Kubernetes from v1.20.2 to v1.20.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1203
2021-02-17 22:29:33 -08:00
Dalton Hubble 9671b1c734 Update flannel-cni from v0.4.1 to v0.4.2
* https://github.com/poseidon/flannel-cni/releases/tag/v0.4.2
2021-02-14 12:04:59 -08:00
Dalton Hubble 18165d8076 Update Calico from v3.17.1 to v3.17.2
* https://github.com/projectcalico/calico/releases/tag/v3.17.2
2021-02-04 22:03:51 -08:00
Dalton Hubble 50acf28ce5 Update Cilium from v1.9.3 to v1.9.4
* https://github.com/cilium/cilium/releases/tag/v1.9.4
2021-02-03 23:08:22 -08:00
Dalton Hubble ab793eb842 Update Cilium from v1.9.2 to v1.9.3
* https://github.com/cilium/cilium/releases/tag/v1.9.3
2021-01-26 17:13:52 -08:00
Dalton Hubble b74c958524 Update Cilium from v1.9.1 to v1.9.2
* https://github.com/cilium/cilium/releases/tag/v1.9.2
2021-01-20 22:06:45 -08:00
Dalton Hubble 05f7df9e80 Update Kubernetes from v1.20.1 to v1.20.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1202
2021-01-13 17:46:51 -08:00
Dalton Hubble 646bdd78e4 Update Kubernetes from v1.20.0 to v1.20.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1201
2020-12-19 12:56:28 -08:00
Dalton Hubble ee9ce3d0ab Update Calico from v3.17.0 to v3.17.1
* https://github.com/projectcalico/calico/releases/tag/v3.17.1
2020-12-10 22:48:38 -08:00
Dalton Hubble a8b8a9b454 Update Kubernetes from v1.20.0-rc.0 to v1.20.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1200
2020-12-08 18:28:13 -08:00
Dalton Hubble 968febb050 Add support for Terraform v0.14.x
* Support Terraform v0.13.x and v0.14.x
2020-12-07 00:22:38 -08:00
Dalton Hubble bee455f83a Update Cilium from v1.9.0 to v1.9.1
* https://github.com/cilium/cilium/releases/tag/v1.9.1
2020-12-04 14:14:18 -08:00
Dalton Hubble e77dd6ecd4 Update Kubernetes from v1.19.4 to v1.20.0-rc.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#v1200-rc0
2020-12-03 16:01:28 -08:00
Dalton Hubble 804dfea0f9 Add kubeconfig's for kube-scheduler and kube-controller-manager
* Generate TLS client certificates for `kube-scheduler` and
`kube-controller-manager` with `system:kube-scheduler` and
`system:kube-controller-manager` CNs
* Template separate kubeconfigs for kube-scheduler and
kube-controller manager (`scheduler.conf` and
`controller-manager.conf`). Rename admin for clarity
* Before v1.16.0, Typhoon scheduled a self-hosted control
plane, which allowed the steady-state kube-scheduler and
kube-controller-manager to use a scoped ServiceAccount.
With a static pod control plane, separate CN TLS client
certificates are the nearest equiv.
* https://kubernetes.io/docs/setup/best-practices/certificates/
* Remove unused Kubelet certificate, TLS bootstrap is used
instead
2020-12-01 22:02:15 -08:00
Dalton Hubble 8ba23f364c Add TokenReview and TokenRequestProjection flags
* Add kube-apiserver flags for TokenReview and TokenRequestProjection
(beta, defaults on) to allow using Service Account Token Volume
Projection to create and mount service account tokens tied to a Pod's
lifecycle

Rel:

* https://github.com/poseidon/terraform-render-bootstrap/pull/231
* https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection
2020-12-01 20:02:33 -08:00
Dalton Hubble ae548ce213 Update Calico from v3.16.5 to v3.17.0
* Enable Calico MTU auto-detection
* Remove [workaround](https://github.com/poseidon/typhoon/pull/724) to
Calico cni-plugin [issue](https://github.com/projectcalico/cni-plugin/issues/874)

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/230
2020-11-25 14:22:58 -08:00
Dalton Hubble 1113a22f61 Update Kubernetes from v1.19.3 to v1.19.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1194
2020-11-11 22:56:27 -08:00
Dalton Hubble 79deb8a967 Update Cilium from v1.9.0-rc3 to v1.9.0
* https://github.com/cilium/cilium/releases/tag/v1.9.0
2020-11-10 23:42:41 -08:00
Dalton Hubble f412f0d9f2 Update Calico from v3.16.4 to v3.16.5
* https://github.com/projectcalico/calico/releases/tag/v3.16.5
2020-11-10 22:58:19 -08:00
Dalton Hubble 82e5ac3e7c Update Cilium from v1.8.5 to v1.9.0-rc3
* https://github.com/poseidon/terraform-render-bootstrap/pull/224
2020-11-03 10:29:07 -08:00
Dalton Hubble a8f7880511 Update Cilium from v1.8.4 to v1.8.5
* https://github.com/cilium/cilium/releases/tag/v1.8.5
2020-10-29 00:50:18 -07:00
Dalton Hubble 893d139590 Update Calico from v3.16.3 to v3.16.4
* https://github.com/projectcalico/calico/releases/tag/v3.16.4
2020-10-26 00:50:40 -07:00
Dalton Hubble afac46e39a Remove asset_dir variable and optional asset writes
* Originally, poseidon/terraform-render-bootstrap generated
TLS certificates, manifests, and cluster "assets" written
to local disk (`asset_dir`) during terraform apply cluster
bootstrap
* Typhoon v1.17.0 introduced bootstrapping using only Terraform
state to store cluster assets, to avoid ever writing sensitive
materials to disk and improve automated use-cases. `asset_dir`
was changed to optional and defaulted to "" (no writes)
* Typhoon v1.18.0 deprecated the `asset_dir` variable, removed
docs, and announced it would be deleted in future.
* Add Terraform output `assets_dir` map
* Remove the `asset_dir` variable

Cluster assets are now stored in Terraform state only. For those
who wish to write those assets to local files, this is possible
doing so explicitly.

```
resource local_file "assets" {
  for_each = module.yavin.assets_dist
  filename = "some-assets/${each.key}"
  content = each.value
}
```

Related:

* https://github.com/poseidon/typhoon/pull/595
* https://github.com/poseidon/typhoon/pull/678
2020-10-17 15:00:15 -07:00
Dalton Hubble 511f5272f4 Update Calico from v3.15.3 to v3.16.3
* https://github.com/projectcalico/calico/releases/tag/v3.16.3
* https://github.com/poseidon/terraform-render-bootstrap/pull/212
2020-10-15 20:08:51 -07:00
Dalton Hubble 46ca5e8813 Update Kubernetes from v1.19.2 to v1.19.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1193
2020-10-14 20:47:49 -07:00
Dalton Hubble 901f7939b2 Update Cilium from v1.8.3 to v1.8.4
* https://github.com/cilium/cilium/releases/tag/v1.8.4
2020-10-02 00:24:26 -07:00
Dalton Hubble 444363be2d Update Kubernetes from v1.19.1 to v1.19.2
* Update flannel from v0.12.0 to v0.13.0-rc2
* Update flannel-cni from v0.4.0 to v0.4.1
* Update CNI plugins from v0.8.6 to v0.8.7
2020-09-16 20:05:54 -07:00
Dalton Hubble 29b16c3fc0 Change seccomp annotations to seccompProfile
* seccomp graduated to GA in Kubernetes v1.19. Support for
seccomp alpha annotations will be removed in v1.22
* Replace seccomp annotations with the GA seccompProfile
field in the PodTemplate securityContext
* Switch profile from `docker/default` to `runtime/default`
(no effective change, since docker is the runtime)
* Verify with docker inspect SecurityOpt. Without the profile,
you'd see `seccomp=unconfined`

Related: https://github.com/poseidon/terraform-render-bootstrap/pull/215
2020-09-10 01:15:07 -07:00
Dalton Hubble 0c7a879bc4 Update Kubernetes from v1.19.0 to v1.19.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md#v1191
2020-09-09 20:52:29 -07:00
Dalton Hubble 28ee693e6b Update Cilium from v1.8.2 to v1.8.3
* https://github.com/cilium/cilium/releases/tag/v1.8.3
2020-09-07 21:10:27 -07:00
Dalton Hubble 88cf7273dc Update Kubernetes from v1.18.8 to v1.19.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.19.md
2020-08-27 08:50:01 -07:00
Dalton Hubble c87db3ef37 Update Kubernetes from v1.18.6 to v1.18.8
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1188
2020-08-13 20:47:43 -07:00
Dalton Hubble 5e70d7e2c8 Migrate from Terraform v0.12.x to v0.13.x
* Recommend Terraform v0.13.x
* Support automatic install of poseidon's provider plugins
* Update tutorial docs for Terraform v0.13.x
* Add migration guide for Terraform v0.13.x (best-effort)
* Require Terraform v0.12.26+ (migration compatibility)
* Require `terraform-provider-ct` v0.6.1
* Require `terraform-provider-matchbox` v0.4.1
* Require `terraform-provider-digitalocean` v1.20+

Related:

* https://www.hashicorp.com/blog/announcing-hashicorp-terraform-0-13/
* https://www.terraform.io/upgrade-guides/0-13.html
* https://registry.terraform.io/providers/poseidon/ct/latest
* https://registry.terraform.io/providers/poseidon/matchbox/latest
2020-08-12 01:54:32 -07:00
Dalton Hubble ccee5d3d89 Update from coreos/flannel-cni to poseidon/flannel-cni
* Update CNI plugins from v0.6.0 to v0.8.6 to fix several CVEs
* Update the base image to alpine:3.12
* Use `flannel-cni` as an init container and remove sleep
* https://github.com/poseidon/terraform-render-bootstrap/pull/205
* https://github.com/poseidon/flannel-cni
* https://quay.io/repository/poseidon/flannel-cni

Background

* Switch from github.com/coreos/flannel-cni v0.3.0 which was last
published by me in 2017 and is no longer accessible to me to maintain
or patch
* Port to the poseidon/flannel-cni rewrite, which releases v0.4.0
to continue the prior release numbering
2020-08-02 15:13:15 -07:00
Dalton Hubble cd0a28904e Update Cilium from v1.8.1 to v1.8.2
* https://github.com/cilium/cilium/releases/tag/v1.8.2
2020-07-25 16:06:27 -07:00
Dalton Hubble 618f8b30fd Update CoreDNS from v1.6.7 to v1.7.0
* https://coredns.io/2020/06/15/coredns-1.7.0-release/
* Update Grafana dashboard with revised metrics names
2020-07-25 15:51:31 -07:00
Dalton Hubble 9ea6d2c245 Update Kubernetes from v1.18.5 to v1.18.6
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1186
* https://github.com/poseidon/terraform-render-bootstrap/pull/201
2020-07-15 22:05:57 -07:00
Dalton Hubble 49050320ce Update Cilium from v1.8.0 to v1.8.1
* https://github.com/cilium/cilium/releases/tag/v1.8.1
2020-07-05 16:00:00 -07:00
Dalton Hubble 7bce15975c Update Kubernetes from v1.18.4 to v1.18.5
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1185
2020-06-27 13:52:18 -07:00
Dalton Hubble 1f83ae7dbb Update Calico from v3.14.1 to v3.15.0
* https://docs.projectcalico.org/v3.15/release-notes/
2020-06-26 02:40:12 -07:00
Dalton Hubble d27f367004 Update Cilium from v1.8.0-rc4 to v1.8.0
* https://github.com/cilium/cilium/releases/tag/v1.8.0
2020-06-22 22:26:49 -07:00
Dalton Hubble e9c8520359 Add experimental Cilium CNI provider
* Accept experimental CNI `networking` mode "cilium"
* Run Cilium v1.8.0-rc4 with overlay vxlan tunnels and a
minimal set of features. We're interested in:
  * IPAM: Divide pod_cidr into /24 subnets per node
  * CNI networking pod-to-pod, pod-to-external
  * BPF masquerade
  * NetworkPolicy as defined by Kubernetes (no L7 Policy)
* Continue using kube-proxy with Cilium probe mode
* Firewall changes:
  * Require UDP 8472 for vxlan (Linux kernel default) between nodes
  * Optional ICMP echo(8) between nodes for host reachability
    (health)
  * Optional TCP 4240 between nodes for endpoint reachability (health)

Known Issues:

* Containers with `hostPort` don't listen on all host addresses,
these workloads must use `hostNetwork` for now
https://github.com/cilium/cilium/issues/12116
* Erroneous warning on Fedora CoreOS
https://github.com/cilium/cilium/issues/10256

Note: This is experimental. It is not listed in docs and may be
changed or removed without a deprecation notice

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/192
* https://github.com/cilium/cilium/issues/12217
2020-06-21 20:41:53 -07:00
Dalton Hubble 90e23f5822 Rename controller node label and NoSchedule taint
* Remove node label `node.kubernetes.io/master` from controller nodes
* Use `node.kubernetes.io/controller` (present since v1.9.5,
[#160](https://github.com/poseidon/typhoon/pull/160)) to node select controllers
* Rename controller NoSchedule taint from `node-role.kubernetes.io/master` to
`node-role.kubernetes.io/controller`
* Tolerate the new taint name for workloads that may run on controller nodes
and stop tolerating `node-role.kubernetes.io/master` taint
2020-06-19 00:12:13 -07:00
Dalton Hubble c25c59058c Update Kubernetes from v1.18.3 to v1.18.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1184
2020-06-17 19:53:19 -07:00
Dalton Hubble 96711d7f17 Remove unused Kubelet cert / key Terraform state
* Generated Kubelet TLS certificate and key are not longer
used or distributed to machines since Kubelet TLS bootstrap
is used instead. Remove the certificate and key from state
2020-06-11 21:24:36 -07:00
Dalton Hubble ba44408b76 Update Calico from v3.14.0 to v3.14.1
* https://docs.projectcalico.org/v3.14/release-notes/
2020-05-30 22:08:37 -07:00
Dalton Hubble ecae6679ff Update Kubernetes from v1.18.2 to v1.18.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md
2020-05-20 20:37:39 -07:00
Dalton Hubble a2db4fa8c4 Update Calico from v3.13.3 to v3.14.0
* https://docs.projectcalico.org/v3.14/release-notes/
2020-05-09 16:05:30 -07:00
Dalton Hubble 358854e712 Fix Calico install-cni crash loop on Pod restarts
* Set a consistent MCS level/range for Calico install-cni
* Note: Rebooting a node was a workaround, because Kubelet
relabels /etc/kubernetes(/cni/net.d)

Background:

* On SELinux enforcing systems, the Calico CNI install-cni
container ran with default SELinux context and a random MCS
pair. install-cni places CNI configs by first creating a
temporary file and then moving them into place, which means
the file MCS categories depend on the containers SELinux
context.
* calico-node Pod restarts creates a new install-cni container
with a different MCS pair that cannot access the earlier
written file (it places configs every time), causing the
init container to error and calico-node to crash loop
* https://github.com/projectcalico/cni-plugin/issues/874

```
mv: inter-device move failed: '/calico.conf.tmp' to
'/host/etc/cni/net.d/10-calico.conflist'; unable to remove target:
Permission denied
Failed to mv files. This may be caused by selinux configuration on
the
host, or something else.
```

Note, this isn't a host SELinux configuration issue.

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/186
2020-05-09 16:01:44 -07:00
Dalton Hubble fd044ee117 Enable Kubelet TLS bootstrap and NodeRestriction
* Enable bootstrap token authentication on kube-apiserver
* Generate the bootstrap.kubernetes.io/token Secret that
may be used as a bootstrap token
* Generate a bootstrap kubeconfig (with a bootstrap token)
to be securely distributed to nodes. Each Kubelet will use
the bootstrap kubeconfig to authenticate to kube-apiserver
as `system:bootstrappers` and send a node-unique CSR for
kube-controller-manager to automatically approve to issue
a Kubelet certificate and kubeconfig (expires in 72 hours)
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the `system:node-bootstrapper`
ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr nodeclient ClusterRole
* Add ClusterRoleBinding for bootstrap token subjects
(`system:bootstrappers`) to have the csr selfnodeclient ClusterRole
* Enable NodeRestriction admission controller to limit the
scope of Node or Pod objects a Kubelet can modify to those of
the node itself
* Ability for a Kubelet to delete its Node object is retained
as preemptible nodes or those in auto-scaling instance groups
need to be able to remove themselves on shutdown. This need
continues to have precedence over any risk of a node deleting
itself maliciously

Security notes:

1. Issued Kubelet certificates authenticate as user `system:node:NAME`
and group `system:nodes` and are limited in their authorization
to perform API operations by Node authorization and NodeRestriction
admission. Previously, a Kubelet's authorization was broader. This
is the primary security motivation.

2. The bootstrap kubeconfig credential has the same sensitivity
as the previous generated TLS client-certificate kubeconfig.
It must be distributed securely to nodes. Its compromise still
allows an attacker to obtain a Kubelet kubeconfig

3. Bootstrapping Kubelet kubeconfig's with a limited lifetime offers
a slight security improvement.
  * An attacker who obtains the kubeconfig can likely obtain the
  bootstrap kubeconfig as well, to obtain the ability to renew
  their access
  * A compromised bootstrap kubeconfig could plausibly be handled
  by replacing the bootstrap token Secret, distributing the token
  to new nodes, and expiration. Whereas a compromised TLS-client
  certificate kubeconfig can't be revoked (no CRL). However,
  replacing a bootstrap token can be impractical in real cluster
  environments, so the limited lifetime is mostly a theoretical
  benefit.
  * Cluster CSR objects are visible via kubectl which is nice

4. Bootstrapping node-unique Kubelet kubeconfigs means Kubelet
clients have more identity information, which can improve the
utility of audits and future features

Rel: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/
Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/185
2020-04-28 19:35:33 -07:00
Dalton Hubble 38a6bddd06 Update Calico from v3.13.1 to v3.13.3
* https://docs.projectcalico.org/v3.13/release-notes/
2020-04-23 23:58:02 -07:00
Dalton Hubble 671eacb86e Update Kubernetes from v1.18.1 to v1.18.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#changelog-since-v1181
2020-04-16 23:40:52 -07:00
Dalton Hubble 73af2f3b7c Update Kubernetes from v1.18.0 to v1.18.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#v1181
2020-04-08 19:41:48 -07:00
Dalton Hubble 135c6182b8 Update flannel from v0.11.0 to v0.12.0
* https://github.com/coreos/flannel/releases/tag/v0.12.0
2020-03-31 18:31:59 -07:00
Dalton Hubble bac5acb3bd Change default kube-system DaemonSet tolerations
* Change kube-proxy, flannel, and calico-node DaemonSet
tolerations to tolerate `node.kubernetes.io/not-ready`
and `node-role.kubernetes.io/master` (i.e. controllers)
explicitly, rather than tolerating all taints
* kube-system DaemonSets will no longer tolerate custom
node taints by default. Instead, custom node taints must
be enumerated to opt-in to scheduling/executing the
kube-system DaemonSets
* Consider setting the daemonset_tolerations variable
of terraform-render-bootstrap at a later date

Background: Tolerating all taints ruled out use-cases
where certain nodes might legitimately need to keep
kube-proxy or CNI networking disabled
Related: https://github.com/poseidon/terraform-render-bootstrap/pull/179
2020-03-31 01:00:45 -07:00
Dalton Hubble f100a90d28 Update Kubernetes from v1.17.4 to v1.18.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md
2020-03-25 17:51:50 -07:00
Dalton Hubble 590d941f50 Switch from upstream hyperkube image to individual images
* Kubernetes plans to stop releasing the hyperkube container image
* Upstream will continue to publish `kube-apiserver`, `kube-controller-manager`,
`kube-scheduler`, and `kube-proxy` container images to `k8s.gcr.io`
* Upstream will publish Kubelet only as a binary for distros to package,
either as a DEB/RPM on traditional distros or a container image on
container-optimized operating systems
* Typhoon will package the upstream Kubelet (checksummed) and its
dependencies as a container image for use on CoreOS Container Linux,
Flatcar Linux, and Fedora CoreOS
* Update the Typhoon container image security policy to list
`quay.io/poseidon/kubelet`as an official distributed artifact

Hyperkube: https://github.com/kubernetes/kubernetes/pull/88676
Kubelet Container Image: https://github.com/poseidon/kubelet
Kubelet Quay Repo: https://quay.io/repository/poseidon/kubelet
2020-03-21 15:43:05 -07:00
Dalton Hubble bc7902f40a Update Kubernetes from v1.17.3 to v1.17.4
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1174
2020-03-13 00:06:41 -07:00
Dalton Hubble 70bf39bb9a Update Calico from v3.12.0 to v3.13.1
* https://docs.projectcalico.org/v3.13/release-notes/
2020-03-12 23:00:38 -07:00
Dalton Hubble 4a38fb5927 Update CoreDNS from v1.6.6 to v1.6.7
* https://coredns.io/2020/01/28/coredns-1.6.7-release/
2020-02-18 21:46:19 -08:00
Dalton Hubble 1243f395d1 Update Kubernetes from v1.17.2 to v1.17.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#v1173
2020-02-11 20:22:14 -08:00
Dalton Hubble ca96a1335c Update Calico from v3.11.2 to v3.12.0
* https://docs.projectcalico.org/release-notes/#v3120
* Remove reverse packet filter override, since Calico no
longer relies on the setting
* https://github.com/coreos/fedora-coreos-tracker/issues/219
* https://github.com/projectcalico/felix/pull/2189
2020-02-06 00:43:33 -08:00
Dalton Hubble 1cda5bcd2a Update Kubernetes from v1.17.1 to v1.17.2
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1172
2020-01-21 18:27:39 -08:00
Dalton Hubble 7daabd28b5 Update Calico from v3.11.1 to v3.11.2
* https://docs.projectcalico.org/v3.11/release-notes/
2020-01-18 13:45:24 -08:00
Dalton Hubble b642e3b41b Update Kubernetes from v1.17.0 to v1.17.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md#v1171
2020-01-14 20:21:36 -08:00
Dalton Hubble 43e05b9131 Enable kube-proxy metrics and allow Prometheus scrapes
* Configure kube-proxy --metrics-bind-address=0.0.0.0 (default
127.0.0.1) to serve metrics on 0.0.0.0:10249
* Add firewall rules to allow Prometheus (resides on a worker) to
scrape kube-proxy service endpoints on controllers or workers
* Add a clusterIP: None service for kube-proxy endpoint discovery
2020-01-06 21:11:18 -08:00
Dalton Hubble 11565ffa8a Update Calico from v3.10.2 to v3.11.1
* https://docs.projectcalico.org/v3.11/release-notes/
2019-12-28 11:08:03 -08:00
Dalton Hubble daa8d9d9ec Update CoreDNS from v1.6.5 to v1.6.6
* https://coredns.io/2019/12/11/coredns-1.6.6-release/
2019-12-22 10:47:19 -05:00
Dalton Hubble c0ce04e1de Update Calico from v3.10.1 to v3.10.2
* https://docs.projectcalico.org/v3.10/release-notes/
2019-12-09 21:03:00 -08:00
Dalton Hubble de36d99afc Update Kubernetes from v1.16.3 to v1.17.0
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.17.md/#v1170
2019-12-09 18:31:58 -08:00
Dalton Hubble 4fce9485c8 Reduce kube-controller-manager pod eviction timeout from 5m to 1m
* Reduce time to delete pods on unready nodes from 5m to 1m
* Present since v1.13.3, but mistakenly removed in v1.16.0 static
pod control plane migration

Related:

* https://github.com/poseidon/terraform-render-bootstrap/pull/148
* https://github.com/poseidon/terraform-render-bootstrap/pull/164
2019-12-08 22:58:31 -08:00
Dalton Hubble 2837275265 Introduce cluster creation without local writes to asset_dir
* Allow generated assets (TLS materials, manifests) to be
securely distributed to controller node(s) via file provisioner
(i.e. ssh-agent) as an assets bundle file, rather than relying
on assets being locally rendered to disk in an asset_dir and
then securely distributed
* Change `asset_dir` from required to optional. Left unset,
asset_dir defaults to "" and no assets will be written to
files on the machine that runs terraform apply
* Enhancement: Managed cluster assets are kept only in Terraform
state, which supports different backends (GCS, S3, etcd, etc) and
optional encryption. terraform apply accesses state, runs in-memory,
and distributes sensitive materials to controllers without making
use of local disk (simplifies use in CI systems)
* Enhancement: Improve asset unpack and layout process to position
etcd certificates and control plane certificates more cleanly,
without unneeded secret materials

Details:

* Terraform file provisioner support for distributing directories of
contents (with unknown structure) has been limited to reading from a
local directory, meaning local writes to asset_dir were required.
https://github.com/poseidon/typhoon/issues/585 discusses the problem
and newer or upcoming Terraform features that might help.
* Observation: Terraform provisioner support for single files works
well, but iteration isn't viable. We're also constrained to Terraform
language features on the apply side (no extra plugins, no shelling out)
and CoreOS / Fedora tools on the receive side.
* Take a map representation of the contents that would have been splayed
out in asset_dir and pack/encode them into a single file format devised
for easy unpacking. Use an awk one-liner on the receive side to unpack.
In pratice, this has worked well and its rather nice that a single
assets file is transferred by file provisioner (all or none)

Rel: https://github.com/poseidon/terraform-render-bootstrap/pull/162
2019-12-05 01:24:50 -08:00
Dalton Hubble 4b485a9bf2 Fix recent deletion of bootstrap module pinned SHA
* Fix deletion of bootstrap module pinned SHA, which was
introduced recently through an automation mistake creating
https://github.com/poseidon/typhoon/pull/589
2019-11-21 22:34:09 -08:00
Dalton Hubble 0e4ee5efc9 Add small CPU resource requests to static pods
* Set small CPU requests on static pods kube-apiserver,
kube-controller-manager, and kube-scheduler to align with
upstream tooling and for edge cases
* Effectively, a practical case for these requests hasn't been
observed. However, a small static pod CPU request may offer
a slight benefit if a controller became overloaded and the
below mechanisms were insufficient

Existing safeguards:

* Control plane nodes are tainted to isolate them from
ordinary workloads. Even dense workloads can only compress
CPU resources on worker nodes.
* Control plane static pods use the highest priority class, so
contention favors control plane pods (over say node-exporter)
and CPU is compressible too.

See: https://github.com/poseidon/terraform-render-bootstrap/pull/161
2019-11-13 17:18:45 -08:00
Dalton Hubble a271b9f340 Update CoreDNS from v1.6.2 to v1.6.5
* Add health `lameduck` option 5s. Before CoreDNS shuts down, it will
wait and report unhealthy for 5s to allow time for plugins to shutdown
cleanly
* Minor bug fixes over a few releases
* https://coredns.io/2019/08/31/coredns-1.6.3-release/
* https://coredns.io/2019/09/27/coredns-1.6.4-release/
* https://coredns.io/2019/11/05/coredns-1.6.5-release/
2019-11-13 16:47:44 -08:00
Dalton Hubble cb0598e275 Adopt Terraform v0.12 templatefile function
* Update terraform-render-bootstrap module to adopt the
Terrform v0.12 templatefile function feature to replace
the use of terraform-provider-template's `template_dir`
* Require Terraform v0.12.6+ which adds `for_each`

Background:

* `template_dir` was added to `terraform-provider-template`
to add support for template directory rendering in CoreOS
Tectonic Kubernetes distribution (~2017)
* Terraform v0.12 introduced a native `templatefile` function
and v0.12.6 introduced native `for_each` support (July 2019)
that makes it possible to replace `template_dir` usage
2019-11-13 16:33:36 -08:00
Dalton Hubble d7061020ba Update Kubernetes from v1.16.2 to v1.16.3
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1163
2019-11-13 13:05:15 -08:00
Dalton Hubble 0034a15711 Update Calico from v3.10.0 to v3.10.1
* https://docs.projectcalico.org/v3.10/release-notes/
2019-11-07 11:38:32 -08:00
Dalton Hubble 4775e9d0f7 Upgrade Calico v3.9.2 to v3.10.0
* Allow advertising Kubernetes service ClusterIPs to BGPPeer
routers via a BGPConfiguration
* Improve EdgeRouter docs about routes and BGP
* https://docs.projectcalico.org/v3.10/release-notes/
* https://docs.projectcalico.org/v3.10/networking/advertise-service-ips
2019-10-27 14:13:41 -07:00
Dalton Hubble d418045929 Switch kube-proxy from iptables mode to ipvs mode
* Kubernetes v1.11 considered kube-proxy IPVS mode GA
* Many problems were found #321
* Since then, major blockers seem to have been addressed
2019-10-27 00:37:41 -07:00
Dalton Hubble 24fc440d83 Update Kubernetes from v1.16.1 to v1.16.2
* Update Calico from v3.9.1 to v3.9.2
2019-10-15 22:42:52 -07:00
Dalton Hubble d874bdd17d Update bootstrap module control plane manifests and type constraints
* Remove unneeded control plane flags that correspond to defaults
* Adopt Terraform v0.12 type constraints in bootstrap module
2019-10-06 21:09:30 -07:00
Dalton Hubble 5b9dab6659 Introduce list of detail objects for bare-metal machines
* Define bare-metal `controllers` and `workers` as a complex type
list(object{name=string, mac=string, domain=string}) to allow
clusters with many machines to be defined more cleanly
* Remove `controller_names` list variable
* Remove `controller_macs` list variable
* Remove `controller_domains` list variable
* Remove `worker_names` list variable
* Remove `worker_macs` list variable
* Remove `worker_domains` list variable
2019-10-06 20:22:45 -07:00
Dalton Hubble 1c5ed84fc2 Update Kubernetes from v1.16.0 to v1.16.1
* https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.16.md#v1161
2019-10-02 21:31:55 -07:00
Dalton Hubble a6de245d8a Rename bootkube.tf to bootstrap.tf
* Typhoon no longer uses the bootkube project
2019-09-29 11:30:49 -07:00