mirror of https://gitlab.archlinux.org/archlinux/infrastructure.git synced 2025-01-18 08:06:16 +01:00

Official Arch Linux Infrastructure Repository

Go to file

Christian Heusel eda56c0e01 Onboarding/Offboarding: tag irc group contacts		2025-01-17 01:32:23 +01:00
.gitlab/issue_templates	Onboarding/Offboarding: tag irc group contacts	2025-01-17 01:32:23 +01:00
docs	docs: Fix incorrect CX22 disk size	2024-12-30 05:14:29 +01:00
group_vars	Offboard coderobe as Package Maintainer	2025-01-16 18:54:18 +01:00
host_vars	Add alert for Fastly cost	2024-12-15 18:06:39 +01:00
library	Fix dbscripts dir generation for python3	2018-05-31 10:54:36 +02:00
misc	Reduce heftig's access to the default vault and the Matrix server	2024-12-21 16:25:20 +01:00
one-shots	keycloak: remove /auth from all Keycloak endpoints	2022-11-12 17:33:36 +02:00
packer	Fix packer and update plugins	2024-11-19 11:34:37 +01:00
playbooks	Remove obsolete "pacman-website" playbook	2024-12-31 16:27:31 +01:00
plugins/callback	plugins: add auto_tags plugin	2017-12-07 14:14:33 +01:00
pubkeys	Offboard coderobe as Package Maintainer	2025-01-16 18:54:18 +01:00
roles	aurweb: send traces to correct route for the otlp endpoint	2025-01-16 05:25:11 +01:00
tf-stage1	Remove obsolete dev.archlinux.org subdomain	2025-01-12 01:31:13 +01:00
tf-stage2	tf-stage2: use 1h timeout for keycloak pw resets	2024-12-02 19:00:49 +01:00
.ansible-lint	lint: allow role includes with unprefixed role var	2023-09-22 23:24:47 +03:00
.editorconfig	.editorconfig: Add	2019-07-04 21:41:04 +02:00
.gitignore	Add basic terraform config	2019-02-14 06:45:18 +01:00
.gitlab-ci.yml	ci: install jmespath for ansible-lint	2022-09-19 00:16:55 +03:00
ansible.cfg	Improve ansible logging to be more human readable	2024-11-19 06:40:13 +01:00
hosts	Add Mumble server	2024-11-26 14:30:42 +01:00
LICENSE	Add MIT License	2021-02-11 21:01:19 +01:00
README.md	fix small typo the README.md	2024-01-19 23:17:15 +01:00

README.md

Arch Infrastructure

This repository contains the complete collection of ansible playbooks and roles for the Arch Linux infrastructure.

[[TOC]]

Requirements

Install these packages:

terraform
python-click
python-jmespath
moreutils (for playbooks/tasks/reencrypt-vault-key.yml)

Instructions

All systems are set up the same way. For the first time setup in the Hetzner rescue system, run the provisioning script: ansible-playbook playbooks/tasks/install_arch.yml -l $host. The provisioning script configures a sane basic systemd with sshd. By design, it is NOT idempotent. After the provisioning script has run, it is safe to reboot.

Once in the new system, run the regular playbook: HCLOUD_TOKEN=$(misc/get_key.py misc/vaults/vault_hetzner.yml hetzner_cloud_api_key) ansible-playbook playbooks/$hostname.yml. This playbook is the one regularity used for administrating the server and is entirely idempotent.

When adding a new machine you should also deploy our SSH known_hosts file and update the SSH hostkeys file in this git repo. For this you can simply run the playbooks/tasks/sync-ssh-hostkeys.yml playbook and commit the changes it makes to this git repository. It will also deploy any new SSH host keys to all our machines.

Note about GPG keys

The root_access.yml file contains the vault_default_pgpkeys variable which determines the users that have access to the default vault, as well as the borg backup keys. A separate super vault exists for storing highly sensitive secrets like Hetzner credentials; access to the super vault is controlled by the vault_super_pgpkeys variable.

All the keys should be on the local user gpg keyring and at minimum be locally signed with --lsign-key (or if you use TOFU, have --tofu-policy good). This is necessary for running any of the reencrypt-vault-default-key, reencrypt-vault-super-key or fetch-borg-keys tasks.

Note about packer

We use packer to build snapshots on hcloud to use as server base images. In order to use this, you need to install packer and then run

packer build -var $(misc/get_key.py misc/vaults/vault_hetzner.yml hetzner_cloud_api_key --format env) packer/archlinux.pkr.hcl

This will take some time after which a new snapshot will have been created on the primary hcloud archlinux project.

For the sandbox project please run

packer build -var $(misc/get_key.py misc/vaults/vault_hetzner.yml hetzner_cloud_sandbox_infrastructure_api_key --format env | sed 's/_sandbox_infrastructure//') -var install_ec2_public_keys_service=true packer/archlinux.pkr.hcl

Note about terraform

We use terraform in two ways:

To provision a part of the infrastructure on hcloud (and possibly other service providers in the future)
To declaratively configure applications

For both of these, we have set up a separate terraform script. The reason for that is that sadly terraform can't have providers depend on other providers so we can't declaratively state that we want to configure software on a server which itself needs to be provisioned first. Therefore, we use a two-stage process. Generally speaking, scenario 1. is configured in tf-stage1 and 2. is in tf-stage2. Maybe in the future, we can just have a single terraform script for everything but for the time being, this is what we're stuck with.

The very first time you run terraform on your system, you'll have to init it:

cd tf-stage1  # and also tf-stage2
terraform init -backend-config="conn_str=postgres://terraform:$(../misc/get_key.py ../group_vars/all/vault_terraform.yml vault_terraform_db_password)@state.archlinux.org?sslmode=verify-full"

After making changes to the infrastructure in tf-stage1/archlinux.tf, run

terraform plan

This will show you planned changes between the current infrastructure and the desired infrastructure. You can then run

terraform apply

to actually apply your changes.

The same applies to changed application configuration in which case you'd run it inside of tf-stage2 instead of tf-stage1.

We store terraform state on a special server that is the only hcloud server NOT managed by terraform so that we do not run into a chicken-egg problem. The state server is assumed to just exist so in an unlikely case where we have to entirely redo this infrastructure, the state server would have to be manually set up.

SMTP Configuration

All hosts should be relaying email through our primary mx host (currently 'mail.archlinux.org'). See docs/email.md for full details.

Putting a service in maintenance mode

Most web services with a nginx configuration, can be put into a maintenance mode, by running the playbook with a maintenance variable:

ansible-playbook -e maintenance=true playbooks/<playbook.yml>

This also works with a tag:

ansible-playbook -t <tag> -e maintenance=true playbooks/<playbook.yml>

As long as you pass the maintenance variable to the playbook run, the web service will stay in maintenance mode. As soon as you stop passing it on the command line and run the playbook again, the regular nginx configuration should resume and the service should accept requests by the end of the run.

Passing maintenance=false, will also prevent the regular nginx configuration from resuming, but will not put the service into maintenance mode.

Keep in mind that passing the maintenance variable to the whole playbook, without any tag, will make all the web services that have the maintenance mode in them, to be put in maintenance mode. Use tags to affect only the services you want.

Documentation on how to add the maintenance mode to a web service is inside docs/maintenance.md.

Finding servers requiring security updates

Arch-audit can be used to find servers in need of updates for security issues.

ansible all -a "arch-audit -u"

Updating servers

The following steps should be used to update our managed servers:

pacman -Syu
sync
checkservices
reboot

Semi-automated server updates (experimental)

For updating a lot of servers in a more unattended manner, the following playbook can be used:

ansible-playbook playbooks/tasks/upgrade-servers.yml [-l SUBSET]

It runs pacman -Syu on the targeted hosts in batches and then reboots them. If any server fails to reboot successfully, the rolling update stops and further batches are cancelled. To display the packages updated on each host, you can pass the --diff option to ansible-playbook.

Using this update method, .pacnew files are left unmerged which is OK for most configuration files that are managed by Ansible. However, care must be taken with updates that require manual intervention (e.g. major PostgreSQL releases).

Servers

This section has been moved to docs/servers.md.

Ansible repo workflows

Fetching the borg keys for local storage

Make sure you have all the GPG keys at least locally signed
Run the playbooks/tasks/fetch-borg-keys.yml playbook
Make sure the playbook runs successfully and check the keys under the borg-keys directory

Re-encrypting the vaults after adding a new PGP key

Follow the instructions in group_vars/all/root_access.yml.

Changing the vault password on encrypted files

See docs/vault-rekeying.md.

Backup documentation

We use BorgBackup for all of our backup needs. We have a primary backup storage as well as an additional offsite backup.

See docs/backups.md for detailed backup information.

Updating Gitlab

Our Gitlab installation uses Omnibus to run Gitlab on Docker. Updating Gitlab is as simple as running the ansible gitlab playbook:

ansible-playbook playbooks/gitlab.archlinux.org.yml --diff -t gitlab

To view the current Gitlab version visit this url

One-shots

A bunch of once-only admin task scripts can be found in one-shots/. We try to minimize the amount of manual one-shot admin work we have to do but sometimes for some migrations it might be necessary to have such scripts.