2023-01-30 22:48:23 +01:00
|
|
|
% =========================================================================== %
|
|
|
|
% Encoding: UTF-8 (žluťoučký kůň úpěl ďábelšké ódy)
|
|
|
|
% =========================================================================== %
|
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\vspace*{\fill}
|
|
|
|
\begin{center}
|
|
|
|
\Large
|
|
|
|
\textit{This is a document draft.}
|
|
|
|
\end{center}
|
|
|
|
\vspace*{\fill}
|
|
|
|
\newpage
|
|
|
|
|
2023-01-30 22:48:23 +01:00
|
|
|
% =========================================================================== %
|
|
|
|
\nn{Introduction}
|
2023-05-19 18:24:17 +02:00
|
|
|
Introduce the goals and the methods attempted to achieve the goals.
|
2023-01-30 22:48:23 +01:00
|
|
|
|
|
|
|
% =========================================================================== %
|
|
|
|
\part{Theoretical part}
|
|
|
|
|
2023-01-31 04:11:49 +01:00
|
|
|
\n{1}{Purpose}
|
|
|
|
What this write-up is attempting to achieve.
|
|
|
|
|
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\n{1}{Terminology}
|
|
|
|
|
|
|
|
\n{2}{Linux}
|
|
|
|
|
|
|
|
The term \emph{Linux} is exclusively used in the meaning of the
|
|
|
|
Linux kernel~\cite{linux}.
|
|
|
|
|
2023-05-19 21:40:42 +02:00
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\n{2}{GNU/Linux}
|
2023-01-31 04:11:49 +01:00
|
|
|
|
2023-05-19 23:55:38 +02:00
|
|
|
As far as a Linux-based operating system is concerned, the term ``GNU/Linux''
as defined by the Free Software Foundation~\cite{fsfgnulinux} is used. While it
is longer and arguably a little cumbersome, the author aligns with the
opinion that this term more correctly describes its actual target. Aware that
many people conflate the complete operating system with its (albeit core)
component, the kernel, the author takes care to distinguish the two, although,
speaking from experience, in colloquial use this tends to bring more confusion
and usually requires a lengthy explanation.
|
|
|
|
|
2023-05-19 21:40:42 +02:00
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\n{2}{Containers}
|
|
|
|
|
|
|
|
When the concept of \emph{containerisation} and \emph{containers} is mentioned
|
|
|
|
throughout this work, the author has OCI containers~\cite{ocicontainers} in
|
|
|
|
mind, which is broadly a superset of \emph{Linux Containers} where some set of
|
|
|
|
processes is presented with a view of kernel resources (there are multiple
|
|
|
|
kinds of resources, such as IPC queues; network devices, stacks, ports; mount
|
|
|
|
points, process IDs, user and group IDs, Cgroups and others) that differs for
|
|
|
|
each different set of processes, similar in thought to FreeBSD
|
|
|
|
\emph{jails}~\cite{freebsdjails} with the distinction being that they are, of
|
|
|
|
course, facilitated by the Linux kernel namespace
|
|
|
|
functionality~\cite{linuxnamespaces}, which can in turn be regarded as
|
|
|
|
\emph{inspired} by Plan 9's namespaces~\cite{plan9namespaces}, Plan 9 being a
|
|
|
|
Bell Labs successor to Unix 8th Edition, discontinued in 2015.
|
|
|
|
|
|
|
|
While there without a doubt \emph{is} specificity bound to using each of the
|
|
|
|
tools that enable creating (Podman vs.\ Buildah vs.\ Docker BuildX) or running
|
|
|
|
(containerd vs.\ runc vs.\ crun) container images, when describing an action
|
|
|
|
that gets performed with or onto a container, the process should generally be
|
|
|
|
explained in such a way that it is repeatable using any spec-conforming tool
|
|
|
|
that is available and \emph{intended for the job}.
|
2023-01-31 04:11:49 +01:00
|
|
|
|
2023-05-22 22:35:07 +02:00
|
|
|
\n{2}{The program}
|
|
|
|
|
|
|
|
By \emph{the program} or \emph{the application} without any additional context
|
|
|
|
the author usually means the Password Compromise Monitoring Tool program.
|
|
|
|
|
2023-05-19 21:40:42 +02:00
|
|
|
|
2023-05-18 11:22:12 +02:00
|
|
|
\n{1}{Cryptography primer}\label{sec:cryptographyprimer}
|
2023-05-19 18:24:17 +02:00
|
|
|
Pre-requisites necessary for following up.
|
2023-01-31 04:11:49 +01:00
|
|
|
|
2023-05-22 22:34:53 +02:00
|
|
|
\n{2}{Encryption}
|
|
|
|
|
|
|
|
\n{3}{Symmetric cryptography}
|
|
|
|
|
|
|
|
\n{3}{Asymmetric cryptography}
|
|
|
|
|
|
|
|
\n{3}{The key exchange problem}
|
|
|
|
|
|
|
|
\n{3}{The key protection problem}
|
|
|
|
|
|
|
|
\n{3}{TLS}\label{sec:tls}
|
|
|
|
|
|
|
|
|
2023-01-31 04:11:49 +01:00
|
|
|
\n{2}{Hash functions}
|
2023-05-19 18:24:17 +02:00
|
|
|
Explanation. What are hash functions
|
|
|
|
|
2023-01-31 04:11:49 +01:00
|
|
|
\n{3}{Uses and \textit{mis}uses}
|
2023-05-19 23:55:38 +02:00
|
|
|
The good, the bad and the ugly of hash usage (including or in some cases
|
|
|
|
excluding salting, weak hashes, split hashes (Microsoft)).
|
2023-05-19 18:24:17 +02:00
|
|
|
|
2023-01-31 04:11:49 +01:00
|
|
|
\n{3}{Threats to hashes}
|
2023-05-19 18:24:17 +02:00
|
|
|
Rainbow tables, broken hash functions\ldots
|
2023-01-31 04:11:49 +01:00
|
|
|
|
|
|
|
|
2023-05-18 11:22:12 +02:00
|
|
|
\n{1}{Brief passwords history}\label{sec:history}
|
2023-01-31 04:11:49 +01:00
|
|
|
|
|
|
|
\n{2}{Purpose over time}
|
|
|
|
|
|
|
|
\n{2}{What is considered a password}
|
|
|
|
|
|
|
|
\n{2}{Problems with passwords}
|
|
|
|
\n{3}{Arbitrary length requirements (min/max)}
|
|
|
|
\n{3}{Arbitrary complexity requirements}
|
|
|
|
\n{3}{Restricting special characters}
|
2023-05-19 23:55:38 +02:00
|
|
|
Service providers have too often been found forbidding the use of so-called
\textit{special characters} in passwords for as long as passwords have been
used to protect privileged access. The ways of achieving this may vary but the
intent stays the same: prevent users from inputting characters that the system
cannot comfortably handle, for one reason or another.
|
2023-01-31 04:11:49 +01:00
|
|
|
|
|
|
|
|
|
|
|
\n{1}{Password strength validation}
|
|
|
|
Entropy, dictionaries, multiple factors.
|
|
|
|
|
|
|
|
|
2023-05-18 11:22:12 +02:00
|
|
|
\n{1}{Web security}\label{sec:websecurity}
|
2023-05-19 23:55:38 +02:00
|
|
|
|
|
|
|
The internet, being the vast space of intertwined concepts and ideas, is a
superset of the Web, since not everything that is available on the internet can
be described as web \emph{resources}. It is precisely the Web, however, that is
discussed in the next sections, which cover what browsers are, what they do and
how they relate to web security.
|
2023-01-31 04:11:49 +01:00
|
|
|
|
2023-05-19 23:55:38 +02:00
|
|
|
|
2023-05-23 00:03:47 +02:00
|
|
|
\n{2}{Browsers}\label{sec:browsers}
|
2023-01-31 04:11:49 +01:00
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
TODO: describe how browsers find out where the web page lives, get a webpage,
|
|
|
|
parse it, parse stylesheets, run scripts, apply SAMEORIGIN restrictions etc.
|
2023-05-22 22:35:57 +02:00
|
|
|
TODO: (privileged process running untrusted code on user's computer), history,
|
|
|
|
present, security focus of the development teams, user facing signalling
|
|
|
|
(padlock colours, scary warnings).
|
|
|
|
|
2023-05-23 00:03:47 +02:00
|
|
|
Browsers, often referred to with a qualifier that serves as a real tell for
their specialisation, i.e.\ \emph{web} browsers, are programs intended for
\emph{browsing} of \emph{the web}. In more technical terms, browsers are
programs that facilitate (directly or via intermediary tools) domain name
lookups, connecting to web servers, optionally establishing a secure
connection, requesting the web page in question, determining its \emph{security
policy}, resolving what accompanying resources the web page specifies and,
depending on the applicable security policy, requesting those from their
respective origins, applying stylesheets and running scripts. Constructing a
program that can speak many protocols and securely run untrusted code from the
internet is no easy task.
|
|
|
|
|
|
|
|
\n{3}{Complexity}
|
|
|
|
|
|
|
|
Browsers these days are also quite ubiquitous programs running on
|
|
|
|
\emph{billions} of consumer-grade mobile devices (which are also notorious for
|
|
|
|
bad update hygiene) or desktop devices all over the world. Regular users
|
|
|
|
usually expect them to work flawlessly with a multitude of network conditions,
|
|
|
|
network scenarios (café WiFi, cellular data in a remote location, home
|
|
|
|
broadband that is DNS-poisoned by the ISP), differently tuned (or commonly
|
|
|
|
misconfigured) web servers, a combination of modern and \emph{legacy}
|
|
|
|
encryption schemes and different levels of conformance to web standards from
|
|
|
|
both web server and website developers. Of course, if a website is broken, it
|
|
|
|
is the browser's fault. Browsers are expected to detect if \emph{captive
|
|
|
|
portals} (a type of access control that usually tries to force the user through
|
|
|
|
a webpage with terms of use) are active and offer redirects. All of this is
|
|
|
|
immense complexity and the combination of ubiquity and great exposure this type
|
|
|
|
of software gets is, in the author's opinion, the cause behind a staggering number
|
|
|
|
of vulnerabilities found, reported and fixed in browsers every year.
|
|
|
|
|
|
|
|
\n{3}{Standardisation}
|
|
|
|
|
|
|
|
Over the years, a consortium of parties interested in promoting and developing
|
|
|
|
the web (also due to its potential as a digital marketplace, i.e.\ financial
|
|
|
|
incentives) and browser vendors (of which the most neutral participant is
|
|
|
|
perhaps \emph{Mozilla}, with Chrome being run by Google, Edge by Microsoft and
|
|
|
|
Safari/WebKit by Apple) has developed a great volume of web standards, which are
|
|
|
|
also relatively frequently getting updated or deprecated and replaced by
|
|
|
|
revised or new ones, rendering the browser maintenance task into essentially a
|
|
|
|
cat-and-mouse game.
|
|
|
|
|
|
|
|
It is the web's extensibility that enabled this build-up and ironically has
|
|
|
|
been proclaimed by some to be its greatest asset. It has also ostensibly
been criticised~\cite{ddvweb} in the past and the frustration with the status
|
|
|
|
quo of web standards has relatively recently prompted a group of people to even
|
|
|
|
create ``\textit{a new application-level internet protocol for the distribution
|
|
|
|
of arbitrary files, with some special consideration for serving a lightweight
|
|
|
|
hypertext format which facilitates linking between files}'':
|
|
|
|
Gemini~\cite{gemini}\cite{geminispec} that in the words of its authors can be
|
|
|
|
thought of as ``\textit{the web, stripped right back to its essence}'' or as
|
|
|
|
``\textit{Gopher, souped up and modernised just a little}'', depending upon the
|
|
|
|
reader's perspective, noting that the latter view is probably more accurate.
|
|
|
|
|
|
|
|
\n{3}{HTTP}
|
|
|
|
|
|
|
|
Originally, HTTP was also designed just for fetching hypertext
|
|
|
|
\emph{resources}, but it has evolved since then, particularly due to its
|
|
|
|
extensibility, to allow for fetching of all sorts of web resources a modern
|
|
|
|
website of today provides, such as scripts or images, or even to \emph{post}
|
|
|
|
content back to servers.
|
|
|
|
|
|
|
|
HTTP relies on TCP (Transmission Control Protocol), which is one of the
|
|
|
|
\emph{reliable} (mandated by HTTP) protocols used to send data across
|
|
|
|
contemporary IP (Internet Protocol) networks, to deliver the data it requests
|
|
|
|
or sends. When Tim Berners-Lee invented the World Wide Web (WWW) in 1989 while
|
|
|
|
working at CERN (The European Organization for Nuclear Research) with a rather
|
|
|
|
noble intent as a ``\emph{wide-area hypermedia information retrieval initiative
|
|
|
|
to give universal access to a large universe of documents}''~\cite{wwwf}, he
|
|
|
|
also invented the HyperText Markup Language (HTML) to serve as a formatting
|
|
|
|
method for these new hypermedia documents. The first website was written
|
|
|
|
roughly the same way as today's websites are, using HTML, although the markup
|
|
|
|
language has changed since, with the current version being HTML5.
|
|
|
|
|
|
|
|
It has been mentioned that the client \textbf{requests} a \textbf{resource} and
|
|
|
|
receives a \textbf{response}, so those terms should probably be defined.
|
|
|
|
|
|
|
|
A request is what the client sends to the server. A resource is what it
|
|
|
|
requests and a response is the answer provided by the server.
|
|
|
|
|
|
|
|
HTTP follows a classic client-server model whereby it is \textbf{always} the
|
|
|
|
client that initiates the request.
|
|
|
|
|
|
|
|
A web page is, to be blunt, a chunk of \emph{hypertext}. To display a web page,
|
|
|
|
a browser first needs to send a request to fetch the HTML representing the
|
|
|
|
page, which is then parsed and additional requests for sub-resources are made.
|
|
|
|
If a page defines layout information in the form of CSS, that is parsed as
|
|
|
|
well.
|
|
|
|
|
|
|
|
A web page needs to be present on the local computer first \emph{before} it can
|
|
|
|
be parsed by the browser, and since websites are usually still served by
|
|
|
|
programs called \emph{web servers} as in the \emph{early days}, that presents a
problem of how to tell the browser where the resource should be fetched from. In
today's browsers, the issue is sorted (short of the CLI) by the \emph{address
bar}, a place into which the user types what they wish the browser to fetch for
them.
|
|
|
|
|
|
|
|
The formal name of what is typed there is a \emph{Uniform Resource Locator}, or
URL, and it contains the scheme (or the protocol, such as \texttt{http://}), the
host address or a domain name, an optional (TCP) port number and the path to
the requested resource.
|
|
|
|
|
|
|
|
Since a TCP connection needs to be established first, to connect to a server
|
|
|
|
whose URL only contains a domain name, the browser needs to perform a domain
|
|
|
|
name \emph{lookup} using system facilities, or as was the case for a couple of
|
|
|
|
notorious Chromium versions, send some additional and unrelated queries which
|
|
|
|
(with Chromium-based derivatives' numbers) ended up placing unnecessary load
|
|
|
|
directly at the root DNS servers~\cite{chromiumrootdns}.
|
|
|
|
|
|
|
|
If a raw IP address+port combination is used, the browser attempts to connect
|
|
|
|
to it directly and requests the user-requested page by default using the
|
|
|
|
\texttt{GET} \emph{method}. The \emph{well-known} HTTP port 80 is assumed unless
another port is explicitly specified, and the port can be omitted regardless of
whether the host is a domain name or an IP address.
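
The URL components and the default port rule described above can be illustrated
with a minimal Go sketch built on the standard \texttt{net/url} package. The
helper names \texttt{describe} and \texttt{effectivePort} as well as the example
addresses are hypothetical and only serve the illustration; this is not how any
particular browser implements the logic.

```go
package main

import (
	"fmt"
	"net/url"
)

// effectivePort returns the port a client would connect to for u: the
// explicit one if present, otherwise the well-known port of the scheme.
func effectivePort(u *url.URL) string {
	if p := u.Port(); p != "" {
		return p
	}
	switch u.Scheme {
	case "http":
		return "80"
	case "https":
		return "443"
	}
	return "" // unknown scheme: this sketch assumes no default
}

// describe splits a raw URL into its scheme, host and effective port.
func describe(raw string) (scheme, host, port string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	return u.Scheme, u.Hostname(), effectivePort(u), nil
}

func main() {
	// Placeholder addresses: one domain name without a port and one raw
	// IP address with an explicit port.
	for _, raw := range []string{
		"http://example.org/index.html",
		"http://192.0.2.10:8080/",
	} {
		scheme, host, port, err := describe(raw)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> scheme=%s host=%s port=%s\n", raw, scheme, host, port)
	}
}
```

The port is resolved in the same way whether the host is a domain name or an
IP address, which mirrors the rule stated above.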
|
|
|
|
|
|
|
|
The method is a way for the user-agent to define what operation it wants to
|
|
|
|
perform. \texttt{GET} is used for fetching resources while \texttt{POST} is
|
|
|
|
used to send data to the server, such as to post the values of an HTML form.
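
The difference between the two methods can be sketched in Go using only the
standard library. The example below starts a throwaway local test server, so
the handler, the form field \texttt{name} and the helper functions are all
assumptions made for the sake of illustration and have no relation to the
program described later in this work.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// handler answers GET with a small hypertext document and POST by echoing
// the submitted form field "name"; any other method is rejected.
func handler(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodGet:
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		fmt.Fprint(w, "<p>hello</p>")
	case http.MethodPost:
		fmt.Fprintf(w, "posted: %s", r.FormValue("name"))
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// send performs one request and returns "<status code> <body>".
func send(method, target, form string) (string, error) {
	req, err := http.NewRequest(method, target, strings.NewReader(form))
	if err != nil {
		return "", err
	}
	if form != "" {
		req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%d %s", resp.StatusCode, body), nil
}

// roundTrips starts a local test server and issues one GET and one POST.
func roundTrips() ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(handler))
	defer srv.Close()

	var out []string
	for _, rt := range []struct{ method, form string }{
		{http.MethodGet, ""},
		{http.MethodPost, "name=gopher"},
	} {
		line, err := send(rt.method, srv.URL, rt.form)
		if err != nil {
			return nil, err
		}
		out = append(out, line)
	}
	return out, nil
}

func main() {
	lines, err := roundTrips()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

In both cases it is the client that initiates the exchange; the server merely
answers the request it was sent.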
|
|
|
|
|
|
|
|
A server response is comprised of a \textbf{status code}, a status message,
|
|
|
|
HTTP \textbf{headers} and an optional \textbf{body} containing the content. The
|
|
|
|
status code indicates if the original request was successful or not and the
|
|
|
|
browser is generally there to interpret these status codes to the user. There
are enough status codes to be confused by their sheer number but luckily, there
is a method to the madness and they can be divided into groups/classes:
|
|
|
|
|
|
|
|
\begin{itemize}
|
|
|
|
\item 1xx: Informational responses
|
|
|
|
\item 2xx: Successful responses
|
|
|
|
\item 3xx: Redirection responses
|
|
|
|
\item 4xx: Client error responses
|
|
|
|
\item 5xx: Server error responses
|
|
|
|
\end{itemize}
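
Since the class is determined by the first digit of the status code, the
grouping listed above can be expressed in a few lines of Go. The function name
\texttt{statusClass} is made up for this sketch and the returned labels simply
follow the list.

```go
package main

import "fmt"

// statusClass maps an HTTP status code to the name of its class, which is
// determined by the first digit of the three-digit code.
func statusClass(code int) string {
	switch code / 100 {
	case 1:
		return "informational"
	case 2:
		return "successful"
	case 3:
		return "redirection"
	case 4:
		return "client error"
	case 5:
		return "server error"
	}
	return "unknown" // anything outside 100-599
}

func main() {
	for _, code := range []int{200, 301, 404, 503} {
		fmt.Printf("%d: %s\n", code, statusClass(code))
	}
}
```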
|
|
|
|
|
|
|
|
In case the \emph{user agent} (a web \emph{client}) such as a browser receives
|
|
|
|
a response with content, it has to parse it.
|
|
|
|
|
|
|
|
A header is additional information sent by both the server and the client.
|
|
|
|
|
2023-05-19 23:55:38 +02:00
|
|
|
|
2023-05-18 11:22:12 +02:00
|
|
|
\n{2}{Cross-site scripting}\label{sec:xss}
|
2023-01-31 04:11:49 +01:00
|
|
|
|
2023-05-18 11:22:12 +02:00
|
|
|
\n{2}{Content Security Policy}\label{sec:csp}
|
2023-05-23 00:03:47 +02:00
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
Content Security Policy has been an important addition to the arsenal of
|
|
|
|
website operators, even though not everybody has necessarily been utilising it
|
|
|
|
properly or even taken notice. To understand what guarantees it provides and
|
|
|
|
what kind of protections it employs, it is first necessary to grok how websites
|
|
|
|
are parsed and displayed, which has been discussed in depth in previous
|
|
|
|
sections.
|
|
|
|
|
2023-01-31 04:11:49 +01:00
|
|
|
|
2023-05-18 11:22:12 +02:00
|
|
|
\n{1}{Sandboxing}\label{sec:sandboxing}
|
2023-01-31 04:11:49 +01:00
|
|
|
\n{2}{User isolation}
|
|
|
|
Admin vs regular user, privilege escalation, least-privilege principle,
|
|
|
|
zero-trust principle.
|
|
|
|
\n{2}{Process isolation}
|
|
|
|
Sandbox escape.
|
|
|
|
\n{2}{Namespaced isolation}
|
|
|
|
Sandbox escape.
|
|
|
|
|
|
|
|
|
|
|
|
\n{1}{Data storage}
|
|
|
|
Among the key aspects of any security-minded system (application), the
|
|
|
|
following are certain to make the cut:
|
|
|
|
\begin{enumerate}
|
|
|
|
\item data integrity
|
|
|
|
\item data authenticity
|
|
|
|
\item data confidentiality
|
|
|
|
\end{enumerate}
|
|
|
|
|
|
|
|
\n{2}{Integrity}
|
|
|
|
|
|
|
|
\n{2}{Authenticity}
|
|
|
|
|
|
|
|
\n{2}{Confidentiality}
|
|
|
|
|
|
|
|
\n{2}{Encryption-at-rest}
|
|
|
|
|
|
|
|
|
|
|
|
\n{1}{Compromise checking and prevention}
|
|
|
|
|
|
|
|
\n{2}{HIBP and similar tools}
|
|
|
|
|
|
|
|
\n{2}{OWASP Top 10 for the implementers}
|
|
|
|
|
|
|
|
\n{2}{Password best practices}
|
|
|
|
|
|
|
|
|
2023-01-30 22:48:23 +01:00
|
|
|
% =========================================================================== %
|
|
|
|
\part{Practical part}
|
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\n{1}{Kudos}
|
2023-01-31 04:11:49 +01:00
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\textbf{Disclaimer:} the author is not affiliated in any way with any of the
|
|
|
|
projects described on this page.
|
2023-01-31 04:11:49 +01:00
|
|
|
|
2023-05-19 23:55:38 +02:00
|
|
|
The \textit{Password Compromise Monitoring Tool} (\texttt{pcmt}) program has
been developed using a great deal of free (as in Freedom) and open-source
software, either directly or as an outstanding work tool, and the author would
like to take this opportunity to recognise that fact.
|
|
|
|
|
|
|
|
In particular, the author acknowledges that this work would not be the same
|
|
|
|
without:
|
2023-05-19 18:24:17 +02:00
|
|
|
|
|
|
|
\begin{itemize}
|
|
|
|
\item vim (\url{https://www.vim.org/})
|
|
|
|
\item Arch Linux (\url{https://archlinux.org/})
|
|
|
|
\item ZSH (\url{https://www.zsh.org/})
|
|
|
|
\item kitty (\url{https://sw.kovidgoyal.net/kitty/})
|
|
|
|
\item Nix (\url{https://nixos.org/explore.html})
|
|
|
|
\item pre-commit (\url{https://pre-commit.com/})
|
|
|
|
\item Podman (\url{https://podman.io/})
|
|
|
|
\item Go (\url{https://go.dev/})
|
|
|
|
\end{itemize}
|
|
|
|
|
|
|
|
All of the code written has been typed into VIM (\texttt{9.0}), the shell used
to run the commands was ZSH, both running in the author's terminal emulator of
choice, \texttt{kitty}, on a \raisebox{.8ex}{\texttildelow}8-month-old (at the
time of writing) installation of \textit{Arch Linux (by the way)} using a
\texttt{6.3.x-wanderer-zfs-xanmod1} variant of the Linux kernel.
|
2023-05-19 18:24:17 +02:00
|
|
|
|
2023-05-19 21:40:42 +02:00
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\n{1}{Development}
|
|
|
|
|
2023-05-19 23:55:38 +02:00
|
|
|
The source code of the project has been versioned since the start using the
|
|
|
|
popular and industry-standard git (\url{https://git-scm.com}) source code
|
|
|
|
management (SCM) tool. Commits were made frequently and, if at all possible,
|
|
|
|
for small and self-contained changes of code, trying to follow sane commit
|
|
|
|
message \emph{hygiene}, i.e.\ striving for meaningful and well-formatted commit
|
|
|
|
messages. The name of the default branch is \texttt{development}, since that is
|
|
|
|
what the author likes to choose for new projects that are not yet stable (it is
|
|
|
|
in fact the default in the author's \texttt{.gitconfig}).
|
|
|
|
|
|
|
|
|
|
|
|
\n{2}{Commit signing}
|
|
|
|
|
|
|
|
Since git allows cryptographically \emph{signing} all commits, it would be
unwise not to take advantage of this. For the longest time, GPG was the only
method available for signing commits in git; however, that is no longer the
case~\cite{agwagitssh}. These days, it is also possible to both sign and
|
|
|
|
verify one's git commits (and tags!) using SSH keys, namely those produced by
|
|
|
|
OpenSSH (the same ones that can be used to log in to remote systems). The
|
|
|
|
author has, of course, not reused the same key pair that is used to connect to
|
|
|
|
machines for signing commits. A different, \texttt{Ed25519} elliptic curve key
|
|
|
|
pair has been used specifically for signing. The public component of this key is
enclosed in this thesis as Appendix~\ref{appendix:signingkey} for future
reference.
|
2023-05-19 23:55:38 +02:00
|
|
|
|
|
|
|
The validity of a signature on a particular commit can be viewed with git using
|
|
|
|
the following commands (the \% sign denotes the shell prompt):
|
|
|
|
|
|
|
|
\begin{figure}[h]
|
|
|
|
\centering
|
|
|
|
\begin{varwidth}{\linewidth}
|
|
|
|
\begin{verbatim}
% cd <cloned project dir>
% git show --show-signature <commit>
% # alternatively:
% git verify-commit <commit>
\end{verbatim}
|
|
|
|
\end{varwidth}
|
|
|
|
\caption{Verifying signature of a git commit}
|
|
|
|
\label{fig:gitverif}
|
|
|
|
\end{figure}
|
|
|
|
|
|
|
|
There is one caveat to this though: git first needs some additional
configuration for the code in Figure~\ref{fig:gitverif} to work as one would
expect. Namely, the public key used to verify the signature needs to be
stored in git's ``allowed signers file'', git then needs to be told where that
file is using the configuration value \texttt{gpg.ssh.allowedsignersfile} and
finally the configuration value of the \texttt{gpg.format} field needs to be
set to \texttt{ssh}.
|
|
|
|
|
|
|
|
Since git allows the configuration values to be local to each repository, both
|
|
|
|
of the mentioned issues can be solved by running the following commands from
|
|
|
|
inside of the cloned repository:
|
|
|
|
|
|
|
|
\begin{figure}[h]
|
|
|
|
\centering
|
|
|
|
\begin{varwidth}{\linewidth}
|
|
|
|
\scriptsize
|
|
|
|
\begin{verbatim}
% # set the signature format for the local repository.
% git config --local gpg.format ssh
% # save the public key (the ./tmp directory needs to exist first).
% mkdir -p ./tmp
% cat >./tmp/.allowed_signers \
  <<<'leo ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKwshTdBgLzwY4d8N7VainZCngH88OwvPGhZ6bm87rBO'
% # set the allowed signers file path for the local repository.
% git config --local gpg.ssh.allowedsignersfile ./tmp/.allowed_signers
\end{verbatim}
|
|
|
|
\end{varwidth}
|
|
|
|
\caption{Prepare allowed signers file and signature format for git}
|
|
|
|
\label{fig:gitsshprep}
|
|
|
|
\end{figure}
|
|
|
|
|
|
|
|
After the code in Figure~\ref{fig:gitsshprep} is run, everything from
Figure~\ref{fig:gitverif} should remain applicable for the lifetime of the
repository or until git changes its implementation of signature verification. The
|
|
|
|
git \texttt{user.name} that can be seen on the commits in the \textbf{Author}
|
|
|
|
field is named after the machine that was used to develop the program, since
|
|
|
|
the author uses different signing keys on each machine. That way the committer
|
|
|
|
machine can be determined post-hoc.
|
2023-05-19 23:55:38 +02:00
|
|
|
|
|
|
|
For future reference, git version \texttt{2.40.1} was used.
|
|
|
|
|
|
|
|
|
|
|
|
\n{2}{Continuous Integration}
|
|
|
|
|
|
|
|
To increase both the author's and public confidence in the atomic changes made
|
|
|
|
over time, an attempt was made to thoroughly \emph{integrate} them using a
continuous integration (CI) service that has been plugged into the main source
code repository since the early stages of development. This, of course, was again
|
|
|
|
self-hosted, including the workers. The tool of choice there was Drone
|
|
|
|
(\url{https://drone.io}) and the ``docker'' runner (in fact it runs any OCI
|
|
|
|
container) was used to run the builds.
|
|
|
|
|
|
|
|
The way this runner works is it creates an ephemeral container for every
|
|
|
|
pipeline step and executes given \emph{commands} inside of it. At the end of
|
|
|
|
each step the container is discarded, while the repository, which is mounted
|
|
|
|
into each container's \texttt{/drone/src} is persisted between steps, allowing
|
|
|
|
it to be cloned from \emph{origin} only at the start of the pipeline and
|
|
|
|
then shared for all of the following steps, saving bandwidth, time and disk
|
|
|
|
writes.
|
|
|
|
|
|
|
|
The entire configuration used to run the pipelines can be found in a file named
|
|
|
|
\texttt{.drone.yml} at the root of the main source code repository. The
workflow consists of four pipelines, which are run in parallel. Two main
pipelines are defined to build the frontend assets and the \texttt{pcmt} binary
and to run tests on \texttt{x86\_64} GNU/Linux targets, one for each of Arch and
Alpine (version 3.17). These two pipelines are identical apart from
|
|
|
|
OS-specific bits such as installing a certain package, etc. For the record,
|
|
|
|
other OS-architecture combinations were not tested.
|
|
|
|
|
|
|
|
A third pipeline contains instructions to build a popular static analysis tool
called \texttt{golangci-lint} from sources and then to analyse the project's
codebase using the freshly built binary. The tool is sort of a meta-linter,
bundling a staggering number of linters (a linter is a tool that performs
static code analysis and can raise awareness of programming errors, flag
potentially buggy code constructs, or \emph{mere} stylistic errors). If the
result of this step is successful, a handful of code analysis services get
pinged in the next steps to take notice of the changes to the project's source
code and update their metrics. Details can be found in the main Drone
configuration file \texttt{.drone.yml}, while the configuration for the
\texttt{golangci-lint} tool itself (which linters are enabled/disabled and with
what settings) can be found in the root of the repository in the file named
\texttt{.golangci.yml}.
|
2023-05-23 18:14:59 +02:00
|
|
|
|
|
|
|
The fourth pipeline focuses on linting the Containerfile and building the
|
|
|
|
container, although the latter action is only performed on feature branches,
|
|
|
|
\emph{pull requests} or \emph{tag} events.
|
|
|
|
|
2023-05-19 23:55:38 +02:00
|
|
|
The median build time as of writing was 1 minute, which includes running all
four pipelines, and that is acceptable. Build times might of course vary
depending on the hardware. For reference, these builds were being run on a
machine equipped with a Zen 3 Ryzen 5 5600 CPU with nominal clock speeds, DDR4
3200MHz RAM, a couple of PCIe Gen 4 NVMe drives in a mirrored setup (using ZFS)
and a 400MiB downlink, software-wise running Arch with an author-flavoured
Xanmod kernel version 6.3.x.
|
2023-05-19 23:55:38 +02:00
|
|
|
|
2023-05-23 18:14:59 +02:00
|
|
|
\obr{Drone CI median build
|
|
|
|
time}{fig:drone-median-build}{.77}{graphics/drone-median-build}
|
2023-05-19 23:55:38 +02:00
|
|
|
|
|
|
|
|
|
|
|
\n{2}{Source code repositories}\label{sec:repos}
|
|
|
|
|
|
|
|
All of the pertaining source code was published in repositories on a publicly
available git server operated by the author, the reasoning \emph{pro}
self-hosting being that it is the preferred way of guaranteeing autonomy over
one's source code, as opposed to large silos owned by big corporations having a
track record of arguably not always deciding with their users' best interest in
mind (although recourse has been observed~\cite{ytdl}), acting on impulse or
under public pressure (potentially at least temporarily disrupting their users'
operations), thus binding their users not only to their lengthy \emph{terms
of service} that \emph{can change at any time}, but also to factors outside
their control. Granted, decentralisation can take a toll on discoverability of
the project, but that is not a concern here.
|
2023-05-19 23:55:38 +02:00
|
|
|
|
|
|
|
The git repository containing source code of the \texttt{pcmt} project:\\
|
|
|
|
\url{https://git.dotya.ml/mirre-mt/pcmt.git}.
|
|
|
|
|
|
|
|
The git repository hosting the \texttt{pcmt} configuration schema:\\
|
|
|
|
\url{https://git.dotya.ml/mirre-mt/pcmt-config-schema.git}.
|
|
|
|
|
|
|
|
The repository containing the \LaTeX{} source code of this thesis:\\
|
|
|
|
\url{https://git.dotya.ml/mirre-mt/masters-thesis.git}.
|
|
|
|
|
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\n{2}{Toolchain}
|
|
|
|
|
|
|
|
Throughout the creation of this work, the \emph{current} version of the Go
|
|
|
|
programming language was used, i.e.\ \texttt{go1.20}.
|
|
|
|
|
|
|
|
\tab{Tool/Library-Usage Matrix}{tab:toolchain}{1.0}{ll}{
|
|
|
|
\textbf{Name} & \textbf{Usage} \\
|
|
|
|
Go programming language & program core \\
|
|
|
|
Dhall configuration language & program configuration \\
|
|
|
|
Echo & HTTP handlers, controllers, web server \\
|
2023-05-23 19:06:46 +02:00
|
|
|
ent & ORM using graph-based modelling \\
|
2023-05-19 18:24:17 +02:00
|
|
|
bluemonday & HTML sanitising \\
|
2023-05-24 16:48:03 +02:00
|
|
|
TailwindCSS & stylesheets using a utility-first approach \\
|
|
|
|
PostgreSQL & storing data \\
|
2023-05-19 18:24:17 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
\tab{Dependency-Version Matrix}{tab:depsversionmx}{1.0}{ll}{
|
|
|
|
	\textbf{Name} & \textbf{Version} \\
|
|
|
|
\texttt{echo} (\url{https://echo.labstack.com/}) & 4.10.2 \\
|
|
|
|
\texttt{go-dhall} (\url{https://github.com/philandstuff/dhall-golang}) & 6.0.2\\
|
|
|
|
\texttt{ent} (\url{https://entgo.io/}) & 0.11.10 \\
|
|
|
|
\texttt{bluemonday} (\url{https://github.com/microcosm-cc/bluemonday}) & 1.0.23 \\
|
|
|
|
\texttt{tailwindcss} (\url{https://tailwindcss.com/}) & 3.3.0 \\
|
2023-05-24 16:48:03 +02:00
|
|
|
\texttt{PostgreSQL} (\url{https://www.postgresql.org/}) & 15.2 \\
|
2023-05-19 18:24:17 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
\n{2}{A word about Go}
|
|
|
|
First, a question of \textit{`Why pick Go for building a web
|
|
|
|
application?'} might arise, so the following few lines will try to address
|
|
|
|
that.
|
|
|
|
|
2023-05-23 19:34:56 +02:00
|
|
|
Go~\cite{golang} (or \emph{Golang} for SEO-friendliness) is a strongly typed,
|
|
|
|
high-level, \emph{garbage-collected} language where functions are first-class
|
|
|
|
citizens and errors are values.
|
|
|
|
|
|
|
|
The appeal for the author comes from a number of features of the language, such
as built-in support for concurrency and unit testing, sane \emph{zero} values,
lack of pointer arithmetic, inheritance and implicit type conversions,
easy-to-read syntax, producing a statically linked binary by default, etc. On
top of that, the language has got a cute mascot.
|
2023-05-19 18:24:17 +02:00
|
|
|
|
|
|
|
Due to the foresight of the Go Authors regarding \emph{the formatting
question} (i.e.\ where to put the braces, tabs vs.\ spaces, etc.), most of the
discussions on this topic have been rendered moot. Every
\emph{gopher}\footnote{euph.\ a person writing in the Go programming language}
is expected to format their source code with the official formatter
(\texttt{gofmt}), which automatically ensures the code adheres to the official
formatting standards.
|
|
|
|
|
|
|
|
\n{2}{A word about Nix}
|
2023-01-31 04:11:49 +01:00
|
|
|
\url{https://builtwithnix.org/}
|
|
|
|
|
2023-05-24 16:47:18 +02:00
|
|
|
|
|
|
|
\n{1}{Application architecture}
|
|
|
|
|
|
|
|
\n{1}{Implementation}
|
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\n{2}{Configuration}
|
|
|
|
|
|
|
|
Every non-trivial program usually offers at least \emph{some} way to
|
2023-05-19 21:40:42 +02:00
|
|
|
tweak/manage its behaviour, and these changes are usually persisted
|
2023-05-19 18:24:17 +02:00
|
|
|
\emph{somewhere} on the filesystem of the host: in a local SQLite3 database, a
|
|
|
|
\emph{LocalStorage} key-value store in the browser, a binary or plain text
|
|
|
|
configuration file. These configuration files need to be read and checked at
|
|
|
|
least on program start-up and either stored into operating memory for the
|
|
|
|
duration of the runtime of the program, or loaded and parsed and the memory
|
|
|
|
subsequently \emph{freed} (initial configuration).
|
|
|
|
|
|
|
|
There is an abundance of configuration languages (or file formats used to craft
configuration files, whether they were intended for it or not) available: TOML,
INI, JSON and YAML, to name some of the popular ones (as of today).
|
2023-05-19 18:24:17 +02:00
|
|
|
|
|
|
|
Dhall stood out as a language that was designed with both security and the
|
|
|
|
needs of dynamic configuration scenarios in mind, borrowing a concept or two
|
|
|
|
from Nix~\cite{nixoslearn}~\cite{nixlang} (which in turn sources more than a
|
2023-05-24 16:47:18 +02:00
|
|
|
few of its concepts from Haskell), and in its apparent core being very similar
|
|
|
|
to JSON, which adds to the familiar feel. In fact, in Dhall's authors' own words, it
|
|
|
|
is: ``a programmable configuration language that you can think of as: JSON +
|
|
|
|
functions + types + imports''~\cite{dhalllang}.
|
2023-05-19 18:24:17 +02:00
|
|
|
|
|
|
|
Among all of the listed features, the especially intriguing one to the author
|
|
|
|
was the promise of \emph{types}. There are multiple examples directly on the
|
|
|
|
project's documentation webpage demonstrating for instance the declaration and
|
|
|
|
usage of custom types (that are, of course, merely combinations of the primitive
|
|
|
|
types that the language provides, such as \emph{Bool}, \emph{Natural} or
|
|
|
|
\emph{List}, to name just a few), so it was not exceedingly hard to start
|
|
|
|
designing a custom configuration \emph{schema} for the program.
|
|
|
|
Dhall not being a Turing-complete language also guarantees that evaluation
|
|
|
|
\emph{always} terminates eventually, which is a good attribute to possess as a
|
|
|
|
configuration language.
|
|
|
|
|
2023-05-19 23:55:38 +02:00
|
|
|
|
2023-05-23 18:15:21 +02:00
|
|
|
\n{3}{Dhall Schema}
|
2023-05-19 23:55:38 +02:00
|
|
|
|
|
|
|
The configuration schema was at first being developed as part of the main
|
|
|
|
project's repository, before it was determined that it would benefit both the
|
|
|
|
development and overall clarity if the schema lived in its own repository (see
|
2023-05-23 22:42:07 +02:00
|
|
|
Section~\ref{sec:repos} for details). This enabled it to be independently
|
|
|
|
developed and versioned and only pulled into the main application whenever it
|
|
|
|
is determined the application is ready for it.
|
|
|
|
|
|
|
|
The full schema with type annotations can be seen in
|
|
|
|
Figure~\ref{fig:dhallschema}. The \texttt{let} statement declares a variable
|
|
|
|
called \texttt{Schema} and assigns it the result of the expression on the right
|
|
|
|
side of the equals sign, which has for practical reasons been trimmed and is
|
|
|
|
displayed without the \emph{default} block, which is instead shown in its own
|
2023-05-24 02:16:53 +02:00
|
|
|
Figure~\ref{fig:dhallschemadefaults}.
|
|
|
|
|
2023-05-24 16:47:18 +02:00
|
|
|
\begin{figure}[!h]
|
2023-05-19 23:55:38 +02:00
|
|
|
\begin{varwidth}{\linewidth}
|
|
|
|
\scriptsize
|
|
|
|
\begin{verbatim}
|
|
|
|
let Schema =
      { Type =
          { Host : Text
          , Port : Natural
          , HTTP :
              { Domain : Text
              , Secure : Bool
              , AutoTLS : Bool
              , TLSKeyPath : Text
              , TLSCertKeyPath : Text
              , HSTSMaxAge : Natural
              , ContentSecurityPolicy : Text
              , RateLimit : Natural
              , Gzip : Natural
              , Timeout : Natural
              }
          , Mailer :
              { Enabled : Bool
              , Protocol : Text
              , SMTPAddr : Text
              , SMTPPort : Natural
              , ForceTrustServerCert : Bool
              , EnableHELO : Bool
              , HELOHostname : Text
              , Auth : Text
              , From : Text
              , User : Text
              , Password : Text
              , SubjectPrefix : Text
              , SendPlainText : Bool
              }
          , LiveMode : Bool
          , DevelMode : Bool
          , AppPath : Text
          , Session :
              { CookieName : Text
              , CookieAuthSecret : Text
              , CookieEncrSecret : Text
              , MaxAge : Natural
              }
          , Logger : { JSON : Bool, Fmt : Optional Text }
          , Init : { CreateAdmin : Bool, AdminPassword : Text }
          , Registration : { Allowed : Bool }
          }
      }
|
|
|
|
\end{verbatim}
|
|
|
|
\end{varwidth}
|
|
|
|
\caption{Dhall configuration schema version 0.0.1-rc.2}
|
|
|
|
\label{fig:dhallschema}
|
|
|
|
\end{figure}
|
2023-05-24 16:47:18 +02:00
|
|
|
|
|
|
|
The main configuration consists of both raw attributes and child records,
which allow for grouping of related functionality. For instance, configuration
settings pertaining to the mail server setup are grouped in a record named
\textbf{Mailer}. Its attribute \textbf{Enabled} is annotated as \textbf{Bool},
which was deemed appropriate for an on-off switch-like functionality, with the
only permissible values being either \emph{True} or \emph{False}. Do note that
in Dhall \texttt{true} $\neq$ \texttt{True}, since \textbf{True} is internally a
Bool constant, which is built into Dhall (check out ``The
Prelude''~\cite{dhallprelude}), while \textbf{true} is evaluated as an
\emph{unbound} variable, that is, a variable \emph{not} defined in the current
\emph{scope}.
|
|
|
|
|
|
|
|
Another speciality of Dhall is that the \texttt{==} and \texttt{!=} equality
operators only work on values of type \texttt{Bool}, which for example means
that variables of type \texttt{Natural} (\texttt{uint}) or \texttt{Text}
(\texttt{string}) cannot be compared directly as in other languages. That
either leaves the comparison to a higher-level language (such as Go) or, from
the perspective of the Dhall authors, calls for using \emph{enums} when the
value matters.
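
Since Dhall cannot compare \texttt{Text} or \texttt{Natural} values, checks of
that kind fall to the application. The following is a minimal sketch of what
such a check could look like in Go. The struct mirrors a subset of the
\textbf{Mailer} record from the schema, while the function name
\texttt{validateMailer} and the set of accepted protocol names are assumptions
made for illustration and are not taken from the program itself.

```go
package main

import (
	"errors"
	"fmt"
)

// mailer mirrors a subset of the Mailer record of the Dhall schema.
type mailer struct {
	Enabled  bool
	Protocol string
	SMTPAddr string
	SMTPPort uint
}

// validateMailer performs the value comparisons that Dhall cannot express.
// The accepted protocol names are assumed for the sake of the example.
func validateMailer(m mailer) error {
	if !m.Enabled {
		return nil // nothing to check when mailing is switched off
	}
	switch m.Protocol {
	case "smtp", "smtps":
	default:
		return fmt.Errorf("unsupported mailer protocol %q", m.Protocol)
	}
	if m.SMTPAddr == "" {
		return errors.New("SMTPAddr must not be empty")
	}
	if m.SMTPPort == 0 || m.SMTPPort > 65535 {
		return fmt.Errorf("SMTPPort %d is out of range", m.SMTPPort)
	}
	return nil
}

func main() {
	m := mailer{Enabled: true, Protocol: "smtps", SMTPAddr: "mail.example.org", SMTPPort: 465}
	fmt.Println("valid:", validateMailer(m) == nil)
}
```

Keeping such checks in one place right after the configuration is loaded means
a misconfiguration surfaces at start-up rather than on first use.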
|
|
|
|
|
|
|
|
|
|
|
|
\n{3}{Safety considerations}
|
|
|
|
|
|
|
|
With a programmable configuration language that understands functions and
allows not only importing arbitrary text from random internet URLs, but also
importing and \emph{evaluating} (i.e.\ running) potentially untrusted code, it
is important that there are some safety mechanisms in place that the user can
rely on. Dhall provides these through multiple features, among them enforcing a
same-origin policy and (optionally) pinning a cryptographic hash of the value
of the expression being imported.
|
|
|
|
|
|
|
|
|
2023-05-23 22:42:07 +02:00
|
|
|
\begin{figure}[!h]
|
|
|
|
\begin{varwidth}{\linewidth}
|
|
|
|
\scriptsize
|
|
|
|
\begin{verbatim}
|
|
|
|
      , default =
          -- | have sane defaults.
          { Host = ""
          , Port = 3000
          , HTTP =
              { Domain = ""
              , Secure = False
              , AutoTLS = False
              , TLSKeyPath = ""
              , TLSCertKeyPath = ""
              , HSTSMaxAge = 0
              , ContentSecurityPolicy = ""
              , RateLimit = 0
              , Gzip = 0
              , Timeout = 0
              }
          , Mailer =
              { Enabled = False
              , Protocol = "smtps"
              , SMTPAddr = ""
              , SMTPPort = 465
              , ForceTrustServerCert = False
              , EnableHELO = False
              , HELOHostname = ""
              , Auth = ""
              , From = ""
              , User = ""
              , Password = ""
              , SubjectPrefix = "pcmt - "
              , SendPlainText = True
              }
          , LiveMode =
              -- | LiveMode controls whether the application looks for
              -- | directories "assets" and "templates" on the filesystem or
              -- | in its bundled Embed.FS.
              False
          , DevelMode = False
          , AppPath =
              -- | AppPath specifies where the program looks for "assets" and
              -- | "templates" in case LiveMode is True.
              "."
          , Session =
              { CookieName = "pcmt_session"
              , CookieAuthSecret = ""
              , CookieEncrSecret = ""
              , MaxAge = 3600
              }
          , Logger = { JSON = True, Fmt = None Text }
          , Init =
              { CreateAdmin =
                  -- | if this is True, attempt to create a user with admin
                  -- | privileges with the password specified below (or better -
                  -- | overridden); it fails if users already exist in the DB.
                  False
              , AdminPassword =
                  -- | used for the first admin, forced change on first login.
                  "50ce50fd0e4f5894d74c4caecb450b00c594681d9397de98ffc0c76af5cff5953eb795f7"
              }
          , Registration.Allowed = True
          }
      }

in  Schema
|
|
|
|
\end{verbatim}
|
|
|
|
\end{varwidth}
|
2023-05-23 22:42:07 +02:00
|
|
|
\caption{Dhall configuration defaults for schema version 0.0.1-rc.2}
|
|
|
|
\label{fig:dhallschemadefaults}
|
2023-05-19 23:55:38 +02:00
|
|
|
\end{figure}
|
|
|
|
|
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\n{3}{Possible alternatives}
|
2023-05-24 16:47:18 +02:00
|
|
|
|
|
|
|
While developing the program, the author has also come across certain
shortcomings of Dhall, namely long start-up with a \emph{cold cache}, which can
generally be observed when running the program in an environment that does not
allow writing the cache files (a read-only filesystem), or does not keep the
written cache files, such as a container that is not configured to mount a
persistent volume at the pertinent location.
|
|
|
|
|
|
|
|
To describe the way Dhall works when performing an evaluation, it resolves
|
|
|
|
every expression down to a combination of its most basic types (eliminating all
|
|
|
|
abstraction and indirection) in the process called
|
|
|
|
\textbf{normalisation}~\cite{dhallnorm} and then saves this result in the
|
|
|
|
host's cache. The \texttt{dhall-haskell} binary attempts to resolve the
|
|
|
|
variable \texttt{XDG\_CACHE\_HOME} (have a look at \emph{XDG Base Directory
|
2023-05-19 23:55:38 +02:00
|
|
|
Spec}~\cite{xdgbasedirspec} for details) to decide \emph{where} the results of
|
|
|
|
the normalisation will be written for repeated use. Do note that this
|
|
|
|
behaviour has been observed on a GNU/Linux host and the author has not verified
|
2023-05-24 16:47:18 +02:00
|
|
|
this behaviour on a non-GNU/Linux host, such as FreeBSD.
|
2023-05-19 18:24:17 +02:00
|
|
|
|
|
|
|
If normalisation is performed inside an ephemeral container (as opposed to, for
instance, an interactive desktop session), the results effectively get lost on
each container restart, which is both wasteful and not great for user
experience, since the normalisation of just a handful of imports (which
internally branch widely) can take upwards of two minutes, during which the
user is left waiting for the hanging application with no reporting on the
progress or current status.
|
2023-05-19 18:24:17 +02:00
|
|
|
|
|
|
|
While workarounds for the above-mentioned problem can be devised relatively
easily (bind mount persistent volumes inside the container in place of
\texttt{XDG\_CACHE\_HOME/dhall} and \texttt{XDG\_CACHE\_HOME/dhall-haskell} to
preserve the cache between restarts, or let the cache be pre-computed during
container build, since the application is only really expected to run together
with a compatible version of the configuration schema and this version
\emph{is} known at container build time), it would certainly feel better if
there was no need to work \emph{around} the configuration system of choice.
|
|
|
|
|
|
|
|
Alternatives such as CUE (\url{https://cuelang.org/}) offer themselves nicely
as a potentially almost drop-in replacement for Dhall feature-wise, while also
avoiding the costly \emph{cold cache} normalisation operations, which are, in
the author's view, Dhall's principal issue.
|
2023-05-19 21:40:42 +02:00
|
|
|
|
2023-01-31 04:11:49 +01:00
|
|
|
|
2023-05-22 21:44:52 +02:00
|
|
|
\n{2}{Data integrity and authenticity}
|
|
|
|
|
|
|
|
The user can interact with the application via a web client, such as a browser,
|
|
|
|
and is required to authenticate for all sensitive operations. To not only know
|
|
|
|
\emph{who} the user is but also make sure they are \emph{permitted} to perform
|
|
|
|
the action they are attempting, the program employs an \emph{authorisation}
|
|
|
|
mechanism in the form of sessions. These are on the client side represented by
|
2023-05-24 16:47:18 +02:00
|
|
|
cryptographically signed and encrypted (using 256-bit AES) HTTP cookies. That
|
|
|
|
lays foundations for a few things: the data saved into the cookies can be
|
|
|
|
regarded as private because short of future \emph{quantum computers} only the
|
|
|
|
program itself can decrypt and access the data, and the data can be trusted
|
|
|
|
since it is both signed using the key that only the program controls and
|
|
|
|
\emph{encrypted} with \emph{another} key that equally only the program holds.
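
To illustrate the idea of a cookie value that is encrypted with one key and
signed with another, a minimal sketch using only the Go standard library
follows. It is not the session library the program uses: the function names
\texttt{seal} and \texttt{open}, the choice of AES-256-GCM for encryption and of
HMAC-SHA256 for the signature are all assumptions made for this example.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"fmt"
)

// seal encrypts value with AES-256-GCM under encKey (32 bytes) and signs
// nonce+ciphertext with HMAC-SHA256 under the separate authKey.
func seal(value, authKey, encKey []byte) (string, error) {
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	msg := gcm.Seal(nonce, nonce, value, nil) // nonce || ciphertext
	mac := hmac.New(sha256.New, authKey)
	mac.Write(msg)
	return base64.RawURLEncoding.EncodeToString(mac.Sum(msg)), nil
}

// open verifies the signature first and only then decrypts the payload.
func open(cookie string, authKey, encKey []byte) ([]byte, error) {
	raw, err := base64.RawURLEncoding.DecodeString(cookie)
	if err != nil {
		return nil, err
	}
	if len(raw) < sha256.Size {
		return nil, errors.New("cookie too short")
	}
	msg, tag := raw[:len(raw)-sha256.Size], raw[len(raw)-sha256.Size:]
	mac := hmac.New(sha256.New, authKey)
	mac.Write(msg)
	if !hmac.Equal(tag, mac.Sum(nil)) {
		return nil, errors.New("invalid signature")
	}
	block, err := aes.NewCipher(encKey)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(msg) < gcm.NonceSize() {
		return nil, errors.New("cookie too short")
	}
	nonce, ct := msg[:gcm.NonceSize()], msg[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	authKey, encKey := make([]byte, 32), make([]byte, 32)
	if _, err := rand.Read(authKey); err != nil {
		panic(err)
	}
	if _, err := rand.Read(encKey); err != nil {
		panic(err)
	}
	cookie, err := seal([]byte("uid=42"), authKey, encKey)
	if err != nil {
		panic(err)
	}
	plain, err := open(cookie, authKey, encKey)
	fmt.Println(string(plain), err)
}
```

GCM on its own already authenticates the ciphertext; the additional HMAC is
kept here only to mirror the two-key arrangement described above.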
|
2023-05-22 21:44:52 +02:00
|
|
|
|
|
|
|
The cookie data is only ever written \emph{or} read at the server side,
solidifying the author's decision to let it be encrypted, as there is no point
in not encrypting it for some perceived client-side simplification. Users
navigating the website send their session cookie (if it exists) in
\textbf{every request} to the server, which then verifies the integrity of the
data and, in case it is valid, determines the existence and potential amount of
user privilege that should be granted. Public endpoints do not mandate the
presence of a valid session by definition, while at protected endpoints the
user is authenticated at every request. When a session expires or if there is
no session to begin with, the user is either shown a \emph{Not found} error
message or the \emph{Unauthorised} error message, or is redirected to
\texttt{/signin}.
|
|
|
|
|
|
|
|
Another aspect that contributes to data integrity from another point of view is
|
|
|
|
utilising database \emph{transactions} for bundling together multiple database
|
|
|
|
operations that collectively change the \emph{state}. Using the transactional
|
|
|
|
jargon, the data is only \emph{committed} if each individual change was
|
|
|
|
successful. In case of any errors, the database is instructed to perform an
|
|
|
|
atomic \emph{rollback}, which brings it back to a state before the changes were
|
|
|
|
ever attempted.
|
|
|
|
|
|
|
|
The author has additionally considered the thought of utilising an embedded
|
|
|
|
immutable database like immudb (\url{https://immudb.io}) for record keeping
|
|
|
|
(verifiably storing data change history) and additional data integrity checks,
|
|
|
|
e.g.\ for tamper protection purposes and similar; however, that work has yet
to materialise.
|
2023-01-31 04:11:49 +01:00
|
|
|
|
|
|
|
|
|
|
|
\n{2}{User isolation}
|
2023-05-22 22:32:22 +02:00
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
Users are allowed into certain parts of the application based on the role they
currently possess. For the moment, two basic roles are envisioned, although
this list might get amended in the future if the need arises:
|
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\begin{itemize}
|
|
|
|
\item Administrator
|
|
|
|
\item User
|
|
|
|
\end{itemize}
|
|
|
|
|
2023-05-22 22:32:22 +02:00
|
|
|
It is paramount that the program protects itself from insider threats as
|
|
|
|
well and therefore each role is only able to perform actions that it is
|
|
|
|
explicitly assigned. While there definitely is certain overlap between the
|
|
|
|
capabilities of the two outlined roles, each also possesses unique features
|
|
|
|
that the other does not.
|
2023-05-19 18:24:17 +02:00
|
|
|
|
2023-05-22 22:32:22 +02:00
|
|
|
For example, the administrator role is not able to perform searches on the
breach data directly using their administrator account; for that, a separate
user account has to be created. Similarly, the regular user is not able to
manage breach lists and other users, because that is a privileged operation.
|
|
|
|
|
|
|
|
In-application administrators are not able to view sensitive (any) user data
|
|
|
|
and should therefore only be able to perform the following actions:
|
2023-05-19 18:24:17 +02:00
|
|
|
|
|
|
|
\begin{itemize}
|
|
|
|
\item Create user accounts
|
|
|
|
\item View list of users
|
|
|
|
\item View user email
|
|
|
|
\item Change user email
|
|
|
|
\item Toggle whether user is an administrator
|
|
|
|
\item Delete user accounts
|
|
|
|
\end{itemize}
|
|
|
|
|
2023-05-22 22:32:22 +02:00
|
|
|
Let us consider the case of a user managing themselves: while demoting oneself
from administrator to a regular user is permitted, promoting oneself to
administrator would constitute a \emph{privilege escalation} and likely be a
precursor to at least a \emph{denial of service} of sorts.
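
The rule about managing oneself can be expressed as a small predicate. This is
a sketch under assumptions: the \texttt{user} type and the function
\texttt{canSetAdmin} are hypothetical and only encode the two statements made
above, namely that toggling the administrator flag is reserved for
administrators and that nobody may promote themselves.

```go
package main

import "fmt"

// user carries only the fields needed for the authorisation decision.
type user struct {
	ID      int
	IsAdmin bool
}

// canSetAdmin reports whether actor may set the administrator flag of target
// to makeAdmin.
func canSetAdmin(actor, target user, makeAdmin bool) bool {
	if actor.ID == target.ID {
		// Managing self: demotion is allowed, self-promotion never is.
		return actor.IsAdmin && !makeAdmin
	}
	// Managing others is a privileged operation reserved for administrators.
	return actor.IsAdmin
}

func main() {
	admin := user{ID: 1, IsAdmin: true}
	regular := user{ID: 2}
	fmt.Println("self-demotion:", canSetAdmin(admin, admin, false))
	fmt.Println("self-promotion:", canSetAdmin(regular, regular, true))
	fmt.Println("admin promotes other:", canSetAdmin(admin, regular, true))
}
```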
|
|
|
|
|
|
|
|
|
|
|
|
\n{2}{Zero trust principle}
|
|
|
|
|
|
|
|
\textit{Data confidentiality, i.e.\ not trusting the provider}
|
|
|
|
|
|
|
|
There is no way for the application (and consequently, the in-application
administrator) to read the user's data. This is possible by virtue of
encrypting the pertinent data before saving it in the database with a
state-of-the-art \emph{age} key~\cite{age} (backed by
X25519~\cite{x25519rfc7748}), which is in turn safely stored encrypted by a
passphrase that only the user controls. Of course, the user-supplied password
is run through a password-based key derivation function (PBKDF) before letting
it encrypt the \emph{age} key.
|
|
|
|
|
|
|
|
The \emph{age} key is only generated when the user changes their password for
|
|
|
|
the first time to prevent scenarios such as an in-application administrator with
access to the physical database being able to both \textbf{recover} the key from
|
|
|
|
the database and \textbf{decrypt} it given that they already know the user
|
|
|
|
password (because they set it), which would subsequently give them unbounded
|
|
|
|
access to any future encrypted data, as long as they would be able to maintain
|
|
|
|
their database access. This is why the \emph{age} key generation and protection
|
|
|
|
are bound to the first password change. Of course, the evil administrator could
|
|
|
|
just perform the change themselves, however, the user would at least be able to
|
|
|
|
find those changes in the activity logs and know not to use the application.
|
|
|
|
But given the scenario of a total database compromise, the author finds all
|
|
|
|
hope is already lost at that point.
|
2023-05-19 18:24:17 +02:00
|
|
|
|
|
|
|
Consequently, both the application operators and the in-application
|
2023-05-22 22:32:22 +02:00
|
|
|
administrators should never be able to learn the details of what the user is
|
2023-05-19 18:24:17 +02:00
|
|
|
tracking, the same being applicable even to potential attackers with direct
|
2023-05-22 22:32:22 +02:00
|
|
|
access to the database. Thus the author maintains that every scenario that
|
|
|
|
could potentially lead to a data breach (apart from a compromised user machine
|
|
|
|
and the like) would have to entail some form of operating memory acquisition,
|
|
|
|
for instance using \texttt{LiME}~\cite{lime}, or perhaps directly the
|
|
|
|
\emph{hypervisor}, when considering virtualised (``cloud'') environments.
|
2023-01-31 04:11:49 +01:00
|
|
|
|
|
|
|
|
2023-05-23 18:15:21 +02:00
|
|
|
\n{2}{Compromise Monitoring}
|
2023-05-19 18:24:17 +02:00
|
|
|
|
2023-01-31 04:11:49 +01:00
|
|
|
\n{3}{Have I Been Pwned? Integration}
|
2023-05-19 18:24:17 +02:00
|
|
|
TODO
|
|
|
|
|
2023-05-24 16:47:18 +02:00
|
|
|
\n{3}{Local Dataset Plugin}

Breach data from locally available datasets can be imported into the
application, provided it adheres to the specified schema (have a look at the
\emph{breach data schema} in Figure~\ref{fig:breachDataGoSchema}). If it does
not (which is very likely with random breach data), it needs to be converted to
a form that does before being imported, e.g.\ using a Python script or similar.
Attempting to import data that does not follow the outlined schema would result
in an error. Also, importing a dataset which is over a reasonable size limit
would by default be rejected by the program as a precaution, since
unmarshalling e.g.\ a 1 TiB document would likely result in an out-of-memory
(OOM) situation on the host, assuming regular consumer hardware rather than
HPC.
|
2023-05-19 18:24:17 +02:00
|
|
|
|
|
|
|
\begin{figure}[h]
|
|
|
|
\centering
|
|
|
|
\begin{varwidth}{\linewidth}
|
|
|
|
\begin{verbatim}
|
|
|
|
type breachDataSchema struct {
    Name              string
    Time              time.Time
    IsVerified        bool
    ContainsPasswords bool
    ContainsHashes    bool
    HashType          string
    HashSalted        bool
    HashPeppered      bool
    ContainsUsernames bool
    ContainsEmails    bool
    Data              any
}
|
|
|
|
\end{verbatim}
|
|
|
|
\end{varwidth}
|
|
|
|
|
|
|
|
\caption{Breach Data Schema represented as a Go struct; imports from the
standard library are assumed}
|
|
|
|
\label{fig:breachDataGoSchema}
|
|
|
|
\end{figure}
|
|
|
|
|
2023-05-24 16:47:18 +02:00
|
|
|
The Go representation shown in Figure~\ref{fig:breachDataGoSchema} will in
actuality be written and supplied by the user of the program as a YAML
document. YAML was chosen for multiple reasons: relative ease of use (plain
text, readable, support for comments), its capability to store multiple
\emph{documents} inside a single file, most of the inputs being implicitly
typed as strings, and machine readability, which it sports thanks to being a
superset of JSON. That should allow for documents similar to what can be seen
in Figure~\ref{fig:breachDataYAMLSchema} to be ingested by the program, and
read and written by humans and programs alike.
|
2023-05-19 18:24:17 +02:00
|
|
|
|
|
|
|
\begin{figure}[h]
|
|
|
|
\centering
|
|
|
|
\begin{varwidth}{\linewidth}
|
|
|
|
\begin{verbatim}
|
|
|
|
---
name: Horrible breach
time: 2022-04-23T00:00:00+02:00
isVerified: false
containsPasswords: false
containsHashes: true
containsEmails: true
hashType: md5
hashSalted: false
hashPeppered: false
data:
  hashes:
    - hash1
    - hash2
    - hash3
  emails:
    - email1
    -
    - email3
---
# document #2, describing another breach.
name: Horrible breach 2
...
|
|
|
|
\end{verbatim}
|
|
|
|
\end{varwidth}
|
|
|
|
|
|
|
|
\caption{Example Breach Data Schema supplied to the program as a YAML file, optionally
|
|
|
|
containing multiple documents}
|
|
|
|
\label{fig:breachDataYAMLSchema}
|
|
|
|
\end{figure}
|
|
|
|
|
|
|
|
Notice how the emails list in Figure~\ref{fig:breachDataYAMLSchema} misses one
|
2023-05-24 16:47:18 +02:00
|
|
|
record, perhaps because it was not supplied or mistakenly omitted. This is a
|
|
|
|
valid scenario (mistakes happen) and the application needs to be able to handle
|
|
|
|
it. The alternative would be to require the user to prepare the data in such a
|
|
|
|
way that the empty/partial records would be dropped entirely.
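
If the application is to tolerate such gaps itself, one conceivable approach is
to skip the empty entries while keeping count of them, as sketched below. The
function \texttt{dropEmpty} is a hypothetical helper written for this example;
how the program actually treats partial records is not stated here.

```go
package main

import (
	"fmt"
	"strings"
)

// dropEmpty returns the records that are non-empty after trimming whitespace,
// together with the number of records that were skipped.
func dropEmpty(records []string) (kept []string, skipped int) {
	for _, r := range records {
		if strings.TrimSpace(r) == "" {
			skipped++
			continue
		}
		kept = append(kept, r)
	}
	return kept, skipped
}

func main() {
	// An empty YAML list item would typically arrive as an empty string.
	emails := []string{"email1", "", "email3"}
	kept, skipped := dropEmpty(emails)
	fmt.Println(kept, "skipped:", skipped)
}
```

Reporting the number of skipped entries back to the user lets them decide
whether the source data needs another look.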
|
2023-05-19 18:24:17 +02:00
|
|
|
|
2023-01-31 04:11:49 +01:00
|
|
|
|
|
|
|
\n{2}{Database configuration}
|
|
|
|
|
2023-05-24 16:47:18 +02:00
|
|
|
The database schema is not created manually in the database; instead, an
Object-relational Mapping (ORM) tool named ent is used. This allows defining
the table schema and relations entirely in Go. The best part about ent is that
there is no need to define supplemental methods on the models, since ent
employs \emph{code generation}, which creates these based on the types of the
attributes in the model and the respective relations. For instance, if an
attribute is a string value \texttt{Email}, ent can be used to generate code
that contains query predicates for the user object like the following:
|
2023-01-31 04:11:49 +01:00
|
|
|
|
|
|
|
\begin{itemize}
|
2023-05-24 16:47:18 +02:00
|
|
|
\item EmailIn
|
|
|
|
\item EmailEQ
|
|
|
|
\item EmailNEQ
|
|
|
|
\item EmailHasSuffix
|
2023-01-31 04:11:49 +01:00
|
|
|
\end{itemize}
|
|
|
|
|
2023-05-24 16:47:18 +02:00
|
|
|
|
|
|
|
\n{1}{Production}
|
|
|
|
|
|
|
|
It is, of course, recommended that the application runs in a secure
|
|
|
|
environment, although definitions of that almost certainly differ depending on
|
|
|
|
who you ask. General recommendations would be either to effectively reserve a
machine for a single use case (running this program), so as to dramatically
|
|
|
|
decrease the potential attack surface of the host, or run the program isolated
|
|
|
|
in a container or a virtual machine. Further, if the host does not need
|
|
|
|
management access (it is a deployed-to-only machine that is configured
|
|
|
|
out-of-band, such as with a \emph{golden} image/container or declaratively with
|
|
|
|
Nix), then an SSH \emph{daemon} should not be running in it, since it is not
|
|
|
|
needed. In an ideal scenario, the host machine would have as little software
|
|
|
|
installed as possible besides what the application absolutely requires.
|
|
|
|
|
|
|
|
A demonstration of the above can be found in the multi-stage Containerfile that
|
|
|
|
is available in the main sources. The resulting container image only contains a
|
|
|
|
statically linked copy of the program, a default configuration file and
|
|
|
|
corresponding Dhall expressions cached at build time, which only support the
|
|
|
|
main configuration file. Since the program also needs a database, an example
|
|
|
|
scenario could include the container being run in a Podman pod together with
|
|
|
|
the database, which would not have to be exposed from the pod and would
|
|
|
|
therefore only be available over \texttt{localhost}.
|
|
|
|
|
|
|
|
It goes without saying that the operator should substitute values of any
|
|
|
|
default configuration secrets with new ones that were securely generated.
|
|
|
|
|
|
|
|
|
|
|
|
\n{2}{Deployment recommendations}
|
|
|
|
|
|
|
|
\n{3}{Transport security}
|
|
|
|
|
|
|
|
Users connecting to the application should rightfully expect their data to
be protected \textit{in transit} (i.e.\ on the way between their browser and
the server), which is what the \emph{Transport Layer Security} family of
|
|
|
|
protocols~\cite{tls13rfc8446} was designed for, and which is the underpinning
|
|
|
|
of HTTPS. TLS utilises the primitives of asymmetric cryptography to let the
|
|
|
|
client authenticate the server (verify that it is who it claims it is) and
|
|
|
|
negotiate a symmetric key for encryption in the process named the \emph{TLS
|
|
|
|
handshake} (see Section~\ref{sec:tls} for more details), the final purpose of
|
|
|
|
which is establishing a secure communications connection. The operator should
|
|
|
|
configure the program to either directly utilise TLS using configuration or
|
|
|
|
have it listen behind a TLS-terminating \emph{reverse proxy}.
|
|
|
|
|
|
|
|
|
2023-05-19 18:24:17 +02:00
|
|
|
\n{3}{Containerisation}
|
|
|
|
Whether the pre-built or a custom container image is used to deploy the
|
|
|
|
application, it still needs access to secrets, such as the database connection
|
|
|
|
string (containing database host, port, user, password/encrypted password,
|
|
|
|
authentication method and database name).
|
|
|
|
|
|
|
|
Currently, the application is able to handle \emph{peer}, \emph{scram-sha-256},
|
|
|
|
\emph{user name maps} and raw \emph{password} as Postgres authentication
|
|
|
|
methods~\cite{pgauthmethods}, although the \emph{password} option should not be
|
|
|
|
used in production, \emph{unless} the connection to the database is protected
|
|
|
|
by TLS.\ In any case, using the \emph{scram-sha-256}~\cite{scramsha256rfc7677}
|
|
|
|
method is preferable and one way to verify in a development environment that
everything works as intended is the \emph{Password generator for PostgreSQL}
tool~\cite{goscramsha256}, which allows obtaining the encrypted string from raw
user input.
|
|
|
|
|
|
|
|
If the application running in a container wants to use the \emph{peer}
|
|
|
|
authentication method, it is up to the operator to supply the Postgres socket
|
|
|
|
to the application (e.g.\ as a volume bind mount). This scenario was not
|
|
|
|
tested, however, and the author is also not entirely certain how \emph{user
|
|
|
|
namespaces} (on GNU/Linux) would influence the process (given that the
|
|
|
|
\emph{ID}s of a user \textbf{outside} the container are mapped to a range of
|
|
|
|
\emph{UIDs} \textbf{inside} the container), for which the setup would likely
|
|
|
|
need to account.
|
|
|
|
|
|
|
|
Equally, if the application is running inside the container, the operator needs
|
2023-05-19 23:55:38 +02:00
|
|
|
to make sure that the database is either running in a network that is also
|
|
|
|
directly attached to the container or that there is a mechanism in place that
|
|
|
|
routes the requests for the database hostname to the destination.
|
2023-05-19 18:24:17 +02:00
|
|
|
|
|
|
|
One such mechanism is container name based routing inside \emph{pods}
|
|
|
|
(Podman/Kubernetes), where the resolution of container names is the
|
|
|
|
responsibility of a specially configured piece of software called Aardvark for
|
|
|
|
the former and CoreDNS for the latter.
|
|
|
|
|
2023-01-31 04:11:49 +01:00
|
|
|
|
|
|
|
\n{1}{Validation}
|
|
|
|
|
|
|
|
\n{2}{Unit tests}
|
|
|
|
|
2023-05-23 18:40:39 +02:00
|
|
|
Unit testing is a hot topic for many people and the author does not count
himself a staunch supporter of either extreme. The ``no unit tests'' position
seems to discount any benefit there is to unit testing, while a ``TDD-only''
approach can be a little too much for some people's taste. (TDD, or Test Driven
Development, is a development methodology whereby tests are written first,
followed by just enough of the code under test to get past the compile errors
and see the test fail; the code is then refactored to make the test pass, after
which it can be fearlessly extended, because the test is the safety net
catching the developer when they slip and alter the originally intended
behaviour.) The author tends to sport a \emph{middle ground} approach here,
writing enough tests where meaningful but not necessarily testing everything
or writing tests prior to code, although arguably that practice should result
in \emph{better} designed code, particularly because some prior thought has to
be given to it, as it needs to be tested \emph{first}.
|
|
|
|
|
|
|
|
Thanks to Go's built in support for testing in its \texttt{testing} package and
|
|
|
|
the tooling in the \texttt{go} tool, writing tests is relatively simple. Go
|
|
|
|
looks for files in the form \texttt{<filename>\_test.go} in the present working
|
2023-05-24 16:42:29 +02:00
|
|
|
directory but can be instructed to look for test files in packages recursively
|
|
|
|
found on any path using the ellipsis, like so: \texttt{go test
|
2023-05-23 18:40:39 +02:00
|
|
|
./path/to/package/\ldots}, which then \emph{runs} all the tests found and
|
|
|
|
reports some statistics, such as the time it took to run the test or whether it
|
2023-05-24 16:42:29 +02:00
|
|
|
succeeded or failed. To be precise, the test files also need to contain test
|
2023-05-23 18:40:39 +02:00
|
|
|
functions, which are functions with the signature \texttt{func TestWhatever(t
|
2023-05-24 16:42:29 +02:00
|
|
|
*testing.T)\{\}} and where the function prefix ``Test'' is equally as important
|
|
|
|
as the signature. Without it, the function is not considered to be a testing
|
|
|
|
function despite having the required signature and is therefore \emph{not}
|
|
|
|
executed during testing.
|
|
|
|
|
|
|
|
This test lookup behaviour, however, also has a neat side-effect: all the test
|
|
|
|
files can be kept side-by-side their regular source counterparts, there is no
|
|
|
|
need to segregate them into a specially blessed \texttt{tests} folder or
|
|
|
|
similar, which in author's opinion improves readability. As a failsafe, in case
|
|
|
|
no actual test are found, the current behaviour of the tool is to print a note
|
|
|
|
informing the developer that no tests were found, which is handy to learn if it
|
|
|
|
was not intended/expected. When compiling regular source code, the Go files
|
|
|
|
with \texttt{\_test} in the name are simply ignored by the build tool.
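The naming rules described above can be illustrated with a minimal sketch. The
function \texttt{Sum} and both test-like functions are hypothetical and are
collapsed into a single file only to keep the example self-contained. In a real
package the test functions would live in a file such as \texttt{sum\_test.go}
next to the code they test.

```go
package main

import (
	"fmt"
	"testing"
)

// Sum is a hypothetical function under test.
func Sum(nums ...int) int {
	total := 0
	for _, n := range nums {
		total += n
	}

	return total
}

// TestSum has both the "Test" prefix and the (t *testing.T) signature, so
// `go test` would run it once it is placed in a file ending in _test.go.
func TestSum(t *testing.T) {
	cases := []struct {
		name string
		in   []int
		want int
	}{
		{name: "empty", in: nil, want: 0},
		{name: "three values", in: []int{1, 2, 3}, want: 6},
	}

	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := Sum(tc.in...); got != tc.want {
				t.Errorf("Sum(%v) = %d, want %d", tc.in, got, tc.want)
			}
		})
	}
}

// checkSum has the required signature but lacks the "Test" prefix, so the
// tool would not treat it as a test function and would never execute it.
func checkSum(t *testing.T) {
	if Sum(2, 2) != 4 {
		t.Error("Sum(2, 2) should be 4")
	}
}

func main() {
	// Regular use of the function under test, independent of any testing.
	fmt.Println(Sum(1, 2, 3))
}
```

With the test functions moved into a \texttt{\_test.go} file, running
\texttt{go test ./...} from the module root would execute \texttt{TestSum} and
skip \texttt{checkSum}.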
\n{2}{Integration tests}
Integration with external software, namely the database in the case of this
program, is tested using the same mechanism that was mentioned in the previous
section: Go's \texttt{testing} package. These tests verify that, after a code
change, the program can still perform the same actions with the external
software that were possible before the change. They are run locally before
every commit and then again in the CI after pushing to the remote.
\n{3}{func TestUserExists(t *testing.T)}
The example integration test shown in Figure~\ref{fig:integrationtest} declares
a helper function \texttt{getCtx() context.Context}, which takes no arguments
and returns a new \texttt{context.Context} initialised with the value of the
global logger, which is how the logger gets injected into the user module
functions. The function \texttt{TestUserExists(t *testing.T)} first declares a
database connection string and attempts to open a connection to the database.
The database in use here is SQLite3 running in memory mode, meaning no file is
actually written to disk during this process. Since the testing data is not
needed after the test, this is deemed good enough. Next, a \texttt{defer}
statement calling the \texttt{Close()} method on the database object is made,
which is the idiomatic Go way of closing files and network connections (which
are also an abstraction over files on UNIX-like operating systems such as
GNU/Linux). The deferred call is executed after all of the other statements in
the surrounding function, at the moment the function returns, which makes sure
that no file descriptors (FDs) are leaked and the file is properly closed.

In the next step a database schema creation is attempted, handling the
potential error in a Go idiomatic way: the return value of the function is
assigned to a variable declared in the \texttt{if} statement itself, and the
condition checks whether \texttt{err} is \texttt{nil} or not. In case
\texttt{err} is not \texttt{nil}, i.e.\ \emph{there was an error in the callee
function}, the condition evaluates to \texttt{true} and the inner block is
entered. Inside the inner block the error is announced to the user (likely a
developer running the test in this case) and the testing object's
\texttt{FailNow()} method is called, which marks the test function as having
failed and stops its execution. In this case that is the desired outcome, since
if the database schema creation call fails there really is no point in
continuing with the testing of user creation.

Conversely, if the schema does get created without an error, the code continues
to declare a few variables: \texttt{username}, \texttt{email} and \texttt{ctx},
to which the context injected with the logger is saved. Some of them are
subsequently passed into the \texttt{UsernameExists} function, with the context
as the first argument, followed by the database pointer and the username. The
\texttt{email} variable is only used at a later stage, but was declared here to
give a sense of grouping. The error value returned from this function is again
checked and if everything goes well, the value of the \texttt{usernameFound}
boolean is checked next. Since the database has just been created, there should
be no users, which is checked in the next \texttt{if} statement. A similar
check is then performed for the earlier-declared user email, which is likewise
not expected to be found.

The final statements of the described test attempt a user creation call, which
is again checked for both error and \emph{nilability}. The test continues with
more similar checks but it has been cut short for brevity.

A neat thing about error handling in Go is that it allows for very easy
checking of all paths, not just the \emph{happy path} where there are no
issues.
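The three idioms discussed in this section (a value injected through the
context, a deferred \texttt{Close()} call and an error checked directly in the
\texttt{if} statement) can be condensed into one small sketch that also
exercises an error path. It is not the program's actual code: the
\texttt{store} type, the \texttt{ctxKey} type and all function names are
hypothetical stand-ins, and the standard library logger replaces the program's
own logging package.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"os"
)

// ctxKey is a hypothetical key type for storing the logger in a context.
type ctxKey struct{}

// store is a hypothetical stand-in for a database client.
type store struct {
	users  map[string]bool
	closed bool
}

var errClosed = errors.New("store is closed")

// ping reports whether the store can still be used.
func (s *store) ping() error {
	if s.closed {
		return errClosed
	}

	return nil
}

// Close marks the store as closed.
func (s *store) Close() error {
	s.closed = true
	return nil
}

// newCtx returns a context carrying a logger, similar in spirit to a helper
// that injects the logger into the functions under test.
func newCtx() context.Context {
	l := log.New(os.Stderr, "sketch: ", 0)
	return context.WithValue(context.Background(), ctxKey{}, l)
}

// lookupAndClose checks whether name exists and closes the store on return.
func lookupAndClose(ctx context.Context, s *store, name string) (bool, error) {
	// The deferred call runs when the function returns, on every path.
	defer s.Close()

	// The error is assigned and checked within the if statement itself.
	if err := s.ping(); err != nil {
		return false, fmt.Errorf("store unavailable: %w", err)
	}

	// The logger is retrieved from the context, if one was injected.
	if l, ok := ctx.Value(ctxKey{}).(*log.Logger); ok {
		l.Printf("looking up %q", name)
	}

	return s.users[name], nil
}

func main() {
	ctx := newCtx()
	s := &store{users: map[string]bool{"dude": true}}

	// Happy path: the store is open and the user exists.
	found, err := lookupAndClose(ctx, s, "dude")
	fmt.Println(found, err)

	// Error path: the deferred Close above has already closed the store.
	_, err = lookupAndClose(ctx, s, "dude")
	fmt.Println(errors.Is(err, errClosed))
}
```

Because the error is an ordinary return value, the second call checks the
failure path just as directly as the first call checks the successful one.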
\begin{figure}[!h]
\centering
\scriptsize
\begin{varwidth}{\linewidth}
\begin{verbatim}
// modules/user/user_test.go
package user

import (
    "context"
    "testing"

    "git.dotya.ml/mirre-mt/pcmt/ent/enttest"
    "git.dotya.ml/mirre-mt/pcmt/slogging"
    _ "github.com/xiaoqidun/entps"
)

func getCtx() context.Context {
    l := slogging.Init(false)
    ctx := context.WithValue(context.Background(), CtxKey{}, l)

    return ctx
}

func TestUserExists(t *testing.T) {
    connstr := "file:ent_tests?mode=memory&_fk=1"
    db := enttest.Open(t, "sqlite3", connstr)
    defer db.Close()

    if err := db.Schema.Create(context.Background()); err != nil {
        t.Errorf("failed to create schema resources: %v", err)
        t.FailNow()
    }

    username := "dude"
    email := "dude@b.cc"
    ctx := getCtx()

    usernameFound, err := UsernameExists(ctx, db, username)
    if err != nil {
        t.Errorf("error checking for username {%s} existence: %q",
            username,
            err,
        )
    }

    if usernameFound {
        t.Errorf("unexpected: user{%s} should not have been found",
            username,
        )
    }

    if _, err := EmailExists(ctx, db, email); err != nil {
        t.Errorf("unexpected: user email '%s' should not have been found",
            email,
        )
    }

    usr, err := CreateUser(ctx, db, email, username, "so strong")
    if err != nil {
        t.Errorf("failed to create user, error: %q", err)
        t.FailNow()
    } else if usr == nil {
        t.Error("got nil usr back")
        t.FailNow()
    }

    if usr.Username != username {
        t.Errorf("got back wrong username, want: %s, got: %s",
            username, usr.Username,
        )
    }
    // ...more checks...
}
\end{verbatim}
\end{varwidth}
\caption{Example integration test}
\label{fig:integrationtest}
\end{figure}
\n{2}{Click-ops}

% =========================================================================== %
\nn{Conclusion}

% =========================================================================== %