tex: add extensive changes
parent b8dcac8235
commit 39428c908a
BIN graphics/pcmt-use-case.pdf (binary file not shown)
@ -51,4 +51,70 @@ Blake3:\\
SHA3-256:\\
\texttt{66ebbdb20b5459360368d29615e6e80f36bcf464d5519ca08ae651f27a8970bf}\\

\priloha{Whys}\label{appendix:whys}

This appendix explains why certain technologies were used.

\n{2}{Why Go}\label{appendix:whygo}

First, the question \textit{`Why pick Go for building a web application?'}
might arise, so the following few lines try to address it.

Go~\cite{golang}, or \emph{Golang} for SEO-friendliness and to disambiguate it
from Go the ancient game, is a statically and strongly typed, high-level,
\emph{garbage-collected} language where functions are first-class citizens and
errors are values.

The appeal for the author comes from a number of the language's features, such
as built-in support for concurrency and unit testing, sane \emph{zero} values,
the lack of pointer arithmetic, inheritance and implicit type conversions,
easy-to-read syntax and producing a statically linked binary by default. On top
of that, the language has got a cute mascot. Thanks to the foresight of the Go
Authors regarding \emph{the formatting question} (i.e.\ where to put the braces,
\textbf{tabs vs.\ spaces}, etc.), most of the discussions on this topic have
been foregone. Every \emph{gopher}\footnote{euph.\ a person writing in the Go
programming language} is expected to format their source code with the official
formatter (\texttt{gofmt}), which automatically ensures that the code adheres to
the one formatting standard. Then, there is \emph{The Promise} of backwards
compatibility for Go 1.x, which makes it a good choice for long-term projects
without the fear of being rug-pulled.

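Three of the features listed above are zero values, errors as values, and functions as first-class values. The following is a minimal sketch of how each looks in practice. The \texttt{config} type and the \texttt{parsePort} and \texttt{apply} helpers are hypothetical and exist only for illustration. They are not part of the Password Compromise Monitoring Tool code base.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// config is a hypothetical settings type; a freshly declared value is usable
// immediately because every field starts at its zero value.
type config struct {
	Port    int
	Verbose bool
	Name    string
}

// parsePort reports failure through an ordinary error value, not an exception.
func parsePort(s string) (int, error) {
	p, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	if p < 1 || p > 65535 {
		return 0, errors.New("port out of range")
	}
	return p, nil
}

// apply receives a function as an argument: functions are first-class values.
func apply(f func(string) (int, error), in string) (int, error) {
	return f(in)
}

func main() {
	var c config
	fmt.Printf("%+v\n", c) // {Port:0 Verbose:false Name:}

	for _, in := range []string{"8080", "http"} {
		p, err := apply(parsePort, in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		c.Port = p
		fmt.Println("port:", c.Port) // port: 8080
	}
}
```

The caller checks errors explicitly at each step. This is more verbose than exceptions, but it keeps the failure paths visible in the code.
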
\n{2}{Why Nix/devenv}\label{appendix:whynix}

Nix (\url{https://builtwithnix.org/}) is a functional programming language
resembling Haskell and a declarative package manager. It has been used in this
project in the form of the \texttt{devenv} tool (\url{https://devenv.sh/}) to
create a \textbf{declarative} and \textbf{reproducible} development
environment. The author has previously used Nix directly with \emph{flakes} and
liked \texttt{devenv}, as it exposes only a handful of parameters for
configuration and removes the need to manage a full flake, which of course is
still an option for those who prefer it. See \texttt{devenv.nix} in the
repository root.

\priloha{Terminology}\label{appendix:terms}

\n{2}{Linux}

The term \emph{Linux} is used exclusively to refer to the Linux
kernel~\cite{linux}.

\n{2}{GNU/Linux}

As far as a Linux-based operating system is concerned, the term ``GNU/Linux''
as defined by the Free Software Foundation~\cite{fsfgnulinux} is used. While it
is longer and arguably a little cumbersome, the author shares the opinion that
this term describes its actual target more accurately. Being aware that many
people conflate the complete operating system with one of its (albeit core)
components, the kernel, the author takes care to distinguish the two.
Colloquially, however, and writing from experience, this probably brings more
confusion and usually requires a lengthy explanation.

\n{2}{The program}

By \emph{the program} or \emph{the application} without any additional context,
the author most probably means the Password Compromise Monitoring Tool program.

% =========================================================================== %
@ -179,7 +179,7 @@
@misc{age,
howpublished = {[online]},
title = {A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability.},
author = {Filippo Valsorda and Ben Cox and age contributors},
year = 2021,
note={{Available from: \url{https://github.com/FiloSottile/age}. [viewed 2023-05-23]}}
}
1676
tex/text.tex
@ -72,177 +72,130 @@ practices in an effort to build a maintainable and long-lasting piece of
software that serves its users well. When deployed, it could provide real
value.

Terminology is located in Appendix~\ref{appendix:terms}; feel free to give it a
read.

% =========================================================================== %
\part{Theoretical part}

\n{1}{Terminology}

\n{2}{Containers}

When the concept of \emph{containerisation} and \emph{containers} is mentioned
throughout this work, the author has OCI containers~\cite{ocicontainers} in
mind. These are broadly a superset of \emph{Linux Containers}, where a set of
processes is presented with a view of kernel resources that differs for each
set of processes. There are multiple kinds of such resources: IPC queues;
network devices, stacks and ports; mount points, process IDs, user and group
IDs, cgroups and others. This is similar in thought to FreeBSD
\emph{jails}~\cite{freebsdjails}, with the distinction being that containers
are, of course, facilitated by the Linux kernel namespace
functionality~\cite{linuxnamespaces}. That functionality is in turn regarded as
\emph{inspired} by Plan 9's namespaces~\cite{plan9namespaces}, Plan 9 being a
Bell Labs successor to Unix 8th Edition, discontinued in 2015.

While there undoubtedly \emph{is} specificity bound to using each of the tools
that enable creating (Podman vs.\ Buildah vs.\ Docker BuildX) or running
(containerd vs.\ runc vs.\ crun) container images, when describing an action
that gets performed with or onto a container, the process should generally be
explained in such a way that it is repeatable using any spec-conforming tool
that is available and \emph{intended for the job}.

\n{1}{Cryptography primer}\label{sec:cryptographyprimer}

\n{2}{Encryption}

Encryption is the process of transforming certain data, called a
\emph{message}, using, as Aumasson writes in \emph{Serious Cryptography}, ``an
algorithm called a \emph{cipher} and a secret value called the
key''~\cite{seriouscryptography}. Its purpose is to protect the said message so
that only its intended recipients, who know/hold the key, are able to
\emph{decipher} and read it.

\n{3}{Symmetric encryption}

Symmetric encryption is simply when the same \emph{key} is used for both the
encryption and decryption operations.

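In symmetric encryption, one key serves both directions. The following minimal sketch shows this using AES-256 in GCM mode from the Go standard library. The choice of AES-GCM is an assumption made for illustration; the text does not prescribe a particular cipher. The \texttt{seal} and \texttt{open} helpers are hypothetical names.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM wraps an AES block cipher (key must be 16, 24 or 32 bytes) in GCM.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext under key and prepends the random nonce.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext and authentication tag to the nonce.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal; it needs the very same key.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // the single shared secret key (AES-256)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	msg := []byte("attack at dawn")
	sealed, err := seal(key, msg)
	if err != nil {
		panic(err)
	}
	plain, err := open(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(plain, msg)) // true
}
```

GCM also authenticates the ciphertext. If decryption is attempted with a different key, \texttt{Open} returns an error rather than garbage.
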
\n{3}{Asymmetric encryption}

Asymmetric encryption differs from symmetric encryption in that there are now
two keys in use: a key \emph{pair}. One part is used solely for encryption,
while the other part's only purpose is to decrypt. This notion of two keys is
generally transposed to a domain called \emph{public-key cryptography}, whereby
the decryption component is declared \emph{private} and the encryption
component is called \emph{public}, hence the name. The rationale is that
everybody can encrypt messages \emph{for} the recipient, but only the recipient
is able to \emph{decrypt} them. This is made possible by the mathematical
complementarity of the two components, and it also explains why the private key
should be kept \emph{private}. Compared to symmetric encryption, this variant
is generally slower.

\n{3}{The key exchange problem}

Suppose a communication scheme that is protected by a pre-shared secret. In
order to establish secure communications, this secret first needs to be
distributed to the other party, possibly via untrusted channels. In 1976,
Whitfield Diffie and Martin Hellman published a paper in which they devised a
\emph{public-key distribution scheme}, which allows two parties to arrive at a
shared secret by exchanging information over insecure channels, even in the
presence of an eavesdropper. This scheme (or its variations) is in use to this
day.
\textbf{TODO:} add \emph{why} we care and how it's going to be used.

\n{2}{Hash functions}

Hash functions are algorithms used to help with a number of things: integrity
verification, password protection, digital signatures, public-key encryption
and others. Hashes are used in forensic analysis to prove the authenticity of
digital artifacts, to uniquely identify a change-set within revision-based
source code management systems such as Git or Mercurial, to detect
known-malicious software by anti-virus programs, by advanced filesystems to
verify block integrity and enable repairs, and in many other applications that
each person using a modern computing device has come across, such as when
connecting to a website protected by the famed HTTPS.

The popularity of hash functions stems from a common use case: the need to
identify a chunk of data reliably and cheaply. Of course, two chunks of data,
two files, frames or packets could always be compared bit by bit, but that can
get prohibitive from both a cost and an energy point of view relatively
quickly. That is where hash functions come in, since they are able to take a
long input and produce a short, fixed-length output, called a \emph{digest} or
a \emph{hash value}. The function also only works one way.

\n{3}{Rainbow tables}

A file, or any original input data for that matter, cannot be reconstructed
from the hash digest alone by somehow \emph{reversing} the hashing operation. A
hash function maps inputs of arbitrary length to a short, fixed-length digest,
so information is necessarily discarded along the way. At the heart of common
hash function designs, there is essentially a compression function.

In more responsible scenarios, passwords are not stored directly but as hashes.
Short of finding a critical vulnerability in the hash function, attackers
interested in recovering the passwords are therefore left with guessing: hashing
candidate passwords and comparing the results against the leaked digests. To
avoid repeating that work, pre-computed tables are used. Rainbow tables are, in
simplified terms, tables of pre-computed hashes paired with the passwords that
were used to create them. When attackers gain access to a password breach that
contains hashes, all it takes is to find a match within the rainbow table and
resolve it back to the known message: the password.

Most alluringly, hashes are frequently used with the intent of
\emph{protecting} passwords by making them unreadable, while still being able
to verify that the user knows the password and therefore should be authorised.

One of the popular counter-measures against pre-computed tables is adding a
\emph{salt} to the user-provided password before passing it to the KDF (Key
Derivation Function) or the hash function. Of course, the salt should be random
\textbf{per user} and not reused. Otherwise, two users with the same password
would still end up with the same hash. The salt should also be adequately long
to be effective. As the salt is supposed to be \emph{random}, it is a good idea
to use an actual CSPRNG (cryptographically secure pseudo-random number
generator), such as \textbf{Fortuna}~\cite{fortuna}, as the source of
randomness. In FreeBSD, Fortuna is in fact the one serving
\texttt{/dev/random}.

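The following minimal sketch illustrates per-user salting. It draws salts from the operating system's CSPRNG via Go's \texttt{crypto/rand} and prepends them to the password before hashing. It only demonstrates the effect of the salt: plain SHA-256 is too fast to be a suitable password hash, and a dedicated password hash function should be used for real storage. The \texttt{newSalt} and \texttt{saltedDigest} helpers are hypothetical.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newSalt draws n bytes from the operating system's CSPRNG.
func newSalt(n int) ([]byte, error) {
	salt := make([]byte, n)
	if _, err := rand.Read(salt); err != nil {
		return nil, err
	}
	return salt, nil
}

// saltedDigest hashes salt||password. For illustration only: plain SHA-256 is
// far too fast for storing real passwords.
func saltedDigest(salt []byte, password string) string {
	h := sha256.New()
	h.Write(salt)
	h.Write([]byte(password))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	const password = "hunter2" // the same weak password chosen by two users
	saltA, err := newSalt(16)
	if err != nil {
		panic(err)
	}
	saltB, err := newSalt(16)
	if err != nil {
		panic(err)
	}
	// The stored records differ even though the passwords are identical.
	fmt.Println(saltedDigest(saltA, password))
	fmt.Println(saltedDigest(saltB, password))
}
```

The salt is stored next to the digest, so it is not a secret. Its job is to make each pre-computed table useful for at most one user.
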
As the hashing operation is irreversible, once the one-way function produces a
short digest, there is no way to reconstruct the original message from it.
That is, unless the input of the hash function is also known (or can be
guessed), in which case all it takes is hashing the supposed input and
comparing the digest with existing digests that are known to be digests of
passwords.

\textbf{TODO:} add more on \emph{why} we care and what types of hashes should
be used (with refs) and why.

\n{3}{Types and use cases}

Hash functions can be loosely categorised based on their intended use case into
\emph{password protection hashes}, \emph{integrity verification hashes},
\emph{message authentication codes} and \emph{cryptographic hashes}. Each of
these possesses unique characteristics, and using the wrong type of hash
function for the job can potentially result in a security breach.

As an example, consider \texttt{MD5}, a popular hash function built on the
\emph{Merkle-Damgård} construction, and \texttt{BLAKE3}, which internally uses a
\emph{Merkle tree}. While the former produces 128-bit digests, the latter by
default outputs a 256-bit digest but can produce output of practically
arbitrary length, as it is an extendable-output function.

There is a list of differences that could further be mentioned; however, both
have one thing in common: they are \emph{designed} to be \emph{fast}. The
latter, as a cryptographic hash function, is conjectured to be \emph{random
oracle indifferentiable} and is secure against length extension. Yet, according
to its authors' benchmarks, it is also faster than all of \texttt{MD5},
\texttt{SHA3-256}, \texttt{SHA-1} and even the \texttt{BLAKE2} family of
functions.

The use case of both is to (quickly) verify the integrity of a given chunk of
data (in the case of \texttt{BLAKE3}, with pre-image and collision resistance
in mind). It is not to secure a password by hashing it first, which poses a big
issue when they are used to\ldots{} secure passwords by hashing them first.

Password hash functions such as \texttt{argon2} or \texttt{bcrypt} are good
choices for securely storing hashed passwords. They place a CPU and memory
burden on the host computing the digest and limit potential parallelism, thus
reducing the scale at which an exhaustive search could be launched.
Additionally, both are designed to be used with a per-password \emph{salt}
(\texttt{bcrypt} implementations typically generate and embed it
automatically). This means that two identical passwords of two different users
will not end up hashing to the same digest value, making it that much harder to
recover the original, supposedly weak password.

\n{3}{Why are hashes interesting}

As already mentioned, hashes are often used to store a representation of the
password instead of the password itself, so they become a subject of interest
when they get leaked. There have been enough instances of leaked raw passwords
that anyone with enough interest can put together a neat list of hashes of the
most popular passwords.

So while a service that stores hashes instead of plain text passwords is doing
the right thing, using a hash function not designed to protect passwords offers
little additional protection in the case of weak passwords, which are the most
commonly used ones.

It seems logical that a service that does not use cryptographic primitives
correctly is more likely to get hacked and have its users' passwords/hashes
leaked. The Internet then ends up serving as a storage of every data dump, often
exposing these passwords/hashes for everyone to access.

\n{2}{TLS}\label{sec:tls}

The Transport Layer Security protocol (or TLS) serves as an encryption and
\emph{authentication} protocol to secure internet communications. An important
part of the protocol is the \emph{handshake}. During the handshake, the two
communicating parties exchange messages that acknowledge each other's presence
and verify identities (in practice, usually only the server is authenticated).
They also choose which cryptographic algorithms will be used and decide session
keys. Multiple versions of the protocol are in active duty even at the moment,
so the server and the client need to agree upon the version they are going to
use (these days it is recommended to use either 1.2 or 1.3) and pick cipher
suites. The client verifies the server's public key (and the signature of the
certificate authority that issued it), and they both generate session keys for
use after handshake completion.

The handshake consists of multiple stages (again, depending on the version);
for TLSv1.3 these would be:

\begin{itemize}
    \item \textbf{Client hello}: the client sends a client hello message
        containing the protocol version, a list of cipher suites and the client
        random value. In this step, the client also includes the ephemeral
        Diffie-Hellman (EDH) parameters, which are later used for calculating
        the pre-master secret.
    \item \textbf{Server generating a master secret}: the server has got the
        cipher suites, the client's parameters and the client random, and
        already has the server random, which means it can create the master
        secret.
    \item \textbf{Server hello and ``Finished''}: the server includes in the
        hello its certificate, digital signature, server random and the chosen
        cipher suite, and sends a ``Finished'' (meaning \emph{ready}) message.
    \item \textbf{Signature and certificate verification}: the client at this
        step verifies the server's certificate and signature, generates the
        master secret and is ready (sends the ``Finished'' message).
\end{itemize}

At the end of the process, the connection is protected by symmetric encryption
using the session key that both parties have arrived at.

TLSv1.3 dramatically reduced the number of available cipher suites to only
include the ones deemed secure enough. As a result, it is no longer necessary
to manually specify which cipher suite should be used (or to rely on the
client/server to choose wisely). Enabling TLSv1.3 may cause compatibility
issues with legacy devices, but the simplicity it brings is a worthy
trade-off.

\n{1}{Passwords}\label{sec:passwords}
@ -381,151 +334,7 @@ internet that is discussed in the next sections and covers what browsers are,
what they do and how they relate to web security.

\n{2}{Browsers}\label{sec:browsers}

Browsers, sometimes used together with a word that can serve as a real tell for
their specialisation (\emph{web} browsers), are programs intended for
\emph{browsing} \emph{the web}. In more technical terms, browsers are programs
that facilitate (directly or via intermediary tools) a sequence of tasks:
domain name lookups, connecting to web servers, optionally establishing a
secure connection, and requesting the web page in question. They then determine
its \emph{security policy} and resolve which accompanying resources the web page
specifies. Depending on the applicable security policy, they request those
resources from their respective origins, apply stylesheets and run scripts.
Constructing a program that can speak many protocols and securely run untrusted
code from the internet is no easy task.

\n{3}{Complexity}

Browsers these days are also quite ubiquitous programs, running on
\emph{billions} of consumer-grade mobile devices (which are also notorious for
bad update hygiene) and desktop devices all over the world. Regular users
usually expect them to work flawlessly under a multitude of network conditions
and network scenarios (the proverbial café WiFi, cellular data in a remote
location, home broadband that is DNS-poisoned by the ISP). They are also
expected to cope with differently tuned (or commonly misconfigured) web
servers, a combination of modern and \emph{legacy} encryption schemes, and
different levels of conformance to web standards from both web server and
website developers. Of course, if a website is broken, it is the browser's
fault. Browsers are also expected to detect whether \emph{captive portals} (a
type of access control that usually tries to force the user through a web page
with terms of use) are active and offer redirects. All of this amounts to
immense complexity. In the author's opinion, this complexity, combined with the
ubiquity and great exposure of this type of software, is the cause behind the
staggering number of vulnerabilities found, reported and fixed in browsers
every year.

\n{3}{Standardisation}

Over the years, a consortium of parties has evolved a great volume of web
standards. It includes parties interested in promoting and developing the web
(also due to its potential as a digital marketplace, i.e.\ financial
incentives) and browser vendors, of which the most neutral participant is
perhaps \emph{Mozilla}, with Chrome being run by Google, Edge by Microsoft and
Safari/WebKit by Apple. These standards are also relatively frequently updated,
or deprecated and replaced by revised or new ones, turning browser maintenance
into essentially a cat-and-mouse game.

It is the web's extensibility that enabled this build-up, and ironically, some
have proclaimed it to be the web's greatest asset. It has also been ostensibly
criticised~\cite{ddvweb} in the past. The frustration with the status quo of
web standards has relatively recently prompted a group of people to even create
``\textit{a new application-level internet protocol for the distribution of
arbitrary files, with some special consideration for serving a lightweight
hypertext format which facilitates linking between files}'':
Gemini~\cite{gemini}\cite{geminispec}. In the words of its authors, it can be
thought of as ``\textit{the web, stripped right back to its essence}'' or as
``\textit{Gopher, souped up and modernised just a little}'', depending upon the
reader's perspective, noting that the latter view is probably more accurate.

\n{3}{HTTP}

Originally, HTTP was designed just for fetching hypertext \emph{resources}, but
it has evolved since then, particularly due to its extensibility. It now allows
fetching all sorts of web resources a modern website provides, such as scripts
or images, and even \emph{posting} content back to servers.

HTTP (up to and including HTTP/2) relies on TCP (Transmission Control Protocol)
to deliver the data it requests or sends. TCP is one of the \emph{reliable} (as
mandated by HTTP) protocols used to send data across contemporary IP (Internet
Protocol) networks. Tim Berners-Lee invented the World Wide Web (WWW) in 1989
while working at CERN (The European Organization for Nuclear Research), with
the rather noble intent of a ``\emph{wide-area hypermedia information retrieval
initiative to give universal access to a large universe of
documents}''~\cite{wwwf}. He also invented the HyperText Markup Language (HTML)
to serve as a formatting method for these new hypermedia documents. The first
website was written roughly the same way as today's websites are, using HTML,
although the markup language has changed since, with the current version being
the HTML Living Standard, commonly known as HTML5.

It has been mentioned that the client \textbf{requests} a \textbf{resource} and
receives a \textbf{response}, so those terms should probably be defined.

A request is what the client sends to the server. A resource is what it
requests, and a response is the answer provided by the server.

HTTP follows a classic client-server model, whereby it is \textbf{always} the
client that initiates the request.

A web page is, to be blunt, a chunk of \emph{hypertext}. To display a web page,
a browser first needs to send a request to fetch the HTML representing the
page. The HTML is then parsed, and additional requests for sub-resources are
made. If a page defines layout information in the form of CSS, that is parsed
as well.

A web page needs to be present on the local computer first \emph{before} it can
be parsed by the browser. As in the \emph{early days}, websites are usually
still served by programs called \emph{web servers}. That presents the problem
of how to tell the browser where the resource should be fetched from. In
today's browsers, the issue is sorted (short of the CLI) by the \emph{address
bar}, a place into which the user types what they wish the browser to fetch for
them.

The formal name of this segment is a \emph{Uniform Resource Locator}, or URL.
It contains the scheme (or the protocol, such as \texttt{http://}), the host
address or a domain name, an optional (TCP) port number and the path of the
requested resource.

A TCP connection needs to be established first. To connect to a server whose
URL contains a domain name rather than an IP address, the browser therefore
needs to perform a domain name \emph{lookup} using system facilities. A couple
of notorious Chromium versions also sent some additional and unrelated queries,
which (together with the numbers of Chromium-based derivatives) ended up
placing unnecessary load directly on the root DNS
servers~\cite{chromiumrootdns}.

If a raw IP address and port combination is used, the browser attempts to
connect to it directly and requests the page the user asked for, by default
using the \texttt{GET} \emph{method}. The \emph{well-known} HTTP port 80 (443
for HTTPS) is assumed unless another port is explicitly specified. The port can
be omitted regardless of whether the host is a domain name or an IP address.

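The URL components and the default-port rule described above can be made concrete with Go's \texttt{net/url} package. The following sketch extracts what is needed to open a TCP connection. The \texttt{target} helper and the \texttt{defaultPorts} map are hypothetical illustrations; browsers implement this logic internally.

```go
package main

import (
	"fmt"
	"net/url"
)

// defaultPorts holds the well-known ports assumed when a URL omits the port.
var defaultPorts = map[string]string{"http": "80", "https": "443"}

// target splits a URL into the pieces needed to open a TCP connection.
func target(raw string) (scheme, host, port string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	port = u.Port()
	if port == "" {
		port = defaultPorts[u.Scheme]
	}
	return u.Scheme, u.Hostname(), port, nil
}

func main() {
	for _, raw := range []string{
		"http://example.com/index.html", // domain name, implicit port
		"http://192.0.2.10:8080/",       // IP address, explicit port
		"https://example.com",           // HTTPS, implicit port
	} {
		s, h, p, err := target(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s, h, p)
	}
	// Output:
	// http example.com 80
	// http 192.0.2.10 8080
	// https example.com 443
}
```

For a domain name, a DNS lookup (for instance via \texttt{net.LookupHost}) would still be needed before connecting.
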
The method is a way for the user agent to define what operation it wants to
perform. \texttt{GET} is used for fetching resources, while \texttt{POST} is
used to send data to the server, such as to post the values of an HTML form.

A server response consists of a \textbf{status code}, a status message, HTTP
\textbf{headers} and an optional \textbf{body} containing the content. The
status code indicates whether the original request was successful or not, and
the browser is generally there to interpret these status codes to the user.
There are enough status codes to be confused by their sheer number, but
luckily, there is a method to the madness, and they can be divided into
groups/classes:

\begin{itemize}
    \item 1xx: Informational responses
    \item 2xx: Successful responses
    \item 3xx: Redirection responses
    \item 4xx: Client error responses
    \item 5xx: Server error responses
\end{itemize}

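Because the class is encoded in the first digit, a client can classify any status code with a single integer division, even codes it does not specifically know. The following sketch shows this in Go. The \texttt{statusClass} helper is hypothetical, while \texttt{http.StatusText} is part of the standard \texttt{net/http} package.

```go
package main

import (
	"fmt"
	"net/http"
)

// statusClass maps a status code to the class named by its first digit.
func statusClass(code int) string {
	switch code / 100 {
	case 1:
		return "Informational"
	case 2:
		return "Successful"
	case 3:
		return "Redirection"
	case 4:
		return "Client error"
	case 5:
		return "Server error"
	default:
		return "Unknown"
	}
}

func main() {
	codes := []int{http.StatusOK, http.StatusMovedPermanently,
		http.StatusNotFound, http.StatusBadGateway}
	for _, code := range codes {
		fmt.Printf("%d %s: %s\n", code, http.StatusText(code), statusClass(code))
	}
	// Output:
	// 200 OK: Successful
	// 301 Moved Permanently: Redirection
	// 404 Not Found: Client error
	// 502 Bad Gateway: Server error
}
```
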
In case the \emph{user agent} (a web \emph{client}) such as a browser receives
a response with content, it has to parse it.

A header is additional information sent by both the server and the client that
can guide or alter the behaviour of software reading it. For instance, a
\texttt{Cache-Control} header with a duration value can be used by the server
to signify that the client can store certain resources for some time before
needing to re-fetch them, i.e.\ as long as they are not \emph{expired}.

\n{2}{Site Isolation}

Modern browsers such as Firefox or Chromium are designed with security in mind.
Their developers are acutely aware of the dangers that parsing untrusted code
@ -558,6 +367,7 @@ access to session tokens and any cookies associated with the website's origin,
apart from being able to rewrite the HTML content. The results of XSS can
range from account compromise to identity theft.

\n{2}{Content Security Policy}\label{sec:csp}

Content Security Policy (CSP) has been an important addition to the arsenal of
@ -600,15 +410,380 @@ in production. There are many more directives and settings than mentioned in
this section; the author encourages anybody interested to give it a read, e.g.\
at \url{https://web.dev/csp/}.

\n{2}{Summary}
\textbf{TODO}: add more concrete examples.

Passwords are in use everywhere and probably will be for the foreseeable
future. As long as passwords are going to be handled and stored by
service/application providers, they are going to get leaked, be it due to
provider carelessness or the attackers' resolve and wit. Of course, sifting
through all the available password breach data by hand is not a reasonable
option, and therefore tools should come in to provide assistance. The next part
of the thesis will explore that and offer a solution.


\n{1}{Configuration}

Every non-trivial program usually offers at least \emph{some} way to
tweak/manage its behaviour, and these changes are usually persisted
\emph{somewhere} on the filesystem of the host: in a local SQLite3 database, a
\emph{LocalStorage} key-value store in the browser, or a binary or plain-text
configuration file. These configuration files need to be read and checked at
least on program start-up, and their contents are then either kept in operating
memory for the whole runtime of the program, or loaded and parsed once with the
memory subsequently \emph{freed} (initial configuration).
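
To make the start-up step more concrete, the following sketch reads a
plain-text configuration once, validates it and keeps the result in memory.
JSON from the standard library is used purely to keep the example
self-contained; the \texttt{config} type and its fields are hypothetical and
do not reflect the program's actual (Dhall-based) schema.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Sketch: loading and validating a configuration at start-up}]
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// config is a hypothetical, deliberately tiny configuration schema.
type config struct {
	Port     int    `json:"port"`
	LogLevel string `json:"logLevel"`
}

// loadConfig decodes the configuration, rejects unknown keys and performs
// basic validation so that a broken file fails loudly at start-up.
func loadConfig(raw string) (*config, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()

	var c config
	if err := dec.Decode(&c); err != nil {
		return nil, fmt.Errorf("decoding config: %w", err)
	}
	if c.Port < 1 || c.Port > 65535 {
		return nil, errors.New("port must be between 1 and 65535")
	}
	if c.LogLevel == "" {
		c.LogLevel = "info" // fall back to a sane default
	}
	return &c, nil
}

func main() {
	c, err := loadConfig(`{"port": 3000}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *c) // {Port:3000 LogLevel:info}
}
\end{lstlisting}
\vspace*{-\baselineskip}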

There is an abundance of configuration languages available (or file formats
used to craft configuration files, whether they were intended for it or not);
TOML, INI, JSON and YAML are some of the popular ones as of today.

Dhall stood out as a language that was designed with both security and the
needs of dynamic configuration scenarios in mind, borrowing a concept or two
from Nix~\cite{nixoslearn}~\cite{nixlang} (which in turn sources more than a
few of its concepts from Haskell), while at its core being very similar to
JSON, which adds to its familiar feel. In fact, in the words of Dhall's
authors, it is ``a programmable configuration language that you can think of
as: JSON + functions + types + imports''~\cite{dhalllang}.

Among all of the listed features, the one that especially intrigued the author
was the promise of \emph{types}. There are multiple examples directly on the
project's documentation webpage demonstrating, for instance, the declaration
and usage of custom types (which are, of course, merely combinations of the
primitive types that the language provides, such as \emph{Bool},
\emph{Natural} or \emph{List}, to name just a few), so it was not exceedingly
hard to start designing a custom configuration \emph{schema} for the program.
Dhall not being a Turing-complete language also guarantees that evaluation
\emph{always} terminates eventually, which is a good attribute for a
configuration language to possess.

\n{3}{Safety considerations}

When a programmable configuration language understands functions and allows
importing not only arbitrary text from random internet URLs, but also
importing and \emph{evaluating} (i.e.\ running) potentially untrusted code, it
is important that there are safety mechanisms in place that the user can rely
on. Dhall offers these in the form of multiple features: enforcing a
same-origin policy and (optionally) pinning a cryptographic hash of the value
of the expression being imported.
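
The idea behind pinning can be illustrated with a few lines of Go. This is a
simplified analogue only: Dhall computes the SHA-256 hash over the binary
encoding of the \emph{normalised} imported expression (so that purely cosmetic
edits of the source do not invalidate the pin), whereas the sketch below hashes
the raw bytes as fetched; \texttt{verifyPinned} is a hypothetical helper.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Sketch: verifying imported content against a pinned SHA-256 digest}]
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// verifyPinned reports an error unless content hashes to the pinned
// hex-encoded SHA-256 digest. Unlike Dhall, which hashes the binary encoding
// of the normalised expression, this hashes the raw bytes.
func verifyPinned(content []byte, pinnedHex string) error {
	sum := sha256.Sum256(content)
	if got := hex.EncodeToString(sum[:]); got != pinnedHex {
		return fmt.Errorf("integrity check failed: got sha256:%s, want sha256:%s", got, pinnedHex)
	}
	return nil
}

func main() {
	imported := []byte("{ port = 3000 }")
	sum := sha256.Sum256(imported)
	pin := hex.EncodeToString(sum[:]) // normally recorded once and committed

	fmt.Println(verifyPinned(imported, pin))                         // <nil>
	fmt.Println(verifyPinned([]byte("{ port = 3001 }"), pin) != nil) // true
}
\end{lstlisting}
\vspace*{-\baselineskip}

In Dhall itself, the pin is written directly after the import, e.g.\
\texttt{https://example.com/schema.dhall sha256:\ldots}, and a mismatch makes
the import fail.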

\n{3}{Possible alternatives}

While developing the program, the author has also come across certain
shortcomings of Dhall, namely a long start-up with a \emph{cold cache}, which
can generally be observed when running the program in an environment that does
not allow writing the cache files (a read-only filesystem) or does not keep the
written cache files, such as a container that is not configured to mount a
persistent volume at the pertinent location.

When performing an evaluation, Dhall resolves every expression down to a
combination of its most basic types (eliminating all abstraction and
indirection) in a process called \textbf{normalisation}~\cite{dhallnorm}, and
then saves the result in the host's cache. The \texttt{dhall-haskell} binary
attempts to resolve the variable \texttt{\$\{XDG\_CACHE\_HOME\}} (have a look
at the \emph{XDG Base Directory Spec}~\cite{xdgbasedirspec} for details) to
decide \emph{where} the results of the normalisation will be written for
repeated use. Do note that this behaviour has been observed on a GNU/Linux host
and the author has not verified it on a non-GNU/Linux host, such as FreeBSD.
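
The lookup described above can be approximated as follows. This is a sketch
based on the XDG Base Directory Specification (an unset, empty or relative
\texttt{XDG\_CACHE\_HOME} falls back to \texttt{\$HOME/.cache}), not a
transcription of \texttt{dhall-haskell}'s code; the \texttt{dhall}
subdirectory name is taken from the cache paths mentioned in this section.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Sketch: resolving the Dhall cache directory per the XDG Base Directory Specification}]
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cacheHome resolves the base cache directory following the XDG Base
// Directory Specification: use $XDG_CACHE_HOME if it is set to an absolute
// path (relative paths must be ignored), otherwise fall back to $HOME/.cache.
func cacheHome() (string, error) {
	if dir := os.Getenv("XDG_CACHE_HOME"); dir != "" && filepath.IsAbs(dir) {
		return dir, nil
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(home, ".cache"), nil
}

// dhallCacheDir returns the directory where normalised Dhall expressions
// would be cached under the assumptions stated above.
func dhallCacheDir() (string, error) {
	base, err := cacheHome()
	if err != nil {
		return "", err
	}
	return filepath.Join(base, "dhall"), nil
}

func main() {
	dir, err := dhallCacheDir()
	if err != nil {
		fmt.Println("cannot determine cache directory:", err)
		return
	}
	fmt.Println(dir)
}
\end{lstlisting}
\vspace*{-\baselineskip}

In a container, this is the directory that has to be backed by a persistent
volume (or pre-populated at build time) for the cache to survive restarts.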

If normalisation is performed inside an ephemeral container (as opposed to, for
instance, an interactive desktop session), the results effectively get lost on
each container restart. That is both wasteful and bad for the user experience,
since the normalisation of just a handful of imports (which internally branch
out widely) can take upwards of two minutes, during which the user is left
waiting for the hanging application with no report on the progress or current
status.

While workarounds for the above-mentioned problem can be devised relatively
easily (such as bind mounting persistent volumes inside the container in place
of \texttt{\$\{XDG\_CACHE\_HOME\}/dhall} and
\texttt{\$\{XDG\_CACHE\_HOME\}/dhall-haskell} to preserve the cache between
restarts, or letting the cache be pre-computed during the container build,
since the application is only really expected to run together with a
compatible version of the configuration schema and this version \emph{is}
known at container build time), it would certainly feel better if there were
no need to work \emph{around} the configuration system of choice.

Alternatives such as CUE (\url{https://cuelang.org/}) offer themselves nicely
as a potentially almost drop-in replacement for Dhall feature-wise, while also
avoiding the costly \emph{cold cache} normalisation, which is, in the author's
view, Dhall's principal issue.

\n{1}{Compromise Monitoring}

There are, of course, several ways one could approach monitoring of
compromised credentials, some more \emph{manual} in nature than others. When
using a service that is suspected/expected to be breached in the future, one
can always create a unique username/password combination specifically for the
subject service and never use that combination anywhere else. That way, if the
credentials ever \emph{do} happen to appear in a data dump online in the
future, it is a safe assumption as to where they came from.
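
Generating such a per-service credential requires little more than a
cryptographically secure random number generator; a sketch using Go's
\texttt{crypto/rand} follows. The alphabet, the length of 24 characters and the
function name are arbitrary, hypothetical choices.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Sketch: generating a unique per-service password}]
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// alphabet is a hypothetical character set; 66 symbols in total.
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!"

// uniquePassword returns a random password of length n, drawing each
// character uniformly from alphabet using a CSPRNG (rand.Int avoids the
// modulo bias of reducing random bytes with %).
func uniquePassword(n int) (string, error) {
	limit := big.NewInt(int64(len(alphabet)))
	out := make([]byte, n)
	for i := range out {
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		out[i] = alphabet[idx.Int64()]
	}
	return string(out), nil
}

func main() {
	for _, service := range []string{"forum.example", "shop.example"} {
		pw, err := uniquePassword(24)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %s\n", service, pw)
	}
}
\end{lstlisting}
\vspace*{-\baselineskip}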

Unfortunately, the task of actually \emph{monitoring} the credentials can prove
to be a little more arduous than one could expect at first. There are a couple
of points that can pose a challenge in case the search is performed by hand,
namely:

\begin{itemize}
\item finding the breached data to look through
\item verifying the trustworthiness of the data
\item varying quality of the data
\item sifting through (possibly) unstructured data by hand
\end{itemize}

Of course, as this is a popular topic for a number of people, the
above-mentioned work has already been packaged into neat and practical online
offerings. In case one decides in favour of using those, an additional range of
issues arises (the previous ones still being applicable):

\begin{itemize}
\item the need to trust the provider with input credentials
\item relying on the goodwill of the provider to be able to access the data
\item hoping that the terms of service are kept
\end{itemize}

Besides that, there is a plethora of breaches floating around the Internet
available simply as zip files, which makes the job even harder.

The overarching goal of this thesis is devising and implementing a system in
which the user can \emph{monitor} whether their credentials have been
\emph{compromised} (at least as far as the data can tell), and allowing them to
do so without needing to entrust their sensitive data to a provider.

\n{2}{Data Sources}\label{sec:dataSources}

A data source in this context is considered to be anything that provides the
application with data that it understands.

Of course, the results of credential compromise verification/monitoring are
only going to be as good as the data underpinning them, which is why it is
imperative that high-quality data sources be used, if at all possible. While
great care does have to be taken to only choose the highest-quality data
sources, the application must also offer a means to utilise them.

The sources from which breached data can be loaded into an application can be
split into two basic categories, \textbf{online} and \textbf{local}, and it is
possible to further discern between \emph{structured} and \emph{unstructured}
data.
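
One way to express this split in Go is a small interface that both kinds of
sources implement, so that the rest of the application does not need to care
where the data comes from. The sketch below is hypothetical: the
\texttt{Source} interface, the \texttt{Breach} type and the in-memory
\texttt{localSource} do not mirror the program's actual types.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Sketch: a common interface for online and local data sources}]
package main

import (
	"context"
	"fmt"
	"strings"
)

// Breach is a hypothetical, minimal description of a breach hit.
type Breach struct {
	Name string
}

// Source is implemented by every data source, be it an online API client or
// a locally imported dataset.
type Source interface {
	Name() string
	SearchEmail(ctx context.Context, email string) ([]Breach, error)
}

// localSource answers queries from data that was imported beforehand and is
// already in the structured shape the application expects.
type localSource struct {
	byEmail map[string][]Breach // keys are lower-cased email addresses
}

func (l *localSource) Name() string { return "local" }

func (l *localSource) SearchEmail(_ context.Context, email string) ([]Breach, error) {
	return l.byEmail[strings.ToLower(email)], nil
}

// searchAll queries every configured source and collects the results.
func searchAll(ctx context.Context, sources []Source, email string) (map[string][]Breach, error) {
	results := make(map[string][]Breach, len(sources))
	for _, s := range sources {
		hits, err := s.SearchEmail(ctx, email)
		if err != nil {
			return nil, fmt.Errorf("source %s: %w", s.Name(), err)
		}
		results[s.Name()] = hits
	}
	return results, nil
}

func main() {
	local := &localSource{byEmail: map[string][]Breach{
		"user@example.com": {{Name: "Horrible breach"}},
	}}
	res, err := searchAll(context.Background(), []Source{local}, "User@Example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(res["local"]) // [{Horrible breach}]
}
\end{lstlisting}
\vspace*{-\baselineskip}

An online source, such as a client for the service chosen later in this
chapter, would implement the same interface by translating
\texttt{SearchEmail} into an API call.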

An online source is generally a service that ideally exposes a programmatic
API, which an application can query and from which it can request the necessary
subsets of data. These types of services often additionally front the data with
a user-friendly web interface for one-off searches, which is, however, of no
use here.

Examples of online services include:

\begin{itemize}
\item {Have I Been Pwned?} - \url{https://haveibeenpwned.com}
\item {DeHashed} - \url{https://dehashed.com}
\end{itemize}

Large lumps of unstructured data available on forums or shady web servers would
technically also count here, given that they provide data and are available
online. However, even though data is frequently found online precisely in this
form, it is not of direct use for the application without manual
\emph{preprocessing}, which is addressed in
Section~\ref{sec:localDatasetPlugin}.

Another source is then simply any locally supplied data, which, of course,
could have been obtained from a breach available online beforehand.

Locally supplied data is specific in that it needs to be formatted in such a
way that it can be understood by the application. That is, the data is not in
its raw form anymore but has been morphed into the precise shape the
application needs for further processing. Once imported, the application can
query the data at will, as it knows exactly the shape of it.

This presupposes the existence of an import \emph{format}, the schema of which
is devised in Section~\ref{sec:localDatasetPlugin}.

\n{3}{Local Dataset Plugin}\label{sec:localDatasetPlugin}

Unstructured breach data from locally available datasets can be imported into
the application by first making sure it adheres to the specified schema (have a
look at the \emph{Breach Data Schema} in Listing~\ref{breachDataGoSchema}). If
it does not (which is very likely with random breach data, as already mentioned
in Section~\ref{sec:dataSources}), it needs to be converted to a form that
\emph{does} before importing it to the application, e.g.\ using a Python script
or a similar method.

Attempting to import data that does not follow the outlined schema should
result in an error. Equally so, importing a dataset which is over a reasonable
size limit should by default be rejected by the program as a precaution.
Unmarshaling, for instance, a 1 TiB document would most likely result in an
out-of-memory (OOM) situation on the host running the application, assuming
contemporary consumer hardware (not HPC).
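
A size cap of this kind can be enforced before any parsing takes place, e.g.\
by wrapping the input in an \texttt{io.LimitReader} that is allowed to read one
byte past the limit. The 64~MiB default and the helper name below are
arbitrary, hypothetical choices.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Sketch: rejecting oversized datasets before unmarshaling}]
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// maxImportSize is a hypothetical default cap for a single imported dataset.
const maxImportSize = 64 << 20 // 64 MiB

var errTooLarge = errors.New("dataset exceeds the configured size limit")

// readLimited reads at most limit bytes from r. It asks for one extra byte so
// that an input of exactly limit bytes is accepted while anything larger is
// rejected, without ever buffering more than limit+1 bytes.
func readLimited(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errTooLarge
	}
	return data, nil
}

func main() {
	data, err := readLimited(strings.NewReader("name: Horrible breach\n"), maxImportSize)
	fmt.Println(len(data), err) // 22 <nil>
}
\end{lstlisting}
\vspace*{-\baselineskip}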

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Breach Data Schema represented as a Go
struct with imports from the standard library assumed},
label=breachDataGoSchema]
type breachDataSchema struct {
	Name              string    `yaml:"name"`
	Time              time.Time `yaml:"time"`
	IsVerified        bool      `yaml:"isVerified"`
	ContainsPasswords bool      `yaml:"containsPasswds"`
	ContainsHashes    bool      `yaml:"containsHashes"`
	HashType          string    `yaml:"hashType"`
	HashSalted        bool      `yaml:"hashSalted"`
	HashPeppered      bool      `yaml:"hashPeppered"`
	ContainsUsernames bool      `yaml:"containsUsernames"`
	ContainsEmails    bool      `yaml:"containsEmails"`
	Data              any       `yaml:"data"`
}
\end{lstlisting}
\vspace*{-\baselineskip}

In practice, the Go representation shown in Listing~\ref{breachDataGoSchema}
corresponds to a YAML document written and supplied by an administrative user
of the program. The YAML format was chosen for several reasons:

\begin{itemize}
\item relative ease of use (plain text, readability)
\item capability to store multiple \emph{documents} inside of a single file
\item most of the inputs being implicitly typed as strings
\item support for inclusion of comments
\item machine readability thanks to being a superset of JSON
\end{itemize}

The last point in particular should allow documents similar to the one in
Listing~\ref{breachDataYAMLSchema} to be ingested by the program, and to be
read and written by humans and programs alike.

\smallskip
\begin{lstlisting}[language=YAML, caption={Example Breach Data Schema supplied
to the program as a YAML file, optionally containing multiple documents},
label=breachDataYAMLSchema]
---
name: Horrible breach
time: 2022-04-23T00:00:00+02:00
isVerified: false
containsPasswds: false
containsHashes: true
containsEmails: true
hashType: md5
hashSalted: false
hashPeppered: false
data:
  hashes:
    - hash1
    - hash2
    - hash3
  emails:
    - email1
    -
    - email3
---
# document #2, describing another breach.
name: Horrible breach 2
...
\end{lstlisting}
\vspace*{-\baselineskip}

Notice how the emails list in Listing~\ref{breachDataYAMLSchema} is missing one
record, perhaps because it was not supplied or was mistakenly omitted. This is
a valid scenario (mistakes happen) and the application needs to be able to
handle it. The alternative would be to require the user to prepare the data in
such a way that the empty/partial records would be dropped entirely.
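
When such a document is decoded into the generic \texttt{Data any} field, the
empty list item typically arrives as a \texttt{nil} element, which is how
common YAML libraries represent a null value. Assuming the list was decoded
into \texttt{[]any}, normalising it could look as follows;
\texttt{stringEntries} is a hypothetical helper.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Sketch: tolerating empty records in a decoded list}]
package main

import (
	"fmt"
	"strings"
)

// stringEntries keeps only the non-empty string values of a decoded YAML
// sequence and reports how many entries were skipped, so that partial records
// can be surfaced to the user instead of failing the whole import.
func stringEntries(raw []any) (kept []string, skipped int) {
	for _, v := range raw {
		s, ok := v.(string)
		if !ok || strings.TrimSpace(s) == "" {
			skipped++
			continue
		}
		kept = append(kept, s)
	}
	return kept, skipped
}

func main() {
	// The "emails" list from the example document, as decoded into []any.
	emails := []any{"email1", nil, "email3"}
	kept, skipped := stringEntries(emails)
	fmt.Println(kept, skipped) // [email1 email3] 1
}
\end{lstlisting}
\vspace*{-\baselineskip}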

\n{3}{Have I Been Pwned? Integration}

Troy Hunt's \textbf{Have I Been Pwned?} online service
(\url{https://haveibeenpwned.com/}) has been chosen as the online source of
compromised data. The service offers private APIs that are protected by API
keys. The application's \texttt{hibp} module and its database representation
model the values returned by this API, which allows searching large breaches
using email addresses.\\
The architecture there is relatively simple: the application administrator
configures an API key for HIBP, the user enters the query parameters, and the
application constructs a query, calls the API and waits for a response. As the
API is rate-limited based on the key supplied, this can pose an issue, which
has not been fully resolved in the UI. The application then parses the
returned data and binds it to the local model for validation. If that
succeeds, the data is saved into the database as a cache and the search query
is performed on the saved data. If it returns anything, it is displayed to the
user for browsing.
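
A minimal sketch of how such a query could be constructed in Go follows. It
targets version 3 of the HIBP API: the \texttt{breachedaccount} endpoint,
authenticated via the \texttt{hibp-api-key} header and requiring a
\texttt{User-Agent}. A 404 response means the account was not found in any
breach, and a 429 response signals that the key's rate limit was hit, with
\texttt{Retry-After} giving the wait in seconds. The function names and the
user agent string are hypothetical; the sketch only builds the request and
interprets the status code, and is not the program's actual \texttt{hibp}
module.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Sketch: building an HIBP v3 request and interpreting the response status}]
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"time"
)

const hibpBase = "https://haveibeenpwned.com/api/v3"

// newBreachedAccountRequest builds a request for all breaches an account
// (email address) appears in, asking for full breach models.
func newBreachedAccountRequest(apiKey, email string) (*http.Request, error) {
	u := hibpBase + "/breachedaccount/" + url.PathEscape(email) + "?truncateResponse=false"
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("hibp-api-key", apiKey)
	req.Header.Set("User-Agent", "pcmt-example") // hypothetical; HIBP requires one
	return req, nil
}

// interpretStatus maps the HIBP response status to an outcome. A missing or
// malformed Retry-After header yields a zero wait.
func interpretStatus(resp *http.Response) (found bool, retryAfter time.Duration, err error) {
	switch resp.StatusCode {
	case http.StatusOK:
		return true, 0, nil
	case http.StatusNotFound:
		return false, 0, nil
	case http.StatusTooManyRequests:
		secs, _ := strconv.Atoi(resp.Header.Get("Retry-After"))
		return false, time.Duration(secs) * time.Second, fmt.Errorf("rate limited, retry after %ds", secs)
	default:
		return false, 0, fmt.Errorf("unexpected status: %s", resp.Status)
	}
}

func main() {
	req, err := newBreachedAccountRequest("YOUR-API-KEY", "user@example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String())
}
\end{lstlisting}
\vspace*{-\baselineskip}

Surfacing the \texttt{retryAfter} value to the user would be one way of
addressing the rate-limiting issue mentioned above in the UI.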

\n{1}{Deployment recommendations}\label{sec:deploymentRecommendations}

It is, of course, recommended that the application runs in a secure
environment, \allowbreak although definitions of that almost certainly differ
depending on whom you ask. General recommendations would be either to
effectively reserve a machine for a single use case (running this program), so
as to dramatically decrease the potential attack surface of the host, or to run
the program isolated in a container or a virtual machine. Further, if the host
does not need management access (it is a deployed-to-only machine that is
configured out-of-band, such as with a \emph{golden} image/container or
declaratively with Nix), then an SSH \emph{daemon} should not be running on
it, since it is not needed. In an ideal scenario, the host machine would have
as little software installed as possible besides what the application
absolutely requires.

System-wide cryptographic policies should target the highest feasible security
level, if available at all (such as by default on Fedora or RHEL), covering the
SSH, DNSSEC, IPsec, Kerberos and TLS protocols. Firewalls should be configured
and SELinux (a kernel-level mandatory access control and security policy
mechanism) should be running in \emph{enforcing} mode, if available.

\n{2}{Transport security}

Users connecting to the application should rightfully expect their data to be
protected \textit{in transit} (i.e.\ on the way between their browser and the
server), which is what the \emph{Transport Layer Security} family of
protocols~\cite{tls13rfc8446} was designed for, and which is the underpinning
of HTTPS. TLS utilises the primitives of asymmetric cryptography to let the
client authenticate the server (verify that it is who it claims to be) and to
negotiate a symmetric key for encryption in a process named the \emph{TLS
handshake} (see Section~\ref{sec:tls} for more details), the final purpose of
which is establishing a secure communication channel. The operator should
configure the program either to utilise TLS directly or to listen behind a
TLS-terminating \emph{reverse proxy}.
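
For the first option, a Go server can be restricted to modern protocol
versions through \texttt{crypto/tls}. The sketch below requires TLS~1.3 and
additionally sets a header read timeout, a commonly recommended hardening
measure; the listen address and the certificate/key paths are hypothetical
placeholders rather than the program's configuration keys.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Sketch: an HTTPS server requiring TLS 1.3}]
package main

import (
	"crypto/tls"
	"log"
	"net/http"
	"time"
)

// newTLSServer returns an HTTPS server that refuses anything older than TLS 1.3.
func newTLSServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:    addr,
		Handler: h,
		TLSConfig: &tls.Config{
			MinVersion: tls.VersionTLS13,
		},
		// Limit how long a client may take to send request headers.
		ReadHeaderTimeout: 10 * time.Second,
	}
}

func main() {
	srv := newTLSServer(":8443", http.NotFoundHandler())
	// Hypothetical certificate and key paths, e.g. mounted as container secrets.
	if err := srv.ListenAndServeTLS("/etc/pcmt/tls/cert.pem", "/etc/pcmt/tls/key.pem"); err != nil {
		log.Println("server stopped:", err)
	}
}
\end{lstlisting}
\vspace*{-\baselineskip}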

\n{2}{Containerisation}

Whether the pre-built or a custom container image is used to deploy the
application, it still needs access to secrets, such as the database connection
string (containing the database host, port, user, password/encrypted password,
authentication method and database name).
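
As an illustration of how those pieces could be assembled without hard-coding
them into the image, the sketch below reads them from environment variables
(the variable names are hypothetical) and builds a PostgreSQL connection URI
with \texttt{net/url}, which also takes care of escaping special characters in
the password. \texttt{sslmode=verify-full} is a standard libpq parameter that
requires TLS and verifies the server certificate and host name, and
\texttt{Redacted} masks the password so the URI can be logged safely.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Sketch: assembling a PostgreSQL connection URI from secrets}]
package main

import (
	"fmt"
	"net"
	"net/url"
	"os"
)

// postgresURL assembles a libpq-style connection URI that requires TLS with
// full certificate and host name verification.
func postgresURL(host, port, user, password, dbname string) *url.URL {
	return &url.URL{
		Scheme:   "postgres",
		User:     url.UserPassword(user, password), // escaped when serialised
		Host:     net.JoinHostPort(host, port),     // also handles IPv6 literals
		Path:     "/" + dbname,
		RawQuery: url.Values{"sslmode": {"verify-full"}}.Encode(),
	}
}

func main() {
	// Hypothetical variable names, e.g. populated from container secrets.
	u := postgresURL(
		os.Getenv("PCMT_DB_HOST"),
		os.Getenv("PCMT_DB_PORT"),
		os.Getenv("PCMT_DB_USER"),
		os.Getenv("PCMT_DB_PASSWORD"),
		os.Getenv("PCMT_DB_NAME"),
	)
	// Never log the real password; Redacted replaces it with "xxxxx".
	fmt.Println(u.Redacted())
}
\end{lstlisting}
\vspace*{-\baselineskip}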

The application should be able to handle the most common Postgres
authentication methods~\cite{pgauthmethods}, namely \emph{peer},
\emph{scram-sha-256}, \emph{user name maps} and raw \emph{password}, although
the \emph{password} option should not be used in production \emph{unless} the
connection to the database is protected by TLS.\ In any case, using the
\emph{scram-sha-256}~\cite{scramsha256rfc7677} method is preferable. One of the
ways to verify in a development environment that everything works as intended
is the \emph{Password generator for PostgreSQL} tool~\cite{goscramsha256},
which allows retrieving the encrypted string from a raw user input.

If the application running in a container is to use the \emph{peer}
authentication method, it is up to the operator to supply the Postgres socket
to the application (e.g.\ as a volume bind mount). This scenario was not
tested, however, and the author is also not entirely certain how \emph{user
namespaces} (on GNU/Linux) would influence the process (as in when the
\emph{ID}s of a user \textbf{outside} the container are mapped to a range of
\emph{UIDs} \textbf{inside} the container), which the setup would likely need
to account for.

Equally, if the application is running inside a container, the operator needs
to make sure that the database is either running in a network that is also
directly attached to the container or that there is a mechanism in place that
routes the requests for the database hostname to the destination.

One such mechanism is container name based routing inside \emph{pods}
(Podman/Kubernetes), where the resolution of container names is the
responsibility of a specially configured (often auto-configured) piece of
software called Aardvark for the former and CoreDNS for the latter.

\n{1}{Summary}

Passwords (and/or passphrases) are in use everywhere and quite probably will be
for the foreseeable future, if not as \textit{the} principal way to
authenticate, then at least as \textit{a} way to authenticate. As long as
passwords are going to be handled and stored by service/application providers,
they are going to get leaked, be it due to provider carelessness or the
attackers' resolve and wit. Of course, sifting through all the available
password breach data by hand is not a reasonable option, and therefore tools
providing assistance come in handy. The next part of this diploma thesis will
explore that issue and introduce a solution.

% =========================================================================== %

\n{1}{Kudos}

The program developed as part of this thesis made use of a great deal of free
(as in \textit{freedom}) and open-source software in the process, either
directly or as an outstanding work tool, and the author would like to take this
opportunity to recognise that fact\footnotemark.

In particular, the author acknowledges that this work would not be the same
without:

All of the code written has been typed into VIM (\texttt{9.0}), the shell used
to run the commands was ZSH, both running in the author's terminal emulator of
choice, \texttt{kitty}. The development machines ran a recent installation of
\textit{Arch Linux (by the way)} and Fedora 38, both using a \texttt{6.3.x}
XanMod variant of the Linux kernel.

\footnotetext{\textbf{Disclaimer:} the author is not affiliated in any way with
any of the projects described on this page.}

\n{1}{Development}

There is one caveat to this, though: git first needs some additional
configuration for the code in Listing~\ref{gitverif} to work as one would
expect. Namely, the public key used to verify the signature needs to be stored
in git's ``allowed signers file'', then git needs to be told where that file is
located using the configuration value \texttt{gpg.ssh.allowedsignersfile}, and
finally the configuration value of the \texttt{gpg.format} field needs to be
set to \texttt{ssh}.

Because git allows the configuration values to be local to each repository,
both of the mentioned issues can be solved by running the following commands

% # set the signature format for the local repository.
% git config --local gpg.format ssh
% # save the public keys.
% cat > ./.tmp-allowed_signers \
    <<<'surtur <insert literal surtur pubkey>
leo <insert literal leo pubkey>'
% # set the allowed signers file path for the local repository.
% git config --local gpg.ssh.allowedsignersfile ./.tmp-allowed_signers
\end{lstlisting}
\vspace*{-\baselineskip}

The fourth pipeline focuses on linting the Containerfile and building the
container, although the latter action is only performed on feature branches,
\emph{pull requests} or \emph{tag} events.

\obr{Drone CI median build
time}{fig:drone-median-build}{.84}{graphics/drone-median-build}

The median build time as of writing was 1 minute, which includes running all
four pipelines, and that is acceptable. Build times might, of course, vary
depending on the hardware; for reference, these builds were run on a machine
equipped with a Zen 3 Ryzen 5 5600 CPU at nominal clock speeds, DDR4 3200MHz
RAM, a couple of PCIe Gen 4 NVMe drives in a mirrored setup (using ZFS) and a
400Mbps downlink, and, software-wise, running Arch with an author-flavoured
XanMod kernel, version 6.3.x.

\n{2}{Source code repositories}\label{sec:repos}

\n{2}{Toolchain}

Throughout the creation of this work, the \emph{then-current} version of the Go
programming language was used, i.e.\ \texttt{go1.20}.

To read more on why Go was chosen, see Appendix~\ref{appendix:whygo}.
Nix/\texttt{devenv} tools have also aided heavily during development; see
Appendix~\ref{appendix:whynix} to learn more.

\tab{Tool/Library-Usage Matrix}{tab:toolchain}{1.0}{ll}{
\textbf{Name} & \textbf{Usage} \\
Go programming language & program core \\
Dhall configuration language & program configuration \\
Echo & HTTP handlers, controllers, web server \\
ent & ORM using graph-based modelling \\
bluemonday & sanitising HTML \\
TailwindCSS & stylesheets using a utility-first approach \\
PostgreSQL & persistently storing data \\
}

Table~\ref{tab:depsversionmx} contains the names and versions of the most
important libraries and supporting software that were used to build the
application.

\tab{Dependency-Version Matrix}{tab:depsversionmx}{1.0}{ll}{
\textbf{Name} & \textbf{Version} \\
\texttt{echo} (\url{https://echo.labstack.com/}) & 4.10.2 \\
\texttt{PostgreSQL} (\url{https://www.postgresql.org/}) & 15.2 \\
}

\n{2}{A word about Nix/devenv}

Nix (\url{https://builtwithnix.org/}) is a declarative package manager and a
functional programming language resembling Haskell, which has been used in this
project in the form of the \texttt{devenv} tool (\url{https://devenv.sh/}) to
create a \textbf{declarative} and \textbf{reproducible} development
environment. The author has previously used Nix directly with \emph{flakes} and
liked \texttt{devenv}, as it effectively exposes only a handful of parameters
for configuration and removes the need to manage the full flake, which is, of
course, still an option for those who choose so. See \texttt{devenv.nix} in the
repository root.

\n{1}{Application architecture}

\n{2}{Package structure}

The source code of the main module is organised into smaller, self-contained Go
\emph{packages} along a couple of domains: logging, core application, web
routers, configuration and settings, etc. In Go, packages are delimited by
folder structure: each folder can be a package.

Generally speaking, the program aggregates decision points into central places,
such as \texttt{run.go}, which then imports child packages that facilitate each
of the tasks of loading the configuration, connecting to the database and
running migrations, consolidating flag, environment variable and
configuration-based values into canonical \emph{settings}, setting up routes
and handling graceful shutdown.
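
The consolidation step can be pictured as a precedence rule. The sketch below
assumes, hypothetically, that an explicitly set flag wins over an environment
variable, which in turn wins over the configuration file and finally a built-in
default; the names \texttt{resolve}, \texttt{PCMT\_PORT} and \texttt{-port} are
illustrative, and the program's actual order may differ.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Sketch: consolidating flag, environment and configuration values into one setting}]
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolve returns the first non-empty candidate, so the order of arguments
// encodes precedence: flag > environment > configuration file > default.
func resolve(candidates ...string) string {
	for _, c := range candidates {
		if c != "" {
			return c
		}
	}
	return ""
}

func main() {
	fs := flag.NewFlagSet("pcmt", flag.ContinueOnError)
	portFlag := fs.String("port", "", "port to listen on (hypothetical flag)")
	if err := fs.Parse(os.Args[1:]); err != nil {
		fmt.Println("invalid flags:", err)
		return
	}

	fromConfig := "3000" // hypothetically read from the configuration file
	port := resolve(*portFlag, os.Getenv("PCMT_PORT"), fromConfig, "8080")
	fmt.Println("listening port:", port)
}
\end{lstlisting}
\vspace*{-\baselineskip}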

\n{3}{Internal package}

The \texttt{internal} package was not used at the time of writing, but the
author plans to eventually migrate the \emph{internal} logic of the program into
it. The Go toolchain only permits importing packages below an \texttt{internal}
directory from code rooted at that directory's parent, which prevents
accidental imports from elsewhere.

\n{2}{Logging}

The program uses dependency injection to share a single logger instance, and
the same applies to the database client. These are passed around as pointers,
so all consumers share the same underlying data. As a rule of thumb, every
larger \texttt{struct} that needs to be passed around is passed as a pointer.
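
The following Go sketch illustrates the dependency injection pattern described
above using the standard library's \texttt{log} package. A single
\texttt{*log.Logger} is created once and handed to every consumer. The type and
constructor names (\texttt{App}, \texttt{UserHandler}, \texttt{BreachHandler})
are hypothetical, and the database client is left out for brevity.

\begin{lstlisting}[language=Go, caption={Sketch: sharing one logger through dependency injection}]
package main

import (
	"bytes"
	"fmt"
	"log"
)

// App bundles the shared dependencies. In the real program the database
// client would sit next to the logger; it is omitted here.
type App struct {
	Logger *log.Logger
	out    *bytes.Buffer // where the sketch's logger writes, for inspection
}

func NewApp() *App {
	buf := &bytes.Buffer{}
	return &App{Logger: log.New(buf, "pcmt: ", 0), out: buf}
}

// UserHandler and BreachHandler are hypothetical consumers; each receives the
// same *log.Logger instead of creating its own.
type UserHandler struct{ log *log.Logger }
type BreachHandler struct{ log *log.Logger }

func NewUserHandler(app *App) *UserHandler     { return &UserHandler{log: app.Logger} }
func NewBreachHandler(app *App) *BreachHandler { return &BreachHandler{log: app.Logger} }

func main() {
	app := NewApp()
	NewUserHandler(app).log.Print("user handler ready")
	NewBreachHandler(app).log.Print("breach handler ready")

	// Both lines end up in the same buffer, since only one logger exists.
	fmt.Print(app.out.String())
}
\end{lstlisting}

Because every handler holds the same pointer, changing the logger's output or
prefix in one place affects the whole application.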
\n{2}{Authentication}

The authentication logic is relatively simple and the author attempted to
isolate it into a custom \emph{middleware}. User passwords are hashed using a
secure KDF before being sent to the database. The KDF of choice is
\texttt{bcrypt} (with a sane \emph{Cost} of 10), which automatically includes a
\emph{salt} for the password and provides ``length-constant'' time hash
comparisons. The author plans to add support for the more modern
\texttt{scrypt} and the state-of-the-art P-H-C (Password Hashing Competition)
winner algorithm \texttt{Argon2}
(\url{https://github.com/P-H-C/phc-winner-argon2}) for flexibility.
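
The real application is built on the Echo framework. The following sketch uses
only the standard library's \texttt{net/http} to illustrate what an
authentication middleware does in principle. The cookie name, the
\texttt{/signin} route and the \texttt{validSession} lookup are hypothetical
stand-ins for a session store. The password check at sign-in is omitted. In Go
it would typically rely on \texttt{bcrypt.CompareHashAndPassword} from
\texttt{golang.org/x/crypto/bcrypt}, whose \texttt{DefaultCost} is 10.

\begin{lstlisting}[language=Go, caption={Sketch: an authentication middleware with net/http}]
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// sessionCookie is a hypothetical cookie name.
const sessionCookie = "session"

// validSession stands in for a lookup in the session store.
func validSession(token string) bool { return token == "valid-token" }

// requireAuth wraps a handler and lets the request through only when the
// session cookie refers to a valid session; otherwise it redirects to /signin.
func requireAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		c, err := r.Cookie(sessionCookie)
		if err != nil || !validSession(c.Value) {
			http.Redirect(w, r, "/signin", http.StatusSeeOther)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func dashboard(w http.ResponseWriter, r *http.Request) { fmt.Fprint(w, "dashboard") }

// serve runs one request through the protected route, optionally with a cookie.
func serve(cookie string) *httptest.ResponseRecorder {
	h := requireAuth(http.HandlerFunc(dashboard))
	req := httptest.NewRequest(http.MethodGet, "/dashboard", nil)
	if cookie != "" {
		req.AddCookie(&http.Cookie{Name: sessionCookie, Value: cookie})
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec
}

func main() {
	fmt.Println(serve("").Code, serve("valid-token").Code) // 303 200
}
\end{lstlisting}

Wrapping handlers this way keeps the access decision in one place, so that a
protected route cannot accidentally skip it.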

\n{2}{SQLi prevention}

No raw SQL queries are used to access the database directly, which decreases
the likelihood of SQL injection attacks. Instead, parametrised queries are
constructed in code using the graph-like API of the \texttt{ent} library, which
is discussed in depth in Section~\ref{sec:dbschema}.

\n{2}{Configurability}

Virtually every important value in the program has been made into a
configuration value, so that the operator can customise the experience as
needed. An effort was made to choose sane configuration defaults, which
resulted in the configuration file essentially only needing to contain secrets,
unless there is a need to override the defaults. It is not an entirely
\emph{zero-config} situation, but rather a \emph{minimal-config} one. An example
can be seen in Section~\ref{sec:configuration}.
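
To illustrate the \emph{minimal-config} idea, the following Go sketch merges a
user-supplied configuration onto built-in defaults, so that supplying only a
secret yields a complete configuration. The \texttt{Config} fields and default
values are hypothetical. In the program itself, the defaults live in the Dhall
schema's \emph{default} block (Listing~\ref{dhallschemadefaults}) instead.

\begin{lstlisting}[language=Go, caption={Sketch: overlaying user settings on defaults}]
package main

import "fmt"

// Config is a hypothetical, heavily trimmed stand-in for the settings.
type Config struct {
	Host      string
	Port      int
	DBConnstr string // a secret, so it has no default
	LiveMode  bool
}

// defaults holds the sane defaults; secrets are deliberately left empty.
func defaults() Config {
	return Config{Host: "localhost", Port: 3000}
}

// overlay copies every non-zero field of user onto base. A zero value is
// treated as "not set", so the operator only supplies secrets and overrides.
func overlay(base, user Config) Config {
	if user.Host != "" {
		base.Host = user.Host
	}
	if user.Port != 0 {
		base.Port = user.Port
	}
	if user.DBConnstr != "" {
		base.DBConnstr = user.DBConnstr
	}
	if user.LiveMode {
		base.LiveMode = true
	}
	return base
}

func main() {
	// A minimal configuration: just the database secret.
	cfg := overlay(defaults(), Config{DBConnstr: "postgres://pcmt:secret@db/pcmt"})
	fmt.Printf("%+v\n", cfg)
}
\end{lstlisting}

Treating zero values as ``not set'' has a limitation: a default of
\texttt{true} or of a non-zero number could never be overridden to
\texttt{false} or zero. Handling that case would require pointer fields or an
explicit ``set'' flag.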

\n{2}{Embedded assets}

Embedded assets and templates also deserve a mention. Go has multiple
mechanisms for natively embedding arbitrary files directly into the binary
during the regular build process. The built-in \texttt{embed} package was used
to bundle all template files and web assets, such as images, logos and
stylesheets, at the package level, and these are then passed around the
application as needed.

There is also a toggle in the application configuration, which can instruct the
program at start-up to either rely entirely on embedded assets or pull live
files from the filesystem. The former option makes the application more
portable, while the latter allows for flexibility, not only during development.
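
Both \texttt{embed.FS} and the value returned by \texttt{os.DirFS} satisfy the
\texttt{fs.FS} interface, so a toggle like the one described above can be
implemented as a simple branch. The following sketch uses hypothetical names
(\texttt{liveMode}, the \texttt{templates/} directory). An in-memory
\texttt{fstest.MapFS} stands in for a real \texttt{//go:embed} variable so that
the sketch runs without any files on disk.

\begin{lstlisting}[language=Go, caption={Sketch: choosing between embedded and live assets}]
package main

import (
	"fmt"
	"io/fs"
	"os"
	"testing/fstest"
)

// assets selects the source of templates and static files. In the program,
// embedded would be an embed.FS filled by a //go:embed directive; any fs.FS
// works here, which is what makes the toggle cheap to implement.
func assets(liveMode bool, embedded fs.FS, dir string) fs.FS {
	if liveMode {
		// Read straight from disk, so edited files show up without a rebuild.
		return os.DirFS(dir)
	}
	return embedded
}

// readAsset returns the contents of one file from the chosen file system.
func readAsset(fsys fs.FS, name string) (string, error) {
	b, err := fs.ReadFile(fsys, name)
	return string(b), err
}

// demoEmbedded stands in for the embed.FS so that the sketch runs on its own.
func demoEmbedded() fs.FS {
	return fstest.MapFS{
		"templates/footer.tmpl": {Data: []byte("<footer>pcmt</footer>")},
	}
}

func main() {
	s, err := readAsset(assets(false, demoEmbedded(), "."), "templates/footer.tmpl")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
\end{lstlisting}

Code further down only ever sees an \texttt{fs.FS}, so template parsing and
static file serving do not need to know which source is active.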

\n{2}{Composability}

Templates used for rendering the web pages were created in a composable
manner, split into smaller, reusable parts, such as \texttt{footer.tmpl} and
@ -924,6 +1102,9 @@ performed ergonomically and directly using Echo's built-in facilities. A
popular HTML sanitiser \emph{bluemonday} has been employed to help combat
XSS.
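
Composable templates can be sketched with the standard library's
\texttt{html/template}: partials are declared with \texttt{define} and pulled
into pages with \texttt{template}. The partial names and data fields below are
hypothetical. As a side effect, \texttt{html/template} escapes values according
to their HTML context, which complements the sanitiser mentioned above.

\begin{lstlisting}[language=Go, caption={Sketch: composing templates from partials}]
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// Two hypothetical partials: a shared footer and a page that includes it.
const footerTmpl = `{{define "footer"}}<footer>{{.Version}}</footer>{{end}}`
const pageTmpl = `{{define "page"}}<main>{{.Query}}</main>{{template "footer" .}}{{end}}`

// parseAll parses all partials into one template set, so that any page can
// pull in shared parts by name.
func parseAll() *template.Template {
	t := template.New("root")
	template.Must(t.Parse(footerTmpl))
	template.Must(t.Parse(pageTmpl))
	return t
}

// render executes the "page" template; html/template escapes the data
// according to its HTML context.
func render(data any) (string, error) {
	var sb strings.Builder
	err := parseAll().ExecuteTemplate(&sb, "page", data)
	return sb.String(), err
}

func main() {
	out, err := render(struct{ Query, Version string }{"<b>x</b>", "v0"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
\end{lstlisting}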

\n{2}{Server-side rendering}

The application constructs the web pages entirely server-side and runs without
a single line of JavaScript, of which the author is especially proud. This
improves load times, decreases the attack surface, increases maintainability and
@ -933,12 +1114,8 @@ updates (where \texttt{PUT}s should be used) and the accompanying frequent
full-page refreshes, but that still is not enough to warrant the use of
JavaScript.
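
Without JavaScript, HTML forms can only submit \texttt{GET} or \texttt{POST}
requests, which is why updates end up as \texttt{POST}s followed by full-page
refreshes. A common way to structure this is the Post/Redirect/Get pattern,
sketched below with \texttt{net/http}. The route, the form field and the global
\texttt{email} variable are hypothetical stand-ins for real handlers and
storage.

\begin{lstlisting}[language=Go, caption={Sketch: Post/Redirect/Get without JavaScript}]
package main

import (
	"fmt"
	"html"
	"net/http"
	"net/http/httptest"
	"strings"
)

// email is hypothetical server-side state changed through a plain HTML form.
var email = "old@example.com"

// settings renders the form on GET and accepts it on POST. HTML forms can only
// submit GET or POST, so the update arrives as a POST instead of a PUT.
func settings(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodGet:
		fmt.Fprintf(w, `<form method="post"><input name="email" value="%s"><button>Save</button></form>`,
			html.EscapeString(email))
	case http.MethodPost:
		email = r.PostFormValue("email")
		// 303 See Other makes the browser follow up with a GET request.
		http.Redirect(w, r, "/settings", http.StatusSeeOther)
	default:
		w.Header().Set("Allow", "GET, POST")
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// do sends one request to the handler; a helper for trying the sketch out.
func do(method, form string) *httptest.ResponseRecorder {
	req := httptest.NewRequest(method, "/settings", strings.NewReader(form))
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	rec := httptest.NewRecorder()
	settings(rec, req)
	return rec
}

func main() {
	rec := do(http.MethodPost, "email=new%40example.com")
	fmt.Println(rec.Code, rec.Header().Get("Location"), email)
}
\end{lstlisting}

Since the browser follows the redirect with a \texttt{GET}, reloading the
resulting page does not resubmit the form.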
\n{2}{Frontend}

Frontend-side, the application was styled using TailwindCSS, which promotes the
use of flexible \emph{utility-first} classes in the markup (HTML) instead of
@ -950,61 +1127,112 @@ need to be parsed by Tailwind in order to construct its final stylesheet and
there is also an original CLI tool for that called \texttt{tailwindcss}.
Overall, simple and accessible layouts were preferred over convoluted ones.

\n{3}{Frontend experiments}

As an aside, the author has briefly experimented with WebAssembly for this
project, but has ultimately scrapped the functionality in favour of the
entirely server-side rendered one. It is possible that it would get revisited
if client-side dynamic functionality became necessary and performance
mattered. Even the short experiments suggested that WebAssembly was noticeably
faster than JavaScript.

\newpage
\n{2}{User isolation}

Users are allowed into certain parts of the application based on the role they
currently possess. For the moment, two basic roles were envisioned, while this
list might get amended in the future, should the need arise:

\begin{itemize}
	\item Administrator
	\item User
\end{itemize}

\obr{Application use case diagram}{fig:usecasediagram}{.9}{graphics/pcmt-use-case.pdf}

It is paramount that the program also protects itself from insider threats, and
therefore each role is only able to perform actions that are explicitly
assigned to it. While there certainly is some overlap between the capabilities
of the two outlined roles, each also possesses unique features that the other
one does not.

For example, the administrator role is not able to perform searches on the
breach data directly using their administrator account; for that, a separate
user account has to be created. Similarly, the regular user is not able to
manage breach lists and other users, because that is a privileged operation.

In-application administrators are not able to view any sensitive user data and
should therefore only be able to perform the following actions:

\begin{itemize}
	\item Create user accounts
	\item View list of users
	\item View user email
	\item Change user email
	\item Toggle whether user is an administrator
	\item Delete user accounts
\end{itemize}

Let us consider the case of users managing their own accounts: while demoting
oneself from administrator to a regular user is permitted, promoting oneself to
an administrator would constitute a \emph{privilege escalation} and likely be a
precursor to at least a \emph{denial of service} of sorts.
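
The role rules above can be expressed as an explicit allow-list, where anything
not listed is denied. The following Go sketch uses hypothetical role, action and
account types. It encodes that administrators cannot search breach data, and
that demoting oneself is allowed while promoting oneself is refused.

\begin{lstlisting}[language=Go, caption={Sketch: an allow-list of role permissions}]
package main

import (
	"errors"
	"fmt"
)

type Role int

const (
	RoleUser Role = iota
	RoleAdmin
)

type Action string

// Hypothetical action names loosely following the lists above.
const (
	SearchBreaches    Action = "search-breaches"
	ManageUsers       Action = "manage-users"
	ManageBreachLists Action = "manage-breach-lists"
)

// permissions lists what each role is explicitly allowed to do; anything not
// listed is denied. Note that admins cannot search breach data.
var permissions = map[Role]map[Action]bool{
	RoleUser:  {SearchBreaches: true},
	RoleAdmin: {ManageUsers: true, ManageBreachLists: true},
}

func allowed(r Role, a Action) bool { return permissions[r][a] }

type Account struct {
	ID    int
	Admin bool
}

func roleOf(a *Account) Role {
	if a.Admin {
		return RoleAdmin
	}
	return RoleUser
}

var (
	ErrForbidden  = errors.New("forbidden")
	ErrEscalation = errors.New("privilege escalation attempt")
)

// setAdmin changes target's admin flag on behalf of actor. Demoting oneself
// is allowed; promoting oneself without already being an admin is refused.
func setAdmin(actor, target *Account, admin bool) error {
	self := actor.ID == target.ID
	switch {
	case self && !admin:
		// Self-demotion needs no further checks.
	case self && admin && !actor.Admin:
		return ErrEscalation
	case !allowed(roleOf(actor), ManageUsers):
		return ErrForbidden
	}
	target.Admin = admin
	return nil
}

func main() {
	user := &Account{ID: 2}
	fmt.Println(setAdmin(user, user, true)) // privilege escalation attempt
}
\end{lstlisting}

Keeping the permissions in a single table makes the ``explicitly assigned'' rule
auditable in one place.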

\n{2}{Zero trust principle}

\textit{Confidentiality, i.e.\ not trusting the provider}

There is no way for the application (and consequently, the in-application
administrator) to read the user's data. This is made possible by encrypting the
pertinent data with a state-of-the-art \emph{age} key~\cite{age} (backed by
X25519~\cite{x25519rfc7748}) before saving it in the database. The key is in
turn safely stored encrypted by a passphrase that only the user controls. Of
course, the user-supplied password is first run through a password-based key
derivation function (PBKDF: a key derivation function with a sliding
computational cost) before it is used to encrypt the \emph{age} key.

The \emph{age} key is only generated when the user changes their password for
the first time. This prevents scenarios such as an in-application administrator
with access to the physical database being able to both \textbf{recover} the
key from the database and \textbf{decrypt} it, given that they already know the
user's password (because they set it). That would subsequently give them
unbounded access to any future encrypted data for as long as they could
maintain their database access. This is why the \emph{age} key generation and
protection are bound to the first password change. Of course, an evil
administrator could just perform the change themselves; however, the user would
at least be able to find those changes in the activity logs and know not to use
the application. In the scenario of a total database compromise, the author
considers all hope already lost. At least when the database is dumped, it only
contains non-sensitive, functional information in plain text; everything else
should be encrypted.

Consequently, neither the application operators nor the in-application
administrators should ever be able to learn the details of what the user is
tracking, and the same applies even to potential attackers with direct access
to the database. Thus the author maintains that every scenario that could
potentially lead to a data breach (apart from a compromised user machine and
the like) would have to entail some form of operating memory acquisition, for
instance using \texttt{LiME}~\cite{lime}, or perhaps directly via the
\emph{hypervisor}, if considering a virtualised (``cloud'') environment.
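
The wrapping scheme described above can be sketched with the Go standard
library. An X25519 key is generated with \texttt{crypto/ecdh} and encrypted
under a key derived from the user's password, so that only the wrapped form
would ever be stored. This is a simplified stand-in and not the program's code:
\emph{age} has its own file format and uses scrypt for passphrases, whereas the
sketch swaps in PBKDF2-HMAC-SHA256 (implemented inline) with AES-256-GCM. The
iteration count and all names are hypothetical.

\begin{lstlisting}[language=Go, caption={Sketch: wrapping an X25519 key under a password-derived key}]
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/ecdh"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// iterations is a hypothetical work factor; it makes password guessing costly.
const iterations = 100_000

// pbkdf2 implements PBKDF2-HMAC-SHA256 (RFC 8018) with the standard library.
func pbkdf2(password, salt []byte, iter, keyLen int) []byte {
	prf := hmac.New(sha256.New, password)
	var out []byte
	for block := uint32(1); len(out) < keyLen; block++ {
		var ctr [4]byte
		binary.BigEndian.PutUint32(ctr[:], block)
		prf.Reset()
		prf.Write(salt)
		prf.Write(ctr[:])
		u := prf.Sum(nil)
		t := append([]byte(nil), u...)
		for n := 1; n < iter; n++ {
			prf.Reset()
			prf.Write(u)
			u = prf.Sum(u[:0])
			for i := range t {
				t[i] ^= u[i]
			}
		}
		out = append(out, t...)
	}
	return out[:keyLen]
}

// WrappedKey is what would be stored in the database, never the raw key.
type WrappedKey struct{ Salt, Nonce, Ciphertext []byte }

func aeadFor(password string, salt []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(pbkdf2([]byte(password), salt, iterations, 32))
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// NewWrappedKey generates an X25519 key and encrypts it under the password.
func NewWrappedKey(password string) (*WrappedKey, *ecdh.PrivateKey, error) {
	priv, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		return nil, nil, err
	}
	aead, err := aeadFor(password, salt)
	if err != nil {
		return nil, nil, err
	}
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, nil, err
	}
	ct := aead.Seal(nil, nonce, priv.Bytes(), nil)
	return &WrappedKey{Salt: salt, Nonce: nonce, Ciphertext: ct}, priv, nil
}

// Unwrap recovers the private key; a wrong password fails GCM authentication.
func (w *WrappedKey) Unwrap(password string) (*ecdh.PrivateKey, error) {
	aead, err := aeadFor(password, w.Salt)
	if err != nil {
		return nil, err
	}
	raw, err := aead.Open(nil, w.Nonce, w.Ciphertext, nil)
	if err != nil {
		return nil, errors.New("wrong password or corrupted key")
	}
	return ecdh.X25519().NewPrivateKey(raw)
}

func main() {
	w, priv, err := NewWrappedKey("correct horse battery staple")
	if err != nil {
		panic(err)
	}
	got, err := w.Unwrap("correct horse battery staple")
	fmt.Println(err == nil && got.Equal(priv))
}
\end{lstlisting}

With a wrong password, GCM authentication fails, so the sketch cannot silently
return a bogus key. As described above, such a function would only be called
on the user's first password change.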

\n{1}{Implementation}

\n{2}{Configuration}

Every non-trivial program usually offers at least \emph{some} way to
tweak/manage its behaviour, and these changes are usually persisted
\emph{somewhere} on the filesystem of the host: in a local SQLite3 database, a
\emph{LocalStorage} key-value store in the browser, or a binary or plain text
configuration file. These configuration files need to be read and checked at
least on program start-up and then either kept in operating memory for the
duration of the program's runtime, or loaded, parsed and the memory
subsequently \emph{freed} (initial configuration).

There is an abundance of configuration languages (or file formats used to craft
configuration files, whether they were intended for it or not) available: TOML,
INI, JSON and YAML, to name some of the popular ones (at the time of writing).

Dhall stood out as a language that was designed with both security and the
needs of dynamic configuration scenarios in mind, borrowing a concept or two
from Nix~\cite{nixoslearn}~\cite{nixlang} (which in turn sources more than a
few of its concepts from Haskell), while at its core being very similar to
JSON, which adds to the familiar feel. In fact, in the words of Dhall's own
authors, it is: ``a programmable configuration language that you can think of
as: JSON + functions + types + imports''~\cite{dhalllang}.

Among all of the listed features, the one the author found especially
intriguing was the promise of \emph{types}. There are multiple examples
directly on the project's documentation webpage demonstrating, for instance,
the declaration and usage of custom types (which are, of course, merely
combinations of the primitive types that the language provides, such as
\emph{Bool}, \emph{Natural} or \emph{List}, to name just a few), so it was not
exceedingly hard to start designing a custom configuration \emph{schema} for
the program. Dhall not being a Turing-complete language also guarantees that
evaluation \emph{always} terminates eventually, which is a good attribute for a
configuration language to possess.

\n{2}{Dhall Configuration Schema}\label{sec:configuration}

The configuration schema was at first developed as part of the main project's
repository, before it was determined that both the development and overall
clarity would benefit if the schema lived in its own repository (see
Section~\ref{sec:repos} for details). This now enables the schema to be
independently developed and versioned, and only pulled into the main
application whenever the application is determined to be ready for it.
% \vspace{\parskip}
\smallskip
% \vspace{\baselineskip}
\begin{lstlisting}[language=Haskell, caption={Dhall configuration schema version 0.0.1-rc.2},
label=dhallschema, basicstyle=\linespread{0.9}\footnotesize\ttfamily]
let Schema =
@ -1055,8 +1283,16 @@ let Schema =
\end{lstlisting}
\vspace*{-\baselineskip}

The full schema with type annotations can be seen in Listing~\ref{dhallschema}.

The \texttt{let} statement declares a variable called \texttt{Schema} and
assigns to it the result of the expression on the right side of the equals
sign, which has for practical reasons been trimmed and is displayed without the
\emph{default} block. The default block is instead shown in its own
Listing~\ref{dhallschemadefaults}.

The main configuration comprises both raw attributes and child records, which
allow for grouping of related functionality. For instance, configuration
settings pertaining to the mailserver setup are grouped in a record named
\textbf{Mailer}. Its attribute \textbf{Enabled} is annotated as \textbf{Bool},
which was deemed appropriate for an on-off switch-like functionality, with the
@ -1067,10 +1303,19 @@ while \textbf{true} is evaluated as an \emph{unbound} variable, that is, a
variable \emph{not} defined in the current \emph{scope} and thus not
\emph{present} in the current scope.

\vspace{\parskip}
Another one of Dhall's specialties is that the `$==$' and `$!=$' (in)equality
operators \textbf{only} work on values of type \texttt{Bool}. This means, for
example, that variables of type \texttt{Natural} (\texttt{uint}) or
\texttt{Text} (\texttt{string}) cannot be compared directly as in other
languages. The work is therefore either left to a higher-level language (such
as Go) or, from the perspective of the Dhall authors, \emph{enums} are promoted
when the value matters.

\newpage
% \vspace{\parskip}
\begin{lstlisting}[language=Haskell, caption={Dhall configuration defaults for
schema version 0.0.1-rc.2},
label=dhallschemadefaults, basicstyle=\linespread{0.9}\footnotesize\ttfamily]
, default =
-- | have sane defaults.
{ Host = ""
@ -1122,8 +1367,7 @@ label=dhallschemadefaults, basicstyle=\linespread{0.9}\scriptsize\ttfamily]
, Init =
{ CreateAdmin =
-- | if this is True, attempt to create a user with admin
-- | privileges with the password specified below
False
, AdminPassword =
-- | used for the first admin, forced change on first login.
@ -1135,71 +1379,9 @@ label=dhallschemadefaults, basicstyle=\linespread{0.9}\scriptsize\ttfamily]

in Schema
\end{lstlisting}

\n{3}{Safety considerations}

Since the configuration language is programmable, understands functions and
allows importing not only arbitrary text from random internet URLs but also
importing and \emph{evaluating} (i.e.\ running) potentially untrusted code, it
is important that it employs safety mechanisms the user can rely on. Dhall
offers this through multiple features: enforcing a same-origin policy and
(optionally) pinning a cryptographic hash of the value of the expression being
imported.
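
Dhall pins an import by appending \texttt{sha256:} followed by a hex digest to
it, and refuses the import if the digest does not match. The following Go
sketch only shows the underlying idea. It is a simplification: Dhall hashes the
binary encoding of the normalised expression rather than the raw file bytes, so
formatting changes alone do not invalidate a pin. The function names are
hypothetical.

\begin{lstlisting}[language=Go, caption={Sketch: verifying content against a pinned SHA-256 digest}]
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// ErrIntegrity reports that imported content no longer matches its pin.
var ErrIntegrity = errors.New("import does not match the pinned hash")

// pin returns the hex-encoded SHA-256 digest that would be recorded once.
func pin(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// verifyPinned accepts content only if it still hashes to the pinned digest.
func verifyPinned(content []byte, pinned string) error {
	if pin(content) != pinned {
		return ErrIntegrity
	}
	return nil
}

func main() {
	original := []byte(`{ Host = "" }`)
	p := pin(original)
	tampered := []byte(`{ Host = "evil.example" }`)
	fmt.Println(verifyPinned(original, p), verifyPinned(tampered, p))
}
\end{lstlisting}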

\n{3}{Possible alternatives}

While developing the program, the author has also come across certain
shortcomings of Dhall, namely a long start-up with a \emph{cold cache}. This
can generally be observed when running the program in an environment that
either does not allow writing the cache files (a read-only filesystem) or does
not keep the written cache files, such as a container that is not configured to
mount a persistent volume at the pertinent location.

When performing an evaluation, Dhall resolves every expression down to a
combination of its most basic types (eliminating all abstraction and
indirection) in a process called \textbf{normalisation}~\cite{dhallnorm} and
then saves the result in the host's cache. The \texttt{dhall-haskell} binary
attempts to resolve the variable \texttt{\$\{XDG\_CACHE\_HOME\}} (have a look
at the \emph{XDG Base Directory Spec}~\cite{xdgbasedirspec} for details) to
decide \emph{where} the results of the normalisation will be written for
repeated use. Do note that this behaviour has been observed on a GNU/Linux host
and the author has not verified it on a non-GNU/Linux host, such as FreeBSD.

If normalisation is performed inside an ephemeral container (as opposed to, for
instance, an interactive desktop session), the results effectively get lost on
each container restart. This is both wasteful and not great for user
experience, since the normalisation of just a handful of imports (which
internally branch widely) can take upwards of two minutes, during which the
user is left waiting for the hanging application with no reporting on the
progress or current status.

Workarounds for the above-mentioned problem can be devised relatively easily:
bind mounting persistent volumes inside the container in place of
\texttt{\$\{XDG\_CACHE\_HOME\}/dhall} and
\texttt{\$\{XDG\_CACHE\_HOME\}/dhall-haskell} to preserve the cache between
restarts, or letting the cache be pre-computed during the container build,
since the application is only really expected to run together with a
compatible version of the configuration schema and this version \emph{is} known
at container build time. Still, it would certainly feel better if there was no
need to work \emph{around} the configuration system of choice.

Alternatives such as CUE (\url{https://cuelang.org/}) offer themselves nicely
as a potentially almost drop-in replacement for Dhall feature-wise, while also
avoiding the costly \emph{cold cache} normalisation operations, which are, in
the author's view, Dhall's principal issue.
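
The cache location logic described above follows the XDG Base Directory rules,
which can be sketched in Go as shown below. \texttt{\$\{XDG\_CACHE\_HOME\}} is
used when it is set to an absolute path, and \texttt{\$HOME/.cache} otherwise.
The two sub-directory names are taken from the text. Everything else, including
the function name, is hypothetical, and the sketch does not reproduce
\texttt{dhall-haskell}'s actual code.

\begin{lstlisting}[language=Go, caption={Sketch: resolving the Dhall cache directories}]
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// cacheDirs returns the cache directories named in the text, following the
// XDG Base Directory rules: $XDG_CACHE_HOME if it is set to an absolute path
// (relative paths must be ignored), otherwise $HOME/.cache.
func cacheDirs(getenv func(string) string) ([]string, error) {
	base := getenv("XDG_CACHE_HOME")
	if base == "" || !filepath.IsAbs(base) {
		home := getenv("HOME")
		if home == "" {
			return nil, errors.New("neither XDG_CACHE_HOME nor HOME is usable")
		}
		base = filepath.Join(home, ".cache")
	}
	return []string{
		filepath.Join(base, "dhall"),
		filepath.Join(base, "dhall-haskell"),
	}, nil
}

func main() {
	dirs, err := cacheDirs(os.Getenv)
	if err != nil {
		fmt.Println("no cache location:", err)
		return
	}
	fmt.Println(dirs)
}
\end{lstlisting}

An application could run such a check at start-up and warn when the directory
is not writable or not persistent, instead of leaving the user waiting
silently.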

\vspace*{-\baselineskip}
\vspace*{-\baselineskip}
\vspace*{-\baselineskip}
\n{2}{Data integrity and authenticity}

The user can interact with the application via a web client, such as a browser,
@ -1243,194 +1425,20 @@ e.g.\ for tamper protection purposes and similar; however, that work remains
yet to be materialised.

\n{2}{Compromise Monitoring}

\n{3}{Have I Been Pwned? Integration}

Troy Hunt's Have I Been Pwned? online service
(\url{https://haveibeenpwned.com/}) has been chosen as the online source of
compromised data. The service offers private APIs that are protected by API
keys. The application's \texttt{hibp} module and database representation model
the values returned by this API, which allows searching in large breaches using
email addresses.

The architecture is relatively simple: the application administrator configures
an API key for HIBP, the user enters the query parameters, and the application
constructs a query, calls the API and waits for a response. As the API is
rate-limited based on the key supplied, this can pose an issue, which has not
yet been fully resolved in the UI. The application then parses the returned
data and binds it to the local model for validation. If that goes well, the
data is saved into the database as a cache and the search query is performed on
the saved data. If it returns anything, it is displayed to the user for
browsing.
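
The request flow described above can be sketched in Go with the standard
library. The endpoint path (\texttt{/api/v3/breachedaccount/}), the
\texttt{hibp-api-key} header, the required user agent and the \texttt{404} and
\texttt{429} responses follow the public HIBP API v3 documentation. The
\texttt{Breach} struct covers only a subset of the returned fields, and this is
not the code of the application's \texttt{hibp} module. A local fake server
stands in for the real API so that the sketch runs offline.

\begin{lstlisting}[language=Go, caption={Sketch: querying the HIBP breachedaccount endpoint}]
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strconv"
	"time"
)

// Breach holds a subset of the fields HIBP returns for a breach.
type Breach struct {
	Name        string
	Domain      string
	BreachDate  string
	PwnCount    int
	DataClasses []string
	IsVerified  bool
}

// RateLimitError is returned on HTTP 429 and carries the advised wait time.
type RateLimitError struct{ RetryAfter time.Duration }

func (e RateLimitError) Error() string {
	return fmt.Sprintf("rate limited, retry after %s", e.RetryAfter)
}

// FetchBreaches asks the breachedaccount endpoint about one email address.
// baseURL is a parameter only so that the sketch can run against a fake server.
func FetchBreaches(c *http.Client, baseURL, apiKey, email string) ([]Breach, error) {
	u := baseURL + "/api/v3/breachedaccount/" + url.PathEscape(email) + "?truncateResponse=false"
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("hibp-api-key", apiKey)
	req.Header.Set("user-agent", "pcmt-sketch")
	resp, err := c.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusOK:
		var breaches []Breach
		err := json.NewDecoder(resp.Body).Decode(&breaches)
		return breaches, err
	case http.StatusNotFound: // the account is not in any known breach
		return nil, nil
	case http.StatusTooManyRequests:
		secs, _ := strconv.Atoi(resp.Header.Get("retry-after"))
		return nil, RateLimitError{RetryAfter: time.Duration(secs) * time.Second}
	default:
		return nil, fmt.Errorf("unexpected response: %s", resp.Status)
	}
}

// fakeHIBP imitates the API for local experiments: it demands an API key and
// then answers with the given status, body and retry-after header.
func fakeHIBP(status int, body, retryAfter string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("hibp-api-key") == "" {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		if retryAfter != "" {
			w.Header().Set("retry-after", retryAfter)
		}
		w.WriteHeader(status)
		fmt.Fprint(w, body)
	}))
}

func main() {
	srv := fakeHIBP(http.StatusOK, `[{"Name":"Adobe","PwnCount":152445165}]`, "")
	defer srv.Close()
	breaches, err := FetchBreaches(srv.Client(), srv.URL, "key", "user@example.com")
	fmt.Println(breaches, err)
}
\end{lstlisting}

Surfacing the \texttt{retry-after} value to the user would be one way of
addressing the rate-limiting issue in the UI.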

\n{3}{Local Dataset Plugin}

Breach data from locally available datasets can be imported into the
application by first making sure it adheres to the specified schema (have a
look at the \emph{breach data schema} in Listing~\ref{breachDataGoSchema}). If
it does not (which is very likely with random breach data), it needs to be
converted to a form that does before importing it into the application,
e.g.\ using a Python script or similar. Attempting to import data that does not
follow the outlined schema would result in an error. Also, importing a dataset
which is over a reasonable size limit would by default be rejected by the
program as a precaution, since unmarshalling e.g.\ a 1 TiB document would
likely result in an OOM situation on the host, assuming regular consumer
hardware, not HPC.

\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Breach data schema represented as a Go struct (imports from the standard library are assumed)},
label=breachDataGoSchema]
// breachDataSchema describes one breach document; the yaml tags map the
// fields to the keys used in the YAML input.
type breachDataSchema struct {
	Name              string    `yaml:"name"`
	Time              time.Time `yaml:"time"`
	IsVerified        bool      `yaml:"isVerified"`
	ContainsPasswords bool      `yaml:"containsPasswds"`
	ContainsHashes    bool      `yaml:"containsHashes"`
	HashType          string    `yaml:"hashType"`
	HashSalted        bool      `yaml:"hashSalted"`
	HashPeppered      bool      `yaml:"hashPeppered"`
	ContainsUsernames bool      `yaml:"containsUsernames"`
	ContainsEmails    bool      `yaml:"containsEmails"`
	Data              any       `yaml:"data"`
}
\end{lstlisting}
\vspace*{-\baselineskip}

The Go representation shown in Listing~\ref{breachDataGoSchema} will in
actuality be written and supplied by the user of the program as a YAML
document. YAML was chosen for multiple reasons: relative ease of use (plain
text, readable, support for comments), its capability to store multiple
\emph{documents} inside a single file, with most of the inputs implicitly typed
as strings, and, thanks to being a superset of JSON, machine readability. That
should allow documents similar to the one in
Listing~\ref{breachDataYAMLSchema} to be ingested by the program, and to be read
and written by humans and programs alike.

\smallskip
\begin{lstlisting}[language=YAML, caption={Example Breach Data Schema supplied
to the program as a YAML file, optionally containing multiple documents},
label=breachDataYAMLSchema]
---
name: Horrible breach
time: 2022-04-23T00:00:00+02:00
isVerified: false
containsPasswds: false
containsHashes: true
containsEmails: true
hashType: md5
hashSalted: false
hashPeppered: false
data:
  hashes:
    - hash1
    - hash2
    - hash3
  emails:
    - email1
    -
    - email3
---
# document #2, describing another breach.
name: Horrible breach 2
...
\end{lstlisting}
\vspace*{-\baselineskip}

Notice how the emails list in Listing~\ref{breachDataYAMLSchema} misses one
record, perhaps because it was not supplied or was mistakenly omitted. This is
a valid scenario (mistakes happen) and the application needs to be able to
handle it. The alternative would be to require the user to prepare the data in
such a way that the empty/partial records are dropped entirely.
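
The two precautions described in this section, the size limit and the tolerance
of empty records, can be sketched in Go as follows. JSON via
\texttt{encoding/json} stands in for YAML here so that the sketch needs only the
standard library. The real program would parse YAML with a YAML library. The
64~MiB limit and all names are hypothetical.

\begin{lstlisting}[language=Go, caption={Sketch: a size-capped import that drops empty records}]
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// maxImportSize is a hypothetical default cap on an imported document.
const maxImportSize int64 = 64 << 20 // 64 MiB

var ErrTooLarge = errors.New("dataset exceeds the import size limit")

// readCapped reads at most max bytes and rejects anything longer, so that an
// oversized document is refused before it is unmarshalled into memory.
func readCapped(r io.Reader, max int64) ([]byte, error) {
	b, err := io.ReadAll(io.LimitReader(r, max+1))
	if err != nil {
		return nil, err
	}
	if int64(len(b)) > max {
		return nil, ErrTooLarge
	}
	return b, nil
}

// nonEmpty drops blank entries, such as the missing email in the example.
func nonEmpty(items []string) []string {
	out := make([]string, 0, len(items))
	for _, s := range items {
		if strings.TrimSpace(s) != "" {
			out = append(out, s)
		}
	}
	return out
}

// breachData mirrors the data part of the example document.
type breachData struct {
	Hashes []string `json:"hashes"`
	Emails []string `json:"emails"`
}

// importDoc runs the whole sketch: capped read, unmarshal, drop empty emails.
func importDoc(doc string, max int64) ([]string, error) {
	raw, err := readCapped(strings.NewReader(doc), max)
	if err != nil {
		return nil, err
	}
	var d breachData
	if err := json.Unmarshal(raw, &d); err != nil {
		return nil, err
	}
	return nonEmpty(d.Emails), nil
}

func main() {
	// The empty string plays the role of the bare "-" in the YAML example.
	doc := `{"hashes": ["hash1", "hash2", "hash3"], "emails": ["email1", "", "email3"]}`
	emails, err := importDoc(doc, maxImportSize)
	if err != nil {
		panic(err)
	}
	fmt.Println(emails) // [email1 email3]
}
\end{lstlisting}

Reading through \texttt{io.LimitReader} with one byte of headroom detects an
oversized input without ever holding more than the limit in memory.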

\n{2}{Database schema}\label{sec:dbschema}

The database schema is not created manually in the database. Instead, an
Object-Relational Mapping (ORM) tool named \texttt{ent} is used, which allows
defining the table schema and relations entirely in Go. The application does
not need the database schema to be pre-created when it starts. It only
requires a connection string providing access to the database, for a
reasonably privileged user if the schema does not exist yet.

The best part about \texttt{ent} is that there is no need to define
supplemental methods on the models, as with \texttt{ent} these are meant to be
\emph{code generated} (in the older sense of the word, not with Large Language
Models) into existence. Code generation creates files with actual Go models
based on the types of the attributes in the database schema model, and the
respective relations are transformed into methods on the receiver or functions
taking object attributes as arguments.

@ -1449,90 +1457,164 @@ These methods can further be imported into other packages and this makes
|
||||
working with the database a morning breeze.


\n{1}{Deployment}

It is, of course, recommended that the application runs in a secure
environment, \allowbreak although definitions of that almost certainly differ
depending on who you ask. General recommendations would be either to
effectively reserve a machine for a single use case (running this program), so
as to dramatically decrease the potential attack surface of the host, or to run
the program isolated in a container or a virtual machine. Further, if the host
does not need management access (it is a deployed-to-only machine that is
configured out-of-band, such as with a \emph{golden} image/container or
declaratively with Nix), then an SSH \emph{daemon} should not be running on it,
since it is not needed. In an ideal scenario, the host machine would have as
little software installed as possible besides what the application absolutely
requires.

System-wide cryptographic policies should target the highest feasible security
level, if such policies are available at all (as they are by default on Fedora
or RHEL), covering the SSH, DNSSEC, IPsec, Kerberos and TLS protocols.
Firewalls should be configured, and SELinux (a kernel-level mandatory access
control and security policy mechanism) should be running in \emph{enforcing}
mode, if available.

\textbf{TODO}: mention how \texttt{systemd} aids in running the pod.

A deployment setup as suggested in Section~\ref{sec:deploymentRecommendations}
is already partially covered by the multi-stage \texttt{Containerfile} that is
available in the main sources. Once built, the resulting container image only
contains a handful of things it absolutely needs:

\begin{itemize}
	\item a statically linked copy of the program
	\item a default configuration file and corresponding Dhall expressions cached
		at build time
	\item a recent CA certs bundle
\end{itemize}

Since the program also needs a database to function properly, an example
scenario includes the application container being run in a Podman \textbf{pod}
together with the database. That way, the database does not have to be exposed
to the entire host, or out of the pod at all. It is only available over the
pod's \texttt{localhost}.

It goes without saying that the default values of any configuration secrets
should be substituted by the application operator with new, securely generated
ones.
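
As an illustration only, a securely generated replacement value could be
obtained from the operating system's cryptographically secure random number
generator via Go's \texttt{crypto/rand} package. The 32-byte length and the
hex encoding in the following sketch are assumptions, not requirements stated
by the application's configuration. \texttt{newSecret} is a hypothetical
helper.

\begin{lstlisting}[language=Go, caption={Sketch: generating a random secret
(hypothetical helper)}, label=newSecretSketch,
basicstyle=\linespread{0.9}\small\ttfamily]
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newSecret returns n random bytes read from crypto/rand, hex-encoded,
// so the resulting string is 2*n characters long.
func newSecret(n int) (string, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	// 32 bytes, i.e. 256 bits of randomness (an assumed, not required, size).
	s, err := newSecret(32)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
\end{lstlisting}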


\n{2}{Deployment recommendations}

\n{3}{Transport security}

Users connecting to the application should rightfully expect their data to be
protected \textit{in transit} (i.e.\ on the way between their browser and the
server). The \emph{Transport Layer Security} family of
protocols~\cite{tls13rfc8446}, the underpinning of HTTPS, was designed for
exactly that. TLS utilises the primitives of asymmetric cryptography to let the
client authenticate the server (verify that it is who it claims to be) and
negotiate a symmetric key for encryption in a process named the \emph{TLS
handshake} (see Section~\ref{sec:tls} for more details), the final purpose of
which is establishing a secure communication channel. The operator should
either configure the program to utilise TLS directly or have it listen behind
a TLS-terminating \emph{reverse proxy}.

\n{3}{Containerisation}

Whether the pre-built or a custom container image is used to deploy the
application, it still needs access to secrets, such as the database connection
string (containing the database host, port, user, password/encrypted password,
authentication method and database name).

% \newpage

Currently, the application is able to handle the \emph{peer},
\emph{scram-sha-256}, \emph{user name maps} and raw \emph{password} Postgres
authentication methods~\cite{pgauthmethods}. The \emph{password} option should
not be used in production, \emph{unless} the connection to the database is
protected by TLS.\ In any case, the
\emph{scram-sha-256}~\cite{scramsha256rfc7677} method is preferable. One way to
verify in a development environment that everything works as intended is the
\emph{Password generator for PostgreSQL} tool~\cite{goscramsha256}, which
produces the encrypted string from raw user input.

If the application running in a container is to use the \emph{peer}
authentication method, it is up to the operator to supply the Postgres socket
to the application (e.g.\ as a volume bind mount). This scenario was not
tested, however. The author is also not entirely certain how \emph{user
namespaces} (on GNU/Linux) would influence the process, given that user
\emph{ID}s \textbf{inside} the container are mapped to a range of \emph{UIDs}
\textbf{outside} of it. The setup would likely need to account for that.

Equally, if the application is running inside a container, the operator needs
to make sure of one of two things. Either the database is running in a network
that is also directly attached to the container, or there is a mechanism in
place that routes requests for the database hostname to the destination.

One such mechanism is container name-based routing inside \emph{pods}
(Podman/Kubernetes). There, container names are resolved by a specially
configured piece of software, called Aardvark in the former case and CoreDNS in
the latter.

\n{2}{Rootless Podman}

Assuming rootless Podman is set up and the \texttt{just} tool is installed on
the host, the application could be deployed by following a series of
relatively simple steps:

\begin{itemize}
	\item build (or pull) the application container image
	\item create a pod with user namespacing, exposing the application port
	\item run the database container inside the pod
	\item run the application inside the pod
\end{itemize}

In concrete terms, it would resemble something along the lines of
Listing~\ref{podmanDeployment}. Do note that all the commands are executed
under the unprivileged \texttt{user@containerHost} that is running rootless
Podman. That user needs to have \texttt{UID}/\texttt{GID} mapping entries in
the \texttt{/etc/subuid} and \texttt{/etc/subgid} files \textbf{prior} to
running any Podman commands.

\begin{lstlisting}[language=bash, caption={Example application deployment using
rootless Podman},
label=podmanDeployment, basicstyle=\linespread{0.9}\small\ttfamily]
# From inside the project folder, build the image locally using kaniko.
just kaniko

# Create a pod.
podman pod create --userns=keep-id -p 5005:3000 --name pcmt

# Run the database in the pod.
podman run --pod pcmt --replace -d --name "pcmt-pg" --rm \
  -e POSTGRES_INITDB_ARGS="--auth-host=scram-sha-256 \
    --auth-local=scram-sha-256" \
  -e POSTGRES_PASSWORD=postgres -v $PWD/tmp/db:/var/lib/postgresql/data \
  docker.io/library/postgres:15.2-alpine3.17

# Run the application in the pod.
podman run --pod pcmt --replace --name pcmt-og -d --rm \
  -e PCMT_LIVE=False \
  -e PCMT_DBTYPE="postgres" \
  -e PCMT_CONNSTRING="host=pcmt-pg port=5432 sslmode=disable \
    user=postgres dbname=postgres password=postgres" \
  -v $PWD/config.dhall:/config.dhall:ro \
  docker.io/immawanderer/pcmt:testbuild -config /config.dhall
\end{lstlisting}

To summarise Listing~\ref{podmanDeployment}: first, the application container
image is built from inside the project folder using \texttt{kaniko}.
Alternatively, the container image could be pulled from a container registry.
Showing the image being built from sources makes more sense here, however,
since the listing uses the \texttt{:testbuild} tag.

Next, a \emph{pod} is created and given a name, setting the port binding for
the application. Then, the database container is started inside the pod.

As a final step, the application container itself is run inside the pod. The
application configuration named \texttt{config.dhall} located in
\texttt{\$PWD} is mounted read-only as a volume into the container's
\texttt{/config.dhall}, providing the application with its configuration. The
default container image does contain a default configuration for reference.
However, running the container as is, without additional configuration, would
fail, as it does not contain the necessary secrets.
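
The connection string in Listing~\ref{podmanDeployment} uses the PostgreSQL
keyword/value format. A securely generated password may contain spaces or
quotes, in which case its value needs quoting. According to the PostgreSQL
\texttt{libpq} documentation, empty values and values containing spaces are
surrounded by single quotes, and single quotes and backslashes within a value
are escaped with a backslash. The following Go sketch assembles such a string.
The helper names and the example password are hypothetical, and the
application itself may expect the string to be prepared differently.

\begin{lstlisting}[language=Go, caption={Sketch: assembling a keyword/value
connection string (hypothetical helpers)}, label=connStringSketch,
basicstyle=\linespread{0.9}\small\ttfamily]
package main

import (
	"fmt"
	"sort"
	"strings"
)

// quoteValue quotes a single value for a libpq keyword/value connection
// string: empty values and values containing whitespace, single quotes or
// backslashes are wrapped in single quotes, escaping ' and \ with a backslash.
func quoteValue(v string) string {
	if v != "" && !strings.ContainsAny(v, " \t\n\r'\\") {
		return v
	}
	esc := strings.NewReplacer(`\`, `\\`, `'`, `\'`)
	return "'" + esc.Replace(v) + "'"
}

// connString joins the parameters into a keyword/value connection string,
// sorting the keywords so that the output is deterministic.
func connString(params map[string]string) string {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+quoteValue(params[k]))
	}
	return strings.Join(parts, " ")
}

func main() {
	// Values taken from the deployment listing; the password is a made-up
	// placeholder containing characters that require quoting.
	fmt.Println(connString(map[string]string{
		"host":     "pcmt-pg",
		"port":     "5432",
		"sslmode":  "disable",
		"user":     "postgres",
		"dbname":   "postgres",
		"password": "not so 'secret'",
	}))
}
\end{lstlisting}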

\n{3}{Sanity checks}

Do also note that the application connects to the database using the
database's \emph{container} name rather than its IP address. That is possible
because Podman sets up DNS inside the pod so that all containers in the pod
can reach each other using their (container) names. Interestingly, connecting
via \texttt{localhost} would also work, as from inside the pod, any container
in the pod can reach any other container in the same pod via the pod's
\texttt{localhost}. In fact, \emph{pinging} the database or application
containers from an ad-hoc \texttt{alpine} container added to the pod yields:

\vspace{\parskip}
\begin{lstlisting}[language=bash, caption={Pinging pod containers using their
names}, label=podmanPing, basicstyle=\linespread{0.9}\small\ttfamily]
user@containerHost % podman run --rm -it --user=0 --pod=pcmt \
  docker.io/library/alpine:3.18
/ # ping -c2 pcmt-og
PING pcmt-og (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=42 time=0.072 ms
64 bytes from 127.0.0.1: seq=1 ttl=42 time=0.118 ms

--- pcmt-og ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.072/0.095/0.118 ms
/ # ping -c2 pcmt-pg
PING pcmt-pg (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: seq=0 ttl=42 time=0.045 ms
64 bytes from 127.0.0.1: seq=1 ttl=42 time=0.077 ms

--- pcmt-pg ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.045/0.061/0.077 ms
/ #
\end{lstlisting}
\vspace*{-\baselineskip}

The pod created in Listing~\ref{podmanDeployment} only sets up a single port
binding, forwarding host port \texttt{5005/tcp} to the application's
\texttt{3000/tcp}. The Postgres default port, \texttt{5432/tcp}, is not among
the pod's port bindings, as can be seen in the pod creation command. This can
also easily be verified using the command in Listing~\ref{podmanPortBindings}:

\begin{lstlisting}[language=bash, caption={Podman pod port bindings},
label=podmanPortBindings, basicstyle=\linespread{0.9}\small\ttfamily]
user@containerHost % podman pod inspect pcmt \
  --format="Port bindings: {{.InfraConfig.PortBindings}}\n\
Host network: {{.InfraConfig.HostNetwork}}"
Port bindings: map[3000/tcp:[{ 5005}]]
Host network: false
\end{lstlisting}
\vspace*{-\baselineskip}

To be absolutely sure, trying to connect to the database from outside of the
pod (i.e.\ from the container host) should \emph{fail}, unless, of course,
there is another process listening on that port:

\begin{lstlisting}[language=bash, caption={In-pod database is unreachable from
the host}, breaklines=true, label=podDbUnreachable,
basicstyle=\linespread{0.9}\small\ttfamily]
user@containerHost % curl localhost:5432
--> curl: (7) Failed to connect to localhost port 5432 after 0 ms: Couldn't connect to server
\end{lstlisting}
\vspace*{-\baselineskip}

The error in Listing~\ref{podDbUnreachable} is expected, as it is the result of
the database port not being exposed outside of the pod.
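
The same check can also be scripted, e.g.\ as part of a post-deployment smoke
test. The Go sketch below reports whether anything accepts TCP connections on
a given address within a timeout. It is a hypothetical helper, not part of the
application. Run on the container host against \texttt{localhost:5432}, it
would be expected to report the port as unreachable, mirroring
Listing~\ref{podDbUnreachable}. Like the \texttt{curl} check, it cannot tell the
in-pod database apart from another process that happens to listen on that
port.

\begin{lstlisting}[language=Go, caption={Sketch: TCP reachability check
(hypothetical helper)}, label=reachableSketch,
basicstyle=\linespread{0.9}\small\ttfamily]
package main

import (
	"fmt"
	"net"
	"time"
)

// reachable reports whether a TCP connection to addr can be established
// within timeout; refused or timed-out connection attempts yield false.
func reachable(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	// Meant to be run on the container host, outside of the pod.
	if reachable("localhost:5432", 2*time.Second) {
		fmt.Println("5432/tcp is reachable: check the pod's port bindings")
	} else {
		fmt.Println("5432/tcp is not reachable from the host, as expected")
	}
}
\end{lstlisting}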



\n{1}{Validation}

Unit testing is a hot topic for many people and the author does not consider
himself a staunch supporter of either extreme. The ``no unit tests'' approach
seems to discount any benefit there is to unit testing, while a
``TDD-only''\footnotemark{} approach can be a little too much for some people's
taste. The author tends to prefer a \emph{middle ground} approach in this
particular case, i.e.\ writing enough tests where meaningful, but not
necessarily testing everything or writing tests prior to business logic code.
Arguably, following the practice of TDD should result in \emph{better
designed} code, particularly because the shape and function of the code need
to be thought through in advance, as the code is tested before it is even
written. However, TDD adds a slight inconvenience to what is otherwise a
straightforward process.

Thanks to Go's built-in support for testing via its \texttt{testing} package
and the tooling in the \texttt{go} tool, writing tests is relatively simple. Go
@ -1578,6 +1656,15 @@ informing the developer that no tests were found, which is handy to learn if it
was not intended/expected. When compiling regular source code, Go files whose
names end in \texttt{\_test.go} are simply ignored by the build tool.

\footnotetext{TDD, or Test Driven Development, is a development methodology
whereby tests are written \emph{first}. Then a complementary piece of code that
is supposed to be tested is added, just enough to get past the compile errors
and to see the test \emph{fail}. Only then is the code finally refactored to
make the test \emph{pass}. The code can then be fearlessly extended, because
the test is the safety net catching the programmer when the mind slips and
alters the originally intended behaviour of the code.}



\n{2}{Integration tests}

Integrating with external software, namely the database in case of this
@ -1724,26 +1811,29 @@ by \emph{Let's Encrypt}\allowbreak issued, short-lived, ECDSA
a testing instance; therefore, limits to prevent abuse might be imposed.


\n{3}{Deployment validation}

TODO: show the results of testing the app in prod using
\url{https://testssl.sh/}.
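
Until those results are added, a minimal, self-contained Go sketch can
illustrate the kind of property such a scan checks. Purely for illustration, it
assumes a server that refuses anything older than TLS 1.3; the text above does
not prescribe a particular minimum version. A local test server is started
using the standard \texttt{net/http/httptest} package. A regular client is then
expected to negotiate TLS 1.3, while a client capped at TLS 1.2 is expected to
fail the handshake. The helper names are hypothetical. The sketch complements,
rather than replaces, a thorough scan with \texttt{testssl.sh}.

\begin{lstlisting}[language=Go, caption={Sketch: checking the negotiated TLS
version against a local test server (hypothetical helpers)},
label=tlsVersionSketch, basicstyle=\linespread{0.9}\small\ttfamily]
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newStrictServer starts a local HTTPS test server that only accepts
// TLS 1.3 handshakes. The caller is responsible for closing it.
func newStrictServer() *httptest.Server {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, _ *http.Request) { fmt.Fprint(w, "ok") }))
	// Fields set before StartTLS are copied into the server's TLS config.
	srv.TLS = &tls.Config{MinVersion: tls.VersionTLS13}
	srv.StartTLS()
	return srv
}

// negotiatedVersion performs a GET request and returns the TLS version
// negotiated with the server.
func negotiatedVersion(c *http.Client, url string) (uint16, error) {
	resp, err := c.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.TLS.Version, nil
}

// cappedClient returns a client that trusts the test server's certificate
// but refuses to negotiate anything newer than maxVersion.
func cappedClient(srv *httptest.Server, maxVersion uint16) *http.Client {
	tr := srv.Client().Transport.(*http.Transport).Clone()
	tr.TLSClientConfig.MaxVersion = maxVersion
	return &http.Client{Transport: tr}
}

func main() {
	srv := newStrictServer()
	defer srv.Close()

	v, err := negotiatedVersion(srv.Client(), srv.URL)
	fmt.Println("TLS 1.3 negotiated:", err == nil && v == tls.VersionTLS13)

	_, err = negotiatedVersion(cappedClient(srv, tls.VersionTLS12), srv.URL)
	fmt.Println("TLS 1.2-only client rejected:", err != nil)
}
\end{lstlisting}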



% =========================================================================== %
\nn{Conclusion}

The objectives of the thesis have been to create the Password Compromise
Monitoring Tool aimed at security-conscious users, in order to let them
validate their assumptions on the security of their credentials. The thesis
opened by diving into cryptography topics such as encryption, and briefly
touched on TLS.

Additionally, security mechanisms such as Site Isolation and Content Security
Policy, commonly employed by mainstream browsers of today, were introduced, and
the reader learnt how Content Security Policy is easily and dynamically
configured.

An extensive body of the thesis then revolved around the practical part. It
described everything from the tooling used, through a high-level view of the
application's architecture, to the implementation of specific parts of the
application across the stack.

Finally, the practical part concluded by broadly depicting validation methods
used to verify whether the application worked correctly.