tex: add extensive changes
parent
b8dcac8235
commit
39428c908a
BIN
graphics/pcmt-use-case.pdf
Normal file
Binary file not shown.
@ -51,4 +51,70 @@ Blake3:\\
|
|||||||
SHA3-256:\\
|
SHA3-256:\\
|
||||||
\texttt{66ebbdb20b5459360368d29615e6e80f36bcf464d5519ca08ae651f27a8970bf}\\
|
\texttt{66ebbdb20b5459360368d29615e6e80f36bcf464d5519ca08ae651f27a8970bf}\\
|
||||||
|
|
||||||
|
|
||||||
|
\priloha{Whys}\label{appendix:whys}
|
||||||
|
|
||||||
|
This appendix is concerned with explaining why certain technologies were used.
|
||||||
|
|
||||||
|
\n{2}{Why Go}\label{appendix:whygo}
|
||||||
|
|
||||||
|
First, a question of \textit{`Why pick Go for building a web application?'}
|
||||||
|
might arise, so the following few lines will try to address that.
|
||||||
|
|
||||||
|
Go~\cite{golang}, or \emph{Golang} for SEO-friendliness and disambiguating Go
|
||||||
|
the ancient game, is a strongly typed, high-level \emph{garbage-collected}
|
||||||
|
language where functions are first-class citizens and errors are values.
|
||||||
|
|
||||||
|
The appeal for the author comes from a number of features of the language, such
|
||||||
|
as built-in support for concurrency and unit testing, sane \emph{zero} values,
|
||||||
|
lack of pointer arithmetic, inheritance and implicit type conversions,
|
||||||
|
easy-to-read syntax, producing a statically linked binary by default, etc. On
|
||||||
|
top of that, the language has got a cute mascot. Thanks to the foresight of the
|
||||||
|
Go Authors regarding \emph{the formatting question} (i.e.\ where to put the
|
||||||
|
braces, \textbf{tabs vs.\ spaces}, etc.), most of the discussions on this topic
|
||||||
|
have been foregone. Every \emph{gopher}\footnote{euph.\ a person writing in the
|
||||||
|
Go programming language} is expected to format their source code with the
|
||||||
|
official formatter (\texttt{gofmt}), which automatically ensures that the code
|
||||||
|
adheres to the one formatting standard. Then, there is \emph{The Promise} of
|
||||||
|
backwards compatibility for Go 1.x, which makes it a good choice for long-term projects
|
||||||
|
without the fear of being rug-pulled.
|
||||||
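To make two of the listed traits concrete, the following is a minimal sketch (not taken from the project) of errors being ordinary values and of a usable zero value. The names \texttt{User}, \texttt{Greet} and \texttt{ErrEmptyName} are hypothetical and exist only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEmptyName is a sentinel error; in Go, errors are ordinary values.
// (Hypothetical name, used only in this sketch.)
var ErrEmptyName = errors.New("empty name")

// User is a hypothetical type used only for this illustration.
type User struct {
	Name   string
	Logins int
}

// Greet returns a greeting, or an error value that the caller has to handle.
func Greet(u User) (string, error) {
	if u.Name == "" {
		return "", ErrEmptyName
	}
	return fmt.Sprintf("hello, %s (%d logins)", u.Name, u.Logins), nil
}

func main() {
	// Zero value: Name == "" and Logins == 0, no constructor needed.
	var u User
	if _, err := Greet(u); errors.Is(err, ErrEmptyName) {
		fmt.Println("error handled as a value:", err)
	}

	u.Name = "gopher"
	msg, err := Greet(u)
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println(msg)
}
```

The error is compared and passed around like any other value, and the struct is safe to use before any field is set.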
|
|
||||||
|
|
||||||
|
\n{2}{Why Nix/devenv}\label{appendix:whynix}
|
||||||
|
|
||||||
|
Nix (\url{https://builtwithnix.org/}) is a functional programming language
|
||||||
|
resembling Haskell and a declarative package manager, which has been used in
|
||||||
|
this project in the form of the \texttt{devenv} tool (\url{https://devenv.sh/}) to
|
||||||
|
create a \textbf{declarative} and \textbf{reproducible} development environment.
|
||||||
|
The author has previously used Nix directly with \emph{flakes} and liked
|
||||||
|
\texttt{devenv}, as it effectively exposed only a handful of parameters for
|
||||||
|
configuration, and got rid of the need to manage the full flake, which is of course
|
||||||
|
still an option for people who choose so. See \texttt{devenv.nix} in the
|
||||||
|
repository root.
|
||||||
|
|
||||||
|
|
||||||
|
\priloha{Terminology}\label{appendix:terms}
|
||||||
|
|
||||||
|
\n{2}{Linux}
|
||||||
|
|
||||||
|
The term \emph{Linux} is exclusively used in the meaning of the
|
||||||
|
Linux kernel~\cite{linux}.
|
||||||
|
|
||||||
|
\n{2}{GNU/Linux}
|
||||||
|
|
||||||
|
As far as a Linux-based operating system is concerned, the term ``GNU/Linux''
|
||||||
|
as defined by the Free Software Foundation~\cite{fsfgnulinux} is used. While it
|
||||||
|
is longer and arguably a little bit cumbersome, the author aligns with the
|
||||||
|
opinion that this term more correctly describes its actual target. Being aware
|
||||||
|
that there are many people who conflate the complete operating system with its
|
||||||
|
(be it core) component, the kernel, the author is taking care to distinguish
|
||||||
|
the two, although writing from experience, colloquially, this probably brings
|
||||||
|
more confusion and a lengthy explanation is usually required.
|
||||||
|
|
||||||
|
\n{2}{The program}
|
||||||
|
|
||||||
|
By \emph{the program} or \emph{the application} without any additional context
|
||||||
|
the author most probably means the Password Compromise Monitoring Tool program.
|
||||||
|
|
||||||
|
|
||||||
% =========================================================================== %
|
% =========================================================================== %
|
||||||
|
@ -179,7 +179,7 @@
|
|||||||
@misc{age,
|
@misc{age,
|
||||||
howpublished = {[online]},
|
howpublished = {[online]},
|
||||||
title = {A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability.},
|
title = {A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability.},
|
||||||
author = {Filippo Sotille and Ben Cox and age contributors},
|
author = {Filippo Valsorda and Ben Cox and age contributors},
|
||||||
year = 2021,
|
year = 2021,
|
||||||
note={{Available from: \url{https://github.com/FiloSottile/age}. [viewed 2023-05-23]}}
|
note={{Available from: \url{https://github.com/FiloSottile/age}. [viewed 2023-05-23]}}
|
||||||
}
|
}
|
||||||
|
1676
tex/text.tex
@ -72,177 +72,130 @@ practices in an effort to build a maintainable and long-lasting piece of
|
|||||||
software that serves its users well. When deployed, it could provide real
|
software that serves its users well. When deployed, it could provide real
|
||||||
value.
|
value.
|
||||||
|
|
||||||
|
Terminology is located in Appendix~\ref{appendix:terms}; feel free to give it a
|
||||||
|
read.
|
||||||
|
|
||||||
% =========================================================================== %
|
% =========================================================================== %
|
||||||
\part{Theoretical part}
|
\part{Theoretical part}
|
||||||
|
|
||||||
\n{1}{Terminology}
|
|
||||||
|
|
||||||
\n{2}{Linux}
|
|
||||||
|
|
||||||
The term \emph{Linux} is exclusively used in the meaning of the
|
|
||||||
Linux kernel~\cite{linux}.
|
|
||||||
|
|
||||||
|
|
||||||
\n{2}{GNU/Linux}
|
|
||||||
|
|
||||||
As far as a Linux-based operating system is concerned, the term ``GNU/Linux''
|
|
||||||
as defined by the Free Software Foundation~\cite{fsfgnulinux} is used. While it
|
|
||||||
is longer and arguably a little bit cumbersome, the author aligns with the
|
|
||||||
opinion that this term more correctly describes its actual target. Being aware
|
|
||||||
there are many people that conflate the complete operating system with its (be
|
|
||||||
it core) component, the kernel, the author is taking care to distinguish the
|
|
||||||
two, although writing from experience, colloquially, this probably brings more
|
|
||||||
confusion and a lengthy explanation is usually required.
|
|
||||||
|
|
||||||
|
|
||||||
\n{2}{Containers}
|
|
||||||
|
|
||||||
When the concept of \emph{containerisation} and \emph{containers} is mentioned
|
|
||||||
throughout this work, the author has OCI containers~\cite{ocicontainers} in
|
|
||||||
mind, which is broadly a superset of \emph{Linux Containers} where some set of
|
|
||||||
processes is presented with a view of kernel resources (there are multiple
|
|
||||||
kinds of resources, such as IPC queues; network devices, stacks, ports; mount
|
|
||||||
points, process IDs, user and group IDs, Cgroups and others) that differs for
|
|
||||||
each different set of processes, similar in thought to FreeBSD
|
|
||||||
\emph{jails}~\cite{freebsdjails} with the distinction being that they are, of
|
|
||||||
course, facilitated by the Linux kernel namespace
|
|
||||||
functionality~\cite{linuxnamespaces}, which is in turn regarded to be
|
|
||||||
\emph{inspired} by Plan 9's namespaces~\cite{plan9namespaces}, Plan 9 being a
|
|
||||||
Bell Labs successor to Unix 8th Edition, discontinued in 2015.
|
|
||||||
While there without a doubt \emph{is} specificity bound to using each of the
|
|
||||||
tools that enable creating (Podman vs.\ Buildah vs.\ Docker BuildX) or running
|
|
||||||
(ContainerD vs.\ runC vs.\ crun) container images, when describing an action
|
|
||||||
that gets performed with or onto a container, the process should generally be
|
|
||||||
explained in such a way that it is repeatable using any spec-conforming tool
|
|
||||||
that is available and \emph{intended for the job}.
|
|
||||||
|
|
||||||
\vspace*{-\baselineskip}
|
|
||||||
\n{2}{The program}
|
|
||||||
|
|
||||||
By \emph{the program} or \emph{the application} without any additional context
|
|
||||||
the author usually means the Password Compromise Monitoring Tool program.
|
|
||||||
|
|
||||||
|
|
||||||
\n{1}{Cryptography primer}\label{sec:cryptographyprimer}
|
\n{1}{Cryptography primer}\label{sec:cryptographyprimer}
|
||||||
|
|
||||||
\n{2}{Encryption}
|
\n{2}{Encryption}
|
||||||
|
|
||||||
Encryption is the process of transforming certain data, called a
|
\textbf{TODO:} add \emph{why} we care and how it's going to be used.
|
||||||
\emph{message}, using, as Aumasson writes in Serious Cryptography, ``an
|
|
||||||
algorithm called a \emph{cipher} and a secret value called the
|
|
||||||
key''~\cite{seriouscryptography}. Its purpose is to protect the said message so
|
|
||||||
that only its intended recipients that know/hold the key are able to
|
|
||||||
\emph{decipher} and read it.
|
|
||||||
|
|
||||||
\n{3}{Symmetric encryption}
|
|
||||||
|
|
||||||
Symmetric encryption is simply when the same \emph{key} is used to facilitate both
|
|
||||||
encryption and decryption operations.
|
|
||||||
|
|
||||||
\n{3}{Asymmetric encryption}
|
|
||||||
|
|
||||||
Asymmetric encryption is different from symmetric encryption in that there are
|
|
||||||
now two keys in use - a key \emph{pair}. One part is used solely for
|
|
||||||
encryption, while the other part's only purpose is to decrypt. This notion of
|
|
||||||
two keys is generally transposed to a domain called \emph{public key
|
|
||||||
cryptography}, whereby the decryption component is declared private and the
|
|
||||||
encryption component is called \emph{public}, hence the name. The rationale is
|
|
||||||
that everybody can encrypt messages \emph{for} the recipient but only they are
|
|
||||||
able to \emph{decrypt} them, which is a feature allowed by the mathematical
|
|
||||||
complementarity of the two components, and also explains why the private key
|
|
||||||
should be kept \emph{private}. Compared to symmetric encryption, this variant
|
|
||||||
is generally slower.
|
|
||||||
|
|
||||||
\n{3}{The key exchange problem}
|
|
||||||
|
|
||||||
Suppose a communication scheme that is protected by a pre-shared secret.
|
|
||||||
In order to establish secure communications, this secret needs to be
|
|
||||||
distributed to the other party via untrusted channels. In 1976 Whitfield Diffie
|
|
||||||
and Martin Hellman published a paper in which they devised a \emph{public-key
|
|
||||||
distribution scheme}, which allows the two parties to arrive at a shared secret
|
|
||||||
by exchanging information via insecure channels with the presence of an
|
|
||||||
eavesdropper. This scheme (or its variations) is in use to this day.
|
|
||||||
|
|
||||||
\n{2}{Hash functions}
|
\n{2}{Hash functions}
|
||||||
|
|
||||||
Hash functions are cryptographic algorithms used to help with a number of
|
Hash functions are algorithms used to help with a number of things: integrity
|
||||||
things: integrity verification, password protection, digital signature,
|
verification, password protection, digital signatures, public-key encryption and
|
||||||
public-key encryption and others. Hashes are used in forensic analysis to prove
|
others. Hashes are used in forensic analysis to prove authenticity of digital
|
||||||
authenticity of digital artifacts, to uniquely identify a change-set within
|
artifacts, to uniquely identify a change-set within revision-based source code
|
||||||
revision-based source code management systems such as Git, Subversion or
|
management systems such as Git, Subversion or Mercurial, to detect
|
||||||
Mercurial, to detect known-malicious software by anti-virus programs or by
|
known-malicious software by anti-virus programs or by advanced filesystems in
|
||||||
advanced filesystems in order to verify block integrity and enable repairs, and
|
order to verify block integrity and enable repairs, and also in many other
|
||||||
also in many other applications that each person using a modern computing
|
applications that each person using a modern computing device has come across,
|
||||||
device has come across, such as when connecting to a website protected by the
|
such as when connecting to a website protected by the famed HTTPS.
|
||||||
famed HTTPS.
|
|
||||||
|
|
||||||
The popularity stems from a common use case: the need to identify a chunk of
|
The popularity of hash functions stems from a common use case: the need to
|
||||||
data. Of course, two chunks of data, two files, frames or packets could always
|
simplify reliably identifying a chunk of data. Of course, two chunks of data,
|
||||||
be compared bit by bit, but that can get prohibitive from both cost and energy
|
two files, frames or packets could always be compared bit by bit, but that can
|
||||||
point of view relatively quickly. That is when the hash functions come in,
|
get prohibitive from both cost and energy point of view relatively quickly.
|
||||||
since they are able to take a long input and produce a short output, named a
|
That is when the hash functions come in, since they are able to take a long
|
||||||
digest or a hash value. It also does not work the other way around, a file
|
input and produce a short output, named a digest or a hash value. The function
|
||||||
cannot be reconstructed from the hash digest, it is a one-way function.
|
also only works one way.
|
||||||
|
|
||||||
\n{3}{Rainbow tables}
|
A file, or any original input data for that matter, cannot be reconstructed
|
||||||
|
from the hash digest alone by somehow \emph{reversing} the hashing operation,
|
||||||
|
since at the heart of any hash function there is essentially a compression
|
||||||
|
function.
|
||||||
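The fixed digest length can be observed directly. The sketch below assumes SHA-256 from the Go standard library as the hash function (any other would serve equally well) and uses a hypothetical helper named \texttt{Digest}; a three-byte input and a three-megabyte input both yield 64 hexadecimal characters.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Digest returns the hex-encoded SHA-256 digest of data.
// (Hypothetical helper, used only in this sketch.)
func Digest(data string) string {
	sum := sha256.Sum256([]byte(data))
	return hex.EncodeToString(sum[:])
}

func main() {
	short := Digest("abc")
	long := Digest(strings.Repeat("abc", 1000000))

	// Both digests have the same length regardless of the input size.
	fmt.Println(len(short), short)
	fmt.Println(len(long), long)
}
```

Since inputs of any size map onto the same 256-bit space, information is necessarily discarded, which is why the input cannot be recomputed from the digest alone.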
|
|
||||||
As passwords are in more responsible scenarios stored not directly but as
|
Most alluringly, hashes are frequently used with the intent of
|
||||||
hashes, attackers that would be interested in recovering the passwords really
|
\emph{protecting} passwords by making those unreadable, while still being able
|
||||||
only have one option (except finding a critical vulnerability in the hash
|
to verify that the user knows the password and should therefore be authorised.
|
||||||
function): rainbow tables. Rainbow tables are lists of pre-computed hashes
|
|
||||||
paired with the passwords that were used to create them. When attackers gain
|
|
||||||
access to a password breach that contains hashes, all it takes is to find a
|
|
||||||
match within the rainbow table and reversely resolve that to the known
|
|
||||||
message: the password.
|
|
||||||
|
|
||||||
One of the popular counter-measures to pre-computed tables is adding a
|
As the hashing operation is irreversible, once the one-way function produces a
|
||||||
\emph{salt} to the user-provided password before passing it to the KDF (Key
|
short digest, there is no way to reconstruct the original message from it.
|
||||||
Derivation Function) or the hash function. Of course, the salt should be random
|
That is, unless the input of the hash function is also known, in which case all
|
||||||
\textbf{per-user} and not reused, as that would mean that two users with the
|
it takes is hashing the supposed input and comparing the digest with existing
|
||||||
same password would still end up with the same hash, and the salt should also
|
digests that are known to be digests of passwords.
|
||||||
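The guess-and-compare approach described above can be sketched in a few lines. This is an illustration under stated assumptions, not the program's implementation: the leaked digest is assumed to be an unsalted SHA-256 hash, and \texttt{hashHex}, \texttt{Lookup} and the candidate list are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashHex returns the hex-encoded SHA-256 digest of s.
// (Hypothetical helper; unsalted SHA-256 is an assumption of this sketch.)
func hashHex(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// Lookup hashes every candidate and returns the one whose digest
// equals the leaked digest, if there is one.
func Lookup(leaked string, candidates []string) (string, bool) {
	for _, c := range candidates {
		if hashHex(c) == leaked {
			return c, true
		}
	}
	return "", false
}

func main() {
	// Stands in for a digest found in a breach.
	leaked := hashHex("hunter2")
	candidates := []string{"123456", "password", "hunter2"}

	if pw, ok := Lookup(leaked, candidates); ok {
		fmt.Println("recovered:", pw)
	}
}
```

Nothing is reversed here: the hash function only runs forwards, over a list of likely inputs.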
be adequately long to be effective. As the salt is supposed to be
|
|
||||||
\emph{random}, it would be a good idea to use an actual CSPRNG, such as
|
\\ \textbf{TODO:} add more on \emph{why} we care and what types of hashes should be
|
||||||
\textbf{Fortuna}~\cite{fortuna} as a source of entropy (randomness). In
|
used (with refs) and why.
|
||||||
FreeBSD, Fortuna is in fact the one serving \texttt{/dev/random}.
|
|
||||||
|
|
||||||
|
|
||||||
\n{3}{TLS}\label{sec:tls}
|
\n{3}{Types and use cases}
|
||||||
|
|
||||||
|
Hash functions can be loosely categorised based on their intended use case into
|
||||||
|
\emph{password protection hashes}, \emph{integrity verification hashes},
|
||||||
|
\emph{message authentication codes} and \emph{cryptographic hashes}. Each of
|
||||||
|
these possesses unique characteristics and using the wrong type of hash function
|
||||||
|
for the wrong job can potentially result in a security breach.
|
||||||
|
|
||||||
|
As an example, consider \texttt{MD5}, a popular hash function internally using
|
||||||
|
the \emph{Merkle-Damgård} construction, and \texttt{BLAKE3}, which is built on
|
||||||
|
a Merkle tree instead. While the former produces 128-bit digests, the latter by
|
||||||
|
default outputs 256-bit digests with no practical upper limit (extendable output).
|
||||||
|
|
||||||
|
There is a list of differences that could further be mentioned, however, they
|
||||||
|
both have one thing in common: they are \emph{designed} to be \emph{fast}. The
|
||||||
|
latter, as a cryptographic hash function, is conjectured to be \emph{random
|
||||||
|
oracle indifferentiable}, secure against length extension, but it is also in
|
||||||
|
fact faster than all of \texttt{MD5}, \texttt{SHA3-256}, \texttt{SHA-1} and
|
||||||
|
even the \texttt{BLAKE2} family of functions.
|
||||||
|
|
||||||
|
The use case of both is to (quickly) verify integrity of a given chunk of data,
|
||||||
|
in case of \texttt{BLAKE3} with pre-image and collision resistance in mind, not
|
||||||
|
to secure a password by hashing it first, which poses a big issue when used
|
||||||
|
to...secure passwords by hashing them first.
|
||||||
|
|
||||||
|
Password hash functions, such as \texttt{argon2} or \texttt{bcrypt}, are good
|
||||||
|
choices for securely storing hashed passwords, namely because they place CPU
|
||||||
|
and memory burden on the host computing the digest, as well as limit potential
|
||||||
|
parallelism, thus limiting the scale at which an exhaustive search could be
|
||||||
|
launched. Additionally, both functions automatically \emph{salt} the passwords
|
||||||
|
before hashing them, which means that two exact same passwords of two different
|
||||||
|
users will not end up hashing to the same digest value, making it that much
|
||||||
|
harder to recover the original, supposedly weak password.
|
||||||
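The effect of salting alone can be shown with the standard library. In the sketch below, SHA-256 stands in for the hash purely to make the salt visible; it is not a password hash and does not replace \texttt{argon2} or \texttt{bcrypt}, which handle salting internally. The names \texttt{NewSalt} and \texttt{SaltedDigest} and the 16-byte salt length are assumptions of this sketch.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// NewSalt returns n random bytes read from the system CSPRNG.
// (Hypothetical helper, used only in this sketch.)
func NewSalt(n int) ([]byte, error) {
	salt := make([]byte, n)
	if _, err := rand.Read(salt); err != nil {
		return nil, err
	}
	return salt, nil
}

// SaltedDigest hashes salt||password. SHA-256 is used ONLY to show what a
// salt does; a real system should use a dedicated password hash instead.
func SaltedDigest(salt []byte, password string) string {
	h := sha256.New()
	h.Write(salt)
	h.Write([]byte(password))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	s1, err := NewSalt(16)
	if err != nil {
		panic(err)
	}
	s2, err := NewSalt(16)
	if err != nil {
		panic(err)
	}

	// Same password, two users, two salts: the stored digests differ.
	fmt.Println(SaltedDigest(s1, "correct horse"))
	fmt.Println(SaltedDigest(s2, "correct horse"))
}
```

Because the salt is stored next to the digest, verification simply recomputes the digest with the stored salt, while a list of pre-computed unsalted digests no longer matches anything.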
|
|
||||||
|
|
||||||
|
\n{3}{Why are hashes interesting}
|
||||||
|
|
||||||
|
As already mentioned, since hashes are often used to store the
|
||||||
|
representation of the password instead of the password itself, they become a
|
||||||
|
subject of interest when they get leaked. There have been enough instances of
|
||||||
|
leaked raw passwords that anyone with enough interest can put together a neat
|
||||||
|
list of hashes of the most popular passwords.
|
||||||
|
|
||||||
|
So while the service does not store plain text passwords, which is good, using
|
||||||
|
a hashing function not designed to protect passwords does not offer much
|
||||||
|
additional protection in case of weak passwords, which are the most commonly
|
||||||
|
used ones.
|
||||||
|
|
||||||
|
It seems logical that a service that is not using cryptographic primitives
|
||||||
|
correctly is more likely to get hacked and have its users' passwords/hashes
|
||||||
|
leaked. Then, the Internet ends up serving as a storage of every data dump,
|
||||||
|
often exposing these passwords/hashes for everyone to access.
|
||||||
|
|
||||||
|
|
||||||
|
\n{2}{TLS}\label{sec:tls}
|
||||||
|
|
||||||
The Transport Layer Security protocol (or TLS) serves as an encryption and
|
The Transport Layer Security protocol (or TLS) serves as an encryption and
|
||||||
\emph{authentication} protocol to secure internet communications. An important
|
\emph{authentication} protocol to secure internet communications. An important
|
||||||
part of the protocol is the \emph{handhake}, during which the two communicating
|
part of the protocol is the \emph{handshake}, during which the two communicating
|
||||||
parties exchange messages that acknowledge each other's presence, verify each
|
parties exchange messages that acknowledge each other's presence, verify each
|
||||||
other, choose what cryptographic algorithms will be used and decide session
|
other, choose what cryptographic algorithms will be used and decide session
|
||||||
keys. As there are multiple versions of the protocol in active duty even at the
|
keys. As there are multiple versions of the protocol in active duty even at the
|
||||||
moment, the server together with the client need to agree upon the version they
|
moment, the server together with the client need to agree upon the version they
|
||||||
are going to use (these days it should be 1.2 or 1.3), pick cipher suites
|
are going to use (these days it is recommended to use either 1.2 or 1.3),
|
||||||
(TLSv1.3 dramatically reduced the number of available suites), the client
|
pick cipher suites, the client verifies the server's public key (and the signature of the
|
||||||
verifies the server's public key (and the signature of the certificate
|
certificate authority that issued it) and they both generate session keys for
|
||||||
authority that issued it) and they both generate session keys for use after
|
use after handshake completion.
|
||||||
handshake completion.
|
|
||||||
|
|
||||||
The handshake consists of multiple stages (again, depending on the version), for
|
TLSv1.3 dramatically reduced the number of available suites to only include the
|
||||||
TLSv1.3 that would be:
|
ones deemed secure enough, which is why it is no longer needed to manually
|
||||||
|
specify what cipher suite should be used (or rely on the client/server to
|
||||||
\begin{itemize}
|
choose wisely). Despite possible compatibility issues with legacy devices,
|
||||||
\item \textbf{Client hello}: client sends a client hello message containing
|
the simplicity that enabling TLSv1.3 brings is a worthy trade-off.
|
||||||
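In Go, restricting a server to TLSv1.3 might look like the following sketch. It is not taken from the program's source, and \texttt{NewTLSConfig} is a hypothetical helper name. Go's \texttt{crypto/tls} package does not allow the TLSv1.3 cipher suites to be configured (the \texttt{CipherSuites} field only applies to TLS 1.0 through 1.2), so setting the minimum version is all that is needed.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// NewTLSConfig returns a configuration that refuses any protocol version
// older than TLS 1.3. (Hypothetical helper, used only in this sketch.)
func NewTLSConfig() *tls.Config {
	return &tls.Config{
		MinVersion: tls.VersionTLS13,
		// CipherSuites is left unset on purpose: it has no effect on TLS 1.3.
	}
}

func main() {
	cfg := NewTLSConfig()
	fmt.Println("TLS 1.3 only:", cfg.MinVersion == tls.VersionTLS13)
}
```

Such a configuration would typically be assigned to the \texttt{TLSConfig} field of an \texttt{http.Server} before it starts serving.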
the protocol version, a list of cipher suites and the client random value.
|
|
||||||
The client in this step also includes the ephemeral Diffie-Hellman (EDH)
|
|
||||||
parameters, which are later used for calculating the pre-master key.
|
|
||||||
\item \textbf{Server generating a master secret}: the server has got the
|
|
||||||
cipher suites, the client's parameters and client random and already has
|
|
||||||
the server random, which means it can create the master secret.
|
|
||||||
\item \textbf{Server hello and ``Finished''}: the server includes in the
|
|
||||||
hello its certificate, digital signature, server random, the chosen
|
|
||||||
cipher suite, and sends a ``Finished'' (meaning \emph{ready}) message.
|
|
||||||
\item \textbf{Signature and certificate verification}: the client at this
|
|
||||||
step verifies server's certificate and signature, generates the master
|
|
||||||
secret and is ready (sends the ``Finished'' message).
|
|
||||||
\end{itemize}
|
|
||||||
|
|
||||||
At the end of the process, the connection is protected by symmetric encryption
|
|
||||||
using the session key that both parties have arrived at.
|
|
||||||
|
|
||||||
|
|
||||||
\n{1}{Passwords}\label{sec:passwords}
|
\n{1}{Passwords}\label{sec:passwords}
|
||||||
@ -381,151 +334,7 @@ internet that is discussed in the next sections and covers what browsers are,
|
|||||||
what they do and how they relate to web security.
|
what they do and how they relate to web security.
|
||||||
|
|
||||||
|
|
||||||
\n{2}{Browsers}\label{sec:browsers}
|
\n{2}{Site Isolation}
|
||||||
|
|
||||||
Browsers, sometimes used together with the word that can serve as a real tell
|
|
||||||
for their specialisation - \emph{web} browsers - are programs intended for
|
|
||||||
\emph{browsing} of \emph{the web}. In more technical terms, browsers are
|
|
||||||
programs that facilitate (directly or via intermediary tools) domain name
|
|
||||||
lookups, connecting to web servers, optionally establishing a secure
|
|
||||||
connection, requesting the web page in question, determining its \emph{security
|
|
||||||
policy} and resolving what accompanying resources the web page specifies and
|
|
||||||
depending on the applicable security policy, requesting those from their
|
|
||||||
respective origins, applying stylesheets and running scripts. Constructing a
|
|
||||||
program that can speak many protocols and securely runs untrusted code from the
|
|
||||||
internet is no easy task.
|
|
||||||
|
|
||||||
\n{3}{Complexity}
|
|
||||||
|
|
||||||
Browsers these days are also quite ubiquitous programs running on
|
|
||||||
\emph{billions} of consumer grade mobile devices (which are also notorious for
|
|
||||||
bad update hygiene) or desktop devices all over the world. Regular users
|
|
||||||
usually expect them to work flawlessly with a multitude of network conditions,
|
|
||||||
network scenarios (the proverbial café WiFi, cellular data in a remote
|
|
||||||
location, home broadband that is DNS-poisoned by the ISP), differently tuned
|
|
||||||
(or commonly misconfigured) web servers, a combination of modern and
|
|
||||||
\emph{legacy} encryption schemes and different levels of conformance to web
|
|
||||||
standards from both web server and website developers. Of course, if a website
|
|
||||||
is broken, it is the browser's fault. Browsers are expected to detect if
|
|
||||||
\emph{captive portals} (a type of access control that usually tries to force
|
|
||||||
the user through a webpage with terms of use) are active and offer redirects.
|
|
||||||
All of this is immense complexity and the combination of ubiquity and great
|
|
||||||
exposure that this type of software gets is, in the author's opinion, the cause
|
|
||||||
behind a staggering amount of vulnerabilities found, reported and fixed in
|
|
||||||
browsers every year.
|
|
||||||
|
|
||||||
\n{3}{Standardisation}
|
|
||||||
|
|
||||||
Over the years, a consortium of parties interested in promoting and developing
|
|
||||||
the web (also due to its potential as a digital marketplace, i.e.\ financial
|
|
||||||
incentives) and browser vendors (of which the most neutral participant is
|
|
||||||
perhaps \emph{Mozilla}, with Chrome being run by Google, Edge by Microsoft and
|
|
||||||
Safari/Webkit by Apple) has evolved a great volume of web standards, which are
|
|
||||||
also relatively frequently getting updated or deprecated and replaced by
|
|
||||||
revised or new ones, rendering the browser maintenance task into essentially a
|
|
||||||
cat-and-mouse game.
|
|
||||||
|
|
||||||
It is the web's extensibility that enabled this build-up and ironically has
|
|
||||||
been proclaimed by some to be its greatest asset. It has also been ostensibly
|
|
||||||
criticised~\cite{ddvweb} in the past and the frustration with the status
|
|
||||||
quo of web standards has relatively recently prompted a group of people to even
|
|
||||||
create ``\textit{a new application-level internet protocol for the distribution
|
|
||||||
of arbitrary files, with some special consideration for serving a lightweight
|
|
||||||
hypertext format which facilitates linking between files}'':
|
|
||||||
Gemini~\cite{gemini}\cite{geminispec} that in the words of its authors can be
|
|
||||||
thought of as ``\textit{the web, stripped right back to its essence}'' or as
|
|
||||||
``\textit{Gopher, souped up and modernised just a little}'', depending upon the
|
|
||||||
reader's perspective, noting that the latter view is probably more accurate.
|
|
||||||
|
|
||||||
\n{3}{HTTP}
|
|
||||||
|
|
||||||
Originally, HTTP was also designed just for fetching hypertext
|
|
||||||
\emph{resources}, but it has evolved since then, particularly due to its
|
|
||||||
extensibility, to allow for fetching of all sorts of web resources a modern
|
|
||||||
website of today provides, such as scripts or images, or even to \emph{post}
|
|
||||||
content back to servers.
|
|
||||||
|
|
||||||
HTTP relies on TCP (Transmission Control Protocol), which is one of the
|
|
||||||
\emph{reliable} (mandated by HTTP) protocols used to send data across
|
|
||||||
contemporary IP (Internet Protocol) networks, to deliver the data it requests
|
|
||||||
or sends. When Tim Berners-Lee invented the World Wide Web (WWW) in 1989 while
|
|
||||||
working at CERN (The European Organization for Nuclear Research) with a rather
|
|
||||||
noble intent as a ``\emph{wide-area hypermedia information retrieval initiative
|
|
||||||
to give universal access to a large universe of documents}''~\cite{wwwf}, he
|
|
||||||
also invented the HyperText Markup Language (HTML) to serve as a formatting
|
|
||||||
method for these new hypermedia documents. The first website was written
|
|
||||||
roughly the same way as today's websites are, using HTML, although the markup
|
|
||||||
language has changed since, with the current version being HTML5.
|
|
||||||
|
|
||||||
It has been mentioned that the client \textbf{requests} a \textbf{resource} and
|
|
||||||
receives a \textbf{response}, so those terms should probably be defined.
|
|
||||||
|
|
||||||
A request is what the client sends to the server. A resource is what it
|
|
||||||
requests and a response is the answer provided by the server.
|
|
||||||
|
|
||||||
HTTP follows a classic client-server model whereby it is \textbf{always} the
|
|
||||||
client that initiates the request.
|
|
||||||
|
|
||||||
A web page is, to be blunt, a chunk of \emph{hypertext}. To display a web page,
|
|
||||||
a browser first needs to send a request to fetch the HTML representing the
|
|
||||||
page, which is then parsed and additional requests for sub-resources are made.
|
|
||||||
If a page defines a layout information in the form of CSS, that is parsed as
|
|
||||||
well.
|
|
||||||
|
|
||||||
A web page needs to be present on the local computer first \emph{before} it can
|
|
||||||
be parsed by the browser, and since websites are usually still served by
|
|
||||||
programs called \emph{web servers} as in the \emph{early days}, that presents a
|
|
||||||
problem of how to tell the browser where the resource should be fetched from. In
|
|
||||||
today's browsers, the issue is sorted (short of the CLI) by the \emph{address
|
|
||||||
bar}, a place into which user types what they wish the browser to fetch for
|
|
||||||
them.
|
|
||||||
|
|
||||||
The formal name of this segment is a \emph{Universal Resource Locator}, or URL,
|
|
||||||
and it contains the schema (or the protocol, such as \texttt{http://}), the
|
|
||||||
host address or a domain name and a (TCP) port number.
|
|
||||||
|
|
||||||
Since a TCP connection needs to be established first, to connect to a server
|
|
||||||
whose only URL contains a domain name, the browser needs to perform a domain
|
|
||||||
name \emph{lookup} using system facilities, or as was the case for a couple of
|
|
||||||
notorious Chromium versions, send some additional and unrelated queries which
|
|
||||||
(with Chromium-based derivatives' numbers) ended up placing unnecessary load
|
|
||||||
directly at the root DNS servers~\cite{chromiumrootdns}.
|
|
||||||
|
|
||||||
If a raw IP address+port combination is used, the browser attempts to connect
|
|
||||||
to it directly and requests the user-requested page by default using the
|
|
||||||
\texttt{GET} \emph{method}. A \emph{well-known} HTTP port 80 is assumed unless
|
|
||||||
other port is explicitly specified and it can be omitted both if host is a
|
|
||||||
domain name or an IP address.
|
|
||||||
|
|
||||||
The method is a way for the user-agent to define what operation it wants to
|
|
||||||
perform. \texttt{GET} is used for fetching resources while \texttt{POST} is
|
|
||||||
used to send data to the server, such as to post the values of an HTML form.
|
|
||||||
|
|
||||||
A server response is comprised of a \textbf{status code}, a status message,
|
|
||||||
HTTP \textbf{headers} and an optional \textbf{body} containing the content. The
|
|
||||||
status code indicates if the original request was successful or not and the
|
|
||||||
browser is generally there to interpret these status codes to the user. There
|
|
||||||
are enough status codes to be confused by the sheer numbers but luckily, there
|
|
||||||
is a method to the madness and they can be divided into groups/classes:
|
|
||||||
|
|
||||||
\begin{itemize}
|
|
||||||
\item 1xx: Informational responses
|
|
||||||
\item 2xx: Successful responses
|
|
||||||
\item 3xx: Redirection responses
|
|
||||||
\item 4xx: Client error responses
|
|
||||||
\item 5xx: Server error responses
|
|
||||||
\end{itemize}
|
|
||||||
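The grouping listed above maps directly onto numeric ranges, which a short Go sketch can make explicit. The function name \texttt{statusClass} and the returned labels are chosen here purely for illustration.

```go
package main

import "fmt"

// statusClass returns the name of the class an HTTP status code belongs to,
// following the five groups listed above.
func statusClass(code int) string {
	switch {
	case code >= 100 && code <= 199:
		return "informational"
	case code >= 200 && code <= 299:
		return "successful"
	case code >= 300 && code <= 399:
		return "redirection"
	case code >= 400 && code <= 499:
		return "client error"
	case code >= 500 && code <= 599:
		return "server error"
	default:
		return "unknown"
	}
}

func main() {
	for _, code := range []int{101, 200, 301, 404, 503} {
		fmt.Printf("%d: %s\n", code, statusClass(code))
	}
}
```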
|
|
||||||
In case the \emph{user agent} (a web \emph{client}) such as a browser receives
|
|
||||||
a response with content, it has to parse it.
|
|
||||||
|
|
||||||
A header is additional information sent by both the server and the client that
|
|
||||||
can guide or alter the behaviour of software reading it. For instance a
|
|
||||||
\texttt{Cache-Control} header with a duration value can be used by the server
|
|
||||||
to signify that the client can store certain resources for some time before
|
|
||||||
needing to re-fetch them, if they are not \emph{expired}.
|
|
||||||
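To illustrate how a client might read such a duration, the following Go sketch extracts the \texttt{max-age} directive from a \texttt{Cache-Control} header value. It is a simplified illustration and not a complete parser for the header; the function name \texttt{maxAge} is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// maxAge looks for a max-age directive in a Cache-Control header value and
// reports how long the response may be kept before it must be re-fetched.
func maxAge(headerValue string) (time.Duration, bool) {
	for _, directive := range strings.Split(headerValue, ",") {
		name, value, found := strings.Cut(strings.TrimSpace(directive), "=")
		if !found || !strings.EqualFold(name, "max-age") {
			continue
		}
		secs, err := strconv.Atoi(strings.Trim(value, `"`))
		if err != nil || secs < 0 {
			return 0, false
		}
		return time.Duration(secs) * time.Second, true
	}
	return 0, false
}

func main() {
	if d, ok := maxAge("public, max-age=3600"); ok {
		fmt.Println("resource may be cached for", d)
	}
}
```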
|
|
||||||
\n{3}{Site Isolation}
|
|
||||||
|
|
||||||
Modern browsers such as Firefox or Chromium come with a security focus in mind.
|
Modern browsers such as Firefox or Chromium come with a security focus in mind.
|
||||||
Their developers are acutely aware of the dangers that parsing untrusted code
|
Their developers are acutely aware of the dangers that parsing untrusted code
|
||||||
@ -558,6 +367,7 @@ access to session tokens and any cookies associated with the website's origin,
|
|||||||
apart from being able to rewrite the HTML content. The results of XSS can
|
apart from being able to rewrite the HTML content. The results of XSS can
|
||||||
range from account compromise to identity theft.
|
range from account compromise to identity theft.
|
||||||
|
|
||||||
|
|
||||||
\n{2}{Content Security Policy}\label{sec:csp}
|
\n{2}{Content Security Policy}\label{sec:csp}
|
||||||
|
|
||||||
Content Security Policy (CSP) has been an important addition to the arsenal of
|
Content Security Policy (CSP) has been an important addition to the arsenal of
|
||||||
@ -600,15 +410,380 @@ in production. There are many more directives and settings than mentioned in
|
|||||||
this section, the author encourages anybody interested to give it a read, e.g.\
|
this section, the author encourages anybody interested to give it a read, e.g.\
|
||||||
at \url{https://web.dev/csp/}.
|
at \url{https://web.dev/csp/}.
|
||||||
|
|
||||||
\n{2}{Summary}
|
\textbf{TODO}: add more concrete examples.
|
||||||
|
|
||||||
Passwords are in use everywhere and probably will be for the foreseeable
|
|
||||||
future. As long as passwords are going to be handled and stored by
|
\n{1}{Configuration}
|
||||||
service/application providers, they are going to get leaked, be it due to
|
|
||||||
provider carelessness or the attackers' resolve and wit. Of course, sifting
|
Every non-trivial program usually offers at least \emph{some} way to
|
||||||
through all the available password breach data by hand is not a reasonable
|
tweak/manage its behaviour, and these changes are usually persisted
|
||||||
option, and therefore tools should come in to provide assistance. The next part
|
\emph{somewhere} on the filesystem of the host: in a local SQLite3 database, a
|
||||||
of the thesis will explore that and offer a solution.
|
\emph{LocalStorage} key-value store in the browser, a binary or plain text
|
||||||
|
configuration file. These configuration files need to be read and checked at
|
||||||
|
least on program start-up and either stored into operating memory for the
|
||||||
|
duration of the runtime of the program, or loaded and parsed and the memory
|
||||||
|
subsequently \emph{freed} (initial configuration).
|
||||||
|
|
||||||
|
There is an abundance of configuration languages (or file formats used to craft
|
||||||
|
configuration files, whether they were intended for it or not) available, TOML,
|
||||||
|
INI, JSON, YAML, to name some of the popular ones (as of today).
|
||||||
|
|
||||||
|
Dhall stood out as a language that was designed with both security and the
|
||||||
|
needs of dynamic configuration scenarios in mind, borrowing a concept or two
|
||||||
|
from Nix~\cite{nixoslearn}~\cite{nixlang} (which in turn sources more than a
|
||||||
|
few of its concepts from Haskell), and in its apparent core being very similar
|
||||||
|
to JSON, which adds to the familiar feel. In fact, in Dhall's authors' own words it
|
||||||
|
is: ``a programmable configuration language that you can think of as: JSON +
|
||||||
|
functions + types + imports''~\cite{dhalllang}.
|
||||||
|
|
||||||
|
Among all of the listed features, the especially intriguing one to the author
|
||||||
|
was the promise of \emph{types}. There are multiple examples directly on the
|
||||||
|
project's documentation webpage demonstrating for instance the declaration and
|
||||||
|
usage of custom types (that are, of course, merely combinations of the primitive
|
||||||
|
types that the language provides, such as \emph{Bool}, \emph{Natural} or
|
||||||
|
\emph{List}, to name just a few), so it was not exceedingly hard to start
|
||||||
|
designing a custom configuration \emph{schema} for the program.
|
||||||
|
Dhall not being a Turing-complete language also guarantees that evaluation
|
||||||
|
\emph{always} terminates eventually, which is a good attribute to possess as a
|
||||||
|
configuration language.
|
||||||
|
|
||||||
|
\n{3}{Safety considerations}
|
||||||
|
|
||||||
|
Having a programmable configuration language that understands functions and
|
||||||
|
allows importing not only arbitrary text from random internet URLs, but also
|
||||||
|
importing and \emph{evaluating} (i.e.\ running) potentially untrusted code, it
|
||||||
|
is important that there are some safety mechanisms employed, which can be
|
||||||
|
relied on by the user. Dhall offers this in multiple features: enforcing a
|
||||||
|
same-origin policy and (optionally) pinning a cryptographic hash of the value
|
||||||
|
of the expression being imported.
|
||||||
|
|
||||||
|
\n{3}{Possible alternatives}
|
||||||
|
|
||||||
|
While developing the program, the author has also
|
||||||
|
come across certain shortcomings of Dhall, namely long start-up with \emph{cold
|
||||||
|
cache}, which can generally be observed in the scenario of running the program
|
||||||
|
in an environment that does not allow writing the cache files (a read-only
|
||||||
|
filesystem), or does not keep the written cache files, such as a container that
|
||||||
|
is not configured to mount a persistent volume at the pertinent location.
|
||||||
|
|
||||||
|
To describe the way Dhall works when performing an evaluation, it resolves
|
||||||
|
every expression down to a combination of its most basic types (eliminating all
|
||||||
|
abstraction and indirection) in the process called
|
||||||
|
\textbf{normalisation}~\cite{dhallnorm} and then saves this result in the
|
||||||
|
host's cache. The \texttt{dhall-haskell} binary attempts to resolve the
|
||||||
|
variable \texttt{\$\{XDG\_CACHE\_HOME\}} (have a look at \emph{XDG Base
|
||||||
|
Directory Spec}~\cite{xdgbasedirspec} for details) to decide \emph{where} the
|
||||||
|
results of the normalisation will be written for repeated use. Do note that
|
||||||
|
this behaviour has been observed on a GNU/Linux host and the author has not
|
||||||
|
verified this behaviour on a non-GNU/Linux host, such as FreeBSD.
|
||||||
|
|
||||||
|
If normalisation is performed inside an ephemeral container (as opposed to, for
|
||||||
|
instance, an interactive desktop session), the results effectively get lost on
|
||||||
|
each container restart. That is both wasteful and not great for user
|
||||||
|
experience, since the normalisation of just a handful of imports (which
|
||||||
|
internally branch widely) can take upwards of two minutes, during which
|
||||||
|
the user is left waiting for the hanging application with no reporting on the
|
||||||
|
progress or current status.
|
||||||
|
|
||||||
|
While workarounds for the above mentioned problem can be devised relatively
|
||||||
|
easily (such as bind mounting persistent volumes inside the container in place
|
||||||
|
of the \texttt{\$\{XDG\_CACHE\_HOME\}/dhall} and
|
||||||
|
\texttt{\$\{XDG\_CACHE\_HOME\}/dhall-haskell} to preserve the cache between
|
||||||
|
restarts, or let the cache be pre-computed during container build, since the
|
||||||
|
application is only really expected to run together with a compatible version
|
||||||
|
of the configuration schema and this version \emph{is} known at container build
|
||||||
|
time), it would certainly feel better if there was no need to work
|
||||||
|
\emph{around} the configuration system of choice.
|
||||||
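As a rough aid for the workarounds above, the cache locations can be resolved programmatically. The Go sketch below assumes the fallback laid out by the XDG Base Directory Specification (\texttt{\$HOME/.cache} when \texttt{XDG\_CACHE\_HOME} is unset or empty); whether every Dhall implementation follows exactly this fallback has not been verified here, and the function name \texttt{cacheDirs} is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cacheDirs returns the two cache directories mentioned in the text, resolved
// against XDG_CACHE_HOME, or against $HOME/.cache when the variable is unset
// or empty (assumption: the default from the XDG Base Directory Specification).
// The environment lookup is passed in so that the function is easy to test.
func cacheDirs(getenv func(string) string) []string {
	base := getenv("XDG_CACHE_HOME")
	if base == "" {
		base = filepath.Join(getenv("HOME"), ".cache")
	}

	return []string{
		filepath.Join(base, "dhall"),
		filepath.Join(base, "dhall-haskell"),
	}
}

func main() {
	// These are the paths a persistent volume would need to cover.
	for _, dir := range cacheDirs(os.Getenv) {
		fmt.Println(dir)
	}
}
```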
|
|
||||||
|
Alternatives such as CUE (\url{https://cuelang.org/}) offer themselves nicely
|
||||||
|
as a potentially almost drop-in replacement for Dhall feature-wise, while also
|
||||||
|
resolving costly \emph{cold cache} normalisation operations, which is in
|
||||||
|
the author's view Dhall's principal issue.
|
||||||
|
|
||||||
|
|
||||||
|
\n{1}{Compromise Monitoring}
|
||||||
|
|
||||||
|
There are, of course, several ways one could approach monitoring of compromised
|
||||||
|
credentials, some more \emph{manual} in nature than others. When using a
|
||||||
|
service that is suspected/expected to be breached in the future, one can always
|
||||||
|
create a unique username/password combination specifically for the subject
|
||||||
|
service and never use that combination anywhere else. That way, if the
|
||||||
|
credentials ever \emph{do} happen to appear in a data dump online in the
|
||||||
|
future, it is going to be a safe assumption as to where they came from.
|
||||||
|
|
||||||
|
Unfortunately, the task of actually \emph{monitoring} the credentials can prove
|
||||||
|
to be a little more arduous than one could expect at first. There are a couple
|
||||||
|
of points that can prove to pose a challenge in case the search is performed by
|
||||||
|
hand, namely:
|
||||||
|
|
||||||
|
\begin{itemize}
|
||||||
|
\item finding the breached data to look through
|
||||||
|
\item verifying the trustworthiness of the data
|
||||||
|
\item varying quality of the data
|
||||||
|
\item sifting through (possibly) unstructured data by hand
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
Of course, as this is a popular topic for a number of people, the above
|
||||||
|
mentioned work has already been packaged into neat and practical online
|
||||||
|
offerings. In case one decides in favour of using those, an additional range of
|
||||||
|
issues (the previous one still applicable) arises:
|
||||||
|
|
||||||
|
\begin{itemize}
|
||||||
|
\item the need to trust the provider with input credentials
|
||||||
|
\item relying on the goodwill of the provider to be able to access the data
|
||||||
|
\item hoping that the terms of service are kept
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
Besides that, there is a plethora of breaches floating around the Internet
|
||||||
|
available simply as zip files, which makes the job even harder.
|
||||||
|
|
||||||
|
The overarching goal of this thesis is devising and implementing a system in
|
||||||
|
which the user can \emph{monitor} whether their credentials have been
|
||||||
|
\emph{compromised} (at least as far as the data can tell), and allowing them to
|
||||||
|
do so without needing to entrust their sensitive data to a provider.
|
||||||
|
|
||||||
|
|
||||||
|
\n{2}{Data Sources}\label{sec:dataSources}
|
||||||
|
|
||||||
|
A data source in this place is considered anything that provides the
|
||||||
|
application with data that it understands.
|
||||||
|
|
||||||
|
Of course, the results of credential compromise verification/monitoring are only
|
||||||
|
going to be as good as the data underpinning it, which is why it is imperative
|
||||||
|
that high quality data sources be used, if at all possible. While great care
|
||||||
|
does have to be taken to only choose the highest quality data sources, the
|
||||||
|
application must offer a means to be able to utilise these.
|
||||||
|
|
||||||
|
The sources from which breached data can be loaded into an application can be
|
||||||
|
split into two basic categories: \textbf{online} or \textbf{local}, and it is
|
||||||
|
possible to further discern between \emph{structured} and \emph{unstructured}
|
||||||
|
data.
|
||||||
|
|
||||||
|
An online source is generally a service that ideally exposes a programmatic
|
||||||
|
API, which an application can query and from which it can request the necessary
|
||||||
|
subsets of data.
|
||||||
|
These types of services often additionally front the data by a user-friendly
|
||||||
|
web interface for one-off searches, which is, however, not of use here.
|
||||||
|
|
||||||
|
Examples of such online services include:
|
||||||
|
|
||||||
|
\begin{itemize}
|
||||||
|
\item {Have I Been Pwned?} - \url{https://haveibeenpwned.com}
|
||||||
|
\item {DeHashed} - \url{https://dehashed.com}
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
Large lumps of unstructured data available on forums or shady web servers would
|
||||||
|
technically also count here, given that they provide data and are available
|
||||||
|
online. However, even though data is frequently found online precisely in this
|
||||||
|
form, it is also not of direct use for the application without manual
|
||||||
|
\emph{preprocessing}, which is attended to in
|
||||||
|
Section~\ref{sec:localDatasetPlugin}.
|
||||||
|
|
||||||
|
Another source is then simply any locally supplied data, which, of course,
|
||||||
|
could have been obtained from a breach available online beforehand.
|
||||||
|
|
||||||
|
Locally supplied data is specific in that it needs to be formatted in such a
|
||||||
|
way that it can be understood by the application. That is, the data is not in
|
||||||
|
its raw form anymore but has been morphed into the precise shape the
|
||||||
|
application needs for further processing. Once imported, the application can
|
||||||
|
query the data at will, as it knows exactly the shape of it.
|
||||||
|
|
||||||
|
This supposes the existence of a \emph{format} for importing, the schema of which
|
||||||
|
is devised in Section~\ref{sec:localDatasetPlugin}.
|
||||||
|
|
||||||
|
|
||||||
|
\n{3}{Local Dataset Plugin}\label{sec:localDatasetPlugin}
|
||||||
|
|
||||||
|
Unstructured breach data from locally available datasets can be imported into
|
||||||
|
the application by first making sure it adheres to the specified schema (have a
|
||||||
|
look at the \emph{Breach Data Schema} in Listing~\ref{breachDataGoSchema}). If
|
||||||
|
it does not (which is very likely with random breach data, as already mentioned
|
||||||
|
in Section~\ref{sec:dataSources}), it needs to be converted to a form that
|
||||||
|
\emph{does} before importing it to the application, e.g.\ using a Python script
|
||||||
|
or a similar method.
|
||||||
|
|
||||||
|
Attempting to import data that does not follow the outlined schema should
|
||||||
|
result in an error. Equally so, importing a dataset which is over a reasonable
|
||||||
|
size limit should by default be rejected by the program as a precaution.
|
||||||
|
Unmarshaling, for instance, a 1 TiB document would most likely result in an
|
||||||
|
out-of-memory (OOM) situation on the host running the application, assuming
|
||||||
|
contemporary consumer hardware conditions (not HPC).
|
||||||
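One possible way of enforcing such a size limit before any unmarshaling takes place is sketched below in Go. The names \texttt{readLimited}, \texttt{readLimitedString} and \texttt{errTooLarge} are hypothetical and the limit used in \texttt{main} is arbitrary; the actual program may guard against oversized input differently.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// errTooLarge is returned when the input exceeds the configured limit.
var errTooLarge = errors.New("dataset exceeds the size limit")

// readLimited reads at most limit bytes from r. It asks for one extra byte so
// that an oversized input can be told apart from one that fits exactly, and
// never holds more than limit+1 bytes in memory.
func readLimited(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errTooLarge
	}
	return data, nil
}

// readLimitedString is a convenience wrapper for in-memory input.
func readLimitedString(s string, limit int64) ([]byte, error) {
	return readLimited(strings.NewReader(s), limit)
}

func main() {
	if _, err := readLimitedString("0123456789", 4); err != nil {
		fmt.Println("import rejected:", err)
	}
}
```

Only once the input has passed this check would it be handed over to the YAML decoder.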
|
|
||||||
|
\vspace{\parskip}
|
||||||
|
\begin{lstlisting}[language=Go, caption={Breach Data Schema represented as a Go
|
||||||
|
struct with imports from the standard library assumed},
|
||||||
|
label=breachDataGoSchema]
|
||||||
|
type breachDataSchema struct {
|
||||||
|
Name string
|
||||||
|
Time time.Time
|
||||||
|
IsVerified bool
|
||||||
|
ContainsPasswords bool
|
||||||
|
ContainsHashes bool
|
||||||
|
HashType string
|
||||||
|
HashSalted bool
|
||||||
|
HashPeppered bool
|
||||||
|
ContainsUsernames bool
|
||||||
|
ContainsEmails bool
|
||||||
|
Data any
|
||||||
|
}
|
||||||
|
\end{lstlisting}
|
||||||
|
\vspace*{-\baselineskip}
|
||||||
|
|
||||||
|
The Go representation shown in Listing~\ref{breachDataGoSchema} will in
|
||||||
|
actuality translate to a YAML document written and supplied by an
|
||||||
|
administrative user of the program. The YAML format was chosen for several
|
||||||
|
reasons:
|
||||||
|
|
||||||
|
\begin{itemize}
|
||||||
|
\item relative ease of use (plain text, readability)
|
||||||
|
\item capability to store multiple \emph{documents} inside of a single file
|
||||||
|
\item most of the inputs being implicitly typed as strings
|
||||||
|
\item support for inclusion of comments
|
||||||
|
\item machine readability thanks to being a superset of JSON
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
The last point specifically should allow for documents similar to what can be
|
||||||
|
seen in Listing~\ref{breachDataYAMLSchema} to be ingested by the program, read
|
||||||
|
and written by humans and programs alike.
|
||||||
|
|
||||||
|
\smallskip
|
||||||
|
\begin{lstlisting}[language=YAML, caption={Example Breach Data Schema supplied
|
||||||
|
to the program as a YAML file, optionally containing multiple documents},
|
||||||
|
label=breachDataYAMLSchema]
|
||||||
|
---
|
||||||
|
name: Horrible breach
|
||||||
|
time: 2022-04-23T00:00:00+02:00
|
||||||
|
isVerified: false
|
||||||
|
containsPasswords: false
|
||||||
|
containsHashes: true
|
||||||
|
containsEmails: true
|
||||||
|
hashType: md5
|
||||||
|
hashSalted: false
|
||||||
|
hashPeppered: false
|
||||||
|
data:
|
||||||
|
hashes:
|
||||||
|
- hash1
|
||||||
|
- hash2
|
||||||
|
- hash3
|
||||||
|
emails:
|
||||||
|
- email1
|
||||||
|
-
|
||||||
|
- email3
|
||||||
|
---
|
||||||
|
# document #2, describing another breach.
|
||||||
|
name: Horrible breach 2
|
||||||
|
...
|
||||||
|
\end{lstlisting}
|
||||||
|
\vspace*{-\baselineskip}
|
||||||
|
|
||||||
|
Notice how the emails list in Listing~\ref{breachDataYAMLSchema} misses one
|
||||||
|
record, perhaps because it was not supplied or mistakenly omitted. This is a
|
||||||
|
valid scenario (mistakes happen) and the application needs to be able to handle
|
||||||
|
it. The alternative would be to require the user to prepare the data in such a
|
||||||
|
way that the empty/partial records would be dropped entirely.
|
||||||
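Handling of empty records could, for instance, take the form of a clean-up pass run after the document has been decoded. The Go sketch below is one such approach under the assumption that the records arrive as a slice of strings; the function name \texttt{compact} is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// compact trims surrounding whitespace from every record and drops the ones
// that end up empty. It returns the kept records together with the number of
// dropped ones, so that the caller can report them to the user.
func compact(records []string) ([]string, int) {
	kept := make([]string, 0, len(records))
	dropped := 0

	for _, record := range records {
		record = strings.TrimSpace(record)
		if record == "" {
			dropped++
			continue
		}
		kept = append(kept, record)
	}

	return kept, dropped
}

func main() {
	// The second entry mirrors the missing email in the YAML listing.
	emails, dropped := compact([]string{"email1", "", "email3"})
	fmt.Printf("kept %d records, dropped %d\n", len(emails), dropped)
}
```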
|
|
||||||
|
\n{3}{Have I Been Pwned? Integration}
|
||||||
|
|
||||||
|
Troy Hunt's \textbf{Have I Been Pwned?} online service
|
||||||
|
(\url{https://haveibeenpwned.com/}) has been chosen as the online source of
|
||||||
|
compromised data. The service offers private APIs that are protected by API
|
||||||
|
keys. The application's \texttt{hibp} module and database representation model
|
||||||
|
the values returned by this API, which allows searching in large breaches using
|
||||||
|
email addresses.\\
|
||||||
|
The architecture there is relatively simple: the application administrator
|
||||||
|
configures an API key for HIBP, the user enters the query parameters, the
|
||||||
|
application constructs a query and calls the API and waits for a response. As
|
||||||
|
the API is rate-limited based on the key supplied, this can pose an issue and
|
||||||
|
it has not been fully resolved in the UI. The application then parses the
|
||||||
|
returned data and binds it to the local model for validation. If that goes
|
||||||
|
well, the data is saved into the database as a cache and the search query is
|
||||||
|
performed on the saved data. If it returns anything, it is displayed to the
|
||||||
|
user for browsing.
|
||||||
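The query construction step described above might look roughly like the following Go sketch. It reflects the author's understanding of the publicly documented HIBP API version 3 (the \texttt{breachedaccount} endpoint, the \texttt{hibp-api-key} request header and a \texttt{retry-after} response header sent with rate-limited responses) and should be checked against the current API documentation. The function names and the \texttt{User-Agent} value are hypothetical and do not describe the actual \texttt{hibp} module.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"time"
)

// hibpBase is the assumed base path of the breached account endpoint.
const hibpBase = "https://haveibeenpwned.com/api/v3/breachedaccount/"

// newBreachRequest builds (but does not send) a request asking for the
// breaches that contain the given account.
func newBreachRequest(apiKey, account string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, hibpBase+url.PathEscape(account), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("hibp-api-key", apiKey)
	req.Header.Set("User-Agent", "pcmt-example") // hypothetical client name
	return req, nil
}

// retryAfter interprets the value of a retry-after header given in seconds,
// which tells a rate-limited client how long to wait before trying again.
func retryAfter(value string) (time.Duration, bool) {
	secs, err := strconv.Atoi(value)
	if err != nil || secs < 0 {
		return 0, false
	}
	return time.Duration(secs) * time.Second, true
}

func main() {
	req, err := newBreachRequest("example-key", "user@example.com")
	if err != nil {
		fmt.Println("cannot build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path)

	if wait, ok := retryAfter("2"); ok {
		fmt.Println("rate limited, retrying in", wait)
	}
}
```

Honouring the server-provided delay is one way the rate-limiting issue mentioned above could be softened, at the cost of making the user wait.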
|
|
||||||
|
|
||||||
|
\n{1}{Deployment recommendations}\label{sec:deploymentRecommendations}
|
||||||
|
|
||||||
|
It is, of course, recommended that the application runs in a secure environment
|
||||||
|
\allowbreak although definitions of that almost certainly differ depending on
|
||||||
|
who you ask. General recommendations would be either to effectively reserve a
|
||||||
|
machine for a single use case - running this program - so as to dramatically
|
||||||
|
decrease the potential attack surface of the host, or run the program isolated
|
||||||
|
in a container or a virtual machine. Further, if the host does not need
|
||||||
|
management access (it is a deployed-to-only machine that is configured
|
||||||
|
out-of-band, such as with a \emph{golden} image/container or declaratively with
|
||||||
|
Nix), then an SSH \emph{daemon} should not be running in it, since it is not
|
||||||
|
needed. In an ideal scenario, the host machine would have as little software
|
||||||
|
installed as possible besides what the application absolutely requires.
|
||||||
|
|
||||||
|
System-wide cryptographic policies should target the highest feasible security
|
||||||
|
level, if at all available (such as by default on Fedora or RHEL), covering
|
||||||
|
SSH, DNSSEC, IPsec, Kerberos and TLS protocols. Firewalls should be configured
|
||||||
|
and SELinux (kernel-level mandatory access control and security policy
|
||||||
|
mechanism) running in \emph{enforcing} mode, if available.
|
||||||
|
|
||||||
|
\n{2}{Transport security}
|
||||||
|
|
||||||
|
Users connecting to the application should rightfully expect their data to
|
||||||
|
be protected \textit{in transit} (i.e.\ on the way between their browser and
|
||||||
|
the server), which is what the \emph{Transport Layer Security} family of
|
||||||
|
protocols~\cite{tls13rfc8446} was designed for, and which is the underpinning
|
||||||
|
of HTTPS. TLS utilises the primitives of asymmetric cryptography to let the
|
||||||
|
client authenticate the server (verify that it is who it claims it is) and
|
||||||
|
negotiate a symmetric key for encryption in the process named the \emph{TLS
|
||||||
|
handshake} (see Section~\ref{sec:tls} for more details), the final purpose of
|
||||||
|
which is establishing a secure communications connection. The operator should
|
||||||
|
configure the program to either directly utilise TLS using configuration or
|
||||||
|
have it listen behind a TLS-terminating \emph{reverse proxy}.
|
||||||
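For the first of the two options, a minimal Go sketch of a server that refuses anything older than TLS 1.3 is shown below. The function name \texttt{newServer}, the listen address and the timeout are illustrative assumptions, and the program's real TLS settings are driven by its configuration rather than hard-coded like this.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// newServer returns an HTTP server whose TLS configuration only accepts
// connections negotiated with TLS 1.3 or newer.
func newServer(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           handler,
		ReadHeaderTimeout: 5 * time.Second,
		TLSConfig: &tls.Config{
			MinVersion: tls.VersionTLS13,
		},
	}
}

func main() {
	srv := newServer(":8443", http.NotFoundHandler())
	fmt.Printf("listening on %s, minimum TLS version %#x\n",
		srv.Addr, srv.TLSConfig.MinVersion)

	// Serving would then be started with a certificate and a key, e.g.:
	//   srv.ListenAndServeTLS("cert.pem", "key.pem")
	// The file names are placeholders.
}
```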
|
|
||||||
|
|
||||||
|
\n{2}{Containerisation}
|
||||||
|
|
||||||
|
Whether the pre-built or a custom container image is used to deploy the
|
||||||
|
application, it still needs access to secrets, such as the database connection
|
||||||
|
string (containing database host, port, user, password/encrypted password,
|
||||||
|
authentication method and database name).
|
||||||
|
|
||||||
|
The application should be able to handle the most common Postgres
|
||||||
|
authentication methods~\cite{pgauthmethods}, namely \emph{peer},
|
||||||
|
\emph{scram-sha-256}, \emph{user name maps} and raw \emph{password}, although
|
||||||
|
the \emph{password} option should not be used in production, \emph{unless} the
|
||||||
|
connection to the database is protected by TLS.\ In any case, using the
|
||||||
|
\emph{scram-sha-256}~\cite{scramsha256rfc7677} method is preferable. One of the
|
||||||
|
ways to verify in a development environment that everything works as intended is
|
||||||
|
the \emph{Password generator for PostgreSQL} tool~\cite{goscramsha256}, which
|
||||||
|
allows retrieving the encrypted string from a raw user input.
|
||||||
|
|
||||||
|
If the application running in a container wants to use the \emph{peer}
|
||||||
|
authentication method, it is up to the operator to supply the Postgres socket
|
||||||
|
to the application (e.g.\ as a volume bind mount). This scenario was not
|
||||||
|
tested, however, and the author is also not entirely certain how \emph{user
|
||||||
|
namespaces} (on GNU/Linux) would influence the process (as in when the
|
||||||
|
\emph{UIDs} of a user \textbf{outside} the container are mapped to a range of
|
||||||
|
\emph{UIDs} \textbf{inside} the container), for which the setup would likely
|
||||||
|
need to account.
|
||||||
|
|
||||||
|
Equally, if the application is running inside the container, the operator needs
|
||||||
|
to make sure that the database is either running in a network that is also
|
||||||
|
directly attached to the container or that there is a mechanism in place that
|
||||||
|
routes the requests for the database hostname to the destination.
|
||||||
|
|
||||||
|
One such mechanism is container name based routing inside \emph{pods}
|
||||||
|
(Podman/Kubernetes), where the resolution of container names is the
|
||||||
|
responsibility of a specially configured (often auto-configured) piece of
|
||||||
|
software called Aardvark for the former and CoreDNS for the latter.
|
||||||
|
|
||||||
|
|
||||||
|
\n{1}{Summary}
|
||||||
|
|
||||||
|
Passwords (and/or passphrases) are in use everywhere and quite probably will be
|
||||||
|
for the foreseeable future. If not as \textit{the} principal way to
|
||||||
|
authenticate, then at least as \textit{a} way to authenticate. As long as
|
||||||
|
passwords are going to be handled and stored by service/application providers,
|
||||||
|
they are going to get leaked, be it due to provider carelessness or the
|
||||||
|
attackers' resolve and wit. Of course, sifting through all the available
|
||||||
|
password breach data by hand is not a reasonable option, and therefore tools
|
||||||
|
providing assistance come in handy. The next part of this diploma thesis will
|
||||||
|
explore that issue and introduce a solution.
|
||||||
|
|
||||||
|
|
||||||
% =========================================================================== %
|
% =========================================================================== %
|
||||||
@ -616,14 +791,10 @@ of the thesis will explore that and offer a solution.
|
|||||||
|
|
||||||
\n{1}{Kudos}
|
\n{1}{Kudos}
|
||||||
|
|
||||||
\textbf{Disclaimer:} the author is not affiliated in any way with any of the
|
The program that has been developed as part of this thesis used and utilised a
|
||||||
projects described on this page.
|
great deal of free (as in \textit{freedom}) and open-source software in the
|
||||||
|
process, either directly or as an outstanding work tool, and the author would
|
||||||
The \textit{Password Compromise Monitoring Tool} (\texttt{pcmt}) program has
|
like to take this opportunity to recognise that fact\footnotemark.
|
||||||
been developed using and utilising a great deal of free (as in Freedom) and
|
|
||||||
open-source software in the process, either directly or as an outstanding work
|
|
||||||
tool, and the author would like to take this opportunity to recognise that
|
|
||||||
fact.
|
|
||||||
|
|
||||||
In particular, the author acknowledges that this work would not be the same
|
In particular, the author acknowledges that this work would not be the same
|
||||||
without:
|
without:
|
||||||
@ -641,9 +812,12 @@ without:
|
|||||||
|
|
||||||
All of the code written has been typed into VIM (\texttt{9.0}), the shell used
|
All of the code written has been typed into VIM (\texttt{9.0}), the shell used
|
||||||
to run the commands was ZSH, both running in the author's terminal emulator of
|
to run the commands was ZSH, both running in the author's terminal emulator of
|
||||||
choice - \texttt{kitty} on a \raisebox{.8ex}{\texttildelow}8 month (at the time
|
choice, \texttt{kitty}. The development machines ran a recent installation of
|
||||||
of writing) installation of \textit{Arch Linux (by the way)} using a
|
\textit{Arch Linux (by the way)} and Fedora 38, both using a \texttt{6.3.x}
|
||||||
\texttt{6.3.x-wanderer-zfs-xanmod1} variant of the Linux kernel.
|
XanMod variant of the Linux kernel.
|
||||||
|
|
||||||
|
\footnotetext{\textbf{Disclaimer:} the author is not affiliated in any way with any
|
||||||
|
of the projects described on this page.}
|
||||||
|
|
||||||
|
|
||||||
\n{1}{Development}
|
\n{1}{Development}
|
||||||
@ -689,9 +863,9 @@ There is one caveat to this though, git first needs some additional
|
|||||||
configuration for the code in Listing~\ref{gitverif} to work as one would
|
configuration for the code in Listing~\ref{gitverif} to work as one would
|
||||||
expect. Namely that the public key used to verify the signature needs to be
|
expect. Namely that the public key used to verify the signature needs to be
|
||||||
stored in git's ``allowed signers file'', then git needs to be told where that
|
stored in git's ``allowed signers file'', then git needs to be told where that
|
||||||
file is using the configuration value \texttt{gpg.ssh.allowedsignersfile} and
|
file is located using the configuration value
|
||||||
finally the configuration value of the \texttt{gpg.format} field needs to be
|
\texttt{gpg.ssh.allowedsignersfile} and finally the configuration value of the
|
||||||
set to \texttt{ssh}.
|
\texttt{gpg.format} field needs to be set to \texttt{ssh}.
|
||||||
|
|
||||||
Because git allows the configuration values to be local to each repository,
|
Because git allows the configuration values to be local to each repository,
|
||||||
both of the mentioned issues can be solved by running the following commands
|
both of the mentioned issues can be solved by running the following commands
|
||||||
@ -703,10 +877,11 @@ label=gitsshprep, basicstyle=\linespread{0.9}\footnotesize\ttfamily]
|
|||||||
% # set the signature format for the local repository.
|
% # set the signature format for the local repository.
|
||||||
% git config --local gpg.format ssh
|
% git config --local gpg.format ssh
|
||||||
% # save the public key.
|
% # save the public key.
|
||||||
% cat >./tmp/.allowed_signers \
|
% cat > ./.tmp-allowed_signers \
|
||||||
<<<'leo ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIKwshTdBgLzwY4d8N7VainZCngH88OwvPGhZ6bm87rBO'
|
<<<'surtur <insert literal surtur pubkey>
|
||||||
|
leo <insert literal leo pubkey>'
|
||||||
% # set the allowed signers file path for the local repository.
|
% # set the allowed signers file path for the local repository.
|
||||||
% git config --local gpg.ssh.allowedsignersfile=./tmp/.allowed_signers
|
% git config --local gpg.ssh.allowedsignersfile ./.tmp-allowed_signers
|
||||||
\end{lstlisting}
|
\end{lstlisting}
|
||||||
\vspace*{-\baselineskip}
|
\vspace*{-\baselineskip}
|
||||||
|
|
||||||
@ -767,17 +942,17 @@ The fourth pipeline focuses on linting the Containerfile and building the
|
|||||||
container, although the latter action is only performed on feature branches,
|
container, although the latter action is only performed on feature branches,
|
||||||
\emph{pull requests} or \emph{tag} events.
|
\emph{pull requests} or \emph{tag} events.
|
||||||
|
|
||||||
The median build time as of writing was 1 minute, which includes running all
|
|
||||||
four pipelines, and that is acceptable. Build times might of course vary
|
|
||||||
depending on the hardware, for reference, these builds were being run on a
|
|
||||||
machine equipped with a Zen 3 Ryzen 5 5600 CPU with nominal clock times, DDR4
|
|
||||||
3200MHz RAM, a couple of PCIe Gen 4 NVMe drives in a mirrored setup (using ZFS)
|
|
||||||
and a 400Mbps downlink, software-wise running Arch with an author-flavoured
|
|
||||||
Xanmod kernel version 6.3.x.
|
|
||||||
|
|
||||||
\obr{Drone CI median build
|
\obr{Drone CI median build
|
||||||
time}{fig:drone-median-build}{.84}{graphics/drone-median-build}
|
time}{fig:drone-median-build}{.84}{graphics/drone-median-build}
|
||||||
|
|
||||||
|
The median build time as of writing was 1 minute, which includes running all
|
||||||
|
four pipelines, and that is acceptable. Build times might of course vary
|
||||||
|
depending on the hardware; for reference, these builds were run on a machine
|
||||||
|
equipped with a Zen 3 Ryzen 5 5600 CPU at nominal clock speeds, DDR4 3200MHz
|
||||||
|
RAM, a couple of PCIe Gen 4 NVMe drives in a mirrored setup (using ZFS) and a
|
||||||
|
400Mbps downlink, software-wise running Arch with an author-flavoured Xanmod
|
||||||
|
kernel version 6.3.x.
|
||||||
|
|
||||||
|
|
||||||
\n{2}{Source code repositories}\label{sec:repos}
|
\n{2}{Source code repositories}\label{sec:repos}
|
||||||
|
|
||||||
@ -805,20 +980,28 @@ The repository containing the \LaTeX{} source code of this thesis:\\
|
|||||||
|
|
||||||
\n{2}{Toolchain}
|
\n{2}{Toolchain}
|
||||||
|
|
||||||
Throughout the creation of this work, the \emph{current} version of the Go
|
Throughout the creation of this work, the \emph{then-current} version of the Go
|
||||||
programming language was used, i.e.\ \texttt{go1.20}.
|
programming language was used, i.e.\ \texttt{go1.20}.
|
||||||
|
|
||||||
|
To read more on why Go was chosen, see Appendix~\ref{appendix:whygo}.
|
||||||
|
Nix/\texttt{devenv} tools have also aided heavily during development, see
|
||||||
|
Appendix~\ref{appendix:whynix} to learn more.
|
||||||
|
|
||||||
\tab{Tool/Library-Usage Matrix}{tab:toolchain}{1.0}{ll}{
|
\tab{Tool/Library-Usage Matrix}{tab:toolchain}{1.0}{ll}{
|
||||||
\textbf{Name} & \textbf{Usage} \\
|
\textbf{Name} & \textbf{Usage} \\
|
||||||
Go programming language & program core \\
|
Go programming language & program core \\
|
||||||
Dhall configuration language & program configuration \\
|
Dhall configuration language & program configuration \\
|
||||||
Echo & HTTP handlers, controllers, web server \\
|
Echo & HTTP handlers, controllers, web server \\
|
||||||
ent & ORM using graph-based modelling \\
|
ent & ORM using graph-based modelling \\
|
||||||
bluemonday & HTML sanitising \\
|
bluemonday & sanitising HTML \\
|
||||||
TailwindCSS & stylesheets using a utility-first approach \\
|
TailwindCSS & stylesheets using a utility-first approach \\
|
||||||
PostgreSQL & storing data \\
|
PostgreSQL & persistently storing data \\
|
||||||
}
|
}
|
||||||
|
|
||||||
|
Table~\ref{tab:depsversionmx} contains the names and versions of the most
|
||||||
|
important libraries and supporting software that were used to build the
|
||||||
|
application.
|
||||||
|
|
||||||
\tab{Dependency-Version Matrix}{tab:depsversionmx}{1.0}{ll}{
|
\tab{Dependency-Version Matrix}{tab:depsversionmx}{1.0}{ll}{
|
||||||
\textbf{Name} & \textbf{version} \\
|
\textbf{Name} & \textbf{version} \\
|
||||||
\texttt{echo} (\url{https://echo.labstack.com/}) & 4.10.2 \\
|
\texttt{echo} (\url{https://echo.labstack.com/}) & 4.10.2 \\
|
||||||
@ -829,90 +1012,85 @@ programming language was used, i.e. \texttt{go1.20}.
|
|||||||
\texttt{PostgreSQL} (\url{https://www.postgresql.org/}) & 15.2 \\
|
\texttt{PostgreSQL} (\url{https://www.postgresql.org/}) & 15.2 \\
|
||||||
}
|
}
|
||||||
|
|
||||||
\n{2}{A word about Go}
|
|
||||||
First, a question of \textit{`Why pick Go for building a web
|
|
||||||
application?'} might arise, so the following few lines will try to address
|
|
||||||
that.
|
|
||||||
|
|
||||||
Go~\cite{golang}, or \emph{Golang} for SEO-friendliness and disambiguating Go
|
|
||||||
the ancient game, is a strongly typed, high-level \emph{garbage-collected}
|
|
||||||
language where functions are first-class citizens and errors are values.
|
|
||||||
|
|
||||||
The appeal for the author comes from a number of features of the language, such
|
|
||||||
as built-in support for concurrency and unit testing, sane \emph{zero} values,
|
|
||||||
lack of pointer arithmetic, inheritance and implicit type conversions,
|
|
||||||
easy-to-read syntax, producing a statically linked binary by default, etc., on
|
|
||||||
top of that, the language has got a cute mascot. Thanks to the foresight of the
|
|
||||||
Go Authors regarding \emph{the formatting question} (i.e.\ where to put the
|
|
||||||
braces, \textbf{tabs vs.\ spaces}, etc.), most of the discussions on this topic
|
|
||||||
have been foregone. Every \emph{gopher}~\footnote{euph.\ a person writing in
|
|
||||||
the Go programming language} is expected to format their source code with the
|
|
||||||
official formatter (\texttt{gofmt}), which automatically ensures that the code
|
|
||||||
adheres to the one formatting standard. Then, there is \emph{The Promise} of
|
|
||||||
backwards compatibility for Go 1.x, which makes it a good choice for long-term
|
|
||||||
without the fear of being rug-pulled.
|
|
||||||
|
|
||||||
|
|
||||||
\n{2}{A word about Nix/devenv}
|
|
||||||
|
|
||||||
Nix (\url{https://builtwithnix.org/}) is a declarative package manager and a
|
|
||||||
functional programming language resembling Haskell, which has been used in this
|
|
||||||
project in the form of \texttt{devenv} tool (\url{https://devenv.sh/}) to
|
|
||||||
create \textbf{declarable} and \textbf{reproducible} development environment.
|
|
||||||
The author has previously used Nix directly with \emph{flakes} and liked
|
|
||||||
\texttt{devenv}, as it effectively exposed only a handful of parameters for
|
|
||||||
configuration, and rid of the need to manage the full flake, which is of course
|
|
||||||
still an option for people who choose so. See \texttt{devenv.nix} in the
|
|
||||||
repository root.
|
|
||||||
|
|
||||||
\n{1}{Application architecture}
|
\n{1}{Application architecture}
|
||||||
|
|
||||||
The source code of the main module further is split into Go \emph{packages}
|
\n{2}{Package structure}
|
||||||
appropriately along a couple of domains: logging, core application, web
|
|
||||||
routers, configuration and settings, etc. In Go, packages are delimited by
|
The source code of the main module is organised into smaller, self-contained Go
|
||||||
folder structure -- each folder can be package.
|
\emph{packages} appropriately along a couple of domains: logging, core
|
||||||
|
application, web routers, configuration and settings, etc. In Go, packages are
|
||||||
|
delimited by folder structure -- each folder can be a package.
|
||||||
|
|
||||||
Generally speaking, the program aggregates decision points into central places,
|
Generally speaking, the program aggregates decision points into central places,
|
||||||
such as \texttt{run.go}, which imports child packages that facilitate each of
|
such as \texttt{run.go}, which then imports child packages that facilitate each
|
||||||
loading the configuration, connecting to the database and running migrations,
|
of the tasks of loading the configuration, connecting to the database and
|
||||||
consolidating flag, environment variable and configuration-based values into
|
running migrations, consolidating flag, environment variable and
|
||||||
canonical \emph{settings}, setting up routes and handling graceful shutdown.
|
configuration-based values into canonical \emph{settings}, setting up routes
|
||||||
|
and handling graceful shutdown.
|
||||||
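The consolidation of flag, environment variable and configuration-based values
can be illustrated with a minimal sketch. It is not taken from the
application's source code: the function name \texttt{resolve} is hypothetical
and the precedence shown (flag first, then environment variable, then
configuration file) is an assumption made purely for illustration.

```go
package main

import "fmt"

// resolve returns the first non-empty candidate. The order of the arguments
// encodes the precedence, which is an assumption of this sketch:
// flag, then environment variable, then configuration file.
func resolve(flagVal, envVal, cfgVal string) string {
	for _, v := range []string{flagVal, envVal, cfgVal} {
		if v != "" {
			return v
		}
	}
	// Nothing was set anywhere; the caller decides how to handle that.
	return ""
}

func main() {
	// No flag was given, so the environment value takes precedence over the
	// value from the configuration file.
	fmt.Println(resolve("", "env.example", "cfg.example"))
}
```

Resolving every value in one such place would mean that the rest of the
program only ever reads the canonical \emph{settings} and does not need to know
where a particular value originated.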
|
|
||||||
|
\n{3}{Internal package}
|
||||||
|
|
||||||
|
The \texttt{internal} package was not used as of writing, but the author plans
|
||||||
|
to eventually migrate \emph{internal} logic of the program into the internal
|
||||||
|
package to prevent accidental imports.
|
||||||
|
|
||||||
|
|
||||||
|
\n{2}{Logging}
|
||||||
|
|
||||||
The program uses dependency injection to share a single logger instance,
|
The program uses dependency injection to share a single logger instance,
|
||||||
the same applies to the database client. These are passed around as a pointer,
|
the same applies to the database client. These are passed around as a pointer,
|
||||||
so the underlying data stays the same. As a rule of thumb, every larger
|
so the underlying data stays the same. As a rule of thumb, every larger
|
||||||
\texttt{struct} that needs to be passed around is passed around as a pointer.
|
\texttt{struct} that needs to be passed around is passed around as a pointer.
|
||||||
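A minimal sketch of sharing a single logger instance by pointer follows. The
types \texttt{Store} and \texttt{Server}, their constructors and the
\texttt{demo} helper are hypothetical and only serve to illustrate constructor
injection with the standard library \texttt{log} package; the application's
actual types may well differ.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// Store and Server are hypothetical components. Both receive the very same
// *log.Logger through their constructors instead of creating their own.
type Store struct{ log *log.Logger }

type Server struct {
	log   *log.Logger
	store *Store
}

func NewStore(l *log.Logger) *Store { return &Store{log: l} }

func NewServer(l *log.Logger, s *Store) *Server {
	return &Server{log: l, store: s}
}

func (s *Store) Ping() { s.log.Println("store: ping") }

func (s *Server) Start() {
	s.log.Println("server: starting")
	s.store.Ping()
}

// demo wires the components together and returns everything that was written
// through the shared logger.
func demo() string {
	var buf bytes.Buffer
	logger := log.New(&buf, "app ", 0) // one logger instance for the program

	store := NewStore(logger)
	srv := NewServer(logger, store)
	srv.Start()

	return buf.String()
}

func main() {
	fmt.Print(demo())
}
```

Since both components hold a pointer to the same logger, their messages end up
in a single output stream and in call order.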
|
|
||||||
The \texttt{internal} package was not used as of writing, but the author plans
|
|
||||||
to eventually migrate \emph{internal} logic of the program into the internal
|
|
||||||
package to prevent accidental imports.
|
|
||||||
|
|
||||||
The authentication logic is relatively simple and the author would like to
|
\n{2}{Authentication}
|
||||||
|
|
||||||
|
The authentication logic is relatively simple and the author attempted to
|
||||||
isolate it into a custom \emph{middleware}. User passwords are hashed using a
|
isolate it into a custom \emph{middleware}. User passwords are hashed using a
|
||||||
secure KDF before being sent to the database. The KDF used is \texttt{bcrypt}
|
secure KDF before being sent to the database. The KDF of choice is
|
||||||
(with a sane \emph{Cost} of 10), which automatically includes \emph{salt} for
|
\texttt{bcrypt} (with a sane \emph{Cost} of 10), which automatically includes
|
||||||
the password and provides ``length-constant'' time hash comparisons. The author
|
\emph{salt} for the password and provides ``length-constant'' time hash
|
||||||
plans to add support for the more modern \texttt{scrypt} and the
|
comparisons. The author plans to add support for the more modern
|
||||||
state-of-the-art, P-H-C (Password Hashing Competition) winner algorithm
|
\texttt{scrypt} and the state-of-the-art, P-H-C (Password Hashing Competition)
|
||||||
\texttt{Argon2} (\url{https://github.com/P-H-C/phc-winner-argon2}). Besides, no
|
winner algorithm \texttt{Argon2}
|
||||||
raw queries are used to access the database, helping decrease the likelihood of
|
(\url{https://github.com/P-H-C/phc-winner-argon2}) for flexibility.
|
||||||
SQL injection attacks.
|
|
||||||
|
\n{2}{SQLi prevention}
|
||||||
|
|
||||||
|
No raw SQL queries are directly used to access the database, thus decreasing
|
||||||
|
the likelihood of SQL injection attacks. Instead, parametric queries are
|
||||||
|
constructed in code using a graph-like API of the \texttt{ent} library, which
|
||||||
|
is covered in depth in Section~\ref{sec:dbschema}.
|
||||||
|
|
||||||
|
|
||||||
|
\n{2}{Configurability}
|
||||||
|
|
||||||
|
Virtually any important value in the program has been made into a configuration
|
||||||
|
value, so that the operator can customise the experience as needed. A choice of
|
||||||
|
sane configuration defaults was attempted, which resulted in the configuration
|
||||||
|
file essentially only needing to contain secrets, unless there is a need to
|
||||||
|
override the defaults. It is not an entirely \emph{zero-config} situation, rather
|
||||||
|
a \emph{minimal-config} one. An example can be seen in
|
||||||
|
Section~\ref{sec:configuration}.
|
||||||
|
|
||||||
|
|
||||||
|
\n{2}{Embedded assets}
|
||||||
|
|
||||||
An important thing to mention is embedded assets and templates. Go has multiple
|
An important thing to mention is embedded assets and templates. Go has multiple
|
||||||
mechanisms to natively embed arbitrary files directly into the binary during
|
mechanisms to natively embed arbitrary files directly into the binary during
|
||||||
the regular build process. The built-in \texttt{embed} package was used to
|
the regular build process. The built-in \texttt{embed} package was used to
|
||||||
bundle all template files and web assets, such as images, logos and stylesheets
|
bundle all template files and web assets, such as images, logos and stylesheets
|
||||||
at the package level, and these are also passed around the application as
|
at the package level, and these are also passed around the application as
|
||||||
needed. There is also a toggle in the application configuration, which can
|
needed.
|
||||||
instruct the program at start to either rely entirely on embedded assets or
|
|
||||||
pull live files from the filesystem. The former option makes the application
|
There is also a toggle in the application configuration, which can instruct the
|
||||||
more portable, while the latter allows for flexibility not only during
|
program at start to either rely entirely on embedded assets or pull live files
|
||||||
development. Basically, any important value in the program has been made into a
|
from the filesystem. The former option makes the application more portable,
|
||||||
configuration value, so that the operator can customise the experience as
|
while the latter allows for flexibility not only during development.
|
||||||
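The toggle between embedded and live assets can be sketched as a choice between
two \texttt{fs.FS} implementations. The sketch below rests on assumptions: the
function name \texttt{pickAssets}, the \texttt{liveMode} parameter and the
directory \texttt{./assets} are hypothetical, and an in-memory
\texttt{fstest.MapFS} stands in for the \texttt{embed.FS} that a real build
would populate using a \texttt{//go:embed} directive.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"testing/fstest"
)

// pickAssets returns the file system that templates and web assets are read
// from: the live directory on disk when liveMode is set, otherwise the
// embedded one.
func pickAssets(liveMode bool, embedded fs.FS, liveDir string) fs.FS {
	if liveMode {
		return os.DirFS(liveDir)
	}
	return embedded
}

// readEmbeddedDemo reads a template in embedded mode. The MapFS below is only
// a stand-in for an embed.FS filled in at build time.
func readEmbeddedDemo() string {
	embedded := fstest.MapFS{
		"templates/footer.tmpl": {Data: []byte("<footer>demo</footer>")},
	}

	assets := pickAssets(false, embedded, "./assets")

	b, err := fs.ReadFile(assets, "templates/footer.tmpl")
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(readEmbeddedDemo())
}
```

Because both variants satisfy the same \texttt{fs.FS} interface, the code that
consumes the assets does not need to know which mode is active.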
needed. A choice of sane configuration defaults was attempted, which resulted
|
|
||||||
in the configuration file essentially only needing to contain secrets, unless
|
|
||||||
there is a need to override the defaults. It is not entirely \emph{zero-config}
|
\n{2}{Composability}
|
||||||
situation, rather a \emph{minimal-config} one.
|
|
||||||
|
|
||||||
Templates used for rendering of the web pages were created in a composable
|
Templates used for rendering of the web pages were created in a composable
|
||||||
manner, split into smaller, reusable parts, such as \texttt{footer.tmpl} and
|
manner, split into smaller, reusable parts, such as \texttt{footer.tmpl} and
|
||||||
@ -924,6 +1102,9 @@ performed ergonomically and directly using Echo's built-in facilities. A
|
|||||||
popular HTML sanitiser \emph{bluemonday} has been employed to aid with battling
|
popular HTML sanitiser \emph{bluemonday} has been employed to aid with battling
|
||||||
XSS.
|
XSS.
|
||||||
|
|
||||||
|
|
||||||
|
\n{2}{Server-side rendering}
|
||||||
|
|
||||||
The application constructs the web pages entirely server-side and it runs
|
The application constructs the web pages entirely server-side and it runs
|
||||||
without a single line of JavaScript, of which the author is especially proud.
|
without a single line of JavaScript, of which the author is especially proud.
|
||||||
It improves load times, decreases attack surface, increases maintainability and
|
It improves load times, decreases attack surface, increases maintainability and
|
||||||
@ -933,12 +1114,8 @@ updates (where \texttt{PUT}s should be used) and the accompanying frequent
|
|||||||
full-page refreshes, but that still is not enough to warrant the use of
|
full-page refreshes, but that still is not enough to warrant the use of
|
||||||
JavaScript.
|
JavaScript.
|
||||||
|
|
||||||
As an aside, the author has briefly experimented with WebAssembly for this
|
|
||||||
project, but has ultimately scrapped the functionality in favour of the
|
\n{2}{Frontend}
|
||||||
entirely server-side rendered one. It is possible that it would get revisited
|
|
||||||
if the client-side dynamic functionality was necessary and performance
|
|
||||||
mattered. Even from the short experiments it was obvious how much faster
|
|
||||||
WebAssembly was compared to JavaScript.
|
|
||||||
|
|
||||||
Frontend-side, the application was styled using TailwindCSS, which promotes
|
Frontend-side, the application was styled using TailwindCSS, which promotes
|
||||||
using of flexible \emph{utility-first} classes in the markup (HTML) instead of
|
using of flexible \emph{utility-first} classes in the markup (HTML) instead of
|
||||||
@ -950,61 +1127,112 @@ need to be parsed by Tailwind in order to construct its final stylesheet and
|
|||||||
there is also an original CLI tool for that called \texttt{tailwindcss}.
|
there is also an original CLI tool for that called \texttt{tailwindcss}.
|
||||||
Overall, simple and accessible layouts had preference over convoluted ones.
|
Overall, simple and accessible layouts had preference over convoluted ones.
|
||||||
|
|
||||||
|
\n{3}{Frontend experiments}
|
||||||
|
|
||||||
|
As an aside, the author has briefly experimented with WebAssembly for this
|
||||||
|
project, but has ultimately scrapped the functionality in favour of the
|
||||||
|
entirely server-side rendered one. It is possible that it would get revisited
|
||||||
|
if the client-side dynamic functionality was necessary and performance
|
||||||
|
mattered. Even from the short experiments it was obvious how much faster
|
||||||
|
WebAssembly was compared to JavaScript.
|
||||||
|
|
||||||
|
|
||||||
|
\newpage
|
||||||
|
\n{2}{User isolation}
|
||||||
|
|
||||||
|
Users are allowed into certain parts of the application based on the role they
|
||||||
|
currently possess. For the moment, two basic roles were envisioned, while this
|
||||||
|
list might get amended in the future, should the need arise:
|
||||||
|
|
||||||
|
\begin{itemize}
|
||||||
|
\item Administrator
|
||||||
|
\item User
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
\obr{Application use case diagram}{fig:usecasediagram}{.9}{graphics/pcmt-use-case.pdf}
|
||||||
|
|
||||||
|
It is paramount that the program protects itself from insider threats as
|
||||||
|
well and therefore each role is only able to perform actions that it is
|
||||||
|
explicitly assigned. While there definitely is certain overlap between the
|
||||||
|
capabilities of the two outlined roles, each also possesses unique features
|
||||||
|
that the other one does not.
|
||||||
|
|
||||||
|
For example, the administrator role is not able to perform searches on the
|
||||||
|
breach data directly using their administrator account, for that a separate
|
||||||
|
user account has to be devised. Similarly, the regular user is not able to
|
||||||
|
manage breach lists and other users, because that is a privileged operation.
|
||||||
|
|
||||||
|
In-application administrators are not able to view sensitive (any) user data
|
||||||
|
and should therefore only be able to perform the following actions:
|
||||||
|
|
||||||
|
\begin{itemize}
|
||||||
|
\item Create user accounts
|
||||||
|
\item View list of users
|
||||||
|
\item View user email
|
||||||
|
\item Change user email
|
||||||
|
\item Toggle whether user is an administrator
|
||||||
|
\item Delete user accounts
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
|
Let us consider the case of a user managing their own account: while demoting from
|
||||||
|
administrator to a regular user is permitted, promoting self to be an
|
||||||
|
administrator would constitute a \emph{privilege escalation} and likely be a
|
||||||
|
precursor to at least a \emph{denial of service} of sorts.
|
||||||
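The rule for toggling the administrator role can be expressed as a small
predicate. This is a sketch under assumptions, not the application's actual
authorisation code: the \texttt{User} type and the \texttt{canSetAdmin}
function are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// User is a hypothetical, minimal view of an account.
type User struct {
	ID      int
	IsAdmin bool
}

// canSetAdmin reports whether actor may set the administrator flag of target
// to wantAdmin.
func canSetAdmin(actor, target User, wantAdmin bool) bool {
	if actor.ID == target.ID {
		// Managing self: an administrator may demote themselves, whereas
		// self-promotion would be a privilege escalation.
		return actor.IsAdmin && !wantAdmin
	}
	// Managing other users is a privileged operation.
	return actor.IsAdmin
}

func main() {
	admin := User{ID: 1, IsAdmin: true}
	user := User{ID: 2, IsAdmin: false}

	fmt.Println(canSetAdmin(user, user, true))   // self-promotion
	fmt.Println(canSetAdmin(admin, admin, false)) // self-demotion
	fmt.Println(canSetAdmin(admin, user, true))  // administrator promotes a user
	fmt.Println(canSetAdmin(user, admin, false)) // user tries to demote an administrator
}
```

Only the self-demotion and the change performed by an administrator on another
account are allowed in this sketch; every request coming from a regular user is
denied.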
|
|
||||||
|
|
||||||
|
\n{2}{Zero trust principle}
|
||||||
|
|
||||||
|
\textit{Confidentiality, i.e.\ not trusting the provider}
|
||||||
|
|
||||||
|
There is no way for the application (and consequently, the in-application
|
||||||
|
administrator) to read the user's data. This is possible by virtue of encrypting
|
||||||
|
the pertinent data before saving them in the database by a state-of-the-art
|
||||||
|
\emph{age} key~\cite{age} (backed by X25519~\cite{x25519rfc7748}), which is in
|
||||||
|
turn safely stored encrypted by a passphrase that only the user controls. Of
|
||||||
|
course, the user-supplied password is run through a password-based key derivation
|
||||||
|
function (PBKDF: a key derivation function with a sliding computational cost)
|
||||||
|
before letting it encrypt the \emph{age} key.
|
||||||
|
|
||||||
|
The \emph{age} key is only generated when the user changes their password for
|
||||||
|
the first time to prevent scenarios such as in-application administrator with
|
||||||
|
access to physical database being able to both \textbf{recover} the key from
|
||||||
|
the database and \textbf{decrypt} it given that they already know the user
|
||||||
|
password (because they set it), which would subsequently give them unbounded
|
||||||
|
access to any future encrypted data, as long as they would be able to maintain
|
||||||
|
their database access. This is why the \emph{age} key generation and protection
|
||||||
|
are bound to the first password change. Of course, the evil administrator could
|
||||||
|
just perform the change themselves; however, the user would at least be able to
|
||||||
|
find those changes in the activity logs and know not to use the application.
|
||||||
|
But given the scenario of a total database compromise, the author finds all
|
||||||
|
hope is already lost at that point. At least when the database is dumped, it
|
||||||
|
only contains non-sensitive, functional information in plain text; everything
|
||||||
|
else should be encrypted.
|
||||||
|
|
||||||
|
Consequently, both the application operators and the in-application
|
||||||
|
administrators should never be able to learn the details of what the user is
|
||||||
|
tracking, the same being applicable even to potential attackers with direct
|
||||||
|
access to the database. Thus the author maintains that every scenario that
|
||||||
|
could potentially lead to a data breach (apart from a compromised user machine
|
||||||
|
and the like) would have to entail some form of operating memory acquisition,
|
||||||
|
for instance using \texttt{LiME}~\cite{lime}, or perhaps directly the
|
||||||
|
\emph{hypervisor}, if considering a virtualised (``cloud'') environment.
|
||||||
|
|
||||||
|
|
||||||
\n{1}{Implementation}
|
\n{1}{Implementation}
|
||||||
|
|
||||||
\n{2}{Configuration}
|
\n{2}{Dhall Configuration Schema}\label{sec:configuration}
|
||||||
|
|
||||||
Every non-trivial program usually offers at least \emph{some} way to
|
|
||||||
tweak/manage its behaviour, and these changes are usually persisted
|
|
||||||
\emph{somewhere} on the filesystem of the host: in a local SQLite3 database, a
|
|
||||||
\emph{LocalStorage} key-value store in the browser, a binary or plain text
|
|
||||||
configuration file. These configuration files need to be read and checked at
|
|
||||||
least on program start-up and either stored into operating memory for the
|
|
||||||
duration of the runtime of the program, or loaded and parsed and the memory
|
|
||||||
subsequently \emph{freed} (initial configuration).
|
|
||||||
|
|
||||||
There is an abundance of configuration languages (or file formats used to craft
|
|
||||||
configuration files, whether they were intended for it or not) available, TOML,
|
|
||||||
INI, JSON, YAML, to name some of the popular ones (as of today).
|
|
||||||
|
|
||||||
Dhall stood out as a language that was designed with both security and the
|
|
||||||
needs of dynamic configuration scenarios in mind, borrowing a concept or two
|
|
||||||
from Nix~\cite{nixoslearn}~\cite{nixlang} (which in turn sources more than a
|
|
||||||
few of its concepts from Haskell), and in its apparent core being very similar
|
|
||||||
to JSON, which adds to familiar feel. In fact, in Dhall's authors' own words it
|
|
||||||
is: ``a programmable configuration language that you can think of as: JSON +
|
|
||||||
functions + types + imports''~\cite{dhalllang}.
|
|
||||||
|
|
||||||
Among all of the listed features, the especially intriguing one to the author
|
|
||||||
was the promise of \emph{types}. There are multiple examples directly on the
|
|
||||||
project's documentation webpage demonstrating for instance the declaration and
|
|
||||||
usage of custom types (that are, of course merely combinations of the primitive
|
|
||||||
types that the language provides, such as \emph{Bool}, \emph{Natural} or
|
|
||||||
\emph{List}, to name just a few), so it was not exceedingly hard to start
|
|
||||||
designing a custom configuration \emph{schema} for the program.
|
|
||||||
Dhall not being a Turing-complete language also guarantees that evaluation
|
|
||||||
\emph{always} terminates eventually, which is a good attribute to possess as a
|
|
||||||
configuration language.
|
|
||||||
|
|
||||||
|
|
||||||
\n{3}{Dhall Schema}
|
|
||||||
|
|
||||||
The configuration schema was at first being developed as part of the main
|
The configuration schema was at first being developed as part of the main
|
||||||
project's repository, before it was determined that it would benefit both the
|
project's repository, before it was determined that it would benefit both the
|
||||||
development and overall clarity if the schema lived in its own repository (see
|
development and overall clarity if the schema lived in its own repository (see
|
||||||
Section~\ref{sec:repos} for details). This enabled it to be independently
|
Section~\ref{sec:repos} for details). This now enables the schema to be
|
||||||
developed and versioned, and only pulled into the main application whenever it
|
independently developed and versioned, and only pulled into the main
|
||||||
is determined the application is ready for it.
|
application whenever the application is determined to be ready for it.
|
||||||
|
|
||||||
The full schema with type annotations can be seen in Listing~\ref{dhallschema}.
|
|
||||||
The \texttt{let} statement declares a variable called \texttt{Schema} and
|
|
||||||
assigns it the result of the expression on the right side of the equals sign,
|
|
||||||
which has for practical reasons been trimmed and is displayed without the
|
|
||||||
\emph{default} block, which is instead shown in its own
|
|
||||||
Listing~\ref{dhallschemadefaults}.
|
|
||||||
|
|
||||||
\vspace{\parskip}
|
% \vspace{\parskip}
|
||||||
|
\smallskip
|
||||||
|
% \vspace{\baselineskip}
|
||||||
\begin{lstlisting}[language=Haskell, caption={Dhall configuration schema version 0.0.1-rc.2},
|
\begin{lstlisting}[language=Haskell, caption={Dhall configuration schema version 0.0.1-rc.2},
|
||||||
label=dhallschema, basicstyle=\linespread{0.9}\footnotesize\ttfamily]
|
label=dhallschema, basicstyle=\linespread{0.9}\footnotesize\ttfamily]
|
||||||
let Schema =
|
let Schema =
|
||||||
@ -1055,8 +1283,16 @@ let Schema =
|
|||||||
\end{lstlisting}
|
\end{lstlisting}
|
||||||
\vspace*{-\baselineskip}
|
\vspace*{-\baselineskip}
|
||||||
|
|
||||||
The main configuration is comprised of both raw attributes and child records,
|
The full schema with type annotations can be seen in Listing~\ref{dhallschema}.
|
||||||
which allow for grouping of related functionality. For instance, configuration
|
|
||||||
|
The \texttt{let} statement declares a variable called \texttt{Schema} and
|
||||||
|
assigns to it the result of the expression on the right side of the equals
|
||||||
|
sign, which has for practical reasons been trimmed and is displayed without the
|
||||||
|
\emph{default} block. The default block is instead shown in its own
|
||||||
|
Listing~\ref{dhallschemadefaults}.
|
||||||
|
|
||||||
|
The main configuration comprises both raw attributes and child records, which
|
||||||
|
allow for grouping of related functionality. For instance, configuration
|
||||||
settings pertaining to mailserver setup are grouped in a record named
|
settings pertaining to mailserver setup are grouped in a record named
|
||||||
\textbf{Mailer}. Its attribute \textbf{Enabled} is annotated as \textbf{Bool},
|
\textbf{Mailer}. Its attribute \textbf{Enabled} is annotated as \textbf{Bool},
|
||||||
which was deemed appropriate for an on-off switch-like functionality, with the
|
which was deemed appropriate for an on-off switch-like functionality, with the
|
||||||
@ -1067,10 +1303,19 @@ while \textbf{true} is evaluated as an \emph{unbound} variable, that is, a
|
|||||||
variable \emph{not} defined in the current \emph{scope} and thus not
|
variable \emph{not} defined in the current \emph{scope} and thus not
|
||||||
\emph{present} in the current scope.
|
\emph{present} in the current scope.
|
||||||
|
|
||||||
\vspace{\parskip}
|
Another one of Dhall's specialties is that `$==$' and `$!=$' (in)equality
|
||||||
|
operators \textbf{only} work on values of type \texttt{Bool}, which for example
|
||||||
|
means that variables of type \texttt{Natural} (\texttt{uint}) or \texttt{Text}
|
||||||
|
(\texttt{string}) cannot be compared directly as in other languages, which
|
||||||
|
either leaves the work for a higher-level language (such as Go), or from the
|
||||||
|
perspective of the Dhall authors, \emph{enums} are promoted when the value
|
||||||
|
matters.
|
||||||
|
|
||||||
|
\newpage
|
||||||
|
% \vspace{\parskip}
|
||||||
\begin{lstlisting}[language=Haskell, caption={Dhall configuration defaults for
|
\begin{lstlisting}[language=Haskell, caption={Dhall configuration defaults for
|
||||||
schema version 0.0.1-rc.2},
|
schema version 0.0.1-rc.2},
|
||||||
label=dhallschemadefaults, basicstyle=\linespread{0.9}\scriptsize\ttfamily]
|
label=dhallschemadefaults, basicstyle=\linespread{0.9}\footnotesize\ttfamily]
|
||||||
, default =
|
, default =
|
||||||
-- | have sane defaults.
|
-- | have sane defaults.
|
||||||
{ Host = ""
|
{ Host = ""
|
||||||
@ -1122,8 +1367,7 @@ label=dhallschemadefaults, basicstyle=\linespread{0.9}\scriptsize\ttfamily]
|
|||||||
, Init =
|
, Init =
|
||||||
{ CreateAdmin =
|
{ CreateAdmin =
|
||||||
-- | if this is True, attempt to create a user with admin
|
-- | if this is True, attempt to create a user with admin
|
||||||
-- | privileges with the password specified below (or better -
|
-- | privileges with the password specified below
|
||||||
-- | overriden); it fails if users already exist in the DB.
|
|
||||||
False
|
False
|
||||||
, AdminPassword =
|
, AdminPassword =
|
||||||
-- | used for the first admin, forced change on first login.
|
-- | used for the first admin, forced change on first login.
|
||||||
@ -1135,71 +1379,9 @@ label=dhallschemadefaults, basicstyle=\linespread{0.9}\scriptsize\ttfamily]
|
|||||||
|
|
||||||
in Schema
|
in Schema
|
||||||
\end{lstlisting}
|
\end{lstlisting}
|
||||||
|
\vspace*{-\baselineskip}
|
||||||
Another one of specialties of Dhall is that $==$ and $!=$ equality operators
|
\vspace*{-\baselineskip}
|
||||||
only work on values of type \texttt{Bool}, which for example means that
|
\vspace*{-\baselineskip}
|
||||||
variables of type \texttt{Natural} (\texttt{uint}) or \texttt{Text}
|
|
||||||
(\texttt{string}) cannot be compared directly as in other languages, which
|
|
||||||
either leaves the work for a higher-level language (such as Go), or from the
|
|
||||||
perspective of the Dhall authors, \emph{enums} are promoted when the value
|
|
||||||
matters.
|
|
||||||
|
|
||||||
|
|
||||||
\n{3}{Safety considerations}
|
|
||||||
|
|
||||||
Having a programmable configuration language that understands functions and
|
|
||||||
allows importing not only arbitrary text from random internet URLs, but also
|
|
||||||
importing and \emph{evaluating} (i.e.\ running) potentially untrusted code, it
|
|
||||||
is important that there are some safety mechanisms employed, which can be
|
|
||||||
relied on by the user. Dhall offers this in multiple features: enforcing a
|
|
||||||
same-origin policy and (optionally) pinning a cryptographic hash of the value
|
|
||||||
of the expression being imported.
|
|
||||||
|
|
||||||
|
|
||||||
\n{3}{Possible alternatives}
|
|
||||||
|
|
||||||
While developing the program, the author has also
|
|
||||||
come across certain shortcomings of Dhall, namely long start-up with \emph{cold
|
|
||||||
cache}, which can generally be observed in the scenario of running the program
|
|
||||||
in an environment that does not allow to write the cache files (a read-only
|
|
||||||
filesystem), of does not keep the written cache files, such as a container that
|
|
||||||
is not configured to mount a persistent volume at the pertinent location.
|
|
||||||
|
|
||||||
To describe the way Dhall works when performing an evaluation, it resolves
|
|
||||||
every expression down to a combination of its most basic types (eliminating all
|
|
||||||
abstraction and indirection) in the process called
|
|
||||||
\textbf{normalisation}~\cite{dhallnorm} and then saves this result in the
|
|
||||||
host's cache. The \texttt{dhall-haskell} binary attempts to resolve the
|
|
||||||
variable \texttt{\$\{XDG\_CACHE\_HOME\}} (have a look at \emph{XDG Base
|
|
||||||
Directory Spec}~\cite{xdgbasedirspec} for details) to decide \emph{where} the
|
|
||||||
results of the normalisation will be written for repeated use. Do note that
|
|
||||||
this behaviour has been observed on a GNU/Linux host and the author has not
|
|
||||||
verified this behaviour on a non-GNU/Linux host, such as FreeBSD.
|
|
||||||
|
|
||||||
If normalisation is performed inside an ephemeral container (as opposed to, for
|
|
||||||
instance, an interactive desktop session), the results effectively get lost on
|
|
||||||
each container restart, which is both wasteful and not great for user
|
|
||||||
experience, since the normalisation of just a handful of imports (which
|
|
||||||
internally branch widely) can take upwards of two minutes, during which
|
|
||||||
the user is left waiting for the hanging application with no reporting on the
|
|
||||||
progress or current status.
|
|
||||||
|
|
||||||
While workarounds for the above-mentioned problem can be devised relatively
|
|
||||||
easily (such as bind mounting persistent volumes inside the container in place
|
|
||||||
of the \texttt{\$\{XDG\_CACHE\_HOME\}/dhall} and
|
|
||||||
\texttt{\$\{XDG\_CACHE\_HOME\}/dhall-haskell} to preserve the cache between
|
|
||||||
restarts, or let the cache be pre-computed during container build, since the
|
|
||||||
application is only really expected to run together with a compatible version
|
|
||||||
of the configuration schema and this version \emph{is} known at container build
|
|
||||||
time), it would certainly feel better if there was no need to work
|
|
||||||
\emph{around} the configuration system of choice.
|
|
||||||
|
|
||||||
Alternatives such as CUE (\url{https://cuelang.org/}) offer themselves nicely
|
|
||||||
as a potentially almost drop-in replacement for Dhall feature-wise, while also
|
|
||||||
resolving costly \emph{cold cache} normalisation operations, which is in
|
|
||||||
the author's view Dhall's principal issue.
|
|
||||||
|
|
||||||
|
|
||||||
\n{2}{Data integrity and authenticity}
|
\n{2}{Data integrity and authenticity}
|
||||||
|
|
||||||
The user can interact with the application via a web client, such as a browser,
|
The user can interact with the application via a web client, such as a browser,
|
||||||
@ -1243,194 +1425,20 @@ e.g.\ for tamper protection purposes and similar; however, that work remains
|
|||||||
yet to be materialised.
|
yet to be materialised.
|
||||||
|
|
||||||
|
|
||||||
\n{2}{User isolation}
|
\n{2}{Database schema}\label{sec:dbschema}
|
||||||
|
|
||||||
Users are allowed into certain parts of the application based on the role they
|
|
||||||
currently possess. For the moment, two basic roles were envisioned, while this
|
|
||||||
list might get amended in the future, if the need arises:
|
|
||||||
|
|
||||||
\begin{itemize}
|
|
||||||
\item Administrator
|
|
||||||
\item User
|
|
||||||
\end{itemize}
|
|
||||||
|
|
||||||
It is paramount that the program protects itself from the insider threats as
|
|
||||||
well and therefore each role is only able to perform actions that it is
|
|
||||||
explicitly assigned. While there definitely is certain overlap between the
|
|
||||||
capabilities of the two outlined roles, each also possesses unique features
|
|
||||||
that the other does not.
|
|
||||||
|
|
||||||
For example, the administrator role is not able to perform searches on the
|
|
||||||
breach data directly using their administrator account, for that a separate
|
|
||||||
user account has to be devised. Similarly, the regular user is not able to
|
|
||||||
manage breach lists and other users, because that is a privileged operation.
|
|
||||||
|
|
||||||
In-application administrators are not able to view sensitive (any) user data
|
|
||||||
and should therefore only be able to perform the following actions:
|
|
||||||
|
|
||||||
\begin{itemize}
|
|
||||||
\item Create user accounts
|
|
||||||
\item View list of users
|
|
||||||
\item View user email
|
|
||||||
\item Change user email
|
|
||||||
\item Change user email
|
|
||||||
\item Toggle whether user is an administrator
|
|
||||||
\item Delete user accounts
|
|
||||||
\end{itemize}
|
|
||||||
|
|
||||||
Let us consider the case of a user managing themselves: while demoting from
|
|
||||||
administrator to a regular user is permitted, promoting self to be an
|
|
||||||
administrator would constitute a \emph{privilege escalation} and likely be a
|
|
||||||
precursor to at least a \emph{denial of service} of sorts.
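
The self-management rule described above can be illustrated with a minimal
sketch. The \texttt{user} type, the \texttt{setAdmin} function and both error
values are hypothetical names chosen for this example and are not taken from
the application's sources; the sketch only assumes that each account carries
an identifier and an administrator flag.

\begin{lstlisting}[language=Go, caption={A hypothetical sketch of a guard against self-promotion}, label=selfPromotionSketch]
package main

import (
	"errors"
	"fmt"
)

// user is a hypothetical, minimal stand-in for the application's user model.
type user struct {
	ID      int
	IsAdmin bool
}

var (
	errSelfPromotion = errors.New("users may not promote themselves")
	errForbidden     = errors.New("only administrators may manage other users")
)

// setAdmin toggles the administrator flag of target on behalf of actor.
// A user managing themselves may demote, but never promote, their account;
// managing anybody else requires the actor to already be an administrator.
func setAdmin(actor, target *user, admin bool) error {
	if actor.ID == target.ID {
		if admin && !actor.IsAdmin {
			return errSelfPromotion
		}
		target.IsAdmin = admin
		return nil
	}
	if !actor.IsAdmin {
		return errForbidden
	}
	target.IsAdmin = admin
	return nil
}

func main() {
	regular := &user{ID: 1}
	// The attempt is rejected and the flag stays unset.
	fmt.Println(setAdmin(regular, regular, true), regular.IsAdmin)
}
\end{lstlisting}

Keeping the decision in a single function means that every code path toggling
the flag is subject to the same check, which is one possible way of enforcing
the rule rather than the only one.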
|
|
||||||
|
|
||||||
|
|
||||||
\n{2}{Zero trust principle}
|
|
||||||
|
|
||||||
\textit{Data confidentiality, i.e.\ not trusting the provider}
|
|
||||||
|
|
||||||
There is no way for the application (and consequently, the in-application
|
|
||||||
administrator) to read user's data. This is possible by virtue of encrypting
|
|
||||||
the pertinent data before saving them in the database by a state-of-the-art
|
|
||||||
\emph{age} key~\cite{age} (backed by X25519~\cite{x25519rfc7748}), which in
|
|
||||||
turn is safely stored encrypted by a passphrase that only the user controls. Of
|
|
||||||
course, the user-supplied password is run through a password-based key derivation
|
|
||||||
function (PBKDF: a key derivation function with a sliding computational cost)
|
|
||||||
before letting it encrypt the \emph{age} key.
|
|
||||||
|
|
||||||
The \emph{age} key is only generated when the user changes their password for
|
|
||||||
the first time to prevent scenarios such as in-application administrator with
|
|
||||||
access to physical database being able to both \textbf{recover} the key from
|
|
||||||
the database and \textbf{decrypt} it given that they already know the user
|
|
||||||
password (because they set it), which would subsequently give them unbounded
|
|
||||||
access to any future encrypted data, as long as they would be able to maintain
|
|
||||||
their database access. This is why the \emph{age} key generation and protection
|
|
||||||
are bound to the first password change. Of course, the evil administrator could
|
|
||||||
just perform the change themselves; however, the user would at least be able to
|
|
||||||
find those changes in the activity logs and know not to use the application.
|
|
||||||
But given the scenario of a total database compromise, the author finds all
|
|
||||||
hope is already lost at that point. At least when the database is dumped, it
|
|
||||||
only contains non-sensitive, functional information in plain text; everything
|
|
||||||
else should be encrypted.
|
|
||||||
|
|
||||||
Consequently, both the application operators and the in-application
|
|
||||||
administrators should never be able to learn the details of what the user is
|
|
||||||
tracking, the same being applicable even to potential attackers with direct
|
|
||||||
access to the database. Thus the author maintains that every scenario that
|
|
||||||
could potentially lead to a data breach (apart from a compromised user machine
|
|
||||||
and the like) would have to entail some form of operating memory acquisition,
|
|
||||||
for instance using \texttt{LiME}~\cite{lime}, or perhaps directly the
|
|
||||||
\emph{hypervisor}, if considering virtualised (``cloud'') environments.
|
|
||||||
|
|
||||||
|
|
||||||
\n{2}{Compromise Monitoring}
|
|
||||||
|
|
||||||
\n{3}{Have I Been Pwned? Integration}
|
|
||||||
|
|
||||||
Troy Hunt's Have I Been Pwned? online service
|
|
||||||
(\url{https://haveibeenpwned.com/}) has been chosen as the online source of
|
|
||||||
compromised data. The service offers private APIs that are protected by API
|
|
||||||
keys. The application's \texttt{hibp} module and database representation model
|
|
||||||
the values returned by this API, which allows searching in large breaches using
|
|
||||||
email addresses.\\
|
|
||||||
The architecture there is relatively simple: the application administrator
|
|
||||||
configures an API key for HIBP, the user enters the query parameters, the
|
|
||||||
application constructs a query and calls the API and waits for a response. As
|
|
||||||
the API is rate-limited based on the key supplied, this can pose an issue and
|
|
||||||
it has not been fully resolved in the UI. The application then parses the
|
|
||||||
returned data and binds it to the local model for validation. If that goes
|
|
||||||
well, the data is saved into the database as a cache and the search query is
|
|
||||||
performed on the saved data. If it returns anything, it is displayed to the
|
|
||||||
user for browsing.
|
|
||||||
|
|
||||||
|
|
||||||
\n{3}{Local Dataset Plugin} Breach data from locally available datasets can be
|
|
||||||
imported into the application by first making sure it adheres to the specified
|
|
||||||
schema (have a look at the \emph{breach data schema} in
|
|
||||||
Listing~\ref{breachDataGoSchema}). If it doesn't (which is very likely with
|
|
||||||
random breach data), it needs to be converted to a form that does before
|
|
||||||
importing it to the application, e.g.\ using a Python script or similar.
|
|
||||||
Attempting to import data that does not follow the outlined schema would result
|
|
||||||
in an error. Also, importing a dataset which is over a reasonable size limit
|
|
||||||
would by default be rejected by the program as a precaution, since marshaling
|
|
||||||
e.g.\ a 1 TiB document would likely result in an OOM situation on the host,
|
|
||||||
assuming regular consumer hardware conditions, not HPC.
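
One way such a size precaution could look is sketched below. The function
\texttt{readCapped}, the error value and the limit used are assumptions made
for illustration only and do not describe the program's actual import code;
the sketch relies solely on the Go standard library.

\begin{lstlisting}[language=Go, caption={A hypothetical sketch of rejecting oversized dataset imports}, label=importSizeCapSketch]
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

var errTooLarge = errors.New("dataset exceeds the import size limit")

// readCapped reads at most limit bytes from r and reports errTooLarge if
// the input is longer. Reading one byte past the limit is what allows an
// oversized input to be told apart from one that fits exactly.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, errTooLarge
	}
	return data, nil
}

func main() {
	// An 8-byte limit is unrealistically low and only serves the demo.
	_, err := readCapped(strings.NewReader("name: Horrible breach\n"), 8)
	fmt.Println(err)
}
\end{lstlisting}

Because the reader is capped \emph{before} any unmarshalling takes place, the
memory used by the import stays bounded by the limit regardless of the size of
the supplied file.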
|
|
||||||
|
|
||||||
\vspace{\parskip}
|
|
||||||
\begin{lstlisting}[language=Go, caption={Breach Data Schema represented as a Go struct (imports from the standard library are assumed)},
|
|
||||||
label=breachDataGoSchema]
|
|
||||||
type breachDataSchema struct {
|
|
||||||
Name string
|
|
||||||
Time time.Time
|
|
||||||
IsVerified bool
|
|
||||||
ContainsPasswords bool
|
|
||||||
ContainsHashes bool
|
|
||||||
HashType string
|
|
||||||
HashSalted bool
|
|
||||||
HashPeppered bool
|
|
||||||
ContainsUsernames bool
|
|
||||||
ContainsEmails bool
|
|
||||||
Data any
|
|
||||||
}
|
|
||||||
\end{lstlisting}
|
|
||||||
\vspace*{-\baselineskip}
|
|
||||||
|
|
||||||
The Go representation shown in Listing~\ref{breachDataGoSchema} will in
|
|
||||||
actuality be written and supplied by the user of the program as a YAML
|
|
||||||
document. YAML was chosen for multiple reasons: relative ease of use (plain
|
|
||||||
text, readable, support for inclusion of comments), its capability to store
|
|
||||||
multiple \emph{documents} inside of a single file with most of the inputs
|
|
||||||
implicitly typed as strings, while, thanks to being a superset of JSON, it sports
|
|
||||||
machine readability. That should allow for documents similar to what can be
|
|
||||||
seen in Listing~\ref{breachDataYAMLSchema} to be ingested by the program,
|
|
||||||
read and written by humans and programs alike.
|
|
||||||
|
|
||||||
\smallskip
|
|
||||||
\begin{lstlisting}[language=YAML, caption={Example Breach Data Schema supplied
|
|
||||||
to the program as a YAML file, optionally containing multiple documents},
|
|
||||||
label=breachDataYAMLSchema]
|
|
||||||
---
|
|
||||||
name: Horrible breach
|
|
||||||
time: 2022-04-23T00:00:00+02:00
|
|
||||||
isVerified: false
|
|
||||||
containsPasswords: false
|
|
||||||
containsHashes: true
|
|
||||||
containsEmails: true
|
|
||||||
hashType: md5
|
|
||||||
hashSalted: false
|
|
||||||
hashPeppered: false
|
|
||||||
data:
|
|
||||||
hashes:
|
|
||||||
- hash1
|
|
||||||
- hash2
|
|
||||||
- hash3
|
|
||||||
emails:
|
|
||||||
- email1
|
|
||||||
-
|
|
||||||
- email3
|
|
||||||
---
|
|
||||||
# document #2, describing another breach.
|
|
||||||
name: Horrible breach 2
|
|
||||||
...
|
|
||||||
\end{lstlisting}
|
|
||||||
\vspace*{-\baselineskip}
|
|
||||||
|
|
||||||
Notice how the emails list in Listing~\ref{breachDataYAMLSchema} is missing one
|
|
||||||
record, perhaps because it was not supplied or mistakenly omitted. This is a
|
|
||||||
valid scenario (mistakes happen) and the application needs to be able to handle
|
|
||||||
it. The alternative would be to require the user to prepare the data in such a
|
|
||||||
way that the empty/partial records would be dropped entirely.
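
Handling such gaps on the application side could be as simple as skipping
blank entries while the records are processed. The following is a minimal
sketch under that assumption; the function name \texttt{dropEmpty} is
hypothetical and the program may well treat partial records differently.

\begin{lstlisting}[language=Go, caption={A hypothetical sketch of dropping empty records on import}, label=dropEmptyRecordsSketch]
package main

import (
	"fmt"
	"strings"
)

// dropEmpty returns the records that contain something other than
// whitespace, preserving their original order.
func dropEmpty(records []string) []string {
	kept := make([]string, 0, len(records))
	for _, r := range records {
		if strings.TrimSpace(r) == "" {
			continue // an omitted entry, skip it instead of failing
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	emails := []string{"email1", "", "email3"}
	fmt.Println(dropEmpty(emails))
}
\end{lstlisting}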
|
|
||||||
|
|
||||||
|
|
||||||
\n{2}{Database configuration}
|
|
||||||
|
|
||||||
The database schema is not being created manually in the database. Instead, an
|
The database schema is not being created manually in the database. Instead, an
|
||||||
Object-relational Mapping (ORM) tool named ent is used, which allows defining
|
Object-relational Mapping (ORM) tool named ent is used, which allows defining
|
||||||
the table schema and relations entirely in Go.
|
the table schema and relations entirely in Go. The application does not need
|
||||||
|
the database schema to be pre-created when the application starts; it only
|
||||||
|
requires a connection string providing access to the database for a reasonably
|
||||||
|
privileged user if that is the case.
|
||||||
|
|
||||||
The best part about ent is that there is no need to define supplemental methods
|
The best part about \texttt{ent} is that there is no need to define
|
||||||
on the models, since with ent these are meant to be \emph{code generated} (in
|
supplemental methods on the models, as with \texttt{ent} these are meant to be
|
||||||
the older sense of the word, not with Large Language Models). That creates files
|
\emph{code generated} (in the older sense of the word, not with Large Language
|
||||||
with models based on the types of the attributes in the database model, and the
|
Models) into existence. Code generation creates files with actual Go models
|
||||||
|
based on the types of the attributes in the database schema model, and the
|
||||||
respective relations are transformed into methods on the receiver or functions
|
respective relations are transformed into methods on the receiver or functions
|
||||||
taking object attributes as arguments.
|
taking object attributes as arguments.
|
||||||
|
|
||||||
@ -1449,90 +1457,164 @@ These methods can further be imported into other packages and this makes
|
|||||||
working with the database a morning breeze.
|
working with the database a morning breeze.
|
||||||
|
|
||||||
|
|
||||||
\n{1}{Production}
|
\n{1}{Deployment}
|
||||||
|
|
||||||
It is, of course, recommended that the application runs in a secure environment
|
\textbf{TODO}: mention how \texttt{systemd} aids in running the pod.
|
||||||
\allowbreak although definitions of that almost certainly differ depending on
|
|
||||||
who you ask. General recommendations would be either to effectively reserve a
|
|
||||||
machine for a single use case - running this program - so as to dramatically
|
|
||||||
decrease the potential attack surface of the host, or run the program isolated
|
|
||||||
in a container or a virtual machine. Further, if the host does not need
|
|
||||||
management access (it is a deployed-to-only machine that is configured
|
|
||||||
out-of-band, such as with a \emph{golden} image/container or declaratively with
|
|
||||||
Nix), then an SSH \emph{daemon} should not be running in it, since it is not
|
|
||||||
needed. In an ideal scenario, the host machine would have as little software
|
|
||||||
installed as possible besides what the application absolutely requires.
|
|
||||||
|
|
||||||
A demonstration of the above can be found in the multi-stage Containerfile that
|
A deployment setup as suggested in Section~\ref{sec:deploymentRecommendations}
|
||||||
is available in the main sources. The resulting container image only contains a
|
is already partially covered by the multi-stage \texttt{Containerfile} that is
|
||||||
statically linked copy of the program, a default configuration file and
|
available in the main sources. Once built, the resulting container image only
|
||||||
corresponding Dhall expressions cached at build time, which only support the
|
contains a handful of things it absolutely needs:
|
||||||
main configuration file. Since the program also needs a database, an example
|
|
||||||
scenario could include the container being run in a Podman pod together with
|
|
||||||
the database, which would not have to be exposed from the pod and would
|
|
||||||
therefore only be available over \texttt{localhost}.
|
|
||||||
|
|
||||||
It goes without saying that the operator should substitute values of any
|
\begin{itemize}
|
||||||
default configuration secrets with the new ones that were securely generated.
|
\item a statically linked copy of the program
|
||||||
|
\item a default configuration file and corresponding Dhall expressions cached
|
||||||
|
at build time
|
||||||
|
\item a recent CA certs bundle
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
System-wide cryptographic policies should target highest feasible security
|
Since the program also needs a database for proper functioning, an example
|
||||||
level, if at all available (such as by default on Fedora or RHEL), covering
|
scenario includes the application container being run in a Podman \textbf{pod}
|
||||||
SSH, DNSSec, IPsec, Kerberos and TLS protocols. Firewalls should be configured
|
together with the database. That results in not having to expose the database
|
||||||
and SELinux (kernel-level mandatory access control and security policy
|
to the entire host or out of the pod at all; it is only available over the pod's
|
||||||
mechanism) running in \emph{enforcing} mode, if available.
|
\texttt{localhost}.
|
||||||
|
|
||||||
|
It goes without saying that the default values of any configuration secrets
|
||||||
|
should be substituted by the application operator with new, securely generated
|
||||||
|
ones.
|
||||||
|
|
||||||
|
|
||||||
\n{2}{Deployment recommendations}
|
\n{2}{Rootless Podman}
|
||||||
|
|
||||||
\n{3}{Transport security}
|
Assuming rootless Podman is set up and the \texttt{just} tool is installed on the
|
||||||
|
host, the application could be deployed by following a series of relatively
|
||||||
|
simple steps:
|
||||||
|
|
||||||
Users connecting to the application should rightfully expect their data to
|
\begin{itemize}
|
||||||
be protected \textit{in transit} (i.e.\ on the way between their browser and
|
\item build (or pull) the application container image
|
||||||
the server), which is what \emph{Transport Layer Security} family of
|
\item create a pod with user namespacing, exposing the application port
|
||||||
protocols~\cite{tls13rfc8446} was designed for, and which is the underpinning
|
\item run the database container inside the pod
|
||||||
of HTTPS. TLS utilises the primitives of asymmetric cryptography to let the
|
\item run the application inside the pod
|
||||||
client authenticate the server (verify that it is who it claims it is) and
|
\end{itemize}
|
||||||
negotiate a symmetric key for encryption in the process named the \emph{TLS
|
|
||||||
handshake} (see Section~\ref{sec:tls} for more details), the final purpose of
|
|
||||||
which is establishing a secure communications connection. The operator should
|
|
||||||
configure the program to either directly utilise TLS using configuration or
|
|
||||||
have it listen behind a TLS-terminating \emph{reverse proxy}.
|
|
||||||
|
|
||||||
|
In concrete terms, it would resemble something along the lines of
|
||||||
|
Listing~\ref{podmanDeployment}. Do note that all the commands are executed
|
||||||
|
under the unprivileged \texttt{user@containerHost} that is running rootless
|
||||||
|
Podman, i.e.\ it has \texttt{UID}/\texttt{GID} mapping entries in
|
||||||
|
\texttt{/etc/subuid} and \texttt{/etc/subgid} files \textbf{prior} to running any
|
||||||
|
Podman commands.
|
||||||
|
|
||||||
\n{3}{Containerisation}
|
% \newpage
|
||||||
Whether the pre-built or a custom container image is used to deploy the
|
|
||||||
application, it still needs access to secrets, such as database connection
|
|
||||||
string (containing database host, port, user, password/encrypted password,
|
|
||||||
authentication method and database name).
|
|
||||||
|
|
||||||
Currently, the application is able to handle \emph{peer}, \emph{scram-sha-256},
|
\begin{lstlisting}[language=bash, caption={Example application deployment using
|
||||||
\emph{user name maps} and raw \emph{password} as Postgres authentication
|
rootless Podman},
|
||||||
methods~\cite{pgauthmethods}, although the \emph{password} option should not be
|
label=podmanDeployment, basicstyle=\linespread{0.9}\small\ttfamily]
|
||||||
used in production, \emph{unless} the connection to the database is protected
|
# From inside the project folder, build the image locally using kaniko.
|
||||||
by TLS.\ In any case, using the \emph{scram-sha-256}~\cite{scramsha256rfc7677}
|
just kaniko
|
||||||
method is preferable and one way to verify in development environment that
|
|
||||||
everything works as intended is the \emph{Password generator for PostgreSQL}
|
|
||||||
tool~\cite{goscramsha256}, which makes it possible to get the encrypted string from a raw
|
|
||||||
user input.
|
|
||||||
|
|
||||||
If the application running in a container wants to use the \emph{peer}
|
# Create a pod.
|
||||||
authentication method, it is up to the operator to supply the Postgres socket
|
podman pod create --userns=keep-id -p5005:3000 --name pcmt
|
||||||
to the application (e.g.\ as a volume bind mount). This scenario was not
|
|
||||||
tested, however, and the author is also not entirely certain how \emph{user
|
|
||||||
namespaces} (on GNU/Linux) would influence the process (given that the
|
|
||||||
\emph{ID}s of a user \textbf{outside} the container are mapped to a range of
|
|
||||||
\emph{UIDs} \textbf{inside} the container), for which the setup would likely
|
|
||||||
need to account.
|
|
||||||
|
|
||||||
Equally, if the application is running inside the container, the operator needs
|
# Run the database in the pod.
|
||||||
to make sure that the database is either running in a network that is also
|
podman run --pod pcmt --replace -d --name "pcmt-pg" --rm \
|
||||||
directly attached to the container or that there is a mechanism in place that
|
-e POSTGRES_INITDB_ARGS="--auth-host=scram-sha-256 \
|
||||||
routes the requests for the database hostname to the destination.
|
--auth-local=scram-sha-256" \
|
||||||
|
-e POSTGRES_PASSWORD=postgres -v $PWD/tmp/db:/var/lib/postgresql/data \
|
||||||
|
docker.io/library/postgres:15.2-alpine3.17
|
||||||
|
|
||||||
One such mechanism is container name based routing inside \emph{pods}
|
# Run the application in the pod.
|
||||||
(Podman/Kubernetes), where the resolution of container names is the
|
podman run --pod pcmt --replace --name pcmt-og -d --rm \
|
||||||
responsibility of a specially configured piece of software called Aardvark for
|
-e PCMT_LIVE=False \
|
||||||
the former and CoreDNS for the latter.
|
-e PCMT_DBTYPE="postgres" \
|
||||||
|
-e PCMT_CONNSTRING="host=pcmt-pg port=5432 sslmode=disable \
|
||||||
|
user=postgres dbname=postgres password=postgres" \
|
||||||
|
-v $PWD/config.dhall:/config.dhall:ro \
|
||||||
|
docker.io/immawanderer/pcmt:testbuild -config /config.dhall
|
||||||
|
\end{lstlisting}
|
||||||
|
|
||||||
|
To summarise Listing~\ref{podmanDeployment}, first, the application
|
||||||
|
container is built from inside the project folder using \texttt{kaniko}.
|
||||||
|
Alternatively, the container image could be pulled from the container
|
||||||
|
repository, but it makes more sense showing the image being built from sources
|
||||||
|
since the listing depicts a \texttt{:testbuild} tag being used.
|
||||||
|
|
||||||
|
Next, a \emph{pod} is created and given a name, setting the port binding for
|
||||||
|
the application. Then, the database container is started inside the pod.
|
||||||
|
|
||||||
|
As a final step, the application container itself is run inside the pod. The application configuration named \texttt{config.dhall} located in
|
||||||
|
\texttt{\$PWD} is mounted as a volume into container's \texttt{/config.dhall},
|
||||||
|
providing the application with a default configuration. The default container
|
||||||
|
does contain a default configuration for reference; however, running the
|
||||||
|
container as is without additional configuration would fail as it does not
|
||||||
|
contain the necessary secrets.
|
||||||
|
|
||||||
|
\n{3}{Sanity checks}
|
||||||
|
|
||||||
|
Do also note that the application connects to the database using its
|
||||||
|
\emph{container} name, i.e.\ not the IP address. That is possible thanks to
|
||||||
|
Podman setting up DNS inside the pod in such a way that all containers in the
|
||||||
|
pod can reach each other using their (container) names. Interestingly,
|
||||||
|
connecting via \texttt{localhost} would also work, as from inside the pod, any
|
||||||
|
container in the pod can reach any other container in the same pod via pod's
|
||||||
|
\texttt{localhost}.
|
||||||
|
In fact, \emph{pinging} the database or application containers from an ad-hoc
|
||||||
|
\texttt{alpine} container added to the pod yields:
|
||||||
|
|
||||||
|
\vspace{\parskip}
|
||||||
|
\begin{lstlisting}[language=bash, caption={Pinging pod containers using their
|
||||||
|
names}, label=podmanPing, basicstyle=\linespread{0.9}\small\ttfamily]
|
||||||
|
user@containerHost % podman run --rm -it --user=0 --pod=pcmt \
|
||||||
|
docker.io/library/alpine:3.18
|
||||||
|
/ # ping -c2 pcmt-og
|
||||||
|
PING pcmt-og (127.0.0.1): 56 data bytes
|
||||||
|
64 bytes from 127.0.0.1: seq=0 ttl=42 time=0.072 ms
|
||||||
|
64 bytes from 127.0.0.1: seq=1 ttl=42 time=0.118 ms
|
||||||
|
|
||||||
|
--- pcmt-og ping statistics ---
|
||||||
|
2 packets transmitted, 2 packets received, 0% packet loss
|
||||||
|
round-trip min/avg/max = 0.072/0.095/0.118 ms
|
||||||
|
/ # ping -c2 pcmt-pg
|
||||||
|
PING pcmt-pg (127.0.0.1): 56 data bytes
|
||||||
|
64 bytes from 127.0.0.1: seq=0 ttl=42 time=0.045 ms
|
||||||
|
64 bytes from 127.0.0.1: seq=1 ttl=42 time=0.077 ms
|
||||||
|
|
||||||
|
--- pcmt-pg ping statistics ---
|
||||||
|
2 packets transmitted, 2 packets received, 0% packet loss
|
||||||
|
round-trip min/avg/max = 0.045/0.061/0.077 ms
|
||||||
|
/ #
|
||||||
|
\end{lstlisting}
|
||||||
|
\vspace*{-\baselineskip}
|
||||||
|
|
||||||
|
The pod created in Listing~\ref{podmanDeployment} only set the binding for a
|
||||||
|
port used by the application (\texttt{5005/tcp}). The Postgres default port
|
||||||
|
\texttt{5432/tcp} is not among pod's port bindings, as can be seen in the pod
|
||||||
|
creation command. This can also easily be verified using the command in
|
||||||
|
Listing~\ref{podmanPortBindings}:
|
||||||
|
|
||||||
|
\begin{lstlisting}[language=bash, caption={Podman pod port bindings},
|
||||||
|
label=podmanPortBindings, basicstyle=\linespread{0.9}\small\ttfamily]
|
||||||
|
user@containerHost % podman pod inspect pcmt \
|
||||||
|
--format="Port bindings: {{.InfraConfig.PortBindings}}\n\
|
||||||
|
Host network: {{.InfraConfig.HostNetwork}}"
|
||||||
|
Port bindings: map[3000/tcp:[{ 5005}]]
|
||||||
|
Host network: false
|
||||||
|
\end{lstlisting}
|
||||||
|
\vspace*{-\baselineskip}
|
||||||
|
|
||||||
|
To be absolutely sure, trying to connect to the database from outside of the
|
||||||
|
pod (i.e.\ from the container host) should \emph{fail}, unless, of course, there
|
||||||
|
is another process listening on that port:
|
||||||
|
|
||||||
|
\begin{lstlisting}[language=bash, caption={In-pod database is unreachable from
|
||||||
|
the host}, breaklines=true, label=podDbUnreachable,
|
||||||
|
basicstyle=\linespread{0.9}\small\ttfamily]
|
||||||
|
user@containerHost % curl localhost:5432
|
||||||
|
--> curl: (7) Failed to connect to localhost port 5432 after 0 ms: Couldn't connect to server
|
||||||
|
\end{lstlisting}
|
||||||
|
\vspace*{-\baselineskip}
|
||||||
|
|
||||||
|
The error in Listing~\ref{podDbUnreachable} is expected, as it is the result of
|
||||||
|
the database port not having been exposed from the pod.
|
||||||
|
|
||||||
|
|
||||||
\n{1}{Validation}
|
\n{1}{Validation}
|
||||||
@ -1541,19 +1623,15 @@ the former and CoreDNS for the latter.
|
|||||||
|
|
||||||
Unit testing is a hot topic for many people and the author does not count
|
Unit testing is a hot topic for many people and the author does not count
|
||||||
himself to be a staunch supporter of either extreme. The ``no unit tests''
|
himself to be a staunch supporter of either extreme. The ``no unit tests''
|
||||||
seems to discount any benefit there is to unit testing, while a ``TDD-only''
|
seems to discount any benefit there is to unit testing, while a ``
|
||||||
(TDD, or Test Driven Development is a development methodology whereby tests are
|
TDD-only''\footnotemark{} approach can be a little too much for some people's
|
||||||
written first, then a complementary piece of code that is supposed to be
|
taste. The author tends to prefer a \emph{middle ground} approach in this
|
||||||
tested, just enough to get past the compile errors and to see the test fail,
|
particular case, i.e.\ writing enough tests where meaningful but not necessarily
|
||||||
then the code is refactored to make the test pass and then it can be fearlessly
|
testing everything or writing tests prior to business logic code. Arguably,
|
||||||
extended because the test is the safety net catching us when the user slips and
|
following the practice of TDD should result in writing a \emph{better designed}
|
||||||
alters the originally intended behaviour) approach can be a little too much for
|
code, particularly because there needs to be a prior thought about the shape
|
||||||
some people's taste. The author tends to sport a \emph{middle ground} approach
|
and function of the code, as it is tested for before it is even written, but it
|
||||||
here, with writing enough tests where meaningful but not necessarily testing
|
adds a slight inconvenience to what is otherwise a straightforward process.
|
||||||
everything or writing tests prior to code, although arguably that practice
|
|
||||||
should result in writing a \emph{better} designed code, particularly because
|
|
||||||
there has to be a prior thought about it because it needs to be tested
|
|
||||||
\emph{first}.
|
|
||||||
|
|
||||||
Thanks to Go's built-in support for testing via its \texttt{testing} package
|
Thanks to Go's built-in support for testing via its \texttt{testing} package
|
||||||
and the tooling in the \texttt{go} tool, writing tests is relatively simple. Go
|
and the tooling in the \texttt{go} tool, writing tests is relatively simple. Go
|
||||||
@ -1578,6 +1656,15 @@ informing the developer that no tests were found, which is handy to learn if it
|
|||||||
was not intended/expected. When compiling regular source code, the Go files
|
was not intended/expected. When compiling regular source code, the Go files
|
||||||
with \texttt{\_test} in the name are simply ignored by the build tool.
|
with \texttt{\_test} in the name are simply ignored by the build tool.
|
||||||
|
|
||||||
|
\footnotetext{TDD, or Test Driven Development, is a development methodology
|
||||||
|
whereby tests are written \emph{first}, then a complementary piece of code
|
||||||
|
that is supposed to be tested is added, just enough to get past the compile
|
||||||
|
errors and to see the test \emph{fail}, and then the code is finally
|
||||||
|
refactored to make the test \emph{pass}. The code can then be fearlessly
|
||||||
|
extended because the test is the safety net catching the programmer when the
|
||||||
|
mind slips and alters the originally intended behaviour of the code.}
|
||||||
|
|
||||||
|
|
||||||
\n{2}{Integration tests}
|
\n{2}{Integration tests}
|
||||||
|
|
||||||
Integrating with external software, namely the database in case of this
|
Integrating with external software, namely the database in case of this
|
||||||
@ -1724,26 +1811,29 @@ by \emph{Let's Encrypt}\allowbreak issued, short-lived, ECDSA
a testing instance; therefore, limits to prevent abuse might be imposed.

\n{3}{Deployment validation}

TODO: show the results of testing the app in prod using
\url{https://testssl.sh/}.

% =========================================================================== %
\nn{Conclusion}

The objective of the thesis was to create the Password Compromise
Monitoring Tool, aimed at security-conscious users, in order to let them
validate their assumptions on the security of their credentials. The thesis
opened by diving into cryptography topics such as encryption and briefly
mentioned TLS.

Additionally, security mechanisms such as Site Isolation and Content Security
Policy, commonly employed by mainstream browsers of today, were introduced and
the reader learnt how Content Security Policy is easily and dynamically
configured.

An extensive portion of the thesis was then devoted to the practical part,
describing everything from the tooling used, through a high-level view of the
application's architecture, to the implementation of specific parts of the
application across the stack.

Finally, the practical part concluded by broadly describing the validation
methods used to verify that the application worked correctly.