extend, reword practical part

surtur 2023-08-23 21:18:08 +02:00
parent fb5c7d0bcd
commit 65b7cb9b89
Signed by: wanderer
SSH Key Fingerprint: SHA256:MdCZyJ2sHLltrLBp0xQO0O1qTW9BT/xl5nXkDvhlMCI

@ -3,14 +3,15 @@
\n{1}{Introduction}
A part of the task of this thesis was to build an actual Password Compromise
Monitoring Tool. Therefore, the development process, the tools and practices
used generally, and with more specificity the outcome are all described in the
following sections. A whole section is dedicated to application architecture,
whereby relevant engineering choices are justified and motives behind the
decisions are explained. This part then flows into recommendations for more of
a production deployment and concludes by describing the validation methods
chosen and used to ensure correctness and stability of the program.
A part of the task of this thesis was to build an actual application, which was
named Password Compromise Monitoring Tool, or \texttt{pcmt} for short.
Therefore, the development process, the general tools and practices as well as
the specific outcome are all described in the following sections. A whole
section is dedicated to application architecture, whereby relevant engineering
choices are justified and the motives behind the decisions are explained. This
part then flows into recommendations for more of a production deployment and
concludes by describing the validation methods chosen and used to ensure
correctness and stability of the program.
\n{2}{Kudos}
@ -18,7 +19,10 @@ chosen and used to ensure correctness and stability of the program.
The program that has been developed as part of this thesis made use of a
great deal of free (as in \textit{freedom}) and open-source software in the
process, either directly or as an outstanding work tool, and the author would
like to take this opportunity to recognise that fact\footnotemark.
like to take this opportunity to recognise that fact\footnotemark{}.
\footnotetext{\textbf{Disclaimer:} the author is not affiliated with any of the
projects mentioned on this page.}
In particular, the author acknowledges that this work would not be the same
without:
@ -34,14 +38,12 @@ without:
\item Go (\url{https://go.dev/})
\end{itemize}
All of the code written has been typed into VIM (\texttt{9.0}), the shell used
to run the commands was ZSH, both running in the author's terminal emulator of
choice, \texttt{kitty}. The development machines ran a recent installation of
\textit{Arch Linux (by the way)} and Fedora 38, both using a
All the code was typed into VIM, the shell used was ZSH, and the terminal
emulator of choice was \texttt{kitty}. The development machines ran a recent
installation of \textit{Arch Linux}\footnotemark{} and Fedora 38, both using a
\texttt{6.\{2,3,4\}.x} XanMod variant of the Linux kernel.
\footnotetext{\textbf{Disclaimer:} the author is not affiliated with any of the
projects mentioned on this page.}
\footnotetext{(by the way) \url{https://i.redd.it/mfrfqy66ey311.jpg}.}
\n{1}{Development}
@ -49,11 +51,11 @@ projects mentioned on this page.}
The source code of the project has been versioned since the start, using the
popular and industry-standard git (\url{https://git-scm.com}) source code
management (SCM) tool. Commits were made frequently and, if at all possible,
for small and self-contained changes of code, trying to follow sane commit
message \emph{hygiene}, i.e.\ striving for meaningful and well-formatted commit
messages. The name of the default branch is \texttt{development}, since that is
what the author likes to choose for new projects that are not yet stable (it is
in fact the default in the author's \texttt{.gitconfig}).
consisted of small and self-contained changes of code, trying to follow sane
commit message \emph{hygiene}, i.e.\ striving for meaningful and well-formatted
commit messages. The name of the default branch is \texttt{development}, since
that is what the author likes to choose for new projects that are not yet
stable (it is in fact the default in the author's \texttt{.gitconfig}).
\n{2}{Commit signing}
@ -134,33 +136,33 @@ container) was used to run the builds.
The way this runner works is that it creates an ephemeral container for every
pipeline step and executes given \emph{commands} inside of it. At the end of
each step the container is discarded, while the repository, which is mounted
into each container's \texttt{/drone/src} is persisted between steps, allowing
it to be cloned only from \emph{origin} only at the start of the pipeline and
then shared for all the following steps, saving bandwidth, time and disk
each step, the container is discarded while the repository clone, which is
mounted into each container's \texttt{/drone/src}, is persisted between steps,
allowing it to be cloned from \emph{origin} only at the start of the pipeline
and then shared for all the following steps, saving bandwidth, time and disk
writes.
The entire configuration used to run the pipelines can be found in a file named
\texttt{.drone.yml} at the root of the main source code repository. The
workflow consists of four pipelines, which are run in parallel. Two main
pipelines are defined to build the frontend assets, the \texttt{pcmt} binary
and run tests on \texttt{x86\_64} GNU/Linux targets, one for each of Arch and
Alpine (version 3.1\{7,8\}). These two pipelines are identical apart from
and run tests on \texttt{x86\_64} GNU/Linux targets, one for each of Alpine
(version 3.1\{7,8\}) and Arch. These two pipelines are identical apart from
OS-specific bits such as installing a certain package, etc. For the record,
other OS-architecture combinations were not tested.
A third pipeline contains instructions to build a popular static analysis tool
called \texttt{golangci-lint}, which is sort of a meta-linter, bundling a
staggering amount of linters (a linter is a tool that performs static code
called \texttt{golangci-lint}, which is a sort of meta-linter, bundling a
staggering number of linters (a linter is a tool that performs static code
analysis and can raise awareness of programming errors, flag potentially buggy
code constructs, or \emph{mere} stylistic errors) - from sources and then
code constructs, or \emph{mere} stylistic errors), from sources and then
perform the analysis of the project's codebase using the freshly built binary. If
the result of this step is successful, a handful of code analysis services get
pinged in the next steps to take notice of the changes to the project's source code
and update their metrics, details can be found in the main Drone configuration
and update their metrics. Details can be found in the main Drone configuration
file \texttt{.drone.yml}, and the configuration for the \texttt{golangci-lint}
tool itself (such as what linters are enabled/disabled and with what settings)
can be found in the root of the repository in the file named
tool itself (such as what linters are enabled/disabled and their
configurations) can be found in the root of the repository in the file named
\texttt{.golangci.yml}.
The fourth pipeline focuses on linting the \texttt{Containerfile} and building
@ -287,9 +289,10 @@ passed around as a pointer.
An experimental (note: not anymore, with \texttt{go1.21} it was brought into
Go's \textit{stdlib}) library for \textit{structured} logging \texttt{slog} was
used to facilitate every logging need the program might have. It supports both
JSON and plain-text logging, which was made configurable by the program. Either
a configuration file value or an environment variable can be used to set this.
used to facilitate every logging need that the program might have. It supports
both JSON and plain-text logging, which was made configurable by the program.
Either a configuration file value or an environment variable can be used to set
this.
There are four log levels available by default (\texttt{DEBUG}, \texttt{INFO},
\texttt{WARNING}, \texttt{ERROR}) and the pertinent library functions are
@ -373,10 +376,10 @@ TCP range $1-65535$.}
\vspace*{-\baselineskip}
\paragraph{\texttt{-printMigration}}{A boolean option that if set, makes the
program print any \textbf{upcoming} database migrations (based on the current
state of the database) and exit. The connection string environment variable
still needs to be set in order to be able to connect to the database and perform
\paragraph{\texttt{-printMigration}}{A boolean option that, if set, makes the
program print any \textbf{upcoming} database migrations (based on the current
state of the database) and exit. The connection string environment variable
still needs to be set in order to be able to connect to the database and perform
the schema \emph{diff}. This option is mainly useful during debugging.}
\vspace*{-\baselineskip}
@ -410,7 +413,7 @@ development binary might simply print the truncated commit ID (consult
An important thing to mention is embedded assets and templates. Go has multiple
mechanisms to natively embed arbitrary files directly into the binary during
the regular build process. \texttt{embed.FS} from the standard library'
the regular build process. \texttt{embed.FS} from the standard library
\texttt{embed} package was used to bundle all template files and web assets,
such as images, logos and stylesheets at the module level. These are then
passed around the program as needed, such as to the \texttt{handlers} package.
@ -481,7 +484,7 @@ JavaScript.
\n{2}{Frontend}
Frontend-side, Tailwind was used for the application's CSS. It promotes the usage
Frontend-wise, Tailwind was used for the application's CSS. It promotes the usage
of flexible \emph{utility-first} classes in the HTML markup instead of
separating out styles from content. Understandably, this is somewhat of a
preference issue and the author does not hold hard opinions in either
@ -491,7 +494,7 @@ detailed documentation and offering built-in support for dark/light mode, and
partially also because it \emph{looks} nice.
The Go templates containing the CSS classes need to be parsed by Tailwind in
order t produce the final stylesheet that can be bundled with the application.
order to produce the final stylesheet that can be bundled with the application.
The upstream provides an original CLI tool (\texttt{tailwindcss}), which can be
used exactly for that action. Simple and accessible layouts were overall
preferred; a single page was rather split into multiple when becoming
@ -503,9 +506,9 @@ pages.
As an aside, the author has briefly experimented with WebAssembly to provide
client-side dynamic functionality for this project, but has ultimately scrapped
it in favour of the entirely server-side rendered approach. It is possible that
it would get revisited in the future if necessary, and performance mattered.
Even from the short experiments it was obvious how much faster WebAssembly was
when compared to JavaScript.
it would get revisited in the future if necessary. Even from the short
experiments it was obvious how much faster WebAssembly was when compared to
JavaScript.
% \newpage
@ -525,15 +528,15 @@ roles were envisioned:
\end{itemize}
It is paramount that the program protects itself from insider threats as
well and therefore each role is only able to perform actions that it is
explicitly assigned. While there definitely is certain overlap between the
well, and therefore each role is only able to perform actions that it is
explicitly assigned. While there definitely is a certain overlap between the
capabilities of the two outlined roles, each also possesses unique features
that the other one does not.
For instance, the administrator role is not able to perform searches on the
breach data directly; for that, a separate \emph{user} account has to be
created. Similarly, a regular user is not able to manage breach lists and other
users, because that is a privileged operation.
For instance, the administrator role is not able to perform breach data
searches directly; for that, a separate \emph{user} account has to be created.
Similarly, a regular user is not able to manage breach lists and other users,
because that is a privileged operation.
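
The deny-by-default idea behind the two roles can be sketched as an explicit
allow-list. This is an illustration under stated assumptions, not the
program's actual authorisation code: the \texttt{Role} and \texttt{Action}
types, their constants and the \texttt{Can} function are hypothetical names.

```go
package main

import "fmt"

// Role and Action are hypothetical types introduced for this sketch only.
type Role string
type Action string

const (
	RoleAdmin Role = "admin"
	RoleUser  Role = "user"

	ActionSearchBreachData  Action = "search-breach-data"
	ActionManageBreachLists Action = "manage-breach-lists"
	ActionManageUsers       Action = "manage-users"
)

// granted lists every action a role has been explicitly assigned; anything
// that is not listed here is denied.
var granted = map[Role]map[Action]bool{
	RoleAdmin: {ActionManageBreachLists: true, ActionManageUsers: true},
	RoleUser:  {ActionSearchBreachData: true},
}

// Can reports whether the role was explicitly granted the action.
func Can(r Role, a Action) bool {
	return granted[r][a]
}

func main() {
	fmt.Println(Can(RoleAdmin, ActionSearchBreachData)) // false
	fmt.Println(Can(RoleUser, ActionSearchBreachData))  // true
	fmt.Println(Can(RoleUser, ActionManageUsers))       // false
}
```

Because a lookup of a missing key yields \texttt{false}, an unknown role or
action is rejected without any additional checks.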
In-application administrators are not able to view (any) sensitive user data
and should therefore only be able to perform the following actions:
@ -572,12 +575,12 @@ configuration parameters) before letting it encrypt the \emph{age} key.
The \texttt{age} identity is only generated once the user changes their
password for the first time, in an attempt to prevent scenarios like the
in-application administrator with access to the physical database being able to
both \textbf{recover} the key from the database and \textbf{decrypt} it given
both \textbf{recover} the key from the database and \textbf{decrypt} it, given
that they already know the user password (because they set it when they created
the user), which would subsequently give them unbounded access to any future
encrypted data, as long as they would be able to maintain their database
access. This is why generating the \texttt{age} identity is are bound to the
first password change.
access. This is why generating the \texttt{age} identity is bound to the first
password change.
Of course, the supposed evil administrator could simply perform the password
change themselves! However, the user would at least be able to find those
@ -603,8 +606,8 @@ instance using \texttt{LiME}~\cite{lime}, or perhaps directly the
\n{2}{Dhall Configuration Schema}\label{sec:configuration}
The configuration schema was at first developed as part of the main
project's repository, before it was determined that it would benefit both the
development and overall clarity if the schema lived in its own repository (see
project's repository, before it was determined that both the development and
overall clarity would benefit from the schema living in its own repository (see
Section~\ref{sec:repos} for details). This enabled the schema to be
independently developed and versioned, and only be pulled into the main
application whenever it was determined to be ready.
@ -876,13 +879,27 @@ manner with more than one concurrent \emph{writer} (replicated application
instances).
\footnotetext{In Go, integer size is architecture dependent, see
\url{https://go.dev/ref/spec#Numeric_types}}
\url{https://go.dev/ref/spec#Numeric_types}.}
The relations between entities as modelled with \texttt{ent} can be imagined as
the edges connecting the nodes of a directed \emph{graph}, with the nodes
representing the entities. This conceptualisation lends itself for a more
representing the entities. This conceptualisation lends itself to a more
human-friendly querying language, where the directionality can be expressed
with words describing ownership
with words describing ownership, like so:
\vspace{\parskip}
\begin{lstlisting}[caption={Ent graph query},
label=entQuery,
backgroundcolor=\color{lstbg},
language=Go,
]
// Identifiers are illustrative placeholders.
one, err := users.Query().
	Where(
		LocalBreach.
			Has(Field_xyz),
	).
	Only(ctx)
\end{lstlisting}
@ -1179,12 +1196,13 @@ himself to be a staunch supporter of neither extreme. The ``no unit tests''
opinion seems to discount any benefit there is to unit testing, while a
``TDD-only''\footnotemark{} approach can be a little too much for some people's
taste. The author tends to prefer a \emph{middle ground} approach in this
particular case, i.e.\ writing enough tests where meaningful but not necessarily
testing everything or writing tests prior to business logic code. Arguably,
following the practice of TDD should result in writing a \emph{better designed}
code, particularly because there needs to be a prior thought about the shape
and function of the code, as it is tested for before it is even written, but it
adds a slight inconvenience to what is otherwise a straightforward process.
particular case, i.e.\ writing enough tests where meaningful, but not
necessarily testing everything or writing tests prior to business logic code.
Arguably, following the practice of TDD should result in writing \emph{better
designed} code, particularly because there needs to be prior thought about
the shape and function of the code, as it is tested for before it is even
written, but it adds a slight inconvenience to what is otherwise a
straightforward process.
Thanks to Go's built-in support for testing via its \texttt{testing} package
and the tooling in the \texttt{go} tool, writing tests is relatively simple. Go
@ -1200,7 +1218,7 @@ the signature. Without it, the function is not considered to be a testing
function despite having the required signature and is therefore \emph{not}
executed during testing.
This test lookup behaviour; however, also has a neat side effect: all the test
This test lookup behaviour, however, also has a neat side effect: all the test
files can be kept side by side with their regular source counterparts; there is
no need to segregate them into a specially blessed \texttt{tests} folder or
similar, which in the author's opinion improves readability. As a failsafe, in case
@ -1231,20 +1249,20 @@ then after pushing to remote in the CI.
The integration test shown in Listing~\ref{integrationtest} is prefaced at
line 10 by declaring a helper function \texttt{getCtx() context.Context},
which takes no arguments and returns a new\\ \texttt{context.Context}
initialised with the value of the global logger. As previously mentioned, that
is how the logger gets injected into the user module functions. The actual test
function with the signature \texttt{TestUserExists(t *testing.T)} defines a
database connection string at line 21 and attempts to open a connection to the
database. The database in use here is SQLite3 running in memory mode, meaning
no file is actually written to disk during this process. Since the testing data
is not needed after the test, this is desirable. Next, a defer statement calls
the \texttt{Close()} method on the database object, which is the Go idiomatic
way of closing files and network connections (which are also an abstraction
over files on UNIX-like operating systems such as GNU/Linux). Contrary to where
it is declared, the \emph{defer} statement is only called after all the
statements in the surrounding function, which makes sure no file descriptors
(FDs) are leaked and the file is properly closed when the function returns.
which takes no arguments and returns a new \texttt{context.Context} initialised
with the value of the global logger. As previously mentioned, that is how the
logger gets injected into the user module functions. The actual test function
with the signature \texttt{TestUserExists(t *testing.T)} defines a database
connection string at line 21 and attempts to open a connection to the database.
The database in use here is SQLite3 running in memory mode, meaning no file is
actually written to disk during this process. Since the testing data is not
needed after the test, this is desirable. Next, a defer statement calls the
\texttt{Close()} method on the database object, which is the Go idiomatic way
of closing files and network connections (which are also an abstraction over
files on UNIX-like operating systems such as GNU/Linux). Regardless of where it
is declared, the call deferred by the \emph{defer} statement is only executed
when the surrounding function returns, which makes sure no file descriptors
(FDs) are leaked and the file is properly closed when the function returns.
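
The ordering guarantee of \emph{defer} can be shown in isolation with a short
sketch. It is unrelated to the listing discussed above; the \texttt{run}
function and the recorded event names are made up for illustration.

```go
package main

import "fmt"

// run records the order in which its steps execute. The deferred function
// literal is registered right after "open", but it only runs when run
// returns, i.e. after "work" has been recorded.
func run() (events []string) {
	events = append(events, "open")
	defer func() { events = append(events, "close") }()
	events = append(events, "work")
	return events
}

func main() {
	fmt.Println(run()) // [open work close]
}
```

The named result \texttt{events} is used so that the entry appended by the
deferred function is still visible to the caller.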
In the next step, at line 25, database schema creation is attempted, handling
the potential error in a Go idiomatic way, which uses the return value from the
@ -1371,7 +1389,7 @@ The application has been deployed in a test environment on author's modest
Virtual Private Server (VPS) at \texttt{https://testpcmt.dotya.ml}, protected
by a \emph{Let's Encrypt}\allowbreak issued, short-lived, ECDSA
\texttt{secp384r1} curve TLS certificate, and configured with a strict CSP. It is
a test instance; therefore limits (and rate-limits) to prevent abuse might be
a test instance, therefore limits (and rate-limits) to prevent abuse might be
imposed.
\\
The test environment makes the program available over both modern IPv6 and
@ -1427,7 +1445,7 @@ ensures that no unvetted content is ever loaded from elsewhere.
The \texttt{Referrer-Policy} header setting of \texttt{no-referrer,
strict-origin-when-cross-origin} ensures that user tracking is reduced, since
no referrer is included (the \texttt{Referer} header is omitted) when the user
navigatse away from the site or somehow send requests outside the application
navigates away from the site or somehow sends requests outside the application
using other means. The \texttt{Permissions-Policy} set to
\texttt{geolocation=(), midi=(), sync-xhr=(), microphone=(), camera=(),
gyroscope=(), magnetometer=(), fullscreen=(self), payment=()} declares that the