diff --git a/tex/part-practical.tex b/tex/part-practical.tex index 1b89f74..8ce7999 100644 --- a/tex/part-practical.tex +++ b/tex/part-practical.tex @@ -3,14 +3,15 @@ \n{1}{Introduction} -A part of the task of this thesis was to build an actual Password Compromise -Monitoring Tool. Therefore, the development process, the tools and practices -used generally, and with more specificity the outcome are all described in the -following sections. A whole section is dedicated to application architecture, -whereby relevant engineering choices are justified and motifs preceding the -decisions are explained. This part then flows into recommendations for more of -a production deployment and concludes by describing the validation methods -chosen and used to ensure correctness and stability of the program. +A part of the task of this thesis was to build an actual application, which was +named Password Compromise Monitoring Tool, or \texttt{pcmt} for short. +Therefore, the development process, the general tools and practices as well as +the specific outcome are all described in the following sections. A whole +section is dedicated to application architecture, whereby relevant engineering +choices are justified and motives preceding the decisions are explained. This +part then flows into recommendations for more of a production deployment and +concludes by describing the validation methods chosen and used to ensure +correctness and stability of the program. \n{2}{Kudos} @@ -18,7 +19,10 @@ chosen and used to ensure correctness and stability of the program. The program that has been developed as part of this thesis used and utilised a great deal of free (as in \textit{freedom}) and open-source software in the process, either directly or as an outstanding work tool, and the author would -like to take this opportunity to recognise that fact\footnotemark. +like to take this opportunity to recognise that fact\footnotemark{}. 
+ +\footnotetext{\textbf{Disclaimer:} the author is not affiliated with any of the +projects mentioned on this page.} In particular, the author acknowledges that this work would not be the same without: @@ -34,14 +38,12 @@ without: \item Go (\url{https://go.dev/}) \end{itemize} -All of the code written has been typed into VIM (\texttt{9.0}), the shell used -to run the commands was ZSH, both running in the author's terminal emulator of -choice, \texttt{kitty}. The development machines ran a recent installation of -\textit{Arch Linux (by the way)} and Fedora 38, both using a +All the code was typed into VIM, the shell used was ZSH, and the terminal +emulator of choice was \texttt{kitty}. The development machines ran a recent +installation of \textit{Arch Linux}\footnotemark{} and Fedora 38, both using a \texttt{6.\{2,3,4\}.x} XanMod variant of the Linux kernel. -\footnotetext{\textbf{Disclaimer:} the author is not affiliated with any of the -projects mentioned on this page.} +\footnotetext{(by the way) \url{https://i.redd.it/mfrfqy66ey311.jpg}.} \n{1}{Development} @@ -49,11 +51,11 @@ projects mentioned on this page.} The source code of the project was being versioned since the start, using the popular and industry-standard git (\url{https://git-scm.com}) source code management (SCM) tool. Commits were made frequently and, if at all possible, -for small and self-contained changes of code, trying to follow sane commit -message \emph{hygiene}, i.e.\ striving for meaningful and well-formatted commit -messages. The name of the default branch is \texttt{development}, since that is -what the author likes to choose for new projects that are not yet stable (it is -in fact the default in author's \texttt{.gitconfig}). +consisted of small and self-contained changes of code, trying to follow sane +commit message \emph{hygiene}, i.e.\ striving for meaningful and well-formatted +commit messages. 
The name of the default branch is \texttt{development}, since +that is what the author likes to choose for new projects that are not yet +stable (it is in fact the default in the author's \texttt{.gitconfig}). \n{2}{Commit signing} @@ -134,33 +136,33 @@ container) was used to run the builds. The way this runner works is that it creates an ephemeral container for every pipeline step and executes given \emph{commands} inside of it. At the end of -each step the container is discarded, while the repository, which is mounted -into each container's \texttt{/drone/src} is persisted between steps, allowing -it to be cloned only from \emph{origin} only at the start of the pipeline and -then shared for all the following steps, saving bandwidth, time and disk +each step, the container is discarded while the repository clone, which is +mounted into each container's \texttt{/drone/src}, is persisted between steps, +allowing it to be cloned from \emph{origin} only at the start of the pipeline +and then shared for all the following steps, saving bandwidth, time and disk writes. The entire configuration used to run the pipelines can be found in a file named \texttt{.drone.yml} at the root of the main source code repository. The workflow consists of four pipelines, which are run in parallel. Two main pipelines are defined to build the frontend assets, the \texttt{pcmt} binary -and run tests on \texttt{x86\_64} GNU/Linux targets, one for each of Arch and -Alpine (version 3.1\{7,8\}). These two pipelines are identical apart from +and run tests on \texttt{x86\_64} GNU/Linux targets, one for each of Alpine +(version 3.1\{7,8\}) and Arch. These two pipelines are identical apart from OS-specific bits such as installing a certain package, etc. For the record, other OS-architecture combinations were not tested. 
A third pipeline contains instructions to build a popular static analysis tool -called \texttt{golangci-lint}, which is sort of a meta-linter, bundling a -staggering amount of linters (linter is a tool that performs static code +called \texttt{golangci-lint}, which is a sort of meta-linter, bundling a +staggering number of linters (a linter is a tool that performs static code analysis and can raise awareness of programming errors, flag potentially buggy -code constructs, or \emph{mere} stylistic errors) - from sources and then +code constructs, or \emph{mere} stylistic errors), from sources and then perform the analysis of project's codebase using the freshly built binary. If the result of this step is successful, a handful of code analysis services get pinged in the next steps to take notice of the changes to project's source code -and update their metrics, details can be found in the main Drone configuration +and update their metrics. Details can be found in the main Drone configuration file \texttt{.drone.yml} and the configuration for the \texttt{golangci-lint} -tool itself (such as what linters are enabled/disabled and with what settings) -can be found in the root of the repository in the file named +tool itself (such as which linters are enabled/disabled and their +configurations) can be found in the root of the repository in the file named \texttt{.golangci.yml}. The fourth pipeline focuses on linting the \texttt{Containerfile} and building @@ -287,9 +289,10 @@ passed around as a pointer. An experimental (note: not anymore, with \texttt{go1.21} it was brought into Go's \textit{stdlib}) library for \textit{structured} logging \texttt{slog} was -used to facilitate every logging need the program might have. It supports both -JSON and plain-text logging, which was made configurable by the program. Either -a configuration file value or an environment variable can be used to set this. +used to facilitate every logging need that the program might have. 
It supports +both JSON and plain-text logging, which was made configurable by the program. +Either a configuration file value or an environment variable can be used to set +this. There are four log levels available by default (\texttt{DEBUG}, \texttt{INFO}, \texttt{WARNING}, \texttt{ERROR}) and the pertinent library funtions are @@ -373,10 +376,10 @@ TCP range $1-65535$.} \vspace*{-\baselineskip} -\paragraph{\texttt{-printMigration}}{A boolean option that if set, makes the - program print any \textbf{upcoming} database migrations (based on the current - state of the database) and exit. The connection string environment variable - still needs to be set in order to be able connect to the database and perform +\paragraph{\texttt{-printMigration}}{A boolean option that, if set, makes the +program print any \textbf{upcoming} database migrations (based on the current +state of the database) and exit. The connection string environment variable +still needs to be set in order to be able to connect to the database and perform the schema \emph{diff}. This option is mainly useful during debugging.} \vspace*{-\baselineskip} @@ -410,7 +413,7 @@ development binary might simply print the truncated commit ID (consult An important thing to mention is embedded assets and templates. Go has multiple mechanisms to natively embed arbitrary files directly into the binary during -the regular build process. \texttt{embed.FS} from the standard library' +the regular build process. \texttt{embed.FS} from the standard library \texttt{embed} package was used to bundle all template files and web assets, such as images, logos and stylesheets at the module level. These are then passed around the program as needed, such as to the \texttt{handlers} package. @@ -481,7 +484,7 @@ JavaScript. \n{2}{Frontend} -Frontend-side, the application Tailwind was used for CSS. +Frontend-wise, Tailwind was used for the application's CSS. 
It promotes the usage of flexible \emph{utility-first} classes in the HTML markup instead of separating out styles from content. Understandably, this is somewhat of a preference issue and the author does not hold hard opinions in either @@ -491,7 +494,7 @@ detailed documentation and offering built-in support for dark/light mode, and partially also because it \emph{looks} nice. The Go templates containing the CSS classes need to be parsed by Tailwind in -order t produce the final stylesheet that can be bundled with the application. +order to produce the final stylesheet that can be bundled with the application. The upstream provides an original CLI tool (\texttt{tailwindcss}), which can be used exactly for that action. Simple and accessible layouts were overall preferred, a single page was rather split into multiple when becoming @@ -503,9 +506,9 @@ pages. As an aside, the author has briefly experimented with WebAssembly to provide client-side dynamic functionality for this project, but has ultimately scrapped it in favour of the entirely server-side rendered approach. It is possible that -it would get revisited in the future if necessary, and performance mattered. -Even from the short experiments it was obvious how much faster WebAssembly was -when compared to JavaScript. +it would get revisited in the future if necessary. Even from the short +experiments it was obvious how much faster WebAssembly was when compared to +JavaScript. % \newpage @@ -525,15 +528,15 @@ roles were envisioned: \end{itemize} It is paramount that the program protects itself from the insider threats as -well and therefore each role is only able to perform actions that it is -explicitly assigned. While there definitely is certain overlap between the +well, and therefore each role is only able to perform actions that it is +explicitly assigned. 
While there definitely is a certain overlap between the capabilities of the two outlined roles, each also possesses unique features that the other one does not. -For instance, the administrator role is not able to perform searches on the -breach data directly, for that a separate \emph{user} account has to be -devised. Similarly, a regular user is not able to manage breach lists and other -users, because that is a privileged operation. +For instance, the administrator role is not able to perform breach data +searches directly; for that, a separate \emph{user} account has to be devised. +Similarly, a regular user is not able to manage breach lists and other users, +because that is a privileged operation. In-application administrators are not able to view (any) sensitive user data and should therefore only be able to perform the following actions: @@ -572,12 +575,12 @@ configuration parameters) before letting it encrypt the \emph{age} key. The \texttt{age} identity is only generated once the user changes their password for the first time, in an attempt to prevent scenarios like the in-application administrator with access to physical database being able to -both \textbf{recover} the key from the database and \textbf{decrypt} it given +both \textbf{recover} the key from the database and \textbf{decrypt} it, given that they already know the user password (because they set it when they created the user), which would subsequently give them unbounded access to any future encrypted data, as long as they would be able to maintain their database -access. This is why generating the \texttt{age} identity is are bound to the -first password change. +access. This is why generating the \texttt{age} identity is bound to the first +password change. Of course, the supposed evil administrator could simply perform the password change themselves! 
However, the user would at least be able to find those @@ -603,8 +606,8 @@ instance using \texttt{LiME}~\cite{lime}, or perhaps directly the \n{2}{Dhall Configuration Schema}\label{sec:configuration} The configuration schema was at first being developed as part of the main -project's repository, before it was determined that it would benefit both the -development and overall clarity if the schema lived in its own repository (see +project's repository, before it was determined that both the development and +overall clarity would benefit from the schema living in its own repository (see Section~\ref{sec:repos} for details). This enabled the schema to be independently developed and versioned, and only be pulled into the main application whenever it was determined to be ready. @@ -876,13 +879,27 @@ manner with more than one concurrent \emph{writer} (replicated application instances). \footnotetext{In Go, integer size is architecture dependent, see -\url{https://go.dev/ref/spec#Numeric_types}} +\url{https://go.dev/ref/spec#Numeric_types}.} The relations between entities as modelled with \texttt{ent} can be imagined as the edges connecting the nodes of a directed \emph{graph}, with the nodes -representing the entities. This conceptualisation lends itself for a more +representing the entities. This conceptualisation lends itself to a more human-friendly querying language, where the directionality can be expressed -with words describing ownership +with words describing ownership, like so: + +\vspace{\parskip} +\begin{lstlisting}[caption={Ent graph query}, +label=entQuery, +backgroundcolor=\color{lstbg}, +language=Go, +] +one, err := users.Query. + Where( + LocalBreach. + Has(Field_xyz) + ). + Only(ctx) +\end{lstlisting} @@ -1179,12 +1196,13 @@ himself to be a staunch supporter of neither extreme. 
The ``no unit tests'' opinion seems to discount any benefit there is to unit testing, while a ``TDD-only''\footnotemark{} approach can be a little too much for some people's taste. The author tends to prefer a \emph{middle ground} approach in this -particular case, i.e. writing enough tests where meaningful but not necessarily -testing everything or writing tests prior to business logic code. Arguably, -following the practice of TDD should result in writing a \emph{better designed} -code, particularly because there needs to be a prior thought about the shape -and function of the code, as it is tested for before it is even written, but it -adds a slight inconvenience to what is otherwise a straightforward process. +particular case, i.e.\ writing enough tests where meaningful, but not +necessarily testing everything or writing tests prior to business logic code. +Arguably, following the practice of TDD should result in writing \emph{better +designed} code, particularly because there needs to be prior thought about +the shape and function of the code, as it is tested for before even being +written, but it adds a slight inconvenience to what is otherwise a +straightforward process. Thanks to Go's built in support for testing via its \texttt{testing} package and the tooling in the \texttt{go} tool, writing tests is relatively simple. Go @@ -1200,7 +1218,7 @@ the signature. Without it, the function is not considered to be a testing function despite having the required signature and is therefore \emph{not} executed during testing. -This test lookup behaviour; however, also has a neat side effect: all the test +This test lookup behaviour, however, also has a neat side effect: all the test files can be kept side-by-side their regular source counterparts, there is no need to segregate them into a specially blessed \texttt{tests} folder or similar, which in author's opinion improves readability. 
As a failsafe, in case @@ -1231,20 +1249,20 @@ then after pushing to remote in the CI. In the integration test shown in Listing~\ref{integrationtest}, it is prefaced at line 10 by declaring a helper function \texttt{getCtx() context.Context}, -which takes no arguments and returns a new\\ \texttt{context.Context} -initialised with the value of the global logger. As previously mentioned, that -is how the logger gets injected into the user module functions. The actual test -function with the signature \texttt{TestUserExists(t *testing.T)} defines a -database connection string at line 21 and attempts to open a connection to the -database. The database in use here is SQLite3 running in memory mode, meaning -no file is actually written to disk during this process. Since the testing data -is not needed after the test, this is desirable. Next, a defer statement calls -the \texttt{Close()} method on the database object, which is the Go idiomatic -way of closing files and network connections (which are also an abstraction -over files on UNIX-like operating systems such as GNU/Linux). Contrary to where -it is declared, the \emph{defer} statement is only called after all the -statements in the surrounding function, which makes sure no file descriptors -(FDs) are leaked and the file is properly closed when the function returns. +which takes no arguments and returns a new \texttt{context.Context} initialised +with the value of the global logger. As previously mentioned, that is how the +logger gets injected into the user module functions. The actual test function +with the signature \texttt{TestUserExists(t *testing.T)} defines a database +connection string at line 21 and attempts to open a connection to the database. +The database in use here is SQLite3 running in memory mode, meaning no file is +actually written to disk during this process. Since the testing data is not +needed after the test, this is desirable. 
Next, a defer statement calls the +\texttt{Close()} method on the database object, which is the Go idiomatic way +of closing files and network connections (which are also an abstraction over +files on UNIX-like operating systems such as GNU/Linux). Contrary to where it +is declared, the \emph{deferred} call is only executed after all the statements +in the surrounding function, which makes sure no file descriptors (FDs) are +leaked and the file is properly closed when the function returns. In the next step at line 25 a database schema creation is attempted, handling the potential error in a Go idiomatic way, which uses the return value from the @@ -1371,7 +1389,7 @@ The application has been deployed in a test environment on author's modest Virtual Private Server (VPS) at \texttt{https://testpcmt.dotya.ml}, protected by \emph{Let's Encrypt}\allowbreak issued, short-lived, ECDSA \texttt{secp384r1} curve TLS certificate, and configured with strict CSP. It is -a test instance; therefore limits (and rate-limits) to prevent abuse might be +a test instance, and therefore limits (and rate-limits) to prevent abuse might be imposed. \\ The test environment makes the program available over both modern IPv6 and @@ -1427,7 +1445,7 @@ ensures that no unvetted content is ever loaded from elsewhere. The \texttt{Referrer-Policy} header setting of \texttt{no-referrer, strict-origin-when-cross-origin} ensures that user tracking is reduced, since no referrer is included (the \texttt{Referer} header is omitted) when the user -navigatse away from the site or somehow send requests outside the application +navigates away from the site or somehow sends requests outside the application using other means. The \texttt{Permissions-Policy} set to \texttt{geolocation=(), midi=(), sync-xhr=(), microphone=(), camera=(), gyroscope=(), magnetometer=(), fullscreen=(self), payment=()} declares that the