% =========================================================================== %
\part{Practical part}

\n{1}{Introduction}

A part of the task of this thesis was to build an actual Password Compromise Monitoring Tool. Therefore, the development process, the tools and practices used, and, in more detail, the outcome are all described in the following sections. A whole section is dedicated to application architecture, in which the relevant engineering choices are justified and the motives behind the decisions are explained. This part then flows into recommendations for a production deployment and concludes by describing the validation methods chosen and used to ensure correctness and stability of the program.

\n{2}{Kudos}

The program that has been developed as part of this thesis made use of a great deal of free (as in \textit{freedom}) and open-source software, either directly or as an outstanding work tool, and the author would like to take this opportunity to recognise that fact\footnotemark.

In particular, the author acknowledges that this work would not be the same without:

\begin{itemize}
  \item vim (\url{https://www.vim.org/})
  \item Arch Linux (\url{https://archlinux.org/})
  \item ZSH (\url{https://www.zsh.org/})
  \item kitty (\url{https://sw.kovidgoyal.net/kitty/})
  \item Nix (\url{https://nixos.org/explore.html})
  \item pre-commit (\url{https://pre-commit.com/})
  \item Podman (\url{https://podman.io/})
  \item Go (\url{https://go.dev/})
\end{itemize}

All of the code has been written in Vim (\texttt{9.0}), and the shell used to run the commands was ZSH, both running in the author's terminal emulator of choice, \texttt{kitty}. The development machines ran a recent installation of \textit{Arch Linux (by the way)} and Fedora 38, both using a \texttt{6.\{2,3,4\}.x} XanMod variant of the Linux kernel.
\footnotetext{\textbf{Disclaimer:} the author is not affiliated with any of the projects mentioned on this page.}

\n{1}{Development}

The source code of the project has been versioned since the start, using the popular and industry-standard git (\url{https://git-scm.com}) source code management (SCM) tool. Commits were made frequently and, wherever possible, for small and self-contained changes of code, following sane commit message \emph{hygiene}, i.e.\ striving for meaningful and well-formatted commit messages. The name of the default branch is \texttt{development}, since that is what the author likes to choose for new projects that are not yet stable (it is in fact the default in the author's \texttt{.gitconfig}).

\n{2}{Commit signing}

Since git allows cryptographically \emph{signing} all commits, it would be unwise not to take advantage of this. For the longest time, GPG was the only method available for signing commits in git; however, that is no longer the case~\cite{agwagitssh}. These days, it is also possible to both sign and verify one's git commits (and tags!) using SSH keys, namely those produced by OpenSSH, which \emph{can} be the same ones that are used to log in to remote systems. The author has, of course, not reused the key pairs that are used to connect to machines for signing commits. Separate \texttt{Ed25519} elliptic curve key pairs have been used specifically for signing. The public components of these keys are enclosed in this thesis as Appendix~\ref{appendix:signingkeys} for future reference.
The validity of a signature on a particular commit can be viewed with git using the following commands (the \% sign denotes the shell prompt):

\vspace{\parskip}
\begin{lstlisting}[language=bash, caption={Verifying the signature of a git commit},
    label=gitverif, basicstyle=\linespread{0.9}\small\ttfamily,
    backgroundcolor=\color{lstbg}]
% cd
% git show --show-signature
% # alternatively:
% git verify-commit
\end{lstlisting}
\vspace*{-\baselineskip}

There is one caveat, though: git first needs some additional configuration for the code in Listing~\ref{gitverif} to work as one would expect. Namely, the public key used to verify the signature needs to be stored in git's ``allowed signers file'', git needs to be told where that file is located using the configuration value \texttt{gpg.ssh.allowedsignersfile}, and finally the configuration value of the \texttt{gpg.format} field needs to be set to \texttt{ssh}.

Luckily, because git also allows the configuration values to be local to each repository, all of the mentioned requirements can be met by running the following commands from inside the cloned repository:

\vspace{\parskip}
\begin{lstlisting}[language=bash,
    caption={Prepare allowed signers file and signature format for git},
    label=gitsshprep, basicstyle=\linespread{0.9}\small\ttfamily,
    backgroundcolor=\color{lstbg}]
% # set the signature format for the local repository.
% git config --local gpg.format ssh
% # save the public key.
% cat > ./.tmp-allowed_signers \
    <<<'surtur leo '
% # set the allowed signers file path for the local repository.
% git config --local gpg.ssh.allowedsignersfile ./.tmp-allowed_signers
\end{lstlisting}
\vspace*{-\baselineskip}

After the code in Listing~\ref{gitsshprep} is run, everything from Listing~\ref{gitverif} should remain applicable for the lifetime of the repository or until git changes its implementation of signature verification.
The git \texttt{user.name} that can be seen on the commits in the \textbf{Author} field is named after the machine that was used to develop the program, since the author uses different signing keys on each machine. That way the committer machine can be determined post-hoc.

For future reference, git has been used in the version \texttt{git version 2.4\{0,1,2\}.x}.

\n{2}{Continuous Integration}

To increase both the author's and the public's confidence in the atomic changes made over time, an attempt was made to thoroughly \emph{integrate} them using a continuous integration (CI) service that was plugged into the main source code repository since the early stages of development. This, of course, was again self-hosted, including the workers. The tool of choice was Drone (\url{https://drone.io}) and the ``docker'' runner (in fact it runs any OCI container) was used to run the builds.

The way this runner works is that it creates an ephemeral container for every pipeline step and executes the given \emph{commands} inside of it. At the end of each step the container is discarded, while the repository, which is mounted into each container at \texttt{/drone/src}, is persisted between steps. This allows it to be cloned from \emph{origin} only once, at the start of the pipeline, and then shared by all the following steps, saving bandwidth, time and disk writes.

The entire configuration used to run the pipelines can be found in a file named \texttt{.drone.yml} at the root of the main source code repository. The workflow consists of four pipelines, which are run in parallel. Two main pipelines are defined to build the frontend assets and the \texttt{pcmt} binary, and to run tests on \texttt{x86\_64} GNU/Linux targets, one for each of Arch and Alpine (version 3.1\{7,8\}). These two pipelines are identical apart from OS-specific bits such as installing a certain package. For the record, other OS-architecture combinations were not tested.
A third pipeline contains instructions to build a popular static analysis tool called \texttt{golangci-lint} from sources and then analyse the project's codebase using the freshly built binary. The tool is sort of a meta-linter, bundling a staggering number of linters (a linter is a tool that performs static code analysis and can raise awareness of programming errors, flag potentially buggy code constructs, or \emph{mere} stylistic errors). If this step succeeds, a handful of code analysis services get pinged in the next steps to take notice of the changes to the project's source code and update their metrics. Details can be found in the main Drone configuration file \texttt{.drone.yml}, and the configuration for the \texttt{golangci-lint} tool itself (such as which linters are enabled or disabled and with what settings) can be found in the root of the repository in the file named \texttt{.golangci.yml}.

The fourth pipeline focuses on linting the \texttt{Containerfile}, building the container and pushing it to a public container registry, although the latter action is only performed on feature branches, \emph{pull request} or \emph{tag} events.

\obr{Drone CI median build time}{fig:drone-median-build}{.84}{graphics/drone-median-build}

The median build time as of writing was 1 minute, which includes running all four pipelines, and that is acceptable. Build times might of course vary depending on the hardware. For reference, these builds were run on a machine equipped with a Zen 3 Ryzen 5 5600 CPU running at nominal clocks, DDR4 RAM @ 3200 MHz, a couple of PCIe Gen 4 NVMe drives in a mirrored setup (using ZFS) and a 600 Mbps downlink, software-wise running Arch with an author-flavoured XanMod kernel version 6.\{2,3,4\}.x.

\n{2}{Source code repositories}\label{sec:repos}

The git repository containing source code of the \texttt{pcmt} project:\\
\url{https://git.dotya.ml/mirre-mt/pcmt.git}.
The git repository hosting the \texttt{pcmt} configuration schema:\\
\url{https://git.dotya.ml/mirre-mt/pcmt-config-schema.git}.

The repository containing the \LaTeX{} source code of this thesis:\\
\url{https://git.dotya.ml/mirre-mt/masters-thesis.git}.

All the pertaining source code was published in repositories on a publicly available git server operated by the author. The reasoning \emph{pro} self-hosting is that it is the preferred way of guaranteeing autonomy over one's source code, as opposed to large silos owned by big corporations having a track record of arguably not always deciding with their users' best interests in mind (although recourse has been observed~\cite{ytdl}). When these providers act on impulse or under public pressure they can potentially (at least temporarily) disrupt the operations of their users. Thus, they are not only holding their users to lengthy \emph{terms of service} that \emph{are subject to change at any given moment}, but also exposing them to outside factors beyond their control. Granted, decentralisation can take a toll on discoverability of the project, but that is only a concern if rapid market penetration is a goal, not when aiming for an organically grown community.

\n{2}{Toolchain}

Throughout the creation of this work, the \emph{then-current} version of the Go programming language was used, i.e.\ \texttt{go1.20}. To read more on why Go was chosen in particular, see Appendix~\ref{appendix:whygo}. Equally, Nix and Nix-based tools such as \texttt{devenv} have also aided heavily during development; more on those is written in Appendix~\ref{appendix:whynix}.
\tab{Tool/Library-Usage Matrix}{tab:toolchain}{1.0}{ll}{
  \textbf{Tool/Library} & \textbf{Usage} \\
  Go programming language & program core \\
  Dhall configuration language & program configuration \\
  Echo & HTTP handlers, controllers \\
  ent & ORM using graph-based modelling \\
  pq & Pure-Go Postgres drivers \\
  bluemonday & sanitising HTML \\
  TailwindCSS & utility-first approach to Cascading Style Sheets \\
  PostgreSQL & persistent data storage \\
}

Table~\ref{tab:depsversionmx} contains the names and versions of the most important libraries and supporting software that were used to build the application.

\tab{Dependency-Version Matrix}{tab:depsversionmx}{1.0}{ll}{
  \textbf{Name} & \textbf{Version} \\
  \texttt{echo} (\url{https://echo.labstack.com/}) & 4.11.1 \\
  \texttt{go-dhall} (\url{https://github.com/philandstuff/dhall-golang}) & 6.0.2\\
  \texttt{ent} (\url{https://entgo.io/}) & 0.12.3 \\
  \texttt{pq} (\url{https://github.com/lib/pq/}) & 1.10.9 \\
  \texttt{bluemonday} (\url{https://github.com/microcosm-cc/bluemonday}) & 1.0.25 \\
  \texttt{tailwindcss} (\url{https://tailwindcss.com/}) & 3.3.0 \\
  \texttt{PostgreSQL} (\url{https://www.postgresql.org/}) & 15.3 \\
}

Additionally, the dependency-version mapping for the Go program can be inferred from looking at the first \textit{require} block of \texttt{go.mod} at any point in time. The same can be achieved for the \emph{frontend} by glancing at the \texttt{package-lock.json} file.

\n{1}{Application architecture}

The application is written in Go and uses \textit{Go modules}. The full name of the module is \texttt{git.dotya.ml/mirre-mt/pcmt}.

\obr{Application class diagram}{fig:classdiagram}{.79}{graphics/pcmt-class-diagram.pdf}

\n{2}{Package structure}

The source code of the module is organised into smaller, self-contained Go \emph{packages} appropriately along a couple of domains: logging, core application, web routers, configuration and settings, etc.
In Go, packages are delimited by folder structure -- each folder can be a package.

Generally speaking, the program aggregates decision points into central places, such as \texttt{run.go}, which then imports child packages that facilitate the individual tasks: loading the configuration, connecting to the database and running migrations, consolidating flag, environment variable and configuration-based values into a canonical \emph{settings} \texttt{struct}, setting up web routes, authenticating requests, or handling \texttt{signals} and performing graceful shutdowns.

\n{3}{Internal package}

The \texttt{internal} package was not used as of writing, but the author plans to eventually migrate the \emph{internal} logic of the program into the internal package to prevent accidental imports.

\n{2}{Logging}

The program uses \emph{dependency injection} to share a single logger instance (the same technique is also used to share the database client). This logger is then passed around as a pointer, so that all consumers share the same underlying instance and any change to it is visible to all of them. As a rule of thumb throughout the application, every larger \texttt{struct} that needs to be passed around is passed around as a pointer.

An experimental (note: not anymore, with \texttt{go1.21} it was brought into Go's \textit{stdlib}) library for \textit{structured} logging, \texttt{slog}, was used to facilitate every logging need the program might have. It supports both JSON and plain-text logging, and the choice between the two was made configurable: either a configuration file value or an environment variable can be used to set this. There are four log levels available by default (\texttt{DEBUG}, \texttt{INFO}, \texttt{WARN}, \texttt{ERROR}) and the pertinent library functions are parametric.
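
The handler selection and level filtering described above can be illustrated with a minimal sketch built on the standard library's \texttt{log/slog} package (Go 1.21 or newer). The helper names \texttt{newLogger} and \texttt{logOnce}, the \texttt{jsonOutput} toggle and the logged messages are assumptions made for this illustration and are not taken from the \texttt{pcmt} sources; the timestamp is dropped only to keep the output reproducible.

\vspace{\parskip}
\begin{lstlisting}[language=Go,
    caption={Sketch: selecting a JSON or plain-text \texttt{slog} handler (illustrative, not the pcmt code)},
    label=slogHandlerSketch, basicstyle=\linespread{0.9}\small\ttfamily,
    backgroundcolor=\color{lstbg}]
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
)

// newLogger builds a JSON or plain-text logger that writes to w.
// jsonOutput stands in for the configuration value or environment
// variable mentioned above (the name is hypothetical).
func newLogger(w io.Writer, jsonOutput, debug bool) *slog.Logger {
	level := slog.LevelInfo
	if debug {
		level = slog.LevelDebug
	}
	opts := &slog.HandlerOptions{
		Level: level,
		// Drop the timestamp to keep the output reproducible.
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{}
			}
			return a
		},
	}
	if jsonOutput {
		return slog.New(slog.NewJSONHandler(w, opts))
	}
	return slog.New(slog.NewTextHandler(w, opts))
}

// logOnce emits one DEBUG and one INFO record and returns the output.
func logOnce(jsonOutput, debug bool) string {
	var buf bytes.Buffer
	l := newLogger(&buf, jsonOutput, debug)
	// First argument: the message; the rest: key-value pairs.
	l.Debug("cache miss", "key", "breaches")
	l.Info("user signed in", "user", "alice", "attempt", 1)
	return buf.String()
}

func main() {
	fmt.Print(logOnce(true, false)) // JSON, DEBUG record filtered out
	fmt.Print(logOnce(false, true)) // plain text, both records
}
\end{lstlisting}
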
The first parameter of type \texttt{string} is the main message, which is supplied as a \emph{value} to the \emph{key} appropriately named `\texttt{msg}', a feature of structured loggers which can later be used for filtering. Any other parameters need to be supplied in pairs, serving as key and value, respectively.

This main \texttt{slog} interface has been extended in package \texttt{slogging} to also provide the formatting functionality of the \texttt{fmt} standard library package. This was achieved by directly embedding \texttt{slog.Logger} in a custom \texttt{struct} type named \texttt{Slogger} and implementing the additional methods on the custom type. The new type that embeds the original \texttt{slog.Logger} gets to keep its methods thanks to the composition nature of Go. Thus, common formatting directives like the one seen in Listing~\ref{goFmtExpression} are now supported with the custom logger, in addition to anything the base \texttt{slog.Logger} offers.

\vspace{\parskip}
\begin{lstlisting}[language=Go,
    caption={Example formatting expression supplied to the logger},
    label=goFmtExpression, basicstyle=\linespread{0.9}\small\ttfamily,
    backgroundcolor=\color{lstbg},
    otherkeywords={\%s, \%q, \%v},
]
slogger.Debugf("operation %q for user %q completed at %s",
    op, usr.ID, time.Now())
\end{lstlisting}

Furthermore, functionality was added to support changing the log level at runtime, which is a convenient feature in certain situations.

\n{2}{Authentication}

The authentication logic is relatively simple and its core has mostly been isolated into a custom \emph{middleware}. User passwords are hashed using a secure KDF before ever being sent to the database. The KDF of choice is \texttt{bcrypt} (with a sane \emph{Cost} of 10), which automatically includes \emph{salt} for the password and provides ``length-constant'' time hash comparisons.
The author plans to add support for the more modern \texttt{scrypt} and for the state-of-the-art winner of the Password Hashing Competition (PHC), \texttt{Argon2} (\url{https://github.com/P-H-C/phc-winner-argon2}), for flexibility.

\n{2}{SQLi prevention}

No raw SQL queries are directly used to access the database, thus decreasing the likelihood of SQL injection attacks. Instead, parametric queries are constructed in code using a graph-like API of the \texttt{ent} library, which is attended to in depth in Section~\ref{sec:dbschema}.

\n{2}{Configurability}

Virtually any important value in the program has been made into a configuration value, so that the operator can customise the experience as needed. An effort was made to choose sane configuration defaults, which resulted in the configuration file essentially only needing to contain secrets, unless there is a need to override the defaults. It is not entirely a \emph{zero-config} situation, rather a \emph{minimal-config} one. An example can be seen in Section~\ref{sec:configuration}.

Certain options deemed important enough (this was largely subjective) were additionally made into command-line \emph{flags}, using the standard library package \texttt{flag}. Users wishing to display all available options can invoke the program with the \texttt{-help} flag, a courtesy of the mentioned \texttt{flag} package.

\vspace*{-\baselineskip}
\paragraph{\texttt{-host } (string)}{Takes one argument and specifies the hostname, or the address to listen on.}

\vspace*{-\baselineskip}
\paragraph{\texttt{-port } (int)}{This flag takes one integer argument and specifies the port to listen on. The argument is validated at program start-up and the program has a fallback built in for the case that the supplied value is bogus, such as a string or a number outside the allowed TCP port range 1--65535.}

\vspace*{-\baselineskip}
\paragraph{\texttt{-printMigration}}{A boolean option that, if set, makes the program print any \textbf{upcoming} database migrations (based on the current state of the database) and exit. The connection string environment variable still needs to be set in order to be able to connect to the database and perform the schema \emph{diff}. This option is mainly useful during debugging.}

\vspace*{-\baselineskip}
\paragraph{\texttt{-devel}}{This flag instructs the program to enter \textit{devel mode}, in which all templates are re-parsed and re-executed upon each request, and the default log verbosity is changed to level \texttt{DEBUG}. Should not be used in production.}

\vspace*{-\baselineskip}
\paragraph{\texttt{-import } (string)}{This option tells the program to perform an import of local breach data into the program's main database. Obviously, the database connection string environment variable also needs to be present for this. The option takes one argument, which is the path to a file formatted according to the \texttt{ImportSchema} (consult Listing~\ref{breachImportSchema}). The program prints the result of the import operation, indicating success or failure, and exits.}

\vspace*{-\baselineskip}
\paragraph{\texttt{-version}}{As could probably be inferred from its name, this flag makes the program print its own version (that has been embedded into the binary at build time) and exit. A release binary would print something akin to a \emph{semantic versioning}-compliant git tag string, while a development binary might simply print the truncated commit ID (consult \texttt{Containerfile} and \texttt{justfile}) of the sources used to build it.}

\n{2}{Embedded assets}

An important thing to mention is embedded assets and templates.
Go has multiple mechanisms to natively embed arbitrary files directly into the binary during the regular build process. \texttt{embed.FS} from the standard library's \texttt{embed} package was used to bundle all template files and web assets, such as images, logos and stylesheets, at the module level. These are then passed around the program as needed, such as to the \texttt{handlers} package.

There is also a toggle in the application configuration (\texttt{LiveMode}), which instructs the program at start-up to either rely entirely on embedded assets, or pull live template and asset files from the filesystem. The former option makes the application more portable as it is wholly self-contained, while the latter allows for flexibility and customisation not only during development. Where the program looks for assets and templates in \emph{live mode} is determined by further configuration options: \texttt{assetsPath} and \texttt{templatePath}.

\n{2}{Composability}

The core templating functionality was provided by the \texttt{html/template} Go standard library package. Echo's \texttt{Renderer} interface has been implemented, so that template rendering could be performed directly using Echo's built-in facilities in a more ergonomic manner using \texttt{return c.Render(http.StatusOK, "home.tmpl", data)}.

\vspace{\parskip}
\begin{lstlisting}[
    caption={Conditionally enabling functionality inside a Go template based on user access level},
    label=tmplConditionals, basicstyle=\linespread{0.9}\small\ttfamily,
    backgroundcolor=\color{lstbg},
    morekeywords={if,and,end},
]
{{ if and .User .User.IsLoggedIn .User.IsAdmin }}
  ...
{{ end }}
\end{lstlisting}

Templates used for rendering of the web pages were created in a composable manner, split into smaller, reusable parts, such as \texttt{footer.tmpl} and \texttt{head.tmpl}. Those could then be included e.g.\ using \texttt{\{\{ template "footer.tmpl" \}\}}.
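
How such composition and conditional rendering fit together can be sketched with the standard library alone. In the following illustrative example, the \texttt{User} type, the inlined template sources and the \texttt{/manage/users} link are hypothetical; only the template names and the condition mirror the text above. Both templates are parsed into a single set, so that \texttt{home.tmpl} can include \texttt{footer.tmpl} by name.

\vspace{\parskip}
\begin{lstlisting}[language=Go,
    caption={Sketch: composing templates and gating markup on user access level (illustrative)},
    label=tmplCompositionSketch, basicstyle=\linespread{0.9}\small\ttfamily,
    backgroundcolor=\color{lstbg}]
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// User is a hypothetical view model; only the field names that
// appear in the template condition are taken from the text.
type User struct {
	IsLoggedIn bool
	IsAdmin    bool
}

// Template sources are inlined to keep the sketch self-contained.
const footerTmpl = `<footer>pcmt</footer>`

const homeTmpl = `<main>Welcome!` +
	`{{ if and .User .User.IsLoggedIn .User.IsAdmin }}` +
	` <a href="/manage/users">Manage users</a>` +
	`{{ end }}</main>{{ template "footer.tmpl" }}`

// render parses both templates into one set and executes "home.tmpl".
func render(u *User) (string, error) {
	set, err := template.New("footer.tmpl").Parse(footerTmpl)
	if err != nil {
		return "", err
	}
	if _, err := set.New("home.tmpl").Parse(homeTmpl); err != nil {
		return "", err
	}
	var buf bytes.Buffer
	data := map[string]any{"User": u}
	if err := set.ExecuteTemplate(&buf, "home.tmpl", data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	users := []*User{
		{IsLoggedIn: true, IsAdmin: true}, // sees the admin link
		{IsLoggedIn: true},                // does not
	}
	for _, u := range users {
		out, err := render(u)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
\end{lstlisting}
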
Specific functionality is conditionally executed based on the determined level of access of the user, see Listing~\ref{tmplConditionals} for reference.

A popular HTML sanitiser \texttt{bluemonday} has been employed to aid with battling XSS. The program first runs every template through the sanitiser before rendering it, so that any user-controlled inputs are handled safely.

A dynamic web application should include a CSP configuration. The program therefore has the ability to calculate the hashes (SHA256/SHA384) of its assets (scripts, images) on the fly and it is able to use them inside the templates. This unlocks potentially using third party assets without opening up CSP with directives like \texttt{script-src 'unsafe-hashes'}. It also means that there is no need to maintain a set of customised \texttt{head} templates with pre-computed hashes next to script sources, since the application can perform the necessary calculations in the user's stead.

\n{2}{Server-side rendering}

The application constructs the web pages \emph{entirely} on the server side, and it runs without a single line of JavaScript, of which the author is especially proud. It improves load times, decreases the attack surface, increases maintainability and reduces the cognitive load that is required when dealing with JavaScript. Of course, that requires extensive usage of non-semantic \texttt{POST} requests in web forms even for data \emph{updates} (where HTTP \texttt{PUT}s should be used) and the accompanying frequent full-page refreshes, but that still is not enough to warrant the use of JavaScript.

\n{2}{Frontend}

On the frontend side, Tailwind was used for CSS. It promotes the usage of flexible \emph{utility-first} classes in the HTML markup instead of separating out styles from content.
Understandably, this is somewhat of a preference issue and the author does not hold hard opinions in either direction; however, it has to be noted that this approach empirically allows for rather quick UI prototyping. Tailwind was chosen for having reasonably detailed documentation and offering built-in support for dark/light mode, and partially also because it \emph{looks} nice.

The Go templates containing the CSS classes need to be parsed by Tailwind in order to produce the final stylesheet that can be bundled with the application. The upstream provides an original CLI tool (\texttt{tailwindcss}), which can be used exactly for that.

Simple and accessible layouts were preferred overall; a page that became convoluted was split into multiple pages instead. Data-backed efforts were made to create reasonably contrasting pages.

\n{3}{Frontend experiments}

As an aside, the author has briefly experimented with WebAssembly to provide client-side dynamic functionality for this project, but has ultimately scrapped it in favour of the entirely server-side rendered approach. It is possible that it will be revisited in the future if the need arises and performance matters. Even from the short experiments it was obvious how much faster WebAssembly was when compared to JavaScript.

% \newpage
\n{2}{User isolation}

\obr{Application use case diagram}{fig:usecasediagram}{.9}{graphics/pcmt-use-case.pdf}

Users are only allowed into specific parts of the application based on the role they currently possess (Role-based Access Control).

While this short list might get amended in the future, initially only two basic roles were envisioned:

\begin{itemize}
  \item Administrator
  \item User
\end{itemize}

It is paramount that the program protects itself from insider threats as well, and therefore each role is only able to perform actions that are explicitly assigned to it.
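
The ``explicitly assigned'' rule amounts to a deny-by-default allow-list. The following minimal sketch illustrates the idea; the role and action identifiers as well as the concrete assignments are assumptions chosen for this example and do not reproduce the application's actual permission model.

\vspace{\parskip}
\begin{lstlisting}[language=Go,
    caption={Sketch: deny-by-default role checks (illustrative names and assignments)},
    label=rbacSketch, basicstyle=\linespread{0.9}\small\ttfamily,
    backgroundcolor=\color{lstbg}]
package main

import "fmt"

// Role and Action are hypothetical types for this sketch.
type Role string

type Action string

const (
	RoleAdmin Role = "admin"
	RoleUser  Role = "user"

	ActionManageUsers    Action = "manage-users"
	ActionSearchBreaches Action = "search-breach-data"
	ActionEditOwnProfile Action = "edit-own-profile"
)

// grants lists the actions explicitly assigned to each role.
// Anything that is not listed here is denied.
var grants = map[Role]map[Action]bool{
	RoleAdmin: {
		ActionManageUsers:    true,
		ActionEditOwnProfile: true,
	},
	RoleUser: {
		ActionSearchBreaches: true,
		ActionEditOwnProfile: true,
	},
}

// Can reports whether role r was explicitly granted action a.
// Unknown roles and actions yield false, because indexing a
// missing (nil) inner map returns the zero value.
func Can(r Role, a Action) bool {
	return grants[r][a]
}

func main() {
	fmt.Println(Can(RoleAdmin, ActionManageUsers))    // true
	fmt.Println(Can(RoleAdmin, ActionSearchBreaches)) // false
	fmt.Println(Can(RoleUser, ActionManageUsers))     // false
	fmt.Println(Can("guest", ActionEditOwnProfile))   // false
}
\end{lstlisting}
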
While there definitely is a certain overlap between the capabilities of the two outlined roles, each also possesses unique features that the other one does not. For instance, the administrator role is not able to perform searches on the breach data directly; for that, a separate \emph{user} account has to be created. Similarly, a regular user is not able to manage breach lists and other users, because that is a privileged operation.

In-application administrators are not able to view (any) sensitive user data and should therefore only be able to perform the following actions:

\begin{itemize}
  \item Create user accounts
  \item View user listing
  \item View user details
  \item Change user details, including administrative status
  \item Delete user accounts
  \item Refresh breach data from online sources
\end{itemize}

Let us consider a case when a user performs an operation on their own account. While demoting oneself from administrator to a regular user should be permitted, promoting oneself to administrator would constitute a \emph{privilege escalation} and likely be a precursor to at least a \emph{denial of service} of sorts, as there would be nothing preventing the newly-\emph{admined} user from disabling the accounts of all other administrators.

\n{2}{Zero trust principle}

\textit{Confidentiality, i.e.\ not trusting the provider}

There is no way for the application (and consequently, the in-application administrator) to read the user's data (such as saved search queries). This is made possible by encrypting the pertinent data before saving them in the database using the state-of-the-art \texttt{age} tool (backed by X25519)~\cite{age, x25519rfc7748}. The \texttt{age} \emph{identity} itself is in turn encrypted by a passphrase that only the user controls.
Of course, the user-supplied password is run through a password-based key derivation function (\texttt{argon2}, version \emph{id}, with the officially recommended configuration parameters) before it is used to encrypt the \emph{age} key.

The \texttt{age} identity is only generated once the user changes their password for the first time. This is an attempt to prevent scenarios such as an in-application administrator with access to the physical database being able to both \textbf{recover} the key from the database and \textbf{decrypt} it, given that they already know the user password (because they set it when they created the user). That would subsequently give them unbounded access to any future encrypted data, as long as they were able to maintain their database access. This is why generating the \texttt{age} identity is bound to the first password change. Of course, the supposed evil administrator could simply perform the password change themselves! However, the user would at least be able to find those changes in the activity logs and know to \emph{not} use the application under such circumstances.

But given the scenario of a total database compromise, the author finds that all hope is \emph{already} lost at that point. At least when the database is dumped, it should only contain non-sensitive, functional information in plain text; everything else should be encrypted.

Consequently, both the application operators and the in-application administrators should ideally never be able to learn the details of what the user is tracking/searching for, the same being by extension applicable even to potential attackers with direct access to the database.
Thus, the author maintains that every scenario that could potentially lead to a data breach (apart from a compromised actual user password) would have to entail some form of operating memory acquisition on the machine hosting the application, for instance using \texttt{LiME}~\cite{lime}, or perhaps directly on the \emph{hypervisor}, if considering a virtualised (``cloud'') environment.

\n{1}{Implementation}

\n{2}{Dhall Configuration Schema}\label{sec:configuration}

The configuration schema was at first being developed as part of the main project's repository, before it was determined that it would benefit both the development and overall clarity if the schema lived in its own repository (see Section~\ref{sec:repos} for details). This enabled the schema to be independently developed and versioned, and only be pulled into the main application whenever it was determined to be ready.

% \vspace{\parskip}
\smallskip
% \vspace{\baselineskip}
\begin{lstlisting}[language=Haskell,
    caption={Dhall configuration schema version 0.0.1-rc.2},
    label=dhallschema, basicstyle=\linespread{0.9}\footnotesize\ttfamily,
    backgroundcolor=\color{lstbg},
    morekeywords={Text, Natural, Optional, Type}
]
let Schema =
      { Type =
          { Host : Text
          , Port : Natural
          , HTTP :
              { Domain : Text
              , Secure : Bool
              , AutoTLS : Bool
              , TLSKeyPath : Text
              , TLSCertKeyPath : Text
              , HSTSMaxAge : Natural
              , ContentSecurityPolicy : Text
              , RateLimit : Natural
              , Gzip : Natural
              , Timeout : Natural
              }
          , Mailer :
              { Enabled : Bool
              , Protocol : Text
              , SMTPAddr : Text
              , SMTPPort : Natural
              , ForceTrustServerCert : Bool
              , EnableHELO : Bool
              , HELOHostname : Text
              , Auth : Text
              , From : Text
              , User : Text
              , Password : Text
              , SubjectPrefix : Text
              , SendPlainText : Bool
              }
          , LiveMode : Bool
          , DevelMode : Bool
          , AppPath : Text
          , Session :
              { CookieName : Text
              , CookieAuthSecret : Text
              , CookieEncrSecret : Text
              , MaxAge : Natural
              }
          , Logger : { JSON : Bool, Fmt : Optional Text }
          , Init : { CreateAdmin : Bool, AdminPassword : Text }
          , Registration : { Allowed : Bool }
          }
      }
\end{lstlisting}
\vspace*{-\baselineskip}

The full schema with type annotations can be seen in Listing~\ref{dhallschema}.

\newpage

The \texttt{let} statement declares a variable called \texttt{Schema} and assigns to it the result of the expression on the right side of the equals sign, which has for practical reasons been trimmed and is displayed without the \emph{default} block. The default block is instead shown in its own Listing~\ref{dhallschemadefaults}.

The main configuration comprises both raw attributes and child records, which allow for grouping of related functionality. For instance, configuration settings pertaining to the mail server setup are grouped in a record named \textbf{Mailer}. Its attribute \textbf{Enabled} is annotated as \textbf{Bool}, which was deemed appropriate for an on-off switch-like functionality, with the only permissible values being either \emph{True} or \emph{False}. Do note that in Dhall \texttt{true} $\neq$ \texttt{True}, since internally \textbf{True} is a \texttt{Bool} constant built directly into Dhall (see ``The Prelude''~\cite{dhallprelude} for reference), while \textbf{true} is evaluated as an \emph{unbound} variable, that is, a variable \emph{not} defined in the current \emph{scope}.

Another one of Dhall's specialties is that the `$==$' and `$!=$' (in)equality operators \textbf{only} work on values of type \texttt{Bool}, which for example means that variables of type \texttt{Natural} (\texttt{uint}) or \texttt{Text} (\texttt{string}) cannot be compared directly, as they can be in other languages. That leaves the comparing work to a higher-level language (such as Go). Alternatively, from the perspective of the Dhall authors, \emph{enums} are the promoted way to solve this when the value matters, i.e.\ derive a custom \emph{named} type from a primitive type and compare \emph{that}.
\newpage
% \vspace{\parskip}
\begin{lstlisting}[language=Haskell, caption={Dhall configuration defaults for schema version 0.0.1-rc.2}, label=dhallschemadefaults, basicstyle=\linespread{0.9}\footnotesize\ttfamily, backgroundcolor=\color{lstbg}, ]
      , default =
          -- | have sane defaults.
          { Host = ""
          , Port = 3000
          , HTTP =
            { Domain = ""
            , Secure = False
            , AutoTLS = False
            , TLSKeyPath = ""
            , TLSCertKeyPath = ""
            , HSTSMaxAge = 0
            , ContentSecurityPolicy = ""
            , RateLimit = 0
            , Gzip = 0
            , Timeout = 0
            }
          , Mailer =
            { Enabled = False
            , Protocol = "smtps"
            , SMTPAddr = ""
            , SMTPPort = 465
            , ForceTrustServerCert = False
            , EnableHELO = False
            , HELOHostname = ""
            , Auth = ""
            , From = ""
            , User = ""
            , Password = ""
            , SubjectPrefix = "pcmt - "
            , SendPlainText = True
            }
          , LiveMode =
              -- | LiveMode controls whether the application looks for
              -- | directories "assets" and "templates" on the filesystem or
              -- | in its bundled Embed.FS.
              False
          , DevelMode = False
          , AppPath =
              -- | AppPath specifies where the program looks for "assets" and
              -- | "templates" in case LiveMode is True.
              "."
          , Session =
            { CookieName = "pcmt_session"
            , CookieAuthSecret = ""
            , CookieEncrSecret = ""
            , MaxAge = 3600
            }
          , Logger = { JSON = True, Fmt = None Text }
          , Init =
            { CreateAdmin =
                -- | if this is True, attempt to create a user with admin
                -- | privileges with the password specified below
                False
            , AdminPassword =
                -- | used for the first admin, forced change on first login.
                "50ce50fd0e4f5894d74c4caecb450b00c594681d9397de98ffc0c76af5cff5953eb795f7"
            }
          , Registration.Allowed = True
          }
      }

in  Schema
\end{lstlisting}
\vspace*{-\baselineskip}
\vspace*{-\baselineskip}
\vspace*{-\baselineskip}

\n{2}{Data integrity and authenticity}
The user can interact with the application via a web client, such as a browser, and is required to authenticate for all sensitive operations.
To not only know \emph{who} the user is but also make sure they are \emph{permitted} to perform the action they are attempting, the program employs an \emph{authorisation} mechanism in the form of sessions. On the client side, these are represented by cryptographically signed and encrypted (using 256-bit AES) HTTP cookies. That lays foundations for a few things: the data saved into the cookies can be regarded as private because, short of future \emph{quantum computers}, only the program itself can decrypt and access the data, and the data can be trusted since it is both signed using a key that only the program controls and \emph{encrypted} with \emph{another} key that, equally, only the program holds.

The cookie data is only ever written \emph{or} read on the server side, solidifying the author's decision to let it be encrypted, as there is no point in not encrypting it for some perceived client-side simplification. Users navigating the website send their session cookie (if it exists) with \textbf{every request} to the server, which subsequently verifies the integrity of the data and, in case it is valid, determines the existence and potential amount of user privilege that should be granted. Public endpoints by definition do not mandate the presence of a valid session, while at protected endpoints the user is authenticated at every request. When a session expires or if there is no session to begin with, the user is either shown a \emph{Not found} error message, shown an \emph{Unauthorised} error message, or redirected to \texttt{/signin}; this behaviour is not uniform and depends on the endpoint and/or the resource.

Another aspect that contributes to data integrity, albeit from \emph{another} point of view, is utilising database \emph{transactions} for bundling together multiple database operations that collectively change the \emph{state}.
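
Such bundling can be illustrated with a minimal, self-contained Go sketch. It is not the program's actual code: the \texttt{Tx} interface, the \texttt{withTx} helper and the \texttt{fakeTx} stand-in are hypothetical names introduced only for this illustration, and a real deployment would obtain the transaction from the database layer instead.

```go
package main

import (
	"errors"
	"fmt"
)

// Tx is a hypothetical, minimal view of a database transaction.
type Tx interface {
	Commit() error
	Rollback() error
}

// withTx runs fn and commits only if fn succeeded; otherwise it rolls back
// and returns the original error.
func withTx(tx Tx, fn func() error) error {
	if err := fn(); err != nil {
		if rerr := tx.Rollback(); rerr != nil {
			return fmt.Errorf("%w (rollback also failed: %v)", err, rerr)
		}
		return err
	}
	return tx.Commit()
}

// fakeTx records what happened to it; it stands in for a real transaction.
type fakeTx struct{ committed, rolledBack bool }

func (f *fakeTx) Commit() error   { f.committed = true; return nil }
func (f *fakeTx) Rollback() error { f.rolledBack = true; return nil }

// outcome runs a bundle of changes against a fakeTx and reports the result.
func outcome(fail bool) (committed, rolledBack bool, err error) {
	tx := &fakeTx{}
	err = withTx(tx, func() error {
		// The first change is assumed to succeed; the second one may fail.
		if fail {
			return errors.New("second change failed")
		}
		return nil
	})
	return tx.committed, tx.rolledBack, err
}

func main() {
	fmt.Println(outcome(false))
	fmt.Println(outcome(true))
}
```

The design choice in this sketch is that the caller never decides between commit and rollback by hand; the helper derives the decision from the error value alone.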
Using the transactional jargon, the data is only \emph{committed} if each individual change was successful. In case of any errors, the database is instructed to perform an atomic \emph{rollback}, which brings it back to the state before the changes were ever attempted.

The author has additionally considered utilising an embedded immutable database like immudb (\url{https://immudb.io}) for record keeping (verifiably storing data change history) and additional data integrity checks, e.g.\ for tamper protection purposes; however, that work has yet to materialise.

\n{2}{Database schema}\label{sec:dbschema}
The database schema is not created by manually typing out SQL statements. Instead, an Object-relational Mapping (ORM) tool named \texttt{ent} is used, which allows defining the table schema and relations entirely in Go. The upside of this approach is that the \emph{entity} types are natively understood by code editors, and they also get type-checked by the compiler for correctness, preventing all sorts of headaches and potential bugs. Since \texttt{ent} encourages the usage of \emph{declarative migrations} at early stages of the project, it is not required for the database schema to exist on application start-up in the form of raw SQL (or HCL). Instead, \texttt{ent} only requires a valid connection string providing reasonably privileged access to the database and it handles the database configuration by auto-generating SQL with the help of the companion embedded library \texttt{Atlas} (\url{https://atlasgo.io/}). The upstream project (\texttt{ent}) encourages moving to the otherwise more traditional \emph{versioned migrations} for more mature projects, so that is on the roadmap for later.

The best part about using \texttt{ent} is that there is no need to define supplemental methods on the models, as with \texttt{ent} these are meant to be \emph{code generated} (in the older sense of the word, not with Large Language Models) into existence.
Code generation creates files with actual Go models based on the types of the attributes in the database schema model, and the respective relations are transformed into methods on the receiver or functions taking object attributes as arguments.

For instance, if the model's attribute is a string value \texttt{Email}, \texttt{ent} can be used to generate code that contains methods on the user object like the following:
\begin{itemize}
  \item \texttt{EmailIn(pattern string)}
  \item \texttt{EmailEQ(email string)}
  \item \texttt{EmailNEQ(email string)}
  \item \texttt{EmailHasSuffix(suffix string)}
\end{itemize}

These methods can further be imported into other packages, which makes working with the database a breeze.

All the database \emph{entity} IDs were declared as type \texttt{UUID} (\emph{universally unique ID, theoretically across space and time}), contrary to the more traditional \emph{integer} IDs. Support for \texttt{UUID}s was provided natively by the supported databases and in Go via a popular and vetted open-source library (\url{github.com/google/uuid}). Among the upsides of using \texttt{UUID}s over integer IDs is that there is no need to manually increment the ID. More importantly, compared to 32-bit\footnotemark{} signed integers, the \texttt{UUID} is a largely randomly generated 16-byte (128-bit) array, which reduces the chances of collision.

Apart from a lower chance of conflicts during imports of foreign databases, this design decision might not provide any advantage for the current system \emph{at the moment}. It could, however, hold importance in the future, should the database ever be deployed in a replicated, high-availability (HA) manner with more than one concurrent \emph{writer} (replicated application instances).
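
To make the ``largely random 16-byte array'' concrete, the following sketch builds a version 4 \texttt{UUID} using only the Go standard library. It is an illustration under the assumption that random (version 4) identifiers are used; the program itself relies on the library mentioned above, and the helper name \texttt{newUUIDv4} is hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random (version 4) UUID in its canonical text form.
// 122 of the 128 bits are random; the remaining 6 encode version and variant.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version nibble to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Because no coordination is needed to produce such an identifier, several independent writers can create records concurrently without agreeing on a counter.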
\footnotetext{In Go, integer size is architecture dependent, see \url{https://go.dev/ref/spec#Numeric_types}}

The relations between entities as modelled with \texttt{ent} can be imagined as the edges connecting the nodes of a directed \emph{graph}, with the nodes representing the entities. This conceptualisation lends itself to a more human-friendly querying language, where the directionality can be expressed with words describing ownership.

\n{1}{Deployment}
A deployment setup as suggested in Section~\ref{sec:deploymentRecommendations} is already \emph{partially} covered by the multi-stage \texttt{Containerfile} that is available in the main sources. Once built, the resulting container image only contains a handful of things it absolutely needs:
\begin{itemize}
  \item a self-contained statically linked copy of the program
  \item a default configuration file and corresponding Dhall expressions cached at build time
  \item a recent CA certs bundle
\end{itemize}

Since the program also needs a database for proper functioning, an example scenario includes the application container being run in a Podman \textbf{pod} (as in a pea pod or pod of whales) together with the database. As a result, the database does not have to be exposed to the entire host or outside of the pod at all; it is only available over the pod's \texttt{localhost}. Hopefully it goes without saying that the default values of any configuration secrets should be substituted by the application operator with new, securely generated ones (read: using \texttt{openssl rand} or \texttt{pwgen}).
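
For operators who prefer to stay within Go, a secret can also be produced with the standard library's cryptographically secure random source. The sketch below is only an illustration: the helper \texttt{newSecret} is a hypothetical name, and the choice of 32 random bytes in hexadecimal encoding is an assumption, not a format mandated by the program.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newSecret returns n random bytes from the OS CSPRNG, hex-encoded.
func newSecret(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	// The field names come from the configuration schema; the secret
	// length and encoding are assumptions made for this example.
	for _, name := range []string{"CookieAuthSecret", "CookieEncrSecret"} {
		s, err := newSecret(32)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s = %q\n", name, s)
	}
}
```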
\n{2}{Rootless Podman}
Assuming rootless Podman is set up and the \texttt{just} tool is installed on the host, the application could be deployed by following a series of relatively simple steps:
\begin{itemize}
  \item build (or pull) the application container image
  \item create a pod with user namespacing, exposing the application port
  \item run the database container inside the pod
  \item run the application inside the pod
\end{itemize}

In concrete terms, it would resemble something along the lines of Listing~\ref{podmanDeployment}. Do note that all the commands are executed under the unprivileged \texttt{user@containerHost} that is running rootless Podman, i.e.\ it has \texttt{UID}/\texttt{GID} mapping entries in the \texttt{/etc/subuid} and \texttt{/etc/subgid} files \textbf{prior} to running any Podman commands.

% \newpage
\begin{lstlisting}[language=bash, caption={Example application deployment using rootless Podman}, label=podmanDeployment, basicstyle=\linespread{0.9}\small\ttfamily, backgroundcolor=\color{lstbg}, commentcolor=\color{Gray}, morekeywords={mkdir,podman,just}, ]
# From inside the project folder, build the image locally using kaniko.
just kaniko

# Create a pod, limit the amount of memory/CPU available to its containers.
podman pod create --replace --name pcmt \
  --memory=100m --cpus=2 \
  --userns=keep-id -p5005:3000

# Create the database folder and run the database in the pod.
mkdir -pv ./tmp/db
podman run --pod pcmt --replace -d --name "pcmt-pg" --rm \
  -e POSTGRES_INITDB_ARGS="--auth-host=scram-sha-256 \
  --auth-local=scram-sha-256" \
  -e POSTGRES_PASSWORD=postgres \
  -v $PWD/tmp/db:/var/lib/postgresql/data:Z \
  --health-cmd "sh -c 'pg_isready -U postgres -d postgres'" \
  --health-on-failure kill \
  --health-retries 3 \
  --health-interval 10s \
  --health-timeout 1s \
  --health-start-period=5s \
  docker.io/library/postgres:15.2-alpine3.17

# Run the application itself in the pod.
podman run --pod pcmt --replace --name pcmt-og -d --rm \
  -e PCMT_LIVE=False \
  -e PCMT_DBTYPE="postgres" \
  -e PCMT_CONNSTRING="host=pcmt-pg port=5432 sslmode=disable \
  user=postgres dbname=postgres password=postgres" \
  -v $PWD/config.dhall:/config.dhall:Z,ro \
  docker.io/immawanderer/mt-pcmt:testbuild -config /config.dhall
\end{lstlisting}
% \vspace*{-\baselineskip}

To summarise Listing~\ref{podmanDeployment}, first the application container is built from inside the project folder using \texttt{kaniko}. The container image could alternatively be pulled from the container repository, but since the listing uses the \texttt{:testbuild} tag, it makes more sense to show the image being built from sources. Next, a \emph{pod} is created and given a name, setting the port binding for the application. Then, the database container is started inside the pod, configured with a health checking mechanism.

As a final step, the application container itself is run inside the pod. The application configuration named \texttt{config.dhall} located in \texttt{\$PWD} is mounted as a volume into the container's \texttt{/config.dhall}, providing the application with a default configuration. The default container does contain a default configuration for reference; however, running the container without additionally providing the necessary secrets would fail.

\n{3}{Sanity checks}
Also do note that the application connects to the database using its \emph{container} name, i.e.\ not the IP address. This is possible thanks to Podman setting up DNS resolution inside pods using default networks in such a way that all containers in the pod can reach each other using their (container) names.

Interestingly, connecting via \texttt{localhost} from containers inside the pod would also work. Any container in the pod can reach any other container in the same pod via the \emph{pod's} own \texttt{localhost}, thanks to a shared network namespace~\cite{podmanNet}.
In fact, \emph{pinging} (sending ICMP packets using the \texttt{ping} command) the database and application containers from an ad-hoc Alpine Linux container that just joined the pod temporarily yields:

\vspace{\parskip}
\begin{lstlisting}[language=bash, caption={Pinging pod containers using their names}, label=podmanPing, basicstyle=\linespread{0.9}\small\ttfamily, backgroundcolor=\color{lstbg}, morekeywords={podman,ping} ]
user@containerHost % podman run --rm -it \
    --user=0 \
    --pod=pcmt \
    docker.io/library/alpine:3.18
/ % ping -c2 pcmt-og
PING pcmt-og ( 56 data bytes
64 bytes from seq=0 ttl=42 time=0.072 ms
64 bytes from seq=1 ttl=42 time=0.118 ms

--- pcmt-og ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.072/0.095/0.118 ms
/ % ping -c2 pcmt-pg
PING pcmt-pg ( 56 data bytes
64 bytes from seq=0 ttl=42 time=0.045 ms
64 bytes from seq=1 ttl=42 time=0.077 ms

--- pcmt-pg ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.045/0.061/0.077 ms
/ %
\end{lstlisting}

Were the application deployed in a traditional manner instead of using Podman, the use of FQDNs or IPs would probably be necessary, as there would be no magic resolution of container names happening transparently in the background.

\n{3}{Database isolation from the host}
A keen observer has undoubtedly noticed that the pod constructed in Listing~\ref{podmanDeployment} only created the binding for the port used by the application (\texttt{5005/tcp}). The Postgres default port \texttt{5432/tcp} is not among the pod's port bindings, as can be seen in the pod creation command in the said listing.
This can also easily be verified using the command in Listing~\ref{podmanPortBindings}:

\begin{lstlisting}[language=bash, caption={Podman pod port binding inspection}, label=podmanPortBindings, basicstyle=\linespread{0.9}\small\ttfamily, backgroundcolor=\color{lstbg}, morekeywords={podman}, ]
user@containerHost % podman pod inspect pcmt \
    --format="Port bindings: {{.InfraConfig.PortBindings}}\n\
Host network: {{.InfraConfig.HostNetwork}}"
Port bindings: map[3000/tcp:[{ 5005}]]
Host network: false
\end{lstlisting}
\vspace*{-\baselineskip}

To be absolutely sure that the database is available only internally in the pod (unless, of course, there is another process listening on the subject port), and that connecting to the database from outside the pod (i.e.\ from the container host) really \emph{does} fail, the following command can be issued:

\begin{lstlisting}[language=bash, caption={In-pod database is unreachable from the host}, breaklines=true, label=podDbUnreachable, basicstyle=\linespread{0.9}\small\ttfamily, backgroundcolor=\color{lstbg}, ]
user@containerHost % curl localhost:5432
--> curl: (7) Failed to connect to localhost port 5432 after 0 ms: Couldn't connect to server
\end{lstlisting}
\vspace*{-\baselineskip}

The error in Listing~\ref{podDbUnreachable} is indeed expected, as it is the result of the database port not being exposed from the pod.

Of course, since a volume (essentially a bind mount) from the host is used, the actual data is still accessible on the host, both to privileged users and to the user running the pod. On a host with SELinux support, the \texttt{:Z} volume addendum at least ensures, via SELinux labelling, that the content of the volume is not directly accessible to other containers, including the application container running inside the same pod.

\n{3}{Health checks}
Running the containers with health checks can be counted among the few crucial settings.
That way the container runtime can periodically \emph{check} that the application running inside the container is behaving correctly, and instructions can be provided on what action should be taken should the health of the application be evaluated as unsatisfactory. Furthermore, different sets of health checking commands can be passed to Podman for start-up and runtime.

\n{2}{Reverse proxy configuration}
If the application is deployed behind a reverse proxy, such as NGINX, the configuration snippet in Listing~\ref{nginxSnip} might apply. Do note how the named upstream server \texttt{pcmt} references the port that was exposed from the pod created in Listing~\ref{podmanDeployment}.

\begin{lstlisting}[caption={Example reverse proxy configuration snippet}, breaklines=true, label=nginxSnip, basicstyle=\linespread{0.9}\scriptsize\ttfamily, backgroundcolor=\color{lstbg}, morekeywords={upstream,server,return,listen,server_name,add_header,access_log,error_log,location,proxy_pass,proxy_set_header,allow,include,more_set_headers,ssl_buffer_size,ssl_dhparam,ssl_certificate,ssl_certificate_key,http2}, ]
upstream pcmt {
    server;
}

server {
    return 301 https://$request_uri;

    listen 80;
    listen [::]:80;
    server_name www.;

    return 404;
    add_header Referrer-Policy "no-referrer, origin-when-cross-origin";
}

server {
    server_name ;
    access_log /var/log/nginx/.access.log;
    error_log /var/log/nginx/.error.log;

    location / {
        proxy_pass http://pcmt;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    location /robots.txt {
        allow all;
        add_header Content-Type "text/plain; charset=utf-8";
        add_header X-Robots-Tag "all, noarchive, notranslate";
        return 200 "User-agent: *\nDisallow: /";
    }

    include sec-headers.conf;
    add_header X-Real-IP $remote_addr;
    add_header X-Forwarded-For $proxy_add_x_forwarded_for;
    add_header X-Forwarded-Proto $scheme;
    more_set_headers 'Early-Data: $ssl_early_data';

    listen [::]:443 ssl http2;
    listen 443 ssl http2;
    ssl_certificate /etc/letsencrypt/live//fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live//privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;

    # reduce TTFB
    ssl_buffer_size 4k;
}
\end{lstlisting}
\vspace*{-\baselineskip}

The snippet describes how traffic arriving at port \texttt{80/tcp} (IPv4 or IPv6) that matches the domain name(s) \texttt{\{www.,\}} (\texttt{} being the domain name that the program was configured with, including appropriate DNS records) gets 301-redirected to the same location (\texttt{\$request\_uri}), only over \texttt{HTTPS}. If the server name does not match, a 404 is returned instead. In the main location block, all traffic except for \texttt{/robots.txt} is forwarded to the named backend, with headers added on top by the proxy in order to label the incoming requests as \emph{not} originating at the proxy. The \emph{robots} route is treated specially, immediately returning a directive that disallows crawling of any resource on the page for all. The proxy is also instructed to log access and error events to specific log files, load the domain's TLS certificates (obtained out of band), reduce the \texttt{ssl\_buffer\_size} and, finally, listen on port \texttt{443/tcp} (dual stack).

\n{1}{Validation}
\n{2}{Unit tests}
Unit testing is a hot topic for many people and the author does not count himself to be a staunch supporter of either extreme. The ``no unit tests'' opinion seems to discount any benefit there is to unit testing, while a ``TDD-only''\footnotemark{} approach can be a little too much for some people's taste. The author tends to prefer a \emph{middle ground} approach in this particular case, i.e.\ writing enough tests where meaningful but not necessarily testing everything or writing tests prior to business logic code.
Arguably, following the practice of TDD should result in writing \emph{better designed} code, particularly because there needs to be prior thought about the shape and function of the code, as it is tested for before it is even written, but it adds a slight inconvenience to what is otherwise a straightforward process.

Thanks to Go's built-in support for testing via its \texttt{testing} package and the tooling in the \texttt{go} tool, writing tests is relatively simple. Go looks for files of the form \texttt{*\_test.go} in the present working directory but can be instructed to look for test files in packages recursively found on any path using the ellipsis, like so: \texttt{go test ./path/to/package/\ldots}, which then \emph{runs} all the tests found and reports some statistics, such as the time it took to run the tests or whether they succeeded or failed. To be precise, the test files also need to contain test functions, which are functions with the signature \texttt{func TestWhatever(t *testing.T)\{\}}, where the function prefix ``Test'' is just as important as the signature. Without it, the function is not considered to be a testing function despite having the required signature and is therefore \emph{not} executed during testing.

This test lookup behaviour, however, also has a neat side effect: all the test files can be kept side by side with their regular source counterparts; there is no need to segregate them into a specially blessed \texttt{tests} folder or similar, which in the author's opinion improves readability. As a failsafe, in case no actual tests are found, the current behaviour of the tool is to print a note informing the developer that no tests were found, which is handy to learn if it was not intended or expected. When compiling regular source code, the Go files with \texttt{\_test} in the name are simply ignored by the build tool.
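
The naming requirement can be made concrete with a small sketch. It mirrors the rule documented for the \texttt{testing} package (a test is named \texttt{TestXxx}, where \texttt{Xxx} does not start with a lowercase letter); the helper \texttt{isTestName} is a hypothetical illustration and not a part of the Go tooling's public API.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isTestName reports whether name qualifies as a test function name:
// the prefix "Test", optionally followed by a suffix that does not
// start with a lowercase letter.
func isTestName(name string) bool {
	if !strings.HasPrefix(name, "Test") {
		return false
	}
	rest := name[len("Test"):]
	if rest == "" {
		return true
	}
	r, _ := utf8.DecodeRuneInString(rest)
	return !unicode.IsLower(r)
}

func main() {
	for _, n := range []string{"TestUserExists", "testUserExists", "Testify"} {
		fmt.Printf("%-16s %v\n", n, isTestName(n))
	}
}
```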
\footnotetext{TDD, or Test Driven Development, is a development methodology whereby tests are written \emph{first}, then a complementary piece of code that is supposed to be tested is added, just enough to get past the compile errors and to see the test \emph{fail}, and only then is the code refactored to make the test \emph{pass}. The code can then be fearlessly extended because the test is the safety net catching the programmer when the mind slips and alters the originally intended behaviour of the code.}

\n{2}{Integration tests}
Testing the integration with external software, namely the database in the case of this program, utilises the same mechanism that was mentioned in the previous section: Go's \texttt{testing} package. These tests verify that after a code change the program can still perform the same actions with the external software that were possible before the change. They are run locally before every commit and then again in the CI after pushing to the remote.

\n{3}{func TestUserExists(t *testing.T)}
The integration test shown in Listing~\ref{integrationtest} is prefaced at line 10 by the declaration of a helper function \texttt{getCtx() context.Context}, which takes no arguments and returns a new\\ \texttt{context.Context} initialised with the value of the global logger. As previously mentioned, that is how the logger gets injected into the user module functions. The actual test function with the signature \texttt{TestUserExists(t *testing.T)} defines a database connection string at line 21 and attempts to open a connection to the database. The database in use here is SQLite3 running in memory mode, meaning no file is actually written to disk during this process. Since the testing data is not needed after the test, this is desirable. Next, a \texttt{defer} statement calls the \texttt{Close()} method on the database object, which is the idiomatic Go way of closing files and network connections (which are also an abstraction over files on UNIX-like operating systems such as GNU/Linux).
Regardless of where it is declared, the deferred call is only executed once the surrounding function returns, i.e.\ after all the other statements in it, which makes sure no file descriptors (FDs) are leaked and the file is properly closed when the function returns.

In the next step at line 25 a database schema creation is attempted, handling the potential error in an idiomatic Go way: the return value of the function is assigned to a variable declared in the \texttt{if} statement, which then checks whether \texttt{err} was \texttt{nil} or not. In case \texttt{err} was not \texttt{nil}, i.e.\ \emph{there was an error in the callee function}, the condition evaluates to \texttt{true}, which is followed by entering the inner block. Inside it, the error is announced to the user (likely a developer running the test in this case) and the testing object's \texttt{FailNow()} method is called. That marks the test function as having failed and stops its execution. In this case, that is the desired outcome, since if the database schema creation call fails, there really is no point in continuing the testing of user creation. \\
Conversely, if the schema \emph{does} get created without an error, the code continues to declare a few variables (lines 30--32): \texttt{username}, \texttt{email} and \texttt{ctx}, where the context injected with the logger is saved. Two of them are subsequently (line 33) passed into the \texttt{UsernameExists} function, \texttt{ctx} being the first argument and the database pointer and \texttt{username} following, while the \texttt{email} variable is only used at a later stage (line 46). The point of declaring them together is to give a sense of relatedness. The error value returned from this function is again checked (line 33) and, if everything goes well, the \texttt{usernameFound} boolean value is checked next at line 38.
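
The interplay of \texttt{defer} and the \texttt{if err := ...; err != nil} idiom described above can be observed in isolation with the following self-contained sketch. It does not touch any database; the \texttt{resource} type and the \texttt{createSchema} and \texttt{run} functions are hypothetical stand-ins that merely record the order of events.

```go
package main

import (
	"errors"
	"fmt"
)

// resource stands in for a database handle; Close records that it ran.
type resource struct{ events *[]string }

func (r resource) Close() { *r.events = append(*r.events, "close") }

// createSchema is a stand-in for a call that may fail.
func createSchema(fail bool) error {
	if fail {
		return errors.New("schema creation failed")
	}
	return nil
}

// run returns the order in which things happened.
func run(fail bool) (events []string) {
	res := resource{events: &events}
	defer res.Close() // declared early, executed when run returns

	events = append(events, "open")
	// err only exists within the scope of this if statement.
	if err := createSchema(fail); err != nil {
		events = append(events, "error: "+err.Error())
		return events // the deferred Close still runs
	}
	events = append(events, "work")
	return events
}

func main() {
	fmt.Println(run(false))
	fmt.Println(run(true))
}
```

On both code paths the recorded sequence ends with the deferred call, which is the property the test relies on for cleaning up the database handle.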
\smallskip \smallskip \begin{lstlisting}[language=Go, caption={User existence integration test}, label=integrationtest,basicstyle=\linespread{0.9}\scriptsize\ttfamily, backgroundcolor=\color{lstbg}, numbers=left, numberstyle=\linespread{0.9}\scriptsize\ttfamily, frame=l, framesep=18.5pt, framerule=0.1pt, xleftmargin=18.7pt, otherkeywords={\%s, \%q, \%v}, ] // modules/user/user_test.go package user import ( "context" "testing" "git.dotya.ml/mirre-mt/pcmt/ent/enttest" "git.dotya.ml/mirre-mt/pcmt/slogging" _ "github.com/xiaoqidun/entps" ) func getCtx() context.Context { l := slogging.Init(false) ctx := context.WithValue(context.Background(), CtxKey{}, l) return ctx } func TestUserExists(t *testing.T) { connstr := "file:ent_tests?mode=memory&_fk=1" db := enttest.Open(t, "sqlite3", connstr) defer db.Close() if err := db.Schema.Create(context.Background()); err != nil { t.Errorf("failed to create schema resources: %v", err) t.FailNow() } username := "dude" email := "dude@b.cc" ctx := getCtx() usernameFound, err := UsernameExists(ctx, db, username) if err != nil { t.Errorf("error checking for username {%s} existence: %q", username, err) } if usernameFound { t.Errorf("unexpected: user{%s} should not have been found", username) } if _, err := EmailExists(ctx, db, email); err != nil { t.Errorf("unexpected: user email '%s' should not have been found", email) } usr, err := CreateUser(ctx, db, email, username, "so strong") if err != nil { t.Errorf("failed to create user, error: %q", err) t.FailNow() } else if usr == nil { t.Error("got nil usr back") t.FailNow() } if usr.Username != username { t.Errorf("got back wrong username, want: %s, got: %s", username, usr.Username, ) } // ...more checks... } \end{lstlisting} Since the database has just been created, there should be no users, which is checked in the body of the \texttt{if} statement (line 35). The same check is then performed using an email (line 42), which is also correctly expected to fail. 
The final statements of the described test attempt to create a user by calling the function \texttt{CreateUser(...)} at line 46, whose return values are again checked for both error and \emph{nillability}, respectively. The test continues with more checks similar to what has been described so far, but the rest was omitted for brevity.

As was just demonstrated in the test, a neat thing about error handling in Go is that it allows for very easy checking of all code paths, not just the \emph{happy path} where there are no issues. The recommended approach of immediately and explicitly handling (or deciding to ignore) the error is in the author's view superior to wrapping hundreds of lines in \texttt{try} blocks and then \emph{catching} (or not) \emph{all the} exceptions, as is the practice in some other languages.

\n{2}{Test environment}
The application has been deployed in a test environment on the author's modest Virtual Private Server (VPS) at \texttt{https://testpcmt.dotya.ml}, protected by a \emph{Let's Encrypt}\allowbreak-issued, short-lived, ECDSA \texttt{secp384r1} curve TLS certificate, and configured with a strict CSP. It is a test instance; therefore, limits (and rate-limits) to prevent abuse might be imposed. \\
The test environment makes the program available over both the modern IPv6 and the legacy IPv4 protocols, to maximise accessibility. Redirects were set up from plain HTTP to HTTPS, as well as from the \texttt{www} to the non-\texttt{www} domain. The subject domain configuration is hardened by setting the \texttt{CAA} record, limiting the certificate authorities (CAs) that are able to issue TLS certificates for it (and have them trusted by validating clients). Additionally, \textit{HTTP Strict Transport Security} (HSTS) had been enabled for the main domain (\texttt{dotya.ml}) including the subdomains quite some time ago (consult the preload lists in Firefox/Chrome), which mandates that clients speaking HTTP only ever connect to it (and the subdomains) using TLS.
The whole deployment has been orchestrated using an Ansible\footnotemark{} playbook created for this occasion, focusing on idempotence with the aim of reliably automating the deployment process. As a side benefit, the deployment is now described reasonably well in code, which is available at \url{https://git.dotya.ml/mirre-mt/ansible-pcmt.git}.

\footnotetext{A Nix-ops approach was considered; however, Ansible was deemed more suitable since the existing host runs Arch.}

\n{3}{Deployment validation}
% TODO: show the results of testing the app in prod using:
% \url{https://testssl.sh/} and
% \url{https://gtmetrix.com/reports/testpcmt.dotya.ml/}.
The deployed application has been validated using the \textit{Security Headers} tool (see \url{https://securityheaders.com/?q=https%3A%2F%2Ftestpcmt.dotya.ml}), the results of which can be seen in Figure~\ref{fig:secheaders}. It shows that the application sets the \texttt{Cross Origin Opener Policy} to \texttt{same-origin}, which isolates the browsing context exclusively to \textit{same-origin} documents, preventing \textit{cross-origin} documents from loading in the same browser context.

\obr{Security Headers scan}{fig:secheaders}{.89}{graphics/screen-securityHeaders}

Furthermore, a \texttt{Content Security Policy} of \texttt{upgrade-insecure-requests; default-src 'none'; manifest-src 'self'; font-src 'self'; img-src 'self' https://*; script-src 'self'; style-src 'self'; object-src 'self'; form-action 'self'; frame-ancestors 'self'; base-uri 'self'} is set by the program using a header. This policy essentially pronounces the application (whatever domain it happens to be hosted at, hence \texttt{'self'}) as the only \textit{permissible} source for any scripts, styles and frames, and the only destination of web forms. One exception is the \texttt{img-src 'self' https://*} directive, which more leniently also permits images from any \textit{secure} sources. This measure ensures that no unvetted content is ever loaded from elsewhere.
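
How such headers end up on every response can be sketched with a small middleware built on Go's standard \texttt{net/http} package. This is an assumption-laden illustration rather than the program's actual code: the names \texttt{securityHeaders} and \texttt{headerOf} are hypothetical, and the real application may set the headers through its web framework instead.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// csp holds the policy quoted in the text above.
const csp = "upgrade-insecure-requests; default-src 'none'; " +
	"manifest-src 'self'; font-src 'self'; img-src 'self' https://*; " +
	"script-src 'self'; style-src 'self'; object-src 'self'; " +
	"form-action 'self'; frame-ancestors 'self'; base-uri 'self'"

// securityHeaders wraps a handler and sets the headers on each response
// before the wrapped handler writes anything.
func securityHeaders(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("Content-Security-Policy", csp)
		h.Set("Cross-Origin-Opener-Policy", "same-origin")
		next.ServeHTTP(w, r)
	})
}

// headerOf sends one in-memory request through the middleware and
// returns the value of the named response header.
func headerOf(name string) string {
	app := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	securityHeaders(app).ServeHTTP(rec, req)
	return rec.Result().Header.Get(name)
}

func main() {
	fmt.Println(headerOf("Cross-Origin-Opener-Policy"))
	fmt.Println(headerOf("Content-Security-Policy"))
}
```

Setting the headers in one wrapping handler keeps individual route handlers free of security boilerplate and makes it hard to forget a header on a newly added endpoint.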
The \texttt{Referrer-Policy} header setting of \texttt{no-referrer, strict-origin-when-cross-origin} ensures that user tracking is reduced, since no referrer is included (the \texttt{Referer} header is omitted) when the user navigates away from the site or otherwise sends requests outside the application. The \texttt{Permissions-Policy} set to \texttt{geolocation=(), midi=(), sync-xhr=(), microphone=(), camera=(), gyroscope=(), magnetometer=(), fullscreen=(self), payment=()} declares that the application is, for instance, never going to request access to payment information, user microphone or camera devices, or geolocation.

\texttt{gobuster} was used in fuzzing mode to aid in uncovering potential application misconfigurations. The wordlists used include:
\begin{itemize}
	\item Anton Lopanitsyn's \texttt{fuzz.txt} (\url{https://github.com/Bo0oM/fuzz.txt/tree/master})
	\item Daniel Miessler's \texttt{SecLists} (\url{https://github.com/danielmiessler/SecLists})
	\item Sam's \texttt{samlists} (\url{https://github.com/the-xentropy/samlists})
\end{itemize}

Many requests yielded 404s for non-existent pages, or possibly for pages requiring authentication (\emph{NotFound} is used so as not to disclose a page's existence). The program initially also issued quite a few 503s as a result of rate-limiting, until \texttt{gobuster} was tamed using the \texttt{--delay} parameter. Anti-CSRF measures employed by the program caused most of the requests to yield 400s (missing CSRF token), or 403s with a CSRF token.
% A Burp test would perhaps be more telling.

The deployed application was scanned with Qualys' \textit{SSL Labs} scanner and the results can be seen in Figure~\ref{fig:ssllabs}, confirming that HSTS (including subdomains) is deployed, the server runs TLS 1.3, and the DNS Certificate Authority Authorisation (CAA) is configured for the domain, with the overall grade being A+.
\obr{Qualys SSL Labs scan}{fig:ssllabs}{.75}{graphics/screen-sslLabs}

\n{1}{Application screenshots}

Figure~\ref{fig:homepage} depicts the initial page that a logged-out user is greeted with when they load the application.

\obr{Homepage}{fig:homepage}{.84}{graphics/screen-homepage}

Figure~\ref{fig:signup} shows a registration page with input fields turned green after basic validation. Visiting this page with registration disabled in settings would yield a 404.

\obr{Registration page}{fig:signup}{.65}{graphics/screen-signup}

\newpage

\obr{Registration page email error}{fig:signupEmailError}{.54}{graphics/screen-signup-emailError}

A sign-up form error telling the user to provide a valid email address is shown in Figure~\ref{fig:signupEmailError}.

\obr{Sign-in page}{fig:signin}{.55}{graphics/screen-signin}

Figure~\ref{fig:signin} depicts a sign-in form similar to the sign-up one.

\obr{Short password error on sign-in}{fig:signinShortPasswd}{.55}{graphics/screen-signin-shortPasswordError}

An error in Figure~\ref{fig:signinShortPasswd} prompts the user to lengthen the content of the password field from 3 to at least 20 characters.

\newpage

\obr{Admin homepage} {fig:adminHome}{.25} {graphics/screen-adminHome}

Figure~\ref{fig:adminHome} displays a simple greeting and a logout button.

\obr{User management screen} {fig:adminUserManagement}{.85} {graphics/screen-adminUserManagement}

Figure~\ref{fig:adminUserManagement} shows the user management screen, which provides links to view a user's details page and to start creating a new user.

% \obr{User creation screen}
% {fig:adminUserCreate}{.35}
% {graphics/screen-adminUserCreate}

\obr{User creation: `username not unique' error} {fig:adminUserCreateErrorUsernameNotUnique}{.65} {graphics/screen-adminUserCreateErrorUsernameNotUnique}

The user creation form can be seen in Figure~\ref{fig:adminUserCreateErrorUsernameNotUnique}. Both regular and admin-level users can be created here.
In this case, an error is shown, telling the user there is an issue with username uniqueness. The user experience of this process could in the future be improved by using a bit of JavaScript (or WebAssembly) to check the uniqueness of the username on the user's \emph{key-up}.

\newpage

\obr{Creation of user `demo'} {fig:adminUserCreateDemo}{.75} {graphics/screen-adminUserCreateDemo}

The user management screen is again shown in Figure~\ref{fig:adminUserCreateDemo} after user `demo' was created. An informative \emph{flash} message is printed near the top of the page immediately after the action and not shown on subsequent page loads.

\obr{User details screen} {fig:adminUserDetail}{.65} {graphics/screen-adminUserDetail}

The user details page is depicted in Figure~\ref{fig:adminUserDetail}. The interface presents key information about the user such as ID, username and admin status. Additionally, it provides a link back to the previous page and two buttons: one for editing the user and one for user deletion.

\obr{User edit screen} {fig:adminUserEdit}{.65} {graphics/screen-adminUserEdit}

Figure~\ref{fig:adminUserEdit} shows the form for user editing, with an `Update' button at the bottom for submitting and a couple of checkboxes for toggling the `admin' and `active' state of the user. Above those, there are input fields for `username', `email', `password' and the confirmation of the password.

\newpage

\obr{User deletion confirmation} {fig:adminUserDeleteConfirm}{.55} {graphics/screen-adminUserDeleteConfirmation}

When attempting to delete a user, the administrator is presented with the screen shown in Figure~\ref{fig:adminUserDeleteConfirm}, which asks them whether they are absolutely sure they want to perform an action with permanent consequences. The `Confirm permanent deletion' button is highlighted in an intense red colour, while the `Cancel' button is displayed in a light blue tone.
There are two additional links: the `All users' one that points to the user management page, and the `Back to detail' one that simply brings the administrator one step back to the user details page.

\obr{User deletion post-hoc} {fig:adminUserDeletePostHoc}{.55} {graphics/screen-adminUserDemoDeletion}

After successful user deletion, the administrator is redirected back to the user management page and a flash message confirming the deletion is printed near the top of the page, as shown in Figure~\ref{fig:adminUserDeletePostHoc}.

\obr{Log-out message} {fig:logout}{.20} {graphics/screen-logout}

Figure~\ref{fig:logout} shows the message printed to users on logout.

\newpage

\obr{Manage API keys} {fig:manageAPIKeys}{.65} {graphics/screen-manageAPIKeys}

Figure~\ref{fig:manageAPIKeys} shows a page that allows administrators to manage instance-wide API keys for external services, such as \emph{Have I Been Pwned?} or \emph{DeHashed.com}. Do note that these keys are never distributed to clients in any way and are only ever used by the application itself to make the requests on \emph{behalf} of the users.

\obr{Import of locally available breach data from the CLI} {fig:localImport}{.99} {graphics/screen-localImport}

Figure~\ref{fig:localImport} depicts how formatted breach data can be imported into the program's database using the CLI.

% =========================================================================== %