
pract.: add text, reword parts

This commit is contained in:
surtur 2023-08-23 00:12:38 +02:00
parent daa4f58489
commit e2078fcf91
Signed by: wanderer
SSH Key Fingerprint: SHA256:MdCZyJ2sHLltrLBp0xQO0O1qTW9BT/xl5nXkDvhlMCI
2 changed files with 208 additions and 135 deletions

@ -37,14 +37,6 @@ SHA3-256:
Note: Also contains the git history. The hash digest of this file is omitted to avoid chicken-egg problem when wishing to include its digest here.\\
File name: \texttt{pcmt-latex-thesis.tar.zst}\\
\n{2}{\texttt{gobuster} fuzzing logs}\label{appendix:gobusterFuzzingLogs}
Note: A ZSTD-compressed archive of the logs.\\
File name: \texttt{b3sums}\\
Blake3:
{\small\texttt{0000000000000000000000000000000000000000000000000000000000000000}}\\
SHA3-256:
{\small\texttt{0000000000000000000000000000000000000000000000000000000000000000}}\\
\n{2}{\texttt{BLAKE3} checksums}\label{appendix:b3sums}
Note: A list of \texttt{BLAKE3} checksums for the supplementary material
(excluding the checksum files). Check using \texttt{b3sum -c b3sums}.\\

@ -348,10 +348,63 @@ Virtually any important value in the program has been made into a configuration
value, so that the operator can customise the experience as needed. A choice of
sane configuration defaults was attempted, which resulted in the configuration
file essentially only needing to contain secrets, unless there is a need to
override the defaults. It is not entirely a \emph{zero-config} situation,
rather a \emph{minimal-config} one. An example can be seen in
Section~\ref{sec:configuration}.
Certain options deemed important enough (this was largely subjective) were
additionally made into command-line \emph{flags} using the standard library
package \texttt{flag}. Users wishing to display all available options can
invoke the program with the \texttt{-help} flag, courtesy of the mentioned
\texttt{flag} package.
\vspace*{-\baselineskip}
\paragraph{\texttt{-host <hostname/IP>} (string)}{Takes one argument and specifies
the hostname, or the address to listen on.}
\vspace*{-\baselineskip}
\paragraph{\texttt{-port <port number>} (int)}{This flag takes one integer
argument and specifies the port to listen on. The argument is validated at
program start-up and the program has a fallback built in for the case that
the supplied value is bogus, such as a string or a number outside the allowed
TCP range $1-65535$.}
\vspace*{-\baselineskip}
\paragraph{\texttt{-printMigration}}{A boolean option that if set, makes the
program print any \textbf{upcoming} database migrations (based on the current
state of the database) and exit. The connection string environment variable
still needs to be set in order to be able to connect to the database and perform
the schema \emph{diff}. This option is mainly useful during debugging.}
\vspace*{-\baselineskip}
\paragraph{\texttt{-devel}}{This flag instructs the program to enter
\textit{devel mode}, in which all templates are re-parsed and re-executed upon
each request, and the default log verbosity is changed to level
\texttt{DEBUG}. Should not be used in production.}
\vspace*{-\baselineskip}
\paragraph{\texttt{-import <path/to/file>} (string)}{This option tells the program
to perform an import of local breach data into the program's main database.
Obviously, the database connection string environment variable also needs to
be present for this. The option takes one argument: the path to a file
formatted according to the \texttt{ImportSchema} (consult
Listing~\ref{breachImportSchema}). The program prints the result of the import
operation, indicating success or failure, and exits.}
\vspace*{-\baselineskip}
\paragraph{\texttt{-version}}{As could probably be inferred from its name, this
flag makes the program print its own version (that has been embedded into
the binary at build time) and exit. A release binary would print something
akin to a \emph{semantic versioning}-compliant git tag string, while a
development binary might simply print the truncated commit ID (consult
\texttt{Containerfile} and \texttt{justfile}) of the sources used to build it.}
\n{2}{Embedded assets}
@ -363,11 +416,13 @@ such as images, logos and stylesheets at the module level. These are then
passed around the program as needed, such as to the \texttt{handlers} package.
There is also a toggle in the application configuration (\texttt{LiveMode}),
which instructs the program at start-up to either rely entirely on embedded
assets, or pull live template and asset files from the filesystem. The former
option makes the application more portable as it is wholly self-contained, while
the latter allows for flexibility and customisation not only during
development. Where the program looks for assets and templates in \emph{live
mode} is determined by two further configuration options: \texttt{assetsPath}
and \texttt{templatePath}.
\n{2}{Composability}
@ -401,6 +456,15 @@ A popular HTML sanitiser \texttt{bluemonday} has been employed to aid with
battling XSS. The program first runs every template through the sanitiser
before rendering it, so that any user-controlled inputs are handled safely.
A dynamic web application should include a CSP configuration. The program
therefore has the ability to calculate the hashes (SHA256/SHA384) of its assets
(scripts, images) on the fly and it is able to use them inside the templates.
This potentially unlocks the use of third-party assets without opening up the
CSP with directives like \texttt{script-src 'unsafe-hashes'}. It also means
that there is no need to maintain a set of customised \texttt{head} templates
with pre-computed hashes next to script sources, since the application can
perform the necessary calculations in the user's stead.
\n{2}{Server-side rendering}
@ -409,52 +473,57 @@ and it runs without a single line of JavaScript, of which the author is
especially proud. It improves load times, decreases the attack surface,
increases maintainability and reduces cognitive load that is required when
dealing with JavaScript. Of course, that requires extensive usage of
non-semantic \texttt{POST} requests in web forms even for data \emph{updates}
(where HTTP \texttt{PUT}s should be used) and the accompanying frequent
full-page refreshes, but that still is not enough to warrant the use of
JavaScript.
\n{2}{Frontend}
Frontend-side, Tailwind was used for the CSS. It promotes the usage of
flexible \emph{utility-first} classes in the HTML markup instead of
separating out styles from content. Understandably, this is somewhat of a
preference issue and the author does not hold hard opinions in either
direction; however, it has to be noted that this approach empirically allows
for rather quick UI prototyping. Tailwind was chosen for having a reasonably
detailed documentation and offering built-in support for dark/light mode, and
partially also because it \emph{looks} nice.
The Go templates containing the CSS classes need to be parsed by Tailwind in
order to produce the final stylesheet that can be bundled with the application.
Upstream provides the original CLI tool (\texttt{tailwindcss}), which can be
used exactly for that purpose. Simple and accessible layouts were overall
preferred, and a single page was rather split into multiple when it became
convoluted. Data-backed efforts were made to create reasonably contrasting
pages.
\n{3}{Frontend experiments}
As an aside, the author briefly experimented with WebAssembly to provide
client-side dynamic functionality for this project, but ultimately scrapped it
in favour of the entirely server-side rendered approach. It might be revisited
in the future, should client-side dynamic functionality become necessary and
performance matter. Even from the short experiments it was obvious how much
faster WebAssembly was compared to JavaScript.
% \newpage
\n{2}{User isolation}
Users are only allowed into specific parts of the application based on the role
they currently possess (Role-based Access Control).
While this short list might get amended in the future, initially only two basic
roles were envisioned:
\begin{itemize}
\item Administrator
\item User
\end{itemize}
\obr{Application use case diagram}{fig:usecasediagram}{.9}{graphics/pcmt-use-case.pdf}
It is paramount that the program protects itself from insider threats as
well, and therefore each role is only able to perform the actions it has been
explicitly assigned. While there definitely is certain overlap between the
@ -462,26 +531,28 @@ capabilities of the two outlined roles, each also possesses unique features
that the other one does not.
For instance, the administrator role is not able to perform searches on the
breach data directly, for that a separate \emph{user} account has to be
devised. Similarly, a regular user is not able to manage breach lists and other
users, because that is a privileged operation.
In-application administrators are not able to view (any) sensitive user data
and should therefore only be able to perform the following actions:
\begin{itemize}
\item Create user accounts
\item View list of users
\item View user email
\item Change user email
\item Toggle whether user is an administrator
\item View user listing
\item View user details
\item Change user details, including administrative status
\item Delete user accounts
\item Refresh breach data from online sources
\end{itemize}
Let us consider a case when a user performs an operation on their own account.
While demoting from administrator to a regular user should be permitted,
promoting self to be an administrator would constitute a \emph{privilege
escalation} and likely be a precursor to at least a \emph{denial of service} of
sorts, as there would be nothing preventing the newly-\emph{admined} user from
disabling the accounts of all other administrators.
\n{2}{Zero trust principle}
@ -489,36 +560,41 @@ precursor to at least a \emph{denial of service} of sorts.
\textit{Confidentiality, i.e.\ not trusting the provider}
There is no way for the application (and consequently, the in-application
administrator) to read user's data (such as saved search queries). This is
possible by virtue of encrypting the pertinent data before saving them in the
database with the state-of-the-art \texttt{age} tool (backed by
X25519)~\cite{age},~\cite{x25519rfc7748}. The \texttt{age} \emph{identity}
itself is in turn encrypted by a passphrase that only the user controls. Of
course, the user-supplied password is run through a password-based key derivation
function (\texttt{argon2}, version \emph{id}, with the officially recommended
configuration parameters) before letting it encrypt the \emph{age} key.
The \texttt{age} identity is only generated once the user changes their
password for the first time, in an attempt to prevent scenarios like the
in-application administrator with access to the physical database being able to
both \textbf{recover} the key from the database and \textbf{decrypt} it, given
that they already know the user password (because they set it when they created
the user), which would subsequently give them unbounded access to any future
encrypted data, as long as they were able to maintain their database
access. This is why generating the \texttt{age} identity is bound to the
first password change.
Of course, the supposed evil administrator could simply perform the password
change themselves! However, the user would at least be able to find those
changes in the activity logs and know to \emph{not} use the application under
such circumstances. But given the scenario of a total database compromise, the
author finds that all hope is \emph{already} lost at that point. At least when
the database is dumped, it should only contain non-sensitive, functional
information in plain text, everything else should be encrypted.
Consequently, both the application operators and the in-application
administrators should ideally never be able to learn the details of what the
user is tracking/searching for, the same being by extension applicable even to
potential attackers with direct access to the database. Thus, the author
maintains that every scenario that could potentially lead to a data breach
(apart from a compromised actual user password) would have to entail some form
of operating memory acquisition on the machine hosting the application, for
instance using \texttt{LiME}~\cite{lime}, or perhaps directly the
\emph{hypervisor}, when considering virtualised (``cloud'') environments.
@ -529,9 +605,9 @@ for instance using \texttt{LiME}~\cite{lime}, or perhaps directly the
The configuration schema was at first being developed as part of the main
project's repository, before it was determined that it would benefit both the
development and overall clarity if the schema lived in its own repository (see
Section~\ref{sec:repos} for details). This enabled the schema to be
independently developed and versioned, and only be pulled into the main
application whenever it was determined to be ready.
% \vspace{\parskip}
@ -592,6 +668,8 @@ let Schema =
Full schema with type annotations can be seen in Listing~\ref{dhallschema}.
\newpage
The \texttt{let} statement declares a variable called \texttt{Schema} and
assigns to it the result of the expression on the right side of the equals
sign, which has for practical reasons been trimmed and is displayed without the
@ -810,16 +888,13 @@ with words describing ownership
\n{1}{Deployment}
\textbf{TODO}: mention how \texttt{systemd} aids in running the pod.\\
\\
A deployment setup as suggested in Section~\ref{sec:deploymentRecommendations}
is already \emph{partially} covered by the multi-stage \texttt{Containerfile}
that is available in the main sources. Once built, the resulting container
image only contains a handful of things it absolutely needs:
\begin{itemize}
\item a self-contained statically linked copy of the program
\item a default configuration file and corresponding Dhall expressions cached
at build time
\item a recent CA certs bundle
@ -831,7 +906,8 @@ scenario includes the application container being run in a Podman \textbf{pod}
not having to expose the database to the entire host or out of the pod at all,
it is only available over pod's \texttt{localhost}. Hopefully it goes without
saying that the default values of any configuration secrets should be
substituted by the application operator with new, securely generated ones
(read: using \texttt{openssl rand} or \texttt{pwgen}).
\n{2}{Rootless Podman}
@ -891,13 +967,13 @@ podman run --pod pcmt --replace --name pcmt-og -d --rm \
-e PCMT_CONNSTRING="host=pcmt-pg port=5432 sslmode=disable \
user=postgres dbname=postgres password=postgres" \
-v $PWD/config.dhall:/config.dhall:Z,ro \
docker.io/immawanderer/mt-pcmt:testbuild -config /config.dhall
\end{lstlisting}
% \vspace*{-\baselineskip}
To summarise Listing~\ref{podmanDeployment}, first the application container is
built from inside the project folder using \texttt{kaniko}. The container
image could alternatively be pulled from the container repository, but it makes
more sense showing the image being built from sources with the listing
depicting a \texttt{:testbuild} tag being used.
@ -1299,24 +1375,24 @@ a test instance; therefore limits (and rate-limits) to prevent abuse might be
imposed.
\\
The test environment makes the program available over both modern IPv6 and
legacy IPv4 protocols, to maximise accessibility. Redirects were set up from
plain HTTP to HTTPS, as well as from \texttt{www} to non-\texttt{www} domain.
The subject domain configuration is hardened by setting the \texttt{CAA}
record, limiting certificate authorities (CAs) that are able to issue TLS
certificates for it (and let them be trusted by validating clients).
Additionally, \textit{HTTP Strict Transport Security} (HSTS) had been enabled
for the main domain (\texttt{dotya.ml}) including the subdomains quite some
time ago (consult the preload lists in Firefox/Chrome), which mandates that
clients speaking HTTP only ever connect to it (and the subdomains) using TLS.
The whole deployment has been orchestrated using an Ansible\footnotemark{}
playbook created for this occasion, focusing on idempotence with the aim of
reliably automating the deployment process. At the same time, it is now
described reasonably well in the code. Its code is available at
\url{https://git.dotya.ml/mirre-mt/ansible-pcmt.git}.
\footnotetext{A Nix-ops approach was considered, however, Ansible was deemed
more suitable since the existing host runs Arch.}
\n{3}{Deployment validation}
@ -1329,10 +1405,10 @@ The deployed application has been validated using the \textit{Security Headers}
tool (see \url{https://securityheaders.com/?q=https%3A%2F%2Ftestpcmt.dotya.ml}),
the results of which can be seen in Figure~\ref{fig:secheaders}.
It shows that the application sets the \texttt{Cross Origin Opener Policy} to
\texttt{same-origin}, which isolates the browsing context exclusively to
\textit{same-origin} documents, preventing \textit{cross-origin} documents from
loading in the same browser context.
\obr{Security Headers scan}{fig:secheaders}{.89}{graphics/screen-securityHeaders}
@ -1348,11 +1424,11 @@ exception is the \texttt{image-src 'self' https://*} directive, which more
leniently also permits images from any \textit{secure} sources. This measure
ensures that no unvetted content is ever loaded from elsewhere.
The \texttt{Referrer-Policy} header setting of \texttt{no-referrer,
strict-origin-when-cross-origin} ensures that user tracking is reduced, since
no referrer is included (the \texttt{Referer} header is omitted) when the user
navigates away from the site or somehow sends requests outside the application
using other means. The \texttt{Permissions-Policy} set to
\texttt{geolocation=(), midi=(), sync-xhr=(), microphone=(), camera=(),
gyroscope=(), magnetometer=(), fullscreen=(self), payment=()} declares that the
application is, for instance, never going to request access to payment
@ -1367,16 +1443,21 @@ application misconfigurations. The wordlists used include:
\item Sam's \texttt{samlists} (\url{https://github.com/the-xentropy/samlists})
\end{itemize}
The logs of the fuzzing operations are enclosed as
Appendix~\ref{appendix:gobusterFuzzingLogs}.
Many requests yielded 404s for non-existent pages, or possibly pages requiring
authentication (\emph{NotFound} is used so as not to disclose a page's
existence). The program initially also issued quite a few 503s as a result of
rate-limiting, until \texttt{gobuster} was tamed using the \texttt{--delay}
parameter. Anti-CSRF measures employed by the program caused most of the
requests to yield 400s (missing CSRF token), or 403s with a CSRF token.
% A Burp test would perhaps be more telling.
The deployed application was scanned with Qualys' \textit{SSL Labs} scanner
and the results can be seen in Figure~\ref{fig:ssllabs}, confirming that HSTS
(including subdomains) is deployed, the server supports TLS 1.3, and the DNS
Certificate Authority Authorisation (CAA) is configured for the domain, with
the overall grade being A+.
\obr{Qualys SSL Labs scan}{fig:ssllabs}{.75}{graphics/screen-sslLabs}
@ -1495,10 +1576,16 @@ After successful user deletion, the administrator is redirected back to user
management page and a flash message confirming the deletion is printed near the
top of the page, as shown in Figure~\ref{fig:adminUserDeletePostHoc}.
\obr{Log-out message}
{fig:logout}{.20}
{graphics/screen-logout}
Figure~\ref{fig:logout} shows the message printed to users on logout.
\newpage
\obr{Manage API keys}
{fig:manageAPIKeys}{.65}
{graphics/screen-manageAPIKeys}
Figure~\ref{fig:manageAPIKeys} shows a page that allows administrators to
@ -1507,14 +1594,8 @@ Pwned?} or \emph{DeHashed.com}. Do note that these keys are never distributed
to clients in any way and are only ever used by the application itself to make
the requests on \emph{behalf} of the users.
\obr{Import of locally available breach data from the CLI}
{fig:localImport}{.99}
{graphics/screen-localImport}
Figure~\ref{fig:localImport} depicts how formatted breach data can be imported