
theor.: add text, refs, fix typos, grammar; reword

This commit is contained in:
surtur 2023-08-23 00:04:26 +02:00
parent 5e6dacaf59
commit daa4f58489
Signed by: wanderer
SSH Key Fingerprint: SHA256:MdCZyJ2sHLltrLBp0xQO0O1qTW9BT/xl5nXkDvhlMCI
2 changed files with 299 additions and 177 deletions

@ -34,11 +34,11 @@ Hash functions are algorithms used to help with a number of things: integrity
verification, password protection, digital signature, public-key encryption and
others. Hashes are used in forensic analysis to prove authenticity of digital
artifacts, to uniquely identify a change-set within revision-based source code
management systems such as Git, Subversion or Mercurial, to detect
known-malicious software by anti-virus programs or by advanced filesystems in
order to verify block integrity and enable repairs, and also in many other
applications that each person using a modern computing device has come across,
such as when connecting to a website protected by the famed HTTPS.
management systems such as Git or Mercurial, to detect known-malicious software
by anti-virus programs or by advanced filesystems in order to verify block
integrity and enable repairs, and also in many other applications that each
person using a modern computing device has come across, such as when connecting
to a website protected by the famed HTTPS.
The popularity of hash functions stems from a common use case: the need to
simplify reliably identifying a chunk of data. Of course, two chunks of data,
@ -74,45 +74,50 @@ for the wrong job can potentially result in a security breach.
As an example, consider \texttt{MD5}, a popular hash function internally using
the same data structure, the \emph{Merkle-Damgård} construction, as
\texttt{BLAKE3}. While the former produces 128 bit digests, the latter by
default outputs 256 bit digest with no upper limit (Merkle tree extensibility).
There is a list of differences that could further be mentioned, however, they
both have one thing in common: they are \emph{designed} to be \emph{fast}. The
latter, as a cryptographic hash function, is conjectured to be \emph{random
oracle indifferentiable}, secure against length extension, but it is also in
fact faster than all of \texttt{MD5}, \texttt{SHA3-256}, \texttt{SHA-1} and
even \texttt{Blake2} family of functions.
\texttt{BLAKE3}. The former produces 128-bit digests, compared to the default
256 bits of output and virtually no upper limit ($<2^{64}$ bytes) thanks to
Merkle tree extensibility for the latter. There is a list of differences that
could further be mentioned; however, they both have one thing in common: they
are \emph{designed} to be \emph{fast}. The latter, as a cryptographic hash
function, is conjectured to be \emph{random oracle indifferentiable} and secure
against length extension, but it is also in fact faster than all of
\texttt{MD5}, \texttt{SHA3-256}, \texttt{SHA-1} and even the \texttt{Blake2}
family of functions~\cite{blake3}.
The use case of both is to (quickly) verify the integrity of a given chunk of
data, in the case of \texttt{BLAKE3} with pre-image and collision resistance in
mind; not to secure a password by hashing it first, which poses a big issue
when either is used to\ldots{}secure passwords by hashing them first.
A password hash function, such as \texttt{argon2} or \texttt{bcrypt} are good
choices for securely storing hashed passwords, namely because they place CPU
and memory burden on the machine that is computing the digest, in case of the
mentioned functions the \emph{hardness} is even configurable to satisfy the
most possible scenarios. They also forcefully limit potential parallelism, thus
restricting the scale at which an exhaustive search could be launched.
Additionally, both functions can automatically \emph{salt} the passwords before
hashing them, automatically ensuring that two exact same passwords of two
different users will not end up hashing to the same digest value. That makes it
much harder to recover the original, supposedly weak user-provided password.
Password hashing functions such as \texttt{argon2} or \texttt{bcrypt} are good
choices for \emph{securely} storing hashed passwords, namely because they place
a CPU and memory burden on the machine that is computing the digest. In the
case of the mentioned functions, the \emph{hardness} is even configurable to
satisfy the greatest possible array of scenarios. These functions also
forcefully limit potential parallelism, thereby restricting the scale at which
exhaustive searches performed using tools like \texttt{Hashcat} or \texttt{John
the Ripper} could be at all feasible, practically obviating old-school hash
cracking~\cite{hashcracking},~\cite{hashcracking2}. Additionally, both
functions can automatically add a random \emph{salt} to each password before
hashing, ensuring that no copies of the same password provided by different
users will end up hashing to the same digest value.
\n{3}{Why are hashes interesting}
As already mentioned, since hashes are often used to store the representation
of the password instead of the password itself, which is where the allure comes
from, especially services storing hashed user passwords happen to
non-voluntarily leak them. Should wrong type of hash be used for password
hashing or weak parameters be set or the hash function be simply used
improperly, it sparks even more interest.
As already hinted, hashes are often used to store a \emph{logical proof of the
password}, rather than the password itself. Services storing hashed user
passwords especially happen to involuntarily leak them. Using a wrong type of
hash for password hashing, weak hash function parameters, reusing \emph{salt},
or inadvertently \emph{misusing} the hash function in some other way, is a sure
way to spark a lot of
interest~\cite{megatron},~\cite{linkedin1},~\cite{linkedin2}.
Historically, there have also been enough instances of leaked raw passwords
that anyone with enough interest had more than enough time to additionally put
together a neat list of hashes of the most commonly used passwords.
Historically, plain-text passwords have also leaked enough times (or weak
hashes have been cracked) that anyone with enough interest has had a more than
sufficient amount of time to put together neat lists of hashes of the most
commonly used
passwords~\cite{rockyou},~\cite{plaintextpasswds1},~\cite{plaintextpasswds2},~\cite{plaitextpasswds3}.
So while a service might not be storing passwords in \emph{plain text}, which
is a good practice, using a hashing function not designed to protect passwords
@ -150,7 +155,7 @@ ones deemed secure enough, which is why it is no longer needed to manually
specify what cipher suite should be used (or rely on the client/server to
choose wisely). While possibly facing compatibility issues with legacy devices,
the simplicity brought by enabling TLSv1.3 might be considered a worthy
trade-off.
trade-off~\cite{tls13rfc8446}.
\n{1}{Passwords}\label{sec:passwords}
@ -166,7 +171,7 @@ During World War II, the US paratroopers' use of passwords evolved to
even include a counter-password.
According to McMillan, the first \textit{computer} passwords date back to
mid-1960s' Massachusetts Institute of Technology (MIT), when researchers at the
mid-1960s Massachusetts Institute of Technology (MIT), when researchers at the
university built a massive time-sharing computer called CTSS. Apparently,
\textit{even then} the passwords did not protect the users as well as they were
expected to~\cite{mcmillan}.
@ -249,10 +254,10 @@ passwords}{fig:forbiddencharacters}{.8}{graphics/forbiddencharacters.jpg}
Note that ``Passw0rd!'' would have been a perfectly acceptable password for the
validator displayed in
Figure~\ref{fig:forbiddencharacters}~\cite{forbiddencharacters}. NIST's
recommendations on this matter are that all printing ASCII~\cite{asciirfc20}
characters as well as the space character SHOULD be acceptable in memorized
secrets and Unicode~\cite{iso10646} characters SHOULD be accepted as well.
Figure~\ref{fig:forbiddencharacters}~\cite{forbiddencharacters}. NIST's
recommendations on this matter are that all printing ASCII characters as well
as the space character SHOULD be acceptable in memorized secrets, and Unicode
characters SHOULD be accepted as well~\cite{asciirfc20},~\cite{iso10646}.
\n{3}{Character composition requirements}
@ -261,13 +266,17 @@ composition requirements in place, too. The reality is that instead of
creating strong passwords directly, most users first try a basic version and
then keep tweaking characters until the password ends up fulfilling the minimum
requirement.
The \emph{problem} with that is that it has been shown, that people use similar
patterns, i.e. starting with capital letters, putting a symbol last and a
number in the last two positions. This is also known to cyber criminals
cracking passwords and they run their dictionary attacks using the common
substitutions, such as "\$" for "s", "E" for "3", "1" for "l", "@" for "a" etc.
The password created in this manner will almost certainly be bad so all that is
achieved is frustrating the user in order to still arrive at a bad password.
The \emph{problem} with this is that people have been shown to use similar
patterns, i.e.\ starting with capital letters, putting a symbol last and a
number in the last two positions. This is also well known to people cracking
password hashes, who run their dictionary attacks using the common
substitutions, such as ``\$'' for ``s'', ``3'' for ``e'', ``1'' for ``l'',
``@'' for ``a'',
etc.~\cite{megatron},~\cite{hashcracking},~\cite{hashcracking2}. A password
created in this manner will almost certainly be bad, so all that has been
achieved is frustrating the user, only to still arrive at a bad password.
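The substitution rules above are trivially mechanisable, which is exactly what
cracking rule sets do when expanding a dictionary. A hypothetical sketch of
such a rule (the word list and the function name are illustrative only):

```go
package main

import (
	"fmt"
	"strings"
)

// mangle applies the common substitutions mentioned above to a
// dictionary word, the way password-cracking rule engines expand
// their wordlists.
func mangle(word string) string {
	return strings.NewReplacer(
		"s", "$",
		"e", "3",
		"l", "1",
		"a", "@",
	).Replace(word)
}

func main() {
	for _, w := range []string{"password", "eagle"} {
		fmt.Printf("%s -> %s\n", w, mangle(w))
	}
	// prints "password -> p@$$word" among others
}
```

A real rule set would emit every combination of applied and unapplied
substitutions, plus capitalisation and appended digits, which is why such
``tweaked'' passwords fall quickly.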
\n{3}{Other common issues}
@ -276,11 +285,12 @@ using JavaScript), thereby essentially breaking the password manager
functionality, which is an issue because it encourages bad password practices,
such as weak passwords and password reuse.
Another frequent issue is forced frequent password rotation. Making frequent
password rotations mandatory contributes to users developing a password
creation pattern and is further a modern-day security anti-pattern and
according to the British NCSC the practice ``carries no real benefits as stolen
passwords are generally exploited immediately''~\cite{ncsc}.
Forced frequent password rotation is another common issue. Apparently, making
frequent password rotations mandatory contributes to users developing password
creation \emph{patterns}. Moreover, according to the British NCSC, the practice
``carries no real benefits as stolen passwords are generally exploited
immediately'', and the organisation calls it a modern-day security
anti-pattern~\cite{ncsc}.
\n{1}{Web security}\label{sec:websecurity}
@ -314,8 +324,8 @@ it should find it more difficult to steal data from other
sites~\cite{siteisolation}.
Firefox calls their version of \emph{Site Isolation}-like functionality Project
Fission~\cite{projectfission} but the two are very similar, both in internal
architecture and what they try to achieve. Elements of the web page are scanned
Fission, but the two are very similar, both in internal architecture and what
they try to achieve~\cite{projectfission}. Elements of the web page are scanned
to decide whether they are allowed according to \emph{same-site} restrictions
and allocated shared or isolated memory based on the result.
@ -326,15 +336,15 @@ features, unbeknownst to them.
\n{2}{Cross-site scripting}\label{sec:xss}
As per OWASP Top Ten list~\cite{owasptop10}, injection is the third most
observed issue across millions of websites. Cross-site scripting is a type of
attack in which malicious code, such as infected scripts are injected into a
website that would otherwise be trusted. Since the misconfiguration or a flaw
of the application allowed this, the browser of the victim that trusts the
website simply executes the code provided by the attacker. This code thus gains
access to session tokens and any cookies associated with the website's origin,
apart from being able to rewrite the HTML content. The results of XSS can range
from account compromise to identity theft.
As per the OWASP Top Ten list, injection is the third most observed issue
across millions of websites. Cross-site scripting is a type of attack in which
malicious code, such as infected scripts, is injected into a website that would
otherwise be trusted. Since a misconfiguration or flaw of the application
allowed this, the browser of the victim that trusts the website simply executes
the code provided by the attacker. This code thus gains access to session
tokens and any cookies associated with the website's origin, apart from being
able to rewrite the HTML content. The results of XSS can range from account
compromise to identity theft~\cite{owasptop10}.
Solutions deployed against XSS vary. On the client side, it mainly comes down
to good browser patching hygiene, browser features such as Site Isolation (see
@ -348,10 +358,10 @@ On the server side though, these options (indicating to the browsers \emph{how}
the site should be parsed) can directly be manipulated and configured. They
should be fine-tuned to fit the needs of each specific website.
Further, more than 10 years ago now, a new, powerful and comprehensive
framework for controlling the admissibility of content has been devised:
Content Security Policy. Its capabilities superseded those of the previously
mentioned options and it is discussed more in-depth in the following section.
Furthermore, a new, powerful and comprehensive framework for controlling the
admissibility of content was devised more than 10 years ago: Content Security
Policy. Its capabilities superseded those of the previously mentioned options,
and it is discussed more in-depth in the following section.
\n{2}{Content Security Policy}\label{sec:csp}
@ -370,14 +380,14 @@ website operator to decide what client-side resources can load on their website
are permitted \emph{sources} of content.
For example, scripts can be restricted to only load from a list of trusted
domains and inline scripts can be blocked completely, which is a huge win
domains, and inline scripts can be blocked entirely, which is a huge win
against popular XSS techniques.
Further, scripts and stylesheets can also be allowed based on a cryptographic
(SHA256, SHA384 or SHA512) hash of their content, which should be a known
information to legitimate website operators prior to or at the time scripts are
served, making sure no unauthorised script or stylesheet will ever be run on
user's computer (running a compliant browser).
Not only that, scripts and stylesheets can also be allowed based on a
cryptographic (SHA256, SHA384 or SHA512) hash of their content, which should be
known to legitimate website operators prior to or at the time the scripts are
served, making sure no unauthorised script or stylesheet will ever be run on
the user's computer (running a compliant browser).
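Concretely, an inline script can be allowlisted with a source expression of the
form \texttt{'sha256-<base64 digest>'}, where the digest covers the exact
script body (without the surrounding \texttt{<script>} tags). A small sketch of
computing such a token, assuming the script contents are known at serve time:

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// cspHashSource computes the CSP source expression that allowlists a
// script (or stylesheet) by the SHA-256 hash of its exact contents.
func cspHashSource(script string) string {
	sum := sha256.Sum256([]byte(script))
	return "'sha256-" + base64.StdEncoding.EncodeToString(sum[:]) + "'"
}

func main() {
	// Whitespace changes alter the digest, so the served bytes must
	// match exactly what was hashed.
	fmt.Println(cspHashSource("doStuff();"))
}
```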
A policy of CSPv3, which is the current iteration of the concept, can be served
either as a header or inside the website's \texttt{<meta>} tag. Configuration is
@ -415,8 +425,6 @@ There are many more directives and settings than mentioned in this section;
the author encourages anybody interested to give it a read, e.g.\ at
\url{https://web.dev/csp/}.
\textbf{TODO}: add more concrete examples.
\n{1}{Configuration}
@ -426,7 +434,7 @@ tweak/manage its behaviour, and these changes are usually persisted in the
\emph{LocalStorage} key-value store in the browser, or in a binary or plain
text configuration file. These configuration files need to be read and checked
at least on program start-up and either stored into operating memory for the
duration of the runtime of the program, or loaded and parsed and the memory
duration of the runtime of the program, or loaded and parsed, and the memory
subsequently \emph{freed} (initial configuration).
There is an abundance of configuration languages (or file formats used to craft
@ -437,22 +445,22 @@ Dhall stood out as a language that was designed with both security and the
needs of dynamic configuration scenarios in mind, borrowing a concept or two
from Nix~\cite{nixoslearn}~\cite{nixlang} (which in turn sources more than a
few of its concepts from Haskell), and in its apparent core being very similar
to JSON, which adds to familiar feel. In fact, in Dhall's authors' own words it
is: ``a programmable configuration language that you can think of as: JSON +
to JSON, which adds to a familiar feel. In fact, in Dhall's authors' own words
it is: ``a programmable configuration language that you can think of as: JSON +
functions + types + imports''~\cite{dhalllang}.
Among all of the listed features, the especially intriguing one to the author
was the promise of \emph{types}. There are multiple examples directly on the
Among all the listed features, the especially intriguing one to the author was
the promise of \emph{types}. There are multiple examples directly on the
project's documentation webpage demonstrating for instance the declaration and
usage of custom types (that are, of course merely combinations of the primitive
types that the language provides, such as \emph{Bool}, \emph{Natural} or
\emph{List}, to name just a few), so it was not exceedingly hard to start
designing a custom configuration \emph{schema} for the program.
Dhall not being a Turing-complete language also guarantees that evaluation
\emph{always} terminates eventually, which is a good attribute to possess as a
configuration language.
usage of custom types (that are, of course, merely combinations of the
primitive types that the language provides, such as \emph{Bool}, \emph{Natural}
or \emph{List}, to name just a few), so it was not exceedingly hard to start
designing a custom configuration \emph{schema} for the program. Dhall, not
being a Turing-complete language, also guarantees that evaluation \emph{always}
terminates, which is a good attribute for a configuration language to possess.
\n{3}{Safety considerations}
\n{2}{Safety considerations}
Having a programmable configuration language that understands functions and
allows importing not only arbitrary text from random internet URLs, but also
@ -462,14 +470,14 @@ relied on by the user. Dhall offers this in multiple features: enforcing a
same-origin policy and (optionally) pinning a cryptographic hash of the value
of the expression being imported.
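To sketch what the hash-pinning protection looks like in practice (the URL and
the hash below are placeholders, not real artifacts), a Dhall expression can
pin a remote import to the expected hash of its normal form:

```dhall
-- A remote import pinned with a semantic integrity check: evaluation
-- fails unless the imported expression's normal form hashes to the
-- stated value (a placeholder is shown here).
let schema =
      https://config.example.com/schema.dhall
        sha256:0000000000000000000000000000000000000000000000000000000000000000

in  schema
```

The hash covers the \emph{normalised} expression rather than the raw file
bytes, so semantically identical refactors of the imported file keep
validating, while any change in meaning is rejected.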
\n{3}{Possible alternatives}
\n{2}{Possible alternatives}
While developing the program, the author has also
come across certain shortcomings of Dhall, namely long start-up with \emph{cold
cache}, which can generally be observed in the scenario of running the program
in an environment that does not allow to write the cache files (a read-only
filesystem), of does not keep the written cache files, such as a container that
is not configured to mount a persistent volume at the pertinent location.
While developing the program, the author has also come across certain
shortcomings of Dhall, namely the long start-up on \emph{cold cache}. It can
generally be observed when running the program in an environment that does not
allow persistently writing the cache files (a read-only filesystem), or does
not keep the written cache files, such as a container that is not configured to
mount persistent volumes to pertinent locations.
When performing an evaluation, Dhall resolves every expression down to a
combination of its most basic types (eliminating all
@ -480,7 +488,7 @@ variable \texttt{\$\{XDG\_CACHE\_HOME\}} (have a look at \emph{XDG Base
Directory Spec}~\cite{xdgbasedirspec} for details) to decide \emph{where} the
results of the normalisation will be written for repeated use. Do note that
this behaviour has been observed on a GNU/Linux host and the author has not
verified this behaviour on a non-GNU/Linux host, such as FreeBSD.
verified this behaviour on other platforms, such as FreeBSD.
If normalisation is performed inside an ephemeral container (as opposed to, for
instance, an interactive desktop session), the results effectively get lost on
@ -490,15 +498,15 @@ internally branches widely) can take upwards of two minutes, during which
the user is left waiting for the hanging application with no reporting on the
progress or current status.
While workarounds for the above mentioned problem can be devised relatively
easily (such as bind mounting \emph{persistent} volumes inside containers
to\texttt{\$\{XDG\_CACHE\_HOME\}/dhall} and
\texttt{\$\{XDG\_CACHE\_HOME\}/dhall-haskell} in order to preserve cache
between restarts, or let the cache be pre-computed during container build,
since the application is only really expected to run together with a compatible
version of the configuration schema and this version \emph{is} known at
container build time), it would certainly feel better if there was no need to
work \emph{around} the configuration system of choice.
Workarounds for the above-mentioned problem can be devised relatively easily,
but it would certainly \emph{feel} better if there was no need to work
\emph{around} the configuration system of choice. For instance, bind mounting
\emph{persistent} volumes to pertinent locations inside the container
(\texttt{\$\{XDG\_CACHE\_HOME\}/\{dhall,dhall-haskell\}}) would preserve cache
between restarts. Alternatively, the cache could be pre-computed on container
build (as the program is only expected to run with a compatible schema version,
and that version \emph{is} known at container build time for the supplied
configuration).
Alternatives such as CUE (\url{https://cuelang.org/}) offer themselves nicely
as an almost drop-in replacement for Dhall feature-wise, while also resolving
@ -506,7 +514,7 @@ the costly \emph{cold cache} normalisation operations, which is in the
author's view Dhall's cardinal flaw. In a slightly contrasting approach,
another emerging
project called \texttt{TySON} (\url{https://github.com/jetpack-io/tyson}),
which uses \emph{a subset} of TypeScript to also create a programmable,
strictly typed configuration language, opted to take a well known language
strictly typed configuration language, opted to take a well-known language
instead of reinventing the wheel, while still being able to retain feature
parity with Dhall.
@ -514,7 +522,7 @@ parity with Dhall.
\n{1}{Compromise Monitoring}
There are, of course, several ways one could approach monitoring of compromised
of credentials, some more \emph{manual} in nature than others. When using a
credentials, some more \emph{manual} in nature than others. When using a
service that is suspected/expected to be breached in the future, one can always
create a unique username/password combination specifically for the subject
service and never use that combination anywhere else. That way, if the
@ -533,8 +541,8 @@ hand, namely:
\item sifting through (possibly) unstructured data by hand
\end{itemize}
Of course, as this is a popular topic for a number of people, the above
mentioned work has already been packaged into neat and practical online
Of course, as this is a popular topic for a number of people, the
above-mentioned work has already been packaged into neat and practical online
offerings. In case one decides in favour of using those, an additional range of
issues (the previous one still applicable) arises:
@ -595,10 +603,11 @@ Another source is then simply any locally supplied data, which, of course,
could have been obtained from a breach available online beforehand.
Locally supplied data is specific in that it needs to be formatted in such a
way that it can be understood by the application. That is, the data cannot be
in its raw form anymore but has to have been morphed into the precise shape the
application needs for further processing. Once imported, the application can
query the data at will, as it knows exactly the shape of it.
way that it is understood by the application. That is, the data supplied for
importing cannot be in its original raw form anymore, instead it has to have
been morphed into the precise shape the application needs for further
processing. Once imported, the application can query the data at will, as it
knows exactly the shape of it.
This supposes the existence of a \emph{format} for importing, the schema of
which is devised in Section~\ref{sec:localDatasetPlugin}.
@ -650,7 +659,7 @@ morekeywords={any,time}
The Go \emph{struct} shown in Listing~\ref{breachImportSchema} will in
actuality translate to a YAML document written and supplied by an
administrative user of the program. And while the author is personally not the
greatest supporter of YAML, however, the format was still chosen for several
greatest supporter of YAML; however, the format was still chosen for several
reasons:
\begin{itemize}
@ -723,30 +732,36 @@ system, different-levels-of-symbolic fees were introduced to obtain the API
keys. Apparently, the top consumers of the API seemed to utilise it
orders of magnitude more than the average person, which led Hunt to devising a
new, tiered API access system in which the \emph{little guys} would not be
subsidising the \emph{big guys}\cite{hibpBillingChanges}. Additionally, the
symbolic fee of \$3.50/mo for the entry-level, 10 requests per minute API key
was meant to serve as a small barrier for (mis)users with nefarious purposes,
but pose practically no obstacle for \emph{legitimate} users, which is entirely
reasonable.
subsidising the \emph{big guys}. Additionally, the symbolic fee of \$3.50 a
month for the entry-level 10 requests-per-minute API key was meant to serve as
a small barrier for (mis)users with nefarious purposes, but pose practically no
obstacle for \emph{legitimate} users, which is entirely
reasonable~\cite{hibpBillingChanges}.
The application's \texttt{hibp} module and database representation attempts to
model the values returned by this API and declare actions to be performed upon
the data, which is what facilitates the breach search functionality in the
program.
The application's \texttt{hibp} module and database representation
(\texttt{schema.HIBPSchema}) attempts to model the values returned by this API
and declare actions to be performed upon the data, which is what facilitates
the breach search functionality in the program.
The architecture is relatively simple: the application administrator configures
an API key for the HIBP service via the management interface, the user enters
the query parameters and the application then constructs the API call that is
sent to the API, awaiting the response. As the API is rate-limited
(individually, based on the API key supplied), this \emph{could} pose an issue
at high utilisation times, and thus needs to be handled in the backend as well
as in the UI.
The architecture is relatively simple. Breach data, including titles, dates,
descriptions and tags, is cached by the application on start-up, as this API is
not authenticated. In order for the authenticated API to be called, the
application administrator first needs to configure an API key for the HIBP
service via the management interface. The user can then enter the desired query
parameters, and the application constructs the API call that is sent to the
authenticated API and awaits the response. As the API is rate-limited
(individually, based on the API key supplied), sending requests directly after
receiving them from the users would likely pose an issue at high utilisation
times, and would result in the application ending up unnecessarily throttled.
Request sending thus needs to be handled by a request scheduler in the backend,
as well as appropriately in the UI.
After a response from the API server arrives, the application parses the
returned data and attempts to \emph{bind} it to the pre-programmed \emph{model}
for validation. If the data can be successfully validated, it is saved into the
database as a cache and the search query is performed on the saved data. The
result is then displayed to the user for browsing.
After a response from the API server arrives, the application attempts to
\emph{bind} the returned data to the pre-programmed \emph{model} for
validation, before finally parsing it. If the data can be successfully
validated, it is saved into the database as a cache and the search query is
performed on the saved data. The result is then displayed to the user for
browsing.
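The bind-then-validate step can be illustrated with a partial model; the field
names below follow the publicly documented HIBP breach attributes, but the
struct and the validation rules are a simplified sketch, not the application's
actual \texttt{schema.HIBPSchema}:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// breach models a subset of the fields returned by the breaches API;
// the application's full model carries more attributes.
type breach struct {
	Name       string `json:"Name"`
	Title      string `json:"Title"`
	BreachDate string `json:"BreachDate"`
	PwnCount   int    `json:"PwnCount"`
}

// bind unmarshals a raw API response and performs minimal validation
// before the data would be allowed to reach the database cache.
func bind(raw []byte) ([]breach, error) {
	var bs []breach
	if err := json.Unmarshal(raw, &bs); err != nil {
		return nil, fmt.Errorf("malformed response: %w", err)
	}
	for _, b := range bs {
		if b.Name == "" || b.PwnCount < 0 {
			return nil, fmt.Errorf("invalid breach record: %+v", b)
		}
	}
	return bs, nil
}

func main() {
	raw := []byte(`[{"Name":"ExampleCo","Title":"ExampleCo","BreachDate":"2020-01-01","PwnCount":12345}]`)
	bs, err := bind(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(bs[0].Name, bs[0].PwnCount)
}
```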
\n{1}{Deployment recommendations}\label{sec:deploymentRecommendations}
@ -756,18 +771,18 @@ It is, of course, recommended that the application runs in a secure environment
who you ask. General recommendations would be either to effectively reserve a
machine for a single use case - running this program - so as to dramatically
decrease the potential attack surface of the host, or run the program isolated
in a container or a virtual machine. Further, if the host does not need
in a container or a virtual machine. Furthermore, if the host does not need
management access (it is a deployed-to-only machine that is configured
out-of-band, such as with a \emph{golden} image/container or declaratively with
Nix), then an SSH \emph{daemon} should not be running in it, since it is not
needed. In an ideal scenario, the host machine would have as little software
installed as possible besides what the application absolutely requires.
System-wide cryptographic policies should target highest feasible security
level, if at all available (such as by default on Fedora or RHEL), covering
SSH, DNSSec, IPsec, Kerberos and TLS protocols. Firewalls should be configured
and SELinux (kernel-level mandatory access control and security policy
mechanism) running in \emph{enforcing} mode, if available.
System-wide cryptographic policies should target the highest feasible security
level, if at all available (as is the case by default on e.g.\ Fedora),
covering SSH, DNSSec and TLS protocols. Firewalls should be configured and
SELinux (kernel-level mandatory access control and security policy mechanism)
running in \emph{enforcing} mode, if available.
\n{2}{Transport security}
@ -786,52 +801,67 @@ have it listen behind a TLS-terminating \emph{reverse proxy}.
\n{2}{Containerisation}
Whether the pre-built or a custom container image is used to deploy the
application, it still needs access to secrets, such as database connection
Whether containerised or not, the application needs runtime access to secrets
such as cookie encryption and authentication keys, or the database connection
string (containing database host, port, user, password/encrypted password,
authentication method and database name).
authentication method and database name). It is a relatively common practice to
deliver secrets to programs in configuration files; however, environment
variables should be preferred. The program could go one step further and only
accept certain secrets as environment variables.
The application should be able to handle the most common Postgres
authentication methods~\cite{pgauthmethods}, namely \emph{peer},
\emph{scram-sha-256}, \emph{user name maps} and raw \emph{password}, although
the \emph{password} option should not be used in production, \emph{unless} the
connection to the database is protected by TLS.\ In any case, using the
\emph{scram-sha-256}~\cite{scramsha256rfc7677} method is preferable. One of the
ways to verify in development environment that everything works as intended is
the \emph{Password generator for PostgreSQL} tool~\cite{goscramsha256}, which
allows retrieving the encrypted string from a raw user input.
While it is not impossible to run a process scheduler (such as SystemD) inside
a container, containers are best suited for single-program workloads. The fact
that the application needs persistent storage also raises the question of
\emph{how to run the database in the container}. Should data be stored inside
the ephemeral container, it could end up being very short-lived (wiped on
container restart), and barring container root volume snapshotting, it could
turn backing up of data into a chore, which is likely not the desired
behaviour in this case. Moreover, it is the opinion of the author that
multiprocess scheduling would inordinately complicate the container set-up.
Instead of running a single program per container, which also provides a good
amount of isolation if done properly, running multiple programs in one
container would likely do the opposite.
As per the above, a more \emph{sane} thing to do is to store data externally
using a proper persistent storage method, such as a database. With Postgres
being the safe bet among database engines, the program should be able to handle
Postgres' most common authentication methods, namely \emph{peer},
\emph{scram-sha-256} and raw \emph{password}, although the \emph{password}
option should not be used in production, \emph{unless} the database connection
is protected by TLS~\cite{pgauthmethods}. In any case, using the
\emph{scram-sha-256} method is preferable~\cite{scramsha256rfc7677}. One way to
verify during development that authentication works as intended is the
\emph{Password generator for PostgreSQL} tool, which generates an encrypted
string from a raw user input~\cite{goscramsha256}.
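To illustrate what such a tool computes, the following Python sketch derives a Postgres-style SCRAM-SHA-256 verifier from a raw password according to RFC 5802/7677. The function name and the example salt are assumptions for the illustration; a real verifier is generated with a random salt.

```python
import base64
import hashlib
import hmac


def scram_sha256_verifier(password: str, salt: bytes, iterations: int = 4096) -> str:
    # SaltedPassword := PBKDF2-HMAC-SHA-256(password, salt, iterations)
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()  # StoredKey := H(ClientKey)
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()
    # Postgres stores: SCRAM-SHA-256$<iterations>:<salt>$<StoredKey>:<ServerKey>
    return "SCRAM-SHA-256${}:{}${}:{}".format(
        iterations,
        base64.b64encode(salt).decode(),
        base64.b64encode(stored_key).decode(),
        base64.b64encode(server_key).decode(),
    )
```

Verifying a raw password against such a stored string then amounts to re-deriving the keys with the stored salt and iteration count and comparing; the server never needs to see or store the plaintext password.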
If the application wants to use the \emph{peer} authentication method, it is up
to the operator to supply the Postgres socket to the container (e.g.\ as a
volume bind mount). Equally, the operator needs to make sure that the database
either runs on a network directly attached to the container, or that a
mechanism is in place that routes requests for the database hostname to their
destination, unless a static IP configuration is used, which is also possible.
Practically every container runtime satisfies this use case with a container
\emph{name-based routing} mechanism, which enables the resolution of container
names inside \emph{pods} (in the case of Podman/Kubernetes) or on common
default networks (that are both NAT-ted \emph{and} routed). This abstraction is
the responsibility of specially configured (most often auto-configured) pieces
of software, Aardvark in the case of Podman and CoreDNS in the case of
Kubernetes, and it makes using short-lived containers in dynamic networks
convenient.
\n{1}{Summary}
Passwords (and/or passphrases) are in use everywhere and will quite probably
continue to be for the foreseeable future. If not as \textit{the} principal way
to authenticate, then at least as \textit{a} way to authenticate. And for as
long as passwords are going to be handled and stored, they \emph{are} going to
get leaked, be it due to user or provider carelessness, or the attackers'
resolve and wit. Of course, sifting through the heaps of available password
breach data by hand is not a reasonable option, and therefore tools providing
assistance come in handy. The following part of this thesis will explore that
issue and suggest a solution.
% =========================================================================== %
institution = {International Organization for Standardization}
note={{Available from: \url{https://www.troyhunt.com/the-have-i-been-pwned-api-now-has-different-rate-limits-and-annual-billing/} [viewed 2023-08-15]}}
}
@misc{blake3,
author = {Jack O'Connor and Jean-Philippe Aumasson and Samuel Neves and Zooko Wilcox-O'Hearn},
year = 2021,
title = {{BLAKE3: one function, fast everywhere}},
howpublished = {[online]},
note={{Available from: \url{https://raw.githubusercontent.com/BLAKE3-team/BLAKE3-specs/master/blake3.pdf} [viewed 2023-08-14]}}
}
@misc{megatron,
author = {m3g9tr0n},
year = 2012,
publisher = {Thireus},
title = {{Cracking Story - How I Cracked Over 122 Million SHA1 and MD5 Hashed Passwords}},
howpublished = {[online]},
note={{Available from: \url{https://blog.thireus.com/cracking-story-how-i-cracked-over-122-million-sha1-and-md5-hashed-passwords/} [viewed 2023-08-13]}}
}
@misc{linkedin1,
author = {Chris Velazco},
year = 2012,
month = jun,
title = {{6.5 Million LinkedIn Passwords Reportedly Leaked, LinkedIn Is “Looking Into” It}},
howpublished = {[online]},
note={{Available from: \url{https://techcrunch.com/2012/06/06/6-5-million-linkedin-passwords-reportedly-leaked-linkedin-is-looking-into-it/} [viewed 2023-08-13]}}
}
@misc{linkedin2,
author = {Sarah Perez},
year = 2016,
month = may,
title = {{117 million LinkedIn emails and passwords from a 2012 hack just got posted online}},
howpublished = {[online]},
note={{Available from: \url{https://techcrunch.com/2016/05/18/117-million-linkedin-emails-and-passwords-from-a-2012-hack-just-got-posted-online/} [viewed 2023-08-13]}}
}
@misc{plaintextpasswds1,
author = {Dan Goodin},
year = 2015,
publisher = {ArsTechnica},
title = {{13 million plaintext passwords belonging to webhost users leaked online}},
howpublished = {[online]},
note={{Available from: \url{https://arstechnica.com/information-technology/2015/10/13-million-plaintext-passwords-belonging-to-webhost-users-leaked-online/} [viewed 2023-08-13]}}
}
@misc{plaintextpasswds2,
author = {Forcepoint},
year = 2011,
month = dec,
title = {{Chinese Internet Suffers the Most Serious User Data Leak in History}},
howpublished = {[online]},
note={{Available from: \url{https://www.forcepoint.com/blog/x-labs/chinese-internet-suffers-most-serious-user-data-leak-history} [viewed 2023-08-13]}}
}
@misc{plaintextpasswds3,
author = {Dan Goodin},
year = 2016,
month = sep,
title = {{6.6 million plaintext passwords exposed as site gets hacked to the bone}},
howpublished = {[online]},
note={{Available from: \url{https://arstechnica.com/information-technology/2016/09/plaintext-passwords-and-wealth-of-other-data-for-6-6-million-people-go-public/} [viewed 2023-08-13]}}
}
@misc{rockyou,
author = {Imperva},
year = 2014,
title = {{Consumer Password Worst Practices}},
howpublished = {[online]},
note={{Available from: \url{https://www.imperva.com/docs/gated/WP_Consumer_Password_Worst_Practices.pdf} [viewed 2023-08-13]}}
}
@misc{hashcracking,
author = {Dan Goodin},
year = 2012,
month = aug,
publisher = {ArsTechnica},
title = {{Why passwords have never been weaker—and crackers have never been stronger}},
howpublished = {[online]},
note={{Available from: \url{https://arstechnica.com/information-technology/2012/08/passwords-under-assault/} [viewed 2023-08-13]}}
}
@misc{hashcracking2,
author = {Per Thorsheim},
year = 2012,
month = jun,
title = {{Linkedin Password Infographic}},
howpublished = {[online]},
note={{Available from: \url{https://securitynirvana.blogspot.com/2012/06/linkedin-password-infographic.html} [viewed 2023-08-13]}}
}
% =========================================================================== %