theor.: add text, refs, fix typos, grammar; reword
parent
5e6dacaf59
commit
daa4f58489
@ -34,11 +34,11 @@ Hash functions are algorithms used to help with a number of things: integrity
|
||||
verification, password protection, digital signature, public-key encryption and
|
||||
others. Hashes are used in forensic analysis to prove authenticity of digital
|
||||
artifacts, to uniquely identify a change-set within revision-based source code
|
||||
management systems such as Git, Subversion or Mercurial, to detect
|
||||
known-malicious software by anti-virus programs or by advanced filesystems in
|
||||
order to verify block integrity and enable repairs, and also in many other
|
||||
applications that each person using a modern computing device has come across,
|
||||
such as when connecting to a website protected by the famed HTTPS.
|
||||
management systems such as Git or Mercurial, to detect known-malicious software
|
||||
by anti-virus programs or by advanced filesystems in order to verify block
|
||||
integrity and enable repairs, and also in many other applications that each
|
||||
person using a modern computing device has come across, such as when connecting
|
||||
to a website protected by the famed HTTPS.
|
||||
|
||||
The popularity of hash functions stems from a common use case: the need to
|
||||
simplify reliably identifying a chunk of data. Of course, two chunks of data,
|
||||
@ -74,45 +74,50 @@ for the wrong job can potentially result in a security breach.
|
||||
|
||||
As an example, consider \texttt{MD5}, a popular hash function built on the
\emph{Merkle-Damgård} construction, and compare it with the tree-based
|
||||
\texttt{BLAKE3}. While the former produces 128 bit digests, the latter by
|
||||
default outputs 256 bit digest with no upper limit (Merkle tree extensibility).
|
||||
|
||||
There is a list of differences that could further be mentioned, however, they
|
||||
both have one thing in common: they are \emph{designed} to be \emph{fast}. The
|
||||
latter, as a cryptographic hash function, is conjectured to be \emph{random
|
||||
oracle indifferentiable}, secure against length extension, but it is also in
|
||||
fact faster than all of \texttt{MD5}, \texttt{SHA3-256}, \texttt{SHA-1} and
|
||||
even \texttt{Blake2} family of functions.
|
||||
\texttt{BLAKE3}. The former produces 128-bit digests, whereas the latter
outputs 256 bits by default and has no practical upper limit ($<2^{64}$ bytes)
on the output length (Merkle tree extensibility). There is a list of
differences that could be mentioned further; however, they both have one thing
in common: they are \emph{designed} to be \emph{fast}. The latter, as a
cryptographic hash function, is conjectured to be \emph{random oracle
indifferentiable} and secure against length extension, and it is in fact also
faster than all of \texttt{MD5}, \texttt{SHA3-256}, \texttt{SHA-1} and even the
\texttt{BLAKE2} family of functions~\cite{blake3}.
|
||||
|
||||
The use case of both is to (quickly) verify integrity of a given chunk of data,
|
||||
in case of \texttt{BLAKE3} with pre-image and collision resistance in mind, not
|
||||
to secure a password by hashing it first, which poses a big issue when used
|
||||
to...secure passwords by hashing them first.
|
||||
|
||||
A password hash function, such as \texttt{argon2} or \texttt{bcrypt} are good
|
||||
choices for securely storing hashed passwords, namely because they place CPU
|
||||
and memory burden on the machine that is computing the digest, in case of the
|
||||
mentioned functions the \emph{hardness} is even configurable to satisfy the
|
||||
most possible scenarios. They also forcefully limit potential parallelism, thus
|
||||
restricting the scale at which an exhaustive search could be launched.
|
||||
Additionally, both functions can automatically \emph{salt} the passwords before
|
||||
hashing them, automatically ensuring that two exact same passwords of two
|
||||
different users will not end up hashing to the same digest value. That makes it
|
||||
much harder to recover the original, supposedly weak user-provided password.
|
||||
Password hashing functions such as \texttt{argon2} or \texttt{bcrypt} are good
|
||||
choices for \emph{securely} storing hashed passwords, namely because they place
|
||||
CPU and memory burden on the machine that is computing the digest. In case of
|
||||
the mentioned functions, \emph{hardness} is even configurable to satisfy the
|
||||
greatest possible array of scenarios. These functions also forcefully limit
|
||||
potential parallelism, thereby restricting the scale at which exhaustive
|
||||
searches performed using tools like \texttt{Hashcat} or \texttt{John the
|
||||
Ripper} could be at all feasible, practically obviating old-school hash
|
||||
cracking~\cite{hashcracking},~\cite{hashcracking2}. Additionally, both
|
||||
functions can automatically add a random \emph{salt} to passwords, ensuring
that identical passwords provided by different users do not end up hashing to
the same digest value.
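
The effect of salting described above can be illustrated with a minimal Go
sketch. Note the assumptions: \texttt{saltedDigest} and \texttt{newSalt} are
hypothetical helper names, and plain SHA-256 is used only as a
standard-library stand-in to make the salt's effect visible. It is a
\emph{fast} hash and therefore exactly the wrong tool for storing real
passwords, for which \texttt{argon2} or \texttt{bcrypt} should be used.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newSalt returns n random bytes read from the system's secure random source.
func newSalt(n int) ([]byte, error) {
	salt := make([]byte, n)
	if _, err := rand.Read(salt); err != nil {
		return nil, err
	}
	return salt, nil
}

// saltedDigest hashes salt||password. ILLUSTRATION ONLY: SHA-256 stands in
// for a real password hashing function and must not be used for that purpose.
func saltedDigest(salt []byte, password string) string {
	h := sha256.New()
	h.Write(salt)
	h.Write([]byte(password))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	const password = "hunter2" // the same password chosen by two users

	saltA, err := newSalt(16)
	if err != nil {
		panic(err)
	}
	saltB, err := newSalt(16)
	if err != nil {
		panic(err)
	}

	// Different salts yield different digests for the identical password.
	fmt.Println("user A:", saltedDigest(saltA, password))
	fmt.Println("user B:", saltedDigest(saltB, password))
}
```

The salt is stored next to the digest so that a login attempt can be verified
by recomputing the digest with the same salt.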
|
||||
|
||||
|
||||
\n{3}{Why are hashes interesting}
|
||||
|
||||
As already mentioned, since hashes are often used to store the representation
|
||||
of the password instead of the password itself, which is where the allure comes
|
||||
from, especially services storing hashed user passwords happen to
|
||||
non-voluntarily leak them. Should wrong type of hash be used for password
|
||||
hashing or weak parameters be set or the hash function be simply used
|
||||
improperly, it sparks even more interest.
|
||||
As already hinted, hashes are often used to store a \emph{logical proof of the
password}, rather than the password itself. Services storing hashed user
passwords, in particular, happen to leak them involuntarily. Using the wrong
type of hash for password hashing, choosing weak hash function parameters,
reusing \emph{salt} or inadvertently \emph{misusing} the hash function in some
other way is a sure way to spark a lot of
interest~\cite{megatron},~\cite{linkedin1},~\cite{linkedin2}.
|
||||
|
||||
Historically, there have also been enough instances of leaked raw passwords
|
||||
that anyone with enough interest had more than enough time to additionally put
|
||||
together a neat list of hashes of the most commonly used passwords.
|
||||
Historically, plain-text passwords have also leaked enough times (or weak
hashes have been cracked) that anyone with enough interest has had a more than
sufficient amount of time to put together neat lists of hashes of the most
commonly used
passwords~\cite{rockyou},~\cite{plaintextpasswds1},~\cite{plaintextpasswds2},~\cite{plaitextpasswds3}.
|
||||
|
||||
So while a service might not be storing passwords in \emph{plain text}, which
|
||||
is a good practice, using a hashing function not designed to protect passwords
|
||||
@ -150,7 +155,7 @@ ones deemed secure enough, which is why it is no longer needed to manually
|
||||
specify what cipher suite should be used (or rely on the client/server to
|
||||
choose wisely). While possibly facing compatibility issues with legacy devices,
|
||||
the simplicity brought by enabling TLSv1.3 might be considered a worthy
|
||||
trade-off.
|
||||
trade-off~\cite{tls13rfc8446}.
|
||||
|
||||
|
||||
\n{1}{Passwords}\label{sec:passwords}
|
||||
@ -166,7 +171,7 @@ During the World War II.\ the US paratroopers' use of passwords has evolved to
|
||||
even include a counter-password.
|
||||
|
||||
According to McMillan, the first \textit{computer} passwords date back to
|
||||
mid-1960s' Massachusetts Institute of Technology (MIT), when researchers at the
|
||||
mid-1960s Massachusetts Institute of Technology (MIT), when researchers at the
|
||||
university built a massive time-sharing computer called CTSS. Apparently,
|
||||
\textit{even then} the passwords did not protect the users as well as they were
|
||||
expected to~\cite{mcmillan}.
|
||||
@ -249,10 +254,10 @@ passwords}{fig:forbiddencharacters}{.8}{graphics/forbiddencharacters.jpg}
|
||||
|
||||
Note that ``Passw0rd!'' would have been a perfectly acceptable password for the
|
||||
validator displayed in
|
||||
Figure~\ref{fig:forbiddencharacters}~\cite{forbiddencharacters}. NIST's
|
||||
recommendations on this matter are that all printing ASCII~\cite{asciirfc20}
|
||||
characters as well as the space character SHOULD be acceptable in memorized
|
||||
secrets and Unicode~\cite{iso10646} characters SHOULD be accepted as well.
|
||||
Figure~\ref{fig:forbiddencharacters}~\cite{forbiddencharacters}. NIST's
|
||||
recommendations on this matter are that all printing ASCII characters as well
|
||||
as the space character SHOULD be acceptable in memorized secrets, and Unicode
|
||||
characters SHOULD be accepted as well~\cite{asciirfc20},~\cite{iso10646}.
|
||||
|
||||
\n{3}{Character composition requirements}
|
||||
|
||||
@ -261,13 +266,17 @@ composition requirements in place, too. The reality is that instead of
|
||||
creating strong passwords directly, most users first try a basic version and
|
||||
then keep tweaking characters until the password ends up fulfilling the minimum
|
||||
requirement.
|
||||
The \emph{problem} with that is that it has been shown, that people use similar
|
||||
patterns, i.e. starting with capital letters, putting a symbol last and a
|
||||
number in the last two positions. This is also known to cyber criminals
|
||||
cracking passwords and they run their dictionary attacks using the common
|
||||
substitutions, such as "\$" for "s", "E" for "3", "1" for "l", "@" for "a" etc.
|
||||
The password created in this manner will almost certainly be bad so all that is
|
||||
achieved is frustrating the user in order to still arrive at a bad password.
|
||||
|
||||
The \emph{problem} with it is that people have been shown to use similar
patterns, i.e.\ starting with capital letters, putting a symbol last and a
number in the last two positions. This is also known to people cracking
password hashes, who run their dictionary attacks using the common
substitutions, such as ``\$'' for ``s'', ``3'' for ``E'', ``1'' for ``l'',
``@'' for ``a''
etc.~\cite{megatron},~\cite{hashcracking},~\cite{hashcracking2}. It is safe to
expect that a password created in this manner will almost certainly be bad,
and the only achievement is to have frustrated the user in order to still
arrive at a bad password.
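
To illustrate why such substitutions add little protection, the following Go
sketch expands a single dictionary word into all of its substituted spellings.
It is a simplified illustration rather than a description of how any
particular cracking tool works; the \texttt{substitutions} table and the
\texttt{variants} function are hypothetical names, and only lower-case letters
are handled.

```go
package main

import (
	"fmt"
	"sort"
)

// substitutions maps a letter to the symbol commonly typed in its place.
// The table is an assumption made for this example and is far from complete.
var substitutions = map[rune]rune{
	's': '$',
	'e': '3',
	'l': '1',
	'a': '@',
}

// variants returns every spelling of word obtained by independently keeping
// or replacing each letter that has an entry in the substitutions table.
func variants(word string) []string {
	results := []string{""}
	for _, r := range word {
		sub, replaceable := substitutions[r]
		next := make([]string, 0, len(results)*2)
		for _, prefix := range results {
			next = append(next, prefix+string(r))
			if replaceable {
				next = append(next, prefix+string(sub))
			}
		}
		results = next
	}
	sort.Strings(results)
	return results
}

func main() {
	// Four replaceable letters give 2^4 = 16 candidates to try.
	for _, v := range variants("sale") {
		fmt.Println(v)
	}
}
```

A word with $k$ replaceable letters yields only $2^k$ candidates, which is a
negligible amount of extra work for an attacker running a dictionary attack.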
|
||||
|
||||
\n{3}{Other common issues}
|
||||
|
||||
@ -276,11 +285,12 @@ using JavaScript), thereby essentially breaking the password manager
|
||||
functionality, which is an issue because it encourages bad password practices
|
||||
such as weak passwords and likewise, password reuse.
|
||||
|
||||
Another frequent issue is forced frequent password rotation. Making frequent
|
||||
password rotations mandatory contributes to users developing a password
|
||||
creation pattern and is further a modern-day security anti-pattern and
|
||||
according to the British NCSC the practice ``carries no real benefits as stolen
|
||||
passwords are generally exploited immediately''~\cite{ncsc}.
|
||||
Forced frequent password rotation is another common issue. Apparently, making
frequent password rotations mandatory contributes to users developing password
creation \emph{patterns}. Moreover, according to the British NCSC, the practice
``carries no real benefits as stolen passwords are generally exploited
immediately'', and the organisation calls it a modern-day security
anti-pattern~\cite{ncsc}.
|
||||
|
||||
|
||||
\n{1}{Web security}\label{sec:websecurity}
|
||||
@ -314,8 +324,8 @@ it should find it more difficult to steal data from other
|
||||
sites~\cite{siteisolation}.
|
||||
|
||||
Firefox calls their version of \emph{Site Isolation}-like functionality Project
|
||||
Fission~\cite{projectfission} but the two are very similar, both in internal
|
||||
architecture and what they try to achieve. Elements of the web page are scanned
|
||||
Fission, but the two are very similar, both in internal architecture and what
|
||||
they try to achieve~\cite{projectfission}. Elements of the web page are scanned
|
||||
to decide whether they are allowed according to \emph{same-site} restrictions
|
||||
and allocated shared or isolated memory based on the result.
|
||||
|
||||
@ -326,15 +336,15 @@ features, unbeknownst to them.
|
||||
|
||||
\n{2}{Cross-site scripting}\label{sec:xss}
|
||||
|
||||
As per OWASP Top Ten list~\cite{owasptop10}, injection is the third most
|
||||
observed issue across millions of websites. Cross-site scripting is a type of
|
||||
attack in which malicious code, such as infected scripts are injected into a
|
||||
website that would otherwise be trusted. Since the misconfiguration or a flaw
|
||||
of the application allowed this, the browser of the victim that trusts the
|
||||
website simply executes the code provided by the attacker. This code thus gains
|
||||
access to session tokens and any cookies associated with the website's origin,
|
||||
apart from being able to rewrite the HTML content. The results of XSS can range
|
||||
from account compromise to identity theft.
|
||||
As per the OWASP Top Ten list, injection is the third most observed issue across
millions of websites. Cross-site scripting is a type of attack in which
malicious code, such as infected scripts, is injected into a website that would
otherwise be trusted. Since a misconfiguration or a flaw of the application
|
||||
allowed this, the browser of the victim that trusts the website simply executes
|
||||
the code provided by the attacker. This code thus gains access to session
|
||||
tokens and any cookies associated with the website's origin, apart from being
|
||||
able to rewrite the HTML content. The results of XSS can range from account
|
||||
compromise to identity theft~\cite{owasptop10}.
|
||||
|
||||
Solutions deployed against XSS vary. On the client side, it mainly comes down
|
||||
to good browser patching hygiene, browser features such as Site Isolation (see
|
||||
@ -348,10 +358,10 @@ On the server side though, these options (indicating to the browsers \emph{how}
|
||||
the site should be parsed) can directly be manipulated and configured. They
|
||||
should be fine-tuned to fit the needs of each specific website.
|
||||
|
||||
Further, more than 10 years ago now, a new, powerful and comprehensive
|
||||
framework for controlling the admissibility of content has been devised:
|
||||
Content Security Policy. Its capabilities superseded those of the previously
|
||||
mentioned options and it is discussed more in-depth in the following section.
|
||||
Furthermore, a new, powerful and comprehensive framework for controlling the
admissibility of content was devised more than 10 years ago: Content Security
Policy. Its capabilities superseded those of the previously mentioned options,
and it is discussed in more depth in the following section.
|
||||
|
||||
|
||||
\n{2}{Content Security Policy}\label{sec:csp}
|
||||
@ -370,14 +380,14 @@ website operator to decide what client-side resources can load on their website
|
||||
are permitted \emph{sources} of content.
|
||||
|
||||
For example, scripts can be restricted to only load from a list of trusted
|
||||
domains and inline scripts can be blocked completely, which is a huge win
|
||||
domains, and inline scripts can be blocked entirely, which is a huge win
|
||||
against popular XSS techniques.
|
||||
|
||||
Further, scripts and stylesheets can also be allowed based on a cryptographic
|
||||
(SHA256, SHA384 or SHA512) hash of their content, which should be a known
|
||||
information to legitimate website operators prior to or at the time scripts are
|
||||
served, making sure no unauthorised script or stylesheet will ever be run on
|
||||
user's computer (running a compliant browser).
|
||||
Not only that, scripts and stylesheets can also be allowed based on a
cryptographic (SHA256, SHA384 or SHA512) hash of their content, which should be
known to legitimate website operators prior to or at the time the scripts are
served, making sure no unauthorised script or stylesheet will ever be run on
the user's computer (running a compliant browser).
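
As a minimal sketch of the hash-based allow-listing, the Go snippet below
computes the source expression for one inline script. The function name
\texttt{cspHashSource} and the sample script are assumptions made for the
example; the hash has to be computed over the exact bytes of the script body
as it is served.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// cspHashSource returns the CSP source expression ('sha256-<base64 digest>')
// that allows exactly the given script or stylesheet body.
func cspHashSource(body string) string {
	sum := sha256.Sum256([]byte(body))
	return "'sha256-" + base64.StdEncoding.EncodeToString(sum[:]) + "'"
}

func main() {
	// Hypothetical inline script; any change to it, even in whitespace,
	// results in a different hash and the browser refusing to run it.
	inline := "console.log('hello');"

	fmt.Printf("Content-Security-Policy: script-src 'self' %s\n",
		cspHashSource(inline))
}
```

The printed header value would then be sent along with the page that embeds
the script.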
|
||||
|
||||
A policy of CSPv3, which is the current iteration of the concept, can be served
|
||||
either as a header or inside the website's \texttt{<meta>} tag. Configuration is
|
||||
@ -415,8 +425,6 @@ There are many more directives and settings than mentioned in this section, the
|
||||
author encourages anybody interested to give it a read, e.g.\ at
|
||||
\url{https://web.dev/csp/}.
|
||||
|
||||
\textbf{TODO}: add more concrete examples.
|
||||
|
||||
|
||||
\n{1}{Configuration}
|
||||
|
||||
@ -426,7 +434,7 @@ tweak/manage its behaviour, and these changes are usually persisted
|
||||
\emph{LocalStorage} key-value store in the browser, a binary or plain text
|
||||
configuration file. These configuration files need to be read and checked at
|
||||
least on program start-up and either stored into operating memory for the
|
||||
duration of the runtime of the program, or loaded and parsed and the memory
|
||||
duration of the runtime of the program, or loaded and parsed, and the memory
|
||||
subsequently \emph{freed} (initial configuration).
|
||||
|
||||
There is an abundance of configuration languages (or file formats used to craft
|
||||
@ -437,22 +445,22 @@ Dhall stood out as a language that was designed with both security and the
|
||||
needs of dynamic configuration scenarios in mind, borrowing a concept or two
|
||||
from Nix~\cite{nixoslearn}~\cite{nixlang} (which in turn sources more than a
|
||||
few of its concepts from Haskell), and in its apparent core being very similar
|
||||
to JSON, which adds to familiar feel. In fact, in Dhall's authors' own words it
|
||||
is: ``a programmable configuration language that you can think of as: JSON +
|
||||
to JSON, which adds to a familiar feel. In fact, in Dhall's authors' own words
|
||||
it is: ``a programmable configuration language that you can think of as: JSON +
|
||||
functions + types + imports''~\cite{dhalllang}.
|
||||
|
||||
Among all of the listed features, the especially intriguing one to the author
|
||||
was the promise of \emph{types}. There are multiple examples directly on the
|
||||
Among all the listed features, the especially intriguing one to the author was
|
||||
the promise of \emph{types}. There are multiple examples directly on the
|
||||
project's documentation webpage demonstrating for instance the declaration and
|
||||
usage of custom types (that are, of course merely combinations of the primitive
|
||||
types that the language provides, such as \emph{Bool}, \emph{Natural} or
|
||||
\emph{List}, to name just a few), so it was not exceedingly hard to start
|
||||
designing a custom configuration \emph{schema} for the program.
|
||||
Dhall not being a Turing-complete language also guarantees that evaluation
|
||||
\emph{always} terminates eventually, which is a good attribute to possess as a
|
||||
configuration language.
|
||||
usage of custom types (that are, of course, merely combinations of the
|
||||
primitive types that the language provides, such as \emph{Bool}, \emph{Natural}
|
||||
or \emph{List}, to name just a few), so it was not exceedingly hard to start
|
||||
designing a custom configuration \emph{schema} for the program. Dhall, not
|
||||
being a Turing-complete language, also guarantees that evaluation \emph{always}
|
||||
terminates eventually, which is a good attribute to possess for a configuration
|
||||
language.
|
||||
|
||||
\n{3}{Safety considerations}
|
||||
\n{2}{Safety considerations}
|
||||
|
||||
Having a programmable configuration language that understands functions and
|
||||
allows importing not only arbitrary text from random internet URLs, but also
|
||||
@ -462,14 +470,14 @@ relied on by the user. Dhall offers this in multiple features: enforcing a
|
||||
same-origin policy and (optionally) pinning a cryptographic hash of the value
|
||||
of the expression being imported.
|
||||
|
||||
\n{3}{Possible alternatives}
|
||||
\n{2}{Possible alternatives}
|
||||
|
||||
While developing the program, the author has also
|
||||
come across certain shortcomings of Dhall, namely long start-up with \emph{cold
|
||||
cache}, which can generally be observed in the scenario of running the program
|
||||
in an environment that does not allow to write the cache files (a read-only
|
||||
filesystem), of does not keep the written cache files, such as a container that
|
||||
is not configured to mount a persistent volume at the pertinent location.
|
||||
While developing the program, the author has also come across certain
|
||||
shortcomings of Dhall, namely the long start-up on \emph{cold cache}. It can
|
||||
generally be observed when running the program in an environment that does not
|
||||
allow persistently writing the cache files (a read-only filesystem), or does
|
||||
not keep the written cache files, such as a container that is not configured to
|
||||
mount persistent volumes to pertinent locations.
|
||||
|
||||
To describe the way Dhall works when performing an evaluation, it resolves
|
||||
every expression down to a combination of its most basic types (eliminating all
|
||||
@ -480,7 +488,7 @@ variable \texttt{\$\{XDG\_CACHE\_HOME\}} (have a look at \emph{XDG Base
|
||||
Directory Spec}~\cite{xdgbasedirspec} for details) to decide \emph{where} the
|
||||
results of the normalisation will be written for repeated use. Do note that
|
||||
this behaviour has been observed on a GNU/Linux host and the author has not
|
||||
verified this behaviour on a non-GNU/Linux host, such as FreeBSD.
|
||||
verified this behaviour on other platforms, such as FreeBSD.
|
||||
|
||||
If normalisation is performed inside an ephemeral container (as opposed to, for
|
||||
instance, an interactive desktop session), the results effectively get lost on
|
||||
@ -490,15 +498,15 @@ internally branches widely) can take an upwards of two minutes, during which
|
||||
the user is left waiting for the hanging application with no reporting on the
|
||||
progress or current status.
|
||||
|
||||
While workarounds for the above mentioned problem can be devised relatively
|
||||
easily (such as bind mounting \emph{persistent} volumes inside containers
|
||||
to\texttt{\$\{XDG\_CACHE\_HOME\}/dhall} and
|
||||
\texttt{\$\{XDG\_CACHE\_HOME\}/dhall-haskell} in order to preserve cache
|
||||
between restarts, or let the cache be pre-computed during container build,
|
||||
since the application is only really expected to run together with a compatible
|
||||
version of the configuration schema and this version \emph{is} known at
|
||||
container build time), it would certainly feel better if there was no need to
|
||||
work \emph{around} the configuration system of choice.
|
||||
Workarounds for the above-mentioned problem can be devised relatively easily,
|
||||
but it would certainly \emph{feel} better if there was no need to work
|
||||
\emph{around} the configuration system of choice. For instance, bind mounting
|
||||
\emph{persistent} volumes to pertinent locations inside the container
|
||||
(\texttt{\$\{XDG\_CACHE\_HOME\}/\{dhall,dhall-haskell\}}) would preserve cache
|
||||
between restarts. Alternatively, the cache could be pre-computed on container
|
||||
build (as the program is only expected to run with a compatible schema version,
|
||||
and that version \emph{is} known at container build time for the supplied
|
||||
configuration).
|
||||
|
||||
Alternatives such as CUE (\url{https://cuelang.org/}) offer themselves nicely
|
||||
as an almost drop-in replacement for Dhall feature-wise, while also resolving
|
||||
@ -506,7 +514,7 @@ the costly \emph{cold cache} normalisation operations, which is in author's
|
||||
view Dhall's principal flaw. In a slightly contrasting approach, another emerging
|
||||
project called \texttt{TySON} (\url{https://github.com/jetpack-io/tyson}),
|
||||
which uses \emph{a subset} of TypeScript to also create a programmable,
|
||||
strictly typed configuration language, opted to take a well known language
|
||||
strictly typed configuration language, opted to take a well-known language
|
||||
instead of reinventing the wheel, while still being able to retain feature
|
||||
parity with Dhall.
|
||||
|
||||
@ -514,7 +522,7 @@ parity with Dhall.
|
||||
\n{1}{Compromise Monitoring}
|
||||
|
||||
There are, of course, several ways one could approach monitoring of compromised
|
||||
of credentials, some more \emph{manual} in nature than others. When using a
|
||||
credentials, some more \emph{manual} in nature than others. When using a
|
||||
service that is suspected/expected to be breached in the future, one can always
|
||||
create a unique username/password combination specifically for the subject
|
||||
service and never use that combination anywhere else. That way, if the
|
||||
@ -533,8 +541,8 @@ hand, namely:
|
||||
\item sifting through (possibly) unstructured data by hand
|
||||
\end{itemize}
|
||||
|
||||
Of course, as this is a popular topic for a number of people, the above
|
||||
mentioned work has already been packaged into neat and practical online
|
||||
Of course, as this is a popular topic for a number of people, the
|
||||
above-mentioned work has already been packaged into neat and practical online
|
||||
offerings. In case one decides in favour of using those, an additional range of
|
||||
issues (the previous one still applicable) arises:
|
||||
|
||||
@ -595,10 +603,11 @@ Another source is then simply any locally supplied data, which, of course,
|
||||
could have been obtained from a breach available online beforehand.
|
||||
|
||||
Locally supplied data is specific in that it needs to be formatted in such a
|
||||
way that it can be understood by the application. That is, the data cannot be
|
||||
in its raw form anymore but has to have been morphed into the precise shape the
|
||||
application needs for further processing. Once imported, the application can
|
||||
query the data at will, as it knows exactly the shape of it.
|
||||
way that it is understood by the application. That is, the data supplied for
|
||||
importing cannot be in its original raw form anymore; instead, it has to have
|
||||
been morphed into the precise shape the application needs for further
|
||||
processing. Once imported, the application can query the data at will, as it
|
||||
knows exactly the shape of it.
|
||||
|
||||
This supposes the existence of a \emph{format} for importing, the schema of
|
||||
which is devised in Section~\ref{sec:localDatasetPlugin}.
|
||||
@ -650,7 +659,7 @@ morekeywords={any,time}
|
||||
The Go \emph{struct} shown in Listing~\ref{breachImportSchema} will in
|
||||
actuality translate to a YAML document written and supplied by an
|
||||
administrative user of the program. The author is personally not the
|
||||
greatest supporter of YAML, however, the format was still chosen for several
|
||||
greatest supporter of YAML; however, the format was still chosen for several
|
||||
reasons:
|
||||
|
||||
\begin{itemize}
|
||||
@ -723,30 +732,36 @@ system, different-levels-of-symbolic fees were introduced to obtain the API
|
||||
keys. Apparently, the top consumers of the API seemed to utilise it
orders of magnitude more than the average person, which led Hunt to devise a
|
||||
new, tiered API access system in which the \emph{little guys} would not be
|
||||
subsidising the \emph{big guys}\cite{hibpBillingChanges}. Additionally, the
|
||||
symbolic fee of \$3.50/mo for the entry-level, 10 requests per minute API key
|
||||
was meant to serve as a small barrier for (mis)users with nefarious purposes,
|
||||
but pose practically no obstacle for \emph{legitimate} users, which is entirely
|
||||
reasonable.
|
||||
subsidising the \emph{big guys}. Additionally, the symbolic fee of \$3.50 a
|
||||
month for the entry-level, 10-requests-per-minute API key was meant to serve as
|
||||
a small barrier for (mis)users with nefarious purposes, but pose practically no
|
||||
obstacle for \emph{legitimate} users, which is entirely
|
||||
reasonable~\cite{hibpBillingChanges}.
|
||||
|
||||
The application's \texttt{hibp} module and database representation attempts to
|
||||
model the values returned by this API and declare actions to be performed upon
|
||||
the data, which is what facilitates the breach search functionality in the
|
||||
program.
|
||||
The application's \texttt{hibp} module and database representation
|
||||
(\texttt{schema.HIBPSchema}) attempt to model the values returned by this API
|
||||
and declare actions to be performed upon the data, which is what facilitates
|
||||
the breach search functionality in the program.
|
||||
|
||||
The architecture is relatively simple: the application administrator configures
|
||||
an API key for the HIBP service via the management interface, the user enters
|
||||
the query parameters and the application then constructs the API call that is
|
||||
sent to the API, awaiting the response. As the API is rate-limited
|
||||
(individually, based on the API key supplied), this \emph{could} pose an issue
|
||||
at high utilisation times, and thus needs to be handled in the backend as well
|
||||
as in the UI.
|
||||
The architecture is relatively simple. Breach data, including title, date,
description and tags, is cached by the application on start-up, as the API
serving it does not require authentication. In order for the authenticated API
to be called, the application administrator first needs to configure an API key
for the HIBP service via the management interface. The user can then enter the
desired query parameters, and the application constructs the API call that is
sent to the authenticated API and awaits the response. As the API is
rate-limited (individually, based on the API key supplied), forwarding requests
immediately after receiving them from the users would likely pose an issue at
high utilisation times and would result in the application being unnecessarily
throttled. Request sending thus needs to be handled in the backend by a request
scheduler, and appropriately reflected in the UI.
|
||||
|
||||
After a response from the API server arrives, the application parses the
|
||||
returned data and attempts to \emph{bind} it to the pre-programmed \emph{model}
|
||||
for validation. If the data can be successfully validated, it is saved into the
|
||||
database as a cache and the search query is performed on the saved data. The
|
||||
result is then displayed to the user for browsing.
|
||||
After a response from the API server arrives, the application attempts to
|
||||
\emph{bind} the returned data to the pre-programmed \emph{model} for
|
||||
validation, before finally parsing it. If the data can be successfully
|
||||
validated, it is saved into the database as a cache and the search query is
|
||||
performed on the saved data. The result is then displayed to the user for
|
||||
browsing.
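
The binding and validation step can be sketched in Go as follows. The
\texttt{breach} struct, its JSON field names and the \texttt{bindBreach}
function are assumptions made for the example (they mirror the title, date,
description and tags mentioned above) and do not claim to match either the
program's model or the exact shape of the API response.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// breach is a hypothetical model of a single breach record.
type breach struct {
	Title       string   `json:"title"`
	Date        string   `json:"date"`
	Description string   `json:"description"`
	Tags        []string `json:"tags"`
}

// bindBreach decodes raw into the model and validates the result, so that
// only well-formed records are ever written to the cache.
func bindBreach(raw []byte) (breach, error) {
	var b breach
	if err := json.Unmarshal(raw, &b); err != nil {
		return breach{}, fmt.Errorf("decoding response: %w", err)
	}
	if b.Title == "" {
		return breach{}, errors.New("validation: title must not be empty")
	}
	if _, err := time.Parse("2006-01-02", b.Date); err != nil {
		return breach{}, fmt.Errorf("validation: date: %w", err)
	}
	return b, nil
}

func main() {
	raw := []byte(`{"title":"Example","date":"2019-01-01","description":"demo","tags":["test"]}`)

	b, err := bindBreach(raw)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("validated, ready to be cached:", b.Title)
}
```

Rejecting malformed responses before they reach the database keeps the cached
data in exactly the shape the search queries expect.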
|
||||
|
||||
|
||||
\n{1}{Deployment recommendations}\label{sec:deploymentRecommendations}
@ -756,18 +771,18 @@ It is, of course, recommended that the application runs in a secure environment
who you ask. General recommendations would be either to effectively reserve a
machine for a single use case - running this program - so as to dramatically
decrease the potential attack surface of the host, or run the program isolated
in a container or a virtual machine. Further, if the host does not need
in a container or a virtual machine. Furthermore, if the host does not need
management access (it is a deployed-to-only machine that is configured
out-of-band, such as with a \emph{golden} image/container or declaratively with
Nix), then an SSH \emph{daemon} should not be running in it, since it is not
needed. In an ideal scenario, the host machine would have as little software
installed as possible besides what the application absolutely requires.

System-wide cryptographic policies should target highest feasible security
level, if at all available (such as by default on Fedora or RHEL), covering
SSH, DNSSec, IPsec, Kerberos and TLS protocols. Firewalls should be configured
and SELinux (kernel-level mandatory access control and security policy
mechanism) running in \emph{enforcing} mode, if available.
System-wide cryptographic policies should target the highest feasible security
level, if at all available (as is the case by default on e.g.\ Fedora),
covering SSH, DNSSEC and TLS protocols. Firewalls should be configured and
SELinux (kernel-level mandatory access control and security policy mechanism)
running in \emph{enforcing} mode, if available.

\n{2}{Transport security}

@ -786,52 +801,67 @@ have it listen behind a TLS-terminating \emph{reverse proxy}.

\n{2}{Containerisation}

Whether the pre-built or a custom container image is used to deploy the
application, it still needs access to secrets, such as database connection
Whether containerised or not, the application needs runtime access to secrets
such as cookie encryption and authentication keys, or the database connection
string (containing database host, port, user, password/encrypted password,
authentication method and database name).
authentication method and database name). It is a relatively common practice to
deliver secrets to programs in configuration files; however, environment
variables should be preferred. The program could go one step further and only
accept certain secrets as environment variables.
|
||||
|
||||
The application should be able to handle the most common Postgres
authentication methods~\cite{pgauthmethods}, namely \emph{peer},
\emph{scram-sha-256}, \emph{user name maps} and raw \emph{password}, although
the \emph{password} option should not be used in production, \emph{unless} the
connection to the database is protected by TLS.\ In any case, using the
\emph{scram-sha-256}~\cite{scramsha256rfc7677} method is preferable. One of the
ways to verify in development environment that everything works as intended is
the \emph{Password generator for PostgreSQL} tool~\cite{goscramsha256}, which
allows retrieving the encrypted string from a raw user input.
While it is not impossible to run a process supervisor (such as systemd) inside
a container, containers are well suited for single-program workloads. The fact
that the application needs persistent storage also raises the question of
\emph{how to run the database in the container}. Should data be stored inside
the ephemeral container, it could end up being very short-lived (wiped on
container restart) and, barring container root volume snapshotting, backing it
up could turn into a chore, neither of which is likely desired in this case.
Moreover, it is the opinion of the author that supervising multiple processes
would inordinately complicate the container set-up. Running a single program
per container provides a good amount of isolation if done properly, whereas
running multiple programs in one container would likely do the opposite.

If the application running in a container wants to use the \emph{peer}
authentication method, it is up to the operator to supply the Postgres socket
to the application (e.g.\ as a volume bind mount). This scenario was not
tested; however, and the author is also not entirely certain how \emph{user
namespaces} (on GNU/Linux) would influence the process (as in when the
\emph{ID}s of a user \textbf{outside} the container are mapped to a range of
\emph{UIDs} \textbf{inside} the container), for which the setup would likely
need to account.
As per the above, a more \emph{sane} thing to do is to store data externally
using a proper persistent storage method, such as a database. With Postgres
being the safe bet among database engines, the program should be able to handle
Postgres' most common authentication methods, namely \emph{peer},
\emph{scram-sha-256} and raw \emph{password}, although the \emph{password}
option should not be used in production, \emph{unless} the database connection
is protected by TLS~\cite{pgauthmethods}. In any case, using the
\emph{scram-sha-256} method is preferable~\cite{scramsha256rfc7677}. One way to
verify during development that authentication works as intended is the
\emph{Password generator for PostgreSQL} tool, which generates an encrypted
string from a raw user input~\cite{goscramsha256}.

Equally, if the application is running inside the container, the operator needs
to make sure that the database is either running in a network that is also
directly attached to the container or that there is a mechanism in place that
routes the requests for the database hostname to the destination.
If the application wants to use the \emph{peer} authentication method, it is up
to the operator to supply the Postgres socket to the container (e.g.\ as a
volume bind mount). Equally, the operator needs to make sure that the database
is either running in a network that is also directly attached to the container
or that there is a mechanism in place that routes the requests for the database
hostname to the destination, unless a static IP configuration is used, which is
also possible.

One such mechanism is container name based routing inside \emph{pods}
(Podman/Kubernetes), where the resolution of container names is the
responsibility of a specially configured (often auto-configured) piece of
software called Aardvark for the former and CoreDNS for the latter.
Practically every container runtime satisfies this use case with a container
\emph{name-based routing} mechanism, which inside \emph{pods} (in the case of
Podman/Kubernetes) or common default networks (that are both NAT-ted \emph{and}
routed) enables resolution of container names. This abstraction is the
responsibility of specially configured (most often autoconfigured) pieces of
software, Aardvark in the case of Podman and CoreDNS for Kubernetes, and it
makes using short-lived containers in dynamic networks convenient.


\n{1}{Summary}

Passwords (and/or passphrases) are in use everywhere and quite probably will be
for the foreseeable future. If not as \textit{the} principal way to
authenticate, then at least as \textit{a} way to authenticate. As long as
passwords are going to be handled and stored by service/application providers,
they are going to get leaked, be it due to provider carelessness or the
attackers' resolve and wit. Of course, sifting through all the available
password breach data by hand is not a reasonable option, and therefore tools
providing assistance come in handy. The next part of this diploma thesis will
explore that issue and introduce a solution.
Passwords (and/or passphrases) are in use everywhere and will quite probably
continue to be for the foreseeable future. If not as \textit{the} principal way
to authenticate, then at least as \textit{a} way to authenticate. And for as
long as passwords are going to be handled and stored, they \emph{are} going to
get leaked, be it due to user or provider carelessness, or the attackers'
resolve and wit. Of course, sifting through the heaps of available password
breach data by hand is not a reasonable option, and therefore tools providing
assistance come in handy. The following part of this thesis will explore that
issue and suggest a solution.


% =========================================================================== %
@ -421,4 +421,96 @@ institution = {International Organization for Standardization}
note={{Available from: \url{https://www.troyhunt.com/the-have-i-been-pwned-api-now-has-different-rate-limits-and-annual-billing/} [viewed 2023-08-15]}}
}

@misc{blake3,
author = {Jack O'Connor and Jean-Philippe Aumasson and Samuel Neves and Zooko Wilcox-O'Hearn},
year = 2021,
title = {{BLAKE3 - one function, fast everywhere}},
subtitle = {{one function, fast everywhere}},
howpublished = {[online]},
note={{Available from: \url{https://raw.githubusercontent.com/BLAKE3-team/BLAKE3-specs/master/blake3.pdf} [viewed 2023-08-14]}}
}

@misc{megatron,
author = {m3g9tr0n},
year = 2012,
publisher = {Thireus},
title = {{Cracking Story - How I Cracked Over 122 Million SHA1 and MD5 Hashed Passwords}},
howpublished = {[online]},
note={{Available from: \url{https://blog.thireus.com/cracking-story-how-i-cracked-over-122-million-sha1-and-md5-hashed-passwords/} [viewed 2023-08-13]}}
}

@misc{linkedin1,
author = {Chris Velazco},
year = 2012,
month = jun,
title = {{6.5 Million LinkedIn Passwords Reportedly Leaked, LinkedIn Is “Looking Into” It}},
howpublished = {[online]},
note={{Available from: \url{https://techcrunch.com/2012/06/06/6-5-million-linkedin-passwords-reportedly-leaked-linkedin-is-looking-into-it/} [viewed 2023-08-13]}}
}

@misc{linkedin2,
author = {Sarah Perez},
year = 2016,
month = may,
title = {{117 million LinkedIn emails and passwords from a 2012 hack just got posted online}},
howpublished = {[online]},
note={{Available from: \url{https://techcrunch.com/2016/05/18/117-million-linkedin-emails-and-passwords-from-a-2012-hack-just-got-posted-online/} [viewed 2023-08-13]}}
}

@misc{plaintextpasswds1,
author = {Dan Goodin},
year = 2015,
publisher = {ArsTechnica},
title = {{13 million plaintext passwords belonging to webhost users leaked online}},
howpublished = {[online]},
note={{Available from: \url{https://arstechnica.com/information-technology/2015/10/13-million-plaintext-passwords-belonging-to-webhost-users-leaked-online/} [viewed 2023-08-13]}}
}

@misc{plaintextpasswds2,
author = {Forcepoint},
year = 2011,
month = dec,
title = {{Chinese Internet Suffers the Most Serious User Data Leak in History}},
howpublished = {[online]},
note={{Available from: \url{https://www.forcepoint.com/blog/x-labs/chinese-internet-suffers-most-serious-user-data-leak-history} [viewed 2023-08-13]}}
}

@misc{plaintextpasswds3,
author = {Dan Goodin},
year = 2016,
month = sep,
title = {{6.6 million plaintext passwords exposed as site gets hacked to the bone}},
howpublished = {[online]},
note={{Available from: \url{https://arstechnica.com/information-technology/2016/09/plaintext-passwords-and-wealth-of-other-data-for-6-6-million-people-go-public/} [viewed 2023-08-13]}}
}

@misc{rockyou,
author = {Imperva},
year = 2014,
title = {{Consumer Password Worst Practices}},
howpublished = {[online]},
note={{Available from: \url{https://www.imperva.com/docs/gated/WP_Consumer_Password_Worst_Practices.pdf} [viewed 2023-08-13]}}
}

@misc{hashcracking,
author = {Dan Goodin},
year = 2012,
month = aug,
publisher = {ArsTechnica},
title = {{Why passwords have never been weaker—and crackers have never been stronger}},
howpublished = {[online]},
note={{Available from: \url{https://arstechnica.com/information-technology/2012/08/passwords-under-assault/} [viewed 2023-08-13]}}
}

@misc{hashcracking2,
author = {Per Thorsheim},
year = 2012,
month = jun,
title = {{Linkedin Password Infographic}},
howpublished = {[online]},
note={{Available from: \url{https://securitynirvana.blogspot.com/2012/06/linkedin-password-infographic.html} [viewed 2023-08-13]}}
}

% =========================================================================== %