From daa4f584899b423ce9f5e3a98b7370a5a25f3bac Mon Sep 17 00:00:00 2001 From: surtur Date: Wed, 23 Aug 2023 00:04:26 +0200 Subject: [PATCH] theor.: add text, refs, fix typos, grammar; reword --- tex/part-theoretical.tex | 384 +++++++++++++++++++++------------------ tex/references.bib | 92 ++++++++++ 2 files changed, 299 insertions(+), 177 deletions(-) diff --git a/tex/part-theoretical.tex b/tex/part-theoretical.tex index eb00f71..8b58247 100644 --- a/tex/part-theoretical.tex +++ b/tex/part-theoretical.tex @@ -34,11 +34,11 @@ Hash functions are algorithms used to help with a number of things: integrity verification, password protection, digital signature, public-key encryption and others. Hashes are used in forensic analysis to prove authenticity of digital artifacts, to uniquely identify a change-set within revision-based source code -management systems such as Git, Subversion or Mercurial, to detect -known-malicious software by anti-virus programs or by advanced filesystems in -order to verify block integrity and enable repairs, and also in many other -applications that each person using a modern computing device has come across, -such as when connecting to a website protected by the famed HTTPS. +management systems such as Git or Mercurial, to detect known-malicious software +by anti-virus programs or by advanced filesystems in order to verify block +integrity and enable repairs, and also in many other applications that each +person using a modern computing device has come across, such as when connecting +to a website protected by the famed HTTPS. The popularity of hash functions stems from a common use case: the need to simplify reliably identifying a chunk of data. Of course, two chunks of data, @@ -74,45 +74,50 @@ for the wrong job can potentially result in a security breach. As an example, suppose \texttt{MD5}, a popular hash function internally using the same data structure - \emph{Merkle-Damgård} construction - as -\texttt{BLAKE3}. 
While the former produces 128 bit digests, the latter by -default outputs 256 bit digest with no upper limit (Merkle tree extensibility). - -There is a list of differences that could further be mentioned, however, they -both have one thing in common: they are \emph{designed} to be \emph{fast}. The -latter, as a cryptographic hash function, is conjectured to be \emph{random -oracle indifferentiable}, secure against length extension, but it is also in -fact faster than all of \texttt{MD5}, \texttt{SHA3-256}, \texttt{SHA-1} and -even \texttt{Blake2} family of functions. +\texttt{BLAKE3}. The former produces 128 bit digests, compared to the default +256 bits of output and no upper ($<2^{64}$ bytes) limit (Merkle tree +extensibility) for the latter. There is a list of differences that could +further be mentioned, however, they both have one thing in common: they are +\emph{designed} to be \emph{fast}. The latter, as a cryptographic hash +function, is conjectured to be \emph{random oracle indifferentiable}, secure +against length extension, but it is also in fact faster than all of +\texttt{MD5}, \texttt{SHA3-256}, \texttt{SHA-1} and even \texttt{Blake2} family +of functions~\cite{blake3}. The use case of both is to (quickly) verify integrity of a given chunk of data, in case of \texttt{BLAKE3} with pre-image and collision resistance in mind, not to secure a password by hashing it first, which poses a big issue when used to...secure passwords by hashing them first. -A password hash function, such as \texttt{argon2} or \texttt{bcrypt} are good -choices for securely storing hashed passwords, namely because they place CPU -and memory burden on the machine that is computing the digest, in case of the -mentioned functions the \emph{hardness} is even configurable to satisfy the -most possible scenarios. They also forcefully limit potential parallelism, thus -restricting the scale at which an exhaustive search could be launched. 
-Additionally, both functions can automatically \emph{salt} the passwords before -hashing them, automatically ensuring that two exact same passwords of two -different users will not end up hashing to the same digest value. That makes it -much harder to recover the original, supposedly weak user-provided password. +Password hashing functions such as \texttt{argon2} or \texttt{bcrypt} are good +choices for \emph{securely} storing hashed passwords, namely because they place +CPU and memory burden on the machine that is computing the digest. In case of +the mentioned functions, \emph{hardness} is even configurable to satisfy the +greatest possible array of scenarios. These functions also forcefully limit +potential parallelism, thereby restricting the scale at which exhaustive +searches performed using tools like \texttt{Hashcat} or \texttt{John the +Ripper} could be at all feasible, practically obviating old-school hash +cracking~\cite{hashcracking},~\cite{hashcracking2}. Additionally, both +functions can automatically add random \emph{salt} to passwords, automatically +ensuring that no copies of the same password provided by different users will +end up hashing to the same digest value. \n{3}{Why are hashes interesting} -As already mentioned, since hashes are often used to store the representation -of the password instead of the password itself, which is where the allure comes -from, especially services storing hashed user passwords happen to -non-voluntarily leak them. Should wrong type of hash be used for password -hashing or weak parameters be set or the hash function be simply used -improperly, it sparks even more interest. +As already hinted, hashes are often used to store a \emph{logical proof of the +password}, rather than the password itself. Especially services storing hashed +user passwords happen to non-voluntarily leak them. 
Using a wrong type of hash +for password hashing, weak hash function parameters, reusing \emph{salt} or +inadvertently \emph{misusing} the hash function in some other way, is a sure +way to spark a lot of +interest~\cite{megatron},~\cite{linkedin1},~\cite{linkedin2}. -Historically, there have also been enough instances of leaked raw passwords -that anyone with enough interest had more than enough time to additionally put -together a neat list of hashes of the most commonly used passwords. +Historically, plain-text passwords have also leaked enough times (or weak +hashes have been cracked) that anyone with enough interest had more than +sufficient amount of time to additionally put together neat lists of hashes of +the most commonly used +passwords~\cite{rockyou},~\cite{plaintextpasswds1},~\cite{plaintextpasswds2},~\cite{plaitextpasswds3}. So while a service might not be storing passwords in \emph{plain text}, which is a good practice, using a hashing function not designed to protect passwords @@ -150,7 +155,7 @@ ones deemed secure enough, which is why it is no longer needed to manually specify what cipher suite should be used (or rely on the client/server to choose wisely). While possibly facing compatibility issues with legacy devices, the simplicity brought by enabling TLSv1.3 might be considered a worthy -trade-off. +trade-off~\cite{tls13rfc8446}. \n{1}{Passwords}\label{sec:passwords} @@ -166,7 +171,7 @@ During the World War II.\ the US paratroopers' use of passwords has evolved to even include a counter-password. According to McMillan, the first \textit{computer} passwords date back to -mid-1960s' Massachusetts Institute of Technology (MIT), when researchers at the +mid-1960s Massachusetts Institute of Technology (MIT), when researchers at the university built a massive time-sharing computer called CTSS. Apparently, \textit{even then} the passwords did not protect the users as well as they were expected to~\cite{mcmillan}.
@@ -249,10 +254,10 @@ passwords}{fig:forbiddencharacters}{.8}{graphics/forbiddencharacters.jpg} Note that ``Passw0rd!'' would have been a perfectly acceptable password for the validator displayed in -Figure~\ref{fig:forbiddencharacters}~\cite{forbiddencharacters}. NIST's -recommendations on this matter are that all printing ASCII~\cite{asciirfc20} -characters as well as the space character SHOULD be acceptable in memorized -secrets and Unicode~\cite{iso10646} characters SHOULD be accepted as well. +Figure~\ref{fig:forbiddencharacters}~\cite{forbiddencharacters}. NIST's +recommendations on this matter are that all printing ASCII characters as well +as the space character SHOULD be acceptable in memorized secrets, and Unicode +characters SHOULD be accepted as well~\cite{asciirfc20},~\cite{iso10646}. \n{3}{Character composition requirements} @@ -261,13 +266,17 @@ composition requirements in place, too. The reality is that instead of creating strong passwords directly, most users first try a basic version and then keep tweaking characters until the password ends up fulfilling the minimum requirement. -The \emph{problem} with that is that it has been shown, that people use similar -patterns, i.e. starting with capital letters, putting a symbol last and a -number in the last two positions. This is also known to cyber criminals -cracking passwords and they run their dictionary attacks using the common -substitutions, such as "\$" for "s", "E" for "3", "1" for "l", "@" for "a" etc. -The password created in this manner will almost certainly be bad so all that is -achieved is frustrating the user in order to still arrive at a bad password. + +The \emph{problem} with it is that it has been shown, that people use similar +patterns, i.e.\ starting with capital letters, putting a symbol last and a +number in the last two positions. 
This is also known to people cracking the +password hashes and they run their dictionary attacks using the common +substitutions, such as ``\$'' for ``s'', ``3'' for ``E'', ``1'' for ``l'', +``@'' for ``a'' +etc.~\cite{megatron},~\cite{hashcracking},~\cite{hashcracking2}. It is safe to +expect that the password created in this manner will almost certainly be bad, +and the only achievement was to frustrate the user in order to still arrive at +a bad password. \n{3}{Other common issues} @@ -276,11 +285,12 @@ using JavaScript), thereby essentially breaking the password manager functionality, which is an issue because it encourages bad password practices such as weak passwords and likewise, password reuse. -Another frequent issue is forced frequent password rotation. Making frequent -password rotations mandatory contributes to users developing a password -creation pattern and is further a modern-day security anti-pattern and -according to the British NCSC the practice ``carries no real benefits as stolen -passwords are generally exploited immediately''~\cite{ncsc}. +Forced frequent password rotation is another common issue. Apparently, making +frequent password rotations mandatory contributes to users developing +password creation \emph{patterns}. Moreover, according to the British NCSC, the +subject practice ``carries no real benefits as stolen passwords are generally +exploited immediately'', and the organisation calls it a modern-day security +anti-pattern~\cite{ncsc}. \n{1}{Web security}\label{sec:websecurity} @@ -314,8 +324,8 @@ it should find it more difficult to steal data from other sites~\cite{siteisolation}. Firefox calls their version of \emph{Site Isolation}-like functionality Project -Fission~\cite{projectfission} but the two are very similar, both in internal -architecture and what they try to achieve.
Elements of the web page are scanned +Fission, but the two are very similar, both in internal architecture and what +they try to achieve~\cite{projectfission}. Elements of the web page are scanned to decide whether they are allowed according to \emph{same-site} restrictions and allocated shared or isolated memory based on the result. @@ -326,15 +336,15 @@ features, unbeknownst to them. \n{2}{Cross-site scripting}\label{sec:xss} -As per OWASP Top Ten list~\cite{owasptop10}, injection is the third most -observed issue across millions of websites. Cross-site scripting is a type of -attack in which malicious code, such as infected scripts are injected into a -website that would otherwise be trusted. Since the misconfiguration or a flaw -of the application allowed this, the browser of the victim that trusts the -website simply executes the code provided by the attacker. This code thus gains -access to session tokens and any cookies associated with the website's origin, -apart from being able to rewrite the HTML content. The results of XSS can range -from account compromise to identity theft. +As per OWASP Top Ten list, injection is the third most observed issue across +millions of websites. Cross-site scripting is a type of attack in which +malicious code, such as infected scripts, is injected into a website that would +otherwise be trusted. Since the misconfiguration or a flaw of the application +allowed this, the browser of the victim that trusts the website simply executes +the code provided by the attacker. This code thus gains access to session +tokens and any cookies associated with the website's origin, apart from being +able to rewrite the HTML content. The results of XSS can range from account +compromise to identity theft~\cite{owasptop10}. Solutions deployed against XSS vary. 
On the client side, it mainly comes down to good browser patching hygiene, browser features such as Site Isolation (see @@ -348,10 +358,10 @@ On the server side though, these options (indicating to the browsers \emph{how} the site should be parsed) can directly be manipulated and configured. They should be fine-tuned to fit the needs of each specific website. -Further, more than 10 years ago now, a new, powerful and comprehensive -framework for controlling the admissibility of content has been devised: -Content Security Policy. Its capabilities superseded those of the previously -mentioned options and it is discussed more in-depth in the following section. +Furthermore, a new, powerful and comprehensive framework for controlling the +admissibility of content has been devised more than 10 years ago now: Content +Security Policy. Its capabilities superseded those of the previously mentioned +options, and it is discussed more in-depth in the following section. \n{2}{Content Security Policy}\label{sec:csp} @@ -370,14 +380,14 @@ website operator to decide what client-side resources can load on their website are permitted \emph{sources} of content. For example, scripts can be restricted to only load from a list of trusted -domains and inline scripts can be blocked completely, which is a huge win +domains, and inline scripts can be blocked entirely, which is a huge win against popular XSS techniques. -Further, scripts and stylesheets can also be allowed based on a cryptographic -(SHA256, SHA384 or SHA512) hash of their content, which should be a known -information to legitimate website operators prior to or at the time scripts are -served, making sure no unauthorised script or stylesheet will ever be run on -user's computer (running a compliant browser). 
+Not only that, scripts and stylesheets can also be allowed based on a +cryptographic (SHA256, SHA384 or SHA512) hash of their content, which should be +a known information to legitimate website operators prior to or at the time +scripts are served, making sure no unauthorised script or stylesheet will ever +be run on user's computer (running a compliant browser). A policy of CSPv3, which is the current iteration of the concept, can be served either as a header or inside website's \texttt{} tag. Configuration is @@ -415,8 +425,6 @@ There are many more directives and settings than mentioned in this section, the author encourages anybody interested to give it a read, e.g.\ at \url{https://web.dev/csp/}. -\textbf{TODO}: add more concrete examples. - \n{1}{Configuration} @@ -426,7 +434,7 @@ tweak/manage its behaviour, and these changes are usually persisted \emph{LocalStorage} key-value store in the browser, a binary or plain text configuration file. These configuration files need to be read and checked at least on program start-up and either stored into operating memory for the -duration of the runtime of the program, or loaded and parsed and the memory +duration of the runtime of the program, or loaded and parsed, and the memory subsequently \emph{freed} (initial configuration). There is an abundance of configuration languages (or file formats used to craft @@ -437,22 +445,22 @@ Dhall stood out as a language that was designed with both security and the needs of dynamic configuration scenarios in mind, borrowing a concept or two from Nix~\cite{nixoslearn}~\cite{nixlang} (which in turn sources more than a few of its concepts from Haskell), and in its apparent core being very similar -to JSON, which adds to familiar feel. In fact, in Dhall's authors' own words it -is: ``a programmable configuration language that you can think of as: JSON + +to JSON, which adds to a familiar feel. 
In fact, in Dhall's authors' own words +it is: ``a programmable configuration language that you can think of as: JSON + functions + types + imports''~\cite{dhalllang}. -Among all of the listed features, the especially intriguing one to the author -was the promise of \emph{types}. There are multiple examples directly on the +Among all the listed features, the especially intriguing one to the author was +the promise of \emph{types}. There are multiple examples directly on the project's documentation webpage demonstrating for instance the declaration and -usage of custom types (that are, of course merely combinations of the primitive -types that the language provides, such as \emph{Bool}, \emph{Natural} or -\emph{List}, to name just a few), so it was not exceedingly hard to start -designing a custom configuration \emph{schema} for the program. -Dhall not being a Turing-complete language also guarantees that evaluation -\emph{always} terminates eventually, which is a good attribute to possess as a -configuration language. +usage of custom types (that are, of course, merely combinations of the +primitive types that the language provides, such as \emph{Bool}, \emph{Natural} +or \emph{List}, to name just a few), so it was not exceedingly hard to start +designing a custom configuration \emph{schema} for the program. Dhall, not +being a Turing-complete language, also guarantees that evaluation \emph{always} +terminates eventually, which is a good attribute to possess for a configuration +language. -\n{3}{Safety considerations} +\n{2}{Safety considerations} Having a programmable configuration language that understands functions and allows importing not only arbitrary text from random internet URLs, but also @@ -462,14 +470,14 @@ relied on by the user. Dhall offers this in multiple features: enforcing a same-origin policy and (optionally) pinning a cryptographic hash of the value of the expression being imported. 
-\n{3}{Possible alternatives} +\n{2}{Possible alternatives} -While developing the program, the author has also -come across certain shortcomings of Dhall, namely long start-up with \emph{cold -cache}, which can generally be observed in the scenario of running the program -in an environment that does not allow to write the cache files (a read-only -filesystem), of does not keep the written cache files, such as a container that -is not configured to mount a persistent volume at the pertinent location. +While developing the program, the author has also come across certain +shortcomings of Dhall, namely the long start-up on \emph{cold cache}. It can +generally be observed when running the program in an environment that does not +allow persistently writing the cache files (a read-only filesystem), or does +not keep the written cache files, such as a container that is not configured to +mount persistent volumes to pertinent locations. To describe the way Dhall works when performing an evaluation, it resolves every expression down to a combination of its most basic types (eliminating all @@ -480,7 +488,7 @@ variable \texttt{\$\{XDG\_CACHE\_HOME\}} (have a look at \emph{XDG Base Directory Spec}~\cite{xdgbasedirspec} for details) to decide \emph{where} the results of the normalisation will be written for repeated use. Do note that this behaviour has been observed on a GNU/Linux host and the author has not -verified this behaviour on a non-GNU/Linux host, such as FreeBSD. +verified this behaviour on other platforms, such as FreeBSD. If normalisation is performed inside an ephemeral container (as opposed to, for instance, an interactive desktop session), the results effectively get lost on @@ -490,15 +498,15 @@ internally branches widely) can take an upwards of two minutes, during which the user is left waiting for the hanging application with no reporting on the progress or current status.
-While workarounds for the above mentioned problem can be devised relatively -easily (such as bind mounting \emph{persistent} volumes inside containers -to\texttt{\$\{XDG\_CACHE\_HOME\}/dhall} and -\texttt{\$\{XDG\_CACHE\_HOME\}/dhall-haskell} in order to preserve cache -between restarts, or let the cache be pre-computed during container build, -since the application is only really expected to run together with a compatible -version of the configuration schema and this version \emph{is} known at -container build time), it would certainly feel better if there was no need to -work \emph{around} the configuration system of choice. +Workarounds for the above-mentioned problem can be devised relatively easily, +but it would certainly \emph{feel} better if there was no need to work +\emph{around} the configuration system of choice. For instance, bind mounting +\emph{persistent} volumes to pertinent locations inside the container +(\texttt{\$\{XDG\_CACHE\_HOME\}/\{dhall,dhall-haskell\}}) would preserve cache +between restarts. Alternatively, the cache could be pre-computed on container +build (as the program is only expected to run with a compatible schema version, +and that version \emph{is} known at container build time for the supplied +configuration). Alternatives such as CUE (\url{https://cuelang.org/}) offer themselves nicely as an almost drop-in replacement for Dhall feature-wise, while also resolving @@ -506,7 +514,7 @@ the costly \emph{cold cache} normalisation operations, which is in author's view Dhall's titular flaw. In a slightly contrasting approach, another emerging project called \texttt{TySON} (\url{https://github.com/jetpack-io/tyson}), which uses \emph{a subset} of TypeScript to also create a programmable, -strictly typed configuration language, opted to take a well known language +strictly typed configuration language, opted to take a well-known language instead of reinventing the wheel, while still being able to retain feature parity with Dhall. 
@@ -514,7 +522,7 @@ parity with Dhall. \n{1}{Compromise Monitoring} There are, of course, several ways one could approach monitoring of compromised -of credentials, some more \emph{manual} in nature than others. When using a +credentials, some more \emph{manual} in nature than others. When using a service that is suspected/expected to be breached in the future, one can always create a unique username/password combination specifically for the subject service and never use that combination anywhere else. That way, if the @@ -533,8 +541,8 @@ hand, namely: \item sifting through (possibly) unstructured data by hand \end{itemize} -Of course, as this is a popular topic for a number of people, the above -mentioned work has already been packaged into neat and practical online +Of course, as this is a popular topic for a number of people, the +above-mentioned work has already been packaged into neat and practical online offerings. In case one decides in favour of using those, an additional range of issues (the previous one still applicable) arises: @@ -595,10 +603,11 @@ Another source is then simply any locally supplied data, which, of course, could have been obtained from a breach available online beforehand. Locally supplied data is specific in that it needs to be formatted in such a -way that it can be understood by the application. That is, the data cannot be -in its raw form anymore but has to have been morphed into the precise shape the -application needs for further processing. Once imported, the application can -query the data at will, as it knows exactly the shape of it. +way that it is understood by the application. That is, the data supplied for +importing cannot be in its original raw form anymore, instead it has to have +been morphed into the precise shape the application needs for further +processing. Once imported, the application can query the data at will, as it +knows exactly the shape of it. 
This supposes the existence of a \emph{format} for importing, the schema of which is devised in Section~\ref{sec:localDatasetPlugin}. @@ -650,7 +659,7 @@ morekeywords={any,time} The Go \emph{struct} shown in Listing~\ref{breachImportSchema} will in actuality translate to a YAML document written and supplied by an administrative user of the program. And while the author is personally not the -greatest supporter of YAML, however, the format was still chosen for several +greatest supporter of YAML; however, the format was still chosen for several reasons: \begin{itemize} @@ -723,30 +732,36 @@ system, different-levels-of-symbolic fees were introduced to obtain the API keys. These Apparently, the top consumers of the API seemed to utilise it orders of magnitude more than the average person, which led Hunt to devising a new, tiered API access system in which the \emph{little guys} would not be -subsidising the \emph{big guys}\cite{hibpBillingChanges}. Additionally, the -symbolic fee of \$3.50/mo for the entry-level, 10 requests per minute API key -was meant to serve as a small barrier for (mis)users with nefarious purposes, -but pose practically no obstacle for \emph{legitimate} users, which is entirely -reasonable. +subsidising the \emph{big guys}. Additionally, the symbolic fee of \$3.50 a +month for the entry-level 10 requests-per-minute API key was meant to serve as +a small barrier for (mis)users with nefarious purposes, but pose practically no +obstacle for \emph{legitimate} users, which is entirely +reasonable~\cite{hibpBillingChanges}. -The application's \texttt{hibp} module and database representation attempts to -model the values returned by this API and declare actions to be performed upon -the data, which is what facilitates the breach search functionality in the -program. 
+The application's \texttt{hibp} module and database representation +(\texttt{schema.HIBPSchema}) attempts to model the values returned by this API +and declare actions to be performed upon the data, which is what facilitates +the breach search functionality in the program. -The architecture is relatively simple: the application administrator configures -an API key for the HIBP service via the management interface, the user enters -the query parameters and the application then constructs the API call that is -sent to the API, awaiting the response. As the API is rate-limited -(individually, based on the API key supplied), this \emph{could} pose an issue -at high utilisation times, and thus needs to be handled in the backend as well -as in the UI. +The architecture is relatively simple. Breach data, including title, date, +description and tags are cached by the application on start-up, as this API is +not authenticated. In order for the authenticated API to be called, the +application administrator first needs to configure an API key for the HIBP +service via the management interface. The user can then enter the desired query +parameters and the application then constructs the API call that is sent to the +authenticated API, and awaits the response. As the API is rate-limited +(individually, based on the API key supplied), sending requests directly after +receiving them from the users would likely pose an issue at high utilisation +times, and would result in the application ending up unnecessarily throttled. +Request sending thus needs to be handled in the backend by a requests +scheduler, as well as appropriately in the UI. -After a response from the API server arrives, the application parses the -returned data and attempts to \emph{bind} it to the pre-programmed \emph{model} -for validation. If the data can be successfully validated, it is saved into the -database as a cache and the search query is performed on the saved data. 
The -result is then displayed to the user for browsing. +After a response from the API server arrives, the application attempts to +\emph{bind} the returned data to the pre-programmed \emph{model} for +validation, before finally parsing it. If the data can be successfully +validated, it is saved into the database as a cache and the search query is +performed on the saved data. The result is then displayed to the user for +browsing. \n{1}{Deployment recommendations}\label{sec:deploymentRecommendations} @@ -756,18 +771,18 @@ It is, of course, recommended that the application runs in a secure environment who you ask. General recommendations would be either to effectively reserve a machine for a single use case - running this program - so as to dramatically decrease the potential attack surface of the host, or run the program isolated -in a container or a virtual machine. Further, if the host does not need +in a container or a virtual machine. Furthermore, if the host does not need management access (it is a deployed-to-only machine that is configured out-of-band, such as with a \emph{golden} image/container or declaratively with Nix), then an SSH \emph{daemon} should not be running in it, since it is not needed. In an ideal scenario, the host machine would have as little software installed as possible besides what the application absolutely requires. -System-wide cryptographic policies should target highest feasible security -level, if at all available (such as by default on Fedora or RHEL), covering -SSH, DNSSec, IPsec, Kerberos and TLS protocols. Firewalls should be configured -and SELinux (kernel-level mandatory access control and security policy -mechanism) running in \emph{enforcing} mode, if available. +System-wide cryptographic policies should target the highest feasible security +level, if at all available (as is the case by default on e.g.\ Fedora), +covering SSH, DNSSec and TLS protocols. 
Firewalls should be configured and +SELinux (kernel-level mandatory access control and security policy mechanism) +running in \emph{enforcing} mode, if available. \n{2}{Transport security} @@ -786,52 +801,67 @@ have it listen behind a TLS-terminating \emph{reverse proxy}. \n{2}{Containerisation} -Whether the pre-built or a custom container image is used to deploy the -application, it still needs access to secrets, such as database connection +Whether containerised or not, the application needs runtime access to secrets +such as cookie encryption and authentication keys, or the database connection string (containing database host, port, user, password/encrypted password, -authentication method and database name). +authentication method and database name). It is a relatively common practice to +deliver secrets to programs in configuration files; however, environment +variables should be preferred. The program could go one step further and only +accept certain secrets as environment variables. -The application should be able to handle the most common Postgres -authentication methods~\cite{pgauthmethods}, namely \emph{peer}, -\emph{scram-sha-256}, \emph{user name maps} and raw \emph{password}, although -the \emph{password} option should not be used in production, \emph{unless} the -connection to the database is protected by TLS.\ In any case, using the -\emph{scram-sha-256}~\cite{scramsha256rfc7677} method is preferable. One of the -ways to verify in development environment that everything works as intended is -the \emph{Password generator for PostgreSQL} tool~\cite{goscramsha256}, which -allows retrieving the encrypted string from a raw user input. +While it is not impossible to run a process scheduler (such as SystemD) inside +a container, containers are well suited for single-program workloads. The fact +that the application needs persistent storage also begs the question of +\emph{how to run the database in the container?}. 
Should data be stored inside +the ephemeral container, it could end up being very short-lived (wiped on +container restart), and barring container root volume snapshotting, backing it +up could turn into a chore, neither of which is desirable +in this case. Moreover, it is the opinion of the author that multiprocess +scheduling would inordinately complicate the container set-up. Running +a single program per container also provides a good amount of +isolation if done properly, whereas running multiple programs in one container would +likely do the opposite. -If the application running in a container wants to use the \emph{peer} -authentication method, it is up to the operator to supply the Postgres socket -to the application (e.g.\ as a volume bind mount). This scenario was not -tested; however, and the author is also not entirely certain how \emph{user -namespaces} (on GNU/Linux) would influence the process (as in when the -\emph{ID}s of a user \textbf{outside} the container are mapped to a range of -\emph{UIDs} \textbf{inside} the container), for which the setup would likely -need to account. +As per the above, a more \emph{sane} thing to do is to store data externally +using a proper persistent storage method, such as a database. With Postgres +being the safe bet among database engines, the program should be able to handle +Postgres' most common authentication methods, namely \emph{peer}, +\emph{scram-sha-256} and raw \emph{password}, although the \emph{password} +option should not be used in production, \emph{unless} the database connection +is protected by TLS~\cite{pgauthmethods}. In any case, using the +\emph{scram-sha-256} method is preferable~\cite{scramsha256rfc7677}. One way to +verify during development that authentication works as intended is the +\emph{Password generator for PostgreSQL} tool, which generates an encrypted +string from raw user input~\cite{goscramsha256}. 
-Equally, if the application is running inside the container, the operator needs -to make sure that the database is either running in a network that is also -directly attached to the container or that there is a mechanism in place that -routes the requests for the database hostname to the destination. +If the application wants to use the \emph{peer} authentication method, it is up +to the operator to supply the Postgres socket to the container (e.g.\ as a +volume bind mount). Equally, the operator needs to make sure that the database +is either running in a network that is also directly attached to the container +or that there is a mechanism in place that routes the requests for the database +hostname to the destination, unless a static IP configuration is used, which is +also possible. -One such mechanism is container name based routing inside \emph{pods} -(Podman/Kubernetes), where the resolution of container names is the -responsibility of a specially configured (often auto-configured) piece of -software called Aardvark for the former and CoreDNS for the latter. +Practically every container runtime satisfies this use case with a container +\emph{name-based routing} mechanism, which inside \emph{pods} (in case of +Podman/Kubernetes) or common default networks (that are both NAT-ted \emph{and} +routed) enables resolution of container names. This abstraction is a +responsibility of specially configured (most often autoconfigured) pieces of +software, Aardvark in case of Podman, and CoreDNS for Kubernetes, and it makes +using short-lived containers in dynamic networks convenient. \n{1}{Summary} -Passwords (and/or passphrases) are in use everywhere and quite probably will be -for the foreseeable future. If not as \textit{the} principal way to -authenticate, then at least as \textit{a} way to authenticate. 
As long as -passwords are going to be handled and stored by service/application providers, -they are going to get leaked, be it due to provider carelessness or the -attackers' resolve and wit. Of course, sifting through all the available -password breach data by hand is not a reasonable option, and therefore tools -providing assistance come in handy. The next part of this diploma thesis will -explore that issue and introduce a solution. +Passwords (and/or passphrases) are in use everywhere and will quite probably +continue to be for the foreseeable future. If not as \textit{the} principal way +to authenticate, then at least as \textit{a} way to authenticate. And for as +long as passwords are going to be handled and stored, they \emph{are} going to +get leaked, be it due to user or provider carelessness, or the attackers' +resolve and wit. Of course, sifting through the heaps of available password +breach data by hand is not a reasonable option, and therefore tools providing +assistance come in handy. The following part of this thesis will explore that +issue and suggest a solution. 
% =========================================================================== % diff --git a/tex/references.bib b/tex/references.bib index c73e7a8..79ced29 100644 --- a/tex/references.bib +++ b/tex/references.bib @@ -421,4 +421,96 @@ institution = {International Organization for Standardization} note={{Available from: \url{https://www.troyhunt.com/the-have-i-been-pwned-api-now-has-different-rate-limits-and-annual-billing/} [viewed 2023-08-15]}} } +@misc{blake3, + author = {Jack O'Connor and Jean-Philippe Aumasson and Samuel Neves and Zooko Wilcox-O'Hearn}, + year = 2021, + title = {{BLAKE3 - one function, fast everywhere}}, + subtitle = {{one function, fast everywhere}}, + howpublished = {[online]}, + note={{Available from: \url{https://raw.githubusercontent.com/BLAKE3-team/BLAKE3-specs/master/blake3.pdf} [viewed 2023-08-14]}} +} + +@misc{megatron, + author = {m3g9tr0n}, + year = 2012, + publisher = {Thireus}, + title = {{Cracking Story - How I Cracked Over 122 Million SHA1 and MD5 Hashed Passwords}}, + howpublished = {[online]}, + note={{Available from: \url{https://blog.thireus.com/cracking-story-how-i-cracked-over-122-million-sha1-and-md5-hashed-passwords/} [viewed 2023-08-13]}} +} + +@misc{linkedin1, + author = {Chris Velazco}, + year = 2012, + month = jun, + title = {{6.5 Million LinkedIn Passwords Reportedly Leaked, LinkedIn Is “Looking Into” It}}, + howpublished = {[online]}, + note={{Available from: \url{https://techcrunch.com/2012/06/06/6-5-million-linkedin-passwords-reportedly-leaked-linkedin-is-looking-into-it/} [viewed 2023-08-13]}} +} + +@misc{linkedin2, + author = {Sarah Perez}, + year = 2016, + month = may, + title = {{117 million LinkedIn emails and passwords from a 2012 hack just got posted online}}, + howpublished = {[online]}, + note={{Available from: \url{https://techcrunch.com/2016/05/18/117-million-linkedin-emails-and-passwords-from-a-2012-hack-just-got-posted-online/} [viewed 2023-08-13]}} +} + +@misc{plaintextpasswds1, + author = {Dan 
Goodin}, + year = 2015, + publisher = {Ars Technica}, + title = {{13 million plaintext passwords belonging to webhost users leaked online}}, + howpublished = {[online]}, + note={{Available from: \url{https://arstechnica.com/information-technology/2015/10/13-million-plaintext-passwords-belonging-to-webhost-users-leaked-online/} [viewed 2023-08-13]}} +} + +@misc{plaintextpasswds2, + author = {Forcepoint}, + year = 2011, + month = dec, + title = {{Chinese Internet Suffers the Most Serious User Data Leak in History}}, + howpublished = {[online]}, + note={{Available from: \url{https://www.forcepoint.com/blog/x-labs/chinese-internet-suffers-most-serious-user-data-leak-history} [viewed 2023-08-13]}} +} + +@misc{plaintextpasswds3, + author = {Dan Goodin}, + year = 2016, + month = sep, + title = {{6.6 million plaintext passwords exposed as site gets hacked to the bone}}, + howpublished = {[online]}, + note={{Available from: \url{https: +//arstechnica.com/information-technology/2016/09/plaintext-passwords- +and-wealth-of-other-data-for-6-6-million-people-go-public/} [viewed 2023-08-13]}} +} + +@misc{rockyou, + author = {Imperva}, + year = 2014, + title = {{Consumer Password Worst Practices}}, + howpublished = {[online]}, + note={{Available from: \url{https://www.imperva.com/docs/gated/WP_Consumer_Password_Worst_Practices.pdf} [viewed 2023-08-13]}} +} + +@misc{hashcracking, + author = {Dan Goodin}, + year = 2012, + month = aug, + publisher = {Ars Technica}, + title = {{Why passwords have never been weaker—and crackers have never been stronger}}, + howpublished = {[online]}, + note={{Available from: \url{https://arstechnica.com/information-technology/2012/08/passwords-under-assault/} [viewed 2023-08-13]}} +} + +@misc{hashcracking2, + author = {Per Thorsheim}, + year = 2012, + month = jun, + title = {{Linkedin Password Infographic}}, + howpublished = {[online]}, + note={{Available from: \url{https://securitynirvana.blogspot.com/2012/06/linkedin-password-infographic.html} [viewed 
2023-08-13]}} +} + % =========================================================================== %