tex: extend, reword the hash functions section

2023-08-23 20:05:58 +02:00 · 2023-08-23 20:05:58 +02:00 · 56cc0239d8
commit 56cc0239d8
parent 9893ee9b50
2 changed files with 91 additions and 40 deletions
--- a/tex/part-theoretical.tex
+++ b/tex/part-theoretical.tex
@@ -43,21 +43,19 @@ to a website protected by the famed HTTPS.
 The popularity of hash functions stems from a common use case: the need to
 simplify reliably identifying a chunk of data. Of course, two chunks of data,
 two files, frames or packets could always be compared bit by bit, but that can
-get prohibitive from both cost and energy point of view relatively quickly.
+get prohibitive from both cost and energy point of view relatively quickly,
-That is when the hash functions come in, since they are able to take a long
+with transport channels being often insecure and unreliable. That is when the
-input and produce a short output, named a digest or a hash value. The function
+hash functions come in, since they are able to take a long input and produce a
-also only works one way.
+short output, named a digest or a hash value. The function also only works one
-
+way. A file, or any original input data for that matter, cannot be
-A file, or any original input data for that matter, cannot be reconstructed
+reconstructed from the hash digest alone by somehow \emph{reversing} the
-from the hash digest alone by somehow \emph{reversing} the hashing operation,
+hashing operation, since at the heart of any hash function there is essentially
-since at the heart of any hash function there is essentially a compression
+a compression function.
 function.
 Most alluringly, hashes are frequently used with the intent of
 \emph{protecting} passwords by making those unreadable, while still being able
-to verify that the user knows the password, therefore should be authorised.
+to verify that the user knows the password, therefore should be authorised. As
-
+the hashing operation is irreversible, once the one-way function produces a
 As the hashing operation is irreversible, once the one-way function produces a
 short digest, there is no way to reconstruct the original message from it.
 That is, unless the input of the hash function is also known, in which case all
 it takes is hashing the supposed input and comparing the digest with existing
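The two properties described in this hunk (a short, fixed-length digest for input of any length, and "reversal" only by hashing a guessed input and comparing) can be sketched as follows. This is a minimal illustration, not part of the commit: SHA-256 from Python's standard `hashlib` is an assumed stand-in hash, and the helper names and the sample password are hypothetical.

```python
import hashlib
from typing import Set


def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (assumed stand-in hash)."""
    return hashlib.sha256(data).hexdigest()


def matches_known_digest(guess: bytes, known_digests: Set[str]) -> bool:
    """Hash a guessed input and check it against digests that are already known."""
    return digest_hex(guess) in known_digests


# Inputs of very different lengths produce digests of the same length.
short_digest = digest_hex(b"a")
long_digest = digest_hex(b"a" * 1_000_000)

# Hypothetical set of digests known to belong to passwords.
leaked = {digest_hex(b"hunter2")}
```

The digest is never inverted here: a guess is confirmed only because hashing it again reproduces a known digest, which is why fast hashes are a poor fit for passwords.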
@@ -66,41 +64,52 @@ digests that are known to be digests of passwords.
 \n{3}{Types and use cases}
-Hash functions can be loosely categorised based on their intended use case to
+Hash functions can be loosely categorised based on their intended cryptographic
-\emph{password protection hashes}, \emph{integrity verification hashes},
+application into \emph{password protection}, \emph{integrity verification}
-\emph{message authentication codes} and \emph{cryptographic hashes}. Each of
+and \emph{message authentication} hashes. Each of them possesses unique
-these possess unique characteristics and using the wrong type of hash function
+characteristics and using the wrong type of hash function for the wrong job can
-for the wrong job can potentially result in a security breach.
+potentially result in a security breach.
-As an example, suppose \texttt{MD5}, a popular hash function internally using
+As a contrived example, suppose \texttt{MD5}, a popular hash function
-the same data structure - \emph{Merkle-Damgård} construction - as
+internally using the \emph{Merkle-Damgård} (MD) construction, and
-\texttt{BLAKE3}. The former produces 128 bit digests, compared to the default
+\texttt{BLAKE3}, which is built on a Merkle tree. The former produces 128-bit digests,
-256 bits of output and no upper ($<2^{64}$ bytes) limit (Merkle tree
+compared to the default 256 bits of output and no upper ($<2^{64}$ bytes) limit
-extensibility) for the latter. There is a list of differences that could
+(Merkle tree extensibility) for the latter. Aside from \texttt{MD5} considered
-further be mentioned, however, they both have one thing in common: they are
+to be \emph{broken} in regard to collision
-\emph{designed} to be \emph{fast}. The latter, as a cryptographic hash
+resistance~\cite{md5collision}~\cite{md5collision2} (and having theoretically
-function, is conjectured to be \emph{random oracle indifferentiable}, secure
+weakened resistance to preimages~\cite{md5preimage}~\cite{md5preimage2}), a
-against length extension, but it is also in fact faster than all of
+list of differences could be mentioned; however, they both have one thing in
 common: they are \emph{designed} to be \emph{fast}. The latter, a cryptographic
 hash function, is conjectured to be \emph{random oracle indifferentiable},
 secure against length extension, and was built with pre-image and collision
 resistance in mind. That said, it is also in fact faster than all of
 \texttt{MD5}, \texttt{SHA3-256}, \texttt{SHA-1} and even the \texttt{BLAKE2} family
 of functions~\cite{blake3}.
-The use case of both is to (quickly) verify integrity of a given chunk of data,
+\begin{lstlisting}[caption=Broken collision resistance of
-in case of \texttt{BLAKE3} with pre-image and collision resistance in mind, not
+\texttt{MD5},label=md5,backgroundcolor=\color{lstbg}]
-to secure a password by hashing it first, which poses a big issue when used
+    m := x
    m' := y
    MD5(m) == MD5(m')
 \end{lstlisting}
 However, the default use case of both \texttt{MD5} and \texttt{BLAKE3}
 (unkeyed) is to (quickly) verify integrity of a given chunk of data, not to
 secure a password by hashing it first, which poses a big issue when used
 to...secure passwords by hashing them first.
 Password hashing functions such as \texttt{argon2} or \texttt{bcrypt} are good
-choices for \emph{securely} storing hashed passwords, namely because they place
+choices for \emph{securely} storing password representations, namely because
-CPU and memory burden on the machine that is computing the digest. In case of
+they place CPU and memory burden on the machine that is computing the digest.
-the mentioned functions, \emph{hardness} is even configurable to satisfy the
+In case of the mentioned functions, \emph{hardness} is even configurable to
-greatest possible array of scenarios. These functions also forcefully limit
+satisfy the greatest possible array of scenarios. These functions also
-potential parallelism, thereby restricting the scale at which exhaustive
+forcefully limit potential parallelism, thereby restricting the scale at which
-searches performed using tools like \texttt{Hashcat} or \texttt{John the
+exhaustive searches performed using tools like \texttt{Hashcat} or \texttt{John
-Ripper} could be at all feasible, practically obviating old-school hash
+the Ripper} could be at all feasible. Additionally, both functions can
-cracking~\cite{hashcracking},~\cite{hashcracking2}. Additionally, both
+automatically add random \emph{salts} to passwords, ensuring that
-functions can automatically add random \emph{salt} to passwords, automatically
+no copies of the same password provided by different users end up hashing to
-ensuring that no copies of the same password provided by different users will
+the same digest value, which for practical purposes obviates large-scale
-end up hashing to the same digest value.
+old-school hash cracking~\cite{hashcracking},~\cite{hashcracking2}.
 \n{3}{Why are hashes interesting}
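The salting and configurable hardness described in the hunk above can be sketched with Python's standard library. This is an illustration under stated assumptions, not the thesis' implementation: `hashlib.scrypt` is swapped in for `argon2` and `bcrypt` because it ships with the standard library (it needs an OpenSSL build with scrypt support), and the cost parameters, helper names and sample passwords are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed cost parameters; raising N increases both CPU and memory cost.
N, R, P = 2**14, 8, 1


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a digest with scrypt, generating a fresh random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)


# Two users choosing the same password get different salts and digests.
salt_a, digest_a = hash_password("correct horse")
salt_b, digest_b = hash_password("correct horse")
```

Because every stored digest depends on its own salt, a precomputed table of digests of common passwords cannot be matched against the whole database at once; each guess has to be recomputed per user at the configured cost.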
--- a/tex/references.bib
+++ b/tex/references.bib
@@ -513,4 +513,46 @@ and-wealth-of-other-data-for-6-6-million-people-go-public/} [viewed 2023-08-13]}
 	note={{Available from: \url{https://securitynirvana.blogspot.com/2012/06/linkedin-password-infographic.html} [viewed 2023-08-13]}}
 }
@inproceedings{md5collision,
 author = {Wang, Xiaoyun and Yu, Hongbo},
 year = {2005},
 month = {05},
 pages = {561-561},
 title = {How to Break MD5 and Other Hash Functions},
 volume = {3494},
 isbn = {978-3-540-25910-7},
 journal = {Lecture Notes in Computer Science},
 doi = {10.1007/11426639_2}
 }
@article{md5collision2,
  author = {Klíma, Vlastimil},
  year = 2006,
  month = jan,
  pages = {105},
  title = {Tunnels in Hash Functions: MD5 Collisions Within a Minute.},
  volume = {2006},
  journal = {IACR Cryptology ePrint Archive}
 }
@inbook{md5preimage,
  title={ Finding Preimages in Full MD5 Faster Than Exhaustive Search },
  author={ Yu Sasaki and Kazumaro Aoki },
  year= 2009 ,
  publisher={ Springer, Berlin, Heidelberg },
  pages={ 134-152 },
  doi={ 10.1007/978-3-642-01001-9_8 },  
 }
@inproceedings{md5preimage2,
  author={Mao, Ming and Chen, Shaohui and Xu, Jin},
  booktitle={2009 International Conference on Computational Intelligence and Security}, 
  title={Construction of the Initial Structure for Preimage Attack of MD5}, 
  year={2009},
  volume={1},
  number={},
  pages={442-445},
  doi={10.1109/CIS.2009.214}
 }
 % =========================================================================== %