diff --git a/tex/abbreviations.tex b/tex/abbreviations.tex
index 1d68974..801a56d 100644
--- a/tex/abbreviations.tex
+++ b/tex/abbreviations.tex
@@ -13,11 +13,15 @@
 PID & Process ID \\
 Cgroup & Control group \\
 TLS & Transport Layer Security \\
+TCP & Transmission Control Protocol \\
 SSH & Secure Shell \\
+DNS & Domain Name System \\
+ISP & Internet Service Provider \\
 GPG & GNU Privacy Guard \\
 GNU & GNU's Not Unix! \\
 CSS & Cascading Style Sheets \\
 API & Application Programming Interface \\
+CLI & Command Line Interface \\
 SCM & Source Code Management \\
 HIBP & Have I Been Pwned \\
diff --git a/tex/references.bib b/tex/references.bib
index e56f131..101a9d1 100644
--- a/tex/references.bib
+++ b/tex/references.bib
@@ -162,7 +162,7 @@
 title = {A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability.},
 author = {Filippo Valsorda and Ben Cox and age contributors},
 year = 2021,
- note={{Available from: \url{https://github.com/FiloSottile/age}. [viewed 2023-05-17]}}
+ note={{Available from: \url{https://github.com/FiloSottile/age}. [viewed 2023-05-23]}}
 }
 
 @misc{x25519rfc7748,
@@ -186,7 +186,49 @@
 publisher = "GitHub",
 howpublished = {[online]},
 year = "2007",
- note={{Available from: \url{https://github.com/504ensicsLabs/LiME}. [viewed 2023-05-17]}}
+ note={{Available from: \url{https://github.com/504ensicsLabs/LiME}. [viewed 2023-05-23]}}
+}
+
+@misc{wwwf,
+ howpublished = {[online]},
+ title = {History of the Web},
+ author = {{World Wide Web Foundation}},
+ year = 2021,
+ note={{Available from: \url{https://webfoundation.org/about/vision/history-of-the-web/}. [viewed 2023-05-23]}}
+}
+
+@misc{ddvweb,
+ howpublished = {[online]},
+ title = {What is this Gemini thing anyway, and why am I excited about it?},
+ author = {Drew DeVault},
+ year = 2020,
+ month = nov,
+ note={{Available from: \url{https://drewdevault.com/2020/11/01/What-is-Gemini-anyway.html}. [viewed 2023-05-23]}}
+}
+
+@misc{gemini,
+ howpublished = {[online]},
+ title = {Project Gemini},
+ author = {Solderpunk and Sean Conner and {The Gemini Contributors}},
+ year = 2019,
+ note={{Available from: \url{https://gemini.circumlunar.space/} and over Gemini from: \url{gemini://gemini.circumlunar.space/}. [viewed 2023-05-23]}}
+}
+
+@misc{geminispec,
+ howpublished = {[online]},
+ title = {Speculative Specification},
+ author = {Solderpunk and Sean Conner and {The Gemini Contributors}},
+ year = 2019,
+ note={{Available from: \url{https://gemini.circumlunar.space/docs/specification.gmi} and over Gemini from: \url{gemini://gemini.circumlunar.space/docs/specification.gmi}. [viewed 2023-05-23]}}
+}
+
+@misc{chromiumrootdns,
+ howpublished = {[online]},
+ title = {This well-intentioned Chrome feature is causing serious problems},
+ author = {Anthony Spadafora},
+ year = 2020,
+ month = aug,
+ note={{Available from: \url{https://www.techradar.com/news/this-well-intentioned-chrome-feature-is-causing-serious-problems}. [viewed 2023-05-23]}}
 }
 
 % =========================================================================== %
diff --git a/tex/text.tex b/tex/text.tex
index 26b3940..a8f434f 100644
--- a/tex/text.tex
+++ b/tex/text.tex
@@ -120,31 +120,169 @@
 Entropy, dictionaries, multiple factors.
 
 \n{1}{Web security}\label{sec:websecurity}
 
 The internet, being the vast space of intertwined concepts and ideas, is a
-superset of the Web, which is the part of the internet that is discussed in the
-next section.
+superset of the Web, since not everything that is available on the internet
+can be described as a web \emph{resource}. Yet it is precisely the Web that
+the next sections discuss, covering what browsers are, what they do and how
+they relate to web security.
 
 \n{2}{Browsers}\label{sec:browsers}
 
-The following subsection covers what browsers are, what they do and how they
-relate to web security.
-
 TODO: describe how browsers find out where the web page lives, get a webpage,
 parse it, parse stylesheets, run scripts, apply SAMEORIGIN restrictions etc.
 
 TODO: (privileged process running untrusted code on user's computer), history,
 present, security focus of the development teams, user facing signalling
 (padlock colours, scary warnings).
 
+Browsers (or, to use the fuller name that reveals their specialisation,
+\emph{web} browsers) are programs intended for \emph{browsing} the Web. In
+more technical terms, browsers are programs that (directly or via intermediary
+tools) perform domain name lookups, connect to web servers, optionally
+establish a secure connection, request the web page in question, determine its
+\emph{security policy}, resolve the accompanying resources the page specifies
+and, depending on the applicable security policy, request those from their
+respective origins, apply stylesheets and run scripts. Constructing a program
+that speaks many protocols and securely runs untrusted code from the internet
+is no easy task.
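+
+As a concrete illustration of the steps just listed, the following is a
+minimal sketch in Python (merely illustrative, assuming the hypothetical host
+\texttt{example.com}; it is emphatically not how any real browser is
+implemented) of a domain name lookup, a TCP connection, a TLS handshake and a
+single request:
+
+\begin{verbatim}
+import socket
+import ssl
+
+host = "example.com"   # hypothetical host, for illustration only
+# domain name lookup using system facilities
+ip = socket.getaddrinfo(host, 443, type=socket.SOCK_STREAM)[0][4][0]
+ctx = ssl.create_default_context()  # certificate + hostname verification
+with socket.create_connection((ip, 443)) as raw:              # TCP
+    with ctx.wrap_socket(raw, server_hostname=host) as tls:   # TLS
+        tls.sendall(b"GET / HTTP/1.1\r\n"                     # the request
+                    b"Host: example.com\r\n"
+                    b"Connection: close\r\n\r\n")
+        page = b""
+        while True:
+            chunk = tls.recv(4096)
+            if not chunk:          # the server has closed the connection
+                break
+            page += chunk
+# `page' now holds the response; the HTML in it would next be parsed and
+# sub-resources requested according to the applicable security policy.
+\end{verbatim}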
+
+\n{3}{Complexity}
+
+Browsers are these days also quite ubiquitous programs, running on
+\emph{billions} of consumer-grade mobile devices (which are also notorious for
+bad update hygiene) and desktop devices all over the world. Regular users
+expect them to work flawlessly under a multitude of network conditions and
+scenarios (café WiFi, cellular data in a remote location, home broadband that
+is DNS-poisoned by the ISP), with differently tuned (or commonly
+misconfigured) web servers, with combinations of modern and \emph{legacy}
+encryption schemes, and with varying levels of conformance to web standards
+from both web server and website developers. Of course, if a website is
+broken, in the eyes of the user it is the browser's fault. Browsers are even
+expected to detect whether \emph{captive portals} (a type of access control
+that usually tries to force the user through a webpage with terms of use) are
+active and offer redirects. All of this amounts to immense complexity, and the
+combination of ubiquity and great exposure this type of software receives is,
+in the author's opinion, the cause of the staggering number of vulnerabilities
+found, reported and fixed in browsers every year.
+
+\n{3}{Standardisation}
+
+Over the years, a consortium of parties interested in promoting and developing
+the web (also due to its potential as a digital marketplace, i.e.\ financial
+incentives), together with browser vendors (of which the most neutral
+participant is perhaps \emph{Mozilla}, with Chrome being run by Google, Edge
+by Microsoft and Safari/WebKit by Apple), has produced a great volume of web
+standards, which are also relatively frequently updated, deprecated and
+replaced by revised or new ones, rendering browser maintenance essentially a
+cat-and-mouse game.
+
+It is the web's extensibility that enabled this build-up, and it has
+ironically been proclaimed by some to be the web's greatest asset.
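+
+As a brief sketch of those components (using Python's standard
+\texttt{urllib.parse} module and a hypothetical URL, purely for illustration),
+a URL splits into exactly the parts just described:
+
+\begin{verbatim}
+from urllib.parse import urlsplit
+
+# hypothetical URL, for illustration only
+parts = urlsplit("http://example.com:8080/index.html")
+print(parts.scheme)    # 'http', the scheme (protocol)
+print(parts.hostname)  # 'example.com', a domain name (or an IP address)
+print(parts.port)      # 8080, the explicit TCP port (None if omitted)
+print(parts.path)      # '/index.html', the path of the requested resource
+\end{verbatim}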
+It has also been pointedly criticised~\cite{ddvweb} in the past, and
+frustration with the status quo of web standards has relatively recently
+prompted a group of people to create ``\textit{a new application-level
+internet protocol for the distribution of arbitrary files, with some special
+consideration for serving a lightweight hypertext format which facilitates
+linking between files}'': Gemini~\cite{gemini,geminispec}, which in the words
+of its authors can be thought of as ``\textit{the web, stripped right back to
+its essence}'' or as ``\textit{Gopher, souped up and modernised just a
+little}'', depending upon the reader's perspective, noting that the latter
+view is probably more accurate.
+
+\n{3}{HTTP}
+
+Originally, HTTP was designed just for fetching hypertext \emph{resources},
+but it has since evolved, particularly thanks to its extensibility, to allow
+fetching all sorts of web resources a modern website provides, such as scripts
+or images, and even \emph{posting} content back to servers.
+
+HTTP relies on TCP (Transmission Control Protocol), one of the \emph{reliable}
+protocols (reliability being mandated by HTTP) used to send data across
+contemporary IP (Internet Protocol) networks, to deliver the data it requests
+or sends. When Tim Berners-Lee invented the World Wide Web (WWW) in 1989 while
+working at CERN (the European Organization for Nuclear Research), with the
+rather noble intent of a ``\emph{wide-area hypermedia information retrieval
+initiative to give universal access to a large universe of
+documents}''~\cite{wwwf}, he also invented the HyperText Markup Language
+(HTML) to serve as a formatting method for these new hypermedia documents. The
+first website was written in roughly the same way as today's websites are,
+using HTML, although the markup language has changed since, the current
+version being HTML5.
+
+It has been mentioned that the client \textbf{requests} a \textbf{resource}
+and receives a \textbf{response}, so those terms should be defined: a request
+is what the client sends to the server, a resource is what it requests, and a
+response is the answer provided by the server.
+
+HTTP follows a classic client-server model whereby it is \textbf{always} the
+client that initiates the request.
+
+A web page is, to put it bluntly, a chunk of \emph{hypertext}. To display a
+web page, a browser first needs to send a request to fetch the HTML
+representing the page, which is then parsed, and additional requests for
+sub-resources are made. If a page defines layout information in the form of
+CSS, that is parsed as well.
+
+A web page needs to be present on the local computer \emph{before} it can be
+parsed by the browser, and since websites are usually still served by programs
+called \emph{web servers}, as in the \emph{early days}, this presents the
+problem of how to tell the browser where the resource should be pulled from.
+In today's browsers, the issue is solved (short of the CLI) by the
+\emph{address bar}, a place into which the user types what they wish the
+browser to fetch for them.
+
+The formal name of this segment is a \emph{Uniform Resource Locator}, or URL,
+and it contains the scheme (the protocol, such as \texttt{http://}), the host
+(an IP address or a domain name), optionally a (TCP) port number and the path
+of the requested resource.
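+
+To tie the above together, the following sketch sends a \texttt{GET} request
+using Python's standard \texttt{http.client} module (to the hypothetical host
+\texttt{example.com}, over plain-text HTTP on the well-known port 80, purely
+for illustration) and inspects the status code, headers and body of the
+response:
+
+\begin{verbatim}
+import http.client
+
+conn = http.client.HTTPConnection("example.com", 80)
+conn.request("GET", "/")           # the method and the path of the resource
+resp = conn.getresponse()
+print(resp.status, resp.reason)    # e.g. 200 OK, a 2xx (successful) code
+for name, value in resp.getheaders():
+    print(name, value)             # the response headers
+body = resp.read()                 # the optional body with the content
+conn.close()
+\end{verbatim}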
+
+Since a TCP connection needs to be established first, to connect to a server
+whose URL contains a domain name rather than an IP address, the browser needs
+to perform a domain name \emph{lookup} using system facilities, or, as was the
+case with a couple of notorious Chromium versions, additionally send some
+unrelated queries which (given the sheer number of Chromium-based browsers in
+use) ended up placing unnecessary load directly on the root DNS
+servers~\cite{chromiumrootdns}.
+
+If a raw IP address and port combination is used, the browser attempts to
+connect to it directly, requesting the desired page by default using the
+\texttt{GET} \emph{method}. The \emph{well-known} HTTP port 80 is assumed
+unless another port is explicitly specified, and the port can be omitted
+whether the host is a domain name or an IP address.
+
+The method is a way for the user agent to declare what operation it wants to
+perform: \texttt{GET} is used for fetching resources, while \texttt{POST} is
+used to send data to the server, such as to post the values of an HTML form.
+
+A server response consists of a \textbf{status code}, a status message, HTTP
+\textbf{headers} and an optional \textbf{body} containing the content. The
+status code indicates whether the original request was successful, and the
+browser is generally there to interpret these status codes for the user. There
+are enough status codes to be confused by their sheer number but luckily,
+there is a method to the madness and they can be divided into groups/classes:
+
+\begin{itemize}
+    \item 1xx: Informational responses
+    \item 2xx: Successful responses
+    \item 3xx: Redirection responses
+    \item 4xx: Client error responses
+    \item 5xx: Server error responses
+\end{itemize}
+
+When the \emph{user agent} (a web \emph{client}) such as a browser receives a
+response with content, it has to parse it.
+
+A header is additional information sent by the client along with its request
+or by the server along with its response.
+
 \n{2}{Cross-site scripting}\label{sec:xss}
 
 \n{2}{Content Security Policy}\label{sec:csp}
+
 Content Security Policy has been an important addition to the arsenal of
-website administrators, even though not everybody has necessarily taken notice
-or even utilised it properly. To understand what guarantees it provides and
+website operators, even though not everybody has necessarily utilised it
+properly or even taken notice. To understand what guarantees it provides and
 what kind of protections it employs, it is first necessary to grok how websites
-are parsed and displayed, which has been discussed in depth in
-Section~\ref{sec:browsers}.
+are parsed and displayed, which has been discussed in depth in the previous
+sections.
+
 \n{1}{Sandboxing}\label{sec:sandboxing}
 
 \n{2}{User isolation}