
tex: add stuff on browsers

also mention Gemini (https://gemini.circumlunar.space/)
leo 2023-05-23 00:03:47 +02:00
parent dd8eb1d3c5
commit f7db0cb375
Signed by: wanderer
SSH Key Fingerprint: SHA256:Dp8+iwKHSlrMEHzE3bJnPng70I7LEsa3IJXRH/U+idQ
3 changed files with 196 additions and 12 deletions

@@ -13,11 +13,15 @@ PID & Process ID \\
Cgroup & Control group \\
TLS & Transport Layer Security \\
TCP & Transmission Control Protocol \\
SSH & Secure Shell \\
DNS & Domain Name System \\
ISP & Internet Service Provider \\
GPG & GNU Privacy Guard \\
GNU & GNU's Not Unix! \\
CSS & Cascading Style Sheets \\
API & Application Programming Interface \\
CLI & Command Line Interface \\
SCM & Source Code Management \\
HIBP & Have I Been Pwned \\

@@ -162,7 +162,7 @@
title = {A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability.},
author = {Filippo Sottile and Ben Cox and age contributors},
year = 2021,
note={{Available from: \url{https://github.com/FiloSottile/age}. [viewed 2023-05-23]}}
}
@misc{x25519rfc7748,
@@ -186,7 +186,49 @@
publisher = "GitHub",
howpublished = {[online]},
year = "2007",
note={{Available from: \url{https://github.com/504ensicsLabs/LiME}. [viewed 2023-05-23]}},
}
@misc{wwwf,
howpublished = {[online]},
title = {History of the Web},
author = {{World Wide Web Foundation}},
year = 2021,
note={{Available from: \url{https://webfoundation.org/about/vision/history-of-the-web/}. [viewed 2023-05-23]}}
}
@misc{ddvweb,
howpublished = {[online]},
title = {What is this Gemini thing anyway, and why am I excited about it?},
author = {Drew DeVault},
year = 2020,
month = nov,
note={{Available from: \url{https://drewdevault.com/2020/11/01/What-is-Gemini-anyway.html}. [viewed 2023-05-23]}}
}
@misc{gemini,
howpublished = {[online]},
title = {Project Gemini},
author = {Solderpunk and Sean Conner and {{The Gemini Contributors}}},
year = 2019,
note={{Available from: \url{https://gemini.circumlunar.space/} and over Gemini from: \url{gemini://gemini.circumlunar.space/}. [viewed 2023-05-23]}}
}
@misc{geminispec,
howpublished = {[online]},
title = {Speculative Specification},
author = {Solderpunk and Sean Conner and {{The Gemini Contributors}}},
year = 2019,
note={{Available from: \url{https://gemini.circumlunar.space/docs/specification.gmi} and over Gemini from: \url{gemini://gemini.circumlunar.space/docs/specification.gmi}. [viewed 2023-05-23]}}
}
@misc{chromiumrootdns,
howpublished = {[online]},
title = {This well-intentioned Chrome feature is causing serious problems},
author = {Anthony Spadafora},
year = 2020,
month = aug,
note={{Available from: \url{https://www.techradar.com/news/this-well-intentioned-chrome-feature-is-causing-serious-problems}. [viewed 2023-05-23]}}
}
% =========================================================================== %

@@ -120,31 +120,169 @@ Entropy, dictionaries, multiple factors.
\n{1}{Web security}\label{sec:websecurity}

The internet, being the vast space of intertwined concepts and ideas, is a
superset of the Web, since not everything that is available on the internet
can be described as a web \emph{resource}. But it is precisely that part of
the internet that the next sections discuss, covering what browsers are, what
they do and how they relate to web security.

\n{2}{Browsers}\label{sec:browsers}
TODO: describe how browsers find out where the web page lives, get a webpage, TODO: describe how browsers find out where the web page lives, get a webpage,
parse it, parse stylesheets, run scripts, apply SAMEORIGIN restrictions etc. parse it, parse stylesheets, run scripts, apply SAMEORIGIN restrictions etc.
TODO: (privileged process running untrusted code on user's computer), history, TODO: (privileged process running untrusted code on user's computer), history,
present, security focus of the development teams, user facing signalling present, security focus of the development teams, user facing signalling
(padlock colours, scary warnings). (padlock colours, scary warnings).
Browsers, sometimes referred to by the fuller name that is a real tell for
their specialisation, \emph{web} browsers, are programs intended for
\emph{browsing} \emph{the web}. In more technical terms, browsers are programs
that facilitate (directly or via intermediary tools) domain name lookups,
connecting to web servers, optionally establishing a secure connection,
requesting the web page in question, determining its \emph{security policy},
resolving what accompanying resources the web page specifies and, depending on
the applicable security policy, requesting those from their respective
origins, applying stylesheets and running scripts. Constructing a program that
speaks many protocols and securely runs untrusted code from the internet is no
easy task.
\n{3}{Complexity}

Browsers these days are also quite ubiquitous programs, running on
\emph{billions} of consumer-grade mobile devices (which are also notorious for
bad update hygiene) and desktop devices all over the world. Regular users
expect them to work flawlessly under a multitude of network conditions and
scenarios (café WiFi, cellular data in a remote location, home broadband that
is DNS-poisoned by the ISP), with differently tuned (or commonly
misconfigured) web servers, with a combination of modern and \emph{legacy}
encryption schemes, and with differing levels of conformance to web standards
from both web server and website developers. Of course, if a website is
broken, in the eyes of the user it is the browser's fault. Browsers are also
expected to detect whether \emph{captive portals} (a type of access control
that usually tries to force the user through a webpage with terms of use) are
active and offer redirects. All of this amounts to immense complexity, and the
combination of ubiquity and great exposure this type of software gets is, in
the author's opinion, the cause behind the staggering number of
vulnerabilities found, reported and fixed in browsers every year.
\n{3}{Standardisation}

Over the years, a consortium of parties interested in promoting and developing
the web (also due to its potential as a digital marketplace, i.e.\ financial
incentives), together with the browser vendors (of which the most neutral
participant is perhaps \emph{Mozilla}, with Chrome being run by Google, Edge
by Microsoft and Safari/WebKit by Apple), have evolved a great volume of web
standards, which are also relatively frequently updated or deprecated and
replaced by revised or new ones, rendering browser maintenance essentially a
cat-and-mouse game.
It is the web's extensibility that enabled this build-up, and ironically it
has been proclaimed by some to be the web's greatest asset. It has also been
criticised~\cite{ddvweb} in the past, and the frustration with the status quo
of web standards has relatively recently prompted a group of people to create
``\textit{a new application-level internet protocol for the distribution of
arbitrary files, with some special consideration for serving a lightweight
hypertext format which facilitates linking between files}'':
Gemini~\cite{gemini}\cite{geminispec}. In the words of its authors, it can be
thought of as ``\textit{the web, stripped right back to its essence}'' or as
``\textit{Gopher, souped up and modernised just a little}'', depending upon
the reader's perspective, noting that the latter view is probably more
accurate.
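To give an impression of how radically stripped back the protocol is, below is
a sketch of a complete Gemini transaction as laid out in the
specification~\cite{geminispec}; the capsule name is illustrative. The client
opens a TLS connection to port 1965, sends a single CRLF-terminated line
containing the requested URL, and the server answers with a one-line header
(a two-digit status code, here 20 for success, and a MIME type) followed by
the body, then closes the connection:

\begin{verbatim}
C: (opens a TLS connection to example.org, port 1965)
C: gemini://example.org/<CR><LF>
S: 20 text/gemini<CR><LF>
S: # Welcome
S: => gemini://example.org/about.gmi  About this capsule
S: (closes the connection)
\end{verbatim}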
\n{3}{HTTP}

Originally, HTTP was designed just for fetching hypertext \emph{resources},
but it has evolved since then, particularly thanks to its extensibility, to
allow fetching all sorts of web resources a modern website of today provides,
such as scripts or images, or even to \emph{post} content back to servers.
HTTP relies on TCP (Transmission Control Protocol), one of the \emph{reliable}
protocols (reliability being mandated by HTTP) used to send data across
contemporary IP (Internet Protocol) networks, to deliver the data it requests
or sends. When Tim Berners-Lee invented the World Wide Web (WWW) in 1989 while
working at CERN (The European Organization for Nuclear Research), with the
rather noble intent of a ``\emph{wide-area hypermedia information retrieval
initiative to give universal access to a large universe of
documents}''~\cite{wwwf}, he also invented the HyperText Markup Language
(HTML) to serve as a formatting method for these new hypermedia documents. The
first website was written roughly the same way as today's websites are, using
HTML, although the markup language has changed since, with the current version
being HTML5.
It has been mentioned that the client \textbf{requests} a \textbf{resource}
and receives a \textbf{response}, so those terms should be defined. A request
is what the client sends to the server, a resource is what it requests, and a
response is the answer provided by the server. HTTP follows a classic
client-server model whereby it is \textbf{always} the client that initiates
the request.
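To make the terms concrete, the following is a minimal sketch of an HTTP/1.1
request as a browser might send it; the host \texttt{example.com} and the
user agent string are illustrative. The \texttt{Host} header is mandatory in
HTTP/1.1, since a single server may host multiple websites:

\begin{verbatim}
GET /index.html HTTP/1.1
Host: example.com
User-Agent: ExampleBrowser/1.0
Accept: text/html,application/xhtml+xml
\end{verbatim}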
A web page is, to be blunt, a chunk of \emph{hypertext}. To display a web
page, a browser first needs to send a request to fetch the HTML representing
the page, which is then parsed, and additional requests for sub-resources are
made. If a page defines layout information in the form of CSS, that is parsed
as well.
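As an illustration, parsing the following (hypothetical) page would trigger
two additional sub-resource requests, one for the stylesheet and one for the
script:

\begin{verbatim}
<!DOCTYPE html>
<html>
  <head>
    <title>Example page</title>
    <link rel="stylesheet" href="/styles/main.css">
    <script src="/scripts/app.js" defer></script>
  </head>
  <body>
    <h1>Hello</h1>
  </body>
</html>
\end{verbatim}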
A web page needs to be present on the local computer \emph{before} it can be
parsed by the browser, and since websites are usually still served by programs
called \emph{web servers}, as in the \emph{early days}, that presents the
problem of how to tell the browser where the resource should be pulled from.
In today's browsers, the issue is solved (short of the CLI) by the
\emph{address bar}, a place into which the user types what they wish the
browser to fetch for them.
The formal name of this segment is a \emph{Uniform Resource Locator}, or URL,
and it contains the scheme (the protocol, such as \texttt{http://}), the host
address or a domain name, and a (TCP) port number.
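Schematically, using an illustrative domain name and a non-default port:

\begin{verbatim}
scheme://host:port/path
http://www.example.com:8080/index.html
\end{verbatim}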
Since a TCP connection needs to be established first, to connect to a server
whose URL contains only a domain name the browser needs to perform a domain
name \emph{lookup} using system facilities, or, as was the case for a couple
of notorious Chromium versions, send some additional and unrelated queries,
which (given the numbers of Chromium-based derivatives) ended up placing
unnecessary load directly on the root DNS servers~\cite{chromiumrootdns}.
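For an uncached name, the resolution proceeds roughly as sketched below (the
name is illustrative); in practice, caches at the stub and the recursive
resolver answer the vast majority of queries long before the root servers are
involved:

\begin{verbatim}
browser -> OS stub resolver -> recursive resolver (often ISP-run)
  -> root servers            (referral: servers for .com)
  -> .com TLD servers        (referral: servers for example.com)
  -> example.com nameservers -> A/AAAA record with the IP address
\end{verbatim}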
If a raw IP address and port combination is used, the browser attempts to
connect to it directly and requests the page, by default using the
\texttt{GET} \emph{method}. The \emph{well-known} HTTP port 80 is assumed
unless another port is explicitly specified, and the port can be omitted
whether the host is a domain name or an IP address.
The method is a way for the user agent to define what operation it wants to
perform: \texttt{GET} is used for fetching resources, while \texttt{POST} is
used to send data to the server, such as the values of an HTML form.
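A sketch of what submitting a (hypothetical) login form might look like on the
wire follows; the path, the field names and the \texttt{Content-Length} value
are illustrative:

\begin{verbatim}
POST /login HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 28

username=alice&submit=Log+in
\end{verbatim}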
A server response comprises a \textbf{status code}, a status message, HTTP
\textbf{headers} and an optional \textbf{body} containing the content. The
status code indicates whether the original request was successful, and the
browser is generally there to interpret these status codes for the user. There
are enough status codes to be confused by the sheer numbers but, luckily,
there is a method to the madness: they can be divided into groups/classes,
with an example status line for each class shown right after the list:
\begin{itemize}
\item 1xx: Informational responses
\item 2xx: Successful responses
\item 3xx: Redirection responses
\item 4xx: Client error responses
\item 5xx: Server error responses
\end{itemize}
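One real status line per class, as a browser would receive it (the class
annotations in parentheses are not part of the protocol):

\begin{verbatim}
HTTP/1.1 100 Continue              (informational)
HTTP/1.1 200 OK                    (successful)
HTTP/1.1 301 Moved Permanently     (redirection)
HTTP/1.1 404 Not Found             (client error)
HTTP/1.1 500 Internal Server Error (server error)
\end{verbatim}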
When the \emph{user agent} (a web \emph{client}) such as a browser receives a
response with content, it has to parse it. A header is additional information
sent along by both the server and the client.
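Putting the pieces together, a complete successful response could look as
follows (the body and the \texttt{Content-Length} value are illustrative): the
status line comes first, then the headers, then an empty line, then the body:

\begin{verbatim}
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 74

<!DOCTYPE html>
<html><head><title>Hi</title></head><body>Hi</body></html>
\end{verbatim}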
\n{2}{Cross-site scripting}\label{sec:xss}

\n{2}{Content Security Policy}\label{sec:csp}

Content Security Policy has been an important addition to the arsenal of
website operators, even though not everybody has necessarily been utilising it
properly or even taken notice. To understand what guarantees it provides and
what kind of protections it employs, it is first necessary to grok how
websites are parsed and displayed, which has been discussed in depth in the
previous sections.
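As a taste of what is to come, a possible \texttt{Content-Security-Policy}
response header is sketched below (wrapped here for presentation; the trusted
scripts host is illustrative). It restricts all resources to the page's own
origin by default, additionally allows scripts from one trusted host, and
forbids plugin content entirely:

\begin{verbatim}
Content-Security-Policy: default-src 'self';
    script-src 'self' https://scripts.example.com;
    object-src 'none'
\end{verbatim}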
\n{1}{Sandboxing}\label{sec:sandboxing}

\n{2}{User isolation}