% ============================================================================ %
\nn{Introduction}
In this day and age, hardly anything has become as indispensable to one's
personal and professional life, to media, politics, culture and science as a
reliable Internet connection, allowing seamless interaction and access to
information.

The networking infrastructure powering the Internet has been the target of
various attacks almost since day one. While the early network interconnections
were based on mutual trust and a strong spirit of sharing for the advancement
of humanity, with an increasing number of participants from different
backgrounds (essentially democratization) and proliferating deeds of mischief,
trust could no longer be taken for granted.

Although trolling has always been an integral part of Internet culture, it did
not take long for criminals to find their way into the online world with the
intention of taking advantage of lacking legislation and collecting easy
(though huge) economic profit. Causing real harm to anybody in the way, with a
sense of lawlessness, has become another popular activity.

It is this willful damage in its many shapes and forms, brought upon Internet
service providers and users, that our thesis deals with. What we have commonly
observed being used in the wild in recent years may be called attack
'professionalising': the tools typically available 10 years ago cannot match
practically anything one can get ready in minutes to cause real harm today.
Since networks compose the medium of communication, it is essential that they
remain secured and protected, which poses the challenging task of coming up
with new solutions to a rapidly evolving problem.

On the side of the user, unavailability of the service often means inability
to fulfil their tasks. For the provider, an attack that cripples the network
or its resources means inability to provide the promised service.

We would like to take a look at some of the conventional methods and tools
which attackers resort to, as well as methods and tools both users and network
operators may (and should) utilise to protect the network. The main objectives
of our thesis are to research key ways of mitigating DoS attacks and to
simulate a DoS attack and a BGP blackhole reaction.
Our thesis comprises two main parts - theoretical and practical. The
theoretical part discusses the relevant context, different types of attack
methods and tools, and mitigation methods and tools. The practical part
describes a simple simulated topology and an attack in lab conditions.

The first two sections of the theoretical part focus on attack definitions and
provide context. Section \ref{sec:attack-methods} attempts to categorise
attack methods based on several metrics and explains some of the methods
mentioned. \autoref{sec:attack-tools} presents a list of some of the popular
DoS tools that are available. Next, \autoref{sec:mitigation-methods} traces
various ways to mitigate attacks, both on the user and the provider level, and
the final section of the theoretical part, \autoref{sec:mitigation-tools},
marks out some of the tools that can be used to mitigate an attack.

The first section of the practical part,
\autoref{sec:infrastructure-set-up-description}, gives an overview of the
tools used to construct the lab infrastructure and configure the systems, as
well as the specifics of the configuration. It also focuses on setting up the
infrastructure and the tools and processes needed to achieve a reproducible
outcome. In \autoref{sec:mitigation-tools-set-up}, we set up mitigation tools
in preparation for an attack. In \autoref{sec:attack-tools-set-up} we go
through the process of preparing attack tools. The final section,
\autoref{sec:performing-an-attack}, describes performing the attack itself.
% ============================================================================ %
\part{Theoretical part}
\n{1}{Definition} \label{sec:definition}
While denial of service can be caused in a multitude of different ways and can
impact any part of the stack, we are predominantly going to look at the ways
pertaining to Internet networks. First, we shall define what a denial of
service (\it{DoS}) attack in fact \it{is}, and we can achieve that by
understanding what it does.
A DoS attack is an action that harms a \it{service} in such a way that it can
no longer serve legitimate requests as a result of being occupied by bogus or
excessive requests from the attacker.
A DDoS is a DoS that is distributed among many participating devices (and
operators).
\begin{figure}[!hbt]
\centering
\begin{tikzpicture}
\draw (1,1)[color=red!60,thick,fill=red!15] circle (2.2cm);
\draw (0.4,0.2pt) node[below left] {$DoS$};
\draw (1,1)[dashed,fill=purple!25] circle (1.2cm);
\draw (1,5pt) node[above] {$DDoS$};
\end{tikzpicture}
\caption{Illustration of the relationship between DoS and DDoS attacks.}
\end{figure}
The participating devices are generally also victims themselves: most of the
attacks are performed with open DNS resolvers, home routers left to rot by
vendors, misconfigured web services or IoT devices as involuntary
participants. All one has to do is open Shodan, look for specific open ports
(ports of protocols with a good reflection ratio such as DNS, CLDAP, or SSDP),
start probing and then reap the easy harvest. A quick search for devices
running with port 123 (NTP) open is certain to return a mind-blowing number
\cite{ShodanNTPd}.
\n{1}{Context} \label{sec:context}
In the last decade alone we have witnessed many large DoS/DDoS attacks, some
of them against critical infrastructure services such as cloud hosting, DNS,
git hosting services or even CCTV cameras. All of these attacks weaponized
poorly managed endpoints, unpatched IoT devices or
malware-infected-hosts-turned-botnet-zombies. The intensity and frequency have
also been increasing sharply, with the latest attacks passing the Tbps
threshold (Akamai mitigated a 1.44 Tbps DDoS in 2020
\cite{akamai2020ddosretrospect}). Data from the Cisco Annual Internet Report
show that overall there was a \textbf{776\%} growth in attacks between 100
Gbps and 400 Gbps from 2018 to 2019, and the report predicts the total number
of DDoS attacks to double from 7.9 million in 2018 to 15.4 million by 2023
\cite{cisco2020report}. The question is: why?
The motives will probably more often than not remain a mystery; however, a
proliferation of DDoS-for-hire websites \cite{Santanna2018BooterLG}, even on
the \emph{clearnet}\footnotemark, points us to a plausible answer.
\footnotetext{the surface web; i.e.\ not even attempting to hide}
Somebody is making money selling abusive services that are used for putting
competitors out of business or for plain extortion. According to Akamai,
extortion attacks have made a widespread return, with a new wave launching in
mid-August 2020 \cite{akamai2021ddos}.
Akamai went on to note that DDoS attackers are expanding their reach across
geographies and industries, with the number of targeted entities being 57\%
higher than the year before.
\n{1}{Attack methods} \label{sec:attack-methods}
There are generally several different ways to categorise a method of
attack:
\begin{description}
\item[By the layer in which the attack is performed:]\
\begin{itemize}
\item Link layer
\item Internet layer
\item Transport layer
\item Application layer
\end{itemize}
\end{description}
\begin{description}
\item[By the nature of their distribution:]\
\begin{description}
\item[distributed] the effort is collectively advanced by a group of
devices
\begin{enumerate}
\item deliberate
\begin{enumerate}
\item remotely coordinated devices (IRC C\&C) - so called \it{voluntary botnets}
\item individuals each operating their own computer, performing a premeditated operation
in a synchronized manner
\end{enumerate}
\item involuntary - hijacked devices
\end{enumerate}
\item[not distributed] there is a single source of badness
\end{description}
\end{description}
\begin{description}
\item [By the kind of remoteness necessary to successfully execute the
attack:]\
\begin{description}
\item[close-proximity] (physical engagement, i.e. sabotage) requires physical
presence in/near e.g. a datacenter, networking equipment (cutting cables,
playing a pyro)
\item[local network access] such as over a WiFi access point or on LAN
\item[remote] such as over the Internet
\end{description}
\end{description}
\begin{description}
\item[By specific features:]\
\begin{itemize}
\item IP fragmentation
\item SYN flood - a rapid sequence of TCP protocol SYN messages
\item volumetric DDoS attack
\item amplification attack (also called 'reflection attack')
\begin{itemize}
\item memcached (up to 1:51200)
\item DNS with a formula \cite{akamaidnsampl}
\rov[DNSamplificationformula]{R = answer size / query size}
\item SNMP (theoretically 1:650)
\item NTP
\end{itemize}
\item exploits
\begin{itemize}
\item 0days
\item simply running unpatched versions of software
\end{itemize}
\item physical network destruction/crippling
\end{itemize}
\end{description}
\n{2}{IP fragmentation}
This is the type of attack whereby an attacker sends a fragmented payload
(TCP, UDP or even ICMP) that the target is supposed to reassemble at the
destination; in doing so, the target's system resources (CPU and mainly
memory) quickly get depleted, ultimately crashing the host.
It is often necessary for IP datagrams (packets) to get fragmented in order to
be transmitted over the network: if a packet being sent is larger than the
maximum transmission unit (MTU) of a link on the path to the receiving side
(e.g. a server), it has to be fragmented to be transmitted completely.

ICMP and UDP fragmentation usually involves packets larger than the MTU in a
simple attempt to overwhelm a receiver that is unable to reassemble such
packets, ideally (from the attacker's perspective) even accompanied by a
buffer overflow that the attacker can exploit further. Fragmenting TCP
segments, on the other hand, targets the TCP reassembly mechanism. Reasonably
recent Linux kernels implement protection against this
\cite{linuxretransmission}.
In either case, this is a network layer attack, since it targets the way the
Internet Protocol requires data to be transmitted and processed.
\n{2}{SYN flood} \label{synfloodattack}
To establish a TCP connection, a \emph{three-way handshake} must be
performed.\\
That is the opening sequence of a TCP connection that any two machines -
let us call them TCP A and TCP B - perform, whereby TCP A, wanting to talk,
sends a \emph{segment} with the SYN control flag set, TCP B (assuming it is
also willing to communicate) responds with a segment with the SYN and ACK
control flags set and, finally, TCP A answers with a final ACK
\cite{rfc793tcp}.
Using \texttt{tcpdump} we can capture an outgoing SYN packet on interface
\texttt{enp0s31f6}.
\begin{verbatim}
# tcpdump -Q out -n -N -c 1 -v -i enp0s31f6 "tcp[tcpflags] == tcp-syn"
\end{verbatim}
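Conversely, the receiving side can get a rough idea of an incoming SYN flood
by watching for segments that have only the SYN flag set (the interface name
is, again, host-specific):
\begin{verbatim}
# tcpdump -Q in -n -c 20 -i enp0s31f6 \
    "tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn"
\end{verbatim}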
A malicious actor is able to misuse the handshake mechanism by posing as a
legitimate \emph{client} (or rather many legitimate clients) and sending a
large number of SYN segments to a \emph{server} willing to establish a
connection (\it{LISTEN} state). The server replies with a [SYN, ACK], which is
a combined acknowledgement of the client's request \it{and} a synchronization
request of its own. The client responds with an ACK and the connection then
reaches the \it{ESTABLISHED} state.
There is a state in which the handshake is in progress but the connection has
not yet reached ESTABLISHED. Such connections are referred to as embryonic
(half-formed) sessions. That is precisely what happens when an attacker sends
many SYNs but stops there and leaves the connections hanging.

One particularly sly method aimed at causing as much network congestion
near/at the victim as possible is setting a private IP address (these are
unroutable, or rather, \it{should not be routed} over the public Internet) or
an address from deallocated space as the source IP address. For the sake of
the argument, suppose it is an address from deallocated space. What then ends
up happening is that the server responds with a [SYN, ACK], and since no
response comes from an address that is not currently allocated to anybody (no
response \it{can} come because nobody is using it), TCP just assumes that the
packets were lost on the way and attempts packet \it{retransmission}
\cite{rfc6298}.
Obviously, this cannot yield a successful result, so in the end the server has
only added onto the already congested network.
The current recommended practice as per RFC 3704 is to enable strict reverse
path forwarding mode where possible to prevent IP spoofing used in DDoS
attacks. If asymmetric routing or another kind of complicated routing is used,
then loose mode is recommended \cite{rfc3704multihomed}.
That way the spoofed traffic never leaves the source network (a responsibility
of the transit provider/ISP) and does not aggregate on a single host's
interface. For this to become a reality, the adoption rate of the subject RFC
recommendations would need to see a proper increase.
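On a Linux-based router, a rough approximation of this source address
validation is the kernel's reverse path filter; a minimal sketch (to be
adapted to the actual routing setup) could be:
\begin{verbatim}
# strict uRPF on a simple, symmetrically routed edge
sysctl -w net.ipv4.conf.all.rp_filter=1
# loose mode instead, if asymmetric routing is in play
# sysctl -w net.ipv4.conf.all.rp_filter=2
\end{verbatim}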
As is true for anything, if countermeasures are set up improperly, legitimate
traffic could end up being blocked as a result.
\n{2}{Amplified Reflection Attack}
As the name suggests, this type of attack is based on two concepts:
amplification and reflection. The amplification part pertains to the fact that
certain protocols answer even a relatively small query with a sizable
response.
The reflection part usually takes advantage of session-less protocols. One
such protocol is UDP, with session-less meaning that hosts are not required to
first establish a \it{session} to communicate; a response is simply sent back
to the address that the packet arrives from (the source address).

However, if a malicious player is not interested in communication but only
wants to cause harm, a packet's source address does not have to - in fact,
from the attacker's point of view, \it{must not} - correspond to the address
of their machine.
Overwriting fields of the packet header (where the information important to
routing resides) is trivial, and there is nothing easier than supplying a UDP
request with (either a bogus, but more commonly) a victim's IP address as the
source address instead of our own, which would be present there by default.
The response is then returned \it{back} - not to the actual sender, but simply
according to the source address.\\
Since UDP has no concept of a \it{connection} or any verification mechanism,
the response arrives at the door of a victim that has never asked for it - in
the worst case an unsolicited pile of them.
This is one of the reasons TCP performs the three-way handshake, as it reduces
the possibility of forged connections.
The goal of the attacker is then clear: get the largest possible response and
have it delivered to the victim (with the reflecting server acting in good
faith, even).
Spoofing the source address also serves the purpose of evading detection, as a
blocking or rate-limiting mechanism at the destination would likely identify
an above-threshold number of requests coming from a single IP and ban them,
thus decreasing the impact of the attack when the intent was to achieve
congestion at the designated destination - the victim.

A perfect example of how bad this can get is unpatched or misconfigured
\texttt{memcached} software, which is very commonly used as e.g. a database
caching system and has an option to listen on a UDP port.
Cloudflare say they have witnessed amplification factors of up to 51200 times
\cite{cfmemcached}.
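To put the amplification factor formula mentioned earlier into perspective
with purely illustrative numbers: a 64-byte DNS query eliciting a 3200-byte
answer yields
\[ R = \frac{\text{answer size}}{\text{query size}} = \frac{3200}{64} = 50, \]
i.e.\ every byte the attacker sends results in 50 bytes arriving at the
victim; with \texttt{memcached}, the same formula can reach the five-digit
factors reported above.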
As already mentioned in~\ref{synfloodattack}, this entire suite of issues
could be, if not entirely prevented, then largely mitigated if the very sound
recommendations of RFC 3704 gained greater adoption among ISPs.
Until then, brace yourselves for the next assault.
\n{2}{Slowloris} \label{slowloris}
The principle of this attack is to first open as many connections as
possible, aiming to fill the capacity of the server, and then keep them
open for as long as possible by sending periodic keep-alive packets.\\
This attack works at the application layer but the principle can easily be
reapplied elsewhere.
\n{2}{BGP hijacking}
BGP is an inter-Autonomous System routing protocol whose primary function is
to exchange network reachability information with other BGP systems. This
network reachability information "includes information on the list of
Autonomous Systems (ASes) that reachability information traverses. This
information is sufficient for constructing a graph of AS connectivity for this
reachability, from which routing loops may be pruned and, at the AS level,
some policy decisions may be enforced." \cite{rfc4271bgp4}

BGP hijacking, in some places spoken of as prefix hijacking, route hijacking
or IP hijacking, is the result of intentional or unintentional misbehaviour in
which a malicious or misconfigured BGP router originates a route to an IP
prefix it does not own; Zhang et al.\ find it is becoming an increasingly
serious security problem in the Internet \cite{Zhang2007PracticalDA}.
\n{2}{Low-rate DoS on BGP}
As shown by Zhang et al.\ in their "Low-Rate TCP-Targeted DoS Attack Disrupts
Internet Routing" paper, BGP itself is prone to a variation of Slowloris due
to the fact that it runs over TCP for reliability. Importantly, this is a
low-bandwidth attack and, because of that, a more difficult one to detect.
Beyond the attack's ability to further slow down the already slow BGP
convergence process during route changes, it can cause a BGP session reset.
For the BGP session to be reset, the congestion induced by the attack traffic
needs to last sufficiently long to cause the BGP Hold Timer to expire
\cite{Zhang2007LowRateTD}.
On top of all that, this attack is especially insidious in that it can be
launched remotely from end hosts, without access to routers or the ability to
send traffic directly to them.
\n{1}{Attack tools} \label{sec:attack-tools}
Believe it or not, there actually exists a "DDoS attack tools" topic on
GitHub:
\url{https://github.com/topics/ddos-attack-tools?o=desc\&s=stars}.
\n{2}{HOIC}
The High Orbit Ion Cannon, successor to LOIC and affectionately referred to as
'HOIC', is a \emph{free software}\footnotemark{} HTTP-flooding tool which
enables one to stress-test the robustness of their infrastructure by applying
enormous pressure on the designated target in the form of a high number of
requests.
It operates over HTTP and users are able to send GET or POST requests to as
many as 256 sites simultaneously.
While it is relatively easily defeated by a WAF (see \ref{waf}), the
possibility to target many sites at once makes it possible for users to
coordinate an attack, consequently making detection and mitigation efforts
more difficult.
\footnotetext{free as both in freedom and free beer}
\n{2}{slowloris.py}
\texttt{slowloris.py} is a Python script available
from~\url{github.com/gkbrk/slowloris} that is able to perform a Slowloris
attack. It seeks to exhaust the file descriptors needed for opening new
connections on the server, and then to keep the connections open for as long
as it can.\\
Legitimate requests cannot be served as a result, since there is no way for
the server to facilitate them until the resources bound by bogus requests are
freed, i.e.\ until the attack ceases.
\n{2}{Metasploit Framework}
Metasploit is a penetration testing framework with an open-source community
version and a commercial version (Metasploit Pro) available. It enables
security researchers to automate workflows of probing vulnerable services or
devices via the use of so-called modules - smaller programs with definable
inputs that perform predefined actions. Modules are often
community-contributed and one can even write a module oneself. SYN-flooding
functionality is implemented in \texttt{aux/synflood}, an auxiliary module.
Auxiliary modules do not execute payloads; they perform arbitrary actions that
may not be related to exploitation, such as scanning, fuzzing and denial of
service attacks \cite{metasploit}.
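For illustration only, launching the module from \texttt{msfconsole} might
look roughly as follows; the module path and option names can differ between
Metasploit versions, and the target address is a placeholder:
\begin{verbatim}
use auxiliary/dos/tcp/synflood
set RHOSTS 192.0.2.20
set RPORT 80
run
\end{verbatim}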
\n{2}{Web browser}
Depending on our point of view (more fittingly, our scaling capabilities),
sometimes all that is needed to cause a denial of service is tons of people
behind a web browser.\\
Numerous requests quickly overload a small server, eventually causing it to
respond so slowly that the impact is indistinguishable from a DoS attack.
That is because, in principle, \it{a DoS attack} is practically the same thing
as discussed above; the only difference is the malicious intent, imperceptible
to a machine.
\n{1}{Mitigation methods} \label{sec:mitigation-methods}
Drastic times require drastic measures, and since DDoS attacks coming at us
practically every other month classify as \it{drastic} quite easily, we are
forced to act accordingly \cite{akamai2021ddos}.
Still, it is more reasonable to prepare than to improvise; therefore, the
following write-up mentions commonly used mitigation methods at different
levels, from a hobbyist server to an e-commerce service to an ISP. The list is
not exhaustive and, of course, if reading this at a later date, always
cross-check with the current best practices at the time.
\n{2}{Blackhole routing (black-holing, null routing)}
Black-holing is a technique that instructs routers that traffic for a specific
prefix is to be routed to the null interface, i.e.\ dropped, and it is used to
cut attack traffic before it reaches the destination AS.\\
Assuming the router is properly configured to direct traffic for the blackhole
next-hop (commonly an RFC 1918 address) to a null interface, traffic destined
for the attacked network gets dropped, making the attacked network unreachable
to the attacker and everyone else. Matter-of-factly, we actually conclude the
DoS for the attacker ourselves \cite{rfc1918}\cite{rfc3882}.
In case of a DDoS, the traffic is likely to come from all over the world
\cite{akamai2020ddosretrospect}.
The idea here is to announce to our upstream (ingress provider) - which,
critically, has to support RTBH (remotely triggered black hole) signalling -
that we do not want any traffic for the victim IP anymore. They would then
propagate the announcement further and in no time we would stop seeing
malicious traffic coming to the victim IP in our network.
In fact, we would not see any traffic coming to the victim anymore, because we
have just broadcast a message that we do not wish to receive traffic for it.
For the entire time we are announcing it, the victim host stays unreachable.
We should make sure to announce the smallest possible prefix to minimise the
collateral damage. Generally, a /21 or /22 prefix is assigned to an AS (the
average prefix length per AS being 22.7866 as of 11 May 2021
\cite{prefixavgsize}); announcing a black hole for such a large space would
likely cause more damage than the attack itself.

To reduce BGP overhead, prefixes are usually announced aggregated, with the
exception of special situations, such as when we wish to stop receiving
traffic for only one IP address. The smallest generally accepted prefix size
tends to be /24 (which is still a lot), with the average prefix length of
updates being 23.11 \cite{prefixavgupdatedsize}; however, some upstream
providers might even support a /32 in case of emergency, effectively dropping
traffic only for the victim.
When an attack hits, all we have to do is:
\begin{enumerate}
\item deaggregate prefixes
\item withdraw hit prefixes.
\end{enumerate}
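As an illustration only - the exact community is provider-specific, though
many upstreams honour the well-known BLACKHOLE community 65535:666 defined in
RFC 7999 - announcing and later withdrawing a /32 blackhole with GoBGP's
command-line client might look roughly like this:
\begin{verbatim}
# announce a blackhole route for the victim /32 towards the upstream
gobgp global rib add 203.0.113.66/32 community 65535:666

# withdraw it again once the attack has subsided
gobgp global rib del 203.0.113.66/32
\end{verbatim}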
In case our upstream provider did not support RTBH and we could not drop them
(e.g. they are the only one around), we could still make use of Team Cymru's
BGP-based solution that distributes routes to participating networks using
only vetted information about current and ongoing unwanted traffic - the
\b{Unwanted Traffic Removal Service} (UTRS). It is a free community service,
currently only available to operators who have an existing ASN assigned and
publicly announce one or more netblocks with their own originating ASN into
the public Internet BGP routing tables.
If only there was a way to just shut down the bad traffic but keep the good one
flowing\footnotemark!
\footnotetext{other than scrubbing}
Behold, this is what \it{selective black-holing} actually is. Some upstream
providers define multiple different blackhole communities, each tied to a
predefined action on the upstream side. One is able to announce to these
communities as needed.
A perfect example of selective black-holing would be announcing to a community
that, in response, propagates the blackhole to Internet exchanges in, say,
North America and Asia, but still allows traffic coming from Europe.

This causes two things to happen. First, our customer's IP is still reachable
from our local area (e.g. Europe), and since our fictitious customer mostly
serves European customers, that is fine. Second, outside of the predefined
radius (Europe in this exercise), any traffic destined for our network (of
which the victim IP is a part) is immediately dropped at the remote IXPs, long
before it ever comes anywhere near our geographical area, let alone our
network.

We believe this approach is superior to indiscriminate black-holing and, given
it is reasonably automated and quick to respond, in combination with other
mitigation methods it can provide viable protection for the network.
\n{2}{Sinkholing}
Moving on, this method works by diverting only malicious traffic away from its
target, usually using a predefined list of IP addresses known to be part of
malicious activities to identify DDoS traffic. False positives occur more
rarely and the collateral damage is smaller than with black-holing, but since
botnet IPs can also be used by legitimate users, this approach is still prone
to false positives.
Additionally, sinkholing as such is ineffective against IP spoofing, which is
a common feature of network layer attacks.
\n{2}{Scrubbing}
An improvement on arbitrary full-blown sinkholing, the scrubbing process
routes all ingress traffic through a security service, which can be performed
in-house or even outsourced. Malicious network packets are identified based on
their header content, size, type, point of origin, etc., using heuristics or
just simple rules. The challenge is to perform scrubbing at line rate without
impacting legitimate users.

If outsourced, the scrubbing service has the bandwidth capacity (either
on-demand or permanent) to take the hit that we cannot. There are at least two
ways to go about this - the BGP way and the DNS way; we will cover the BGP
one. Once an attack is identified, we stop announcing the prefix that is
currently being hit and contact our scrubbing provider (usually
automatically/programmatically) so that they start announcing the subject
prefix and receive all its traffic (including the attack traffic); the
scrubbing service then does the cleaning and sends the clean traffic back to
us \cite{akamaiddosdefence}.
When performing the scrubbing in-house, we have to clean the traffic on our own
appliance that has to have sufficient bandwidth (usually on par with upstream).
A poor man's scrubber:
\begin{itemize}
\item utilises hardware-accelerated ACLs on switches,
\item switches can do simple filtering at \it{line rate} (ASICs),
\item this can be effective when the attack protocol is easily distinguishable
from real traffic,
\item e.g. when hit by an NTP/DNS/SNMP/SSDP amplification attack.
\end{itemize}
We should be performing network analysis, and once higher rates of packets
with source ports of protocols known to be misused for DoS/DDoS (such as 123
or 53) start arriving at our network, we start signalling to our upstream
providers, since they can probably handle it better than us and have as much
interest in doing so as we do (or should).
Volumetric attacks sending traffic in smaller packet sizes will, however,
still result in higher CPU utilisation, especially on non-dedicated networking
equipment.
One thing we should do, no matter whether we are currently suffering an attack
(and scrubbing it ourselves), is to drop and never accept traffic arriving
from the outside that appears to come from \it{our own network}, since such
traffic could not exist naturally and is obviously spoofed.
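A minimal sketch of such a rule using \texttt{nftables} follows; the interface
name stands for the upstream-facing interface and the prefix is a placeholder
for our own address space:
\begin{verbatim}
nft add table inet antispoof
nft add chain inet antispoof pre \
    '{ type filter hook prerouting priority -300; }'
nft add rule inet antispoof pre iifname "eth0" \
    ip saddr 203.0.113.0/24 drop
\end{verbatim}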
Team Cymru has a long tradition of maintaining bogon lists, called the
\textbf{Bogon Reference}. Bogon prefixes are routes that should never appear
in the Internet routing table. A packet with an address from a bogon range
should not be routed over the public Internet. These ranges are commonly found
as the source addresses in DoS/DDoS attacks.\\
Bogons are netblocks that have not been allocated to a regional Internet
registry (RIR) by the Internet Assigned Numbers Authority (IANA), together
with Martian packets (private and reserved addresses defined by RFC 1918,
RFC 5735, and RFC 6598 \cite{rfc1918}, \cite{rfc5735}, \cite{rfc6598}).
To get help with bogon ingress and egress filtering, we should set up
automated retrieval of updated and curated bogon lists from Team Cymru via
HTTP, BGP, RIRs and DNS \cite{teamcymru}.
In case we have our own ASN, are connected directly at an IXP, have no
RTBH support upstream and basically have no other choice, we just need
to find out who is sending the malicious traffic and, if possible, drop the
session and receive traffic from other peers.
\n{2}{IP masking} \label{ipmasking}
This technique is widely used (e.g. Cloudflare's flagship service), relying
solely on not divulging sensitive information - in this case the server IP -
to attackers, and on the capacity of the \it{fronting} service to withstand
the attack due to having access to more bandwidth than the attacker can
produce. All traffic - including potentially harmful traffic - flows through
what is basically a giant proxy. However, before declaring it a net win for
us, it is important to acknowledge that it also comes with heavy privacy
implications, as some other service now performs TLS termination on our behalf
and \textbf{sees everything} (that was encrypted only \emph{in transit} and is
not additionally encrypted) that \emph{anyone} sends us, before finally
forwarding it on to us.
\n{2}{WAF} \label{waf}
A WAF - \it{Web Application Firewall} - is an appliance used to protect (as
the name suggests) web applications. In this day and age, this is especially
necessary; it enables system administrators to craft protection logic in one
place and shield potentially vulnerable applications. This method works on the
application layer of the OSI model and is commonly deployed as part of a web
proxy or as a module of a web proxy, which means network layer attacks cannot
be handled this way. While its protection is not negligible, as always, it is
crucial not to make any assumptions and to know exactly what \it{layer} of
protection the use of a WAF brings.

Generally, or at least as per current best practices, applications are not
deployed with ports exposed directly to the Internet. A sane approach of
having access to resources \it{proxied} yields multiple possibilities in terms
of authentication/authorisation and protection scenarios, and also several
ways to use the available resources more effectively. For one, where any web
content \it{caching} is required, it is easily achieved with a \it{caching}
proxy server. It also commonly enables specifying custom access policies.
There are also hosted (cloud) WAF offerings, however, they come with exactly
the same privacy implications as IP masking solutions (see \ref{ipmasking}).
\n{2}{Rate-limiting}
As a general precaution, it is sane to limit the number of connections a
client is able to make in a predefined amount of time (based on the
requirements of the service). The same applies to a limit on how many
connections a client can have open simultaneously, which can even prevent
Slowloris (see \ref{slowloris}).
Rate-limiting is usually set up either on a proxy or on a WAF, but some form
of rate-limiting can even be built into the application itself.
A well-known pluggable solution that can be used to ban misbehaving clients of
SSHd, HTTP or a multitude of other endpoints is \texttt{Fail2Ban}.
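As an illustrative sketch of such limits in a widely used proxy, Nginx can
restrict both the request rate and the number of simultaneous connections per
client address; the zone names and limit values below are arbitrary examples:
\begin{verbatim}
# http context - illustrative values
limit_req_zone  $binary_remote_addr zone=perip_req:10m  rate=10r/s;
limit_conn_zone $binary_remote_addr zone=perip_conn:10m;

server {
    location / {
        limit_req  zone=perip_req burst=20 nodelay;
        limit_conn perip_conn 10;
    }
}
\end{verbatim}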
\n{2}{Decreased-TIME\_WAIT connection closing} \label{decreased-timewait}
This can help withstand a situation when the conntrack table fills up and the
server refuses to accept any new connections. There is absolutely no reason to
keep connections in the conntrack table long after they become inactive. The
Linux kernel's Netfilter actually has a scrubbing mechanism that is supposed
to rid the conntrack table of timed-out entries, except that practice shows
they can linger for much longer than necessary.

When dealing with massive amounts of traffic, it is very reasonable not only
to increase the size of the conntrack table (a memory trade-off), which is the
generally recommended solution, but also to decrease the TIME\_WAIT timeout to
force-evict connections that have stopped sending data.
This is also an easy way to mitigate Slowloris (see \ref{slowloris}).
More on the workings of conntrack can be found in \ref{netfilter}.

Nginx is a widely used proxy. It uses two FDs (file descriptors) for each
proxied connection. The limit of maximum open FDs can indeed be increased
easily; however, we might still just be delaying the inevitable (FD
exhaustion) and inefficiently wasting precious compute resources needed when
an attack comes. If Nginx is unable to allocate the FDs necessary to track a
connection, the connection attempt will fail. By resetting connections that
have timed out, we prevent such a situation from occurring easily. In Nginx
this is set with a single line: \texttt{reset\_timedout\_connection on;}
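On the conntrack side, a rough sketch of the corresponding tunables (the
values are illustrative; the exact ones used in our lab are given in the
practical part) could be:
\begin{verbatim}
# grow the table - a memory trade-off
sysctl -w net.netfilter.nf_conntrack_max=262144
# evict TIME_WAIT entries after 10 seconds
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=10
\end{verbatim}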
\n{1}{Mitigation tools} \label{sec:mitigation-tools}
No tool is going to remedy a bad design decision, and that applies equally to
physical and Internet engineering.
\n{2}{Firewall}
No matter the specific implementation, it is presumably safe to say that
any firewall is better than no firewall.
There are two main types of firewalls:
\begin{itemize}
\item software,
\item appliance (hardware-accelerated).
\end{itemize}
A software firewall is just another program running on the operating system,
apart from the fact that it is typically running with system-level privileges.
It can be run on a general-purpose computer. In fact, most consumer-grade
operating systems nowadays incorporate, or even enable by default, a firewall
solution.
In contrast, an appliance firewall is a dedicated piece of hardware,
purpose-built specifically for the sole role of acting as a firewall, and it
typically runs a custom and very minimal operating system. Usually the system
exposes hardly any userspace programs, since it is vendored to run as an
appliance.
\n{3}{Software firewall}
Solutions available as software firewalls are typically specific to a given
operating system.
Usually, there exist several tools that enable communication with the core
implementing the logic, commonly by way of embedding deeply in the networking
stack of the OS or utilising kernel APIs. In Linux distributions, the Linux
kernel is the one that sees all: each packet arriving at a network interface
is inspected by the kernel and a decision is made about it.
Historically, \texttt{ipchains} and later \texttt{iptables} used to be the
de facto standard; however, a more recent successor emerged quite some time
ago and is replacing the former two (in modern distributions it has replaced
them, although backward compatibility has been preserved) - the
\texttt{nftables} tool.
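As a small illustration of the \texttt{nft} command-line syntax (the table and
chain names as well as the limit value are arbitrary), a rule dropping TCP SYN
segments that exceed a chosen rate could look roughly like this:
\begin{verbatim}
nft add table inet filter
nft add chain inet filter input \
    '{ type filter hook input priority 0; policy accept; }'
# drop new SYNs once they exceed 200 packets per second
nft add rule inet filter input tcp flags syn \
    limit rate over 200/second drop
\end{verbatim}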
\n{3}{Netfilter} \label{netfilter}
The Linux kernel subsystem named \texttt{Netfilter} is part of the kernel's
Internet protocol stack and is responsible for packet manipulation and
filtering \cite{Boye2012NetfilterCT}. The frontend tools of its packet
filtering and classification rules framework, \texttt{iptables} as well as the
newer \texttt{nftables}, can be interacted with via a shell utility, and since
they also expose APIs of their own, it is common for them to have graphical
frontends as an additional convenience as well, most notably
\texttt{firewalld}, which can be used in conjunction with both of them.

Although newer versions of the Linux kernel support both \texttt{iptables} and
\texttt{nftables} just the same, only one of them should be used at a time.
This can be arbitrarily changed at runtime (a reboot is not necessary), since
they are userspace tools and interact with the kernel using \it{loadable
kernel modules}.

The part of the \texttt{Netfilter} framework responsible for connection
tracking is fittingly named conntrack. A connection, or a \it{flow}, is a
tuple defined by a unique combination of source address, destination address,
source port, destination port and the transport protocol used.
Conntrack keeps track of the flows in a special fixed-size
(tunable\footnotemark) in-kernel
hash table structure with a fixed upper limit.
\footnotetext{via \texttt{net.netfilter.nf\_conntrack\_buckets}}
On Linux devices functioning as routers (especially when you add NAT to the
mix), a common issue is the depletion of space in the conntrack table. Once
the maximum number of connections is reached, Linux simply logs an error
message \texttt{nf\_conntrack: table full, dropping packet} to the kernel log
and "all further new connection requests are dropped until the table is below
the maximum limit again." \cite{Westphal2017CT}. That, as Westphal further
notes, is indeed very unfortunate, especially in DoS scenarios.
Unless the router also functions as a NAT, this can be remedied in two ways:
decreasing the timeout after which an established connection is closed, and
decreasing the timeout after which an inactive connection in the TIME\_WAIT
state is evicted from the conntrack table. By default, the timeout for
established connections is several days long, which leaves the router
vulnerable to packet floods or Slowloris.

Netfilter is here to help again: conntrack treats entries that have not (yet)
seen two-way communication specially - they can be evicted early if the
connection tracking table is full. In case insertion of a new entry fails
because the table is full, "...the kernel searches the next 8 adjacent buckets
of the hash slot where the new connection was supposed to be inserted at for
an entry that has not seen a reply. If one is found, it is discarded and the
new connection entry is allocated." \cite{Westphal2017CT}. Randomised source
addresses in TCP SYN floods thus become a non-issue, because most entries can
be early-evicted; the TCP connection tracker sets the "assured" flag only once
the three-way handshake has completed.
In the case of UDP, the assured flag is set once a packet arrives after the
connection has already seen at least one packet in the reply direction; that
is, plain request/response traffic does not have the assured bit set and can
therefore be early-evicted at any time.
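To get a feel for the state of the tracker on a given host, one can inspect
the relevant counters and, with the \texttt{conntrack-tools} package
installed, list the tracked flows:
\begin{verbatim}
# current number of tracked flows vs. the configured maximum
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# list tracked flows (requires conntrack-tools)
conntrack -L | head
\end{verbatim}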
\n{2}{FastNetMon DDoS Mitigation toolkit}
Originally created by Pavel Odintsov, this program can serve as a helper on top
of analysis and metric collection tools, evaluate data and trigger configurable
mitigation reactions \cite{fastnetmonorig}, \cite{fastnetmonfork},
\cite{fastnetmonng}.
FastNetMon can run on most popular architectures and on several different
general-purpose and specialised platforms such as Linux distributions, VyOS,
FreeBSD or MikroTik devices. Most of the program is written in C++, but it
uses many C libraries for additional functionality and also Perl install
scripts. It has a command line client as well as an API and analysis server.
Its detection engine is able to identify many types of flood, fragmentation
and amplification attacks. The metrics can be exported into an InfluxDB engine
for visualisation. It is capable of interacting directly with BGP-enabled
equipment and even supports FlowSpec.
% ============================================================================ %
\part{Practical part}
\n{1}{Infrastructure Set-Up Description} \label{sec:infrastructure-set-up-description}
The testing was performed in a virtual lab comprising five virtual machines
(VMs) running on a KVM-enabled Fedora 34 host. Since the expectation was to
frequently tweak various system settings of the guests (VMs) as part of the
verification process, we decided to take the \emph{infrastructure as code}
approach. Every piece of infrastructure - down to the details of how many
virtual CPUs are allocated to a host, the disk size, the filesystem, etc. - is
declared as code and can be versioned and used to provision resources.
The industry-standard tool \texttt{Terraform} was chosen due to its broad
support of infrastructure and providers, great documentation, large user base
and the tool being open source.
For bootstrapping, \texttt{cloud-init} has been used, mainly because it
integrates with Terraform quite smoothly, works on many Linux distributions
and allows us to pre-set things such as copying over SSH public keys (so that
a secure connection can be established right after first boot), setting the VM
hostname, locale and timezone, adding users/groups, installing packages,
running commands and even creating arbitrary files, such as program
configurations.
The disk sizes of the VMs were determined by the size of their base image.
The VM naming convention is specified as follows: a prefix \texttt{r\_} for
routers and \texttt{h\_} for other hosts, in our case the attacker, victim and
defender machines.
\n{2}{VM specifications}
\tab{VM specifications}{tab:vmspecifications}{0.75}{ |c||rrrrc| }{
\hline
\bf{VM name} & \bf{vCPU(s)} & \bf{RAM} & \bf{disk space} & \bf{net ifaces} &
\bf{operating system} \\
\hline\hline
r\_upstream & 1 & 768MB & 4.3GB & {outer,DMZ} & Fedora 33 \\
\hline
r\_edge& 1 & 768MB & 4.3GB & {DMZ,inner} & Fedora 33 \\
\hline
h\_victim & 1 & 768MB & 11GB & {inner} & CentOS 8 \\
\hline
h\_attacker & 1 & 1GB & 5.37GB & {outer} & Fedora 34 \\
\hline
h\_defender & 1 & 1GB & 5.37GB & {DMZ} & Fedora 34 \\
\hline
}
The edge (our own) router and the upstream (our transit provider's) router are
each part of a different \it{AS}. They are directly connected and communicate
using BGP; the upstream router and the edge router are BGP peers.
We assume our upstream provider supports RTBH signalling.
In this scenario, the attacker is directly connected to our upstream router,
and while in reality there would probably be a greater distance between us and
them, this is fine for our simulation purposes, since the malicious traffic
will be cut before it reaches us.

If our upstream provider did not support RTBH signalling, we could still use a
scrubbing service in case we were attacked, but it is preferable to pick a
provider that has RTBH capabilities.
The cunning plan is to watch for traffic anomalies; when the attack comes,
detect it as soon as possible, trigger a reaction (announce a black hole),
wait for some time, withdraw the black hole if the attack is gone and
reintroduce it if it is still going on. Automatically, of course.
\begin{figure}[!hbt]
\centering
\begin{tikzpicture}
\fill[even odd rule,gray!30] circle (2.3) circle (2.2);
\arcarrow{2}{2.25}{2.5}{165}{200}{5}{red!50,
draw = red!50!black, very thick}{attack}
\arcarrow{2}{2.25}{2.5}{210}{260}{5}{blue!50,
draw = blue!50!black, very thick}{detection}
\arcarrow{2}{2.25}{2.5}{270}{320}{5}{green!50,
draw = green!50!black, very thick}{blackhole}
\arcarrow{2}{2.25}{2.5}{330}{460}{5}{blue!50,
draw = blue!50!black, very thick}{wait {\&} withdraw blackhole}
\arcarrow{2}{2.25}{2.5}{470}{515}{5}{blue!50,
draw = blue!50!black, very thick}{analysis}
\end{tikzpicture}
\caption{The Cunning Plan} \label{fig:cunning-plan}
\end{figure}
Initially, two approaches for setting up the infrastructure were considered.
While both proposed to use KVM and \texttt{Terraform} with the
\texttt{libvirt} provider \cite{libvirt-tf-provider}, the first one planned to
configure the VMs with \texttt{cloud-init}, which is compatible with any
GNU/Linux distribution; the finishing touches would be done using
\texttt{ansible}. The second one, on the other hand, considered a newer
technology - Fedora CoreOS, which uses a different paradigm where hosts are
only customised on initial boot via \texttt{ignition} (although the
configuration format used is called \it{Butane}, which uses YAML and is then
transpiled to Ignition's JSON) and run as configured, by default without a
separate install disk. This provides the much-needed system immutability
during runtime but also makes the system more difficult to configure.
Configuration changes \it{can} be done during runtime; however, the CoreOS
provisioning philosophy teaches that there is no need to deploy once and
endlessly configure (by hand or e.g. with \texttt{ansible}) when boot times
are short (Ignition runs before the userspace begins to boot) and the system
is as minimal as possible (container-like).
\begin{description}
\item[approach 0:]\
\begin{itemize}
\item KVM
\item \texttt{terraform} with \texttt{libvirt} provider
\item \texttt{cloud-init}
\item \texttt{ansible}
\end{itemize}
\end{description}
\begin{description}
\item[approach 1:]\
\begin{itemize}
\item KVM
\item \texttt{terraform}
\item \texttt{ignition}
\item Fedora CoreOS
\end{itemize}
\end{description}
We decided to go with approach 0, as we felt it was more appropriate for our
use case, which involves a relatively large amount of configuration.
\begin{description}
\item[VMs required:]\
\begin{itemize}
\item victim
\item router - inner
\item router - edge
\item attacker
\item defence machine
\end{itemize}
\end{description}
See tab.~\ref{tab:vmspecifications} for details.
To reiterate, simulating multiple physical devices performing different roles
(routing, attacking, playing the victim) in our attack-defence/mitigation
scenario has been achieved by creating a test-lab virtual infrastructure.\\
The tried-and-true, state-of-the-art Linux kernel-native virtualisation
solution has been chosen to tackle the hypervisor task - the KVM technology.
Testing has been performed on our personal laptop - a Dell Latitude 5480
machine equipped with a ULV dual-core Intel Core i5-6300U processor with
\texttt{mitigations=off}, 24GB (8+16) of RAM and a 512GB SATA SSD (TLC).
The host operating system was \texttt{Fedora\ 34}.
Both the \texttt{updates} and \texttt{updates-testing} repositories have been
enabled, which allowed us to directly use the latest (at the time) stable
Linux kernel Fedora had to offer without too much of a hassle, as of the time
of writing in version \texttt{5.11.20}.

The file system in use on the host was Btrfs on top of LVM (LUKS+LVM to be
precise), and a Btrfs subvolume has been created specifically for the libvirt
storage pool for better logical isolation (subvolumes are omitted during
snapshotting etc.; that way the lab does not end up in system snapshots).
Since all of the system images for our VMs have been downloaded in the QCOW2
format, the CoW (copy-on-write) feature of Btrfs has been turned off for the
subject subvolume, just as recommended in the Btrfs Sysadmin Guide
\cite{linuxbtrfs} for improved storage performance (and decreased flash wear).

Notably, the system has also been using the \texttt{nftables} backend of
\texttt{firewalld}, for which, luckily, \texttt{libvirt} was already prepared,
and the two played nicely together.
The whole infrastructure code resides in a git repository available at
\url{https://git.dotya.ml/mirre-bt/tf-libvirt}.
\n{2}{Terraform and Cloud-Init}
Thanks to the \texttt{libvirt} provider for Terraform, VMs could easily be
brought up and torn down. Terraform works by tracking the state of the
resources that together create the infrastructure, which can be described in
numerous ways. Every resource that is absent needs to be created: Terraform
uses a \emph{plan} of the desired infrastructure when it \emph{applies}
changes to the current \emph{state}. Every VM, every network and every piece
of backing storage is a resource that is managed for us, which is very
convenient, especially for testing.
The initial boot configuration has been performed by Cloud-Init
\cite{cloudinit}. It enabled us to configure users with SSH public keys to
secure further connections, install packages and create arbitrary files.
Scenario-role-specific configuration was made really easy thanks to this tool.
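For illustration, a heavily trimmed sketch of what a single VM described with
the \texttt{libvirt} provider might look like follows; the resource names,
image path and network name are placeholders and do not reproduce our actual
configuration verbatim:
\begin{verbatim}
resource "libvirt_volume" "victim_disk" {
  name   = "h_victim.qcow2"
  source = "images/centos8-base.qcow2"  # placeholder base image
}

resource "libvirt_domain" "victim" {
  name   = "h_victim"
  vcpu   = 1
  memory = 768  # MiB, per the VM specification table

  disk {
    volume_id = libvirt_volume.victim_disk.id
  }

  network_interface {
    network_name = "inner"  # placeholder libvirt network name
  }
}
\end{verbatim}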
\n{2}{Ansible}
What has not been prepared by either Terraform or Cloud-Init was left to
Ansible. It was used to deploy configurations of GoBGPd and FastNetMon, as well
as to install and enable services.
\n{1}{Mitigation tools set-up} \label{sec:mitigation-tools-set-up}
\n{2}{FastNetMon}
An open-source DDoS mitigation toolkit named \texttt{fastnetmon} was picked to
serve as the attack detection tool. It supports analysing traffic from
multiple different exporter types, including NetFlow (v5 and v9), sFlow and
port mirrors.
We tried the installer from the project's website; however, the installation
did not appear to succeed. Therefore, we decided to build it from source. That
is when we uncovered some curious information.

The project's master branch on GitHub, where it is hosted, appears to have
last been modified on 22 June 2016 \cite{fnm-wayback} \cite{fastnetmonorig}.
Since we knew about some propagation activities \cite{fnm-freebsd-wayback} and
we found the repository marked as "updated" on 17 Feb 2021 (that is what
GitHub shows when e.g.\ a description is updated) \cite{fnm-search-wayback},
at the same time as we saw the last commit pushed with a 2016 date, we went
digging through the project's forks to find out what had happened to the
project. Forks perfectly preserve it, from the point in time when the "fork"
(essentially a clone) was created, up until the latest updates that have
either been integrated from upstream or changed by the owner of the fork.
While some forks were abandoned a long time ago, several others showed the
same tree with the exact same commits, referring to a common history.
Furthermore, the one thing giving away what apparently happened (though not
why) with almost certainty is the Pull Requests tab. It showed some 164 closed
PRs, some of them \it{merged} as late as 23 Dec 2020 \cite{fnm-pulls-wayback}.
That is, the history \it{has} indeed been overwritten, and we have no
information as to why; that was also the reason why we chose to use one of the
most recently updated forks \cite{fnm-fork-wayback} \cite{fastnetmonfork} as a
base for our work \cite{fastnetmonng} instead of the original upstream.
The claim that the history has been overwritten is also supported by an
earlier grab (snapshot) of the project by the Internet Archive's Wayback
Machine \cite{fnm-early-wayback}; the remaining snapshots have been triggered
by us to help support the arguments in case anything changed.

As for the setup itself, several changes had to be made to the project to make
it even compile on recent hosts (Fedora 33/34, Arch Linux). A Drone CI build
job has been set up to help make sure nothing breaks with our changes.
FastNetMon needs quite a lot of dependencies for its full functionality, and
the project's way to go about it was to use custom versions of the third-party
libraries to link against, all downloaded using the Perl install script
mentioned earlier.
While that might seem like a sound idea, we feel that in the long run it is
always better to use a reasonably recent distribution with sizable
repositories and an active community that, combined, are able to provide for
most developer needs.
The only major overhaul that had to be done was patching the
\texttt{CMakeLists.txt} file to instruct CMake/Make to look for headers and
(dynamic) libraries (\texttt{.so} files) in system locations instead of the
download locations of the install script. Further, to accommodate a newer
version of \texttt{nDPI} (Open Source Deep Packet Inspection Software Toolkit)
that aids FastNetMon with traffic sorting (if enabled), an interface and the
\texttt{fast\_dpi.h} header had to be updated. The history of all changes can
be found in the repository at
\url{https://git.dotya.ml/wanderer/fastnetmon-ng}.
For building and deployment to the defender host, an Ansible role has been
created. It makes sure that FastNetMon is built correctly, installed and its
\texttt{systemd} service is \it{enabled} (at boot time) and started.
\n{2}{GoBGPd}
We attempted to peer the two router VMs; however, we were not able to pin down
the proper configuration that would allow us to define a community tied to a
black-holing action.
\n{2}{Netflow}
FastNetMon supports collecting NetFlow metrics; therefore, the edge router has
been configured to observe the upstream interface and to send traffic
information to FastNetMon's collector. The Ansible role
\url{github.com/juju4/ansible-fprobe} has been edited and used to deploy the
configuration to the edge router.
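Under the hood this boils down to an \texttt{fprobe} exporter running on the
edge router; a rough sketch of an equivalent manual invocation (the interface
name and collector address are placeholders) would be:
\begin{verbatim}
# export NetFlow records for traffic seen on the upstream-facing
# interface to the collector on the defender host, port 2055
fprobe -i eth0 192.0.2.10:2055
\end{verbatim}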
\n{1}{Attack tools set-up} \label{sec:attack-tools-set-up}
When considering the way to simulate an attack locally, we were not primarily
looking for a tool which would enable a distributed (the first "D" of DDoS)
attack; instead, the objective was mainly to congest the weakest link, which
happens to live inside our network (that is why we are concerned in the first
place).
\n{2}{slowloris.py}
With this Python script there are a couple of flags to tweak the intended
behaviour. The \texttt{-p} flag sets the port we want to target,
\texttt{--sleeptime} allows us to rate-limit the tool so that we open as many
connections as possible without getting banned, and the \texttt{-s} flag
specifies the number of sockets we want to open.
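Putting the flags together, an illustrative invocation against the victim (the
address and values are placeholders) could look like this:
\begin{verbatim}
python3 slowloris.py 192.0.2.20 -p 80 -s 500 --sleeptime 15
\end{verbatim}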
\n{2}{iperf3}
This tool works in a server-client mode. To receive traffic, it runs in
\emph{server mode} and listens on port 5201 by default, although the traffic
direction, the port and a multitude of other parameters can be configured.
To listen on the victim server, we simply ran \texttt{iperf3 -s -P 8} to
receive 8 parallel streams.
Then on the client - the attacker - we entered \texttt{iperf3 -c \{server
ip\} -P 8} to send 8 parallel streams to the victim. This filled the available
link bandwidth but, unfortunately, we have not observed a FastNetMon ban
action even though the threshold had been crossed.
\n{1}{Performing an attack} \label{sec:performing-an-attack}
As mentioned previously, the attack was planned to be performed in the
controlled environment of a virtual lab, with practically none of the natural
traffic occurring that would generally be present on a reasonably large ISP
network, and thus with essentially no load on the network equipment/virtual
devices. However, this should not have had any major impact on the results of
the subject attack and our chosen way of mitigating it.
As is the case when using e.g. a selective blackhole technique, the suspect
traffic first has to be identified as highly abnormal/malicious, either by
analysing network metrics over a period of time as collected by sFlow or
Netflow protocols, or by direct packet capture and inspection aided by tools
such as nDPI.
The analysis and detection mechanism in our scenario is left to FastNetMon and
so is the reaction (mitigation) logic.
With the first attack we performed, we basically attempted to naively overflow
the conntrack table of our server host.
We were not able to exhaust connections on the server using
\texttt{slowloris.py}, presumably because the inactive connections were
quickly being closed. While the number of used sockets steadily grew, after
about 5000 the tool was not able to open any new connections and the server
kept working fine.
The key was to set the
\texttt{net.netfilter.nf\_conntrack\_tcp\_timeout\_time\_wait} parameter to a
lower value, such as 10 or 5, and the
\texttt{net.netfilter.nf\_conntrack\_tcp\_timeout\_established} parameter to
somewhere between 300 and 1200 (we chose 600). Both values are in seconds and
they are set either in the global \emph{sysctl} file
(\texttt{/etc/sysctl.conf}) or in an arbitrary complementary file inside the
sysctl include directory at \texttt{/etc/sysctl.d/}. The \emph{TIME\_WAIT}
timeout value affects how long connections in the TIME\_WAIT TCP state (see
\ref{decreased-timewait}) are kept before they are completely cleared and stop
consuming resources. The ESTABLISHED timeout sets how long \emph{established}
connections that show no activity are kept in the ESTABLISHED state before
they are evicted.
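For completeness, a drop-in file capturing the values we chose might look as
follows (the file name is arbitrary):
\begin{verbatim}
# /etc/sysctl.d/90-conntrack-timeouts.conf
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 10
net.netfilter.nf_conntrack_tcp_timeout_established = 600
\end{verbatim}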
As for running FastNetMon attack traffic detection, we experienced unexpected
issues in the form of FastNetMon incorrectly reporting 0 pps for both outbound
and inbound traffic. We were not able to uncover the root cause of the issue,
which might very well be attributed to a configuration error.
Due to this, and due to the fact that we could not establish BGP peering in
our virtual lab, neither the other attack nor the mitigation could be properly
performed.
% ============================================================================ %
\nn{Conclusion}
The recent past has seen an immense increase in the number of DoS attacks of
all kinds. A large part of these attacks flood service resources and
ultimately end up blocking or delaying responses to legitimate user requests.
As we touched on in the beginning, the motivations for these attacks can range
from a business model with extortion plans, through hacktivism, to simply
somebody choosing the wrong place to have fun.

The goal of our work was to describe some of the popular types of DoS attacks,
including DDoS, and the attack methods, techniques and tools most popular and
most widely used among attackers to annoy Internet users, harm businesses and
worry Internet service providers with small to medium capacities. We dived a
little into the workings of several potential attack vectors, but did not stay
there.
Further in the theoretical part, we also outlined various mitigation methods
readily available to network operators and end users alike, along with their
scope and reach. Several pros and cons of black-holing, selective
black-holing, scrubbing and rate-limiting were considered in
\autoref{sec:mitigation-methods} - Mitigation methods. Next, we looked at some
of the concrete tools that aid in mitigating DoS attacks.

In the practical part, we set out to build a virtual lab using tools such as
libvirt, Terraform, Cloud-Init and Ansible on top of KVM, in an automated
manner, by applying the \emph{infrastructure as code} principles. The virtual
lab was running on a Fedora 34 host; each virtual machine has been provisioned
using Terraform, pre-configured using Cloud-Init and further configured with
Ansible after the initial configuration had finished.
We explored setting up both the mitigation tools used to protect Internet
networks and the tools used by adversaries.

Finally, we attempted to perform several attacks in our controlled virtual
environment, as described in the practical part of this work. The attempts
were partially successful in that our proposed mitigation methods showed that
a certain kind of attack \emph{can} easily be mitigated and the unhappy
consequences averted.\\
Sadly, in the next part of the attack simulation we arrived at unexpected
results: we were not able to fully simulate the black hole propagation among
ASes due to configuration inconsistencies, presumably on multiple levels.
This, as well as improving and potentially reworking the virtual lab, remains
a challenge for us for the future, and we believe performing these attack
simulations could aid us in better understanding the threats and preparing for
what we \emph{surely are} going to face.
% ============================================================================ %