tl;dr: This vulnerability is quite serious, but it doesn’t affect
the Tor network any more than it affects the rest of the internet.
In particular, the Tor-specific attacks mentioned in the paper will
not work as described.
Recently,
an excellent paper,
entitled “Off-Path TCP Exploits: Global Rate Limit Considered
Dangerous,” was published by Yue Cao, Zhiyun Qian, Zhongjie Wang, Tuan
Dao, Srikanth V. Krishnamurthy, and Lisa M. Marvel at USENIX Security
2016.
The paper describes
the 2012 modifications of RFC5961
to the specification of the Transmission Control Protocol (TCP), the
latter of which is
used to transport roughly 90% of our data
across the internet. The modification was meant to protect against
TCP “blind in-window” attacks.
When a TCP packet is sent, the sender and receiver both know a number,
called the sequence number, that this packet should have. If the
sequence number is not correct, various (complicated, boring) things
may happen, but the important part is that neither the sender nor the
receiver actually believes that this is a valid packet. Instead, they
assume something went wrong somehow, or that an active attacker is
attempting to inject packets into their communication stream. The
term blind simply means that an attacker is unable to directly
observe the packets going between the sender and receiver, but is
usually instead trying to use some side-channel to determine this
information. There’s another part of the TCP specification which
describes windowing — which simply means (did I mention that TCP is
very complicated and boring…) that the sequence number was “correct
enough” — that is, that the sequence number was within the right
range. Specification nerds have long argued over what “correct
enough” means, because apparently they find this topic absolutely
riveting.
The fix to the TCP blind in-window attack was to specify that, under
certain conditions, if the TCP sequence number doesn’t match what was
expected, the receiver of this messed up packet should send a
“challenge” ACK to the sender. Depending on the type of
messed-up-ness, the sender and receiver do one of a number of little
dances with each other, in the special way that TCP is so fond of
doing. When one party sends a challenge ACK, they increment a counter
stored in a global variable which is shared across all TCP
connections. This global variable is reset to 0 once per second, and
it has a maximum value of 100, i.e. no more than 100 challenge ACKs
will be sent per second (for all connections combined). If it wasn’t
obvious from the title of the paper, global variables (across
programming languages, frameworks, and contexts) are commonly known to
be a very bad, no good, horrible idea.
The attack described in the paper is elegant. In terms of its impact,
96.6% of the Alexa top one million
are running Linux kernels, and hence are likely vulnerable. The
previously described global ACK counter enables various side-channels
across TCP connections, meaning that a blind attacker can determine
information about:
- whether Alice and Bob are currently communicating over TCP,
- what the correct TCP sequence number is, and
- what the range of the valid window is.
The attacker does this by sending various crafted packets to the
receiver (i.e. via a side-channel) while the sender is simultaneously
sending valid packets to the receiver. The combined state of the
attacker’s and the sender’s effects upon the global counter, for each
of the above pieces of information, can be determined by whether the
attacker receives either 99 or 100 ACKs in response:
The authors go on to claim the attack can be used to influence a Tor
user’s path through the network. However, the authors seem to have a
misunderstanding regarding how Tor’s path selection algorithm functions.
Their idea is summarised in the last paragraph of §7.2 of the paper (emphasis mine):
In general, we believe that a DoS attack against Tor connections can
have a devastating impact on both the availability of the service as
a whole and the privacy guarantees that it can provide. The default
policy in Tor is that if a connection is down between two relay
nodes, say a middle relay and an exit relay, the middle relay will
pick a different exit relay to establish the next connection. If
an attacker can dictate which connections are down (via reset
attacks), then the attacker can potentially force the use of certain
exit relays.
This is is technically incorrect. The way Tor’s path selection
algorithm actually works — when a connection fails — is that the
client forgets the path of that circuit entirely, and goes back to
step #1 of the algorithm, effectively choosing an entirely new path
without any memory of the path chosen before. Since the selection of
the nodes in this new path (and in fact, any path) is dependent on
their bandwidth weight from the consensus, the client has just as much
probability to select the same exit as they did the last time.
Therefore, to use this attack to “funnel” (as the authors describe)
Tor users into using a particular exit node is of equal difficulty —
in terms of bandwidth of the nodes you would need to run — to
conducting a Sybil attack on the whole network.
Although, with a high-bandwidth exit in a sybil attack, the attacker
has a high (and importantly, to the attack’s benefit, independent)
probability that an exit it controls will get picked by the client.
Whereas with this attack, the bandwidth weighting is likely
detrimental to pulling off the attack, since the exits you’re
injecting RSTs into still have independently high probabilities of
being chosen again. In other words, knocking nodes out of the
network doesn’t do anything to change their probability of being
chosen, it merely makes them unavailable and thus only amounts to a
DoS attack, not a path bias attack.
While the attack on Tor — as stated in the paper — does not work,
the attack itself is impressive, and we encourage these (and other!)
researchers to think of ways the attack might apply to Tor (and other networks).
Their attack does work as a general denial-of-service against not
just Tor relays, but literally against anything running Linux.
The
accepted Linux kernel patch
solves the issue, and does so by randomising the time window that the
global variable applies to.