Denial of Service via Algorithmic Complexity Attacks

Authors: Scott A Crosby and Dan S Wallach

Abstract

We present a new class of low-bandwidth denial of service attacks that exploit algorithmic deficiencies in many common applications' data structures. Frequently used data structures have ``average-case'' expected running time that's far more efficient than the worst case. For example, both binary trees and hash tables can degenerate to linked lists with carefully chosen input. We show how an attacker can effectively compute such input, and we demonstrate attacks against the hash table implementations in two versions of Perl, the Squid web proxy, and the Bro intrusion detection system. Using bandwidth less than a typical dialup modem, we can bring a dedicated Bro server to its knees; after six minutes of carefully chosen packets, our Bro server was dropping as much as 71% of its traffic and consuming all of its CPU. We show how modern universal hashing techniques can yield performance comparable to commonplace hash functions while being provably secure against these attacks.

Accepted and to be presented at Usenix Security 2003, this August

PDF version for downloading or printing

HTML version for online reading

Slides for the USENIX Security 2003 talk and the Regular Expression WIP talk

Related and future work

Related work includes this, by M. D. McIlroy, which we were unaware of when writing the paper which describes how to attack quicksort by a carefully constructed input to make it experience O(n^2) time instead of O(n log n).

Tim Peters (on the python-dev mailing list) has stated "Many programs use (fixed) regexps to parse input, and it's often possible to construct inputs that take horribly long times to match, or, especially, to fail to match. Blowing the hardware stack (provoking a memory fault) is also usually easy. The art of writing robust regexps for use with a backtracking engine is obscure and difficult (see Friedl's "Mastering Regular Expressions" (O'Reilly) for a practical intro to the topic), and regexps are ubiquitous now. . He gives a contrived demonstration of this by using the regular expression: (x+x+)+y and when we search for strings of the form xxx...xxxy which do match and xxx...xxx which don't match, he shows exponential behavior in python's regular expression matcher in the second case.

Related work also includes the recent advisory on the linux route-table cache by Florian Weimer. A patch to fix this and similar issues in other places of the linux networking stack is being worked on by David Miller.

Related work on thwarting port scan detectors via hash collisions was done by Solar Designer in Phrack. Issue 53 July 8, 1998, article 13 of 15

Tim Peters and Solar Designer have claimed that it may be possible to force collisions even in the case of keyed cryptographic or universal hashing, by statistically attempting to identify, due to detectable processing latency, whether two items collide or not. If this attack is possible, it would probably require a signifigant number of probes, perhaps greater than the cost of the attack itself, but would have the potential to attack any hash table that didn't have its key altered regularily. It is unknown if this attack is practical.

M-J. Dominus (mjd-perl-badhash@plover.com) has a demonstration program in 1997 titled 'When Hashes Go Wrong', that shows a worst-case attack of 10,000 collisions in the old Perl 5.6.1 hash function.

Related work also includes Fast Content-Based Packet Handling for Intrusion Detectin by Mike Fisk and George Varghese which has a subsection that discusses 'Algorithmic Performance Attacks' where an attacker may intentionally provide inputs that will knowingly cause the worst-case performance for a network intrusion device.

Related work also includes: Andreas Gal, Christian W. Probst, and Michael Franz. A denial of service attack on the Java bytecode verifier. Technical Report 03-23, University of California, Irvine, School of Information and Computer Science, November 2003.

Demonstration of vulnerability

We include several demonstration files that attack the following applications. We include the version numbers which have been confirmed vulnerable. Other versions are untested.

Each file contains 10,000 strings, Notice the difference between using the correct 'attack' file and the incorrect one. Also notice the behavior on subsets of these files of shorter length.

These attacks are not known to be occuring in the wild. However, they are simple to implement, attack files take under an hour to generate. From the tests on CGI scripts below, the impact is noticable, on the order of a few kilobytes of data to cause a second of CPU time to be lost. Also, at least one independent implementation capable of generating all of the demonstration files below is known to exist. It was emailled to me approximately 12 hours after the first public posting of the vulnerability reports.

Perl 5.6.1

Insert all 10,000 strings into an associative array. See our simple demonstration victim program

On a thread on Perl Monks, user iburrell reports that 10,000 strings sent to a CGI POST request consumed 3 minutes of CPU time. Which perl version they attacked is unknown.

Perl 5.8.0

Insert all 10,000 strings into an associative array. See our simple demonstration victim program

On a thread on Perl Monks, user iburrell reports that 10,000 strings sent to a CGI POST request consumed 3 minutes of CPU time. Which perl version they attacked is unknown.

Linux dcache 2.4.20

Make a directory with filenames equal to those in this file. I recommend using a cut-down subset, quadratic behavior seems to cease at only a few thousand entries.

Bro IDS 0.8a20

Although the attack is rather severe, this file is configured to attack a destination address of 10.11.11.1 (which is what we used as a target during our research), so it is not useful on the open internet. To test, do "bro -r bro-attack.tcpdump POLICY"

Python 2.3b1

This corrected file is from a reimplementation of the attack by "Alexis S. L. Carvalho" , and demonstrates the vulnerability. See his simple demonstration victim program .

Jeff Epler on python-dev has benchmarked a sample attack on CGI scripts by way of CGI POST form handling. Although approximately 1MB of data was sent to the server, it consumed 240 seconds.

Although we are unaware of similar attacks on other CGI libraries, it is likely that similar vulnerabilities exist.

GLIB 2.2.1

A demonstration victim program, also by Alexis S. L. Carvalho shows a degradation on this file.

Java 1.4.1 (build 1.4.1-b21 on linux)

Insert all 10,000 strings into a HashSet. Performance changes from 1.5 seconds to 20+ seconds. See our for a simple demonstration victim program Unfortunately, the Java hash function for strings is defined as part of the Java language description. Thus, fixing java would appear to require changing this specification. Also, java supports serialization, which means that custom serialization code must be used that is hash-function invariant (as is done with HashSet). A universal hash function cannot just be plugged in; how will the appropriate key get stored?

Source Code Release

Universal hashing library source release Placeholder until polished source release (expected by ??????)

A source release of a universal hashing library that is not vulnerable to these attacks. This library has been put on the back burner because although mulitplication is only a minor slowdown on intel CPU's, David Miller has shown that on other architectures it is much more of a speed hit.

Also, the uh library I based my code on UHASH/UMAC has bugs on non-intel architectures.

Universal hashing library CVS checkout Raw, unpolished, but suitable for testing and careful use.

A snapshot of my development directory of the universal hashing library. As of Sat May 31 11:47:36 CDT 2003 , it compiles and passes its self tester on linux x86. I believe that it is underdocumented but mostly correct at this point. Please download, build, and test and see if it passes its self tests. Patches welcome. It is verified to work with GCC and G++ on X86 and partially on SPARC. Part of the self-test checks all possible alignments, and SPARC has certain pointer-alignment requirements. Is this a (fault in the library or self tester?) UHASH thus fails some robustness tests, but passes test vectors (with GCC and with the cc-5.0 with debugging set off). It is not suitable as a drop-in library, but it is ready for testing and auditing.

Other solutions

I highly reccomend not using ad-hoc colutions or simple keying of the existing hash function. The security landscape is littered with broken cryptographic operations and protocols. We suggest that if you do not use universal hashing, you use a trusted reliable algorithm borrowed from someone else rather than reimplement something likely to be broken. I wouldn't invent my own hash function; I advise the same of almost everyone else.

David Miller in the linux networking stack has chosen a keyed variant of Bob Jenkin's Hash for performance reasons. I do not know of any proof of immunity, but it is probably sufficient to thwart this attack.

Tim Peters gives a reference to Judy Trees as an alternative datastructure. They appears to be a type of a cache-aware trie. The website states "Judy arrays are both speed- and memory-efficient, with no tuning or configuration required, across a wide range of index set types (sequential, periodic, clustered, random). Judy's speed and memory usage are typically better than other data storage models such as skiplists, binary trees, b-trees, or even hashing, and improves with very large data sets."

Vulnerability Reports

The impact of this attack varies from inconsequential to high, depending on the appllication, its configuration, and how it is used.

Submitted Bugtraq post

High

Bro 0.8a20

Confirmed vulnerable. Bro is an Intrusion Detection System by Vern Paxson. Please see the paper. A patch exists and a fixed release will be out soon. Bro does not crash, but drop rates of 70% are experienced with an attack traffic as low as 16kbits/sec.

Medium-Low

These are scripting langauges or infrastructure libraries that are vulnerable, but the impact of attack depends on how they are used. In some cases, they may be important vulnerabilities, in other cases, irrelevant. It depends on the applilcation and configuration. Please see our analysis of Perl in the paper, the same applies to all of these.

Python 2.3b1

With 10,000 inputs, python goes from taking .2 seconds to 20 seconds. Aspects of this issue have been discussed on python-dev

TCL 8.4.3

Aspects of this have been discussed in comp.lang.tcl under the subject 'hash attack'.

Perl 5.6.1 and 5.8.0

With 10,000 inputs, perl goes from taking .14 seconds to 52 seconds.

glib 2.2.1

With 10,000 inputs, glib goes from taking .54 seconds to 11 seconds. Aspects of this have been discussed on gtk-devel

Mozilla 1.3.1

These are appear to be subject to a weak attack.

Low

Linux 2.4.20 dcache

Confirmed The degradation requires the ability to construct files and is about 200x, but appears small in absolute terms. Aspects of this have been discussed on linux-kernel

Retracted

DJBDNS 1.05

The DNS cache has an explicit limit to stop a degradation from being more severe than 100x on the hash table. NEW: Benchmarking the cost of the hashtable, the degradation would be at most 2x for the overall application
For questions or comments email me at Scott A. Crosby <scrosby@cs.rice.edu>

Changes: Sat May 31 02:26:28 CDT 2003. Classifications catagories downgraded for earlier overenthusiasm and underexperience, and tiredness. My apologies to the victims. Origional advisories kept unchanged.

Changes: Sat May 31 11:00:40 CDT 2003. Everyone wants concrete benchmarks before believing their essentially identical hash table is also vulnerable. Inserted numbers into the bullet points above.

Changes: Sat May 31 18:22:45 CDT 2003. Added in links to discussions on mailing lists.

Changes: Sat May 31 18:44:56 CDT 2003. Added in offline bro attack file. Elaborated description of Jenkin's hash to include a link.

Changes: Sat May 31 19:02:04 CDT 2003. Minor cleanups.

Changes: Sat May 31 19:37:25 CDT 2003. Slashdotting. Make attack files gzipped. Also made the destionation points of the slashdot links point to holder pages. (Why oh why didn't they link to this page directly, which has the info people want, instead of the actual paper?)

Changes: Sat May 31 20:07:44 CDT 2003. Added link to Bro.

Changes: Sun Jun 1 02:15:21 CDT 2003. The document sent to DJ Bernstein (which appears to have not gotten through to him in any case) should not have included the standard boilerplate. An embarassing mistake, It adds little and I've withdrawn it.

Changes: Sun Jun 1 11:33:08 CDT 2003. Added line about attacking any hash table based only on its processing latency. Also put in link about 'WHen Bad Hashes Go Wrong'.

Changes: Sun Jun 1 17:58:55 CDT 2003. Tossed in attack on java. Also added in reference to the CGI attack on python.

Changes: Tue Jun 3 13:45:07 CDT 2003. Put in section about attacks known to occur in the wild. Paragrph on broken hash functions.

Changes: Mon Jun 16 13:01:02 CDT 2003. Corrected description of serialization and hashsets. Hashsets are correctly serialized. Error pointed out by David Chase.

Changes: Sun Jan 11 14:23:20 CST 2004. Updated information on the source code release.

Changes: Wed Jun 2 02:51:06 CDT 2004. Retracted claim about djbdns. Benchmarked, the attack would at most double the CPU time.

Changes: Tue Aug 31 17:47:09 CDT 2004. Added on related work by Mike Fisk.


Last modified: Tue Aug 31 17:47:40 CDT 2004