Authors: Scott A Crosby and Dan S Wallach
Related work includes this, by M. D. McIlroy, which we were unaware of when writing the paper which describes how to attack quicksort by a carefully constructed input to make it experience O(n^2) time instead of O(n log n).
Tim Peters (on the python-dev mailing list) has stated "Many programs use (fixed) regexps to parse input, and it's often possible to construct inputs that take horribly long times to match, or, especially, to fail to match. Blowing the hardware stack (provoking a memory fault) is also usually easy. The art of writing robust regexps for use with a backtracking engine is obscure and difficult (see Friedl's "Mastering Regular Expressions" (O'Reilly) for a practical intro to the topic), and regexps are ubiquitous now. . He gives a contrived demonstration of this by using the regular expression: (x+x+)+y and when we search for strings of the form xxx...xxxy which do match and xxx...xxx which don't match, he shows exponential behavior in python's regular expression matcher in the second case.
Related work also includes the recent advisory on the linux route-table cache by Florian Weimer. A patch to fix this and similar issues in other places of the linux networking stack is being worked on by David Miller.
Related work on thwarting port scan detectors via hash collisions was done by Solar Designer in Phrack. Issue 53 July 8, 1998, article 13 of 15
Tim Peters and Solar Designer have claimed that it may be possible to force collisions even in the case of keyed cryptographic or universal hashing, by statistically attempting to identify, due to detectable processing latency, whether two items collide or not. If this attack is possible, it would probably require a signifigant number of probes, perhaps greater than the cost of the attack itself, but would have the potential to attack any hash table that didn't have its key altered regularily. It is unknown if this attack is practical.
M-J. Dominus (mjd-perl-badhash@plover.com) has a demonstration program in 1997 titled 'When Hashes Go Wrong', that shows a worst-case attack of 10,000 collisions in the old Perl 5.6.1 hash function.
Related work also includes Fast Content-Based Packet Handling for Intrusion Detectin by Mike Fisk and George Varghese which has a subsection that discusses 'Algorithmic Performance Attacks' where an attacker may intentionally provide inputs that will knowingly cause the worst-case performance for a network intrusion device.
Related work also includes: Andreas Gal, Christian W. Probst, and Michael Franz. A denial of service attack on the Java bytecode verifier. Technical Report 03-23, University of California, Irvine, School of Information and Computer Science, November 2003.
Each file contains 10,000 strings, Notice the difference between using the correct 'attack' file and the incorrect one. Also notice the behavior on subsets of these files of shorter length.
These attacks are not known to be occuring in the wild. However, they are simple to implement, attack files take under an hour to generate. From the tests on CGI scripts below, the impact is noticable, on the order of a few kilobytes of data to cause a second of CPU time to be lost. Also, at least one independent implementation capable of generating all of the demonstration files below is known to exist. It was emailled to me approximately 12 hours after the first public posting of the vulnerability reports.
On a thread on Perl Monks, user iburrell reports that 10,000 strings sent to a CGI POST request consumed 3 minutes of CPU time. Which perl version they attacked is unknown.
On a thread on Perl Monks, user iburrell reports that 10,000 strings sent to a CGI POST request consumed 3 minutes of CPU time. Which perl version they attacked is unknown.
Jeff Epler on python-dev has benchmarked a sample attack on CGI scripts by way of CGI POST form handling. Although approximately 1MB of data was sent to the server, it consumed 240 seconds.
Although we are unaware of similar attacks on other CGI libraries, it is likely that similar vulnerabilities exist.
Also, the uh library I based my code on UHASH/UMAC has bugs on non-intel architectures.
David Miller in the linux networking stack has chosen a keyed variant of Bob Jenkin's Hash for performance reasons. I do not know of any proof of immunity, but it is probably sufficient to thwart this attack.
Tim Peters Changes: Sat May 31 02:26:28 CDT 2003. Classifications catagories
downgraded for earlier overenthusiasm and underexperience, and
tiredness. My apologies to the victims. Origional advisories kept
unchanged.
Changes: Sat May 31 11:00:40 CDT 2003. Everyone wants concrete
benchmarks before believing their essentially identical hash table is
also vulnerable. Inserted numbers into the bullet points above.
Changes: Sat May 31 18:22:45 CDT 2003. Added in links to
discussions on mailing lists.
Changes: Sat May 31 18:44:56 CDT 2003. Added in offline bro attack
file. Elaborated description of Jenkin's hash to include a link.
Changes: Sat May 31 19:02:04 CDT 2003. Minor cleanups.
Changes: Sat May 31 19:37:25 CDT 2003. Slashdotting. Make attack
files gzipped. Also made the destionation points of the slashdot links
point to holder pages. (Why oh why didn't they link to this page
directly, which has the info people want, instead of the actual
paper?)
Changes: Sat May 31 20:07:44 CDT 2003. Added link to Bro.
Changes: Sun Jun 1 02:15:21 CDT 2003. The document sent to DJ
Bernstein (which appears to have not gotten through to him in any
case) should not have included the standard boilerplate. An
embarassing mistake, It adds little and I've withdrawn it.
Changes: Sun Jun 1 11:33:08 CDT 2003. Added line about attacking
any hash table based only on its processing latency. Also put in link
about 'WHen Bad Hashes Go Wrong'.
Changes: Sun Jun 1 17:58:55 CDT 2003. Tossed in attack on
java. Also added in reference to the CGI attack on python.
Changes: Tue Jun 3 13:45:07 CDT 2003. Put in section about attacks
known to occur in the wild. Paragrph on broken hash functions.
Changes: Mon Jun 16 13:01:02 CDT 2003. Corrected description of
serialization and hashsets. Hashsets are correctly serialized. Error
pointed out by David Chase.
Changes: Sun Jan 11 14:23:20 CST 2004. Updated information on the
source code release.
Changes: Wed Jun 2 02:51:06 CDT 2004. Retracted claim about
djbdns. Benchmarked, the attack would at most double the CPU time.
Changes: Tue Aug 31 17:47:09 CDT 2004. Added on related work by Mike Fisk.
Vulnerability Reports
The impact of this attack varies from inconsequential to high,
depending on the appllication, its configuration, and how it is used.
Submitted Bugtraq post
High
Bro 0.8a20
Confirmed vulnerable. Bro is an Intrusion
Detection System by Vern Paxson. Please see the paper. A patch
exists and a fixed release will be out soon. Bro does not crash, but
drop rates of 70% are experienced with an attack traffic as low as
16kbits/sec.
Medium-Low
These are scripting langauges or infrastructure libraries that are
vulnerable, but the impact of attack depends on how they are used. In
some cases, they may be important vulnerabilities, in other cases,
irrelevant. It depends on the applilcation and configuration. Please
see our analysis of Perl in the paper, the same applies to all of
these.
Python 2.3b1
With 10,000 inputs, python goes from taking .2 seconds to 20
seconds. Aspects of this issue have been discussed on
python-dev
TCL 8.4.3
Aspects of this have been discussed in comp.lang.tcl under the subject
'hash attack'.
Perl 5.6.1 and 5.8.0
With 10,000 inputs, perl goes from taking .14 seconds to 52 seconds.
glib 2.2.1
With 10,000 inputs, glib goes from taking .54 seconds to 11
seconds. Aspects of this have been discussed on
gtk-devel
Mozilla 1.3.1
These are appear to be subject to a weak attack.
Low
Linux 2.4.20 dcache
Confirmed The degradation requires the ability to construct
files and is about 200x, but appears small in absolute terms. Aspects
of this have been discussed on
linux-kernel
Retracted
DJBDNS 1.05
The DNS cache has an explicit limit to stop a degradation from being
more severe than 100x on the hash table. NEW: Benchmarking the cost of the hashtable, the degradation would be at most 2x for the overall application
For questions or comments email me at Scott A. Crosby
<scrosby@cs.rice.edu>
Last modified: Tue Aug 31 17:47:40 CDT 2004