Comp527: Final Project Ideas

You are strongly encouraged to come up with your own ideas, of course, but here are some ideas to get you started thinking about your final projects. Some of these ideas are fairly straightforward implementation projects. Others are open research problems. You're limited only by your ambition and your free time. Don't be scared of the harder problems, but if you take one on, your proposal should be focused on a narrow aspect of the problem. Start off narrow and easy and leave room to expand your work if you have time.

Operating Systems

Linux Security Modules: A good way of approaching ACLs, MLS, or whatnot would be to do it in the context of the Linux kernel's new security architecture, which is supposed to make it easier to add these kinds of patches to the kernel. You might consider studying adding special purpose modules that better support the kinds of privilege separation policies being implemented by new e-mail transport agents, SSH daemons, and so forth.
Firewalls: Add firewall-style support to an operating system. You could try developing application-level gateways for several of the common protocols. You could also look at building an efficient router. Rather than just hacking away, you might also look at the software engineering aspect and look at some kind of firewall architecture that has inherent strengths against attacks on any of its protocols. As above, you might look at building low-level OS mechanisms that better support the design and implementation of existing firewall software. There have been several system call interposition systems built over the years, but these are often vulnerable to time-of-check-to-time-of-use attacks. Maybe you could look at more radical architectures like using multiple privilege rings or other tricks not available in traditional monolithic kernel designs.
Key management: MacOS X has a cool feature called the "KeyChain" which can store all your other cryptographic keys in a single box. Plan9 and Kerberos have similar features. Can you design a general-purpose OS mechanism for handling all these different forms of key storage? Maybe you could look at integrating this with smartcards for external key storage.

Peer-to-peer networking

Time synchronization: In many distributed systems and cryptographic protocols, the clock is used to gain a wide variety of valuable properties. However, globally synchronized clocks are not necessarily a given. There's a protocol called NTP (the network time protocol), which is widely used on the Internet, and it's even built into the latest Windows XP and MacOS X. While some newer extensions add cryptography to the system, it's still fundamentally a top-down system, where more trusted servers help set the clocks on less trusted ones. You could investigate how, with a p2p network where some nodes know the "correct" time (perhaps via GPS or atomic-clock radios), and other nodes are willing to lie about the time, that everybody in the network can end up, somehow, synchronized. Note that this synchronization can take several potential forms. For some applications, it's more important for two computers to agree on the precise length of a second. For other applications, it's more important for a set of computers to agree to their own version of the time, whether or not it's connected to the real world.
Network positioning: GNP (and similar systems) try to map hosts on the Internet to coordinates in some higher dimensional space such that Euclidean distance over these coordinates says something useful about the network latency between the two nodes. GNP has not been designed with hostile nodes in mind. Can a small coalition of hostile nodes corrupt things globally? Can you build the system to be more robust against such attacks? (If you did this project, you'd start with Eugene Ng's existing simulators and codebase, plus you've got the expert right here to discuss your project.)
E-mail, instant messenger, etc.: Messaging applications are all the rage these days. You could investigate how to build decentralized chat systems (as in ICQ or AIM) or how to build e-mail systems. While you've got the "freedom" of starting from scratch, make sure you can address the inevitable spamming attacks. You could also revisit the whole PGP "web of trust" in the context of p2p networks. Could this be made to work?
Fair resource usage: Several of my graduate students are currently working on fair sharing of disk, bandwidth, and other resources in a p2p system (i.e., guaranteeing that a user cannot consume more network storage than he or she is providing to the network). How can you guarantee that a user doesn't consume more resources than he or she provides? Or, turned the other way around, how can you guarantee that the p2p node in your dorm room, with some popular file on it, won't be overwhelmed with requests, even if all the other official replicas of the file are offline? An interesting opportunity would be to design an experimental client for BitTorrent that tracks how fair sharing progresses within a given BitTorrent cloud. You could also build a hostile BitTorrent client that tries to take advantage of its peers and measure how the rest of the network reacts.
Read-write file storage: Several recent research trends have discussed the problems that occur with writing data to untrusted storage. Cryptographic filesystems aim to protect the confidentiality of a file, but they don't gain anything in availability against an adversary who can delete files. Other p2p systems have started to look at supporting writes, particularly with multiple, concurrent writers. There's plenty of work to do here, including deciding what the right file systems semantics should be, as observed by a user of the system. Should it "feel" like a normal filesystem, or if you relax that, are there performance and/or security benefits?
Traffic analysis: Even if the world starts encrypting its network traffic, you can learn a lot just by knowing who sent a message to whom. You could look at building traffic analysis systems, particularly in a p2p system, to see whether a small number of nodes can see enough routing traffic, or otherwise learn enough to get a good picture of what's happening inside the network. You could also investigate whether "standard" anti-censorship or anonymity-preserving systems can really hold up when some percentage of the nodes are colluding to figure out what's going on.

Many of these ideas would be fantastic opportunities to run on PlanetLab, of which Rice happens to be a member.

Cryptography / Networking

Extensions to cryptyc: You used cryptyc in your voting machine project. By now, you've probably got some ideas of how it could be improved. A good start would be extending the cryptyc implementation to support the public-key extensions described in their latest papers. Cryptyc also needs to have a decent GUI interface to show you a picture of the protocol you're studying and to give you a graphic understanding of how your system is or isn't working properly.
Login authentication: Modern Unix systems support pluggable authentication modules (PAM). Write a PAM that uses a smart card or a PalmPilot or some other interesting technique.
High performance ciphers: How fast can you go? RC4 can theoretically be tuned to go the speed of memcpy(3C). What if you wire together gzip (or some other compression system) with an encryption system? Can you go faster with the two systems optimized together than running them separately? Compare the performance of several different ciphers (either with code you get from the net or from Schneier's book). Fine tune the inner loop at the raw assembly level if you have to. It would be interesting to look more closely at the new AES (Rijndael) cipher, particularly implementing it in some kind of silicon (e.g., Xilinx parts).
Fast cracking: Rice has a remarkably fast Terascale Computing Facility. It's got some 70 quad-processor Itanium 2 servers with a combined total of half a terabyte of RAM and a big pile of disk. How long would it take to exhaustively guess a Unix password? Maybe you can go after one of the RSA challenges, although you might want to start with one of the already solved ones to see how long it takes. Maybe you could look at implementing one of these time-space cryptanalysis tradeoffs.
One-time pad management: Assume the worst case: all traditional cryptosystems have been broken, P=NP, and evil attackers are actively in control of every network. This means the only remaining cryptosystem is the one-time pad. How would you do the equivalent of digital signatures? How would you do the equivalent of public key infrastructure? How would you securely exchange pad bits? Work out the models on paper and write a program that implements them.
TCP/IP protocol stuff: Generic TCP/IP is vulnerable to all kinds of attacks. Design and implement other protocols that might be stronger, but still preserve the efficiency of TCP/IP. For example, an attacker can emit a RST packet which closes the connection. That's a denial-of-service attack. Can you build an efficient networking protocol that's resistant to this attack? What about session hijacking? What about Kuzmanovic and Knightly's shrew attack?

Language Security

Security through code rewriting: Many researchers are adding new security semantics to Java these days by rewriting Java bytecode. You can use this trick to force the program to observe some behaviors. In the case of Java, a number of nice libraries are available that make it relatively easy to manipulate Java bytecode. For something like Scheme, it's already pretty easy to manipulate the code. Pick a class of security policies you're interested in and investigate adding those semantics to Java, Scheme, or whatever.
Malicious code detection: Can you parse a program and statically detect if it will misbehave? Anti-virus programs usually use a long list of patterns that they try to match in the software. Build something similar.
Agent systems: In agent systems, multiple agents are running together in the same language runtime and interacting with each other. Example agent systems include MUDs (multi-user dungeons) and stock market trading systems. One possible project is working on resource management and security issues here (to control misbehaving agents). Another class of projects is to build an agent system that take advantage of somebody else's mechanisms.
Microsoft C#: Microsoft has a new language called C# that's part of Visual Studio.Net. For all intents and purposes, it's just Java except (of course) it's incompatible with Sun's system. The bytecode underneath C#, called the Common Language Runtime (CLR), is much more general than Java's bytecode: a C or C++ compiler can target it, so it supports general pointer manipulation and the works. It supports unsafe code, but it has a verifier. Possible projects include stress-testing Microsoft's verifier and looking for weaknesses in Microsoft's class libraries. There's fame and fortune in here for somebody who can blow Microsoft's security to smitherines. Another set of projects would be re-doing the work we've done on Java bytecode rewriting in the context of CLR.
Code obfuscation: You can mutate a program's control flow and it's dataflow. You can also write tools that may be able to reconstruct this information, even if it's been obfuscated. Build an obfuscation system and/or find some obfuscated code (perhaps buried in Windows XP or Office XP) and try to unobfuscate it. An interesting project, for example, might be to create a dataflow/control flow tool to study how Office XP detects changes in your hardware or otherwise tries to detect if it's been copied.
Static analysis: Many interesting systems in the past several years have used static analysis techniques to study C programs to look for security holes. You could pick up an existing tool, like CQUAL, or look at writing your own tools. You're more likely to get interesting results trying to apply these existing tools to detecting new kinds of security problems.
Dynamic analysis: Tools like Purify are designed to augment a program, before it starts running, to check for various buggy program behaviors (including buffer overflows and other common C pointer mishandling issues). You might be able to do something similar, whether through a compiler hack, an object-code rewriting hack, or a Java bytecode rewriting hack. The challenge is to collect enough information to be able to detect problems without requiring a huge memory or CPU time overhead.

PDA / Mobile computing platforms

Palm Pilots, cel phones, and other gizmos that have (a) CPUs, (b) screens, (c) user input, and (d) antennas are becoming increasingly ubiquitous as their prices and weight drop and their functionality increases. If they were truly everywhere, you could consider building systems around this.

E-cash systems: Beam money to your friends. Beam money to the Coke machine. Use cryptography to protect the data, of course, but you also need to worry about double-spending, non-repudiation, and all that. Your job gets easier because the phones are always on-line, but you should be able to have one of the parties be offline (e.g., the Coke machine), and have everything else still be secure.
Wireless Ethernet: Fundamentally, there isn't very much difference between a wireless network and a wired network. The only real difference is that people can potentially move around, changing who is within radio range at any given time. Likewise, you can no longer build a firewall to keep an intruder on the outside. The low-level protocols used by wireless Ethernet to help you find a channel and sign in have been found to have serious security holes. Maybe you can do some re-engineering. You might also look at some of the new proposals to replace the easily broken WEP encryption. You could also investigate mobile ad-hoc routing (Dave Johnson's work) and the security issues that arise in routing packets through nodes that you don't necessarily trust.

Privacy

People often have secrets. While they're straightward to protect on computers (i.e., mark a file as readable only by its owner), things get much more difficult on the net.

Web privacy: Cookies can be used to track you. URLs with funny extensions can be used to track you. Web pages load images from third parties like DoubleClick. They're not just advertisements, they're tracking you as well. There are many opportunities to build projects here. You could build Web proxies that do anything from ad filtering, like WebWasher to ad jamming (feeding back bogus cookie information). You could also build systems where privacy emerges as a property of a lot of people surfing at the same time, such as Crowds. On the flip side, you could analyze systems like these and try to systematically break them. (In the hallway outside DH3004, you can see the results of a previous semester's attempt to graph cookie usage on the web. This work really needs to be redone properly.)

Applications

Auditing infrastructure: The DIDS paper discussed a networked audit facility for intrusion detection. But what happens if the centralized audit machine is successfully attacked? Build a truly decentralized system where, even if some number of systems go down, you still have a complete record of what happened.
Spam filtering: Bayesian spam filtering has improved significantly over the past few years. In a recent message, Paul Graham suggests that a spam filter might follow links in the spam and use those to help with classifying the message. Adding that, or other features, to Mozilla would earn you the love of millions.
Chat systems: You're talking to somebody on the chat server today. Tomorrow, you get an e-mail from somebody claiming they're the same person. Build a way for people who don't know each other to be able to identify each other later with some kind of cryptographically strong authentication. A cool property of your system would be an ability for it to be deployed incrementally with existing chat systems, rather than requiring everybody to switch en-masse to something new.
Calendaring systems: I want to schedule a meeting with people from different companies. The computer should be able to look at my free time and the free time of everybody else and come up with the best time we can all meet. Can you do this without me being able to read the full calendars of everybody else? Can you do it without a central calendar server (i.e., using some cryptographic primitives to arrive at a time we can all meet without anyone revealing too much of their private schedule information)?

Voting Security

Add security to Hack-a-Vote: The Hack-a-Vote project codebase has no "real" security features in it. What if you wanted to use Hack-a-Vote for real, perhaps for student association elections? Do the design and engineering to perform Hack-a-Vote correctly.
Add paper to Hack-a-Vote: We've said that voting systems must have a voter-verifiable audit trail. Add one to Hack-a-Vote. See if you can find some OCR software and cheap scanners and get the whole thing working.
Analyze the Comp527 voting project: Given the voting project and all the groups that did it, write a grand summary of all the different projects. Measure code statistics on how groups approached the assignment. Draw conclusions about what this means for "real" voting systems. (This project is mostly writing, with minimal software development. Your final paper/presentation would be expected to be of suitably higher quality, perhaps directly ready to submit to a conference. Make sure your work is accessible to non-technical readers.)
Analyze/hack other voting systems: There are two "real" open source voting systems out there: EVACS (used in Australia) and EVM (under active development, not used anywhere yet). You can try a variation of the voting project's phase 1 or 2 (hacking or analyzing) against either of these systems.

Other Fun Stuff

Biometrics: Read Anderson, Chapter 13, then go implement your own biometric. Maybe start with something relatively easy like finger geometry recognition by putting your hand under a video camera. Get lots of people to try it and see how well you can do.
Copy protection: You can look at existing digital rights managment systems (those few that are available) and show how they can be broken. You might also look lower-level issues, such as how Macrovision works, or what it takes to get around the console protections on consumer video game boxes. Another related idea is to look at U.S. patent 6,018,374, which talks about using infra-red laser light to make a movie screen look washed out to a video camera. It would be interesting to see how well this actually works, and whether commercial IR filters could fix the problem.

Dan Wallach, CS Department, Rice University
Last modified: Wed 06-Oct-2004 16:26