Introduction
The following is a sort of "menu" of possible semester projects for COMP420. Naturally, these suggestions are somewhat biased by the interests and research of the teaching staff. You are not restricted to this set of topics, but, of course, more help will be available if you choose a topic of local interest. Please view the suggested topics as rough concepts, not as finalized proposals. You should try to develop your own topic to the extent possible.
Other sources of inspiration can be found in the project suggestions for recent iterations of this course. (You will discover significant overlap with this list.) COMP520 Fall 2004; COMP520 Fall 2003; COMP413 2002/2003.
Given the size of the class, projects should be done in groups of two students (or, if the project merits, a single group of four). A title for your project and the composition of your group is due by March 31. Your project proposal is due by April 12 and the project itself, including written report, is due at the end of the semester.
Remember: The purpose of this project is not to be an enormous programming assignment; it is foremost an opportunity to define and pursue a research project. Ask an interesting question, and form an interesting hypothesis related to the content of this course. Develop the minimal amount of framework to validate or disprove your hypothesis.
Outline of suggested projects
- P2POG
- Scalable Software Update
- Meta-DHT and DHT Range Queries
- Version Control
- Expo: Shared Whiteboard
- Content-Based Publish & Subscribe
- Peer-to-Peer Email: ePOST Improvements
- Scribe Security
- Something Completely Different
Project details
P2POG
Large-scale multiplayer online games have moved from primitive (MUDs) to popular (modern MMOGs). However, as MMOGs gain popularity, the difficulty of providing game service increases. [Recently, the eagerly-awaited World of Warcraft MMOG has been plagued with server instability and even total downtime (graph), forcing refunds to players and very bad press.]
Clearly, these games will become even more popular in the future, and yet the centralized servers which operate them are fundamentally difficult to scale without unfortunate compromises on gameplay (e.g., partitioned "worlds" preventing friends from playing together).
Develop a scalable multiplayer game which shares game state and simulation among peers. It's certainly out of scope of this course to develop a MMOG in the World of Warcraft sense, but a text-based adventure game between peers seems reasonable. Players and objects should be able to integrate and cause meaningful effects; players should be able to communicate with one another by proximity (i.e. "same room") as well as arbitrarily. Players may be offline for periods of time, and the game should (a) tolerate this and (b) maintain state between connections of a given player. Game/object state must be carefully maintained to avoid inconsistency between peers. In-game programmability is always nice, but not required.
There are many approaches to dividing/distributing the game state; implement a design you like and defend it in your report.
Note that the problem of cheating in a p2p game environment is an open one; pay careful attention to the ability of arbitrary clients to corrupt the game state, and even if you don't address this in your implementation, you should discuss this topic in your report.
[cites: "P2P support for MM Games", Knutsson et al., INFOCOM'04; all papers from ACM Network and System Support for Games, 2002; Rooney etc al., on federated game architectures, IEEE Communications 5/04]
Scalable Software Update
In peer-to-peer distributed systems, data is replicated among several nodes to achieve high availability. Ideally, these nodes will be dissimilar in location, software configuration, and administrative control, to reduce the risk of correlated failure impacting all replicas of a given object.
However, the homogeneity of the hosts connected to the Internet allows for extremely massive correlated failures. For example, Windows viruses and worms can take the vast majority of nodes offline. One approach to dealing with this problem is to "fight fire with fire": massive redundancy and aggregation [cite:Glacier] to combat massive correlated failure.
This problem might, however, be substantially decreased if software patches and bug fixes were more immediately and efficiently distributed to vulnerable hosts. Current software update mechanisms, such as Windows Update, suffer distribution delays (because hosts do not query the update server very frequently) and availability issues (due to overload of central update servers).
Develop a distributed software-update mechanism. End hosts should be made aware of software updates within minutes, not days or weeks; software should be distributed efficiently to end hosts, with peers sharing the workload.
Note that since software updates tend to be very large files, storing them in a DHT will likely not be sufficient; look at SplitStream and BitTorrent for ideas about how to more efficiently share large files fairly among many peers.
[cite recent efforts to distribute Windows XP SP2 via BitTorrent, and related legal issues]
Your solution should take into account some of the security implications of peer-to-peer software updates; in particular, hosts must not install, or even perpetuate as authentic, forged or invalid software updates.
[NB: A "flash crowd" capable version of PAST, with chunked content (possibly erasure coded?) cached along the routes to the nodeId closest to the fileId of each chunk, might be a good project on its own...]
Meta-DHT
Distributed hash tables provide a scalable way to store and retrieve data associated with a single key; for instance, the PAST DHT offers the API:put(fileId, object, credentials) get(fileId) → object
Extend this API to provide efficient association of arbitrary attributes with objects in PAST.
query(attrName, attrValue) → set<fileId> setAttr(fileId, attrName, attrValue, credentials) getAttr(fileId, attrName) → Object removeAttr(fileId, attrName, credentials) listAttrs(fileId) → set<attrName>
Build a sample application of Meta-PAST. Example: a keyword-based picture-sharing application in the spirit of Flickr, in which users may store image files and also assign arbitrary "tags" to help organize these pictures. Tag queries may be restricted to a given user's photos, or may be applied to the global photo set.
Other example apps:
- tag-based shared bookmarks [see http://del.icio.us]
- general file sharing and retrieval (with tag searching)
- p2p OpenPGP keyserver (using tags to allow searching by email address, fullname, key-fingerprint) [cite: Waldvogel et al., "EKI", 2001]
Related project: DHT range queries. (A possible approach is to embed a B-Tree in the DHT.)
[cites: Complex Queries in DHTs [UCB], IPTPS'02; Somo, IPTPS'03; others?]
Version Control
Design, implement and evaluate a version control system on top of a self-organizing, decentralized storage system, i.e., PAST/POST and Glacier. Conflict resolution is a key part of such a system; for example, you may wish to treat simultaneous edits and network partitions as implicit branching events.
Some VCS links of note: Subversion (classic client/server system, useful as a featureset example); GNU arch; darcs (an interesting approach to dealing with causality and patch theory); monotone (some ideas, such as versions identified purely by hash, with natural mappings to DHTs).
Expo: Shared Whiteboard
Design, implement and evaluate a shared whiteboard facility on top a self-organizing, decentralized storage system, i.e., PAST/POST and Glacier.
Content-based pub/sub
Design, implement and evaluate a content-based publish/subscribe facility on top of a self-organizing, decentralized even notification system like Scribe.
Improvements to ePOST
Background: ePOST, a current project of the FreePastry group at Rice (and originally a semester project for a previous incarnation of this course), is a sophisticated peer-to-peer email architecture. Messages between ePOST users are encrypted and secure; node failure is tolerated with replication and extreme correlated failure is recoverable through massive aggregation. ePOST includes IMAP and SMTP services, so it interoperates with existing MUAs (email clients) as well as MTAs (Internet mail handlers).
ePOST is a successful application that can be made even more successful. Suggested improvements include:
- Spam defenses in a distributed setting. Add spam classification to ePOST (perhaps using popular techniques such as Bayesian filtering). Messages which a user's ePOST node classifies as spam can be withheld from the user and possibly removed from DHT storage. Users might give ePOST hints about what is and is not spam by moving messages into and out of a virtual "spam" folder on the IMAP server, or using a web interface (a la DSPAM). This mechanism becomes most interesting in the peer-to-peer setting: users may elect to share the plaintext of their spam messages with other peers, allowing the entire ring to contribute to the spam-fighting effort. (Note that there may be security aspects to this feature: a malicious user may attempt to damage the spam filters of other users.)
- ePOST Webmail. The popularity of services such as GMail demonstrates the viability of sophisticated Web applications. An ePOST user away from his ePOST node may wish to access his email through a webmail interface on his node.
- Multiple nodes per user. Currently, ePOST only works well with one node for each ePOST user; this makes it difficult to use the system from multiple physical locations. (Definitely talk to Alan before tackling this.)
Scribe Security
Design, implement and evaluate a seucrity system for Scribe. The SCRIBE publish/subscribe system currently lacks security. The goal of this project is to design, implement, and evaluate a security model for Scribe.Students working on this project would need some background in security, so it would be a good idea if they had taken, or are concurrently taking COMP 527.
Something Completely Different
This is a wide-open project suggestion which appeals to your creativity. Invent, design, and implement a novel peer-to-peer application. The only stipulation is that there the application is decentralized and cooperative and that there is a clear potential for the application to prove useful to more than just a few people.