Bulletin Board Benchmark


RUBBoS is a bulletin board benchmark modeled after an online news forum like Slashdot. We originally considered using the Perl-based Slashcode, which is freely available, but we concluded that the code was too complex to serve as a benchmark. Instead, we implemented the essential bulletin board features of the Slashdot site in PHP. In particular, as in Slashcode, we support discussion threads. A discussion thread is a logical tree with a story at its root and a number of comments on that story, which may be nested. Users have two levels of authorized access: regular user and moderator. Regular users browse and submit stories and comments. Moderators additionally review stories and rate comments.
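
The thread structure described above can be pictured with a small sketch. This is only an illustration in Python, not the benchmark's PHP code; the class names, field names, and helper functions are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    # Field names are illustrative, not the benchmark's column names.
    author: str
    body: str
    rating: int = 0
    replies: List["Comment"] = field(default_factory=list)


@dataclass
class Story:
    # The story is the root of the discussion thread.
    title: str
    author: str
    comments: List[Comment] = field(default_factory=list)


def count_comments(nodes: List[Comment]) -> int:
    """Count every comment in a forest, including nested replies."""
    return sum(1 + count_comments(c.replies) for c in nodes)


def max_depth(nodes: List[Comment]) -> int:
    """Return the deepest nesting level (0 for a story with no comments)."""
    return max((1 + max_depth(c.replies) for c in nodes), default=0)
```

Here story.comments holds only the comments at the outermost nesting level, and each comment carries its own replies.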

PHP (Hypertext Preprocessor)
PHP is a scripting language that can be seen as an extension of HTML: PHP code can be embedded directly in an HTML page. Built as an HTTP server module, PHP executes within the Web server process (the same address space), so it incurs no inter-process communication overhead between the Web server and the scripts. When the HTTP server encounters a PHP tag, it invokes the PHP interpreter, which executes the script.
PHP scripts are easy to write and reasonably efficient, but database requests go through ad hoc, database-specific interfaces. This makes code maintenance awkward, because new code must be written for each new database that the scripts need to access.

For more information about installing and configuring RUBBoS go here.
 

Database

The main tables in the database are the users, stories, comments, and submissions tables. The users table contains each user’s real name and nickname, contact information (email), password, level of authorized access, and rating. The stories table contains each story’s title and body, the nickname of the story’s author, the date the story was posted, the number of comments at the outermost nesting level, and the category the story fits under. The categories table contains the same categories as the Slashdot site. The comments table contains the comment’s subject and body, the nickname of the comment’s author, the date the comment was posted, the identifier of the story or the parent comment it belongs to, and a comment rating. Each submitted story is initially placed in the submissions table, unless submitted by a moderator. We maintain a moderator_log table, which stores the moderator ratings for comments. Regular user ratings are computed based on the ratings of the comments they have posted.

For efficiency reasons, we split both the stories and comments tables into separate new and old tables. The new stories table holds the most recent stories, with a cut-off of one month. We keep old stories for a period of two years. The new and old comments tables correspond to the new and old stories, respectively. The majority of the browsing requests are expected to access the new stories and comments tables, which are much smaller and can therefore be accessed much more efficiently. A daemon is activated periodically to move stories and comments from the new to the old tables as appropriate.
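
The periodic move from the new to the old tables can be sketched as follows. This is a minimal illustration in Python using SQLite, not the benchmark's actual daemon: the table names (stories, old_stories, comments, old_comments), the reduced column set, and the 30-day approximation of "one month" are all assumptions.

```python
import sqlite3
from datetime import datetime, timedelta

CUTOFF_DAYS = 30  # assumption: "one month" approximated as 30 days


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, heavily reduced schema; the real tables have more columns.
    conn.executescript("""
        CREATE TABLE stories      (id INTEGER PRIMARY KEY, title TEXT, date TEXT);
        CREATE TABLE old_stories  (id INTEGER PRIMARY KEY, title TEXT, date TEXT);
        CREATE TABLE comments     (id INTEGER PRIMARY KEY, story_id INTEGER,
                                   subject TEXT, date TEXT);
        CREATE TABLE old_comments (id INTEGER PRIMARY KEY, story_id INTEGER,
                                   subject TEXT, date TEXT);
    """)


def archive_old_stories(conn: sqlite3.Connection, now: datetime) -> int:
    """Move stories older than the cut-off, and their comments, to the old tables.

    Returns the number of stories moved. Dates are ISO-8601 strings, so
    lexicographic comparison matches chronological order.
    """
    cutoff = (now - timedelta(days=CUTOFF_DAYS)).isoformat()
    expired = "SELECT id FROM stories WHERE date < ?"
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            f"INSERT INTO old_comments SELECT * FROM comments "
            f"WHERE story_id IN ({expired})", (cutoff,))
        conn.execute(
            f"DELETE FROM comments WHERE story_id IN ({expired})", (cutoff,))
        conn.execute(
            "INSERT INTO old_stories SELECT * FROM stories WHERE date < ?",
            (cutoff,))
        cur = conn.execute("DELETE FROM stories WHERE date < ?", (cutoff,))
        return cur.rowcount
```

Comments are moved before their stories so that the subquery on the new stories table still finds the expired stories; running all four statements in one transaction keeps a story and its comments on the same side of the split.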

We generate the story and comment bodies with words from a given dictionary and lengths between 1KB and 8KB. Short stories and comments are much more common, so we use a Zipf-like distribution for story length. The database contains 2 years of stories and comments. We use an average of 15 to 25 stories per day and between 20 and 50 comments per story, as we observed on Slashdot. We emulate 500,000 total users, of whom 10% have moderator privileges. With these parameters, the database size is 439MB. We also created a larger database of 1.4GB containing more old stories and comments. The results are very similar, because the majority of the requests access the new stories and comments.
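
One way to generate bodies with Zipf-like lengths is sketched below. The exact distribution used by the benchmark is not specified here, so this sketch assumes lengths are whole kilobytes from 1 to 8, weighted in proportion to 1/rank with an exponent of 1; the function names and the ASCII-only dictionary are also assumptions.

```python
import random

MIN_KB, MAX_KB = 1, 8  # body lengths range from 1KB to 8KB


def zipf_like_weights(n, exponent=1.0):
    """Normalized weights proportional to 1 / rank**exponent, rank = 1..n."""
    raw = [1.0 / (rank ** exponent) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_length_bytes(rng, exponent=1.0):
    """Pick a body length in bytes; shorter lengths are more likely."""
    sizes_kb = list(range(MIN_KB, MAX_KB + 1))
    weights = zipf_like_weights(len(sizes_kb), exponent)
    return rng.choices(sizes_kb, weights=weights, k=1)[0] * 1024


def generate_body(rng, dictionary, target_bytes):
    """Join random dictionary words and truncate to exactly target_bytes.

    Assumes an ASCII dictionary, so one character is one byte.
    """
    words = []
    length = -1  # length of " ".join(words); -1 cancels the first separator
    while length < target_bytes:
        word = rng.choice(dictionary)
        words.append(word)
        length += len(word) + 1
    return " ".join(words)[:target_bytes]
```

For example, generate_body(rng, words, sample_length_bytes(rng)) with rng = random.Random(42) yields a reproducible body; under these assumed weights, 1KB bodies are eight times as likely as 8KB bodies.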
 

Benchmark tool

We implement a client-browser emulator. A session is a sequence of interactions for the same user. For each user session, the client emulator opens a persistent HTTP connection to the Web server and closes it at the end of the session. Each emulated client waits for a certain think time before initiating the next interaction. The next interaction is determined by a state transition matrix that specifies the probability of moving from one interaction to another.
The think time and session time for all benchmarks are drawn from negative exponential distributions with means of 7 seconds and 15 minutes, respectively. We vary the load on the site by varying the number of clients.
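
The behavior of one emulated client can be sketched as a simulation. The sketch below is in Python and issues no HTTP requests; it only advances a simulated clock. The interaction names and transition probabilities are made up for illustration and are not the benchmark's actual matrix; only the two means come from the description above.

```python
import random

THINK_TIME_MEAN_S = 7.0          # mean think time: 7 seconds
SESSION_TIME_MEAN_S = 15 * 60.0  # mean session time: 15 minutes

# Hypothetical transition matrix: each row maps the current interaction
# to the probabilities of the next one. Rows must sum to 1.
TRANSITIONS = {
    "StoriesOfTheDay": {"ViewStory": 0.6, "BrowseCategories": 0.3,
                        "StoriesOfTheDay": 0.1},
    "BrowseCategories": {"ViewStory": 0.7, "StoriesOfTheDay": 0.3},
    "ViewStory": {"ViewStory": 0.2, "StoriesOfTheDay": 0.5,
                  "BrowseCategories": 0.3},
}


def next_interaction(rng, current):
    """Draw the next interaction from the current state's row."""
    row = TRANSITIONS[current]
    return rng.choices(list(row.keys()), weights=list(row.values()), k=1)[0]


def simulate_session(rng, start="StoriesOfTheDay"):
    """Return (trace, session_length_s) for one emulated session.

    expovariate takes the rate (1 / mean), which gives the negative
    exponential distribution with the desired mean.
    """
    session_length = rng.expovariate(1.0 / SESSION_TIME_MEAN_S)
    elapsed = 0.0
    state = start
    trace = [state]
    while True:
        elapsed += rng.expovariate(1.0 / THINK_TIME_MEAN_S)  # think time
        if elapsed >= session_length:
            break  # the session ends; a real client would close its connection
        state = next_interaction(rng, state)
        trace.append(state)
    return trace, session_length
```

Running many such sessions in parallel, one per emulated client, corresponds to varying the load by varying the number of clients.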

We have defined 24 Web interactions. The main ones are: generate the stories of the day; browse new stories, older stories, or stories by category; show a particular story with different options for filtering comments; search for keywords in story titles, comments, and user names; submit a story; add a comment; and, at the moderator level, review submitted stories and rate comments. Full-text search is currently not supported: without additional support, it requires prohibitive processing time in a general-purpose relational database. Typically, an external search engine would be used to perform this task.

We use two workload mixes: a browsing mix and a submission mix. The browsing mix is a read-only workload that does not allow users to post stories or comments. The submission mix contains 85% read-only interactions, with the remaining 15% being story and comment submissions and moderation interactions.
 


 RUBBoS (C) 2001 - Rice University/INRIA