
RUBBoS is a bulletin board benchmark modeled after an online news forum like Slashdot. We originally considered using the Perl-based Slashcode, which is freely available, but we concluded that the code was too complex to serve as a benchmark. Instead, we implemented the essential bulletin board features of the Slashdot site in PHP. In particular, as in Slashcode, we support discussion threads. A discussion thread is a logical tree with a story at its root and a number of comments on that story, which may be nested. Users have two levels of authorized access: regular user and moderator. Regular users browse and submit stories and comments. In addition, moderators review submitted stories and rate comments.
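The thread structure described above can be illustrated with a small sketch. This is not the RUBBoS implementation (which is written in PHP and stores threads in database tables); the class names, fields, and helper functions below are hypothetical and only show the logical tree of a story with nested comments.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Comment:
    # Hypothetical fields; the real schema is not shown in this document.
    author: str
    body: str
    rating: int = 0
    replies: List["Comment"] = field(default_factory=list)


@dataclass
class Story:
    # The story is the root of the discussion thread.
    title: str
    body: str
    comments: List[Comment] = field(default_factory=list)


def count_comments(comments: List[Comment]) -> int:
    """Count all comments in a thread, including nested replies."""
    return sum(1 + count_comments(c.replies) for c in comments)


def flatten(comments: List[Comment], depth: int = 0) -> Iterator[Tuple[int, Comment]]:
    """Yield (depth, comment) pairs in depth-first order, e.g. for indented display."""
    for c in comments:
        yield depth, c
        yield from flatten(c.replies, depth + 1)
```

A page that displays a story could walk `flatten(story.comments)` and indent each comment by its depth.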
PHP (Hypertext Preprocessor)
PHP is a scripting language that can be seen as an extension of HTML: PHP code can be embedded directly into an HTML page. Built as an HTTP server module, PHP executes within a Web server process and incurs no inter-process communication overhead. When the HTTP server encounters a PHP tag, it invokes the PHP interpreter, which executes the script. Requests to the database are performed through an ad hoc, database-specific interface.
PHP scripts are easy to write and reasonably efficient, but the ad hoc database interfaces make code maintenance awkward: new code must be written for each new database that the scripts need to access. Because the scripts run in the same process (address space) as the Web server, communication overhead between the Web server and the scripts is minimal.
For efficiency reasons, we split both the stories and comments tables into separate new and old tables. In the new stories table we keep the most recent stories with a cut-off of one month. We keep old stories for a period of two years. The new and old comments tables correspond to the new and old stories respectively. The majority of the browsing requests are expected to access the new stories and comments tables, which are much smaller and therefore much more efficiently accessible. A daemon is activated periodically to move stories and comments from the new to the old tables as appropriate.
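As a rough illustration of the periodic move from the new to the old tables, the sketch below uses Python's built-in `sqlite3` module. The table names, column names, and the `archive` function are assumptions made for this example; the benchmark's actual schema and daemon are not reproduced here.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, simplified schema: one "new" and one "old" table each
    # for stories and comments. Dates are ISO strings (YYYY-MM-DD).
    conn.executescript("""
        CREATE TABLE stories      (id INTEGER PRIMARY KEY, title TEXT, posted TEXT);
        CREATE TABLE old_stories  (id INTEGER PRIMARY KEY, title TEXT, posted TEXT);
        CREATE TABLE comments     (id INTEGER PRIMARY KEY, story_id INTEGER, body TEXT);
        CREATE TABLE old_comments (id INTEGER PRIMARY KEY, story_id INTEGER, body TEXT);
    """)


def archive(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move stories posted before `cutoff`, and their comments, to the old tables.

    Returns the number of stories moved. All four statements run in a single
    transaction, so a failure leaves the tables unchanged.
    """
    expired = "SELECT id FROM stories WHERE posted < ?"
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            f"INSERT INTO old_comments SELECT * FROM comments WHERE story_id IN ({expired})",
            (cutoff,),
        )
        conn.execute(f"DELETE FROM comments WHERE story_id IN ({expired})", (cutoff,))
        conn.execute("INSERT INTO old_stories SELECT * FROM stories WHERE posted < ?", (cutoff,))
        moved = conn.execute("DELETE FROM stories WHERE posted < ?", (cutoff,)).rowcount
    return moved
```

A scheduler (for example, a cron job) could call `archive` with a cutoff date one month in the past, matching the cut-off described above.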
We generate the story and comment bodies from the words of a given dictionary,
with lengths between 1 KB and 8 KB. Short stories and comments are much more
common, so we use a Zipf-like distribution for the length. The database
contains 2 years of stories and comments. We use an average of 15 to 25
stories per day and between 20 and 50 comments per story, as we observed
on Slashdot. We emulate 500,000 total users, of whom 10% have moderator
privileges. With these parameters, the database size is 439 MB. We
also created a larger database of 1.4 GB containing more old stories and
comments. The results with the larger database are very similar, because the
majority of the requests access the new stories and comments tables.
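The following sketch shows one way to draw body lengths from a Zipf-like distribution and fill a body with dictionary words. It is an illustration only: the 1 KB size buckets, the exponent, and the function names are assumptions, not the parameters of the actual RUBBoS data generator.

```python
import random
from typing import List

MIN_LEN, MAX_LEN = 1024, 8192  # 1 KB to 8 KB, as stated in the text


def zipf_like_length(rng: random.Random, exponent: float = 1.0, step: int = 1024) -> int:
    """Pick a length bucket with probability proportional to 1 / rank**exponent.

    The shortest bucket has rank 1, so short bodies are the most common.
    The bucket size (`step`) and `exponent` are assumed values.
    """
    sizes = list(range(MIN_LEN, MAX_LEN + 1, step))
    weights = [1.0 / (rank ** exponent) for rank in range(1, len(sizes) + 1)]
    return rng.choices(sizes, weights=weights, k=1)[0]


def make_body(rng: random.Random, dictionary: List[str], length: int) -> str:
    """Build a body of exactly `length` characters from random dictionary words."""
    words = []
    size = 0  # total characters, counting one separator per word
    while size <= length:
        word = rng.choice(dictionary)
        words.append(word)
        size += len(word) + 1
    return " ".join(words)[:length]
```

For example, `make_body(rng, words, zipf_like_length(rng))` with `rng = random.Random(42)` produces a reproducible body, which is useful when the same database has to be regenerated.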
We have defined 24 Web interactions. The main ones are: generate the stories of the day; browse new stories, older stories, or stories by category; show a particular story with different options for filtering comments; search for keywords in story titles, comments, and user names; submit a story; add a comment; and, at the moderator level, review submitted stories and rate comments. Full-text search is currently not supported: without additional support, it requires prohibitive processing time in a general-purpose relational database. Typically, an external search engine would be used to perform this task.
We use two workload mixes: a browsing mix and a submission mix. The
browsing mix is a read-only workload that does not allow users to post
stories or comments. The submission mix contains 85% read-only interactions,
with the remaining 15% being story and comment submissions and moderation
interactions.
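To make the two mixes concrete, here is a minimal sketch of a client emulator choosing its next interaction. It only reproduces the 85%/15% split stated above; the interaction names are hypothetical, the choice within each group is uniform by assumption, and the real benchmark's per-interaction probabilities are not modeled.

```python
import random

# Hypothetical interaction names, grouped by whether they modify the database.
READ_ONLY = ["StoriesOfTheDay", "BrowseStoriesByCategory", "ViewStory", "Search"]
READ_WRITE = ["SubmitStory", "PostComment", "ReviewStory", "RateComment"]


def next_interaction(rng: random.Random, mix: str) -> str:
    """Pick the next interaction for the given workload mix."""
    if mix == "browsing":
        # Read-only workload: users never post stories or comments.
        return rng.choice(READ_ONLY)
    if mix == "submission":
        # 85% read-only interactions, 15% submissions and moderation.
        pool = READ_ONLY if rng.random() < 0.85 else READ_WRITE
        return rng.choice(pool)
    raise ValueError(f"unknown mix: {mix!r}")
```

Passing a seeded `random.Random` instance keeps a run reproducible, so the same sequence of interactions can be replayed against different server configurations.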
