After opening remarks by the General and Program chairs, the IEEE TCOS award for best student contribution was given to Douglas Santry of the University of British Columbia. He was the lead author of the paper entitled ``Elephant: The File System that Never Forgets''.
Santry et al.'s paper was presented as the first of three in the session on file systems chaired by Mary Baker. The key insight in this paper is the recognition that disk capacity has been growing faster than read-write user data for some time now, and this trend shows no sign of abating. The paper describes a way of translating this spare disk capacity into improved system usability by separating file operations from file retention policy. The result is a file system that automatically saves all versions of all files for the short term, provides user control over long-term version retention, and uses heuristics to guide retention in the absence of explicit user advice.
In the discussions that followed, Margo Seltzer asked whether directories were grow-only, and whether repeated recompilations would result in unwieldy object-code directories. Santry replied that this is indeed the case in the current implementation, although a refinement to move dead names into a separate directory segment is under consideration. The cost/benefit tradeoff of this refinement will not become clear until adequate usage experience is available. In response to a question from David Mosberger, Santry confirmed that Elephant could obviate the need for RCS and ad hoc tools for checkpointing. Satya asked whether Elephant provided facilities for applications, rather than end users, to guide retention policy. Santry replied that there were currently no specific provisions for this purpose, but agreed that application assistance could prove valuable. Responding to a question from Dickon Reed, Santry explained that renaming did not complicate the tracking of file versions: a rename operation was represented as an unlink in one directory and a hard link in the other.
Peter Chen asked a series of questions probing the performance impact of Elephant's retention strategy. Santry responded that the implementation made heavy use of log-structured file system techniques, that buffer cache usage would indeed increase because of the presence of multiple versions, and that each version of a file corresponded to a separate vnode in the system. Mitchell Tsai asked how user behavior was affected by a system like Elephant. Santry replied that it should make users less paranoid about the safety of their data, but that only a user study could confirm this. Tsai then asked how Elephant handled cases where an application like Emacs created a new file rather than overwriting the original. Santry agreed that this was a problem for the current system, and that support for managing related groups of files would have to be an important future capability. In response to a final question from Mary Baker, Santry explained that a user study was being planned and that it would involve both qualitative and quantitative aspects.
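Santry's answers about grow-only directories and renaming can be illustrated with a small sketch. It is not Elephant's implementation; the class, field, and function names are hypothetical, and the integer timestamps are an assumption made for illustration. Directory entries are never removed, only marked dead, and a rename is recorded as an unlink in one directory plus a hard link to the same inode in the other.

```python
class Directory:
    """Hypothetical grow-only directory: entries are marked dead, never removed."""

    def __init__(self):
        self.entries = []  # each entry: name, inode, born, died

    def link(self, name, inode, now):
        # Add a new live name for an existing inode.
        self.entries.append({"name": name, "inode": inode, "born": now, "died": None})

    def unlink(self, name, now):
        # Mark the live entry dead; the name stays in the directory.
        for entry in self.entries:
            if entry["name"] == name and entry["died"] is None:
                entry["died"] = now
                return entry["inode"]
        raise FileNotFoundError(name)

    def lookup(self, name, at):
        # Resolve a name as of time `at`, so old names remain reachable.
        for entry in self.entries:
            alive = entry["born"] <= at and (entry["died"] is None or at < entry["died"])
            if entry["name"] == name and alive:
                return entry["inode"]
        return None


def rename(src_dir, src_name, dst_dir, dst_name, now):
    # A rename is an unlink in one directory and a hard link in the other.
    inode = src_dir.unlink(src_name, now)
    dst_dir.link(dst_name, inode, now)
    return inode


a, b = Directory(), Directory()
a.link("draft.txt", inode=7, now=1)
rename(a, "draft.txt", b, "final.txt", now=5)
```

Looking up "draft.txt" in the first directory as of time 3 still resolves to the same inode that "final.txt" names after the rename, which is why version tracking is unaffected. The dead entry left behind is also the source of the directory growth that Seltzer asked about.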
The second paper of the session, presented by Karin Petersen, explored the caching issues induced by the design of the Placeless Document System being built at Xerox PARC. This system enables per-user, dynamic customization of documents, allows sharing of customizations between users, and cleanly separates the bit-storage and customization layers. Maintaining the validity of cached copies of a customization is done through pieces of code called Verifiers and Notifiers. These are unique to each customization and serve, respectively, to re-validate a cached copy and to trigger a callback.
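The division of labor between Verifiers and Notifiers might be sketched as follows. This is a simplified illustration rather than the Placeless Documents API; every class, method, and variable name below is hypothetical. A verifier is modeled as a callable that the cache runs on each read to re-validate its copy, and a notifier as a callback that invalidates the copy from the outside.

```python
class CachedDocument:
    def __init__(self, content, verifiers):
        self.content = content
        self.verifiers = verifiers  # callables returning True while the copy is valid
        self.valid = True           # cleared when a notifier fires


class Cache:
    def __init__(self, fetch):
        self.fetch = fetch          # fetch(doc_id) -> (content, verifiers)
        self.entries = {}
        self.misses = 0

    def read(self, doc_id):
        entry = self.entries.get(doc_id)
        stale = (
            entry is None
            or not entry.valid
            or not all(verify() for verify in entry.verifiers)
        )
        if stale:
            self.misses += 1
            content, verifiers = self.fetch(doc_id)
            entry = CachedDocument(content, verifiers)
            self.entries[doc_id] = entry
        return entry.content

    def notify(self, doc_id):
        # Notifier path: a callback marks the cached copy invalid.
        entry = self.entries.get(doc_id)
        if entry is not None:
            entry.valid = False


# Hypothetical document source with an upper-casing customization.
source = {"version": 1, "text": "hello"}


def fetch(doc_id):
    seen = source["version"]
    # The verifier re-checks the version the customized copy was built from.
    return source["text"].upper(), [lambda: source["version"] == seen]


cache = Cache(fetch)
```

In this sketch the verifier path costs a check on every read, while the notifier path costs nothing until the callback arrives.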
Armando Fox opened the question period by asking how naming was linked to customizations; for example, one doesn't wish to remember five totally different filenames corresponding to five formats of a document. Petersen replied that the example could be handled by creating an active property that generated the five formats, thus requiring only a single name for that property. Mike Jones asked what kind of file system support was required for maintaining cache consistency; Petersen's response was that systems like AFS that provide callback-based consistency are particularly helpful, and that systems like NFS force reliance on heuristics such as file modification times.
Jon Howell asked how applications like Emacs, which change file names, were handled. Petersen replied that many dirty tricks have to be played in such situations to keep related items together, and that programs like Microsoft Word are even more complex to handle. To Satya's question regarding support for sharing, Petersen replied that the document space is partitioned into two levels, personal and universal; users can create links (similar to symbolic links) in their personal space to existing documents at either level. Jeff Chase asked how one installed active properties; Petersen replied that this is currently done using a modified browser, but that approaches requiring less human intervention will be needed in the future. In response to Mary Baker's question about how collections were identified, Petersen replied that a collection is a query over a document space plus explicit inclusions and exclusions.
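Petersen's definition of a collection lends itself to a short sketch. The names below are hypothetical, and the rule that exclusions take precedence over inclusions is an assumption, since the talk summary does not say how conflicts are resolved.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Collection:
    query: Callable[[dict], bool]                 # predicate over document properties
    inclusions: set = field(default_factory=set)  # ids added regardless of the query
    exclusions: set = field(default_factory=set)  # ids removed regardless of the query

    def members(self, space):
        # `space` maps document id -> property dict.
        matched = {doc_id for doc_id, props in space.items() if self.query(props)}
        included = self.inclusions & set(space)
        # Assumption: exclusions win over both the query and the inclusions.
        return (matched | included) - self.exclusions


space = {
    "a": {"type": "paper"},
    "b": {"type": "memo"},
    "c": {"type": "paper"},
}
papers = Collection(
    query=lambda props: props["type"] == "paper",
    inclusions={"b"},
    exclusions={"c"},
)
```

Because membership is computed from the query each time, a document added to the space later joins the collection without any explicit action.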
Tom Kroeger presented the final paper in this session, ``The Case for Adaptive File Access Modeling''. Using long-term file reference traces from the Coda project, the paper compared the relative merits of a variety of approaches to predicting file accesses. The results show that Finite Multi-Order Context Models have the best predictive power, and that Partitioned Context Models come close, while incurring much less space cost.
Margo Seltzer opened the question period by complimenting the speaker on the completeness of the work. She then asked if there were any similarities between file access prediction and branch prediction in computer architecture; Kroeger replied that he hadn't done a deep enough study of the two problem domains to comment on that. Seltzer then asked whether improved predictive ability translated into substantially improved performance. In response, Kroeger pointed out that only a real implementation could provide this information, and that such an implementation was indeed being planned. He added that he did expect partitioned context models to offer a substantial performance improvement over last successor models, since they are able to eliminate nearly a third of the misses sustained by the latter.
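For readers unfamiliar with the baseline Kroeger mentioned, a last successor model can be sketched in a few lines. This is a minimal illustration with a hypothetical function name and a made-up trace, not the paper's code or data: the model predicts that each file will be followed by whatever file followed it the previous time.

```python
def last_successor_misses(trace):
    """Count mispredictions of a last successor model over a file access trace."""
    last = {}        # file -> the file that most recently followed it
    misses = 0
    predictions = 0
    for current, following in zip(trace, trace[1:]):
        predictions += 1
        if last.get(current) != following:
            misses += 1             # no prediction yet, or a wrong one
        last[current] = following   # remember only the latest successor
    return misses, predictions


# Made-up trace for illustration only.
trace = ["a", "b", "c", "a", "b", "c", "a", "b"]
misses, predictions = last_successor_misses(trace)
```

On this trace the model misses only the first time it sees each file. Context models differ in that they condition on more than the single most recent access, which is where the reported reduction in misses comes from.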