next up previous contents
Next: 2 Background Up: The Effect of Deceptive Previous: List of Figures   Contents

Subsections


1 Introduction

Disk scheduling has been an integral part of operating system functionality since the early days. Many scheduling algorithms [SCO90,WGP94,BBG$^+$99,HC00,JW91] have been proposed for different purposes. The earlier ones targetted seek reduction to achieve high throughput and low response time. More recently, differentiated quality of service has become an important goal, and scheduling policies have been proposed to provide proportional-share and real-time disk service to applications.

These policies essentially define the manner in which available disk requests are to be reordered to meet the desired objective. Most operating systems enclose such policies in a work-conserving scheduling framework, which selects and schedules requests as soon as (or before) the previous request has finished. At the moment of request selection (i.e. at a decision point), a scheduler requires multiple outstanding requests from active applications to make an effective decision. This means that a work-conserving scheduler becomes capable of providing the desired type of service only to applications that actively and continuously generate disk requests.

Unfortunately, a common trend in application behaviour is to issue synchronous disk requests. Intrinsic dependencies between successive requests may force a process to issue a request only a few hundred microseconds after it has seen the outcome of the previous one. Often, many such processes on the system concurrently issue streams of synchronous disk requests. In such situations, a typical work-conserving scheduler becomes inherently incapable of consecutively servicing multiple requests from any process. Instead, at each decision point, it is forced to multiplex between requests issued by different processes.

Viewing this from a system-wide perspective, schedulers are expected to have an accurate understanding of whether a request-issuing entity, such as a process, is busy or idle. Concurrent streams of synchronous requests induce a condition wherein the scheduler assumes the last request issuing process to have become momentarily idle at decision points. In reality, the process has just not had the chance to issue the subsequent request before the scheduler makes its decision. The scheduler thus suffers from the phenomenon of deceptive idleness.

A seek reducing scheduler may try to exploit spatial locality between successive requests issued by each process. However, deceptive idleness can cause an overall degradation of performance due to inevitable head movement between requests issued by different processes. Seek-reducing schedulers may thus yield low performance under the influence of deceptive idleness. A proportional-share scheduler may be required to deliver proportions such as 2:1 between processes. To achieve this, it becomes inherently required to service two or more requests successively from the first process. Deceptive idleness may prevent this, and cause a significant deviation of achieved proportions from the assigned ones. In general, a work-conserving implementation of a scheduling policy reorders the available requests correctly, but can fail to meet overall system-wide objectives.

Some of these effects feature in commonly used systems and are somewhat well-known, while some others have not been examined previously. This thesis takes a more holistic approach towards the analysis of this problem. We demonstrate how deceptive idleness often happens in general-purpose operating systems for a variety of reasons, ranging from naively chosen programming paradigms to inevitable metadata dependencies. We then characterize its impact on two somewhat representative classes of disk schedulers: seek-reducing and proportional-share schedulers. These effects turn out to be fundamentally different in some ways, and well-known workarounds for some schedulers are ineffective for others.

We discuss workarounds based on prefetching, both from the application and from the kernel. Upon detailed examination, both these prove to be insufficient in many cases. Kernel-level prefetching requires precise prediction of subsequent requests, and incurs heavy costs on error. The more powerful application-based workarounds are not always possible either, and are obviously undesirable in situations where applications are expected to run unmodified.

We therefore propose a simple, transparent solution to deceptive idleness. This is based on a non-work-conserving scheduling (NWCS) framework, i.e. one that selectively lets the disk remain unutilized for short, controlled periods of time at decision points. Additional requests from the last request issuing process may arrive in this period, giving the scheduler the opportunity to make better decisions and transparently eliminate deceptive idleness. NWCS can be implemented in a general-purpose operating system without altering the actual scheduling policy. It automatically supplements any prefetching mechanisms already in place, along with any application-level measures that prevent deceptive idleness. It thus incurs minimal performance overhead (less than 1%) under normal conditions, and substantially improves throughput and adherence to QoS objectives whenever deceptive idleness sets in.

We evaluate this solution on a FreeBSD-4.0 system, using a variety of microbenchmarks and real workloads. Microbenchmarks indicate throughput improvements of up to 4 times, under an extensive range of workload characteristics. Real workloads often embody more random access, and thus yield smaller, though significant benefits. Loss in utilization due to the non-work-conserving scheduler is nearly always exceeded by throughput improvement due to seek optimization. By capitalizing on seek reduction within files, a disk-intensive Apache webserver is found to deliver 56% and 16% more throughput for two configurations (mmap and read). The Andrew Benchmark runs faster by 8%, (with its synchronous read intensive phase by 54%), by reducing seeks both within and between files. Variants of the TPC-B database benchmark exhibit improvements between 4% and 60%. Proportional-share schedulers like Stride become empowered to accurately distribute disk resources according to the assigned proportions, even to processes that synchronously issue disk requests.

1 Thesis contributions

The contributions of this thesis are twofold:

2 Structure of this thesis

We begin with a discussion of some background material in chapter 2 that would help in understanding the rest of this thesis. The chapters following that essentially constitute different cross-sections into the same basic problem. Chapter 3 talks about various reasons and scenarios that cause deceptive idleness. After that, chapter 4 analyzes its impact on a variety of disk scheduler types, at two levels of detail. Having sufficiently explored the problem, we discuss a variety of workarounds and solutions in chapter 5. We elaborate upon and evaluate our proposed non-work-conserving scheduling framework in chapter 6. Finally, we present some related work in this area and conclude.

This thesis differs from the associated paper [ID01b], primarily in its systematic approach to development of the problem. It describes all possible sources of deceptive idleness, and separately analyzes their effects on a variety of disk schedulers. Reading the paper first is recommended for quick understanding. The project website is hosted at [ID01a].


next up previous contents
Next: 2 Background Up: The Effect of Deceptive Previous: List of Figures   Contents
Sitaram Iyer ssiyer@cs.rice.edu