next up previous contents
Next: 5 Workarounds for deceptive Up: The Effect of Deceptive Previous: 3 Sources of deceptive   Contents

Subsections


4 Effects of deceptive idleness

We have seen how situations involving deceptive idleness can arise for various reasons. However, the question remains as to how it actually affects the behaviour of different disk schedulers. In this chapter, we examine how various manifestations of deceptive idleness cause fundamentally different kinds of problems with disk schedulers. In all forthcoming examples, the scheduler is forced to multiplex between request streams from different processes, at every decision point.

1 Seek-reducing disk schedulers

Consider an operating system equipped with any seek reducing scheduler, like Shortest positioning-time first (SPTF) or C-LOOK [WGP94]. These generally improve throughput by reordering available requests to minimize expensive head repositioning operations. Imagine an application base consisting of two processes $ p$ and $ q$, each issuing sequentially positioned 64 KB requests in different parts of the disk (as depicted in figure 4.1).

Figure 4.1: Deceptive Idleness on a seek reducing scheduler: (a) effective seek reduction, and (b) thrashing due to deceptive idleness
\begin{figure}\centerline{
\hspace{2em}
%%\psfig{figure=../fig/p_seekopt_good....
..._seekopt_bad.eps,width=0.36\linewidth}
\hspace{2em}} \vspace{1em}
\end{figure}

In one case, assume that both processes have initially issued all their requests. At almost every decision point, there are suitable requests pending for both processes. In the interest of seek reduction, the scheduler then services several requests for $ p$, moves the head across the disk, and services several requests for $ q$. This yields a throughput of 21 MB/s on our disk.

Instead, suppose both process issue a single synchronous stream of disk requests each, i.e. they issue a request only after the previous one has completed. The first request from $ p$ is serviced, after which the work-conserving scheduler does not wait for $ p$ to generate its subsequent request. It assumes $ p$ to have become idle, seeks across the disk to the pending request issued by $ q$, and services that instead. A few hundred microseconds after this happens, the next request from $ p$ arrives. It is now too late to service $ p$'s nearby followup request, since disk scheduling is non-preemptible. The scheduler is thus forced to alternate between servicing requests issued by the two processes, thereby degenerating to FCFS and achieving a throughput of just 5 MB/s (due to 9ms of average positioning time for every 3ms of data transfer time for 64 KB).

If we had more than two processes issuing requests, then C-LOOK would behave like a round-robin scheduler, servicing just the one available request from each process before moving on to the next. On the other hand, SPTF would converge to the two closest requests on disk and alternate between them, starving the third process. Deceptive idleness does not always cause the scheduler to degenerate to a FCFS scheduler; the effect is often more subtle.

If the disk scheduler receives just one stream of sequentially positioned requests, then despite deceptive idleness, there would be no reason to reposition the head between requests. Any scheduler (even FCFS) would yield high throughput. However, with concurrent, synchronous request streams, we would experience large head seeks for every request serviced - regardless of the seek reduction policy. This is the essence of how deceptive idleness causes seek-reducing schedulers to yield bad performance.

2 Proportional-share disk schedulers

Deceptive idleness can affect schedulers in ways other than just degrading performance. Consider a proportional-share scheduler like Stride [WW95] (with or without simultaneous seek reduction), assigned two resource containers $ A$ and $ B$ and assigned shares of 2:1. Assume disk-intensive applications are bound to these containers. These applications would thus expect the service that a busy request issuer deserves of the proportional-share scheduler. Assume for purposes of discussion that all requests target random blocks and are equally large, thus taking equal amounts of time to service.

Figure 4.2: Deceptive Idleness on a proportional-share scheduler: (a) effective 1:2 allocation, and (b) skewed 1:1 allocation due to deceptive idleness
\begin{figure}\centerline{
\hspace{2em}
%%\psfig{figure=../fig/prop_good.eps,w...
.../fig/prop_bad.eps,width=0.4\linewidth}
\hspace{2em}} \vspace{1em}
\end{figure}

A typical implementation of this scheduler may service a few requests for container $ A$, and correspondingly, a larger number of requests for container $ B$. However, if processes in these containers collectively maintain only one outstanding request at any time, then a work-conserving scheduler becomes incapable of meeting the desired proportions. Like before, it becomes forced to alternately service requests from the two containers, achieving a service ratio closer to 1:1 (see figure 4.2).


1 Experimental characterization on Stride

We now examine a set of experiments that illustrate the deviation incurred on the stride scheduler (with two resource containers), corresponding to different sources of deceptive idleness.

1 No deceptive idleness

Figure 4.3 depicts the desired behaviour of the scheduler, without the presence of deceptive idleness. Proportions of 2:1 are assigned to the two resource containers $ A$ and $ B$, and each container is associated with two processes. These two processes produce enough requests between them to avoid any delusions of idleness. Consequently, we observe container $ A$ always receiving 66% of disk service. For example, after 60 seconds into the experiment, the two containers have received 40 and 20 seconds of service time on the disk respectively.

Figure 4.3: Proportional-share scheduler delivering desired proportions
\begin{figure}%%\centerline{\psfig{figure=../x/di/eff/stride2proc2rndreadsmallGo...
...tride2proc2rndreadsmallGood_cum.eps,angle=270,width=0.5\linewidth}} \end{figure}

2 Explicit request dependencies

With only one process bound to each resource container, we experience explicit request dependencies as shown in figure 4.4. The number of active resource containers is almost always seen to be just 1, and we achieve close to 1:1 service allocated to the two containers. Application-level requirements of proportional allocation are not satisfied.

Figure 4.4: Deceptive idleness due to explicit request dependencies
\begin{figure}%%\centerline{\psfig{figure=../x/di/eff/stride2rndreadsmallFC_cum....
...i/eff/stride2rndreadsmallFC_cum.eps,angle=270,width=0.5\linewidth}} \end{figure}

3 Consequences of metadata sharing

As we have seen earlier, sharing metadata gives rise to deceptive idleness. Interspersed with file I/O, this leads to an average of about 1.5 resource containers that seem busy to the scheduler at decision points. We thus experience similar proportions being achieved by the stride scheduler (figure 4.5).

Figure 4.5: Deceptive idleness due to shared metadata
\begin{figure}%%\centerline{\psfig{figure=../x/di/eff/stride2proc2metadata_cum.e...
...di/eff/stride2proc2metadata_cum.eps,angle=270,width=0.5\linewidth}} \end{figure}

4 Resource contention for memory

As we have seen earlier, a heavily thrashing system can contract deceptive idleness due to one process needing memory that another process is trying to write out to swap. Figure 4.6 demonstrates the behaviour of a proportional-share scheduler under such circumstances - proportions are slightly violated on different occasions, and the stride scheduler tries to eventually compensate for such skew. However, the stride scheduler is expected to deliver accurate proportions on a fine timescale, so such temporary violation is not allowed. Many runs of such experiments produce markedly different graphs, due to unpredictable pageout daemon behaviour.

Figure 4.6: Deceptive idleness due to memory contention
\begin{figure}%%\centerline{\psfig{figure=../x/di/eff/stride2proc2readdirtyx5_cu...
...eff/stride2proc2readdirtyx5_cum.eps,angle=270,width=0.5\linewidth}} \end{figure}

This class of problems, though not very pronounced in its effect, actually illustrates the much bigger problem of global resource management. How much CPU or memory resources should the system allocate to a particular process, in order to enable it to issue its intended disk requests? How much CPU or memory blockage should a process tolerate before being declared really idle? These and other questions demand a deeper analysis and greater solutions that we don't attempt in this thesis.

One tentative approach towards solution for memory contention causing deceptive idleness, is that of self-paging as implemented in the Nemesis [Han99] operating system. When each process needs memory, the system writes out pages to swap from its own address space. Thus, blockage of a process for pageout I/O would affect the issue of its own requests only, and the problem of deceptive idleness should at least partially get solved. We have not explored this approach any further in this thesis.

2 More than two resource containers

Some of the above situations may look as though the scheduler has degenerated into a FCFS scheduler, due to complete lack of choice at every decision point. However, this is not always true; when scheduling from more than two resource containers, the effect of deceptive idleness can depend on specific idiosyncrasies of the scheduling algorithm. We examine this behaviour for two interesting and fundamentally different kinds of proportional-share schedulers.

1 Stride, virtual clock based

Assume all requests take approximately the same amount of time to service. Consider three active resource containers $ A,B$ and $ C$, assigned proportions of 1:2:3 respectively. Deceptive idleness prevents two requests from any container being serviced consecutively. However, this restriction still allows the scheduler to meet the desired proportions. The schedule $ C,A,C,B,C,B$, etc. (with any permutations of $ A$ and $ B$'s) suffices for this purpose. Moreover, this schedule will actually be achieved by Stride, since virtual clocks ``remember'' deserved service and compensate if possible.

In contrast, consider proportions of 1:1:3 assigned to the containers. The constraint imposed by deceptive idleness now becomes too restrictive to meet these proportions. Requests are scheduled from $ C,A,C,B,$ etc. and the skewed proportions of 1:1:2 are achieved instead.

Generalizing from these two examples, the Stride scheduler is capable of meeting proportions for a set of resource containers, if and only if the share on any one container does not exceed the sum of shares on the remaining containers. This condition is similar to the triangle inequality extended to polygons, where the length of a side never exceeds the sum of lengths of the remaining sides.

2 Lottery, randomized scheduling

A lottery scheduler [WW94] is a randomized proportional-share scheduling algorithm that operates by selecting a request from all runnable resource containers, with probabilities proportional to their assigned shares. On a somewhat coarse timescale, it intends to regulate the relative resource consumption rates of disk-intensive processes according to the assigned proportions. Though more meaningful for CPU scheduling, we analyze the effect of deceptive idleness on a lottery disk scheduler.

Figure 4.7: Lottery scheduler behaving as desired
\begin{figure}%%\centerline{\psfig{figure=../x/di/eff/lottery3proc2rndreadsmallG...
...tery3proc2rndreadsmallGood_inst.eps,angle=270,width=0.5\linewidth}} \end{figure}

Let proportions of $ \alpha_i$ be assigned to resource containers $ A_i$, with sum of shares $ \mathcal{S} = \alpha_1 + \alpha_2 +
\cdots + \alpha_n$. Let the scheduler actually achieve proportions of $ \omega_i$, averaged over reasonable periods of time. Ideally we would expect $ \omega_i = \alpha_i$, thus justifying correct behaviour. This does happen when all applications provide enough requests to the disk scheduler at the critical decision points. Figure 4.7 depicts an experiment wherein three resource containers are assigned proportions of $ \alpha_{1..3} =$ 1:2:3 (i.e. $ 17\%:33\%:50\%$), and each has two processes bound to it, where each process issues randomly positioned back-to-back requests. Allowing for vagaries imposed by the inherent randomness, we observe $ \omega_i = \alpha_i$.

However, this stops being true if deceptive idleness is caused by having only one process bound to each resource container. After each request completes, explicit request dependencies in request generation result in the scheduler falsely assuming that the container that generated the request has momentarily become idle. Consequently the scheduler achieves the distorted proportions of $ \omega_{1..3} =$ 1:1.6:1.8 (i.e. $ 23\%:36\%:41\%$) instead of 1:2:3, as shown in figure 4.8.

Figure 4.8: Lottery scheduler achieving skewed proportions due to deceptive idleness
\begin{figure}%%\centerline{\psfig{figure=../x/di/eff/lottery3rndreadsmallFC_ins...
...eff/lottery3rndreadsmallFC_inst.eps,angle=270,width=0.5\linewidth}} \end{figure}

This strange behaviour deserves a mathematical explanation in the general case. With probability $ \omega_i$, a request from $ A_i$ gets serviced at every decision point. At the next decision point, this $ A_i$ has no pending requests due to deceptive idleness, and is therefore not chosen. So for $ j \neq i$, the conditional probability that a request from $ A_j$ gets chosen now is $ \alpha_j/(\mathcal{S} -
\alpha_i)$. Summing up these conditional probabilities, we get

$\displaystyle \omega_j = \sum_{i=0\atop i\neq j}^n \frac{\alpha_j\;
\omega_i}{\mathcal{S}-\alpha_i} $

   or, $\displaystyle \frac{\omega_j}{\alpha_j} +
\frac{\omega_j}{\mathcal{S}-\alpha_j} =$   $\displaystyle \mbox{same for all $j$}$

Thus, equating for two different values of j, we finally get

$\displaystyle { \omega_i \over \omega_j } = { \alpha_i \; (\mathcal{S}-\alpha_i) \over \alpha_j\;(\mathcal{S}-\alpha_j) }$    or $\displaystyle \omega_i \;\propto \;\alpha_i\;(\mathcal{S}-\alpha_i)$ (1)

Thus, for assigned proportions of $ \alpha_{1..3} =$ 1:2:3, we get $ \mathcal{S} = 6$, which gives us $ \omega_{1..3} =$ 5:8:9 = 1:1.6:1.8, thus tallying quite well with the above experiment. To show that achieved proportions can be very different from assigned ones, $ \alpha_{1..3} =$ 1:10:100 gives us $ \omega_{1..3} =$ 1:9.1:10. When we introduce a very large number of resource containers, $ \mathcal{S}$ grows large, so achieved proportions tend to approach the corresponding assigned proportions. These have been verified through direct experiment.

Thus the impact of deceptive idleness on the lottery scheduler diminishes when we have many resource containers at any time. But even when this is the case, it is common in practice to observe only a small number of resource containers in a system that are actually busy at any time. Secondly, it may seem possible to synthesize a set of assigned proportions for which the achieved proportions are just what one wants: however, this would typically not be possible, since the scheduler becomes sensitive to the number of busy resource containers at any time. Workarounds of this kind are thus difficult, if at all possible.

3 Simultaneous seek reduction

Practical deployment of a proportional-share disk scheduler would typically incorporate seek reduction as described in section 2.1.3. Deceptive idleness may allow such a scheduler to deliver the desired proportions, but may prevent the seek reducing aspect from working as desired. This is because requests for sequential blocks are generated often not at the level of resource containers, but by some entity more fine-grained, like a process or a thread. When this happens, binding multiple processes to each container may prevent deceptive idleness at the level of a resource container, but may not be enough to prevent deceptive idleness for each process. Proportional-share resource allocation typically cares about the former, and seek reduction, if any, is dependent on the latter.


next up previous contents
Next: 5 Workarounds for deceptive Up: The Effect of Deceptive Previous: 3 Sources of deceptive   Contents
Sitaram Iyer ssiyer@cs.rice.edu