A decade ago, the impact of TLB misses on overall system performance ranged from 5% to 14% for nominally sized applications. Memory capacities and application footprints have continued to grow rapidly, while TLB capacity has remained relatively constant over the past decade. Therefore, it has become critical to use the available TLB capacity effectively.
In the x86 architecture, the typical page size is 4KB, but the hardware also supports 2MB and 1GB “superpages” (or large pages). There are significant trade-offs in the use of these superpages. The benefit is two-fold. First, the operating system gains efficiency by managing memory in larger units. Second, address translation benefits from both shorter page walks (three levels instead of four) and greater TLB reach (by a factor of 512 for 2MB pages). However, there are several drawbacks to the use of superpages. First, if a superpage is allocated but not fully used, the additional overhead of faulting in its pages (from disk or by zeroing) is wasted. Second, if only a subset of a superpage is dirtied, the entire page must still be written back when it is evicted. Third, aggressive superpage use can lead to memory fragmentation; mitigating that fragmentation may require migrating pages, which incurs additional overhead.
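The TLB-reach benefit above follows directly from the page-size arithmetic. A minimal illustrative calculation (the 64-entry TLB size is an assumption for illustration, not a figure from this work):

```python
# Illustrative arithmetic: TLB reach at each x86 page size.
# TLB_ENTRIES = 64 is a hypothetical entry count, chosen only for illustration.
PAGE_SIZES = {"4KB": 4 * 1024, "2MB": 2 * 1024 * 1024, "1GB": 1024 ** 3}
TLB_ENTRIES = 64

for name, size in PAGE_SIZES.items():
    reach_mb = TLB_ENTRIES * size // (1024 * 1024)
    factor = size // PAGE_SIZES["4KB"]
    print(f"{name}: reach = {reach_mb} MB ({factor}x the reach of 4KB pages)")
```

For 2MB pages this yields the factor of 512 cited above; 1GB pages extend reach by a further factor of 512.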
Our work aims to thoroughly analyze these trade-offs and to explore the design space of superpage management policies in a modern operating system across a wide range of applications. To that end, we have developed a comprehensive data-collection and address-translation simulation framework. This framework allows us to quickly prototype potential address translation policies and to assess their TLB hit rates, extraneous I/O and page zeroing, and false dirty pages.
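To make the simulation idea concrete, the following is a minimal sketch of how TLB hit rates can be estimated from an address trace at different page sizes. This is a hypothetical toy model (a single-level, fully associative LRU TLB with an assumed entry count), not the framework described in this work:

```python
from collections import OrderedDict

class SimpleTLB:
    """Toy fully associative, LRU-replacement TLB model (illustrative only)."""
    def __init__(self, entries, page_size):
        self.entries = entries        # assumed TLB capacity
        self.page_size = page_size    # bytes per page (4KB, 2MB, ...)
        self.tlb = OrderedDict()      # virtual page number -> present
        self.hits = self.misses = 0

    def access(self, vaddr):
        vpn = vaddr // self.page_size
        if vpn in self.tlb:
            self.tlb.move_to_end(vpn)  # refresh LRU position
            self.hits += 1
        else:
            self.misses += 1
            if len(self.tlb) >= self.entries:
                self.tlb.popitem(last=False)  # evict least recently used
            self.tlb[vpn] = True

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

# Replay one synthetic trace (two sequential passes over 4 MB) at two page sizes.
trace = [i * 4096 for i in range(1024)] * 2
for size in (4096, 2 * 1024 * 1024):
    tlb = SimpleTLB(entries=64, page_size=size)
    for addr in trace:
        tlb.access(addr)
    print(f"page size {size}: hit rate {tlb.hit_rate():.3f}")
```

On this trace the 4KB configuration thrashes the 64-entry TLB, while 2MB pages cover the whole footprint with two entries, illustrating the TLB-reach effect the policies under study must weigh against superpage overheads.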
- Weixi Zhu