Performance By Design
A blog devoted to Windows performance, application responsiveness and scalability.


Monday, 24 June 2013

Virtual memory management in VMware: Transparent memory sharing

This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is here.

In this installment, I will discuss the impact and effectiveness of transparent memory sharing, using the performance data that was gathered during a benchmark that stressed VMware's virtual memory management capabilities.


Transparent memory sharing

Transparent memory sharing is one of the key memory management mechanisms that supports aggressive server consolidation. VMware dynamically detects memory pages that are identical within or across guest machine images. When identical pages are detected, VMware maps them to a single page in machine memory. When guest machines are largely idle, transparent memory sharing enables VMware to pack guest machine images efficiently onto a single hardware platform, especially when the machines run the same OS, the same OS version, and the same applications. However, when guest machines are active, the benefits of transparent memory sharing are greatly reduced, as will soon be apparent.

VMware uses a background thread that scans guest machine pages continuously, looking for duplicates. This process is illustrated in Figure 5. Candidates for memory sharing are found by calculating a hash value from the contents of a page and looking for a collision in a hash table built from the hash values of other current pages. If a collision is found, the candidate for sharing is compared to the base page byte by byte. If the contents of the candidate and the base page match, VMware points the copy's PTE at the same machine memory page that backs the base page.
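
To make the scan-and-compare logic concrete, here is a minimal sketch in Python of the content-hashing approach described above. The data structures (a dictionary standing in for machine memory, another for the guest-to-machine page mapping, and a hash table of known base pages) and the SHA-1 hash are illustrative assumptions, not VMware's actual implementation.

import hashlib

PAGE_SIZE = 4096      # x86 page size in bytes

machine_memory = {}   # machine page number -> page contents
page_mapping = {}     # (guest id, guest page) -> machine page number (stands in for the PTE)
hash_table = {}       # content hash -> machine page number of a known base page

def page_hash(contents: bytes) -> str:
    # VMware uses a much cheaper hash function; SHA-1 here is purely illustrative.
    return hashlib.sha1(contents).hexdigest()

def scan_page(guest_page, contents: bytes) -> None:
    """One step of the background scan: try to map guest_page onto an identical base page."""
    h = page_hash(contents)
    base_mpn = hash_table.get(h)
    if base_mpn is not None and machine_memory[base_mpn] == contents:
        # Hash collision confirmed byte for byte: share the existing machine page.
        page_mapping[guest_page] = base_mpn
        return
    # No match: back the page with its own machine memory and record it as a base page.
    mpn = len(machine_memory)
    machine_memory[mpn] = contents
    page_mapping[guest_page] = mpn
    hash_table.setdefault(h, mpn)

# Two guests with an identical (here, zero-filled) page end up sharing one machine page.
scan_page(("vm1", 0x10), bytes(PAGE_SIZE))
scan_page(("vm2", 0x10), bytes(PAGE_SIZE))
assert page_mapping[("vm1", 0x10)] == page_mapping[("vm2", 0x10)]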

Memory sharing is provisional. VMware uses a copy-on-write mechanism to handle the case where a shared page is modified and can no longer be shared. This is accomplished by flagging the shared page's PTE as read-only. Then, when an instruction attempts to store data into the page, the hardware generates an addressing exception. VMware handles the exception by creating a private duplicate of the page and re-executing the store instruction that failed against the duplicate.
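
Continuing the toy example, here is a hedged sketch of that copy-on-write break. The shared_read_only set and the store_byte helper are illustrative stand-ins for the hardware protection fault and the hypervisor's fault handler; none of these names come from VMware.

# Both guest pages start out sharing machine page 0, which is flagged read-only.
machine_memory = {0: bytes(4096)}
page_mapping = {("vm1", 0x10): 0, ("vm2", 0x10): 0}
shared_read_only = {0}

def store_byte(guest_page, offset: int, value: int) -> None:
    """Model a guest store; on real hardware the read-only mapping raises a protection fault."""
    mpn = page_mapping[guest_page]
    if mpn in shared_read_only:
        # Soft page fault handled by the hypervisor: duplicate the page privately,
        # repoint this guest's mapping, then re-execute the store against the copy.
        new_mpn = max(machine_memory) + 1
        machine_memory[new_mpn] = machine_memory[mpn]
        page_mapping[guest_page] = new_mpn
        mpn = new_mpn
    page = bytearray(machine_memory[mpn])
    page[offset] = value
    machine_memory[mpn] = bytes(page)

store_byte(("vm1", 0x10), 0, 0xFF)   # vm1's write breaks the sharing
assert page_mapping[("vm1", 0x10)] != page_mapping[("vm2", 0x10)]
assert machine_memory[0][0] == 0     # vm2 still sees the unmodified shared page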
Transparent memory sharing has great potential benefits, but there is some overhead necessary to support the feature. One source of overhead is the processing performed by the background scan thread. There are tuning parameters that control the rate at which these background memory scans run, but, unfortunately, no associated performance counters are reported that would help a system administrator adjust those parameters. The other source of overhead is the copy-on-write mechanism, which entails handling the additional hardware interrupts associated with these soft page faults. Nor is there a metric that reports the rate at which these additional soft page faults occur.


Figure 5. Transparent memory sharing uses a background thread to scan memory pages, compute a hash code from each page's contents, and compare it to the hash codes that have already been computed. In the case of a collision, the contents of the candidate page are compared byte by byte to the page it collided with. If the pages have identical contents, VMware points both pages to the same machine memory location.

In the case study, transparent memory sharing is initially extremely effective, while the guest machines are largely idle. Figure 6 renders the Memory Shared performance counter from each of the guest machines as a stacked area chart. At 9 AM, when the guest machines are still idle, almost all of the 8 GB granted to each of three of the machines (ESXAS12C, ESXAS12D, and ESXAS12E) is being shared by pointing those pages at the machine memory pages assigned to the fourth guest machine (ESXAS12B). Together, these three guest machines report about 22 GB of shared memory, which allows VMware to pack four 8-GB OS images into a machine memory footprint of about 10-12 GB.
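
The footprint estimate follows directly from the chart: roughly 22 GB of the 32 GB granted is shared, leaving a machine memory footprint on the order of 10 GB (treating the chart reading as approximate):

granted_gb = 4 * 8        # four guests, 8 GB granted to each
shared_gb = 22            # approximate Memory Shared total read off Figure 6 at 9 AM
footprint_gb = granted_gb - shared_gb
print(footprint_gb)       # ~10 GB, consistent with the 10-12 GB estimate above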

However, once the benchmark programs start to execute, the amount of shared memory dwindles to near zero. This is an interesting result. With this workload of identically configured virtual machines, even when the benchmark programs are active, there should still be significant opportunities to share identical code pages. But VMware is apparently unable to capitalize much on this opportunity once the guest machines become active. A likely explanation for the diminished returns from memory sharing is that the virtual memory management performed by each of the active Windows guest machines changes the contents of too many virtual memory pages too frequently, overwhelming the copy-detection mechanism that drives sharing.[1]




[1] Since the benchmark programs are also consuming CPU resources, another possible explanation for the lack of memory sharing is severe processor contention that prevents the memory scanning thread from being dispatched while the benchmark programs are active. However, the VMware Host reported overall processor utilization of only about 40-60% throughout most of the active benchmarking period, so this hypothesis was rejected. Here is where some resource accounting that could report the memory scan rate or the amount of time the scan thread was active would be quite helpful.
Figure 6. The impact of transparent memory sharing dwindles to near zero once the benchmarking workloads become active.
In the next post in this series, we will dig into VMware's use of memory ballooning.