Performance By Design: a blog devoted to Windows performance, application responsiveness, and scalability


Tuesday, 18 June 2013

Virtual memory management in VMware: a case study

Posted on 10:42 by Unknown
This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is here.

Case Study

The case study reported here is based on a benchmark using a simulated workload that generates contention for machine memory. VMware ESX Server was installed on a Dell Optiplex 990 with an Intel i7 quad-core processor and 16 GB of RAM. (Hyper-Threading was disabled on the processor through the BIOS.) Four identical Windows Server 2012 guest machines were then defined, each configured with 8 GB of physical memory. Each Windows guest ran a 64-bit benchmark program, ThreadContentionGenerator.exe, which the author developed.

The benchmark program was written using the .NET Framework. The program allocates a very large block of private memory and accesses that memory randomly. The benchmark program is multi-threaded and updates the allocated array using explicit locking to maintain the integrity of its internal data structures. Executing threads also simulate periodic IO waits by sleeping rather than issuing reads or writes against the file system, which avoids exercising the machine’s physical disks. Performance data from both VMware and the Windows guest machines was gathered at one-minute intervals for the duration of the benchmark testing, approximately two hours. For comparison purposes, a single guest machine was activated to execute the same benchmark in a standalone environment with no contention for machine memory. Running standalone, the benchmark executes in about 30 minutes.
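The source for ThreadContentionGenerator.exe is not published here, but a minimal C# sketch of the behavior just described might look like the following. All names, the allocation size, the thread count, and the sleep cadence are illustrative assumptions, not the actual benchmark code:

```csharp
// A minimal sketch, assuming C# on the .NET Framework, of the workload
// the post describes: a large private allocation updated randomly by
// multiple threads under an explicit lock, with sleeps standing in for
// IO waits so the physical disks are never exercised.
using System;
using System.Threading;

class ThreadContentionGenerator
{
    // Allocate in 256 MB segments to stay under the CLR's 2 GB object limit.
    const int SegmentBytes = 256 * 1024 * 1024;
    const int Segments = 24;                  // ~6 GB total footprint (assumed)
    static byte[][] data;
    static readonly object dataLock = new object();

    static void Main()
    {
        data = new byte[Segments][];
        for (int i = 0; i < Segments; i++)
            data[i] = new byte[SegmentBytes]; // pages become resident as they are touched

        var threads = new Thread[8];          // worker thread count is assumed
        for (int i = 0; i < threads.Length; i++)
        {
            int seed = i;
            threads[i] = new Thread(() => RandomUpdateLoop(seed));
            threads[i].Start();
        }
        foreach (var t in threads)
            t.Join();
    }

    static void RandomUpdateLoop(int seed)
    {
        var rng = new Random(seed);
        for (int iter = 0; iter < 5000000; iter++)
        {
            int segment = rng.Next(Segments);
            int offset = rng.Next(SegmentBytes);
            lock (dataLock)                   // explicit locking preserves data integrity
            {
                data[segment][offset]++;      // random access keeps pages active
            }
            if (iter % 10000 == 0)
                Thread.Sleep(10);             // simulated IO wait; no real disk activity
        }
    }
}
```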


Memory allocation on demand

Figure 2 tracks three key ESX memory performance metrics during the test: the total Memory Granted to the four guest Windows machines, the total Memory Active for the same four guests, and the VMware Host’s Memory Usage counter, reported as a percentage of the total machine memory available. The total Memory Granted counter increases at the outset in 8 GB steps as each of the Windows guest machines spins up. The memory benchmarking programs were started just before 9 AM and continued executing over the next two hours, finally winding down near 11 AM. The benchmark programs drive Active Memory to almost 15 GB shortly after 9 AM, and overall Memory Usage to 98%. (In this configuration, 2% free memory translates into about 300 MB of available machine memory.)

Notice that the Memory Active counter, which purports to measure the guest OS working set of resident pages, exhibits some anomalies, presumably associated with the way it is estimated using sampling. There are periodic spikes in the counter at the beginning of the testing period, when the guest machines have just been activated but are not yet doing any work. Toward the end of the benchmark period, after many of the benchmark worker threads have completed, there is another spike resembling the earlier ones. This later spike shows total guest machine active memory briefly reaching some 20 GB, which, of course, is physically impossible on a 16 GB machine.
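To see how a sampling-based estimator can misbehave this way, consider a toy model. This is not VMware’s actual algorithm; the sample size, page count, and active fraction below are assumptions. The idea is that a small random sample of pages is tested for recent access, and the touched fraction is extrapolated to the guest’s whole address space, so a noisy sample can briefly overstate the true active footprint:

```csharp
// A toy model of a sampled working-set estimate (not VMware's code):
// sample a few pages per interval, count how many were recently touched,
// and extrapolate that fraction to the whole guest.
using System;

class SampledWorkingSetEstimate
{
    static void Main()
    {
        const int totalPages = 2097152;         // 8 GB of 4 KB pages
        const int samplePages = 100;            // small per-interval sample (assumed)
        const double trueActiveFraction = 0.10; // true working set = 0.80 GB
        var rng = new Random();

        for (int interval = 0; interval < 10; interval++)
        {
            int touched = 0;
            for (int i = 0; i < samplePages; i++)
            {
                // Each sampled page is "recently touched" with the true probability.
                if (rng.NextDouble() < trueActiveFraction)
                    touched++;
            }

            // Extrapolate the sampled fraction to the whole guest.
            double estimatedGB = (double)touched / samplePages
                                 * totalPages * 4096 / (1024.0 * 1024 * 1024);
            Console.WriteLine("interval {0}: estimated active = {1:F2} GB (true = 0.80 GB)",
                              interval, estimatedGB);
        }
    }
}
```

With only 100 sampled pages, individual intervals routinely miss the true value by a quarter of a gigabyte or more in this toy, which is the same flavor of noise as the spikes visible in Figure 2.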


Figure 2. Memory Granted, Memory Active and % Memory Used during the benchmark.

As the benchmark programs execute in each of the guest machines, the Memory Granted counter plunges from 32 GB down to about 15 GB. The vCenter Performance Counters documentation provides this definition of the counter: “The amount of memory that was granted to the VM by the host. Memory is not granted to the [guest] until it is touched one time and granted memory may be swapped out or ballooned away if the VMkernel needs the memory.” Evidently, during initialization, a Windows Server machine touches every page of its physical memory, so 8 GB of RAM is initially granted to each guest machine. But in this case study, only 16 GB of machine memory is available in total. As VMware detects memory contention, the memory granted to each guest machine is evidently reduced through page replacement, using the ballooning and swapping mechanisms.
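To put numbers on the plunge: the four guests were granted a total of 4 × 8 GB = 32 GB against only 16 GB of installed machine memory, an overcommit factor of 2. Once every guest has touched its full allotment, the VMkernel has no choice but to reclaim roughly 32 − 16 = 16 GB, plus whatever it needs for its own overhead, via ballooning and swapping, which is what the drop in Memory Granted to about 15 GB reflects.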

Figure 3 attempts to show the breakdown of machine memory allocated by adding the allotments associated with the VMkernel to the sum of the Active memory consumed by each of the guest machines. The anomalous spike in Active Memory near the end of the benchmark test pushes overall machine memory usage beyond the amount of RAM actually installed, which, as noted above, is physically impossible. This measurement anomaly, possibly associated with a systematic sampling error, is troubling because it makes it difficult to obtain a reliable, precise breakdown of machine memory allocation and usage in VMware.

Figure 3. Machine memory allocations, including the areas of memory allocated by the VMkernel.

Figure 3 also shows a dotted-line overlay that reports the value of the Memory State counter. The Memory State counter reports the memory state at the end of each measurement interval, so these values should be interpreted as sample observations. There were three sample observations when the memory state was “Soft,” indicating that ballooning was taking place. And there was an earlier sample observation when the memory state was “Hard,” indicating that swapping was triggered.

Figure 4 shows the same counter data as Figure 3, without the Memory Active counter data. We see that the VMware Host management functions consume about 1.5 GB of RAM altogether. This includes the Memory Overhead counter, which reports the space the shadow page tables occupy. The amount of machine memory that the VMware hypervisor consumes remains flat throughout the active benchmarking period.


Figure 4. Machine memory areas allocated by the VMkernel, including memory management “overhead.”

In the next post in this series, we will look at the effectiveness of another VMware memory management feature, transparent memory sharing. 
Posted in memory management, VMware

Popular Posts

  • Using QueryThreadCycleTime to access CPU execution timing
    As a prelude to a discussion of the Scenario instrumentation library, I mentioned in the previous post that a good understanding of the cloc...
  • Using xperf to analyze CSwitch events
    Continuing the discussion from the previous blog entry on event-driven approaches to measuring CPU utilization in Windows ... Last time arou...
  • Virtual memory management in VMware: memory ballooning
    This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is  here . Ballooning Ballooni...
  • Correcting the Process level measurements of CPU time for Windows guest machines running under VMware ESX
    Recently, I have been writing about how Windows guest machine performance counters are affected by running in a virtual environment, includi...
  • Virtual memory management in VMware: Swapping
    This is a continuation of a series of blog posts on VMware memory management. The previous post in the series is  here . Swapping VMware has...
  • Deconstructing disk performance rules: final thoughts
    To summarize the discussion so far: While my experience with rule-based approaches to computer performance leads me to be very skeptical of ...
  • Rules in PAL: the Performance Analysis of Logs tool
    In spite of their limitations, some of which were discussed in an earlier blog entry , rule-based bromides for automating computer performan...
  • Measuring application response time using the Scenario instrumentation library.
    This blog post describes the Scenario instrumentation library, a simple but useful tool for generating response time measurements from insi...
  • High Resolution Clocks and Timers for Performance Measurement in Windows.
    Within the discipline of software performance engineering (SPE), application response time monitoring refers to the capability of instrument...
  • Page Load Time and the YSlow scalability model of web application performance
    This is the first of a new series of blog posts where I intend to drill into an example of a scalability model that has been particularly in...

Categories

  • artificial intelligence; automated decision-making;
  • artificial intelligence; automated decision-making; Watson; Jeopardy
  • hardware performance; ARM
  • Innovation; History of the Internet
  • memory management
  • VMware
  • Windows
  • Windows 8
  • windows-performance; application-responsiveness; application-scalability; software-performance-engineering
  • windows-performance; context switches; application-responsiveness; application-scalability; software-performance-engineering

Blog Archive

  • ▼  2013 (14)
    • ►  November (1)
    • ►  October (1)
    • ►  September (1)
    • ►  July (3)
    • ▼  June (5)
      • Virtual memory management in VMware: Transparent m...
      • Virtual memory management in VMware: a case study
      • Virtual memory management in VMware.
      • VMware virtual memory management
      • Performance Management in the Virtual Data Center:...
    • ►  May (1)
    • ►  February (1)
    • ►  January (1)
  • ►  2012 (11)
    • ►  December (1)
    • ►  November (2)
    • ►  October (2)
    • ►  July (1)
    • ►  May (1)
    • ►  April (2)
    • ►  March (2)
  • ►  2011 (14)
    • ►  November (3)
    • ►  October (2)
    • ►  May (1)
    • ►  April (1)
    • ►  February (3)
    • ►  January (4)
Powered by Blogger.

About Me

Unknown
View my complete profile