Realistic Virtualization Benchmarks

May 6, 2008 – 10:01 pm

A recent comment about ‘bench-marketing’ caught my attention. I much prefer to see performance analysis of real-world benchmarks, because well-designed studies serve as reference examples to guide solid decisions when planning for virtualization. The specific comments were about wanting to see more examples of scale-out performance – a term I use to describe stacking multiple VMs onto a single server.

Given my background, one of the first things I asked about when I arrived at VMware was scaling. I was highly skeptical about how well the scheduler and virtual machine monitor could scale up and out. We ran some simple microbenchmarks using components of the SPECcpu suite on a 16-core Sun x4600 system, using 4-vcpu guests. The results were a surprise, to me at least, since I know how hard it is to scale operating system algorithms.

It wasn’t, however, much of a surprise to the people who had done a lot of the hard work on making the scheduler scale — there is a significant amount of differentiated scheduling technology in the ESX kernel. The primary goal of the scheduler is to provide scaling as close to linear as possible, both scale-up (adding virtual CPUs to a single guest) and scale-out (adding more guest VMs).

In ESX 1.x, scheduling is done based on single virtual processor (vcpu) guests. In ESX 2.x, the scheduler can provide multiple virtual CPUs in a guest, and uses gang scheduling to co-schedule those virtual CPUs at the same time. This ensures that virtual SMP performance in the guest works well (specifically, it avoids having synchronization code inside the guest wait on a virtual CPU that isn’t currently running…). In addition, the ESX scheduler provides a facility for relaxed co-scheduling, so that if a virtual CPU in the guest is idle, the physical CPU can be released for other guests to use.
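To make the distinction concrete, here is a toy sketch (my own illustration, not the actual ESX algorithm) of how many physical CPUs a guest holds under strict gang scheduling versus relaxed co-scheduling:

```python
# Toy model contrasting strict gang scheduling with relaxed co-scheduling.
# This is an illustrative sketch of the idea described above, not the
# real ESX scheduler logic.

def pcpus_needed(vcpu_states, relaxed):
    """Physical CPUs a guest occupies for one scheduling quantum.

    vcpu_states: list of 'run' or 'idle', one entry per virtual CPU.
    Strict gang scheduling claims a core for every vcpu so they all
    progress together; relaxed co-scheduling claims cores only for
    runnable vcpus, releasing the rest for other guests.
    """
    if relaxed:
        return sum(1 for state in vcpu_states if state == "run")
    return len(vcpu_states)

guest = ["run", "run", "idle", "idle"]     # a 4-vcpu guest, half idle
print(pcpus_needed(guest, relaxed=False))  # strict gang: 4 cores held
print(pcpus_needed(guest, relaxed=True))   # relaxed: only 2 cores held
```

The payoff is exactly the case in the text: a half-idle 4-vcpu guest ties up only two cores instead of four, leaving the others free for neighboring VMs.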

You can see from the above graph that the scheduler handles multiple VMs handily. For 1-4 VMs, we are under-committed — because we are placing 1-4 x 4-vcpu guests, the workload fits nicely on the 16 physical CPU cores; we see a linear performance increase from 1 to 4 VMs, using 4-16 cores. For 5-8 VMs we are overcommitted; i.e., we are attempting to schedule 20-32 virtual CPUs on 16 physical CPUs. A less optimal scheduler would exhibit a drop-off in performance in the overcommitted system. The ESX scheduler is linear up to 16 cores, and then once all CPUs are saturated, performance is capped. It’s important to note that the throughput doesn’t go down once we over-commit…
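The shape of that graph can be captured in a one-line back-of-envelope model — aggregate throughput grows linearly while vcpus fit on cores, then stays capped (rather than degrading) once the host is overcommitted. The per-core throughput unit here is arbitrary; this is a sketch of the ideal curve, not the measured data:

```python
# Idealized throughput curve for the scale-out experiment described above:
# n 4-vcpu guests on a 16-core Sun x4600.
PCPUS = 16          # physical cores on the host
VCPUS_PER_VM = 4    # vcpus per guest

def relative_throughput(n_vms, per_core=1.0):
    """Aggregate throughput in arbitrary per-core units.

    Linear while total vcpus <= physical cores, then flat: a good
    scheduler caps at saturation instead of dropping off.
    """
    return per_core * min(n_vms * VCPUS_PER_VM, PCPUS)

for n in range(1, 9):
    print(n, relative_throughput(n))
# 1-4 VMs: 4, 8, 12, 16 (linear); 5-8 VMs: capped at 16, no drop-off
```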

My next question was about real workloads. It’s fine to run a microbenchmark with simple CPU-intensive jobs, but one could argue that is an easy target to schedule. I suggested we try Microsoft Exchange and then Oracle, since these workloads have many sleep/wakeup intervals and wait-events, such as waiting for disk I/O completion and waiting for work to arrive on the network ports they listen on.

The scaling study shows that running 1 through 8 Microsoft Exchange VMs on a single server provides excellent scale-out performance — we see over 95% scaling from 2 to 16 cores, with nearly flat latency.
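For readers who want to reproduce this kind of analysis on their own numbers, a scaling figure like "over 95%" is typically computed as measured aggregate throughput divided by N times the single-VM throughput. The throughput values below are made up for illustration, not taken from the study:

```python
# Computing a scale-out efficiency figure from aggregate throughputs.

def scaling_efficiency(throughputs):
    """throughputs[i] = aggregate throughput with i+1 VMs running.

    Efficiency is the measured aggregate at full scale divided by the
    ideal (perfectly linear) aggregate: N x the single-VM throughput.
    """
    ideal = throughputs[0] * len(throughputs)
    return throughputs[-1] / ideal

# Hypothetical transactions/sec for 1 through 8 VMs:
measured = [100, 198, 295, 390, 485, 578, 670, 762]
print(scaling_efficiency(measured))  # a little over 0.95
```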

Also, since Microsoft Exchange can easily be scaled horizontally, it leaves a wide set of configuration options open — we can pick and choose the best mix of scale-up and scale-out to achieve the best performance on a given server configuration. We noticed that scale-out provides a much bigger opportunity for performance improvement than trying to scale up a single instance — so when we run Exchange virtualized, we can actually get better aggregate performance than native! This is possible because we can take the best scale-up point and then consolidate multiple instances of it on a single server; if the ESX scheduler does a good job, we should see better performance.
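The argument can be sketched numerically. Assume, purely for illustration, that a single instance scales sub-linearly (each added vcpu contributes a bit less than the last) while stacking instances at their sweet spot is nearly linear — the efficiency constants here are invented, not measured:

```python
# Sketch of the scale-up vs. scale-out argument, with invented curves.

def single_instance(vcpus, eff=0.85):
    """One instance with diminishing per-vcpu returns (assumed curve)."""
    return sum(eff ** i for i in range(vcpus))

def consolidated(n, vcpus_each, scheduler_eff=0.95):
    """n instances, each at its scale-up sweet spot, stacked by the
    hypervisor with an assumed 95%-efficient scheduler."""
    return n * scheduler_eff * single_instance(vcpus_each)

# On a 16-core host: one 16-vcpu instance vs. four 4-vcpu instances.
print(single_instance(16))   # one big instance
print(consolidated(4, 4))    # four consolidated instances do better
```

Under these assumed curves the consolidated configuration roughly doubles the single big instance — the same reasoning as the paragraph above, just made explicit.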

The other recent study is about scaling 1 through 7 heavy Oracle VMs on a single server. We ran 1 through 7 2-vcpu Oracle instances of the DVDstore benchmark on a single Sun x4600 server. At 7 VMs, the machine was using ~15 cores, and scaled to 256GB of RAM… Once again, latency stayed almost flat as the VMs scaled out. I use this study often when talking to customers about planning Oracle consolidation on VMware…
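The planning conversation usually starts with simple fit arithmetic: how many VMs of a given shape fit on a host by CPU and by memory. A minimal sketch, with illustrative per-VM figures (the 32GB-per-VM number is an assumption for the example, not from the study):

```python
# Rough consolidation-planning arithmetic: VMs that fit on a host,
# constrained by whichever of CPU or RAM runs out first.

def vms_that_fit(host_cores, host_ram_gb, vcpus_per_vm, ram_per_vm_gb,
                 cpu_overcommit=1.0):
    """Conservative VM count for one host.

    cpu_overcommit > 1.0 models deliberately overcommitting vcpus to
    cores (reasonable when guests have idle time, per the scheduler
    discussion above).
    """
    by_cpu = int(host_cores * cpu_overcommit // vcpus_per_vm)
    by_ram = int(host_ram_gb // ram_per_vm_gb)
    return min(by_cpu, by_ram)

# e.g. 2-vcpu VMs with an assumed 32GB each on a 16-core, 256GB x4600:
print(vms_that_fit(16, 256, 2, 32))  # 8
```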


With the availability of multi-core systems it’s even more important to have good scale-out performance. Today, commonly available servers have 16 cores. Just around the corner we’ll see 24- and 32-core systems, and if multi-threading makes a return, a single server will soon have 64 logical CPUs. Taking advantage of all those logical CPUs is becoming another major reason for using virtualization.
