Apr 27, 2010

Terminal Server Virtualization and Migration - Performance Results of Consolidation

It’s been a while since I have had a chance to post, so I thought I would share some of the performance results I have seen over the last few weeks while virtualizing tier 1 applications. Over the last few months there have been many blog posts and articles throughout the community about virtualizing Terminal Servers and the applications they host. Project VRC (http://www.projectvrc.nl) ran tests in early January that exposed a problem with Hyper-Threading on terminal servers that negatively affected performance, although this is not seen in all environments and depends on the workload. VMware acknowledged the problem and provided Project VRC with a patch to test, and the patch did improve HT performance. It has not been released to the general public yet, but it is on the way and could possibly be included in ESX/ESXi Patch Update 2.

Since the launch of vSphere and Intel Nehalem processors, I have been targeting my cluster of Windows terminal servers for consolidation. Prior to Nehalem and vSphere, Terminal Server virtualization was being used by a very small base of users and environments. Throughout the last few months, administrators have been targeting terminal servers more aggressively, primarily because of the improvements in vSphere 4.

The environment that I was consolidating was very oversubscribed, but it was configured this way because in the ‘physical’ world you always had to ‘oversubscribe’ your environment to allow for growth or change down the road. One of the biggest mistakes made in the virtualization consolidation process is assigning too many resources to a virtual machine. In the virtual world I stand by the most basic rule: “Start small and add more as your application requires”. I have come across so many users who simply assign the same amount of resources that the physical counterparts had and end up with disappointing results. Granted, there are environments that will need a large amount of resources, but careful planning, research, and analysis will let you make the proper decision. I strongly recommend starting with 1 vCPU and the basic 4 GB of RAM, then adding more vCPUs as the server requires. You will be surprised by the amount of performance you can get out of a single vCPU on a Nehalem or Westmere processor and by how far you can scale.

A brief breakdown of the environment:

Physical – 2 Dell PowerEdge 1950s, dual socket, quad-core Intel Xeon X5460 3.16 GHz, 120 Watt TDP (Major Heat Generators!!)

The servers did not present users with a full desktop; instead, users were presented with three custom programs developed in house. These programs simply call Adobe Acrobat and Internet Explorer sessions.

Virtual – 1 VM running on a Dell R610, dual quad-core 2.66 GHz X5550 Nehalem processors – W2K03 32-bit OS – 4 GB RAM

VMware ESXi 4.0 Update 1 244038 (HA/DRS cluster of R610s) running on an EqualLogic PS4000X array

If you compare the physical configuration to the virtual configuration, you may be thinking that this is too much of a consolidation. However, I had the utmost confidence that there would be enough power in the Dell R610s AND that the physical machines were oversubscribed. Again, this was NOT a design mistake: this configuration is still popular in physical environments, where it is chosen for fail-over and redundancy rather than for performance.

Migration Process:

Since there were two terminal servers configured identically, I was able to convert one of the physical terminal servers to a VM and use this VM as my primary. Not only can I test and monitor the performance of the VM before shutting down the second terminal server, but I can also fail back to the original server if performance does not meet standards. I converted this server after hours and assigned the VM 1 vCPU and 4 GB RAM as previously discussed. From my early analysis of the physical machines, I predicted the biggest bottleneck would be the 32-bit OS. Since I will only have about 120-150 TS connections when this project is completed, memory will be the most taxed resource on this VM. However, the users are NOT loading a FULL TS desktop and are instead running very LOW memory use programs. If the scenario were different, for example full desktops and multiple software variants, this configuration would not be sufficient, but for this SPECIFIC environment I felt confident that the single VM would handle the load.
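To make the memory concern concrete, here is a rough back-of-the-envelope sketch of the average RAM available per session at the connection counts mentioned above. The 4 GB figure comes from the VM configuration; the amount reserved for the operating system (OS_RESERVED_MB) is purely an illustrative assumption, not a measured value, and the function name is hypothetical.

```python
def per_session_budget_mb(total_ram_mb, os_reserved_mb, sessions):
    """Return the average RAM (in MB) left for each session."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    usable_mb = total_ram_mb - os_reserved_mb
    if usable_mb <= 0:
        raise ValueError("OS reservation exceeds total RAM")
    return usable_mb / sessions


TOTAL_RAM_MB = 4 * 1024   # 4 GB assigned to the VM
OS_RESERVED_MB = 1024     # ASSUMPTION: illustrative figure, not measured

for sessions in (120, 150):
    budget = per_session_budget_mb(TOTAL_RAM_MB, OS_RESERVED_MB, sessions)
    print(f"{sessions} sessions -> about {budget:.1f} MB per session")
```

A budget in the low tens of megabytes per session only works because each session runs a small launcher program rather than a full desktop, which is why the same sizing would not carry over to a full-desktop deployment.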

In my environment, I utilize three primary tools for my host performance monitoring which all have their specific purpose.

  1. VMware vCenter – We are all aware of the performance metrics that we can get from vCenter. Although they are clear and concise, vCenter tends to lack the visual ‘jazz’, as I like to call it.
  2. Veeam Monitor – This is my primary monitoring software, which connects to my vCenter Server and pulls monitoring data via the VMware API. The quality of the information reported is, in my opinion, the BEST on the market. I have tried many other vendors’ monitoring software and nothing can touch Veeam’s product.
  3. VMware vMA 4 Appliance + resxtop – This is my more advanced monitoring tool, primarily used to look “in depth” at hypervisor performance: disk latency, per-core ready and wait times, and more real-time analysis. I don’t use this tool every day, but after a new conversion or migration I use it to monitor the VM for at least 48 hours.
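One detail that trips people up when comparing these tools is that vCenter reports CPU ready as a summation in milliseconds, while esxtop/resxtop shows it as a percentage. A minimal sketch of the conversion follows, assuming the 20-second sample interval used by vCenter real-time charts; the function name is hypothetical, and dividing by the vCPU count to get a per-vCPU average is an assumption about how you want to read the number.

```python
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU ready summation (ms) into an average per-vCPU percentage.

    ready_ms   -- ready time summed over the sample interval, in milliseconds
    interval_s -- length of the sample interval in seconds (20 for real-time charts)
    vcpus      -- number of vCPUs the summation covers
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000.0 * vcpus) * 100.0


# 400 ms of ready time in one 20-second real-time sample on a 1 vCPU VM
print(f"{cpu_ready_percent(400):.1f}% ready")
```

For longer chart intervals the same formula applies with a larger interval_s, so the raw millisecond values from different chart ranges are not directly comparable until they are converted.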

I use these three monitoring tools to build an in-depth system analysis and make educated decisions on where to adjust the resources assigned to VMs. Let’s take a look at two graphs from my Veeam monitor below. The first graph is a simpler graph that displays overall CPU Average Usage for the VM, while the second graph displays CPU Usage Average (%) and CPU Usage Percentage (%).

As you can see in the above graph, I spread the migration over a few weeks to monitor and evaluate the performance of the TS VM under a single vCPU while still maintaining two active servers. As I expected, performance was well in check and usage was low, as reported in the graph. Things really changed once I switched off the second terminal server and ran a single server under a single vCPU with approx. 120-140 connections. I ran this setup for 4 production business days, as shown in the graph above. Although performance was acceptable and very impressive, the single vCPU was relying on Intel’s Turbo Boost technology during high peaks. During this period of analysis I gained a better understanding of how the TS environment really performed and what it required. I noticed that under a single vCPU the TS would struggle with multi-threaded applications or with users who ran multiple programs at the same time. I also noticed that if a single user had a locked session or frozen program, it negatively impacted everyone.

After evaluating the output from Veeam Monitor and vCenter along with resxtop, I determined that another vCPU wouldn’t hurt. I was also very curious to see how a second vCPU would affect the performance of the server. In the graph above, you can see the two days where the usage went down. These are the days when the server was switched to a 2 vCPU configuration. CPU usage dropped and the performance gain was immediately noticeable. I have left the server in this configuration as it gives the best performance/consolidation ratio without negatively affecting the virtual environment or over-allocating resources.
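The “start small and add more” rule can be written down as a simple check against the usage samples from whichever monitoring tool you use. The sketch below is only one way to express it: the 80% threshold, the 25% “sustained” fraction, the 4 vCPU cap, and the function name are all hypothetical values chosen for illustration, not numbers taken from this migration.

```python
def recommend_vcpus(current_vcpus, usage_samples_pct,
                    high_pct=80.0, sustained_fraction=0.25, max_vcpus=4):
    """Suggest a vCPU count: add one vCPU only if usage is high for a
    sustained share of the samples, otherwise keep the current count."""
    if not usage_samples_pct:
        return current_vcpus  # no data, no change
    busy = sum(1 for sample in usage_samples_pct if sample >= high_pct)
    if busy / len(usage_samples_pct) >= sustained_fraction and current_vcpus < max_vcpus:
        return current_vcpus + 1
    return current_vcpus


# Hypothetical CPU usage samples (%) from a busy day on a 1 vCPU VM
busy_day = [45, 62, 88, 91, 79, 85, 60, 93]
print(recommend_vcpus(1, busy_day))
```

The point of adding one vCPU at a time is that each step can be evaluated against real usage before committing more resources, rather than copying the physical core count up front.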

The above graph displays CPU usage in MHz to better show the use of the vCPUs. The X5550 processors are clocked at 2.66 GHz and can boost to 3.06 GHz with Turbo Boost when enough headroom is available. There were a few times that the VM did use this technology, but since these graphs are averages they don’t always register those fast bursts of CPU usage. With esxtop, you can see the impact of Turbo Boost and how much is being utilized across each individual core in the server.
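A small sketch of why averaged graphs hide those bursts: a VM’s nominal capacity is its vCPU count times the base clock (2.66 GHz here), and any sample above that implies the core was running faster than its rated speed. The sample values below are made up for illustration, and the sketch assumes the reported MHz figure can exceed nominal while Turbo Boost is active.

```python
BASE_MHZ = 2660  # X5550 base clock


def nominal_capacity_mhz(vcpus, base_mhz=BASE_MHZ):
    """Nominal CPU capacity of a VM: vCPU count times the base clock."""
    return vcpus * base_mhz


def samples_above_nominal(samples_mhz, vcpus, base_mhz=BASE_MHZ):
    """Return the samples that exceed the VM's nominal capacity."""
    capacity = nominal_capacity_mhz(vcpus, base_mhz)
    return [sample for sample in samples_mhz if sample > capacity]


# Hypothetical fine-grained samples (MHz) for a 1 vCPU VM
samples = [1200, 1300, 2900, 1100]
average = sum(samples) / len(samples)

print(f"average: {average:.0f} MHz of {nominal_capacity_mhz(1)} MHz nominal")
print(f"bursts above nominal: {samples_above_nominal(samples, vcpus=1)}")
```

The average sits well below the nominal clock even though one sample exceeded it, which is exactly the kind of burst a rolled-up graph smooths away and a real-time tool shows.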

It has now been almost 3 weeks since the migration of the terminal servers to a virtual machine and I am happy to report that the conversion was a huge success. There have been no performance problems or complaints since the migration. Utilization has continued at the same level and I have kept the server configured at 2 vCPUs. This configuration allows for some future growth on the server while I continue to closely monitor the memory usage of the VM. Now, go virtualize your terminal server!!