Tuesday, October 23, 2012

Virtualized 3D Gaming on VMware View Realized!!

Virtual Shared Hardware Accelerated 3D Gaming

This year at VMworld 2012 we ran a 3D gaming lab in the Green Room for the Hands-On Labs. It was meant to show off VMware's new vSGA (Virtual Shared Graphics Acceleration) technology in a fun way. vSGA was introduced in vSphere 5.1 and will be available through View with our next major release, due out in the first half of 2013.

VMworld in San Francisco was our first attempt at 3D gaming. It took place before vSphere 5.1 became generally available, and we were using an Alpha build of View. We (Todd Dayton, Tommy Walker and I) put the whole environment together in the week leading up to VMworld using loaner hardware donated by SuperMicro, along with internals for those hosts donated by Randy Keener and Nick Geisler, endpoints donated by Dell/Wyse, and Nvidia Quadro 6000 GPUs donated by Aaron Blasius. Aaron Blasius' and Warren Ponder's teams were invaluable in getting us the bits and new fixes along the way to improve performance.

Possibly the hardest part of setting up the "gaming" lab was finding games that would actually *run* on ESXi. Many games would simply dump us back to the desktop with no error (this was particularly true for any racing games we tried). We assume this is because those games are either looking for a supported GPU directly or are trying to use a specific feature on a specific GPU. vSGA supports DirectX 9 and OpenGL 2.1 (?), so it is also possible that some games are trying to use features that we don't support at this time.

For VMworld 2012 San Francisco we settled on two games that had decent performance in the short time we had to get things running: Minecraft and Borderlands. We ran 6 sessions of each game for a total of 12 desktops across two hosts, with a single Nvidia Quadro 6000 in each host. We found, at that time, that we couldn't get adequate performance at resolutions over 800x600, but a late VMware Tools build got our frame rate up to an average of ~25 fps at 800x600 resolution. We were happy at this point because it was a vast improvement over previous performance.

Here is a very short clip showing the VMworld 2012 San Francisco stations.

After the show was over and we started shutting down stations, we found that the Minecraft sessions were crushing the Nvidia Quadro 6000 GPU. One session alone could use as much as 76% of the GPU at times. This meant that we were most likely bumping into GPU performance limitations that reduced frames per second (FPS) on the Minecraft sessions. The Borderlands sessions used far fewer GPU resources, with a single session generally pulling no more than 35% of a GPU at its peak. Still, this shows that GPU utilization varies widely with the application being used and needs to be tested for any given use case.
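
As a rough illustration of why per-application testing matters, the peak figures above can be turned into a back-of-the-envelope estimate of how many sessions could peak at the same time on one GPU. This is only a sketch under loud assumptions: it treats peak utilization as adding linearly across sessions (vSGA scheduling may not behave that way), it assumes 6 sessions of one game share a GPU (the post does not say how the sessions were split across the two hosts), and the function names are hypothetical.

```python
import math

# Peak single-session utilization of the Quadro 6000, as observed in the post.
PEAK_GPU_UTILIZATION = {
    "Minecraft": 0.76,
    "Borderlands": 0.35,
}

def sessions_without_contention(peak_utilization: float) -> int:
    """Sessions that can peak simultaneously before the GPU is oversubscribed.

    Assumption: utilization adds linearly across sessions.
    """
    return math.floor(1.0 / peak_utilization)

def worst_case_demand(peak_utilization: float, sessions: int) -> float:
    """Total GPU demand (1.0 = one full GPU) if every session peaks together."""
    return peak_utilization * sessions

# Assumption: 6 sessions of the same game on one GPU.
SESSIONS_PER_GPU = 6

for game, peak in PEAK_GPU_UTILIZATION.items():
    fit = sessions_without_contention(peak)
    demand = worst_case_demand(peak, SESSIONS_PER_GPU)
    print(f"{game}: {fit} session(s) fit at peak; "
          f"{SESSIONS_PER_GPU} sessions could demand {demand:.2f}x the GPU")
```

Under these assumptions only one Minecraft session fits at peak, which is consistent with the frame-rate limits we saw, while Borderlands leaves noticeably more headroom.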

Fast Forward to VMworld 2012: Barcelona

Fast forward a month and we were in Barcelona trying to set up the same environment with different host hardware. In the meantime the engineers had been given our feedback and had been looking into what else might be causing performance bottlenecks.

Simon Long was able to get Counter-Strike:Source running in a VM on ESXi and brought it along on an external drive, which we added to the environment. At first we seemed to be bumping into some of the same limitations and could only run the game at 800x600 at a fairly consistent 30 FPS, with a recurring drop in FPS every few seconds that lasted about one second. Warren Ponder happened to drop by and got this information to Lawrence Spracklen, who turned around and provided us with a change to an advanced setting that removed this bottleneck. That was when we decided to start trying higher resolutions, and the result is what you see in the video at the top of this post: Counter-Strike:Source running at 1920x1080 at an average of 30 FPS, using View and PCoIP, delivered to Dell/Wyse P25s.

Leaps and Bounds From Where We Started

Here is a clip of our initial test after enabling the setting that cured our performance issues.

Finally, here is a short clip of the process starting with connecting to View through the P25 to actually playing Counter-Strike:Source.

This effort, while short in time, involved so many different individuals that I haven't even called them all out (sorry to anyone whose name I missed). It's funny how gaming piques everyone's interest. ;-) Still, it was wonderful to have the help of so many different people and get so much feedback. I truly believe this 3D gaming effort has helped springboard the shared virtual 3D effort within VMware and will ultimately benefit our customers in their critical business use cases. Now that people have seen what is possible, they are starting to approach me with real-world use case questions from their customers. That is probably the most exciting thing to come from this.

Thanks to Dino Cicciarelli for taking the lead to fund this "science project"! The outcome far outweighs the means to make it happen.  

**** NEWS FLASH ****  

Here is a new post from Lawrence Spracklen that notes just one setting that will help with View video performance. Because this setting is native to Microsoft Windows I don't believe it is something we will control through View or GPOs.

**** UPDATE ****

Forgot to add the juicy techno details that made this happen. I can't release too much around unreleased software unfortunately, but here you go:

  • ESXi Host Information (all loaner gear)
    • San Francisco
      • Two (2) SuperMicro - Dual socket 6-Core Xeons @ 2.0GHz
      • 128GB of RAM per host (though the VMs didn't need more than 2GB each)
      • A single Nvidia Quadro 6000 GPU w/6GB of VRAM in each host (but the motherboard had four (4) PCIe x16 slots in it... not sure if the PSU could have handled four actively cooled GPUs though)
    • Barcelona
      • Two (2) Dell T620s - Single 6-Core Xeon E5-2640 @ 2.5GHz (motherboard was dual socket, but it only had one CPU installed)
      • 32GB of RAM per host
      • A single Nvidia Quadro 6000 GPU w/6GB of VRAM in each host (the motherboard had four (4) PCIe x16 slots in it, although only 2 slots per CPU socket. The PSU did not have PCIe power cables, so we had to get creative to make things work for the show... like I said, this was a "science project")
  • Endpoint Client Devices
    • Dell/Wyse donated P25 Zero Clients with the new Tera2 chipset as well as Z90 dual-core Windows Embedded Thin Clients.
  • Software
    • vSphere / ESXi 
      • San Francisco was an RTM build (I can't remember the build number)
      • Barcelona was the GA build (5.1.0-834536)
    • View
      • San Francisco and Barcelona were the same build, but this was an alpha build that I am not able to list. However, I can say that this version will be labeled 5.2 and is due to release in the first half of 2013.
  • Network
    • In each case we had a flat local network on a 1Gbps switch. We saw sessions reach as high as 70Mbps once we had them running at 1080p @ 30fps. This is due to the very high number of pixel changes each second at that resolution and framerate. Trying to get this type of performance is not a good use case for the WAN. ;-)
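
To put the 70Mbps figure in context, here is a small arithmetic sketch comparing it with the raw, uncompressed pixel rate at 1080p @ 30fps and with the capacity of a 1Gbps link. The 24 bits per pixel is an assumption (the post does not state the colour depth), the calculation ignores protocol overhead and any other traffic on the switch, and the function name is hypothetical.

```python
WIDTH, HEIGHT = 1920, 1080   # resolution used in Barcelona
FPS = 30                     # average frame rate reported
BITS_PER_PIXEL = 24          # assumption: 24-bit colour
OBSERVED_PEAK_MBPS = 70      # highest per-session rate we saw
LINK_MBPS = 1000             # 1Gbps local switch

def raw_video_mbps(width: int, height: int, fps: int, bits_per_pixel: int) -> float:
    """Bandwidth needed to send every pixel of every frame uncompressed."""
    return width * height * fps * bits_per_pixel / 1_000_000

raw_mbps = raw_video_mbps(WIDTH, HEIGHT, FPS, BITS_PER_PIXEL)

# How much smaller the observed peak stream is than the raw pixel rate.
reduction_ratio = raw_mbps / OBSERVED_PEAK_MBPS

# Sessions that fit on the link if all of them hit the observed peak at once.
peak_sessions_per_link = LINK_MBPS // OBSERVED_PEAK_MBPS

print(f"Raw pixel rate: {raw_mbps:.0f} Mbps")
print(f"Observed peak is roughly {reduction_ratio:.0f}x smaller than raw")
print(f"Sessions per 1Gbps link at peak: {peak_sessions_per_link}")
```

Even at roughly a twentieth of the raw pixel rate, a dozen or so sessions peaking together would fill a 1Gbps link, which is why this kind of workload belongs on the LAN.
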
My colleague, Simon Long, has just posted on this topic as well. If you don't already follow Simon's blog, you definitely should start! He has loads of great content.


  1. Could you comment on how to get vSGA up and running in the ESXi host? I'm running the latest (patched as of 10/28...838463) version of 5.1 and have a Quadro 4000 card in the box. Every time I fire up my Win7 guest (128MB video RAM) and check the log it shows no hardware resources found and falls back to software rendering. This is driving me batty...I'm spending way too much time Googling for a solution. TIA!

  2. @Minnephibian - I guess there are a couple things that could be causing this.

    1) The Nvidia ESXi VIB has not been publicly released yet, so if you do not have access to the VIB, do not have it installed on ESXi, or if it is not properly loaded, you will never see the GPU.

    2) Even if you *have* the Nvidia VIB, if the GPU you intend to use is the *only* GPU in the host (e.g., if there is no embedded graphics on the motherboard and no other GPU installed) then it will be utilized for the ESXi Console session. One of my test hosts did not contain embedded graphics so I had to purchase a secondary low-end GPU to add in to enable the high-end GPU to be used for hardware 3D acceleration.

    Hope this helps. Please feel free to ask additional questions or for clarification!

  3. And there we go...I don't have the nVidia VIB, that's what I thought the problem was (unfortunately there are multiple Community posts with people saying they've got vSGA...I suspected they didn't know what they were talking about but wasn't able to pin anyone down). FWIW I'm attempting this in a legit box...DL160 G6 with onboard video *AND* the PCI Quadro (I've successfully done DirectPath for the video card with this setup in the past). If I had the VIB I'm pretty sure it'd be working. We (my company) have been lucky enough to work with VMware a bit on the "non-stop clinical desktop" stuff for VMware View so I'm trying to leverage that relationship into a copy of the VIB.

    1. @Minnephibian - were you ever able to track down the VIB? I am curious to hear from others what experience/performance is being seen.

      Without releasing any names, I have been helping out with some customer PoCs regarding hardware accelerated GPU for CAD type solutions and the response/performance has been wonderful so far. I don't have any solid numbers on consolidation ratios yet, but of course this will always depend on the GPU being used as well as the application.

  4. Hi, can you please comment on how you shared GPU memory, or bypassed it?

    1. I think the post explains most of it, but we used a new VMware vSphere technology that was released with vSphere 5.1. To use this over View, we used a build of VMware View that has not been released yet but is NOW in Beta and will be versioned 5.2. This version of View will be released within the first half of 2013. The technology that allows VMs to share a hardware GPU is called "vSGA", and you specify the amount of video RAM when configuring the VM.

  5. Tim, congratulations on this wonderful blog. My company is also very much interested in deploying a cloud gaming platform based on vSGA. There is one piece of information that I couldn't find anywhere: will there be any possibility of limiting GPU usage per VM, the same way we limit CPU and memory resources? Thanks for any answers!

    1. Thanks for the comment!

      At this time there is no way to limit how much of the GPU any given VM accesses. I have not seen this on the roadmap, but I am also not in every meeting on the development of this technology within VMware.

      I will try to find out how the GPU scheduling is actually accomplished and how graphic compute resources are scheduled.

  6. @Minnephibian did anybody get these mysterious nvidia VIBs? :)

    1. The NVIDIA VIBs are not publicly available at this time. When they are actually released they will be distributed through NVIDIA.

      Currently, as far as I know, the only non-vmware/non-NVIDIA people who have access to the VIBs are those customers that were approved by VMware for the View Nashville Beta program (this will eventually be released as VMware View v5.2).

      So, for now anyway, the VIBs are NOT publicly available.

  7. Apologies for not keeping up with the comment moderation. I was on vacation over the last 10 days. :-)

  8. Thank you for responding. It's been hard getting good info about this :) Your blog is a great source for VDI vSGA info.

  9. Sorry for double posting- I commented on your youtube video, and then noticed there was a link to this blog post. How did you prevent cursor lag? I am trying to get 3D gaming to work in our View 5.2 environment, however, I am running into a wall. The controls spin wildly on any 3D game I have tried. I have looked through the PCoIP Session Variables ADM template, but I haven't been able to find anything related to this. I feel like I am looking for a "local cursor mode" or "mouse optimized for gaming" (from VMWorkstation), but I can't seem to find it. Any ideas?

    1. There was a setting (I haven't messed with any of this since it went GA from VMware) called "Relative Mouse" that was only available using the Windows View Client (meaning, not on a zero client or Mac client). Certain games do not require relative mouse, and with those you will not run into that problem. Counter-Strike:Source was one of the games that did not require Relative Mouse. That is why we were able to use Wyse P25 zero clients, which use the Tera2 chipset and could handle the resolution and framerate.