Monday Hot Takes [Episode 5]: Yes, You Can Virtualize GPUs for Compute!

Here are the show notes from Monday Hot Takes Episode 5, recorded on 7/13/2020. Did you know you can virtualize GPUs for compute (for ML/DL/AI)? You can! On Google Cloud, as well as on-premises with VMware (Bitfusion makes that pretty easy!). More in this episode!

Hot Take 1: Virtualize GPUs with Google Compute Engine VMs

Google had big announcements last week leading up to this Google Next this week, including announcing a Compute Engine A2 (Accelerator Optimized) Virtual Machines with NVIDIA Ampere A100 GPUs in the google enterprise cloud. This is the 1st A100 based offering in the public cloud. You can get up to 16 GPUs with 1 VM. They make the multi-GPU workloads faster with NVIDIA HGX server platform which offers NVLink GPU-to-GPU bandwidth. So not only can you virtualize GPUs, you can do it in the public cloud.

Who would want that infrastructure? Any organization running CUDA based ML training, or even traditional HPC workloads. These are just infrastructures, you need to understand the app if you’re the one who is architecting the solution. If you need a 101 on how ML or HPC apps work (both are underpinnings of AI apps), and architectures for them, check out the VMworld presentation I did a couple of years ago with Tony Foster.

These types of apps need lots and lots of server to server or VM to VM communication – and that communication needs to happen much quicker than traditional networking transports can handle, and process lots and lots more data compute in a lower amount of time than CPU architectures can handle. Once you understand that data scientists will figure out which algorithm is to be used, as an ops person you can figure out the architectures to support them.

Hot Take 2: Virtualize GPUs with VMware Bitfusion

VMware Bitfusion is now GA and available to VMware customers as a part of vSphere. This means if you are a current ENT+ customer, you can download Bitfusion as an “add-on” that is accessed via an OVA ( check the install guide on the HardwareAccelerators page). There is also a nominal fee per GPU.

So what does Bitfusion bring to a ML/AI architecture? Bitfusion can virtualize GPUs, meaning an app can run anyplace and access the GPU when it’s time to do the processing. I’m sure they will have some great presentations at VMworld on how all this works.


Info Sources

YouTube is a great data source, all the kids are using and so should you!

Real Talk

If you don’t understand what ML/DL/AI is, and you’re responsible for architecting solutions, you need to get your learn on. Also, if you’re a traditional server vendor, you need to really understand that HPC is happening in the cloud, using virtual machines.

Times are changing, I think we’re going to need to hold onto our hats for the next few years as we try to keep up!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

A note to our visitors

This website has updated its privacy policy in compliance with changes to European Union data protection law, for all members globally. We’ve also updated our Privacy Policy to give you more information about your rights and responsibilities with respect to your privacy and personal information. Please read this to review the updates about which cookies we use and what information we collect on our site. By continuing to use this site, you are agreeing to our updated privacy policy.