THE TECH JOURNAL

HOW TO INSTALL THE VSPHERE VCENTER ROOT CERTIFICATE

Certificates, ESXi, vCenter vCSA, VMware
Dec 08, 2023

Today we’ll go over how to install the vSphere vCenter Root Certificate on your
client system.

Certificates are designed to verify the identity of the systems, software,
and/or resources we are accessing. If we aren’t able to verify and authenticate
what we are accessing, how do we know that the resource we are sending
information to is really what it claims to be?

Installing the vSphere vCenter Root Certificate on your client system allows
you to verify the identity of your VMware vCenter server, VMware ESXi hosts, and
other resources, all while getting rid of those pesky certificate errors.

Certificate warning when connecting to vCenter vCSA

I see too many VMware vSphere administrators simply dismiss the certificate
warnings, when instead they (and you) should be installing the Root CA on your
system.

WHY INSTALL THE VCENTER SERVER ROOT CA

Installing the vCenter Server’s Root CA allows your computer to trust, verify,
and validate any certificates issued by the vSphere Root Certification Authority
running on your vCenter appliance (vCSA). Essentially this translates to the
following:

* Your system will trust the Root CA and all certificates issued by the Root CA
  * This includes: VMware vCenter, vCSA VAMI, and ESXi hosts
* When connecting to your vCenter server or ESXi hosts, you will not be
  presented with certificate issues
* You will no longer have vCenter OVF Import and Datastore File Access Issues
  * This includes errors when deploying OVF templates
  * This includes errors when uploading files directly to a datastore

File Upload in vCenter to ESXi host operation failed

In addition to all of the above, you will start to take advantage of
certificate-based validation. Your system will verify and validate that when you
connect to your vCenter or ESXi hosts, you are actually connecting to the
intended system. When things are working, you won’t be prompted with
certificate errors, whereas if something is wrong, you will be notified of a
possible security event.
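
To make the idea of certificate-based validation concrete, here is a minimal Python sketch of a client that refuses to talk to a server whose certificate does not chain to a trusted root. The function names are my own, and the optional `ca_file` path is a hypothetical PEM bundle; with `ca_file=None` Python falls back to the operating system trust store, which is where the vCenter Root CA lands after the install described in this post.

```python
import socket
import ssl
from typing import Optional


def build_verifying_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Return a TLS context that verifies the server certificate and hostname.

    With ca_file=None the default (system) trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    # These are already the defaults; set explicitly to make the intent clear.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def check_server(host: str, port: int = 443, ca_file: Optional[str] = None) -> str:
    """Connect to host:port and report whether the certificate validated."""
    context = build_verifying_context(ca_file)
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "trusted"
    except ssl.SSLCertVerificationError as err:
        # Raised when the chain or hostname cannot be validated.
        return f"certificate error: {err.verify_message}"
    except OSError as err:
        return f"connection failed: {err}"
```

Calling `check_server("vcenter.example.local")` (a placeholder FQDN) before and after installing the Root CA should show the result change from a certificate error to "trusted", assuming the host is reachable.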

HOW TO INSTALL THE VCENTER ROOT CA

To install the vCenter Root CA on your system, perform the following:

1. Navigate to your VMware vCenter “Getting Started” page.
   * This is the IP or FQDN of your vCenter server without the “ui” after the
     address. We only want to access the base domain.
   * Do not click on “Launch vSphere Client”.
2. Right click on “Download trusted root CA certificates”, and click on “Save
   link as”.
3. Save this ZIP file to your computer, and extract the archive.
   * You must extract the ZIP file; do not open it by double-clicking on the
     ZIP file.
4. Open and navigate through the extracted folders (certs/win in my case) and
   locate the certificates.
5. For each file that has the type of “Security Certificate”, right click on it
   and choose “Install Certificate”.
6. Change “Store Location” to “Local Machine”.
   * This makes your system trust the certificate, not just your user profile.
7. Choose “Place all certificates in the following store”, click Browse, and
   select “Trusted Root Certification Authorities”.
8. Complete the wizard. If successful, you’ll see: “The import was
   successful.”
9. Repeat this for each file in that folder with the type of “Security
   Certificate”.

Alternatively, you can use a GPO with Active Directory or other workstation
management techniques to deploy the Root CAs to multiple systems or all the
systems in your domain.
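
If you would rather script the manual steps, the following Python sketch extracts the downloaded ZIP, finds the certificate files under certs/win, and builds one `certutil -addstore` command per file. This is an illustration and not part of the original procedure: the function names are hypothetical, I am assuming the certificates use a `.crt` or `.cer` extension, and the commands are only built here, not executed (they would need an elevated prompt).

```python
import zipfile
from pathlib import Path
from typing import List

# Assumption: certificate files in the bundle end in .crt or .cer.
CERT_SUFFIXES = {".crt", ".cer"}


def extract_certificates(zip_path: Path, dest_dir: Path) -> List[Path]:
    """Extract the ZIP and return the certificate files found under certs/win."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    win_dir = dest_dir / "certs" / "win"
    return sorted(p for p in win_dir.iterdir() if p.suffix.lower() in CERT_SUFFIXES)


def certutil_commands(cert_files: List[Path]) -> List[List[str]]:
    """Build one 'certutil -addstore' command per certificate.

    'Root' is the Trusted Root Certification Authorities store of the
    local machine.
    """
    return [["certutil", "-addstore", "-f", "Root", str(path)] for path in cert_files]
```

Each returned command can be passed to `subprocess.run` on the client, or printed and reviewed first, which I would suggest before adding anything to a trusted root store.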

Posted by Stephen Wagner at 6:08 PM

TEAMS PHONE AND VDI – A MATCH MADE IN THE CLOUD

Microsoft 365, Microsoft Teams, Office 365, VDI
Dec 01, 2023

Every organization is looking for ways to equip their mobile workforce, whether
remote employees, travelling sales staff/representatives, or just providing more
ways employees can work efficiently. Today I want to talk about Microsoft Teams
Phone and VDI – a match made in the Cloud.

I’m one of those people who travel frequently and rely not only on having a
reliable working environment, but also having access to telecommunications.

Running Teams Phone on VDI is a clear win in these regards!

VDI AND VOIP, A COMMON STRUGGLE

As most of you know, VDI and VoIP applications can be a major struggle, with 3rd
party applications not providing audio optimizations for environments that use
VDI. This commonly results in sluggish, jolty, delayed, and/or poor audio
quality, in addition to audio processing in your VDI environment, which uses
resources on your VDI cluster.

For years, the most common applications, including Microsoft Teams, Zoom, and
even Skype for Business, have provided VDI optimizations to allow high quality
(optimized) audio processing, resulting in almost perfect video/audio
telecommunications via VDI sessions, when implemented properly.

Teams Phone running on a VMware Horizon VDI Session

I was tired of using a 3rd party VoIP app, and wanted a more seamless
experience, so I migrated over to Teams Phone for my organization, and I’m using
it on VDI with VMware Horizon.

MICROSOFT TEAMS PHONE

While I’ve heard a lot about Teams Phone, Microsoft’s Phone System, and PSTN
capabilities, I’ve only ever seen it deployed once in a client’s production
environment. A few years back, this put it on my list of curiosities to
investigate in the future.

This past week I decided to migrate over to Microsoft Teams Phone for my
organization’s telephony and PSTN connectivity requirements. Not only did this
eliminate my VoIP app on my desktops and laptops, but it also removed the
requirement for a problematic VoIP client on my smartphone.

TEAMS PHONE BENEFITS

* Single app for team collaboration and VoIP
* Single phone number (eliminates multiple extensions for multiple computers
and devices)
* Microsoft Phone System provides PBX capabilities
* Cloud Based – No on-premises infrastructure required (except device & internet
  for client app)

I regularly use Microsoft Teams on all my desktops, laptops, and VDI sessions,
along with my mobile phone, so having built-in VoIP capabilities in an already
fairly reliable app was a win-win!

I’ll go into further detail on Teams Phone in a future blog post.

TEAMS PHONE ON VDI

Microsoft Teams already has VDI optimizations for video and audio in the
original client and the new client. This provides an amazing high quality
experience for users, while also offloading audio and video processing from your
VDI environment to Microsoft Teams (handled by the endpoints and Microsoft’s
servers).

When implementing Teams Phone on VDI, you take advantage of these capabilities
providing an optimized and enhanced audio session for voice calls to the PSTN
network.

This means you can have Teams running on a number of devices including your
desktop, laptop, smartphone, VDI session, and have a single PSTN phone number
that you can make and receive calls from, seamlessly.

Pretty cool, hey?

THE FINAL RESULT

In my example, the final result will:

* Reduce my corporate telephony costs by 50%
* Eliminate the requirement for an on-prem PBX system
* Remove the need for a 3rd party VoIP app on my workstations and mobile phone
* Provide a higher quality end-user experience
* Utilize existing VDI audio optimizations for a better experience

Posted by Stephen Wagner at 9:38 AM

DEPLOY AND INSTALL THE NEW MICROSOFT TEAMS FOR VDI

Microsoft, Microsoft Teams, VDI, VMware, VMware Horizon View
Oct 07, 2023

In this guide we will deploy and install the new Microsoft Teams for VDI
(Virtual Desktop Infrastructure) client, and enable Microsoft Teams Media
Optimization on VMware Horizon.

This guide replaces and supersedes my old guide “Microsoft (Classic) Teams VDI
Optimization for VMware Horizon” which covered the old Classic Teams client and
VDI optimizations. The new Microsoft Teams app requires the same special
considerations on VDI, and requires special installation instructions to
function on VMware Horizon and other VDI environments.

You can run the old and new Teams applications side by side in your environment
as you transition users.

Switch between New Teams and old Teams on VDI

Let’s cover what the new Microsoft Teams app is about, and how to install it in
your VDI deployment.

Please note: VDI (Virtual Desktop Infrastructure) support for the new Teams
client went G.A. (Generally Available) on December 05, 2023. Additionally,
Classic Teams will go end of support on June 30, 2024.

THE NEW MICROSOFT TEAMS APP

On October 05, 2023, Microsoft announced the availability of the new Microsoft
Teams application for Windows and Mac computers. This application is a complete
rebuild from the old client, and provides numerous enhancements with
performance, resource utilization, and memory management.

New Microsoft Teams app VDI optimized with Toggle for new/old version

Ultimately, it’s way faster, and consumes way less memory. And fortunately for
us, it supports media optimizations for VDI environments.

My close friend and colleague, mobile jon, did a fantastic in-depth Deep Dive
into the New Microsoft Teams and its inner workings that I highly recommend
reading.

Interestingly enough, it uses the same media optimization channels for VDI as
the old client used, so enablement and/or migrating from the old version is very
simple if you’re running VMware Horizon, Citrix, AVD, and/or Windows 365.

INSTALL NEW MICROSOFT TEAMS FOR VDI

While installing the new Teams is fairly simple for non-VDI environments (by
either enabling the new version in the Teams Admin portal, or using your
application manager to deploy the installer), a special method is required to
deploy on your VDI images, whether persistent or non-persistent.

Do not include and bundle the Microsoft Teams install with your Microsoft 365
(Office 365) deployment as these need to be installed separately.

Please Note: If you have deployed non-persistent VDI (Instant Clones), you’ll
want to make sure you disable auto-updates, as these should be performed
manually on the base image. For persistent VDI, you will want auto-updates
enabled. See below for more information on configuring auto-updates.

You will also need to enable Microsoft Teams Media Optimization for the VDI
platform you are using (in my case and example, VMware Horizon).

CONSIDERATIONS FOR NEW TEAMS ON VDI

* Auto-updates can be disabled via a registry key
* New Teams client app uses the same VDI media optimization channels as the old
  Teams client (for VMware Horizon, Citrix, AVD, and W365)
* If you have already enabled Media Optimization for Teams on VDI for the old
version, you can simply install the client using the special bulk installer
for all users as shown below, as the new client uses the existing media
optimizations.
* While it is recommended to uninstall the old client and install the new
client, you can choose to run both versions side by side together, providing
an option to your users as to which version they would like to use.

ENABLE MEDIA OPTIMIZATION FOR MICROSOFT TEAMS ON VDI

If you haven’t previously done so for the old client, you’ll need to enable the
Teams Media Optimizations for VDI for your VDI platform.

For VMware Horizon, we’ll create a GPO and set “Enable VMware HTML5 Features”
and “Enable Media Optimization for Microsoft Teams” to “Enabled”. If you have
done this for the old Teams app, you can skip this.

Please see below for the GPO setting locations:

Computer Configuration -> Policies -> Administrative Templates -> VMware View Agent Configuration -> VMware HTML5 Features -> Enable VMware HTML5 Features

Computer Configuration -> Policies -> Administrative Templates -> VMware View Agent Configuration -> VMware HTML5 Features -> VMware WebRTC Redirection Features -> Enable Media Optimization for Microsoft Teams

When installing the VMware Horizon client on Windows computers, you’ll need to
make sure you check and enable the “Media Optimization for Microsoft Teams”
option on the installer if prompted. Your install may automatically include
Teams Optimization and not prompt.

VMware Horizon Client Install with Media Optimization for Microsoft Teams

If you are using a thin client or zero client, you’ll need to make sure you have
the required firmware version installed, and any applicable vendor plugins
installed and/or configurables enabled.

INSTALL NEW MICROSOFT TEAMS CLIENT ON VDI

We will now install the new Teams app on both non-persistent images and
persistent VDI VM guests. This method performs a live download and provisions
as Administrator. If you run it un-elevated, an elevation prompt will appear:

1. Download the new Microsoft Teams Bootstrapper:
https://go.microsoft.com/fwlink/?linkid=2243204&clcid=0x409
2. On your persistent or non-persistent VM, run the following command as an
administrator: teamsbootstrapper.exe -p
3. Restart the VM (and/or seal your image for deployment)

Install the new Teams for VDI (Virtual Desktop Infrastructure) with
teamsbootstrapper.exe

See below for an example of the deployment:

C:\Users\Administrator.DOMAIN\Downloads>teamsbootstrapper.exe -p
{
"success": true
}

You’ll note that running the command returns success equals true, and Teams is
now installed for all users on this machine.
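
If you are baking this into an image build script, you may want to check the result programmatically instead of reading it off the console. The Python sketch below runs the bootstrapper and parses the JSON shown above. It assumes the JSON is the only thing written to standard output, which matches what I see in my example but is not something I can promise for every version; the function names are my own.

```python
import json
import subprocess


def provisioning_succeeded(output: str) -> bool:
    """Return True if the bootstrapper's JSON output reports success."""
    try:
        result = json.loads(output)
    except json.JSONDecodeError:
        # Anything that is not valid JSON is treated as a failure.
        return False
    return isinstance(result, dict) and result.get("success") is True


def provision_teams(bootstrapper: str = "teamsbootstrapper.exe") -> bool:
    """Run 'teamsbootstrapper.exe -p' and check the result (needs elevation)."""
    completed = subprocess.run(
        [bootstrapper, "-p"], capture_output=True, text=True, check=False
    )
    return provisioning_succeeded(completed.stdout)
```

A build pipeline could call `provision_teams()` and stop before sealing the image if it returns `False`.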

INSTALL NEW MICROSOFT TEAMS CLIENT ON VDI (OFFLINE INSTALLER USING MSIX PACKAGE)

Additionally, you can perform an offline installation by also downloading the
MSIX package and running the following command:

teamsbootstrapper.exe -p -o "C:\LOCATION\MSTeams-x64.msix"

New Teams admin provisioned offline install for VDI

For the offline installation, you’ll need to download the appropriate MSIX file
in addition to the bootstrapper above. See below for download links:

* New Teams MSIX for x86
* New Teams MSIX for x64
* New Teams MSIX for Arm64
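
When the same script has to handle both the online and the offline install, a small helper that builds the argument list keeps the two cases consistent. This is a sketch with a hypothetical function name; it only uses the `-p` and `-o` options shown in this post and does a basic sanity check on the file extension.

```python
from pathlib import Path
from typing import List, Optional


def bootstrapper_args(msix_path: Optional[Path] = None,
                      bootstrapper: str = "teamsbootstrapper.exe") -> List[str]:
    """Build the argument list for an online or offline provisioning run."""
    args = [bootstrapper, "-p"]
    if msix_path is not None:
        # Offline install: point the bootstrapper at a local MSIX package.
        if msix_path.suffix.lower() != ".msix":
            raise ValueError(f"expected an .msix file, got {msix_path.name}")
        args += ["-o", str(msix_path)]
    return args
```

Passing the list to `subprocess.run` avoids quoting problems with paths that contain spaces.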

DISABLE NEW MICROSOFT TEAMS CLIENT AUTO UPDATES

For non-persistent environments, you’ll want to disable the auto update feature
and install updates manually on your base image.

To disable auto-updates for the new Teams client, configure the registry key
below on your base image:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Teams

Create a DWORD value called “disableAutoUpdate”, and set to value of “1”.
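
One way to apply this consistently to a base image is to generate a .reg file and import it during the build. The Python sketch below renders such a file for the key and value named above. The helper name is hypothetical, and the output follows the standard "Windows Registry Editor Version 5.00" text format.

```python
def render_reg_file(key: str, name: str, value: int) -> str:
    """Render a .reg file that sets one DWORD value under the given key."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"{name}"=dword:{value:08x}',  # DWORDs are written as 8 hex digits
        "",
    ]
    return "\r\n".join(lines)


TEAMS_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Teams"
reg_text = render_reg_file(TEAMS_KEY, "disableAutoUpdate", 1)
```

Saving `reg_text` to a file and importing it with `reg import` on the base image (from an elevated prompt) should have the same effect as creating the value by hand.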

CONFIRM NEW MICROSOFT TEAMS VDI OPTIMIZATION IS WORKING

To confirm that VDI Optimization is working on New Teams, open New Teams, click
the “…” in the top right next to your user icon, click “Settings”, then click on
“About Teams” on the far bottom of the Settings menu.

New Teams showing “VMware Media Optimized”

You’ll notice “VMware Media Optimized” which indicates VDI Optimization for
VMware Horizon is functioning. The text will reflect for other platforms as
well.

UNINSTALL NEW MICROSOFT TEAMS ON VDI

The Teams bootstrapper utility can also remove Teams for all users on this
machine by using the “-x” flag. Please see below for all the options for
“teamsbootstrapper.exe”:

C:\Users\Administrator.DOMAIN\Downloads>teamsbootstrapper.exe --help
Provisioning program for Microsoft Teams.

Usage: teamsbootstrapper.exe [OPTIONS]

Options:
  -p, --provision-admin    Provision Teams for all users on this machine.
  -x, --deprovision-admin  Remove Teams for all users on this machine.
  -h, --help               Print help

INSTALL NEW MICROSOFT TEAMS ON VMWARE APP VOLUMES / CITRIX APP LAYERING

It appears that the New Teams bootstrapper doesn’t work with app packaging and
app attach technologies such as VMware App Volumes and Citrix App Layering.

The New Teams bootstrapper downloads and installs an MSIX app package to the
computer running the bootstrapper.

To deploy and install new Teams on VMware App Volumes or Citrix App Layering (or
other app technologies), you’ll most likely need to download and import the MSIX
package into the application manager and deploy using that.

CONCLUSION

It’s great news that we finally have a better performing Microsoft Teams client
that supports VDI optimizations. With new Teams support for VDI reaching GA, and
with the extensive testing I’ve performed in my own environment, I’d highly
recommend switching over at your convenience!

ADDITIONAL LINKS

* Bulk deploy the new Microsoft Teams desktop client – Microsoft Teams |
Microsoft Learn
* New Microsoft Teams for Virtualized Desktop Infrastructure (VDI) – Microsoft
Teams | Microsoft Learn
* Teams for Virtualized Desktop Infrastructure – Microsoft Teams | Microsoft
Learn (Old Teams Client)
* Media Optimization for Microsoft Teams (vmware.com)

Posted by Stephen Wagner at 1:42 PM

MANAGE YOUR NVIDIA VGPU DRIVERS WITH THE NVIDIA GPU MANAGER FOR VMWARE VCENTER

NVIDIA, VDI, vGPU, VMware, vSphere
Jul 28, 2023

In May of 2023, NVIDIA released the NVIDIA GPU Manager for VMware vCenter. This
appliance allows you to manage your NVIDIA vGPU Drivers for your VMware vSphere
environment.

Since the release, I’ve had a chance to deploy it, test it, and use it, and want
to share my findings.

In this post, I’ll cover the following:

1. What is the NVIDIA GPU Manager for VMware vCenter
2. How to deploy and configure the NVIDIA GPU Manager for VMware vCenter
   * Deployment of OVA
   * Configuration of Appliance
3. Using the NVIDIA GPU Manager to manage, update, and deploy vGPU drivers to
   ESXi hosts

Let’s get to it!

WHAT IS THE NVIDIA GPU MANAGER FOR VMWARE VCENTER

The NVIDIA GPU Manager is an (OVA) appliance that you can deploy in your VMware
vSphere infrastructure (using vCenter and ESXi) to act as a driver (and update)
repository for vLCM (vSphere Lifecycle Manager).

In addition to acting as a repo for vLCM, it also installs a plugin on your
vCenter that provides a GUI for browsing, selecting, and downloading NVIDIA vGPU
host drivers to the local repo running on the appliance. These updates can then
be deployed using LCM to your hosts.

In short, this allows you to easily select, download, and deploy specific NVIDIA
vGPU drivers to your ESXi hosts using vLCM baselines or images, simplifying the
entire process.

SUPPORTED VSPHERE VERSIONS

The NVIDIA GPU Manager supports the following vSphere releases (vCenter and
ESXi):

* VMware vSphere 8.0 (and later)
* VMware vSphere 7.0U2 (and later)

The NVIDIA GPU Manager supports vGPU driver releases 15.1 and later, including
the new vGPU 16 release version.
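
If you track host versions in an inventory script, the vSphere requirement above can be expressed as a simple comparison. The sketch below is only an illustration: the function names are hypothetical, and it assumes version strings written like "7.0U2" or "8.0", which may not match how your tooling reports them.

```python
import re
from typing import Tuple


def parse_vsphere_version(version: str) -> Tuple[int, int, int]:
    """Parse strings such as '7.0U2' or '8.0' into (major, minor, update)."""
    match = re.fullmatch(r"(\d+)\.(\d+)(?:\s*U(\d+))?", version.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised vSphere version: {version!r}")
    major, minor, update = match.groups()
    return int(major), int(minor), int(update or 0)


def gpu_manager_supports(version: str) -> bool:
    """True for vSphere 7.0U2 and later, which includes 8.0 and later."""
    return parse_vsphere_version(version) >= (7, 0, 2)
```

Tuple comparison handles the ordering, so "8.0" passes because (8, 0, 0) sorts after (7, 0, 2).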

HOW TO DEPLOY AND CONFIGURE THE NVIDIA GPU MANAGER FOR VMWARE VCENTER

To deploy the NVIDIA GPU Manager Appliance, we have to download an OVA (from
NVIDIA’s website), then deploy and configure it.

See below for the step by step instructions:

DOWNLOAD THE NVIDIA GPU MANAGER

1. Log on to the NVIDIA Application Hub, and navigate to the “NVIDIA Licensing
Portal” (https://nvid.nvidia.com).
2. Navigate to “Software Downloads” and select “Non-Driver Downloads”
3. Change Filter to “VMware vCenter” (there is both VMware vSphere and VMware
vCenter, so be sure to select the correct one).
4. To the right of “NVIDIA GPU Manager Plug-in 1.0.0 for VMware vCenter”, click
“Download” (see below screenshot).

NVIDIA GPU Manager Download Page

After downloading the package and extracting, you should be left with the OVA,
along with Release Notes, and the User Guide. I highly recommend reviewing the
documentation at your leisure.

DEPLOY AND CONFIGURE THE NVIDIA GPU MANAGER

We will now deploy the NVIDIA GPU Manager OVA appliance:

1. Deploy the OVA to either a cluster with DRS, or a specific ESXi host. In
vCenter either right click a cluster or host, and select “Deploy OVF
Template”. Choose the GPU Manager OVA file, and continue with the wizard.
2. Configure Networking for the Appliance
* You’ll need to assign an IP Address, and relevant networking information.
* I always recommend creating DNS (forward and reverse entries) for the IP.
3. Finally, power on Appliance.

We must now create a role and service account that the GPU Manager will use to
connect to the vCenter server.

While the vCenter Administrator account will work, I highly recommend creating a
service account specifically for the GPU Manager that only has the required
permissions that are necessary for it to function.

1. Log on to your vCenter Server
2. Click on the hamburger menu item on the top left, and open
“Administration”.
3. Under “Access Control” select Roles.
4. Select New to create a new role. We can call it “NVIDIA Update Services”.
5. Assign the following permissions:
   * Extension Privileges
     * Register Extension
     * Unregister Extension
     * Update Extension
   * VMware vSphere Lifecycle Manager Configuration Privileges
     * Configure Service
   * VMware vSphere Lifecycle Manager Settings Privileges
     * Read
   * Certificate Management Privileges
     * Create/Delete (Admins priv)
     * Create/Delete (below Admins priv)
   * ***PLEASE NOTE: The above permissions were provided in the documentation
     and did not work for me (resulted in an insufficient privileges error).
     To resolve this, I chose “Select All” for “VMware vSphere Lifecycle
     Manager”, which resolved the issue.***
6. Save the Role
7. On the left hand side, navigate to “Users and Groups” under “Single Sign
On”
8. Change the domain to your local vSphere SSO domain (vsphere.local by
default)
9. Create a new user account for the NVIDIA appliance, as an example you could
use “nvidia-svc”, and choose a secure password.
10. Navigate to “Global Permissions” on the left hand side, and click “Add” to
create a new permission.
11. Set the domain, and choose the new “nvidia-svc” service account we created,
and set the role to “NVIDIA Update Services”, and check “Propagate to
Children”.
12. You have now configured the service account.

Now, we will perform the initial configuration of the appliance. To configure
the appliance, we must do the following:

1. Access the appliance using your browser and the IP you configured above (or
FQDN)
2. Create a new password for the administrative “vcp_admin” account. This
account will be used to manage the appliance.
* A secret key will be generated that will allow the password to be reset,
if required. Save this key somewhere safe.
3. We must now register the appliance (and plugin) with our vCenter Server.
Click on “REGISTER”.
4. Enter the FQDN or IP of your vCenter server, the NVIDIA Service account
(“nvidia-svc” from example), and password.
5. Once the GPU Manager is registered with your vCenter server, the remainder
of the configuration will be completed from the vCenter GUI.
* The registration process will install the GPU Manager Plugin in to VMware
vCenter
* The registration process will also configure a repository in LCM (this
repo is being hosted on the GPU manager appliance).

We must now configure an API key on the NVIDIA Licensing portal, to allow your
GPU Manager to download updates on your behalf.

1. Open your browser and navigate to https://nvid.nvidia.com. Then select
“NVIDIA LICENSING PORTAL”. Login using your credentials.
2. On the left hand side, select “API Keys”.
3. On the upper right hand, select “CREATE API KEY”.
4. Give the key a name, and for access type choose “Software Downloads”. I
would recommend extending the key validation time, or disabling key
expiration.
5. The key should now be created.
6. Click on “view api key”, and record the key. You’ll need to enter this
later into the vCenter GPU Manager plugin.

And now we can finally log on to the vCenter interface, and perform the final
configuration for the appliance.

1. Log on to the vCenter client, click on the hamburger menu, and select
“NVIDIA GPU Manager”.
2. Enter the API key you created above in to the “NVIDIA Licensing Portal API
Key” field, and select “Apply”.
3. The appliance should now be fully configured and activated.
4. Configuration is complete.

We have now fully deployed and completed the base configuration for the NVIDIA
GPU Manager.

USING THE NVIDIA GPU MANAGER TO MANAGE, UPDATE, AND DEPLOY VGPU DRIVERS TO ESXI
HOSTS

In this section, I’ll be providing an overview of how to use the NVIDIA GPU
Manager to manage, update, and deploy vGPU drivers to ESXi hosts. But first,
let’s go over the workflow…

The workflow is a simple one:

1. Using the vCenter client plugin, you choose the drivers you want to deploy.
These get downloaded to the repo on the GPU Manager appliance, and are made
available to Lifecycle Manager.
2. You then use Lifecycle Manager to deploy the vGPU Host Drivers to the
applicable hosts, using baselines or images.

As you can see, there’s not much to it, despite all the configuration we had to
do above. While it is very simple, it simplifies management quite a bit,
especially if you’re using images with Lifecycle Manager.

To choose and download the drivers, load up the plugin, use the filters to
filter the list, and select your driver to download.

NVIDIA GPU Manager downloading vGPU Driver

As you can see in the example, I chose to download the vGPU 15.3 host driver.
Once completed, it’ll be made available in the repo being hosted on the
appliance.

Once LCM has had a chance to sync with the updated repos, the driver is then made
available to be deployed. You can then deploy using baselines or host images.

LCM Image Update with NVIDIA vGPU Driver from NVIDIA GPU Manager

In the example above, I added the vGPU 16 (535.54.06) host driver to my cluster’s
update image, which I will then remediate and deploy to all the hosts in that
cluster. The vGPU driver was made available from the download using GPU Manager.

Posted by Stephen Wagner at 3:35 PM

VMWARE VSPHERE VM PLACEMENT RULES

ESXi, VMware, vSphere
Jul 25, 2023

When it comes to virtualized workloads, one thing I commonly see overlooked in
the design of the solution, is the placement of workloads. In this post, I want
to cover VMware vSphere VM placement rules using the “VM/Host Rules” feature.

This is a feature that I commonly see overlooked and not configured, especially
in smaller single cluster environments, however I’ve also seen this happen in
very large scale environments as well.

Let’s cover the why, what, who, and how…

VM WORKLOADS

While VMware vSphere does have a number of technologies built in for redundancy,
load-balancing, and availability, as part of the larger solution we often find
our workloads, specifically 3rd party platforms, with their own solutions that
accomplish the same thing.

We need to identify which HA (High Availability) or redundancy solution to use,
based on the application, service, and how it works.

For example, using VMware vSphere HA, or High Availability, if vCenter (and/or
vCLS) detects a host goes offline, it can restart the workload on other online
hosts. There is time associated with the detection and boot time, resulting in a
loss of service during this period.

Third party solutions often have their own high availability or redundancy built
in to the solution, such as Microsoft Active Directory. In this case with a
standard configuration, at any time, any domain controller can respond to a
client’s request for resources. If one DC goes offline, other DCs can respond to
the request, resulting in no downtime.

Obviously, in the case of Active Directory Domain Controllers, you’d much prefer
to have multiple DCs in your environment, instead of using one with vSphere HA.

Additionally, if you did have multiple domain controllers, you’d want to make
sure they aren’t all placed on the same ESXi host. This is where we start to
incorporate VM placement in to our solution.

VM PLACEMENT

When it comes to 3rd party solutions like those mentioned above, we need to
identify these workloads and factor them in to the design of the solution we are
either implementing, maintaining, or improving.

EXAMPLE OF VM WORKLOADS USED WITH VM PLACEMENT

A few examples of these workloads with their own load-balancing and availability
technologies:

* Microsoft Windows Active Directory Domain Controllers
* Microsoft Windows Servers running DNS/DHCP Servers
* Virtualized Active/Active or Active/Passive Firewall Appliances
* VMware Horizon UAG (Unified Access Gateway) configured in HA mode
* Other servers/services that have their own availability systems

As you can see, the applications all have their own special solution for
availability, so we must ensure the different “nodes” or “instances” are running
on different ESXi hosts to avoid a host failure bringing down the entire
solution.

Unless otherwise specified by the 3rd party vendor, I would recommend using
VM/Host Rules in combination with vSphere DRS and HA.

CONFIGURING VM PLACEMENT WITH VM/HOST RULES

To configure these rules, follow the instructions below:

1. Log on to your VMware vCenter Server
2. Select a Cluster
3. Click on the “Configure” tab, and then “VM/Host Rules”
* Here you can Add/Edit/Delete VM Host Rules
4. Click on “Add”, and give the rule a new name (Example: Domain Controllers)
5. For “Type”, select “Separate Virtual Machines”
6. Click “Add” and select your Domain Controllers and add them to the rule.

Domain Controller VM Placement VM Host Rule

After you click “OK”, the rule should now be saved, and DRS will make sure these
VMs are now running on separate hosts.

Below you can see another example of a configured system, separating 2
Active/Passive Firewall appliances.

VM/Host Rules for Firewall Appliances

As you can see, VM placement with VM/Host Rules is very easy to configure and
deploy.
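
To show what a “Separate Virtual Machines” rule is actually enforcing, here is a small Python sketch that checks a placement for violations. This is a plain illustration of the logic and not how DRS is implemented; the function name, VM names, and host names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def separation_violations(rule_vms: List[str],
                          placement: Dict[str, str]) -> Dict[str, List[str]]:
    """Return hosts that run more than one VM from the same separation rule.

    placement maps VM name -> ESXi host name.
    """
    by_host: Dict[str, List[str]] = defaultdict(list)
    for vm in rule_vms:
        by_host[placement[vm]].append(vm)
    # A host with two or more members of the rule violates the separation.
    return {host: vms for host, vms in by_host.items() if len(vms) > 1}
```

For a “Domain Controllers” rule containing dc01 and dc02, an empty result means the two DCs sit on different hosts, while a result such as `{"esxi01": ["dc01", "dc02"]}` is the situation the rule exists to prevent.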

ADDITIONAL CONSIDERATIONS

Note, if you implement these rules and do not have enough hosts to fulfill the
requirements, the hosts may fail to be evacuated by DRS when being placed in
maintenance mode, or when remediating with vLCM (Lifecycle Manager).

In this case, you’ll need to manually vMotion the VMs to other hosts (to
violate the rule) or shut some down.
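
The arithmetic behind this is simple enough to sketch. The Python function below (a hypothetical helper, and a simplification that assumes every member VM is powered on and any remaining host could run it) checks whether a separation rule can still be satisfied while some hosts are in maintenance mode.

```python
def rule_satisfiable(vm_count: int, total_hosts: int,
                     hosts_in_maintenance: int = 0) -> bool:
    """A separation rule needs at least one available host per member VM."""
    available = total_hosts - hosts_in_maintenance
    return vm_count <= available
```

With two domain controllers on a two-host cluster, `rule_satisfiable(2, 2)` is true, but `rule_satisfiable(2, 2, hosts_in_maintenance=1)` is false, which is exactly the case where evacuation stalls.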

Posted by Stephen Wagner at 3:54 AM

ESXI 8.0 ON HPE PROLIANT DL360P GEN8

ESXi, HPE, VMware, vSphere
Jul 24, 2023

A few months ago, you may have seen my post detailing my experience with ESXi
7.0 on HP Proliant DL360p Gen8 servers. I now have an update as I have
successfully loaded ESXi 8.0 on HPE Proliant DL360p Gen8 servers, and want to
share my experience.

It wasn’t as eventful as one would have expected, but I wanted to share what’s
required, what works, and stability observations.

Please note, this is NOT supported and NOT recommended for production
environments. Use the information at your own risk.

A special thank you goes out to William Lam and his post on Homelab
considerations for vSphere 8, which provided me with the boot parameter required
to allow legacy CPUs.

ESXI ON THE DL360P GEN8

With the release of vSphere 8.0 Update 1, and all the new features and
functionality that come with the vSphere 8 release as a whole, I decided it was
time to attempt to update my homelab.

In my setup, I have the following:

* 2 x HPE Proliant DL360p Gen8 Servers
* Dual Intel Xeon E5-2660v2 Processors in each server
* USB and/or SD for booting ESXi
* No other internal storage
* NVIDIA A2 vGPU (for use with VMware Horizon)
* External SAN iSCSI Storage

Since I have 2 servers, I decided to do a fresh install using the generic
installer, and then use the HPE addon to install all the HPE addons, drivers,
and software. I would perform these steps on one server at a time, continuing to
the next if all went well.

I went ahead and documented the configuration of my servers beforehand, and had
already upgraded my VMware vCenter vCSA appliance from 7U3 to 8U1. Note
that you should always upgrade your vCenter Server first, and then your ESXi
hosts.

To my surprise, the install went very smoothly (after enabling legacy CPUs in the
installer) on one of the hosts, and after a few days with no stability issues, I
proceeded to upgrade the 2nd host.

I’ve been running with 100% stability for 25+ days without any issues.

THE PROCESS – INSTALLING ESXI 8.0

I used the following steps to install VMware vSphere ESXi 8 on my HP Proliant
Gen8 Server:

1. Download the Generic ESXi installer from VMware directly.
   1. Link: Download VMware vSphere
2. Download the “HPE Custom Addon for ESXi 8.0 U1”.
   1. Link: HPE Custom Addon for ESXi 8.0 U1 June 2023
   2. Other versions of the Addon are here: HPE Customized ESXi Image.
3. Boot the server with the Generic ESXi installer media (CD or ISO).
   * IMPORTANT: Press “Shift + o” (Shift key, and letter “o”) to interrupt the
     ESXi boot loader, and add “AllowLegacyCPU=true” to the kernel boot
     parameters.
4. Continue to install ESXi as normal.
   * You may see warnings about using a legacy CPU; you can ignore these.
5. Complete the initial configuration of the ESXi host.
6. Mount an NFS or iSCSI datastore.
7. Copy the HPE Custom Addon for ESXi zip file to the datastore.
8. Enable SSH on the host (or use the console).
9. Place the host into maintenance mode.
10. Run the following from the command line:
    esxcli software vib install -d /vmfs/volumes/datastore-name/folder-name/HPE-801.0.0.11.3.1.1-Jun2023-Addon-depot.zip
11. The install will run and complete successfully.
12. Restart your server as needed. You’ll now notice that not only were HPE
    drivers installed, but also agents like the Agentless management agent, and
    iLO integrations.
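
If you are repeating this on several hosts, it can help to generate the install command instead of retyping the long depot path. The following is a small Python sketch that only assembles the command string from the step above. The helper name is hypothetical, and the datastore and folder values are the same placeholders used in the post, so substitute your own.

```python
import shlex
from pathlib import PurePosixPath


def build_addon_install_command(datastore: str, folder: str, depot_zip: str) -> str:
    """Assemble the esxcli command from the post for a depot zip on a datastore."""
    depot_path = PurePosixPath("/vmfs/volumes") / datastore / folder / depot_zip
    # shlex.quote protects datastore or folder names that contain spaces.
    return "esxcli software vib install -d " + shlex.quote(str(depot_path))


command = build_addon_install_command(
    "datastore-name",  # placeholder, as in the post
    "folder-name",     # placeholder, as in the post
    "HPE-801.0.0.11.3.1.1-Jun2023-Addon-depot.zip",
)
print(command)
```

The printed string can then be pasted into the SSH session on the host while it is in maintenance mode.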

After that, everything was good to go… Here you can see version information from
one of the ESXi hosts:

VMware ESXi version 8.0.1 Build 21813344 on HPE Proliant DL360p Gen8 Server

WHAT WORKS, AND WHAT DOESN’T

I was surprised to see that everything works, including the P420i embedded RAID
controller. Please note that I am not using the RAID controller, so I have not
performed extensive testing on it.

HPE P420i RAID Controller with VMware vSphere ESXi 8

All Hardware health information is present, and ESXi is functioning as one would
expect if running a supported version on the platform.

ADDITIONAL INFORMATION

Note that with vSphere 8, VMware is deprecating vLCM baselines. Your focus
should be on updating your ESXi hosts using cluster image-based updates.
You can incorporate vendor add-ons and components to create a customized image
for deployment.

Posted by Stephen Wagner at 1:24 PM

HYBRID AZURE AD JOIN WITH AZURE AD CONNECT FOR NON-PERSISTENT VDI WITH VMWARE
HORIZON

Azure AD, VDI, VMware, VMware Horizon View 6 Responses »
Jul 23 2023

With the release of VMware Horizon 2303, VMware Horizon now supports Hybrid
Azure AD Join with Azure AD Connect using Instant Clones and non-persistent VDI.

So what exactly does this mean? It means you can now use Azure SSO using PRT
(Primary Refresh Token) to authenticate and access on-premise and cloud based
applications and resources.

What else? It allows you to use conditional access!

WHAT IS HYBRID AZURE AD JOIN, AND WHY WOULD WE WANT TO DO IT WITH AZURE AD
CONNECT?

Historically, it was a bit challenging when it came to Understanding Microsoft
Azure AD SSO with VDI (click to read the post and/or see the video), and special
considerations had to be made when an organization wished to implement SSO
between their on-prem non-persistent VDI deployment and Azure AD.

Hybrid Azure AD Joined Login

AZURE AD SSO, THE OLD WAY

The old way to accomplish this was to either implement Azure AD with ADFS, or
use Seamless SSO. ADFS was bulky and annoying to manage, and Seamless SSO was
actually intended to enable SSO on “downlevel devices” (older operating systems
before Windows 10).

For customers without ADFS, I would always recommend using Seamless SSO to
enable SSO on non-persistent VDI Instant Clones, until now!

AZURE AD SSO, THE NEW WAY WITH AZURE AD CONNECT AND AZURE SSO PRTS

According to the release notes for VMware Horizon 2303:

> Hybrid Azure Active Directory for SSO is now supported on instant clone
> desktop pools. See KB 89127 for details.

This means we can now enable and use Azure SSO with PRTs (Primary Refresh
Tokens) using Azure AD Connect and non-persistent VDI Instant Clones.

AZURE SSO WITH PRT AND NON-PERSISTENT VDI

This is actually a huge deal because not only does it allow us to use the
preferred method for performing SSO with Azure, but it also allows us to start
using fancy Azure features like conditional access!

REQUIREMENTS FOR HYBRID AZURE AD JOIN WITH NON-PERSISTENT VDI AND AZURE AD
CONNECT

In order to utilize Hybrid Join and PRTs with non-persistent VDI on Horizon,
you’ll need the following:

* VMware Horizon 2303 (or later)
* Active Directory
* Azure AD Connect (Implemented, Configured, and Functioning)
  * Azure AD Hybrid Domain Join must be enabled
  * OU and Object filtering must include the non-persistent computer objects
    and computer accounts
* Create a VMware Horizon Non-Persistent Desktop Pool for Instant Clones
  * “Allow Reuse of Existing Computer Accounts” must be checked

When you configure this, you’ll notice that after provisioning a desktop pool
(or pushing a new snapshot), there may be a delay before PRTs are issued.
This is expected; the PRT will be issued eventually, and subsequent
desktops shouldn’t experience issues unless you have a limited number available.

*Please note: VMware still notes that ADFS is the preferred way for fast
issuance of the PRT.

While VMware does recommend ADFS for performance when issuing PRTs, in my own
testing I had no problems or complaints. However, because of the PRT delay after
deploying a pool or a new snapshot, when deploying this in production I’d
recommend doing so after hours; otherwise SSO will not function for some
users who immediately get a new desktop.

ADDITIONAL CONSIDERATIONS

Please note the following:

* When switching from ADFS to Azure AD Connect, the sign-in process may change
  for users.
  * You must prepare the users for the change.
* When using locally stored identities and/or cached credentials, enabling
  Azure SSO may change the login process, or cause issues for users signing in.
  * You may have to delete saved credentials in the user’s persistent profile
  * You may have to adjust GPOs to account for Azure SSO
  * You may have to modify settings in your profile persistence solution
    * Example: “RoamIdentity” on FSLogix
* I recommend testing before implementing
  * Test Environment
  * Test with new/blank user profiles
  * Test with existing users

If you’re coming from an environment that was previously using Seamless SSO for
non-persistent VDI, you can create new test desktop pools that use newly created
Active Directory OU containers and adjust the OU filtering appropriately to
include the test OUs for synchronization to Azure AD with Azure AD Connect. This
way you’re only syncing the test desktop pool, while allowing Seamless SSO to
continue to function for existing desktop pools.

HOW TO TEST AZURE AD HYBRID JOIN, SSO, AND PRT

To test the current status of Azure AD Hybrid Join, SSO, and PRT, you can use
the following command:

dsregcmd /status

To check if the OS is Hybrid Domain joined, you’ll see the following:

+----------------------------------------------------------------------+
| Device State |
+----------------------------------------------------------------------+

AzureAdJoined : YES
EnterpriseJoined : NO
DomainJoined : YES
DomainName : DOMAIN

As you can see above, both “AzureAdJoined” and “DomainJoined” are “YES”, which
indicates the device is Hybrid Azure AD joined.

Further down the output, you’ll find information related to SSO and PRT Status:

+----------------------------------------------------------------------+
| SSO State |
+----------------------------------------------------------------------+

AzureAdPrt : YES
AzureAdPrtUpdateTime : 2023-07-23 19:46:19.000 UTC
AzureAdPrtExpiryTime : 2023-08-06 19:46:18.000 UTC
AzureAdPrtAuthority : https://login.microsoftonline.com/XXXXXXXX-XXXX-XXXXXXX
EnterprisePrt : NO
EnterprisePrtAuthority :
OnPremTgt : NO
CloudTgt : YES
KerbTopLevelNames : XXXXXXXXXXXXX

Here we can see that “AzureAdPrt” is YES which means we have a valid Primary
Refresh Token issued by Azure AD SSO because of the Hybrid Join.
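
If you need to check many desktops, the same fields can be read programmatically. Here is a minimal Python sketch that parses "Key : Value" lines from saved `dsregcmd /status` output and checks the three fields discussed above. The function names are hypothetical, and the sample text is copied from the output shown in this post.

```python
def parse_dsregcmd_status(output: str) -> dict[str, str]:
    """Collect 'Key : Value' pairs from dsregcmd /status output."""
    fields: dict[str, str] = {}
    for line in output.splitlines():
        # Split on the first colon only, so timestamps and URLs stay intact.
        key, separator, value = line.partition(":")
        if separator and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def is_hybrid_joined(fields: dict[str, str]) -> bool:
    """Hybrid join shows up as both AzureAdJoined and DomainJoined set to YES."""
    return fields.get("AzureAdJoined") == "YES" and fields.get("DomainJoined") == "YES"


def has_prt(fields: dict[str, str]) -> bool:
    """A Primary Refresh Token has been issued when AzureAdPrt is YES."""
    return fields.get("AzureAdPrt") == "YES"


SAMPLE = """\
AzureAdJoined : YES
EnterpriseJoined : NO
DomainJoined : YES
DomainName : DOMAIN
AzureAdPrt : YES
AzureAdPrtUpdateTime : 2023-07-23 19:46:19.000 UTC
"""

fields = parse_dsregcmd_status(SAMPLE)
print("Hybrid joined:", is_hybrid_joined(fields))
print("PRT issued:", has_prt(fields))
```

On a Windows desktop you could feed it live output captured with `subprocess.run(["dsregcmd", "/status"], capture_output=True, text=True).stdout` instead of the sample text.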

Posted by Stephen Wagner at 1:31 PM

HPE NIMBLE AND HPE ALLETRA 6000 SAN IP ZONING

HPE, HPE Alletra, HPE Nimble No Responses »
Mar 12 2023

Are you running an HPE Nimble or HPE Alletra 6000 SAN in your VMware environment
with iSCSI? A commonly overlooked component of the solution architecture and
configuration when using these SANs is HPE Nimble and HPE Alletra 6000 SAN IP
Zoning with an ISL (Inter-Switch Link).

When it comes to implementing these SANs, it’s all about data availability,
performance, optimizations, and making sure it’s done properly.

I want to share with you some information, as I feel this important and required
configuration consideration is often ignored, with many IT professionals not
being aware it exists.

HPE Alletra 6000 SAN

I recently had a customer that purchased and deployed two HPE Alletra 6010 SANs
for their VMware environment, where I was contracted to implement these SANs.
Even though the customer purchased HPE Technical Installation and Startup
Services, the HPE installer was not aware of IP Address Zoning and its purpose,
advising us to disable it.

I actually had to advise the technician that numerous HPE technical documents
recommend enabling and configuring it when you have an ISL. He then researched
it, and confirmed we should have it enabled and configured.

IP ADDRESS ZONING

When you have SAN switches that include an ISL (inter-switch link) that are
connected to an HPE Nimble or HPE Alletra SAN, it’s preferred not to have
traffic go across that interlink, as it creates additional hops for packets, as
well as increases latency.

However, in the event of a switch, NIC, and/or path failure, we do want to have
the interlink available to facilitate data access and be available when
required.

Using NCM (Nimble Connection Manager) and SCM (Storage Connection Manager) on
your VMware ESXi hosts, the HPE Nimble and HPE Alletra storage solution can
intelligently choose when to use the interlink depending on paths available, and
the current health of SAN connectivity. It does this through IP Address Zones.

You must have the NCM or SCM plugin installed on your ESXi hosts to be able to
use IP Address Zones, and use the HPE Nimble Storage path selection policy
(NIMBLE_PSP_DIRECTED).

IMPLEMENTING IP ADDRESS ZONES

To implement this, you’ll need to assign an IP Zone to each of your switches.
Please see below for a table from HPE Alletra documentation:

HPE IP Address Zone Types for ISL Configuration

You can choose to either bisect the subnet, or use a method of dedicating even
numbered IPs to one switch/zone, and dedicating odd numbered IPs to the other
switch/zone.

This allows you to zone each switch and keep traffic within the zone, avoiding
use of the interlink, which would cause additional hops and latency. You’ll need
to configure the Zone Type you selected on the storage array.

In the event of a failure, the interlink will be available for non-optimized
path access to ensure continued data access.
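
To make the two approaches concrete, here is a minimal Python sketch that sorts addresses into two zones using either method. It is only an illustration of the idea: the zone labels, the `zone_type` strings, and the example subnet are my own placeholders, and the HPE documentation remains the authority on how the array itself defines each Zone Type.

```python
import ipaddress


def zone_for_ip(ip: str, subnet: str, zone_type: str) -> str:
    """Assign an address to one of two zones by bisecting the subnet
    or by splitting on even/odd addresses."""
    address = ipaddress.ip_address(ip)
    network = ipaddress.ip_network(subnet)
    if address not in network:
        raise ValueError(f"{ip} is not part of {subnet}")

    if zone_type == "bisect":
        # Lower half of the subnet goes to one zone, upper half to the other.
        midpoint = int(network.network_address) + network.num_addresses // 2
        return "zone-1" if int(address) < midpoint else "zone-2"
    if zone_type == "even_odd":
        # Even-numbered addresses go to one zone, odd-numbered to the other.
        return "zone-1" if int(address) % 2 == 0 else "zone-2"
    raise ValueError(f"Unknown zone type: {zone_type}")


for ip in ("10.0.0.10", "10.0.0.11", "10.0.0.200"):
    print(ip,
          "bisect:", zone_for_ip(ip, "10.0.0.0/24", "bisect"),
          "even/odd:", zone_for_ip(ip, "10.0.0.0/24", "even_odd"))
```

Running a planned address list through something like this before cabling makes it easier to confirm that every NIC and array port connected to a given switch lands in the same zone.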

ADDITIONAL (EXTERNAL) LINKS

* HPE NimbleOS – IP Address Zones in Subnets
* HPE Nimble Storage Deployment Considerations for Networking (IP Zoning on
Page 17 & 18)
* HPE Installation Guide for HPE Alletra 6000 (IP Address Zones in Subnets on
Page 19)

Posted by Stephen Wagner at 5:43 AM

PROCEDURE FOR UPDATING ENHANCED LINKED MODE VCENTER SERVER INSTANCES

ESXi, vCenter vCSA, VMware, vSphere 4 Responses »
Mar 06 2023

You might ask what the procedure is for updating Enhanced Linked Mode vCenter
Server Instances, or whether any special considerations even apply.

vCenter Enhanced Linked Mode is a feature that allows you to link up to 15
vCenter Instances into a single Single Sign-On (SSO) vSphere domain. This
allows you to have a single set of credentials to manage all linked instances, as
well as the ability to manage all of them from a single pane of glass.

When it comes to environments with multiple vCenter instances and/or vCSA
appliances, this really helps manageability and visibility.

ENHANCED LINKED MODE UPGRADE CONSIDERATIONS

To answer the question above: Yes, when you’re running Enhanced Linked Mode
(ELM) to link multiple vCenter Servers, special considerations and requirements
exist when it comes to updating or upgrading your vCenter Server instances and
vCSA appliances.

Multiple VMware vCenter Server Instances (vCSA) Running in Enhanced Link Mode
(ELM)

Not only have these procedures been documented in older VMware documentation,
but I recently reviewed and confirmed the best practices with VMware GSS while
on a support case.

PROCEDURE FOR UPDATING VCENTER WITH ELM

1. Configure/Confirm that the vCenter File-Based Backup in VAMI is configured,
functioning, and that you are creating valid file based backups.
2. Create a manual file-based backup with VAMI
3. Power down all vCenter Instances and vCSA Appliances in your environment
4. Perform a cold snapshot of all vCenter Instances and vCSA appliances
* *This is critical* – You need a valid offline snapshot taken of all
appliances powered off at the same point in time
5. Power on the vCenter/vCSA Virtual Machines (VMs)
6. Perform the update or upgrade

RECOVERING FROM A FAILED UPDATE

IMPORTANT: In the event that an update or upgrade fails, you must revert all
vCenter Instances and/or vCSA appliances back to the previous snapshot!

You cannot selectively choose single or individual instances, as this may cause
mismatches in data and configuration between the instances as they have
databases that are not in sync, and are from different points in time.

Additionally, if you are in a situation where you’re considering or planning to
restore previous snapshots to recover from a failed update, you should do so
sooner rather than later. As time progresses, service accounts and identifiers update
in the VMware vSphere infrastructure. Delaying the restore too long could cause
this information to get out of sync with the ESXi hosts after performing a
snapshot restore/revert.

ADDITIONAL (EXTERNAL) LINKS AND RESOURCES

* File-Based Backup and Restore of vCenter Server
* vCenter Enhanced Linked Mode

Posted by Stephen Wagner at 3:19 AM

NVIDIA VGPU TROUBLESHOOTING GUIDE – HOW TO TROUBLESHOOT VGPU ON VMWARE

End User Computing, NVIDIA, VDI, vGPU, VMware, VMware Horizon View, vSphere 1
Response »
Mar 05 2023

In this NVIDIA vGPU Troubleshooting Guide, I’ll help show you how to
troubleshoot vGPU issues on VMware platforms, including VMware Horizon and
VMware Tanzu. This guide applies to the full vGPU platform, so it’s relevant for
VDI, AI, ML, and Kubernetes workloads.

This guide will provide common troubleshooting methods, along with common issues
and problems associated with NVIDIA vGPU as well as their fixes.

Please note, there are numerous additional methods available to
troubleshoot your NVIDIA vGPU deployment, including 3rd party tools. This is a
general document provided as a means to get started learning how to troubleshoot
vGPU.

NVIDIA VGPU

NVIDIA vGPU is a technology platform that includes a product line of GPUs that
provide virtualized GPUs (vGPU) for Virtualization environments. Using a vGPU,
you can essentially “slice” up a physical GPU and distribute Virtual GPUs to a
number of Virtual Machines and/or Kubernetes containers.

NVIDIA vGPU Installed in VMware ESXi Host

These virtual machines and containers can then use these vGPU’s to provide
accelerated workloads including VDI (Virtual Desktop Infrastructure), AI
(Artificial Intelligence), and ML (Machine Learning).

While the solution works beautifully, when deployed incorrectly or if the
solution isn’t maintained, issues can occur requiring troubleshooting and
remediation.

At the end of this blog post, you’ll find some additional (external) links and
resources, which will assist further in troubleshooting.

TROUBLESHOOTING INDEX

Below, you’ll find a list of my most commonly used troubleshooting methods.

Please click on an item below which will take you directly to the section in
this post.

* Using “nvidia-smi”
* Running “nvidia-smi” on the ESXi host
* Running “nvidia-smi” on the VM guest
* Virtual Machine log files
* ESXi Host log files
* Using “dxdiag” in the guest VM
* Using the “VMware Horizon Performance Monitor”
* NVIDIA vGPU Known Issues

COMMON PROBLEMS INDEX

Below is a list of problems and issues I commonly see customers experience or
struggle with in their vGPU enabled VMware environments.

Please click on an item below which will take you directly to the section in
this post.

* XID Error Codes
* vGPU Licensing
* vGPU Type (vGPU Profile) mismatch
* vGPU or Passthrough with 16GB+ of Video RAM Memory
* vGPU VM Freezes during VMware vMotion
* “ERR!” State
* vGPU Driver Mismatch
* Upgrading NVIDIA vGPU
* VMware Horizon Black Screen
* VM High CPU RDY (CPU Ready)
* vGPU Profiles Missing from VMware Horizon

TROUBLESHOOTING

USING “NVIDIA-SMI”

The NVIDIA vGPU driver comes with a utility called the “NVIDIA System Management
Interface”. This CLI program allows you to monitor, manage, and query your
NVIDIA vGPU (including non-vGPU GPUs).

NVIDIA vGPU “nvidia-smi” command

Simply running the command with no switches or flags allows you to query and
pull basic information on your vGPU, or multiple vGPUs.

For a list of available switches, you can run: “nvidia-smi -h”.

RUNNING “NVIDIA-SMI” ON THE ESXI HOST

To use “nvidia-smi” on your VMware ESXi host, you’ll need to SSH in and/or
enable console access.

When you launch “nvidia-smi” on the ESXi host, you’ll see information on the
physical GPU, as well as the VM instances that are consuming a virtual GPU
(vGPU). The output also provides information like fan speeds, temperatures,
power usage and GPU utilization.

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2108966    C+G   VM-WS02                          3904MiB |
|    0   N/A  N/A   2108989    C+G   VM-WS01                          3904MiB |
+-----------------------------------------------------------------------------+

This will aid with troubleshooting potential issues specific to the host or the
VM. The following pieces of information are helpful:

* Driver Version
* GPU Fan and Temperature Information
* Power Usage
* GPU Utilization (GPU-Util)
* ECC Information and Error Count
* Virtual Machines (VMs) assigned a vGPU
* Process Type (C+G means Compute and Graphics)
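
If you save the “nvidia-smi” output to a file, the process table can also be summarized with a few lines of Python. This is a minimal sketch that assumes the row layout shown in the table above (GPU, GI, CI, PID, Type, Process name, GPU Memory). The function name is hypothetical, and the sample rows are copied from this post.

```python
import re

# One row of the "Processes" table: | GPU GI CI PID Type Name <n>MiB |
PROCESS_ROW = re.compile(
    r"^\|\s*(?P<gpu>\d+)\s+(?P<gi>\S+)\s+(?P<ci>\S+)\s+(?P<pid>\d+)"
    r"\s+(?P<type>\S+)\s+(?P<name>.+?)\s+(?P<memory_mib>\d+)MiB\s*\|$"
)


def list_gpu_processes(table_text: str) -> list[dict]:
    """Return one dict per process row; header and border lines are skipped."""
    processes = []
    for line in table_text.splitlines():
        match = PROCESS_ROW.match(line.strip())
        if match:
            row = match.groupdict()
            row["memory_mib"] = int(row["memory_mib"])
            processes.append(row)
    return processes


SAMPLE = """\
| GPU GI CI PID Type Process name GPU Memory |
| 0 N/A N/A 2108966 C+G VM-WS02 3904MiB |
| 0 N/A N/A 2108989 C+G VM-WS01 3904MiB |
"""

processes = list_gpu_processes(SAMPLE)
for process in processes:
    print(process["name"], process["type"], process["memory_mib"], "MiB")
print("Total MiB in use:", sum(p["memory_mib"] for p in processes))
```

This gives a quick per-host view of which VMs are holding a vGPU and how much framebuffer they consume in total.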

Additionally, instead of running once, you can issue “nvidia-smi -l x” replacing
“x” with the number of seconds you’d like it to auto-loop and refresh.

Example:

nvidia-smi -l 3

The above would refresh and loop “nvidia-smi” every 3 seconds.

For vGPU specific information from the ESXi host, you can run:

nvidia-smi vgpu

This command shows information on the vGPU instances currently provisioned.

There are also a number of switches you can throw at this to get even more
information on vGPU including scheduling, vGPU types, accounting, and more. Run
the following command to view the switches:

nvidia-smi vgpu -h

Another common switch I use on the ESXi host with vGPU for troubleshooting is:
“nvidia-smi -q”, which provides lots of information on the physical GPU in the
host:

[root@ESXi-HOST:~] nvidia-smi -q

==============NVSMI LOG==============

Timestamp : Sat Mar 4 21:26:18 2023
Driver Version : 525.85.07
CUDA Version : Not Found
vGPU Driver Capability
    Heterogenous Multi-vGPU : Supported

Attached GPUs : 1
GPU 00000000:04:00.0
    Product Name : NVIDIA A2
    Product Brand : NVIDIA
    Product Architecture : Ampere
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Enabled
    vGPU Device Capability
        Fractional Multi-vGPU : Not Supported
        Heterogeneous Time-Slice Profiles : Supported
        Heterogeneous Time-Slice Sizes : Not Supported
    MIG Mode
        Current : N/A
        Pending : N/A
    Accounting Mode : Enabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : XXXN0TY0SERIALZXXX
    GPU UUID : GPU-de23234-3450-6456-e12d-bfekgje82743a
    Minor Number : 0
    VBIOS Version : 94.07.5B.00.92
    MultiGPU Board : No
    Board ID : 0x400
    Board Part Number : XXX-XXXXX-XXXX-XXX
    GPU Part Number : XXXX-XXX-XX
    Module ID : 1
    Inforom Version
        Image Version : G179.0220.00.01
        OEM Object : 2.0
        ECC Object : 6.16
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GSP Firmware Version : N/A
    GPU Virtualization Mode
        Virtualization Mode : Host VGPU
        Host VGPU Mode : SR-IOV
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x04
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x25B610DE
        Bus Id : 00000000:04:00.0
        Sub System Id : 0x157E10DE
        GPU Link Info
            PCIe Generation
                Max : 3
                Current : 1
                Device Current : 1
                Device Max : 4
                Host Max : N/A
            Link Width
                Max : 16x
                Current : 8x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 0
        Replay Number Rollovers : 0
        Tx Throughput : 0 KB/s
        Rx Throughput : 0 KB/s
        Atomic Caps Inbound : N/A
        Atomic Caps Outbound : N/A
    Fan Speed : 0 %
    Performance State : P8
    Clocks Throttle Reasons
        Idle : Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
        HW Thermal Slowdown : Not Active
        HW Power Brake Slowdown : Not Active
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    FB Memory Usage
        Total : 16380 MiB
        Reserved : 264 MiB
        Used : 7808 MiB
        Free : 8306 MiB
    BAR1 Memory Usage
        Total : 16384 MiB
        Used : 1 MiB
        Free : 16383 MiB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 0 %
        Encoder : 0 %
        Decoder : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    Ecc Mode
        Current : Disabled
        Pending : Disabled
    ECC Errors
        Volatile
            SRAM Correctable : N/A
            SRAM Uncorrectable : N/A
            DRAM Correctable : N/A
            DRAM Uncorrectable : N/A
        Aggregate
            SRAM Correctable : N/A
            SRAM Uncorrectable : N/A
            DRAM Correctable : N/A
            DRAM Uncorrectable : N/A
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Remapped Rows
        Correctable Error : 0
        Uncorrectable Error : 0
        Pending : No
        Remapping Failure Occurred : No
        Bank Remap Availability Histogram
            Max : 64 bank(s)
            High : 0 bank(s)
            Partial : 0 bank(s)
            Low : 0 bank(s)
            None : 0 bank(s)
    Temperature
        GPU Current Temp : 37 C
        GPU T.Limit Temp : N/A
        GPU Shutdown Temp : 96 C
        GPU Slowdown Temp : 93 C
        GPU Max Operating Temp : 86 C
        GPU Target Temperature : N/A
        Memory Current Temp : N/A
        Memory Max Operating Temp : N/A
    Power Readings
        Power Management : Supported
        Power Draw : 8.82 W
        Power Limit : 60.00 W
        Default Power Limit : 60.00 W
        Enforced Power Limit : 60.00 W
        Min Power Limit : 35.00 W
        Max Power Limit : 60.00 W
    Clocks
        Graphics : 210 MHz
        SM : 210 MHz
        Memory : 405 MHz
        Video : 795 MHz
    Applications Clocks
        Graphics : 1770 MHz
        Memory : 6251 MHz
    Default Applications Clocks
        Graphics : 1770 MHz
        Memory : 6251 MHz
    Deferred Clocks
        Memory : N/A
    Max Clocks
        Graphics : 1770 MHz
        SM : 1770 MHz
        Memory : 6251 MHz
        Video : 1650 MHz
    Max Customer Boost Clocks
        Graphics : 1770 MHz
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Voltage
        Graphics : 650.000 mV
    Fabric
        State : N/A
        Status : N/A
    Processes
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 2108966
        Type : C+G
        Name : VM-WS02
        Used GPU Memory : 3904 MiB
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 2108989
        Type : C+G
        Name : VM-WS01
        Used GPU Memory : 3904 MiB

As you can see, you can pull quite a bit of information in detail from the vGPU,
as well as the VM processes.
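
Since the “nvidia-smi -q” output is long, it can be handy to pull out a single section. Here is a minimal Python sketch that returns the key/value lines directly under a named section header such as “FB Memory Usage”. The function name is hypothetical. One caveat is that it relies on indentation to know where a section ends, so if the text has lost its indentation it simply reads until the next header line.

```python
import re

KEY_VALUE = re.compile(r"^(?P<key>.+?)\s+:\s*(?P<value>.*)$")


def section_values(output: str, header: str) -> dict[str, str]:
    """Return the 'Key : Value' lines that directly follow a section header."""
    values: dict[str, str] = {}
    in_section = False
    child_indent = None
    for raw in output.splitlines():
        stripped = raw.strip()
        if not in_section:
            in_section = stripped == header
            continue
        indent = len(raw) - len(raw.lstrip())
        if child_indent is None:
            child_indent = indent  # indentation of the first child line
        match = KEY_VALUE.match(stripped)
        # Stop at the next header, a blank line, or a change in indentation.
        if not match or indent != child_indent:
            break
        values[match["key"]] = match["value"]
    return values


SAMPLE = """\
    FB Memory Usage
        Total : 16380 MiB
        Reserved : 264 MiB
        Used : 7808 MiB
        Free : 8306 MiB
    BAR1 Memory Usage
        Total : 16384 MiB
"""

framebuffer = section_values(SAMPLE, "FB Memory Usage")
used_mib = int(framebuffer["Used"].split()[0])
total_mib = int(framebuffer["Total"].split()[0])
print(f"Framebuffer in use: {used_mib} of {total_mib} MiB")
```

The same helper works for other sections of the output, for example “Temperature” or “Power Readings”.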

RUNNING “NVIDIA-SMI” ON THE VM GUEST

You can also run “nvidia-smi” inside of the guest VM, which will provide you
information on the vGPU instance that is being provided to that specific VM,
along with information on the guest VM’s processes that are utilizing the GPU.

“nvidia-smi” Running on Guest VM

This is helpful for providing information on the guest VM’s usage of the vGPU
instance, as well as processes that require GPU usage.

VIRTUAL MACHINE LOG FILES

Each Virtual Machine has a “vmware.log” file inside of the VM’s folder on the
datastore.

To identify logging events pertaining to NVIDIA vGPU, you can search for the
“vmiop” string inside of the vmware.log file.

Example:

cat /vmfs/volumes/DATASTORE/VirtualMachineName/vmware.log | grep -i vmiop

The above will read out any lines inside of the log that have the “vmiop” string
inside of them. The “-i” flag instructs grep to ignore case sensitivity.

These logs provide initialization information, licensing information, as well as
XID error codes and faults.

ESXI HOST LOG FILES

Additionally, since the ESXi host is running the vGPU Host Driver (vGPU
Manager), it also has logs that pertain to and assist with vGPU troubleshooting.

Some commands you can run are:

cat /var/log/vmkernel.log | grep -i vmiop

cat /var/log/vmkernel.log | grep -i nvrm

cat /var/log/vmkernel.log | grep -i nvidia

The above commands will pull NVIDIA vGPU related log items from the ESXi log
files.

USING “DXDIAG” IN THE GUEST VM

Microsoft has a tool called “dxdiag” which provides diagnostic information for
testing and troubleshooting video (and sound) with DirectX.

I find this tool very handy for quickly verifying that hardware acceleration is
working.

NVIDIA vGPU with Microsoft DirectX “dxdiag” tool

As you can see:

* DirectDraw Acceleration: Enabled
* Direct3D Acceleration: Enabled
* AGP Texture Acceleration: Enabled
* DirectX 12 Ultimate: Enabled

The above shows that hardware acceleration is fully functioning with DirectX.
This is an indicator that things are generally working as expected. If you have a
vGPU and one of the first three is showing as disabled, or if you do not see your
vGPU card at all, then you have a problem that requires troubleshooting.

Please Note: You may not see “DirectX 12 Ultimate” as this is related to
licensing.

USING THE “VMWARE HORIZON PERFORMANCE MONITOR”

The VMware Horizon Performance Monitor is a great tool that can be installed by
the VMware Horizon Agent, that allows you to pull information (stats, connection
information, etc) for the session. Please note that this is not installed by
default, and must be selected when running the Horizon Agent installer.

When it comes to troubleshooting vGPU, it’s handy to use this tool to confirm
you’re getting H.264 or H.265/HEVC offload from the vGPU instance, and also get
information on how many FPS (Frames Per Second) you’re getting from the session.

VMware Horizon Performance Tracker with NVIDIA vGPU

Once it’s open, change the view using the selector shown in the screenshot above,
and you can see which “Encoder Name” is being used to encode the session.

Examples of GPU Offload “Encoder Name” types:

* NVIDIA NvEnc HEVC 4:2:0 – This is using the vGPU offload using HEVC
* NVIDIA NvEnc HEVC 4:4:4 – This is using the vGPU offload using HEVC high
color accuracy
* NVIDIA NvEnc H264 4:2:0 – This is using the vGPU offload using H.264
* NVIDIA NvEnc H264 4:4:4 – This is using the vGPU offload using H.264 high
color accuracy

Examples of Software (CPU) Session “Encoder Name” types:

* BlastCodec – New VMware Horizon “Blast Codec”
* h264 4:2:0 – Software CPU encoded h.264

If you’re seeing “NVIDIA NvEnc” in the encoder name, then the encoding is being
offloaded to the GPU resulting in optimum performance. If you don’t see it, it’s
most likely using the CPU for encoding, which is not optimal if you have a vGPU,
and requires further troubleshooting.
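
The rule of thumb above is simple enough to express in code if you are collecting encoder names from many sessions. This is a minimal Python sketch with a hypothetical function name. It only applies the “NVIDIA NvEnc” check described in this post and treats everything else as CPU encoding.

```python
def classify_encoder(encoder_name: str) -> str:
    """Return 'gpu' when the encoder name indicates NVIDIA NvEnc offload,
    otherwise 'cpu' (most likely software encoding)."""
    if "nvidia nvenc" in encoder_name.lower():
        return "gpu"
    return "cpu"


for name in ("NVIDIA NvEnc HEVC 4:2:0", "NVIDIA NvEnc H264 4:4:4",
             "BlastCodec", "h264 4:2:0"):
    print(f"{name}: {classify_encoder(name)}")
```

Any vGPU-enabled desktop that reports “cpu” here is a candidate for the further troubleshooting mentioned above.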

NVIDIA VGPU KNOWN ISSUES

Depending on the version of vGPU that you are running, there can be “known
issues”.

When viewing the NVIDIA vGPU Documentation, you can view known issues, and fixes
that NVIDIA may provide. Please make sure to reference the documentation
specific to the version you’re running and/or the version that fixes the issues
you’re experiencing.

COMMON PROBLEMS

There are a number of common problems that I come across when I’m contacted to
assist with vGPU deployments.

Please see below for some of the most common issues I experience, along with
their applicable fix/workaround.

XID ERROR CODES

When viewing your Virtual Machine (VM) or ESXi log files and you come across an
XID error or XID fault, you can usually look up the error code.

Typically, vGPU errors will provide an “XID Error” code, which can be looked up
on NVIDIA’s Xid Messages page here: XID Errors :: GPU Deployment and Management
Documentation (nvidia.com).

The table on this page allows you to look up the XID code, find the cause, and
also indicates whether the issue is related to “HW Error” (Hardware
Error), “Driver Error”, “User App Error”, “System Memory Corruption”, “Bus
Error”, “Thermal Issue”, or “FB Corruption”.

An example:

2023-02-26T23:33:24.396Z Er(02) vthread-2108265 - vmiop_log: (0x0): XID 45 detected on physical_chid:0x60f, guest_chid:0xf
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): Timeout occurred, reset initiated.
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x52445456 0x00e207e8 0x000001cc 0x00000001
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x00989680 0x00000000 0x000001bb 0x0000000f
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x00000100 0x00000000 0x0000115e 0x00000000
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x00000000 0x00000000 0x00001600 0x00000000
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x00002214 0x00000000 0x00000000 0x00000000

2023-02-26T23:33:36.024Z Er(02) vthread-2108266 - vmiop_log: (0x0): TDR_DUMP:0x64726148 0x00736964 0x00000000 0x00000000
2023-02-26T23:33:36.068Z Er(02) vthread-2108265 - vmiop_log: (0x0): XID 43 detected on physical_chid:0x600, guest_chid:0x0

One can see XID code 45, as well as XID code 43, which, according to NVIDIA’s
documentation, mean:

* XID 43 – GPU stopped processing
* Possible Cause: Driver Error
* Possible Cause: User App Error
* XID 45 – Preemptive cleanup, due to previous errors — Most likely to see when
running multiple cuda applications and hitting a DBE
* Possible Cause: Driver Error

In the situation above, one can deduce that the issue is either Driver Error,
Application Error, or a combination of both. In this specific case, you could
try changing drivers to troubleshoot.
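
When a log contains many of these entries, a short script can list the XID codes for you. Here is a minimal Python sketch that extracts the codes from “XID nn detected” lines like the ones shown above. The function name is hypothetical, and the small lookup table only contains the two codes discussed in this post, so anything else should still be checked against NVIDIA’s Xid documentation.

```python
import re

XID_PATTERN = re.compile(r"\bXID\s+(\d+)\s+detected")

# Only the two codes covered in this post; look up others on NVIDIA's Xid page.
KNOWN_XIDS = {
    43: "GPU stopped processing",
    45: "Preemptive cleanup, due to previous errors",
}


def find_xid_codes(log_text: str) -> list[int]:
    """Return the XID codes found in the log, in the order they appear."""
    codes = []
    for line in log_text.splitlines():
        match = XID_PATTERN.search(line)
        if match:
            codes.append(int(match.group(1)))
    return codes


SAMPLE = """\
2023-02-26T23:33:24.396Z Er(02) vthread-2108265 - vmiop_log: (0x0): XID 45 detected on physical_chid:0x60f, guest_chid:0xf
2023-02-26T23:33:36.023Z Er(02) vthread-2108266 - vmiop_log: (0x0): Timeout occurred, reset initiated.
2023-02-26T23:33:36.068Z Er(02) vthread-2108265 - vmiop_log: (0x0): XID 43 detected on physical_chid:0x600, guest_chid:0x0
"""

for code in find_xid_codes(SAMPLE):
    print(code, "-", KNOWN_XIDS.get(code, "see NVIDIA's Xid documentation"))
```

You could point the same function at the contents of a VM’s vmware.log to get a quick list of every XID that VM has logged.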

VGPU LICENSING

You may experience issues in your vGPU deployment due to licensing issues.
Depending on how you have your environment configured, you may be running in an
unlicensed mode and not be aware.

In the event that the vGPU driver cannot obtain a valid license, it will run for
20 minutes with full capabilities. After that the performance and functionality
will start to degrade. After 24 hours it will degrade even further.
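
The timeline above can be written down as a tiny helper, which is useful when you are correlating a user complaint with how long a desktop has been running without a license. This is a minimal Python sketch of the thresholds as described in this post. The function name and stage labels are my own, and the exact behaviour may differ between vGPU software releases.

```python
def unlicensed_stage(minutes_unlicensed: float) -> str:
    """Map time spent unlicensed to the stage described in the post."""
    if minutes_unlicensed <= 20:
        return "full capabilities"
    if minutes_unlicensed <= 24 * 60:
        return "degraded"
    return "further degraded"


for minutes in (5, 60, 36 * 60):
    print(f"{minutes} minutes unlicensed: {unlicensed_stage(minutes)}")
```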

NVIDIA’s Software Enforcement of NVIDIA vGPU Software Licensing

Some symptoms of issues experienced when unlicensed:

* Users experiencing laggy VDI sessions
* Performance issues
* Frames per Second (FPS) limited to 15 fps or 3 fps
* Applications using OpenCL, CUDA, or other accelerated APIs fail

Additionally, some error messages and event logs may occur:

* Event ID 2, “NVIDIA OpenGL Driver” – “The NVIDIA OpenGL driver has not been
able to initialize a connection with the GPU.”
* AutoCAD/Revit – “Hardware Acceleration is disabled. Software emulation mode
is in use.”
* “Guest is unlicensed”

Please see below for screenshots of said errors:

vGPU Guest Is Unlicensed
NVIDIA OpenGL Driver Not Found
AutoCAD Hardware Acceleration Disabled

Additionally, when looking at the virtual machine’s vmware.log (inside of the
VM’s folder on the ESXi datastore), you may see:

Guest is unlicensed. Cannot allocate more than 0x55 channels!
VGPU message 6 failed, result code: 0x1a

If this occurs, you’ll need to troubleshoot your vGPU licensing and resolve any
issues occurring.
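
If you have a copy of the VM’s vmware.log, a quick scan for the two messages above can confirm the unlicensed state. The sketch below is an illustration only: the function name is hypothetical, the marker strings are the ones quoted above, and the first sample line is made-up filler so there is something that does not match.

```python
# Marker strings taken from the vmware.log messages quoted above.
UNLICENSED_MARKERS = (
    "Guest is unlicensed",
    "VGPU message 6 failed",
)

def find_unlicensed_lines(log_text):
    """Return (line_number, line) pairs for lines containing a marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in UNLICENSED_MARKERS):
            hits.append((number, line.strip()))
    return hits

sample_log = "\n".join([
    "unrelated line (hypothetical filler)",
    "Guest is unlicensed. Cannot allocate more than 0x55 channels!",
    "VGPU message 6 failed, result code: 0x1a",
])

for number, line in find_unlicensed_lines(sample_log):
    print(number, line)
```

To use it against a real file, read the log into a string first, for example with `open(path, errors="replace").read()`, where `path` points to your copy of vmware.log.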

VGPU TYPE (VGPU PROFILE) MISMATCH

When using the default (“time-sliced”) vGPU deployment method, only a single
vGPU type can be used on virtual machines or containers per physical GPU.
Essentially all VMs or containers utilizing the physical GPU must use the same
vGPU type.

If the physical GPU card has multiple GPUs (GPU chips), then a different type
can be used on each physical GPU chip on the same card. 2 x GPUs on a single
card = 2 different vGPU types.

Additionally, if you have multiple cards inside of a single host, the number of
vGPU types you can deploy is based on the total number of GPUs across all of
the cards in your host.

If you configure more vGPU types than the GPUs in your host can support, you
will have issues starting VMs, as shown below:

Cannot power on VM with vGPU: Power on Failure, Insufficient resources

The error reads as follows:

Power On Failures

vCenter Server was unable to find a suitable host to power on the following virtual machines for the reasons listed below.

Insufficient resources. One or more devices (pciPassthru0) required by VM VDIWS01 are not available on host ESXi-Host.

Additionally, if provisioning via VMware Horizon, you may see: “NVIDIA GRID vGPU
Support has detected a mismatch with the supported vGPUs”

Note: If you are using MIG (Multi Instance GPU), this does not apply as
different MIG types can be applied to VMs from the same card/GPU.
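
The type-count rule for time-sliced vGPU described in this section comes down to simple arithmetic: one vGPU type per physical GPU, summed across all cards. Here is a minimal sketch of that check. The function names and the type labels are placeholders, not real vGPU profile names, and the check deliberately ignores framebuffer capacity and MIG.

```python
def max_vgpu_types(gpus_per_card):
    """Total physical GPUs across all cards, i.e. the most distinct
    time-sliced vGPU types the host can run at once."""
    return sum(gpus_per_card)

def can_support(requested_types, gpus_per_card):
    """True if the number of distinct requested types fits the host."""
    return len(set(requested_types)) <= max_vgpu_types(gpus_per_card)

# One card with 2 GPU chips: up to 2 different vGPU types.
print(max_vgpu_types([2]))

# Three distinct types requested on that same host: does not fit.
print(can_support(["type-A", "type-B", "type-C"], [2]))

# Two cards (2 GPUs + 1 GPU): three distinct types fit.
print(can_support(["type-A", "type-B", "type-C"], [2, 1]))
```

Passing this check does not guarantee a VM will power on. It only tells you whether the number of distinct types is the reason it cannot.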

VGPU OR PASSTHROUGH WITH 16GB+ OF VIDEO RAM MEMORY

When attaching a vGPU to a VM, or passing through a GPU to a VM, with 16GB or
more of Video RAM (Framebuffer memory), you may run into a situation where the
VM will not boot.

This is because the VM cannot map a memory space that large to make it
accessible for use.

Please see my blog post GPU or vGPU Passthrough with 16GB+ of video memory, for
more information as well as the fix.

VGPU VM FREEZES DURING VMWARE VMOTION

Your users may report issues where their VDI guest VM freezes for a period of
time during use. This could be caused by VMware vMotion moving the virtual
machine from one VMware ESXi host to another.

Please see my blog post NVIDIA vGPU VM Freezes during VMware vMotion: vGPU STUN
Time for more information.

“ERR!” STATE

When experiencing issues, you may notice that “nvidia-smi” throws “ERR!” in the
view. See the example below:

NVIDIA vGPU “nvidia-smi” reporting “ERR!”

This is an indicator that you’re in a fault or error state. I would recommend
checking the ESXi host log files and the virtual machine log files for XID
codes to identify the problem.

VGPU DRIVER MISMATCH

When vGPU is deployed, drivers are installed on the VMware ESXi host (vGPU
Manager Driver), as well as in the guest virtual machine (guest VM driver).

NVIDIA vGPU Driver Mismatch

These two drivers must be compatible with each other. As per NVIDIA’s
Documentation, see below for compatibility:

* NVIDIA vGPU Manager with guest VM drivers from the same release
* NVIDIA vGPU Manager with guest VM drivers from different releases within the
same major release branch
* NVIDIA vGPU Manager from a later major release branch with guest VM drivers
from the previous branch

Additionally, if you’re using the LTS (Long Term Support) branch, the following
additional compatibility note applies:

* NVIDIA vGPU Manager from a later long-term support branch with guest VM
drivers from the previous long-term support branch

If you have a vGPU driver mismatch, you’ll likely see Event ID 160 from
“nvlddmkm” reporting:

NVIDIA driver version mismatch error: Guest driver is incompatible with host drive.

To resolve this, you’ll need to change drivers on the ESXi host and/or Guest VM
to a supported combination.
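
The compatibility rules above can be turned into a quick check before you plan a driver change. This is a sketch under several assumptions: release strings look like “15.1” (major.minor), “previous branch” means the branch immediately before the manager’s, and the caller supplies the list of LTS branch numbers. The function names are my own, and the LTS values in the example are illustrative, so confirm them against NVIDIA’s documentation.

```python
def parse_release(release):
    """Split a release string such as '15.1' into (major, minor) integers."""
    major, _, minor = release.partition(".")
    return int(major), int(minor or 0)

def is_supported_combination(manager_release, guest_release, lts_branches=()):
    """Apply the compatibility rules listed above to a manager/guest pair."""
    manager_major, _ = parse_release(manager_release)
    guest_major, _ = parse_release(guest_release)

    # Same release, or different releases within the same major branch.
    if manager_major == guest_major:
        return True

    # Manager from the major branch right after the guest's branch
    # (assumption: "previous" means immediately preceding).
    if manager_major == guest_major + 1:
        return True

    # LTS rule: manager on a later LTS branch, guest on the LTS branch
    # just before it. The branch list is supplied by the caller.
    branches = sorted(lts_branches)
    if manager_major in branches and guest_major in branches:
        return branches.index(manager_major) - branches.index(guest_major) == 1

    return False

print(is_supported_combination("15.1", "15.0"))
print(is_supported_combination("15.1", "14.4"))
print(is_supported_combination("15.1", "13.0"))
print(is_supported_combination("13.6", "11.9", lts_branches=(11, 13)))
```

Note that a guest driver newer than the vGPU Manager is reported as unsupported, because none of the listed rules cover that direction.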

UPGRADING NVIDIA VGPU

When upgrading NVIDIA vGPU drivers on the host, you may experience issues or
errors stating that the NVIDIA vGPU modules or services are loaded and in use,
stopping your ability to upgrade.

Normally an upgrade would be performed by placing the host in maintenance mode
and running:

esxcli software vib update -d /vmfs/volumes/DATASTORE/Files/vGPU-15/NVD-VGPU-702_525.85.07-1OEM.702.0.0.17630552_21166599.zip

However, this fails due to modules that are loaded and in use by the NVIDIA vGPU
Manager Services.

Before attempting to upgrade (or uninstall and re-install), place the host in
maintenance mode and run the following command:

/etc/init.d/nvdGpuMgmtDaemon stop

This should allow you to proceed with the upgrade and/or re-install.
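
To keep the order of operations straight, the steps above can be written out as a plan. The sketch below only builds and prints the commands; it does not run anything on a host. The function name and the bundle path are placeholders, and the maintenance mode line assumes you prefer the esxcli command over the vSphere client.

```python
import shlex

def build_upgrade_plan(bundle_path):
    """Return the commands, in order, for the upgrade procedure described above."""
    return [
        # Place the host in maintenance mode first.
        "esxcli system maintenanceMode set --enable true",
        # Stop the NVIDIA management daemon so its modules are no longer in use.
        "/etc/init.d/nvdGpuMgmtDaemon stop",
        # Run the VIB update against the offline bundle.
        "esxcli software vib update -d " + shlex.quote(bundle_path),
    ]

# Placeholder path: substitute the location of your own vGPU bundle.
for step in build_upgrade_plan("/vmfs/volumes/DATASTORE/vgpu-bundle.zip"):
    print(step)
```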

VMWARE HORIZON BLACK SCREEN

If you are experiencing a blank or black screen when connecting to a VDI session
with an NVIDIA vGPU on VMware Horizon, it may not even be related to the vGPU
deployment.

To troubleshoot the VMware Horizon Black Screen, please review my guide on how
to troubleshoot a VMware Horizon Blank Screen.

VM HIGH CPU RDY (HIGH CPU READY)

CPU RDY (CPU Ready) is a state when a VM is ready and waiting to be scheduled on
a physical host’s CPU. In more detail, the VM’s vCPUs are ready to be scheduled
on the ESXi host’s pCPUs.

In rare cases, I have observed situations where VMs with a vGPU and high CPU RDY
times experience instability. I believe this is due to timing conflicts between
the vGPU’s time slicing and the VM’s vCPUs waiting to be scheduled.

To check VM CPU RDY, you can use one of the following methods:

1. Run “esxtop” from the CLI using the console or SSH
2. View the host’s performance stats on vCenter
* Select host, “Monitor”, “Advanced”, “Chart Options”, de-select all, select
“Readiness Average %”

When viewing the CPU RDY time in a VDI environment, generally we’d like to see
CPU RDY at 3% or lower. Anything higher than 3% may cause latency or user
experience issues, or even vGPU issues at higher values.
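
If your chart shows CPU Ready as a summation in milliseconds rather than as a percentage, you can convert it using the sample interval. The sketch below assumes the usual conversion (ready milliseconds divided by the interval in milliseconds, times 100), a 20 second interval for real-time charts, and treats the guideline of 3 above as a percentage. The function names and the optional per-vCPU division are my own additions.

```python
REALTIME_INTERVAL_SECONDS = 20      # assumed sample interval for real-time charts
VDI_READY_GUIDELINE_PERCENT = 3.0   # guideline value from the text above

def cpu_ready_percent(ready_ms, interval_seconds=REALTIME_INTERVAL_SECONDS, vcpus=1):
    """Convert a CPU Ready summation (ms) into a percentage of the interval.

    Dividing by the vCPU count gives an average per vCPU (optional)."""
    return ready_ms * 100 / (interval_seconds * 1000 * vcpus)

def exceeds_vdi_guideline(ready_percent):
    """True if the value is above the VDI guideline mentioned above."""
    return ready_percent > VDI_READY_GUIDELINE_PERCENT

for ready_ms in (600, 1000):
    percent = cpu_ready_percent(ready_ms)
    print(ready_ms, "ms ->", percent, "%", "above guideline:", exceeds_vdi_guideline(percent))
```

For historical charts, pass the interval that matches the chart’s rollup period instead of the 20 second default.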

For your server virtualization environment (non-VDI and no vGPU), CPU Ready
times are not as big of a consideration.

VGPU PROFILES MISSING FROM VMWARE HORIZON

When using newer GPUs with older versions of VMware Horizon, you may encounter
an issue with non-persistent instant clones resulting in a provisioning error.

This is caused by missing vGPU Types or vGPU Profiles, and requires either
downloading the latest definitions, or possibly creating your own.

For more information on this issue, please see my post NVIDIA A2 vGPU Profiles
Missing from VMware Horizon causing provision failure.

ADDITIONAL (EXTERNAL) LINKS AND RESOURCES

Please see these additional external links and resources, which may assist.

* NVIDIA Virtual GPU (vGPU) Documentation – NVIDIA vGPU Documentation
* NVIDIA Virtual GPU Software Supported GPUs – List of GPUs supported by NVIDIA
vGPU
* Virtual GPU Software User Guide – Troubleshooting – Latest LTS (Long Term
Support) Version 13
* Virtual GPU Software User Guide – Troubleshooting – Latest Production Branch
Version 15

Posted by Stephen Wagner at 11:37 AM

ABOUT ME

Name: Stephen Wagner
President of Digitally Accurate Inc.
Business Tech Geek
IT Services and Solutions Provider
International IT Consultant
Global IT Advisor

VMware vExpert
NVIDIA NGCA Advisor
VMUG Leader

