
Recover a failed vCenter HA






The procedure to recover a failed vCenter HA applies when the Active, Passive and Witness nodes cannot communicate with each other, which makes the vCenter HA cluster non-functional.
Since the HA cluster tolerates only a single point of failure, service availability is impacted and you need to restore vCenter functionality to keep your infrastructure healthy.



vCenter HA shutdown sequence

If for any reason you need to reboot or shut down the vCenter HA cluster, you must follow a specific shutdown sequence to keep the current roles:
  • Passive node
  • Witness node
  • Active node
You can restart nodes in any order.

Recover a failed vCenter HA

One cause of HA cluster failure is node isolation: the nodes cannot communicate with each other, which affects vCenter availability.

Check for connectivity issues

To troubleshoot connectivity issues, access the Active node through the Direct Console, log in as root and enable the Bash shell. Run the ifconfig command to check network availability:

# ifconfig -a

In the example, Eth0 doesn't have an assigned IP address.
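The same check can be scripted. The following is only a sketch and not part of the original procedure: it assumes the iproute2 `ip` command is available on the appliance, and the function name `ipv4_of` is made up for illustration. It reads the one-line-per-address output of `ip -4 -o addr show dev eth0` on stdin, prints the IPv4 address with its prefix, and returns a non-zero status when no address is assigned.

```shell
# Hypothetical helper (not a vCenter tool): extract the IPv4 address from
# `ip -4 -o addr show dev <nic>` output supplied on stdin.
# Assumed line layout: "<idx>: <nic> inet <addr>/<prefix> ...".
ipv4_of() {
    awk '$3 == "inet" { print $4; found = 1 } END { exit !found }'
}

# Usage on the appliance (not executed here):
#   ip -4 -o addr show dev eth0 | ipv4_of || echo "eth0 has no IPv4 address"
```

A non-zero return status is the scripted equivalent of the missing address seen in the ifconfig output.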
Run the following command to check the operational status of the vCenter NICs:

# networkctl
In the example output, Eth0 is not operational.
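To spot problem NICs without reading the table by eye, the listing can be filtered. This is a minimal sketch under stated assumptions: the function name `non_routable_links` is invented here, and the filter assumes the column layout `IDX LINK TYPE OPERATIONAL SETUP` that `networkctl --no-legend` prints. It lists every non-loopback link whose operational state is anything other than routable.

```shell
# Hypothetical helper: read `networkctl --no-legend` output on stdin and print
# the name of each non-loopback link whose OPERATIONAL column is not "routable".
# Assumed columns: IDX LINK TYPE OPERATIONAL SETUP.
non_routable_links() {
    awk '$3 != "loopback" && $4 != "routable" { print $2 }'
}

# Usage on the appliance (not executed here):
#   networkctl --no-legend --no-pager | non_routable_links
```

An empty result means every NIC reports routable. Any name printed is a candidate for the detailed status check.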
Run the following command to get additional details about Eth0:
# networkctl status eth0
Since the Eth0 NIC is not functional, try restarting the network service:
# systemctl restart systemd-networkd
Check the Eth0 NIC State once again:
# networkctl status eth0
The Eth0 State is now routable. Reboot the node to verify that the configuration persists.
After the node has rebooted, check the status of the installed NICs:
# ifconfig -a


Eth0 is still misconfigured, which keeps the node isolated.
If node connectivity is restored successfully, the isolated vCenter HA nodes rejoin the cluster automatically and the Active node starts serving client requests again. If the connectivity issue cannot be solved, you need to recover vCenter availability.
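Before concluding that connectivity cannot be restored, a quick probe from the Active node towards the other two nodes can help. The sketch below is an illustration only: the function name `check_peers`, the `PROBE` variable and the addresses in the usage line are assumptions, and it presumes the peers answer ICMP echo requests on the network being tested.

```shell
# Hypothetical helper: probe each address given as an argument and report
# whether it answers. PROBE defaults to one ping with a 2-second timeout and
# can be overridden with any command that takes the address as last argument.
PROBE=${PROBE:-ping -c 1 -W 2}

check_peers() {
    _unreachable=0
    for peer in "$@"; do
        # $PROBE is left unquoted on purpose so its options split into words.
        if $PROBE "$peer" >/dev/null 2>&1; then
            echo "$peer reachable"
        else
            echo "$peer UNREACHABLE"
            _unreachable=1
        fi
    done
    return "$_unreachable"
}

# Usage (placeholder addresses for the Passive and Witness nodes):
#   check_peers 192.0.2.11 192.0.2.12
```

The function returns non-zero if at least one peer does not answer, so it can gate the decision to move on to removing the HA configuration.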

Remove the HA cluster configuration

If connectivity is not restored, the solution is to remove the HA cluster configuration so that the Active node is up and running again.
The first step is to power off and delete both the Passive and Witness nodes.
Log in as root to the Active node via the Direct Console and run the following command to remove the HA cluster configuration:
# destroy-vcha
If you get a warning message that stops the process, run the command again with the -f parameter appended:
# destroy-vcha -f
When the procedure has completed, reboot the node:
# reboot
After rebooting the Active node, check the network status:
# ifconfig -a
Eth0 has the correct IP address and the vCenter Server is back online.
Once vCenter availability has been restored, the vCenter HA cluster can be rebuilt.
