
Recover a failed vCenter HA






The procedure to recover a failed vCenter HA is needed when the Active, Passive, and Witness nodes cannot communicate with each other, which makes the vCenter HA cluster non-functional.
Since the HA cluster tolerates only a single point of failure, service availability is impacted and you need to restore vCenter functionality to keep your infrastructure healthy.



vCenter HA shutdown sequence

If you need to reboot or shut down the vCenter HA cluster for any reason, shut the nodes down in the following order to preserve their current roles:
  • Passive node
  • Witness node
  • Active node
You can restart nodes in any order.
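The shutdown order above can be written down as a small dry-run helper, which may be handy in a runbook. This is only a sketch: print_shutdown_plan is a hypothetical function name, the three arguments are whatever labels or hostnames you use for your nodes, and the script prints the plan without connecting to or powering off anything.

```shell
#!/bin/sh
# Sketch: print the vCenter HA shutdown order (Passive, Witness, Active).
# print_shutdown_plan is a hypothetical helper; it only prints text and
# does not shut down any node.
print_shutdown_plan() {
    # Arguments: labels or hostnames of the Passive, Witness, and Active nodes.
    passive=$1
    witness=$2
    active=$3
    printf '1. Shut down Passive node: %s\n' "$passive"
    printf '2. Shut down Witness node: %s\n' "$witness"
    printf '3. Shut down Active node: %s\n' "$active"
}
```

For example, print_shutdown_plan vcsa-passive vcsa-witness vcsa-active (placeholder names) prints the three steps in the order to follow.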

Recover a failed vCenter HA

One cause of HA cluster failure is node isolation: the nodes cannot communicate with each other, which affects vCenter availability.

Check for connectivity issues

To troubleshoot connectivity issues, access the Active node through the Direct Console, log in as root, and enable the Bash shell. Run ifconfig to check the network configuration:

# ifconfig -a

In the example, eth0 doesn't have an assigned IP address.
Run the following command to check the operational status of the vCenter NICs:

# networkctl
In this example, eth0 is not operational.
Run the following command to get additional details about eth0:
# networkctl status eth0
Since the eth0 NIC is not functional, try restarting the network service:
# systemctl restart systemd-networkd
Check the state of the eth0 NIC again:
# networkctl status eth0
The eth0 State is now routable. Reboot the node to verify whether the configuration persists.
After the node has rebooted, check the status of the installed NICs:
# ifconfig -a


After the reboot, eth0 is still misconfigured, which keeps the node isolated.
If connectivity between the nodes is restored, isolated vCenter HA nodes rejoin the cluster automatically and the Active node starts serving client requests again. If the connectivity issue cannot be solved, you need to recover vCenter availability by other means.
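The State check used in this section can also be scripted, for instance to repeat it after each restart or reboot. The following is a minimal sketch, assuming the networkctl status output contains a line of the form "State: routable (configured)"; link_state and is_routable are hypothetical helper names, not tools shipped with vCenter.

```shell
#!/bin/sh
# Sketch: read `networkctl status <iface>` output on stdin and report the link state.
# link_state and is_routable are hypothetical helpers, not part of vCenter or systemd.

# Print the first word after "State:" (for example "routable" or "no-carrier").
link_state() {
    awk '$1 == "State:" { print $2; exit }'
}

# Succeed only when the reported state is "routable".
is_routable() {
    [ "$(link_state)" = "routable" ]
}
```

A possible use on the node is: networkctl status eth0 | is_routable && echo "eth0 is routable". The helpers only parse text, so they make no change to the network configuration.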

Remove the HA cluster configuration

If connectivity cannot be restored, remove the HA cluster configuration to bring the Active node back up.
The first step is to power off and delete both the Passive and Witness nodes.
Log in as root to the Active node via the Direct Console and run the following command to remove the HA cluster configuration:
# destroy-vcha
If a warning message stops the process, run the command again with the -f parameter:
# destroy-vcha -f
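The two attempts above can be combined in a small wrapper. This is a sketch under a loud assumption: it presumes that destroy-vcha exits with a non-zero status when the warning stops it, which the steps above do not confirm. destroy_with_fallback is a hypothetical function name, and the optional argument exists only so the logic can be tried with a stand-in command.

```shell
#!/bin/sh
# Sketch: run the removal command, and retry with -f only if the first run fails.
# ASSUMPTION: the command returns a non-zero exit status when a warning stops it.
# destroy_with_fallback is a hypothetical wrapper, not a VMware tool.
destroy_with_fallback() {
    # Command to run; defaults to destroy-vcha as used in this procedure.
    cmd=${1:-destroy-vcha}
    if "$cmd"; then
        echo "HA configuration removed without -f"
    else
        echo "First attempt failed; retrying with -f"
        "$cmd" -f
    fi
}
```

Running the two commands by hand, as shown above, gives you the chance to read the warning before forcing the removal, so a wrapper like this suits cases where you have already decided to force it.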
When the procedure has completed, reboot the node:
# reboot
After rebooting the Active node, check the network status:
# ifconfig -a
eth0 has the correct IP address and the vCenter Server is back online.
Once vCenter availability has been restored, the vCenter HA cluster can be rebuilt.
