Recover a failed vCenter HA






You need to recover a failed vCenter HA cluster when the Active, Passive, and Witness nodes cannot communicate with each other, which makes the cluster non-functional.
Because a vCenter HA cluster tolerates only a single point of failure, service availability is impacted and you need to restore vCenter functionality to keep your infrastructure healthy.



vCenter HA shutdown sequence

If you need to reboot or shut down the vCenter HA cluster, shut the nodes down in the following order so that each node keeps its current role:
  • Passive node
  • Witness node
  • Active node
You can restart the nodes in any order.

Recover a failed vCenter HA

One cause of vCenter HA cluster failure is node isolation: the nodes cannot communicate with each other, which affects vCenter availability.

Check for connectivity issues

To troubleshoot connectivity issues, access the Active node through the Direct Console, log in as root, and enable the Bash shell. Run ifconfig to check the state of the network interfaces:

# ifconfig -a

In this example, eth0 has no IP address assigned.
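
The same check can be scripted. The following is a minimal sketch, not part of the original procedure: it assumes the ip utility from iproute2 is available on the node, and has_ipv4 is a hypothetical helper name. It reads one-line-per-address output on stdin and succeeds only if the named interface has an IPv4 address.

```shell
# has_ipv4 IFACE
# Reads `ip -o -4 addr show` output on stdin and exits 0 if IFACE has
# at least one IPv4 address, 1 otherwise.
# (Hypothetical helper; assumes iproute2's one-line format, where the
# second field is the interface name and the third is the address family.)
has_ipv4() {
  awk -v dev="$1" '$2 == dev && $3 == "inet" { found = 1 } END { exit !found }'
}

# Example (run on the node):
#   ip -o -4 addr show | has_ipv4 eth0 || echo "eth0 has no IPv4 address"
```

The helper only reads text, so it can be tried against saved command output before it is used on a live node.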
Run the following command to check the operational status of the vCenter NICs:

# networkctl
In this example, eth0 is not operational.
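
To read the operational state from a script rather than by eye, one option is to filter the networkctl list output. This is a sketch under assumptions: link_state is a hypothetical helper name, and it relies on the column order IDX, LINK, TYPE, OPERATIONAL, SETUP that networkctl prints.

```shell
# link_state IFACE
# Reads `networkctl list --no-legend` output on stdin and prints the
# OPERATIONAL column (4th field) for IFACE.
# Prints nothing if IFACE is not listed.
link_state() {
  awk -v dev="$1" '$2 == dev { print $4 }'
}

# Example (run on the node):
#   networkctl list --no-legend --no-pager | link_state eth0
```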
Run the following command to get additional details about eth0:
# networkctl status eth0
Because the eth0 NIC is not functional, try restarting the network service:
# systemctl restart systemd-networkd
Check the state of the eth0 NIC again:
# networkctl status eth0
The eth0 state is now routable. Reboot the node to verify that the configuration persists.
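
The link can take a few seconds to come back after the service restarts, so a script may want to poll rather than check once. The helper below is a hypothetical sketch: wait_for_state and its arguments are illustrative names, and the command it polls is supplied by the caller.

```shell
# wait_for_state WANT TRIES CMD [ARGS...]
# Runs CMD up to TRIES times, one second apart, and returns 0 as soon as
# its output equals WANT. Returns 1 if the output never matches.
wait_for_state() {
  want="$1"
  tries="$2"
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    if [ "$("$@" 2>/dev/null)" = "$want" ]; then
      return 0
    fi
    i=$((i + 1))
    # Pause only if another attempt follows.
    [ "$i" -lt "$tries" ] && sleep 1
  done
  return 1
}

# Example (run on the node after restarting systemd-networkd):
#   eth0_state() { networkctl list --no-legend --no-pager | awk '$2 == "eth0" { print $4 }'; }
#   wait_for_state routable 10 eth0_state || echo "eth0 is still not routable"
```

A routable state right after the restart does not guarantee that the setting survives a reboot, which is why the reboot check is still needed.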
After the node has rebooted, check the status of the installed NICs:
# ifconfig -a


In this example, eth0 is still misconfigured after the reboot, so the node remains isolated.
If node connectivity is restored, the isolated vCenter HA nodes rejoin the cluster automatically and the Active node starts serving client requests again. If the connectivity issue cannot be resolved, you need to restore vCenter availability manually.
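
Before deciding whether the cluster can heal on its own, it can help to confirm from the Active node whether the other two nodes are reachable at all. The following is a minimal sketch: check_peers and the PROBE variable are hypothetical names, the addresses are placeholders for whatever your Passive and Witness nodes use, and the default probe assumes the Linux ping options -c (count) and -W (timeout in seconds).

```shell
# check_peers ADDR [ADDR...]
# Probes each address once and prints whether it is reachable.
# Returns 0 only if every address answered.
# PROBE is the command used to test one address (hypothetical knob;
# defaults to a single ping with a 2-second timeout).
check_peers() {
  failed=0
  for peer in "$@"; do
    if ${PROBE:-ping -c 1 -W 2} "$peer" >/dev/null 2>&1; then
      echo "$peer reachable"
    else
      echo "$peer unreachable"
      failed=1
    fi
  done
  return "$failed"
}

# Example (run on the Active node; replace the placeholders):
#   check_peers <passive-node-ip> <witness-node-ip>
```

A failed ping is only a hint: if ICMP is filtered between the nodes, a peer can be reported as unreachable even though the HA network works.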

Remove the HA cluster configuration

If connectivity cannot be restored, remove the HA cluster configuration so that the Active node can run on its own again.
First, power off and delete both the Passive and Witness nodes.
Log in as root to the Active node via the Direct Console and run the following command to remove the HA cluster configuration:
# destroy-vcha
If a warning message stops the process, run the command again with the -f parameter appended:
# destroy-vcha -f
When the procedure has completed, reboot the node:
# reboot
After rebooting the Active node, check the network status:
# ifconfig -a
eth0 now has the correct IP address and the vCenter Server is back online.
Once vCenter availability has been restored, you can rebuild the vCenter HA cluster.
