Hello everyone,
We recently had a company install a new three-node vSphere environment, which initially seemed to go well. However, I noticed that some Veeam jobs fail when the VM being backed up is on one specific ESXi node, while everything works fine on the other two. At first I thought this was a Veeam issue, but digging into the Veeam logs I found the following error:
[04.07.2019 12:24:54] < 3040> vdl| WARN|[vddk] [NFC ERROR] NfcNewAuthdConnectionEx: Failed to connect: The remote host certificate has these problems:
[04.07.2019 12:24:54] < 3040> vdl| WARN|[vddk]
[04.07.2019 12:24:54] < 3040> vdl| WARN|[vddk] * A certificate in the host's chain is based on an untrusted root.
That pointed me towards the ESXi server itself. I dug into /var/log/vmauthd.log on the affected ESXi server and found the following:
2019-07-08T13:47:44Z vmauthd[2108903]: lib/ssl: OpenSSL using FIPS_drbg for RAND
2019-07-08T13:47:44Z vmauthd[2108903]: lib/ssl: protocol list tls1.2
2019-07-08T13:47:44Z vmauthd[2108903]: lib/ssl: protocol list tls1.2 (openssl flags 0x17000000)
2019-07-08T13:47:44Z vmauthd[2108903]: lib/ssl: cipher list ECDHE+AESGCM:RSA+AESGCM:ECDHE+AES:RSA+AES
2019-07-08T13:47:44Z vmauthd[2108903]: lib/ssl: curves list prime256v1:secp384r1:secp521r1
2019-07-08T13:47:44Z vmauthd[2108903]: Connect from remote socket (172.18.4.53:61252).
2019-07-08T13:47:44Z vmauthd[2108903]: Connect from 172.18.4.53
2019-07-08T13:47:44Z vmauthd[2108903]: SSL Error: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca
2019-07-08T13:47:44Z vmauthd[2108903]: recv() FAIL: 1.
2019-07-08T13:47:44Z vmauthd[2108903]: VMAuthdSocketRead: read failed. Closing socket for reading.
2019-07-08T13:47:44Z vmauthd[2108903]: Read failed.
This looks like an issue with the certificate authority on the affected host, which would tie in nicely with the error I am seeing in Veeam. So I then compared the CA store on one of the working nodes with the one on the non-working node, using this command:
openssl crl2pkcs7 -nocrl -certfile /etc/vmware/ssl/castore.pem | openssl pkcs7 -print_certs -noout
Working ESXi node
subject=/CN=CA/DC=vsphere/DC=local/C=US/ST=California/O=DWLAN-VCA01.brand.local/OU=VMware Engineering
issuer=/CN=CA/DC=vsphere/DC=local/C=US/ST=California/O=DWLAN-VCA01.brand.local/OU=VMware Engineering
subject=/O=VMware/CN=SMS-190614111842368
issuer=/O=VMware/CN=SMS-190614111842368
Non-working ESXi node
subject=/CN=CA/DC=vsphere/DC=local/C=US/ST=California/O=DWLAN-VCA01.brand.local/OU=VMware Engineering
issuer=/CN=CA/DC=vsphere/DC=local/C=US/ST=California/O=DWLAN-VCA01.brand.local/OU=VMware Engineering
subject=/O=VMware/CN=SMS-190614111842368
issuer=/O=VMware/CN=SMS-190614111842368
The two listings are identical. I'm not sure how to move forward from here. Can anyone help?
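In case it helps anyone reproduce the comparison: the command above only prints subject and issuer, so two different certificates that happen to share the same names would look identical in its output. A stricter check would be to compare fingerprints. Below is a rough sketch of how that could be done. It assumes awk, mktemp, sort and openssl are available in the shell on the host, and the function names split_bundle and bundle_fingerprints are made up for illustration.

```shell
# Sketch only: split_bundle and bundle_fingerprints are hypothetical helper names.
# Assumes awk, mktemp, sort and openssl are available in the shell being used.

# Write each certificate found in PEM bundle $1 to its own file in directory $2
# (cert1.pem, cert2.pem, ...). Lines outside BEGIN/END markers are ignored.
split_bundle() {
    awk -v dir="$2" '
        /-----BEGIN CERTIFICATE-----/ { n++; out = dir "/cert" n ".pem" }
        out != "" { print > out }
        /-----END CERTIFICATE-----/ { if (out != "") close(out); out = "" }
    ' "$1"
}

# Print the SHA-256 fingerprint of every certificate in bundle $1, sorted,
# so the output from two hosts can be compared line by line.
bundle_fingerprints() {
    tmpdir=$(mktemp -d) || return 1
    split_bundle "$1" "$tmpdir"
    for f in "$tmpdir"/cert*.pem; do
        [ -e "$f" ] || continue
        openssl x509 -in "$f" -noout -fingerprint -sha256
    done | sort
    rm -rf "$tmpdir"
}

# Example call, to be run on each node:
# bundle_fingerprints /etc/vmware/ssl/castore.pem
```

If the fingerprint lists from the working and non-working nodes match, the CA stores really are the same and the difference is presumably elsewhere; if they differ, that would narrow things down.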
Thanks in advance. Frank