Hello Team
I'm trying to install vCenter 7.0.3 on an ESXi host.
Stage 1 completes without problems.
In stage 2, I get the error "An error occurred while starting service 'hvc'" and the installation is interrupted.
On further debugging, I found the following in the logs:
root@localhost [ /var/log/firstboot ]# cat hvc_firstboot.py_9254_stderr.log
2022-01-28T06:37:50.796Z ERROR starting hvc rc: 1, stdout: , stderr: Start service request failed. Error: Operation timed out
Traceback (most recent call last):
  File "/usr/lib/vmware-hvc/firstboot/hvc_firstboot.py", line 217, in Main
    hvcsvc_fb.start_hvcsvc()
  File "/usr/lib/vmware-hvc/firstboot/hvc_firstboot.py", line 54, in start_hvcsvc
    self.start_service()
  File "/usr/lib/vmware/site-packages/cis/firstboot.py", line 266, in start_service
    service_start(self.get_eff_service_name())
  File "/usr/lib/vmware/site-packages/cis/utils.py", line 1087, in service_start
    raise ServiceStartException(svc_name)
cis.exceptions.ServiceStartException: {
    "detail": [
        {
            "id": "install.ciscommon.service.failstart",
            "translatable": "An error occurred while starting service '%(0)s'",
            "args": [
                "hvc"
            ],
            "localized": "An error occurred while starting service 'hvc'"
        }
    ],
    "componentKey": null,
    "problemId": null,
    "resolution": null
}
2022-01-28T06:37:50.800Z HybridVC Service firstboot failed due to <cis.componentStatus.ErrorInfo object at 0x7f6b5130b490>
root@localhost [ ~ ]# cat /storage/log/vmware/vmon/vmon.log
…
2022-01-28T05:27:50.747Z Wa(03) host-13149 <hvc> Service api-health command's stderr: Error getting service health. Error: Failed to read health xml file: /usr/lib/vmware-hvc/hvc-health.xml. Error: [Errno 2] No such file or directory: '/usr/lib/vmware-hvc/hvc-health.xml'
2022-01-28T05:27:50.762Z Wa(03) host-13149 <hvc> Service api healthcheck command returned unknown exit code 1
…
2022-01-28T05:27:57.074Z Wa(03) host-13149 <hvc> Service api-health command's stderr: Error getting service health. Error: Failed to read health xml file: /usr/lib/vmware-hvc/hvc-health.xml. Error: [Errno 2] No such file or directory: '/usr/lib/vmware-hvc/hvc-health.xml'
2022-01-28T05:27:57.074Z Wa(03)+ host-13149
2022-01-28T05:27:57.110Z Wa(03) host-13149 <hvc> Service api healthcheck command returned unknown exit code 1
…
2022-01-28T05:28:11.094Z In(05) host-13149 <hvc> Running the API Health command as user
2022-01-28T05:28:11.094Z In(05) host-13149 <hvc-healthcmd> Constructed command: /usr/bin/python /usr/lib/vmware-vmon/vmonApiHealthCmd.py -n hvc -f /usr/lib/vmware-hvc/hvc-health.xml -t 30
2022-01-28T05:28:16.473Z Wa(03) host-13149 <hvc> Service api-health command's stderr: Error getting service health. Error: Failed to read health xml file: /usr/lib/vmware-hvc/hvc-health.xml. Error: [Errno 2] No such file or directory: '/usr/lib/vmware-hvc/hvc-health.xml'
2022-01-28T05:28:16.473Z Wa(03)+ host-13149
2022-01-28T05:28:16.510Z Wa(03) host-13149 <hvc> Service api healthcheck command returned unknown exit code 1
2022-01-28T05:28:16.510Z In(05) host-13149 <hvc> Re-check service health since it is still initializing.
2022-01-28T05:28:17.511Z In(05) host-13149 <hvc> Running the API Health command as user
2022-01-28T05:28:17.512Z In(05) host-13149 <hvc-healthcmd> Constructed command: /usr/bin/python /usr/lib/vmware-vmon/vmonApiHealthCmd.py -n hvc -f /usr/lib/vmware-hvc/hvc-health.xml -t 30
…
root@localhost [ ~ ]# ls -lrt /usr/lib/vmware-hvc/
total 16
drwxr-xr-x 3 root root 4096 Jan 28 04:49 site-packages
dr-xr-xr-x 2 root root 4096 Jan 28 04:49 firstboot
dr-xr-xr-x 3 root root 4096 Jan 28 04:49 config
drwxr-xr-x 2 root root 4096 Jan 28 04:49 lib
root@localhost [ ~ ]#
I'm stuck here and have been unable to find a solution for these errors in any of the VMware support or community forums.
Let me know if any more information is required.
PS: If this is not the correct forum for this question, please redirect me.
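The repeated vMon warnings in the excerpt above all point at the same missing file. As a minimal sketch (not a VMware tool), the Python below counts such warnings per service and path. The regular expression is an assumption derived only from the log lines quoted in this post, and the function name `missing_health_files` is hypothetical.

```python
import re
from collections import Counter

# Assumption: the pattern mirrors the warning wording quoted in this post;
# other vCenter builds may word the message differently.
MISSING_HEALTH_FILE = re.compile(
    r"<(?P<service>[\w-]+)> Service api-health command's stderr: .*?"
    r"Failed to read health xml file: (?P<path>\S+?)\. Error"
)


def missing_health_files(lines):
    """Count, per (service, path), how often vMon could not read a health XML file."""
    hits = Counter()
    for line in lines:
        # Normalize typographic apostrophes that copy and paste may introduce.
        match = MISSING_HEALTH_FILE.search(line.replace("\u2019", "'"))
        if match:
            hits[(match.group("service"), match.group("path"))] += 1
    return hits


sample = [
    "2022-01-28T05:27:50.747Z Wa(03) host-13149 <hvc> Service api-health command's stderr: "
    "Error getting service health. Error: Failed to read health xml file: "
    "/usr/lib/vmware-hvc/hvc-health.xml. Error: [Errno 2] No such file or directory: "
    "'/usr/lib/vmware-hvc/hvc-health.xml'",
    "2022-01-28T05:27:50.762Z Wa(03) host-13149 <hvc> Service api healthcheck command "
    "returned unknown exit code 1",
]

for (service, path), count in missing_health_files(sample).items():
    print(f"{service}: {path} missing ({count} warning(s))")
```

On the appliance, the same function could be fed the open log file, for example `missing_health_files(open("/storage/log/vmware/vmon/vmon.log", errors="replace"))`.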
Failed to start services firstboot error
I was trying to install vCSA 6.0, and it kept failing with the error "Failed to Start Services. Firstboot Error".
After some time spent troubleshooting and a few reinstallations, I noticed the networking tab, "Choose the network": our VC VLANs were missing from the list, and it stated that non-ephemeral port groups are not supported. In my previous installation attempts I had not selected any VLAN and had only entered the IP and the other details.
I started the installation again and found that it clearly states that an ephemeral port group is required for the appliance and that it can be changed to another port group later.
The next challenge is changing the port binding from the default static to ephemeral, which can't be done on a port group that VMs/hosts are already attached to. I think most environments are configured with static binding by default, and trying to change it on a port group with running VMs results in an error.
So we need to create a new port group with ephemeral binding on some other free VLAN or a new VLAN; once the installation is done, we can move the appliance to the appropriate VLAN.
Once the installation is done, power down the appliance and change it to the proper VLAN in the vNIC network settings and also in the appliance management network. The easiest way, however, is to create a standard switch in the appliance VLAN, select it during the installation, and later migrate to the distributed switch.
Another important point: after the IP change, make sure to update the DNS entry so that it resolves to the proper name and IP.
A DNS record with the proper FQDN and IP has to be created before the VCSA deployment, or it will fail with the error "Firstboot Script Execution Error - The Supplied System Name is not Valid".
Please check my other recent blog posts for additional PSC/VC installation issues.
To find out why VMware recommends ephemeral binding and to learn more about it, please check the reference link below.
Failed to start services firstboot error
I am testing vSphere 6.5 with the following setup:
I have a Windows 10 laptop with VMware Workstation 12 Pro installed. I have created a virtual machine in Workstation running Windows Server 2012 R2 and set up AD and DNS on it. This server has an IP address of 192.168.59.129 and its FQDN is win2012.ad.example.com. I can ping the machine and DNS is working correctly. I have also successfully set up an ESXi host in another virtual machine, and the two can ping each other.
I am now trying to install vCenter Server Appliance 6.5, so far with no success. I am trying to use the OVA file: VMware-vCenter-Server-Appliance-6.5.0.5200-4944578_OVF10.ova located in the ISO. I understand after reading several articles that I first need to configure the .vmx file before booting my machine. I think the main reason is that I am not fully understanding the settings that should be placed in the file, specifically the vmdir settings. I have tried a number of different variations on the settings and still nothing seems to be working. My most recent configuration looks like this:
I have also made sure that vc.ad.example.com is mapped correctly to 192.168.59.194 on the DNS server, including reverse DNS (PTR) entries.
After the server begins its initialization routine, I eventually get this error on the screen:
Failed to start services. Firstboot Error.
I am unable to ping the machine or connect through a web browser.
What am I doing wrong here?
As a side note, during boot I also get the following errors; I am not sure whether they are related or can be safely ignored:
sd 2:0:0:0: [sda] Assuming drive cache: write through (I am getting this line for all 12 drives sda through sdl)
A start job is running for dev-swap_vg-swap1.device (. / 1min 30s)
[TIME] Timed out waiting for device dev-swap_vg-swap1.device.
[DEPEND] Dependency failed for /dev/swap_vg/swap1.
[DEPEND] Dependency failed for Swap.
[FAILED] Failed to start LSB: Lightning fast webserver with light system requirements.
Are these errors safe to ignore, or are they the reason that I am getting the Firstboot Error?
Failed to start services firstboot error
Hi, I'm a complete newbie to vCenter.
The message in the title appears after a fresh install of vCenter (I have tried the install three times).
The logs tell me:
INFO firstbootInfrastructure [Failed] /usr/lib/vmware-hvc/firstboot/hvc_firstboot.py
"id": "install.ciscommon.service.failstart",
"translatable": "An error occurred while starting service '%(0)s'",
"args": [
"hvc"
],
"localized": "An error occurred while starting service 'hvc'"
}
],
"componentKey": "hvc",
"problemId": null,
"resolution": null
2022-06-17T22:58:16.944Z INFO firstbootInfrastructure [Failed] /usr/lib/vmware-hvc/firstboot/hvc_firstboot.py is complete
2022-06-17T22:58:16.946Z WARNING firstbootInfrastructure Bug component info file does not exist
2022-06-17T22:58:16.906Z HybridVC Service firstboot failed due to
Does anyone have a suggestion for this?
I tried another install of vCenter after changing some DNS settings.
Now I get a different error, but I can log in to vCenter.
Error on the machine:
Failed to start services. Firstboot Error.
"failedSteps": "wcp-firstboot",
"stepsCompletedList": "visl-support-firstboot,vdtc_firstboot,vmafd-firstboot,vmon-firstboot,rhttpproxy_firstboot,vpostgres-firstboot,dbconfig_firstboot,envoy_firstboot,pod_firstboot,postgres-archiver-firstboot,lookupsvc-firstboot,vmidentity-firstboot,soluser_firstboot,license_firstboot,scafirstboot,vapi_firstboot,vmonapi-firstboot,vpxd-svcs_firstboot,certificateauthority_firstboot,certificatemanagement_firstboot,infraprofile_firstboot,netdump-firstboot,observabilityVapi_firstboot,pschealth-firstboot,statsmonitor_firstboot,svcaccountmgmt-firstboot,topologysvc_firstboot,trustmanagement_firstboot,vpxd_firstboot,vsphere_ui_firstboot,analytics_firstboot,mgmt-firstboot,eam_firstboot,hvc_firstboot,sms_spbm_firstboot,updatemgr-firstboot,vcha_firstboot,vlcm_firstboot,vmcam-firstboot,vsanhealth_firstboot,vsm_firstboot,vtsdb-firstboot,autodeploy-firstboot,vstats-firstboot,wcp-firstboot"
}
But I can log on to vCenter.
Not sure which DNS settings you changed, but what's important is that the vCenter Server has forward (host A) and reverse (PTR) entries, i.e. nslookup resolves the vCenter Server's FQDN to its IP address and the IP address back to the FQDN.
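The forward and reverse requirement can be sanity-checked with a few lines of Python. This is only a sketch using the standard `socket` module; the function name `check_dns`, the host name `vc.example.com`, and the address `192.0.2.10` are placeholders, and the resolver functions are injectable so that the demo runs without a real DNS server.

```python
import socket


def check_dns(fqdn, resolve=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return (ip, ptr_name, ok); ok is True when the PTR name maps back to the FQDN."""
    ip = resolve(fqdn)            # forward lookup (A record)
    ptr_name = reverse(ip)[0]     # reverse lookup (PTR record), first tuple item is the name
    ok = ptr_name.rstrip(".").lower() == fqdn.rstrip(".").lower()
    return ip, ptr_name, ok


# Demo with stub resolvers (placeholder values), so no network access is needed.
ip, ptr_name, ok = check_dns(
    "vc.example.com",
    resolve=lambda host: "192.0.2.10",
    reverse=lambda addr: ("VC.example.com.", [], [addr]),
)
print(ip, ptr_name, "consistent" if ok else "MISMATCH")
```

Calling `check_dns("your-vcenter-fqdn")` without the stub arguments uses the system resolver; a `socket.gaierror` or `socket.herror` then indicates a missing A or PTR record.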
Yep, that's working now.
As I said, I'm a newbie: I got it installed, it didn't work, and then I figured out that DNS was needed. But I don't know what my new error is about. Any idea?
I have the same issue with the wcp service.
New 7.0.3.00600 appliance, Firstboot Error:
"totalSteps": 49,
"stepsStarted": 45,
"stepsCompleted": 45,
"finalStatus": "failure",
"runTime": [
    { "visl-support-firstboot": "0:01:04" },
    { "vdtc_firstboot": "0:00:00" },
    { "vmafd-firstboot": "0:00:56" },
    { "vmon-firstboot": "0:00:02" },
    { "rhttpproxy_firstboot": "0:00:37" },
    { "vpostgres-firstboot": "0:00:24" },
    { "dbconfig_firstboot": "0:00:29" },
    { "envoy_firstboot": "0:00:11" },
    { "pod_firstboot": "0:00:00" },
    { "postgres-archiver-firstboot": "0:00:02" },
    { "lookupsvc-firstboot": "0:00:29" },
    { "vmidentity-firstboot": "0:01:46" },
    { "soluser_firstboot": "0:00:31" },
    { "license_firstboot": "0:00:59" },
    { "scafirstboot": "0:00:34" },
    { "vapi_firstboot": "0:02:08" },
    { "vmonapi-firstboot": "0:00:32" },
    { "vpxd-svcs_firstboot": "0:02:06" },
    { "certificateauthority_firstboot": "0:00:33" },
    { "certificatemanagement_firstboot": "0:00:34" },
    { "infraprofile_firstboot": "0:00:40" },
    { "netdump-firstboot": "0:00:20" },
    { "observabilityVapi_firstboot": "0:00:36" },
    { "pschealth-firstboot": "0:00:02" },
    { "statsmonitor_firstboot": "0:00:03" },
    { "svcaccountmgmt-firstboot": "0:00:23" },
    { "topologysvc_firstboot": "0:00:30" },
    { "trustmanagement_firstboot": "0:00:43" },
    { "vpxd_firstboot": "0:03:23" },
    { "vsphere_ui_firstboot": "0:01:17" },
    { "analytics_firstboot": "0:02:29" },
    { "mgmt-firstboot": "0:00:40" },
    { "eam_firstboot": "0:00:37" },
    { "hvc_firstboot": "0:00:38" },
    { "sms_spbm_firstboot": "0:01:20" },
    { "updatemgr-firstboot": "0:01:44" },
    { "vcha_firstboot": "0:00:22" },
    { "vlcm_firstboot": "0:00:30" },
    { "vmcam-firstboot": "0:00:09" },
    { "vsanhealth_firstboot": "0:01:24" },
    { "vsm_firstboot": "0:00:52" },
    { "vtsdb-firstboot": "0:00:10" },
    { "autodeploy-firstboot": "0:00:34" },
    { "vstats-firstboot": "0:00:26" },
    { "wcp-firstboot": "0:00:03" }
],
"failedSteps": "wcp-firstboot",
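A status report of this shape is easier to read once it is summarized. The sketch below assumes the status has been saved as valid JSON (the excerpt in this post is only a fragment); the helper names `parse_duration` and `summarize` are hypothetical, and the sample is an abridged copy of the values quoted above.

```python
import json
from datetime import timedelta


def parse_duration(text):
    """Convert an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def summarize(status, top=3):
    """Return (failed_steps, slowest_steps, total_runtime) for a firstboot status dict."""
    failed = [step for step in status.get("failedSteps", "").split(",") if step]
    durations = {}
    for entry in status.get("runTime", []):      # each entry is a one-item dict
        for step, text in entry.items():
            durations[step] = parse_duration(text)
    slowest = sorted(durations.items(), key=lambda item: item[1], reverse=True)[:top]
    total = sum(durations.values(), timedelta())
    return failed, slowest, total


# Abridged sample: four of the 45 runTime entries from the post.
sample = json.loads("""
{
  "totalSteps": 49, "stepsStarted": 45, "stepsCompleted": 45, "finalStatus": "failure",
  "runTime": [
    {"vpxd_firstboot": "0:03:23"},
    {"analytics_firstboot": "0:02:29"},
    {"vapi_firstboot": "0:02:08"},
    {"wcp-firstboot": "0:00:03"}
  ],
  "failedSteps": "wcp-firstboot"
}
""")

failed, slowest, total = summarize(sample)
print("failed:", failed)
print("slowest:", [(step, str(duration)) for step, duration in slowest])
print("total:", total)
```

In the quoted report the failed step, wcp-firstboot, ran for only three seconds, which suggests it failed almost immediately rather than timing out.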
Failed to start services firstboot error
I'm having issues installing vCenter Server Appliance 6. I am getting the message "Failed to start services. Firstboot error." on the VCSA's console, and the installer reports that it was unable to contact the server, so it only has partial logs.
The exact message from the VCSA installer is:
Firstboot script execution error.
The supplied System Name vcenter.wh121e.lab. is not valid.
If the supplied system name is a FQDN, then make sure the DNS forward lookup results in at least one valid IP address in the system. If the supplied system name is an IP address, then it should be one of the valid IP address(es) in the system.
Failed to download vCenter Server support bundle logs. All other logs can be found at
The VCSA log on the install host ends with repeated entries like these:
Every post I’ve found on this subject seems to point to DNS being the issue. The problem is, I have functional DNS in my lab environment. The installation computer and the VCSA are both able to resolve all hostnames and can ping each other.
I saw another post where a user had success adding the DNS records manually to the local hosts file, but this did not help for me at all.
I have the following configuration in my lab environment:
172.16.24.2: DNS server
172.16.24.15: desired vCenter server IP
172.16.24.16: ESXi host
I am using the domain "wh121e.lab" for DNS. The vCenter server is expected to appear at "vcenter.wh121e.lab".
I provided "vcenter.wh121e.lab" and also "vcenter.wh121e.lab." as the System Name, and neither worked. I assigned the correct static IP and the correct DNS server, as you can see.
The attached screenshot shows that the installed but failed VCSA appliance is able to resolve the desired name and that it has the correct assigned IP address. It also shows the last few lines of the firstboot log with the error message.
The only explanation I can think of is that, for whatever reason, VCSA considers "lab" to be an invalid TLD. If that is indeed the case, it would be really bad news for VMware: new TLDs seem to appear every week, so restricting names to a predefined list of TLDs and systematically failing on any other would be a really bad idea.
Any advice? If you need more logs, let me know.
Message was edited by: Flint Million (added more details and clarifications)
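The rule stated in the installer message (the forward lookup must yield at least one IP address that is configured on the system) can be illustrated in Python. This is a sketch of the rule as worded in the message, not VMware's actual validation code; `system_name_is_valid` is a hypothetical name, and the lookup function is injectable so the demo runs without DNS.

```python
import socket


def system_name_is_valid(system_name, local_ips, lookup=socket.getaddrinfo):
    """Apply the rule quoted in the installer message (illustrative only)."""
    local_ips = set(local_ips)
    if system_name in local_ips:      # the system name was given as an IP address
        return True
    try:
        results = lookup(system_name, None)
    except socket.gaierror:           # forward lookup failed
        return False
    # Each getaddrinfo result is (family, type, proto, canonname, sockaddr).
    resolved = {info[4][0] for info in results}
    return bool(resolved & local_ips)


def fake_lookup(host, port):
    """Stub standing in for DNS: resolves every name to the desired vCenter IP."""
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("172.16.24.15", 0))]


print(system_name_is_valid("vcenter.wh121e.lab", {"172.16.24.15"}, lookup=fake_lookup))
print(system_name_is_valid("vcenter.wh121e.lab", {"172.16.24.99"}, lookup=fake_lookup))
```

Under this reading, a name that resolves correctly can still be rejected if the resolved address is not (yet) configured on the appliance's own interface at the time of the check.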
Failed to start services firstboot error
I am trying to install vCenter Server 6 on Windows Server 2012 R2. The server is fully patched, but I get the error:
VMware VirtualCenter failed firstboot
An error occurred while starting service 'vpxd'
Please refer to vSphere documentation to troubleshoot or Please contact VMware Support.
I have tried everything:
Verified SQL permissions
The "Log on as a service" right is assigned to the service account, and the account is a local admin
Windows Installer service is running
DSN is correct and the connection test is successful
1. Verify that the vCenter Server can resolve the FQDN of the Active Directory server using the nslookup command in a Windows command prompt.
2. Ensure that the Active Directory server is running and accessible from the vCenter Server machine.
I am also facing the same problem. I have tried the KB below as well, but it did not help.
I've had this problem. In my experience you'll need 3 to 5 attempts for it to install; just take a snapshot before installation and revert to it every time the install fails.
1. VERY IMPORTANT: an FQDN DNS record must be created for the vCenter server, and it must work before the installation even begins. Your PDC had better be up and running too.
2. When you name your "vSphere" domain, do not use "local" at the end. For example, your default administrator name should be "administrator@yourcompanyname.com", not "yourcompany.local". For some reason the latest vCenter versions did not like "local".
I hope this helps.
This has been getting on my nerves for days now. I have tried to deploy the latest vCenter Server to Windows Server 2008 R2 and 2012 R2, both fresh installations with nothing more than the vCenter Server requirements installed.
Info:
- Firewall disabled
- UAC disabled
- Administrator used for installation
- 1 NIC
- NSLOOKUP working OK (forward/reverse)
Things I’ve tried:
- Windows Server 2008 R2 / 2012 R2
- PostgreSQL (embedded) / MsSQL 2012 Express
- Short file name creation enabled (NtfsDisable8dot3NameCreation to 0)
- In and out of C:\Program Files
- With and without computer being added to AD domain
- Clean up all %TMP% / %TEMP% traces of vCenter install then try again
- With and without VMware Tools
All the different combinations from above give the same result, error code 1603. This is obviously a generic OS error code when something goes wrong with the MSI installation package and gives 0 meaningful information.
Further drilling down into vCenter installation log files all I see is this:
2017-03-29 16:24:35.932+01:00| vcsInstUtil-4602587| I: Leaving function: ParseStatusFile
2017-03-29 16:30:47.722+01:00| vcsInstUtil-4602587| I: Entering function: ParseStatusFile
2017-03-29 16:30:47.758+01:00| vcsInstUtil-4602587| I: ParseStatusFile: curr error msg: "VMware VirtualCenter failed firstboot."
2017-03-29 16:30:47.758+01:00| vcsInstUtil-4602587| I: ParseStatusFile: curr error msg: "An error occurred while starting service 'vpxd'"
2017-03-29 16:30:47.758+01:00| vcsInstUtil-4602587| E: ParseStatusFile: Displaying error message for "install.vpxd.action.failed": "VMware VirtualCenter failed firstboot.An error occurred while starting service 'vpxd'
Please refer to vSphere documentation to troubleshoot or Please contact VMware Support."
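When an installer log is long, the "curr error msg" entries are the quickest way to see which errors the installer recorded. The following Python sketch extracts them; it assumes one log entry per line with plain double quotes, as in the excerpt above, and the pattern and the name `error_messages` are illustrative rather than part of any VMware tooling.

```python
import re

# Assumption: the entry layout matches the excerpt in this post.
ERROR_MSG = re.compile(r'ParseStatusFile: curr error msg: "(?P<msg>.*)"\s*$')


def error_messages(lines):
    """Return the distinct 'curr error msg' strings in order of first appearance."""
    messages = []
    for line in lines:
        match = ERROR_MSG.search(line)
        if match and match.group("msg") not in messages:
            messages.append(match.group("msg"))
    return messages


sample = [
    "2017-03-29 16:30:47.722+01:00| vcsInstUtil-4602587| I: Entering function: ParseStatusFile",
    '2017-03-29 16:30:47.758+01:00| vcsInstUtil-4602587| I: ParseStatusFile: curr error msg: '
    '"VMware VirtualCenter failed firstboot."',
    '2017-03-29 16:30:47.758+01:00| vcsInstUtil-4602587| I: ParseStatusFile: curr error msg: '
    '"An error occurred while starting service \'vpxd\'"',
]

for message in error_messages(sample):
    print(message)
```

This only surfaces what the installer already logged; the root cause of the vpxd start failure still has to come from the service's own logs.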
I have looked everywhere but for some reason in this version of vCenter the vpxd.log file is not created.
Nothing meaningful in Event Viewer (Application) either.
I was exploring the capabilities of the tool for our lab environment but this is just ridiculous.
Any hint you can throw at me is greatly appreciated as I am out of ideas guys..
Cheers!
We experienced a bizarre situation with one of our vCenter Servers: it was not able to start the "vpxd" service and returned the error message "localized": "An Error Occurred While Starting Service 'vmware-sts-idmd'". The vCenter was configured with the VMware VCHA feature, and the server was not operational. VCHA had failed to recover from the secondary node, and the vCenter was completely disconnected from the network. We were able to fix that by following these steps, which then led us to this unusual situation. In this post, I want to share my experience and the steps I followed to fix it.
I want to highlight a few things in this post, as I think they could be very useful for anyone who encounters a similar situation, because many people have given up on fixing this and redeployed vCenter as a workaround. While searching around, I found this VMware Community article, which didn't help me fix it.
This error message appeared while I was connecting from the vSphere Web Client:
[500] SSO error: AFD Native Error Occurred: 9234
Check the vSphere Web Client server logs for details.
I could not directly identify the issue after checking the Web Client logs, so I tried to restart the services again.
The "vpxd" and "vmware-sts-idmd" services then failed to start properly. A manual start ended with the error below.
"localized": "An Error Occurred While Starting Service 'vmware-sts-idmd'",
"translatable": "An error occurred while starting service '%(0)s'"
While going through the error logs, I found password errors for the machine account in the "vmdird-syslog.log" file located in "/storage/log/vmware/vmdird".
That was a good sign, and I reset the machine account password with "vdcadmintool", located in "/usr/lib/vmware-dir/bin".
Option "3" resets the password for the machine account. You have to provide the UPN of your machine account. If you don't know it, you can find it in the "vmdird-syslog.log" file. Normally it takes the form "FQDN@SSODOMAIN".
One important thing in this step: note down the generated password, because it is needed in the next step.
Then I used the Likewise registry tool to update the password that I had reset in the previous step. Execute these commands one at a time.
/opt/likewise/bin/lwregshell
cd HKEY_THIS_MACHINE\services\vmdir
#specify new password within the double quotes
set_value dcAccountPassword "NEW_PASSWORD"
quit
Then I rebooted the vCenter for these changes to take effect. Finally, I was able to start "vpxd" and log in to the vCenter Server.
My Thoughts
I suppose that VCHA didn't break the vCenter Server; the machine account did. A communication or authentication problem with this so-called "machine account" broke the VCHA cluster and took it out of service. I'm not sure whether this is a bug in the VMware build we used or has some other cause, as we are still looking into it at the time of writing.
Troubleshooting took two steps: fixing VCHA to bring the vCenter back onto the network, and resetting the machine account password.
I hope this helps.
Recently I ran into a situation where vCenter stopped working (in my case version 6.5, VCSA, but this can happen with any version regardless of platform).
When I tried to log in, either with a domain account or as the local administrator, authentication failed: first it kept saying that the user name or password was incorrect, and then that a user name and password had to be entered, even though they obviously had been.
After trying to reboot the server or restart all services with the commands service-control --stop --all and service-control --start --all, a good half of the services did not start, and the vCenter main page showed a 503 error.
Navigating to vcenter/ui produced this error:
[400] An error occurred while sending an authentication request to the vCenter Single Sign-On server. An error occurred when processing the metadata during vCenter Single Sign-on setup – null.
In my case the problem turned out to be an expired STS certificate. What follows applies to my case, that is, to VCSA; if your vCenter runs on Windows, read the KB articles linked here, which describe what to do on Windows.
To check the validity period of the STS certificates, download the script from the VMware KB. I saved a copy of it just in case.
You can download it with wget, upload it to the server with WinSCP, or simply copy the script text and paste it into a file on the server.
Run it with the command:
python checksts.py
When it finishes, it lists the valid and the expired STS certificates. If an expired certificate is found, a hint at the bottom tells you which KB to look at.
That KB contains a script (again, I kept a copy just in case) for renewing expired STS certificates. As always, before running anything, make backups, snapshots, and so on. Also note that if you have several vCenter servers in one SSO domain, you need to run the script on only one of them.
So, download the script, make it executable, and run it:
chmod +x fixsts.sh
./fixsts.sh
If everything completes without errors, you can try restarting all services:
service-control --stop --all
service-control --start --all
If there turn out to be other expired certificates, the services may still fail to start. This command will help find all expired certificates:
for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo STORE $i; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | egrep "Alias|Not After"; done
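The output of that one-liner is a flat list of STORE, Alias and Not After lines, so expired entries still have to be spotted by eye. As a rough Python sketch, the listing can be parsed and compared against a reference date. The function names are hypothetical, the date format is assumed to be the OpenSSL-style "Aug 18 19:56:50 2022 GMT", and `%b` relies on an English/C locale.

```python
from datetime import datetime, timezone


def parse_vecs_listing(text):
    """Return (store, alias, not_after) tuples from STORE / Alias / Not After lines."""
    store = alias = None
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("STORE "):
            store, alias = line.split(None, 1)[1], None
        elif line.startswith("Alias"):
            alias = line.split(":", 1)[1].strip()
        elif line.startswith("Not After"):
            stamp = line.split(":", 1)[1].strip()
            when = datetime.strptime(stamp, "%b %d %H:%M:%S %Y GMT")
            entries.append((store, alias, when.replace(tzinfo=timezone.utc)))
    return entries


def expired(entries, now=None):
    """Return the entries whose Not After date lies before 'now' (default: current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return [entry for entry in entries if entry[2] < now]


# Illustrative listing in the format the one-liner prints.
sample = """\
STORE MACHINE_SSL_CERT
Alias : __MACHINE_CERT
Not After : Aug 18 19:56:50 2022 GMT
STORE SMS
Alias : sms_self_signed
Not After : Aug 7 14:06:21 2028 GMT
"""

reference = datetime(2023, 1, 1, tzinfo=timezone.utc)
for store, alias, when in expired(parse_vecs_listing(sample), now=reference):
    print(f"EXPIRED {store}/{alias}: {when:%Y-%m-%d}")
```

Entries without a Not After line (for example CRL aliases) are simply skipped by this parser.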
Those certificates then need to be renewed, after which you can try restarting all services again. Alternatively, you can take the simple, blunt route and reissue all certificates:
/usr/lib/vmware-vmca/bin/certificate-manager
Here you need to choose option 8 and then follow the on-screen instructions. Use this command at your own risk, though: if third-party services, plug-ins, or customized templates are in use, they will most likely need to be reconfigured.
After the steps described above, my vCenter was fixed and works normally. It is a rather odd design decision by VMware for such a critically important certificate. It raises the question: why couldn't it be made to renew automatically as its expiry approaches?
After a failure in a vCenter HA cluster I ran into a problem: vCenter backups broke. The VMware Postgres Archiver service refuses to start. Trying to start the service from the command line gives an error:
service-control --start vmware-postgres-archiver
An error occurred while starting service 'vmware-postgres-archiver'
The error is not very informative, so we need to look at the logs.
cd /var/log/vmware/vpostgres/
cat pg_archiver.log-0.stderr
ERROR pg_archiver unexpected termination of replication stream: ERROR: requested WAL segment 000000040000035700000048 has already been removed
Postgres Archiver cannot read the WAL segment because it no longer exists. It should have read faster. Let's see what is going on with the archives.
cd /storage/archive/vpostgres/
ls -Fla
We can see that on October 12 (and it is now the 17th) something broke and the service could not finish archiving the WAL segment. The archive 000000040000035700000048.gz.partial was left incomplete. At that point everything broke and the service did not know what to do. It won't fly until you kick it.
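To see at a glance which archives were left unfinished, the directory can be scanned for `.partial` files. The Python sketch below is read-only and also decodes the standard 24-hex-digit PostgreSQL WAL file name into its timeline, log and segment parts; the function names are hypothetical, and the demo runs against a temporary directory rather than /storage/archive/vpostgres/.

```python
import tempfile
from pathlib import Path


def decode_wal_name(name):
    """Split a WAL file name into (timeline, log, segment) integers."""
    stem = name.split(".", 1)[0]          # drop suffixes such as .gz.partial
    if len(stem) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return int(stem[0:8], 16), int(stem[8:16], 16), int(stem[16:24], 16)


def partial_archives(directory):
    """Return the names of unfinished (*.partial) archives in a directory, sorted."""
    return sorted(path.name for path in Path(directory).glob("*.partial"))


# Demo in a temporary directory that imitates the situation described above.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "000000040000035700000047.gz").touch()
    Path(demo_dir, "000000040000035700000048.gz.partial").touch()
    leftovers = partial_archives(demo_dir)

for name in leftovers:
    timeline, log, segment = decode_wal_name(name)
    print(f"{name}: timeline {timeline}, log {log:#x}, segment {segment:#x}")
```

On the appliance, `partial_archives("/storage/archive/vpostgres/")` would list the same kind of leftover without modifying anything.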
What to do? I found a similar question on the forum:
https://communities.vmware.com/t5/vCenter-Server-Discussions/Postgres-Archiver-service-fails-to-start-on-vCenter-Server/td-p/2816341
My vCenter HA cluster has already been dismantled, so these archives are not really needed right now. As for what to do: nothing much can be done here, so we will clean up. Stop the services:
service-control --stop vmware-postgres-archiver vmware-vpostgres
Then clear the contents of /storage/archive/vpostgres/.
The folder is empty. Start the services:
service-control --start vmware-postgres-archiver vmware-vpostgres
Now everything has started.
cd /storage/archive/vpostgres/
ls -Fla
We have a single on-premises VCSA 6.5 instance that recently ran into the certificate expiration detailed in this KB:
https://kb.vmware.com/s/article/76719
All the certificates have been regenerated using the certificate-tool via the CLI, and now show up as up-to-date using the one-liner in the above KB (they were all previously expired a week ago):
STORE MACHINE_SSL_CERT
Alias : __MACHINE_CERT
Not After : Aug 18 19:56:50 2022 GMT
STORE TRUSTED_ROOTS
Alias : 9bd7b30bcb1dcecfe2491a3e91fcd3dd756f347f
Not After : Aug 1 13:58:01 2028 GMT
Alias : c0af9d76ae9fab214298c6b11d4efb72f64b6c13
Not After : Aug 13 18:18:55 2030 GMT
Alias : ac50bb369ff7dce7e8c372b9b3e50f6e3aaaa528
Not After : Aug 13 18:20:03 2030 GMT
Alias : 3e816060d6322a45114eac30798edbf1a4a1397d
Not After : Aug 13 18:28:26 2030 GMT
Alias : 074ddc83baeea4c6588f3f11837ed4fc77b25220
Not After : Aug 13 19:21:38 2030 GMT
Alias : 4bbaf83d23a818f2e8122b60ca0edc6dabf76d7d
Not After : Aug 13 19:33:49 2030 GMT
STORE TRUSTED_ROOT_CRLS
Alias : a45f284d7b9325005381b1b14d3ac3c823e104c9
Alias : 4b3b32cf9bb0d212aa6551bdd97dd3aaf029dde5
Alias : 02c60981250d68d94e1fcd31c93d0c50ae26d531
Alias : c4df908ec94dc3b1b774ca4a8768acfdbee90e59
Alias : f65b7ab274c5d949e8e914101797260d9e40fd70
Alias : 84d8635a51db3a011bab257873555c6776381d37
STORE machine
Alias : machine
Not After : Aug 18 19:12:42 2022 GMT
STORE vsphere-webclient
Alias : vsphere-webclient
Not After : Aug 18 19:12:43 2022 GMT
STORE vpxd
Alias : vpxd
Not After : Aug 18 19:12:43 2022 GMT
STORE vpxd-extension
Alias : vpxd-extension
Not After : Aug 18 19:12:44 2022 GMT
STORE SMS
Alias : sms_self_signed
Not After : Aug 7 14:06:21 2028 GMT
STORE BACKUP_STORE
Alias : bkp___MACHINE_CERT
Not After : Aug 18 19:11:39 2022 GMT
Alias : bkp_machine
Not After : Aug 18 19:12:42 2022 GMT
Alias : bkp_vsphere-webclient
Not After : Aug 18 19:12:43 2022 GMT
Alias : bkp_vpxd
Not After : Aug 18 19:12:43 2022 GMT
Alias : bkp_vpxd-extension
Not After : Aug 18 19:12:44 2022 GMT
When I try to start all services now, it returns the following after ~5 minutes:
Service-control failed. Error Failed to start vmon services.vmon-cli RC=1, stderr=Failed to start vpxd-svcs, vapi-endpoint services. Error: Operation timed out
When using service-control to start just the vpxd-svcs service by itself, it returns the following error:
Perform start operation. vmon_profile=None, svc_names=['vmware-vpxd-svcs'], include_coreossvcs=False, include_leafossvcs=False
2020-08-18T21:10:50.484Z Service vpxd-svcs state STOPPED
Error executing start on service vpxd-svcs. Details {
    "resolution": null,
    "detail": [
        {
            "args": [
                "vpxd-svcs"
            ],
            "id": "install.ciscommon.service.failstart",
            "localized": "An error occurred while starting service 'vpxd-svcs'",
            "translatable": "An error occurred while starting service '%(0)s'"
        }
    ],
    "componentKey": null,
    "problemId": null
}
Service-control failed. Error {
    "resolution": null,
    "detail": [
        {
            "args": [
                "vpxd-svcs"
            ],
            "id": "install.ciscommon.service.failstart",
            "localized": "An error occurred while starting service 'vpxd-svcs'",
            "translatable": "An error occurred while starting service '%(0)s'"
        }
    ],
    "componentKey": null,
    "problemId": null
}
The web UI returns the following 503 error (which it has been returning since the certs expired):
503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x000056033c080640] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)
Can anyone point me to what log files specifically I need to be looking at to diagnose this and figure out what keeps the service from starting? I’ve already covered the following:
- It’s not a disk space / log rotation issue
- It's not the Postgres DB (I found a few threads about that, but it is starting properly in our instance)
Our last resort is to simply wipe and reinstall VCSA, but I’d like to avoid it if this is possible to fix.