Error executing image transfer script: Error copying


undeploy fails when using ceph system datastore #1314

Author Name: Tobias Fischer (Tobias Fischer)
Original Redmine Issue: 5353, https://dev.opennebula.org/issues/5353
Original Date: 2017-09-06
Original Assignee: Ruben S. Montero

When I use a Ceph System Datastore, the undeployment of VMs fails with the following error:

Wed Sep 6 12:31:51 2017 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/ceph/mv node01.example.com:/var/lib/one//datastores/130/23214 opennebula:/var/lib/one//datastores/130/23214 23214 130
Wed Sep 6 12:31:51 2017 [Z0][TM][I]: mv: Moving node01.example.com:/var/lib/one/datastores/130/23214 to opennebula:/var/lib/one/datastores/130/23214
Wed Sep 6 12:31:51 2017 [Z0][TM][E]: mv: Command "set -e -o pipefail
Wed Sep 6 12:31:51 2017 [Z0][TM][I]:
Wed Sep 6 12:31:52 2017 [Z0][TM][I]: tar -C /var/lib/one/datastores/130 --sparse -cf - 23214 | ssh opennebula 'tar -C /var/lib/one/datastores/130 --sparse -xf -'
Wed Sep 6 12:31:52 2017 [Z0][TM][I]: rm -rf /var/lib/one/datastores/130/23214" failed: ssh: Could not resolve hostname opennebula: Name or service not known
Wed Sep 6 12:31:52 2017 [Z0][TM][E]: Error copying disk directory to target host
Wed Sep 6 12:31:52 2017 [Z0][TM][I]: ExitCode: 255
Wed Sep 6 12:31:53 2017 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Wed Sep 6 12:31:53 2017 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY_FAILURE
Wed Sep 6 12:34:36 2017 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY

When using an NFS System Datastore, undeployment works as expected.

The question is why "opennebula" is used for the controller instead of "opennebula.example.com". Do I have to configure it somewhere?
A temporary fix is to add "opennebula" with its IP to /etc/hosts, but it would be nice to fix it differently, so we don't have to change /etc/hosts on all blades in case we have to change the IP of the controller. :-)
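For reference, the temporary workaround amounts to one extra line on every node (the IP and domain below are placeholders, not from the report):

# /etc/hosts
192.0.2.10    opennebula opennebula.example.com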



Failure during undeploy on the same host as sunstone #3010

Description
When I run undeploy from the Sunstone server I get an error if the VM is located on the same host as Sunstone. It doesn't happen if the VM is deployed on a different node.

To Reproduce
Log in to the Sunstone web interface and execute undeploy on a VM instance located on the same node.

Expected behavior
The VM instance shuts down first and then changes status to undeployed.

Details

  • Affected Component: Storage
  • Hypervisor: KVM
  • Version: 5.6.1,5.8.0

Additional context
Here is the log from that node

All of my VMs are using a shared Ceph datastore. A similar issue may be the one discussed on the forum, but no resolution is in sight.

Here is my store config

Progress Status

  • Branch created
  • Code committed to development branch
  • Testing - QA
  • Documentation
  • Release notes - resolved issues, compatibility, known issues
  • Code committed to upstream release/hotfix branches
  • Documentation committed to upstream release/hotfix branches


The same issue is reproducible in 5.8.0

mv has a check that verifies the source and destination are the same, but it does it based on string comparison

Since virt1n1-chi is not literally equal to virt1n2-chi.xcastlabs.net in mv virt1n1-chi:/var/lib/one//datastores/100/145 virt1n2-chi.xcastlabs.net:/var/lib/one//datastores/100/145 145 100, the move isn't skipped, and it then fails.

I managed to reproduce it by calling the frontend node localhost, while the "real" name is ubuntu1804-lxd-ceph-luminous-e8941-0.test, so the undeploy attempts to move from localhost to ubuntu1804-lxd-ceph-luminous-e8941-0.test and fails doing so.

So the check could be improved to identify whether both hosts are the same.

In my scenario ubuntu1804-lxd-ceph-luminous-e8941-0.test is the frontend, and it was added as a host twice, with that name and also localhost.

$ onehost list
  ID NAME            CLUSTER  TVM  ALLOCATED_CPU   ALLOCATED_MEM       STAT
   4 ubuntu1804-lxd- default    0  0 / 100 (0%)    0K / 985.4M (0%)    on
   2 localhost       default    1  10 / 100 (10%)  128M / 985.4M (12%) on
   1 ubuntu1804-lxd- default    0  0 / 100 (0%)    0K / 985.4M (0%)    dsbl
   0 ubuntu1804-lxd- default    0  0 / 100 (0%)    0K / 985.4M (0%)    dsbl

I successfully undeployed VM 16, which was deployed on ubuntu1804-lxd-ceph-luminous-e8941-0.test, and VM 15 failed when undeploying from localhost.

Can you check if that is your case?

# onehost list --csv ID,NAME,CLUSTER,TVM,ALLOCATED_CPU,ALLOCATED_MEM,STAT
7,virt1n5-chi.xcastlabs.net,default,6,3000 / 4000 (75%),28G / 125.6G (22%),on
6,virt1n4-chi.xcastlabs.net,default,5,2800 / 4000 (70%),26G / 125.6G (20%),on
3,virt1n3-chi.xcastlabs.net,default,5,1600 / 2400 (66%),22G / 94.4G (23%),on
2,virt1n2-chi.xcastlabs.net,default,4,1400 / 2400 (58%),20G / 94.4G (21%),on
1,virt1n1-chi.xcastlabs.net,default,3,1200 / 2400 (50%),12G / 94.4G (12%),on

They are all unique FQDNs.

Well, in your case, in /var/lib/one/remotes/tm/ceph/mv virt1n1-chi:/var/lib/one//datastores/100/145 virt1n2-chi.xcastlabs.net:/var/lib/one//datastores/100/145 145 100 the first host corresponds to BRIDGE_LIST="virt1n1-chi". Is this host (which is absent from your virtualization cluster) the same as one of the listed hosts?

Lines 63 to 66 in d936158

if [ "$SRC" == "$DST" ]; then
    log "Not moving $SRC to $DST, they are the same path"
    exit 0
fi

The comparison is done on literal strings, and it seems to fail under your conditions.
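A sketch of a more tolerant check (hypothetical, not the shipped driver code) could canonicalize both host names before comparing, for example with getent:

# SRC and DST are "host:path" pairs, as in the driver snippet above
SRC_HOST="${SRC%%:*}"
DST_HOST="${DST%%:*}"
SRC_CANON=$(getent hosts "$SRC_HOST" | awk '{print $2; exit}')
DST_CANON=$(getent hosts "$DST_HOST" | awk '{print $2; exit}')

if [ -n "$SRC_CANON" ] && [ "$SRC_CANON" = "$DST_CANON" ] && \
   [ "${SRC#*:}" = "${DST#*:}" ]; then
    log "Not moving $SRC to $DST, they resolve to the same host and path"
    exit 0
fi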

What is odd in this case is the fact that even with FQDNs, virt1n1 != virt1n2. The issue you are experiencing happens for me when both hosts are actually the same but named differently; that's the way I can reproduce your issue.

With FQDNs, like in the output of onehost list?

One or the other, just so they are the same.

We haven't had more feedback; we can reopen if necessary.

Sorry for the delay and confusion. The bridge list was changed to virt1n1-chi.xcastlabs.net. I am also in the process of installing a new OpenNebula system in a different DC. For this one I have built packages for 5.8.3. Was the issue resolved?

Sorry, could you reopen this?

So, with virt1n1-chi.xcastlabs.net, it still happens?

It became worse: I can no longer migrate. BTW, I am still on 5.8.0.

Tue Jun 25 08:43:23 2019 [Z0][VM][I]: New LCM state is SHUTDOWN_UNDEPLOY
Tue Jun 25 08:43:30 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 08:43:30 2019 [Z0][VMM][I]: Successfully execute virtualization driver operation: shutdown.
Tue Jun 25 08:43:30 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 08:43:31 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 08:43:31 2019 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Tue Jun 25 08:43:31 2019 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY
Tue Jun 25 08:43:33 2019 [Z0][TM][I]: Command execution failed (exit code: 2): /var/lib/one/remotes/tm/ceph/mv virt1n1-chi.xcastlabs.net:/var/lib/one//datastores/100/147 virt1n1-chi:/var/lib/one//datastores/100/147 147 100
Tue Jun 25 08:43:33 2019 [Z0][TM][I]: mv: Moving virt1n1-chi.xcastlabs.net:/var/lib/one/datastores/100/147 to virt1n1-chi:/var/lib/one/datastores/100/147
Tue Jun 25 08:43:33 2019 [Z0][TM][E]: mv: Command "set -e -o pipefail
Tue Jun 25 08:43:33 2019 [Z0][TM][I]:
Tue Jun 25 08:43:33 2019 [Z0][TM][I]: tar -C /var/lib/one/datastores/100 --sparse -cf - 147 | ssh virt1n1-chi 'tar -C /var/lib/one/datastores/100 --sparse -xf -'
Tue Jun 25 08:43:33 2019 [Z0][TM][I]: rm -rf /var/lib/one/datastores/100/147" failed: Warning: Permanently added 'virt1n1-chi.xcastlabs.net' (ECDSA) to the list of known hosts.
Tue Jun 25 08:43:33 2019 [Z0][TM][I]: tar: 147: Cannot stat: No such file or directory
Tue Jun 25 08:43:33 2019 [Z0][TM][I]: tar: Exiting with failure status due to previous errors
Tue Jun 25 08:43:33 2019 [Z0][TM][I]: Warning: Permanently added 'virt1n1-chi' (ECDSA) to the list of known hosts.
Tue Jun 25 08:43:33 2019 [Z0][TM][E]: Error copying disk directory to target host
Tue Jun 25 08:43:33 2019 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Tue Jun 25 08:43:33 2019 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY_FAILURE
Tue Jun 25 08:44:23 2019 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY
Tue Jun 25 08:44:23 2019 [Z0][VM][I]: New state is UNDEPLOYED
Tue Jun 25 08:44:23 2019 [Z0][VM][I]: New LCM state is LCM_INIT
Tue Jun 25 08:45:02 2019 [Z0][VM][I]: New state is PENDING
Tue Jun 25 08:45:03 2019 [Z0][VM][I]: New state is ACTIVE
Tue Jun 25 08:45:03 2019 [Z0][VM][I]: New LCM state is PROLOG_UNDEPLOY
Tue Jun 25 08:45:05 2019 [Z0][TM][I]: Command execution failed (exit code: 2): /var/lib/one/remotes/tm/ceph/mv virt1n1-chi:/var/lib/one//datastores/100/147 virt1n5-chi.xcastlabs.net:/var/lib/one//datastores/100/147 147 100
Tue Jun 25 08:45:05 2019 [Z0][TM][I]: mv: Moving virt1n1-chi:/var/lib/one/datastores/100/147 to virt1n5-chi.xcastlabs.net:/var/lib/one/datastores/100/147
Tue Jun 25 08:45:05 2019 [Z0][TM][E]: mv: Command "set -e -o pipefail
Tue Jun 25 08:45:05 2019 [Z0][TM][I]:
Tue Jun 25 08:45:05 2019 [Z0][TM][I]: tar -C /var/lib/one/datastores/100 --sparse -cf - 147 | ssh virt1n5-chi.xcastlabs.net 'tar -C /var/lib/one/datastores/100 --sparse -xf -'
Tue Jun 25 08:45:05 2019 [Z0][TM][I]: rm -rf /var/lib/one/datastores/100/147" failed: Warning: Permanently added 'virt1n1-chi' (ECDSA) to the list of known hosts.
Tue Jun 25 08:45:05 2019 [Z0][TM][I]: tar: 147: Cannot stat: No such file or directory
Tue Jun 25 08:45:05 2019 [Z0][TM][I]: tar: Exiting with failure status due to previous errors
Tue Jun 25 08:45:05 2019 [Z0][TM][I]: Warning: Permanently added 'virt1n5-chi.xcastlabs.net,10.0.28.21' (ECDSA) to the list of known hosts.
Tue Jun 25 08:45:05 2019 [Z0][TM][E]: Error copying disk directory to target host
Tue Jun 25 08:45:05 2019 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Tue Jun 25 08:45:05 2019 [Z0][VM][I]: New LCM state is PROLOG_UNDEPLOY_FAILURE
Tue Jun 25 08:45:47 2019 [Z0][VM][I]: New LCM state is BOOT_UNDEPLOY
Tue Jun 25 08:45:47 2019 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/147/deployment.2
Tue Jun 25 08:45:48 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Tue Jun 25 08:45:49 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 08:45:49 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 08:45:49 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue Jun 25 08:45:51 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 08:45:51 2019 [Z0][VMM][I]: Successfully execute virtualization driver operation: deploy.
Tue Jun 25 08:45:53 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 08:45:53 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 08:45:53 2019 [Z0][VMM][I]: Successfully execute network driver operation: post.
Tue Jun 25 08:45:53 2019 [Z0][VM][I]: New LCM state is RUNNING
Tue Jun 25 08:48:24 2019 [Z0][VM][I]: New LCM state is MIGRATE
Tue Jun 25 08:48:26 2019 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_premigrate.
Tue Jun 25 08:48:26 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 08:48:27 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 08:48:27 2019 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue Jun 25 08:48:27 2019 [Z0][VMM][I]: Command execution fail: cat

I fixed the AppArmor issue; it was not related. However, after setting the bridge list on the Ceph datastores to virt1n1-chi.xcastlabs.net, I still see this:

Tue Jun 25 12:03:17 2019 [Z0][VM][I]: New LCM state is SHUTDOWN_UNDEPLOY
Tue Jun 25 12:03:25 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 12:03:25 2019 [Z0][VMM][I]: Successfully execute virtualization driver operation: shutdown.
Tue Jun 25 12:03:26 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 12:03:26 2019 [Z0][VMM][I]: ExitCode: 0
Tue Jun 25 12:03:26 2019 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Tue Jun 25 12:03:26 2019 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY
Tue Jun 25 12:03:29 2019 [Z0][TM][I]: Command execution failed (exit code: 2): /var/lib/one/remotes/tm/ceph/mv virt1n1-chi.xcastlabs.net:/var/lib/one//datastores/100/147 virt1n1-chi:/var/lib/one//datastores/100/147 147 100
Tue Jun 25 12:03:29 2019 [Z0][TM][I]: mv: Moving virt1n1-chi.xcastlabs.net:/var/lib/one/datastores/100/147 to virt1n1-chi:/var/lib/one/datastores/100/147
Tue Jun 25 12:03:29 2019 [Z0][TM][E]: mv: Command "set -e -o pipefail
Tue Jun 25 12:03:29 2019 [Z0][TM][I]:
Tue Jun 25 12:03:29 2019 [Z0][TM][I]: tar -C /var/lib/one/datastores/100 --sparse -cf - 147 | ssh virt1n1-chi 'tar -C /var/lib/one/datastores/100 --sparse -xf -'
Tue Jun 25 12:03:29 2019 [Z0][TM][I]: rm -rf /var/lib/one/datastores/100/147" failed: Warning: Permanently added 'virt1n1-chi.xcastlabs.net' (ECDSA) to the list of known hosts.
Tue Jun 25 12:03:29 2019 [Z0][TM][I]: tar: 147: Cannot stat: No such file or directory
Tue Jun 25 12:03:29 2019 [Z0][TM][I]: tar: Exiting with failure status due to previous errors
Tue Jun 25 12:03:29 2019 [Z0][TM][I]: Warning: Permanently added 'virt1n1-chi' (ECDSA) to the list of known hosts.
Tue Jun 25 12:03:29 2019 [Z0][TM][E]: Error copying disk directory to target host
Tue Jun 25 12:03:29 2019 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Tue Jun 25 12:03:29 2019 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY_FAILURE

So I don't understand where virt1n1-chi (the short name) is coming from.

I think the problem may be somewhere in the host attributes.

HOSTNAME | virt1n1-chi
I changed it via the web UI to the FQDN, but when I came back to the page the change had reverted to the short hostname.

After changing the hostnames via hostnamectl on every node, I restarted opennebula and opennebula-sunstone on virt1n1-chi.xcastlabs.net. I also restarted libvirtd on the same node just in case. I then migrated the VM to virt1n1-chi.xcastlabs.net, and the undeploy worked without failure.

Still, I believe your software needs to detect a node by both its short and FQDN names.

Glad things worked out in the end. Could you open a feature request regarding this situation? You can always reference this issue for additional context.

I am having a hard time describing this feature. Match localhost to FQDN and short hostnames, perhaps?

Yes, somehow take both short and FQDN names into account to avoid false check results.


Logging

Every OpenNebula server generates logs with a configurable verbosity (level of detail) and through different means (file, syslog, or standard error output) to allow cloud administrators to troubleshoot the potential problems. Logs are stored in /var/log/one/ on a Front-end Host with a particular component. Some valuable error messages can be also seen by the end-users in :ref:`CLI <cli>` tools or the :ref:`Sunstone GUI <sunstone>`.

Configure Logging System

Follow the guides for each component to find the location of its logs and how to configure its log verbosity:

  • OpenNebula Daemon: :ref:`logs <oned_conf_service>`, :ref:`configuration <oned_conf>` (parameter LOG/DEBUG_LEVEL)
  • Scheduler: :ref:`logs <sched_conf_service>`, :ref:`configuration <sched_conf>` (parameter LOG/DEBUG_LEVEL)
  • Monitoring: :ref:`logs <mon_conf_service>`, :ref:`configuration <mon_conf>` (parameter LOG/DEBUG_LEVEL)
  • Sunstone: :ref:`logs <sunstone_conf_service>`, :ref:`configuration <sunstone_conf>` (parameter :debug_level)
  • FireEdge: :ref:`logs <fireedge_conf_service>`, :ref:`configuration <fireedge_conf>` (parameter log)
  • OneFlow: :ref:`logs <oneflow_conf_service>`, :ref:`configuration <oneflow_conf>` (parameter :debug_level)
  • OneGate: :ref:`logs <onegate_conf_service>`, :ref:`configuration <onegate_conf>` (parameter :debug_level)

After changing the logging level, don’t forget to restart the service so that it can take effect.
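For example, to make oned more verbose you could raise DEBUG_LEVEL in the LOG section of /etc/one/oned.conf and then restart the service. A minimal sketch, assuming a standard packaged installation:

.. prompt:: bash $ auto

    $ sudo vi /etc/one/oned.conf            # set LOG/DEBUG_LEVEL, e.g. DEBUG_LEVEL = 3
    $ sudo systemctl restart opennebula
    $ tail -f /var/log/one/oned.log         # verify the new verbosity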

Important

Logs are rotated on (re)start of a particular component. Find a historic log alongside the current logs with date/time suffixes (e.g., latest /var/log/one/oned.log might have the following historic log /var/log/one/oned.log-20210321-1616319097, or an even older compressed log /var/log/one/oned.log-20210314-1615719402.gz)

Additional Resources

As well as the common service logs, the following are other places to investigate and troubleshoot problems:

  • Virtual Machines: The information specific to a VM will be dumped in the log file /var/log/one/<vmid>.log. All VMs controlled by OpenNebula have their own directory, /var/lib/one/vms/<VID> if syslog/stderr isn’t enabled. You can find the following information in it:

    • Deployment description files : Stored in deployment.<EXECUTION>, where <EXECUTION> is the sequence number in the execution history of the VM (deployment.0 for the first host, deployment.1 for the second and so on).
    • Transfer description files : Stored in transfer.<EXECUTION>.<OPERATION>, where <EXECUTION> is the sequence number in the execution history of the VM, and <OPERATION> is the stage where the script was used, e.g. transfer.0.prolog, transfer.0.epilog, or transfer.1.cleanup.
  • Drivers: Each driver can have its ONE_MAD_DEBUG variable activated in RC files. If enabled, the error information will be dumped in /var/log/one/name-of-the-driver-executable.log. Log information from the drivers is in oned.log.
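For instance, enabling driver debugging usually comes down to setting this one variable and restarting oned (a sketch; in packaged installations the RC file is typically /etc/one/defaultrc):

.. prompt:: bash $ auto

    $ echo 'ONE_MAD_DEBUG=1' | sudo tee -a /etc/one/defaultrc
    $ sudo systemctl restart opennebula
    $ ls /var/log/one/                      # look for name-of-the-driver-executable.log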

OpenNebula Daemon Log Format

The structure of OpenNebula Daemon log messages for a file based logging system is the following:

date [Z<zone_id>][module][log_level]: message body

In the case of syslog it follows the standard:

date hostname process[pid]: [Z<zone_id>][module][log_level]: message

where the zone_id is the ID of the Zone in the federation (0 for single Zone setups), the module is any of the internal OpenNebula components (VMM, ReM, TM, etc.), and the log_level is a single character indicating the log level (I for informational, D for debugging, etc.).

For syslog, OpenNebula will also log the Virtual Machine events like this:

date hostname process[pid]: [VM id][Z<zone_id>][module][log_level]: message

and similarly for stderr logging.

For oned and VM events the formats are:

date [Z<zone_id>][module][log_level]: message
date [VM id][Z<zone_id>][module][log_level]: message
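A quick way to scan these logs for problems is to filter on the [module][log_level] part of the format; for example (a sketch, assuming file-based logging):

.. prompt:: bash $ auto

    $ grep -E '\]\[E\]: ' /var/log/one/oned.log                       # all error-level messages
    $ tail -f /var/log/one/oned.log | grep -E '\[(TM|VMM)\]\[E\]'     # follow TM/VMM errors live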

Infrastructure Failures

Virtual Machines

The causes of Virtual Machine errors can be found in the details of the VM. Any VM owner or cloud administrator can see the error via the onevm show $ID command (or in the Sunstone GUI). For example:

.. prompt:: bash $ auto

    $ onevm show 0
    VIRTUAL MACHINE 0 INFORMATION
    ID                  : 0
    NAME                : one-0
    USER                : oneadmin
    GROUP               : oneadmin
    STATE               : ACTIVE
    LCM_STATE           : PROLOG_FAILED
    START TIME          : 07/19 17:44:20
    END TIME            : 07/19 17:44:31
    DEPLOY ID           : -

    VIRTUAL MACHINE MONITORING
    NET_TX              : 0
    NET_RX              : 0
    USED MEMORY         : 0
    USED CPU            : 0

    VIRTUAL MACHINE TEMPLATE
    CONTEXT=[
      FILES=/tmp/some_file,
      TARGET=hdb ]
    CPU=0.1
    ERROR=[
      MESSAGE="Error executing image transfer script: Error copying /tmp/some_file to /var/lib/one/0/images/isofiles",
      TIMESTAMP="Tue Jul 19 17:44:31 2011" ]
    MEMORY=64
    NAME=one-0
    VMID=0

    VIRTUAL MACHINE HISTORY
     SEQ        HOSTNAME ACTION           START        TIME       PTIME
       0          host01   none  07/19 17:44:31 00 00:00:00 00 00:00:00

The error message here (see ERROR=[MESSAGE="Error executing image...) shows an error when copying an image (file /tmp/some_file). The source file most likely doesn't exist. Alternatively, you can check the detailed log of a particular VM in /var/log/one/$ID.log (in this case the VM has ID 0 and the log file would be /var/log/one/0.log).
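To pull just the recorded error out of a VM without reading the whole output, something like this works (VM ID 0 as in the example above):

.. prompt:: bash $ auto

    $ onevm show 0 | grep -A 2 'ERROR='     # the ERROR attribute from the template
    $ tail -n 50 /var/log/one/0.log         # last lines of the VM's own log on the Front-end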

Recover from VM Failure

The overall state of a virtual machine in a failure condition will show as failure (or fail in the CLI). To find out the specific failure situation you need to check the LCM_STATE of the VM in the VM info tab (or with onevm show in the CLI). Moreover, a VM can be stuck in a transition (e.g. boot or save) because of a host or network failure. Typically these operations will eventually time out and lead to a VM failure state.

The administrator has the ability to force a recovery action from Sunstone or from the CLI, with the onevm recover command. This command has the following options:

  • --success: If the operation has been confirmed to succeed. For example, the administrator can see the VM properly running in the hypervisor, but the driver failed to inform OpenNebula of the successful boot.
  • --failure: This will have the same effect as a driver reporting a failure. It is intended for VMs that get stuck in transient states. As an example, if a storage problem occurs and the administrator knows that a VM stuck in prolog is not going to finish the pending transfer, this action will manually move the VM to prolog_failure.
  • --retry: To retry the previously failed action. It can be used, for instance, if a VM is in boot_failure because the hypervisor crashed. The administrator can tell OpenNebula to retry the boot after the hypervisor is started again.
  • --retry --interactive: In some scenarios where the failure was caused by an error in the Transfer Manager actions, each action can be rerun and debugged until it works. Once the commands are successful, a success should be sent. See the specific section below for more details.
  • --delete: No recovery action possible, delete the VM. This is equivalent to the deprecated OpenNebula < 5.0 command: onevm delete.
  • --delete-db: No recover action possible, delete the VM from the DB. It does not trigger any action on the hypervisor.
  • --recreate: No recovery action possible, delete and recreate the VM. This is equivalent to the deprecated OpenNebula < 5.0 command: onevm delete --recreate.

Note also that OpenNebula will try to automatically recover some failure situations using the monitor information. A specific example is that a VM in the boot_failure state will become running if the monitoring reports that the VM was found running in the hypervisor.
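As an illustration, a typical recovery session from the CLI might look like this (VM ID 3 is just a placeholder):

.. prompt:: bash $ auto

    $ onevm show 3 | grep LCM_STATE
    LCM_STATE           : BOOT_FAILURE

    $ # fix the underlying hypervisor problem, then:
    $ onevm recover 3 --retry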

Hypervisor Problems

The following list details failure states caused by errors related to the hypervisor.

  • BOOT_FAILURE: The VM failed to boot but all the files needed by the VM are already in the Host. Check the hypervisor logs to find out the problem and, once fixed, recover the VM with the retry option.
  • BOOT_MIGRATE_FAILURE: same as above but during a migration. Check the target hypervisor and retry the operation.
  • BOOT_UNDEPLOY_FAILURE: same as above but during a resume after an undeploy. Check the target hypervisor and retry the operation.
  • BOOT_STOPPED_FAILURE: same as above but during a resume after a stop. Check the target hypervisor and retry the operation.

Transfer Manager / Storage Problems

The following list details failure states caused by errors in the Transfer Manager driver. These states can be recovered by checking the vm.log and looking for the specific error (disk space, permissions, misconfigured datastore, etc.). You can execute --retry to relaunch the Transfer Manager actions after fixing the problem (freeing disk space, etc.). You can execute --retry --interactive to launch a Transfer Manager Interactive Debug environment that will allow you to: (1) see all the TM actions in detail, (2) relaunch each action until it's successful, and (3) skip TM actions.

  • PROLOG_FAILURE: there was a problem setting up the disk images needed by the VM.
  • PROLOG_MIGRATE_FAILURE: problem setting up the disks in the target host.
  • EPILOG_FAILURE: there was a problem processing the disk images (may be discard or save) after the VM execution.
  • EPILOG_STOP_FAILURE: there was a problem moving the disk images after a stop.
  • EPILOG_UNDEPLOY_FAILURE: there was a problem moving the disk images after an undeploy.
  • PROLOG_MIGRATE_POWEROFF_FAILURE: problem restoring the disk images after a migration in a poweroff state.
  • PROLOG_MIGRATE_SUSPEND_FAILURE: problem restoring the disk images after a migration in a suspend state.
  • PROLOG_RESUME_FAILURE: problem restoring the disk images after a stop.
  • PROLOG_UNDEPLOY_FAILURE: problem restoring the disk images after an undeploy.

Here’s an example of a Transfer Manager Interactive Debug environment (onevm recover <id> --retry --interactive):

.. prompt:: bash $ auto

    $ onevm show 2|grep LCM_STATE
    LCM_STATE           : PROLOG_UNDEPLOY_FAILURE

    $ onevm recover 2 --retry --interactive
    TM Debug Interactive Environment.

    TM Action list:
    (1) MV shared haddock:/var/lib/one//datastores/0/2/disk.0 localhost:/var/lib/one//datastores/0/2/disk.0 2 1
    (2) MV shared haddock:/var/lib/one//datastores/0/2 localhost:/var/lib/one//datastores/0/2 2 0

    Current action (1):
    MV shared haddock:/var/lib/one//datastores/0/2/disk.0 localhost:/var/lib/one//datastores/0/2/disk.0 2 1

    Choose action:
    (r) Run action
    (n) Skip to next action
    (a) Show all actions
    (q) Quit
    > r

    LOG I  Command execution fail: /var/lib/one/remotes/tm/shared/mv haddock:/var/lib/one//datastores/0/2/disk.0 localhost:/var/lib/one//datastores/0/2/disk.0 2 1
    LOG I  ExitCode: 1

    FAILURE. Repeat command.

    Current action (1):
    MV shared haddock:/var/lib/one//datastores/0/2/disk.0 localhost:/var/lib/one//datastores/0/2/disk.0 2 1

    Choose action:
    (r) Run action
    (n) Skip to next action
    (a) Show all actions
    (q) Quit
    > # FIX THE PROBLEM...

    > r

    SUCCESS

    Current action (2):
    MV shared haddock:/var/lib/one//datastores/0/2 localhost:/var/lib/one//datastores/0/2 2 0

    Choose action:
    (r) Run action
    (n) Skip to next action
    (a) Show all actions
    (q) Quit
    > r

    SUCCESS

    If all the TM actions have been successful and you want to
    recover the Virtual Machine to the RUNNING state execute this command:
    $ onevm recover 2 --success

    $ onevm recover 2 --success

    $ onevm show 2|grep LCM_STATE
    LCM_STATE           : RUNNING

Hosts

Host errors can be investigated via the onehost show $ID command. For example:

.. prompt:: text $ auto

    $ onehost show 1
    HOST 1 INFORMATION
    ID                    : 1
    NAME                  : host01
    STATE                 : ERROR
    IM_MAD                : im_kvm
    VM_MAD                : vmm_kvm
    TM_MAD                : tm_shared

    HOST SHARES
    MAX MEM               : 0
    USED MEM (REAL)       : 0
    USED MEM (ALLOCATED)  : 0
    MAX CPU               : 0
    USED CPU (REAL)       : 0
    USED CPU (ALLOCATED)  : 0
    TOTAL VMS             : 0

    MONITORING INFORMATION
    ERROR=[
      MESSAGE="Error monitoring host 1 : MONITOR FAILURE 1 Could not update remotes",
      TIMESTAMP="Tue Jul 19 17:17:22 2011" ]

The error message here (see ERROR=[MESSAGE="Error monitoring host...) shows an error when updating remote drivers on a host. To get more information, you have to check OpenNebula Daemon log (/var/log/one/oned.log) and, for example, see this relevant error:

Tue Jul 19 17:17:22 2011 [InM][I]: Monitoring host host01 (1)
Tue Jul 19 17:17:22 2011 [InM][I]: Command execution fail: scp -r /var/lib/one/remotes/. host01:/var/tmp/one
Tue Jul 19 17:17:22 2011 [InM][I]: ssh: Could not resolve hostname host01: nodename nor servname provided, or not known
Tue Jul 19 17:17:22 2011 [InM][I]: lost connection
Tue Jul 19 17:17:22 2011 [InM][I]: ExitCode: 1
Tue Jul 19 17:17:22 2011 [InM][E]: Error monitoring host 1 : MONITOR FAILURE 1 Could not update remotes

The error message (Could not resolve hostname) indicates an incorrect hostname for the OpenNebula Host, one which can't be resolved in DNS.
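When a Host is in ERROR because of name resolution or SSH problems, a short checklist run from the Front-end as oneadmin could look like this (host01 as in the example; a sketch, not an exhaustive procedure):

.. prompt:: bash $ auto

    $ getent hosts host01          # does the name resolve at all?
    $ ssh host01 true              # passwordless SSH must work
    $ onehost sync host01 --force  # re-copy the remote probes once fixed
    $ onehost enable host01        # re-enable the host if it was disabled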

A small problem encountered when instantiating a virtual machine on OpenNebula.

PROBLEM

[oneadmin@opennebula01 .ssh]$ tail -f /var/log/one/1.log
Mon Mar 28 00:15:43 2016 [Z0][DiM][I]: New VM state is ACTIVE.
Mon Mar 28 00:15:43 2016 [Z0][LCM][I]: New VM state is PROLOG.
Mon Mar 28 00:15:43 2016 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/shared/clone opennebula01:/var/lib/one//datastores/1/ac152a0c5c61978e8448f68ade249a98 mdskvm-p01:/var/lib/one//datastores/0/1/disk.0 1 1
Mon Mar 28 00:15:43 2016 [Z0][TM][I]: clone: Cloning /var/lib/one/datastores/1/ac152a0c5c61978e8448f68ade249a98 in mdskvm-p01:/var/lib/one//datastores/0/1/disk.0
Mon Mar 28 00:15:43 2016 [Z0][TM][E]: clone: Command "cd /var/lib/one/datastores/0/1; cp /var/lib/one/datastores/1/ac152a0c5c61978e8448f68ade249a98 /var/lib/one/datastores/0/1/disk.0" failed: Warning: Permanently added 'mdskvm-p01,192.168.0.60' (ECDSA) to the list of known hosts.
Mon Mar 28 00:15:43 2016 [Z0][TM][I]: cp: cannot stat '/var/lib/one/datastores/1/ac152a0c5c61978e8448f68ade249a98': No such file or directory
Mon Mar 28 00:15:43 2016 [Z0][TM][E]: Error copying opennebula01:/var/lib/one//datastores/1/ac152a0c5c61978e8448f68ade249a98 to mdskvm-p01:/var/lib/one//datastores/0/1/disk.0
Mon Mar 28 00:15:43 2016 [Z0][TM][I]: ExitCode: 1
Mon Mar 28 00:15:43 2016 [Z0][TM][E]: Error executing image transfer script: Error copying opennebula01:/var/lib/one//datastores/1/ac152a0c5c61978e8448f68ade249a98 to mdskvm-p01:/var/lib/one//datastores/0/1/disk.0
Mon Mar 28 00:15:43 2016 [Z0][DiM][I]: New VM state is FAILED

[oneadmin@opennebula01 .ssh]$

SOLUTION

Ensure that the mount is possible and that the NFS daemon is running on the OpenNebula front-end:

/etc/fstab
192.168.0.70:/var/lib/one/  /var/lib/one/  nfs   soft,intr,rsize=8192,wsize=8192,noauto

[oneadmin@opennebula01 .ssh]$ cat /etc/exports
/var/lib/one/ *(rw,sync,no_subtree_check,root_squash)
[oneadmin@opennebula01 .ssh]$ ps -ef|grep -i nfs
oneadmin 12481  9559  0 00:22 pts/0    00:00:00 grep --color=auto -i nfs
[oneadmin@opennebula01 .ssh]$
[oneadmin@opennebula01 .ssh]$

[root@opennebula01 ~]# systemctl start nfs
[root@opennebula01 ~]#

Also, if you note above, I had used noauto in /etc/fstab. Change it to auto if you trust that NFS won't lock up your boot order by getting stuck on the NFS mount.
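In other words, the corrected pieces would look roughly like this (same export as above; the NFS service name may be nfs or nfs-server depending on the distribution):

# /etc/fstab on the node, now mounted automatically at boot
192.168.0.70:/var/lib/one/  /var/lib/one/  nfs  soft,intr,rsize=8192,wsize=8192,auto  0 0

# on the OpenNebula front-end, make sure the NFS server starts on boot
systemctl enable --now nfs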

Cheers,
TK



Good day, dear colleagues.

You could say I am a newcomer to working with Linux servers, since before this I was a Windows admin, but current world events are forcing me to retrain.
So I have a question to which I cannot find an answer.
There are 3 servers:
1st: the OpenNebula management server
2nd: a virtualization server
3rd: a virtualization server

I did everything according to the instructions, but it is not all that simple.

The management server works without issues, and the virtualization servers work too.
SSH between them works, i.e. I connect without passwords and everything is fine.
The hosts are added in OpenNebula, they are visible and monitored, so SSH works.
But when I try to create a virtual machine I get an error like this:

Sat Mar 12 23:31:22 2022 : Error executing image transfer script: Error copying Linuxmc:/var/lib/one//datastores/1/ab6e177654ddff4212cca10f1418744d to linuxv01:/var/lib/one//datastores/0/15/disk.0

In the logs I see the details:

Sat Mar 12 23:31:22 2022 [Z0][TM][E]: ln: Command "scp -r Linuxmc:/var/lib/one//datastores/1/ab6e177654ddff4212cca10f1418744d linuxv01:/var/lib/one//datastores/0/15/disk.0" failed: ssh: Host key verification failed.

If I understand correctly, the SSH key is not working. But then the question is how I managed to add the hosts, and why they are visible and monitored.

I would be very grateful for any help.




ssh: Host key verification failed.

I suggest checking the .ssh/known_hosts file. I suspect it contains an outdated key fingerprint for that machine; try deleting the line for the host it complains about.
And if the environment is still a test one, not production, you could even delete the whole file. ;-)
BUT! Always make backup copies first.

Also, ssh has the -o StrictHostKeyChecking option.
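For example, removing a stale entry for one host and re-learning its key (host name taken from the log above; accept-new needs OpenSSH 7.6 or newer):

# run as oneadmin on the front-end, since that user executes the transfer scripts
ssh-keygen -R linuxv01
ssh -o StrictHostKeyChecking=accept-new linuxv01 true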




You know, I found the error. You were right that it was SSH related, but it was not the known_hosts file: the problem was the public key. I had not copied the whole line with the key value into the template.
After entering the key correctly something even started working, but a new error came up:

error: internal error: process exited while connecting to monitor: Could not access KVM kernel module: Permission denied

The environment is a test one; we are learning virtualization on Linux.




I fixed that too.

The oneadmin user must be in the vmusers group.
Thanks for the help.
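For example (group name as mentioned above; on other distributions the group owning /dev/kvm may be kvm or libvirt instead):

usermod -aG vmusers oneadmin
# log oneadmin out and back in (or restart libvirtd) so the new group membership takes effect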

Can you recommend a decent manual on network configuration in OpenNebula?
I cannot figure out how to get networking into the created VMs.



Thank you very much. I read a lot; much of it I understood, some of it not so well.
I did it according to the instructions, namely:

I named the network interfaces identically on all 3 servers, e.g. ens3.
On the 2 virtualization servers I added a network bridge and named it br1.

In OpenNebula I created a virtual network, chose bridged mode, and specified bridge ID br1 and physical interface ens3.
In the network addressing properties I chose ether with 250 addresses, since I have external DHCP and DNS and I don't need OpenNebula to hand out and assign IPs itself.
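For reference, an equivalent virtual network template for the CLI (a sketch using the names from this thread; Sunstone may generate slightly different attributes) would be:

# net1.tmpl -- create with: onevnet create net1.tmpl
NAME   = "net1"
VN_MAD = "bridge"
BRIDGE = "br1"
PHYDEV = "ens3"
AR     = [ TYPE = "ETHER", SIZE = "250" ]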

I installed the first virtual machine and assigned it an IP after installation. The network came up and works.
I started creating a second machine, and during creation the errors below appeared. I googled them but could not find anything useful.
I understand it is somehow related to the bridges, but not exactly what.

Tue Mar 15 08:40:13 2022 [Z0][VM]: New state is ACTIVE
Tue Mar 15 08:40:13 2022 [Z0][VM]: New LCM state is BOOT_POWEROFF
Tue Mar 15 08:40:13 2022 [Z0][VMM]: Generating deployment file: /var/lib/one/vms/40/deployment.8
Tue Mar 15 08:40:13 2022 [Z0][VMM]: Successfully execute transfer manager driver operation: tm_context.
Tue Mar 15 08:40:13 2022 [Z0][VMM]: Command execution fail: cat << EOT | /var/tmp/one/vnm/bridge/pre
Tue Mar 15 08:40:13 2022 [Z0][VMM]: /var/tmp/one/vnm/vlan.rb:210:in ``': No such file or directory - sudo (Errno::ENOENT)
Tue Mar 15 08:40:13 2022 [Z0][VMM]: from /var/tmp/one/vnm/vlan.rb:210:in `list_bridges'
Tue Mar 15 08:40:13 2022 [Z0][VMM]: from /var/tmp/one/vnm/vlan.rb:36:in `activate'
Tue Mar 15 08:40:13 2022 [Z0][VMM]: from /var/tmp/one/vnm/bridge/pre:29:in `<main>'
Tue Mar 15 08:40:13 2022 [Z0][VMM]: ExitCode: 1
Tue Mar 15 08:40:13 2022 [Z0][VMM]: Failed to execute network driver operation: pre.
Tue Mar 15 08:40:13 2022 [Z0][VMM][E]: Error deploying virtual machine: bridge: —

I even had the thought that for each virtual machine I need to create a separate virtual network with a different bridge ID, and create bridges with that name on the hosts. But it seems to me I am wrong and the problem is something else.

And a question about datastores; I do not quite understand how the scheme works.
The created VM is running on virtualization host 1, but it turns out to be stored on the management server, not on the host itself? (My experience is with VMware and Hyper-V, where a deployed VM is stored on the host where it runs.)
And if I understood correctly, for a VM to be stored on the host itself I need to add that host's storage under datastore management, and then specify the required datastore when creating the VM, so that the VM ends up stored where it should be.



I tried the following configuration:
on the 1st host a bridge named br1
on the 2nd host a bridge named br2
a virtual network named net1 for the first host and net2 for the second host.

I created 2 virtual machines on the 1st host; the network came up on both machines.
I created a virtual machine on the 2nd host, and again got the error.

Maybe the bridge names need to be added to some configuration files on the management server?




I figured out the network error: somehow oneadmin had not been added to the wheel group.

In my excitement I updated all 3 machines, and now not a single VM starts.

Wed Mar 16 17:35:18 2022 [Z0][VM]: New state is ACTIVE
Wed Mar 16 17:35:18 2022 [Z0][VM]: New LCM state is BOOT_POWEROFF
Wed Mar 16 17:35:18 2022 [Z0][VMM]: Generating deployment file: /var/lib/one/vms/36/deployment.5
Wed Mar 16 17:35:18 2022 [Z0][VMM]: Successfully execute transfer manager driver operation: tm_context.
Wed Mar 16 17:35:18 2022 [Z0][VMM]: ExitCode: 0
Wed Mar 16 17:35:18 2022 [Z0][VMM]: Successfully execute network driver operation: pre.
Wed Mar 16 17:35:18 2022 [Z0][VMM]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/36/deployment.5' 'linuxv01' 36 linuxv01
Wed Mar 16 17:35:18 2022 [Z0][VMM]: error: Failed to create domain from /var/lib/one//datastores/0/36/deployment.5
Wed Mar 16 17:35:18 2022 [Z0][VMM]: error: internal error: process exited while connecting to monitor: 2022-03-16T14:33:10.959269Z qemu-system-x86_64: The -accel and "-machine accel=" options are incompatible
Wed Mar 16 17:35:18 2022 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/0/36/deployment.5
Wed Mar 16 17:35:18 2022 [Z0][VMM]: ExitCode: 255
Wed Mar 16 17:35:18 2022 [Z0][VMM]: Failed to execute virtualization driver operation: deploy.
Wed Mar 16 17:35:18 2022 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/0/36/deployment.5
Wed Mar 16 17:35:18 2022 [Z0][VM]: New state is POWEROFF
Wed Mar 16 17:35:18 2022 [Z0][VM]: New LCM state is LCM_INIT

What broke this time?




undeploy fails when using ceph system datastore

Status: Pending
Start date: 09/06/2017
Priority: Normal
Due date:
Assignee: Ruben S. Montero
% Done: 0%
Category: Core & System
Target version: Release 5.6
Resolution:
Pull request:
Affected Versions: OpenNebula 5.4

Description

Hello,

When I use a Ceph System Datastore, the undeployment of VMs fails with the following error:

Wed Sep 6 12:31:51 2017 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/ceph/mv node01.example.com:/var/lib/one//datastores/130/23214 opennebula:/var/lib/one//datastores/130/23214 23214 130
Wed Sep 6 12:31:51 2017 [Z0][TM][I]: mv: Moving node01.example.com:/var/lib/one/datastores/130/23214 to opennebula:/var/lib/one/datastores/130/23214
Wed Sep 6 12:31:51 2017 [Z0][TM][E]: mv: Command "set -e -o pipefail
Wed Sep 6 12:31:51 2017 [Z0][TM][I]:
Wed Sep 6 12:31:52 2017 [Z0][TM][I]: tar -C /var/lib/one/datastores/130 --sparse -cf - 23214 | ssh opennebula 'tar -C /var/lib/one/datastores/130 --sparse -xf -'
Wed Sep 6 12:31:52 2017 [Z0][TM][I]: rm -rf /var/lib/one/datastores/130/23214" failed: ssh: Could not resolve hostname opennebula: Name or service not known
Wed Sep 6 12:31:52 2017 [Z0][TM][E]: Error copying disk directory to target host
Wed Sep 6 12:31:52 2017 [Z0][TM][I]: ExitCode: 255
Wed Sep 6 12:31:53 2017 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Wed Sep 6 12:31:53 2017 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY_FAILURE
Wed Sep 6 12:34:36 2017 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY

When using an NFS System Datastore, undeployment works as expected.

The question is why "opennebula" is used for the controller instead of "opennebula.example.com". Do I have to configure it somewhere?
A temporary fix is to add "opennebula" with its IP to /etc/hosts, but it would be nice to fix it differently, so we don't have to change /etc/hosts on all blades in case we have to change the IP of the controller. :-)

Thanks

History

#2

Updated by Vlastimil Holer over 3 years ago

This is a problem with the core. The Front-end hostname is detected with gethostname, which doesn't return the FQDN; it can return the FQDN only when the FQDN is set as the hostname.
https://github.com/OpenNebula/one/blob/512da1ee67ee83aef9df736aaa9988349a62d0d2/src/nebula/Nebula.cc#L53

Example 1:

$ hostname
thunder
$ hostname -f
thunder.localdomain

and gethostname returns thunder.

Example 2:

$ hostname
thunder.localdomain
$ hostname -f
thunder.localdomain

and gethostname returns thunder.localdomain.

There'll have to be more sophisticated front-end FQDN detection, preferably also configurable in oned.conf (the front-end can have multiple IPs for public and cluster-private communication; without an override option it can use the wrong interface, with e.g. some performance penalty).

#3

Updated by Vlastimil Holer over 3 years ago

Tobias,

Temporary fix is to add "opennebula" with IP to /etc/hosts. But would be nice to fix it differently so we don't have to change /etc/hosts on all blades in case we have to change the IP of controller :-)

a better fix for you, for now, is to ensure you have the FQDN as the hostname before starting OpenNebula (check with just the "hostname" command without parameters; see the examples in my previous comment).
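For example, on a systemd-based system the front-end hostname could be set to the FQDN like this (opennebula.example.com is a placeholder):

hostnamectl set-hostname opennebula.example.com
hostname        # should now print opennebula.example.com
hostname -f     # should print the same FQDN
systemctl restart opennebula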

Even though this is arguably bad practice, it's now often the recommended way:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/ch-configure_host_names

However, Red Hat recommends that both static and transient names
match the fully-qualified domain name (FQDN) used for the machine
in DNS, such as host.example.com. 

Best regards,
Vlastimil

#6

Updated by Vlastimil Holer over 3 years ago

  • Category changed from Drivers — Storage to Core & System
  • Assignee changed from Vlastimil Holer to Ruben S. Montero


Hi, I am trying to run a virtual machine with OpenNebula following this tutorial:
http://docs.opennebula.org/4.8/design_and_installation/quick_starts/qs_centos7_kvm.html

but when I instantiate a template, the VM fails and the log shows:

Wed Nov 12 09:48:25 2014 [Z0][DiM][I]: New VM state is ACTIVE.
Wed Nov 12 09:48:25 2014 [Z0][LCM][I]: New VM state is PROLOG.
Wed Nov 12 09:48:25 2014 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/shared/clone localhost.localdomain:/var/lib/one//datastores/1/ea53f24b6bd08f5a59d41f69fe4e356f localhost:/var/lib/one//datastores/0/9/disk.0 9 1
Wed Nov 12 09:48:25 2014 [Z0][TM][I]: clone: Cloning /var/lib/one/datastores/1/ea53f24b6bd08f5a59d41f69fe4e356f in localhost:/var/lib/one//datastores/0/9/disk.0
Wed Nov 12 09:48:25 2014 [Z0][TM][E]: clone: Command "cd /var/lib/one/datastores/0/9; cp /var/lib/one/datastores/1/ea53f24b6bd08f5a59d41f69fe4e356f /var/lib/one/datastores/0/9/disk.0" failed: ssh: connect to host localhost port 22: Connection refused
Wed Nov 12 09:48:25 2014 [Z0][TM][E]: Error copying localhost.localdomain:/var/lib/one//datastores/1/ea53f24b6bd08f5a59d41f69fe4e356f to localhost:/var/lib/one//datastores/0/9/disk.0
Wed Nov 12 09:48:25 2014 [Z0][TM][I]: ExitCode: 255
Wed Nov 12 09:48:25 2014 [Z0][TM][E]: Error executing image transfer script: Error copying localhost.localdomain:/var/lib/one//datastores/1/ea53f24b6bd08f5a59d41f69fe4e356f to localhost:/var/lib/one//datastores/0/9/disk.0
Wed Nov 12 09:48:25 2014 [Z0][DiM][I]: New VM state is FAILED

Do you know what is wrong?
Thanks

asked Nov 12, 2014 at 10:27


The error message tells that you can not connect to localhost using ssh:

failed: ssh: connect to host localhost port 22: Connection refused

Is sshd running and port 22 open in the firewall?
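A few quick checks on the affected host (localhost here), assuming a systemd-based system:

systemctl status sshd          # is the SSH daemon running?
ss -tln | grep ':22'           # is anything listening on port 22?
ssh oneadmin@localhost true    # can oneadmin log in without a password?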

answered Nov 13, 2014 at 9:41


kotoko

I tried to install OpenNebula using miniONE v6.6.0 and the installation failed. I am using a freshly installed Ubuntu 22.04 inside QEMU + virt-manager. Also, after the script failure I cannot connect to the internet in Firefox or with apt-get update. Here is the log:

user@linux:~/Downloads$ sudo ./minione  --yes --password abc123 --bridge-interface minionebr

### Checks & detection
Checking AppArmor  SKIP will try to modify
Checking docker is installed  SKIP will try to install
Checking ansible  SKIP will try to install
Checking terraform  SKIP will try to install

### Main deployment steps:
Install OpenNebula frontend version 6.6
Install Terraform
Install Docker
Configure bridge minionebr with IP 172.16.100.1/24
Enable NAT over enp1s0
Modify AppArmor
Install OpenNebula KVM node
Export appliance and update VM template
Install pip 'ansible==2.9.9'

Do you agree? [yes/no]:

### Installation
Updating APT cache  OK
Updating PIP  OK
Install from PyPI 'ansible==2.9.9'  OK
Creating bridge interface minionebr  OK
Bring bridge interfaces up  OK
Enabling IPv4 forward  OK
Persisting IPv4 forward  OK
Configuring NAT using iptables  OK
Saving iptables changes  OK
Installing DNSMasq  retry 1 retry 2 retry 3 FAILED
apt-get -q -y install dnsmasq
--- STDERR ---
E: Nie udało się pobrać http://security.ubuntu.com/ubuntu/pool/universe/d/dnsmasq/dnsmasq_2.86-1.1ubuntu0.1_all.deb  Tymczasowy błąd przy tłumaczeniu "pl.archive.ubuntu.com"
E: Nie udało się pobrać niektórych archiwów, proszę spróbować uruchomić apt-get update lub użyć opcji --fix-missing.
--------------

Andrysky

Report not shown
And if I log in to the UI, download "33 Alpine Linux 3.15" to the default datastore, use "Instantiate VM Template" and run it, we get:

Driver Error
Sun Jul 3 14:21:47 2022: DEPLOY: INFO: deploy: No block device on /var/lib/one/datastores/0/0/mapper/disk.0
INFO: deploy: No block device on /var/lib/one/datastores/0/0/mapper/disk.1
Running command sudo lxc-ls
Running command sudo -n qemu-nbd --fork -c /dev/nbd0 /var/lib/one/datastores/0/0/disk.0
Running command lsblk -o NAME,FSTYPE
Running command lsblk -o NAME,FSTYPE
Running command lsblk -o NAME,FSTYPE
Running command lsblk -o NAME,FSTYPE
Running command sudo -n mount /dev/nbd0 /var/lib/one/datastores/0/0/mapper/disk.0
mount: /var/lib/one/datastores/0/0/mapper/disk.0: wrong fs type, bad option, bad superblock on /dev/nbd0, missing codepage or helper program, or other error.
Running command sudo -n qemu-nbd -d /dev/nbd0
There was an error creating the containter. ExitCode: 255


user@server:~$ sudo bash minione --lxc -v --password pass
[sudo] password for user:

### Checks & detection
Checking distribution and version [Ubuntu 20.04 6.4]  OK
Checking if OpenNebula repository exists  OK
Checking cpu virtualization capabilities  SKIP QEMU will be used
Checking augeas is installed  OK
Checking curl is installed  OK
Checking add-apt-repository is available  OK
Checking free disk space  OK
Checking directories from previous installation  OK
Checking user from previous installation  OK
Checking sshd service is running  OK
Checking iptables are installed  OK
Checking bridge-utils are installed  SKIP will try to install
Checking apt-transport-https is installed  OK
Checking if gnupg is installed  OK
Checking if ca-certificates is up-to-date  OK
Checking AppArmor  SKIP will try to modify
Checking for present ssh key  OK
Checking local interface [ens18]  OK
Checking (iptables|netfilter)-persistent are installed  SKIP will try to install
Checking minionebr interface is not present  OK
Checking virtual network 172.16.100.0/24 is not routed  OK
Checking docker is installed  SKIP will try to install
Checking python3-pip is installed  OK
Checking ansible  SKIP will try to install
Checking terrafrom version (>= v0.13.6)  OK

### Main deployment steps:
Install OpenNebula frontend version 6.4
Install Docker
Configure bridge minionebr with IP 172.16.100.1/24
Enable NAT over ens18
Modify AppArmor
Install OpenNebula LXC node
Export appliance and update VM template
Install  bridge-utils iptables-persistent netfilter-persistent
Install pip 'ansible==2.9.9'

Do you agree? [yes/no]:
yes

### Installation
Updating APT cache  OK
Install  bridge-utils iptables-persistent netfilter-persistent  OK
Updating PIP  OK
Install from PyPI 'ansible==2.9.9'  OK
Creating bridge interface minionebr  OK
Bring bridge interfaces up  OK
Enabling IPv4 forward  OK
Persisting IPv4 forward  OK
Configuring NAT using iptables  OK
Saving iptables changes  OK
Installing DNSMasq  OK
Starting DNSMasq  OK
Configuring repositories  OK
Updating APT cache  OK
Installing OpenNebula packages  OK
Installing opennebula-provision package   OK
Create docker packages repository  OK
Install docker  OK
Start docker service  OK
Enable docker service  OK
Installing OpenNebula lxc node packages  OK

### Configuration
Generating ssh keypair in /root/.ssh-oneprovision/id_rsa  OK
Add oneadmin to docker group  OK
Update network hooks  OK
Switching OneGate endpoint in oned.conf  OK
Switching OneGate endpoint in onegate-server.conf  OK
Switching keep_empty_bridge on in OpenNebulaNetwork.conf  OK
Switching scheduler interval in oned.conf  OK
Switching to QEMU emulation  OK
Setting initial password for current user and oneadmin  OK
Changing WebUI to listen on port 80  OK
Switching FireEdge public endpoint  OK
Starting OpenNebula services  OK
Enabling OpenNebula services  OK
Add ssh key to oneadmin user  OK
Update ssh configs to allow VM addresses reusing  OK
Ensure own hostname is resolvable  OK
Checking OpenNebula is working  OK
Disabling ssh from virtual network  OK
Adding localhost ssh key to known_hosts  OK
Testing ssh connection to localhost  OK
Updating datastores template  OK
Creating LXC host  OK
Restarting OpenNebula  OK
Creating virtual network  OK
Exporting [alpine_edge - LXD] from Marketplace to local datastore
user@server:~$

ospalax

After deploying minione on the CentOS 8 with this command:

bash minione --yes --password somepass --sunstone-port 9869

Everything is fine, but once the machine is rebooted the iptables MASQUERADE (SNAT/DNAT) rule is gone and the dnsmasq service is not started (the VMs seem to need it to resolve DNS).
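
A minimal sketch of how this could be made persistent on CentOS 8, assuming the default minione NAT setup; the iptables-services package provides the unit that restores /etc/sysconfig/iptables at boot:

# Make dnsmasq come back after a reboot.
sudo systemctl enable --now dnsmasq

# Persist the current rules (including the MASQUERADE rule) and restore them at boot.
sudo dnf -y install iptables-services
sudo sh -c 'iptables-save > /etc/sysconfig/iptables'
sudo systemctl enable iptables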

raphael10-collab

(base) raphy@pc:~/minione-6.0.0.1$ sudo ./minione 
[sudo] password for raphy: 

### Checks & detection
Checking cpu virtualization capabilities  SKIP QEMU will be used
Checking augeas is installed  SKIP will try to install
Checking bridge-utils are installed  SKIP will try to install
Checking apt-transport-https is installed  SKIP will try to install
Checking AppArmor  SKIP will try to modify
Checking for present ssh key  SKIP
Checking (iptables|netfilter)-persistent are installed  SKIP will try to install
Checking docker is installed  SKIP will try to install
Checking ansible  SKIP will try to install
Checking terraform  SKIP will try to install

### Main deployment steps:
Install OpenNebula frontend version 6.0
Install Terraform
Install Docker
Configure bridge minionebr with IP 172.16.100.1/24
Enable NAT over enp3s0
Modify AppArmor
Install OpenNebula KVM node
Export appliance and update VM template
Install  augeas-tools bridge-utils apt-transport-https iptables-persistent netfilter-persistent
Install pip 'ansible==2.9.9'

Do you agree? [yes/no]:
yes

### Installation
Updating APT cache  OK
Install  augeas-tools bridge-utils apt-transport-https iptables-persistent netfilter-persistent  OK
Updating PIP  OK
Install from PyPI 'ansible==2.9.9'  OK
Creating bridge interface minionebr  OK
Bring bridge interfaces up  OK
Enabling IPv4 forward  OK
Persisting IPv4 forward  OK
Configuring NAT using iptables  OK
Saving iptables changes  OK
Installing DNSMasq  OK
Starting DNSMasq  OK
Configuring repositories  FAILED

--- STDERR ---
Warning: apt-key output should not be parsed (stdout is not a terminal)
gpg: no valid OpenPGP data found.
--------------
(base) raphy@pc:~/minione-6.0.0.1$ 

How can I solve this problem?
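
«gpg: no valid OpenPGP data found» means whatever came back from the key URL was not a key, often an HTML error page from a proxy or a broken download. A hedged check, using the repository key URL that minione itself fetches:

# A real key starts with "-----BEGIN PGP PUBLIC KEY BLOCK-----";
# anything else points at a network or proxy problem.
wget -qO- https://downloads.opennebula.io/repo/repo.key | head -n 3

# If the key looks fine, add it manually and re-run minione.
wget -qO- https://downloads.opennebula.io/repo/repo.key | sudo apt-key add -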

treywelsh

Hi,

I'm still using the same environment to spawn a minione installation (Vagrant with some scripts and a fixed Ubuntu/ONE combination), but this time it doesn't work anymore.
I tried several combinations of versions (with and without the --force option), but I get this error every time:

...
wget https://github.com/OpenNebula/minione/releases/latest/download/minione
...
sudo bash minione --verbose --yes --version 5.12 --marketapp-name 'Alpine Linux 3.11'
...
minione: Configuring repositories  
minione: OK
minione: Updating APT cache  
minione: FAILED
minione: --- STDERR ---
minione: E: The repository 'https://downloads.opennebula.io/repo/5.12/Ubuntu/20.04 stable Release' does not have a Release file.
minione: --------------
minione: Provisionning script done.

I know this is light on details, but before digging deeper I wanted to quickly check whether something has changed on the repository side.
Any idea?

Thank you
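
A quick way to tell whether this is a repository-side change rather than something in the Vagrant environment is to request the Release file directly; the path below assumes apt's usual dists/<suite>/Release layout for the URL printed in the error:

# HTTP 200 means the suite still exists; 404 means the pinned 5.12 Ubuntu 20.04
# repository no longer publishes a Release file.
curl -sI https://downloads.opennebula.io/repo/5.12/Ubuntu/20.04/dists/stable/Release | head -n 1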

chicco64

Hi,
I installed minione with the bash script on Debian 10.
I would also like to enable HTTPS through an nginx reverse proxy.
I have a wildcard SSL certificate for my domain, and it works perfectly with other nginx proxies.
OpenNebula itself appears to work, except for the error message below («fireedge public endpoint is not working») and the VNC panel not opening.

/etc/one/sunstone-server.conf

:vnc_proxy_port: 29876
:vnc_proxy_support_wss: only
:vnc_proxy_cert: /etc/ssl/private/example.com.crt
:vnc_proxy_key: /etc/ssl/private/example.com.key
:vnc_proxy_ipv6: false
:vnc_request_password: false
:allow_vnc_federation: no

/etc/nginx/sites-enabled/default

server {

    listen          8443 ssl;
    server_name     test.example.com;

    location / {
        proxy_pass              http://127.0.0.1:9869;
        proxy_set_header        X-Real-IP           $remote_addr;
        proxy_set_header        X-Forwarded-For     $proxy_add_x_forwarded_for;
        proxy_set_header        host                $host;
        proxy_set_header        X-Forwarded-Server  $host;
        proxy_read_timeout      600s;
        proxy_connect_timeout   600s;
    }

    location /websockify {
        proxy_pass              http://127.0.0.1:29876;
        proxy_http_version      1.1;
        proxy_set_header        Upgrade     $http_upgrade;
        proxy_set_header        Connection  $connection_upgrade;
        add_header              Access-Control-Allow-Origin *;
    }

}

error_page 497 https://$host$request_uri;

ssl_certificate /etc/ssl/private/example.com.crt;
ssl_certificate_key /etc/ssl/private/example.com.key;

ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;

keepalive_requests 150;

I am attaching screenshots.
Can you help me?
Thank you
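
One detail worth checking in a setup like this: the /websockify block relies on $connection_upgrade, which nginx only knows about through a map $http_upgrade $connection_upgrade { default upgrade; '' close; } definition in the http context (often kept in nginx.conf or a conf.d snippet); if that map is missing or wrong, the WebSocket upgrade for VNC never happens. A hedged check, with the backend ports taken from the config above and FireEdge's default 2616 assumed:

# Is $connection_upgrade defined anywhere in the nginx configuration?
grep -rn 'connection_upgrade' /etc/nginx/

# Are the backends the proxy points at actually listening?
sudo ss -lntp | grep -E ':(9869|29876|2616)\b'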


decentral1se

I had to run pip install setuptools to get past this first step on a fresh Debian 10.

Install from PyPI 'ansible==2.9.9'  retry 1 retry 2 retry 3 FAILED
Collecting ansible==2.9.9
  Using cached ansible-2.9.9.tar.gz (14.2 MB)
--- STDERR ---
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support pip 21.0 will remove support for this functionality.
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-vIURiS/ansible/setup.py'"'"'; __file__='"'"'/tmp/pip-install-vIURiS/ansible/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'rn'"'"', '"'"'n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-yV7JXF
         cwd: /tmp/pip-install-vIURiS/ansible/
    Complete output (3 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ImportError: No module named setuptools
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

megastallman

Running Opennebula on Xubuntu 20.04.

I don't understand where this LIMIT_MB=50000 value comes from at all. First, it breaks the minione installer at the Marketplace image download stage. After commenting it out and installing OpenNebula, I tried to add a VM image manually, but it reported «no free space on datastore» while I actually had 100 GB of 190 GB free. Only after removing that parameter from all datastores did everything work fine.
Please don't add that limit, at least for my Ubuntu 20.04 bare-metal target.
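
For reference, LIMIT_MB is a regular per-datastore attribute in OpenNebula that caps the capacity reported as available, so it can be inspected and cleared without patching the installer. A minimal sketch, assuming the default datastore IDs:

# See which datastores carry a LIMIT_MB cap.
onedatastore list
onedatastore show 1 | grep LIMIT_MB

# "onedatastore update <id>" opens the datastore template in $EDITOR;
# delete the LIMIT_MB line and save to remove the cap.
onedatastore update 1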

decentral1se

Python 2 is deprecated. During the install, pip prints the deprecation notice (see #78).

Andrysky

Description

To Reproduce
Front-end (Ubuntu 20.10):

wget 'https://github.com/OpenNebula/minione/releases/latest/download/minione'
chmod +x  ./minione 
sudo bash minione -f

Host (Ubuntu 20.04.2):

sudo apt-get -y install gnupg wget apt-transport-https
sudo wget -q -O- https://downloads.opennebula.io/repo/repo.key | sudo apt-key add -
echo "deb https://downloads.opennebula.io/repo/6.0/Ubuntu/20.04 stable opennebula" | sudo tee /etc/apt/sources.list.d/opennebula.list
sudo apt-get update
sudo  apt-get -y install opennebula-node-lxc

2) Add the public key from the Front-end to the host
3) Add the host to the Front-end

4) Download alpine_3.13 — LXD-10 from Apps (MarketStore)
5) Deploy to the host (create instances)

Current behavior

Wed Apr 28 10:08:58 2021: Error executing image transfer script: Error copying <Front-end>:/var/lib/one//datastores/1/b41d4ea3bcadddf0e8316fea40d029f0 to <host>:/var/lib/one//datastores/0/4/disk.0

or

ssh <Front-end>
ssh <host>
ssh <Front-end> *required password*

Expected behavior
deploy success
or

ssh <Front-end>
ssh <host>
ssh <Front-end> *success*

Details

  • Affected Component: ??
  • Hypervisor: LXC
  • Version: 6.0.0

Additional context
The documentation at https://docs.opennebula.io/6.0/open_cluster_deployment/lxc_node/lxc_node_installation.html#step-4-configure-passwordless-ssh states:

«Since OpenNebula 5.12. On the Front-end runs dedicated SSH authentication agent service which imports the oneadmin’s private key on its start …. While the authentication agent is used, you don’t need to distribute private SSH key from Front-end to hypervisor Nodes!»
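
A hedged way to verify what the transfer drivers need, reusing the <Front-end> and <host> placeholders from above; both hops must work non-interactively as the oneadmin user (BatchMode makes ssh fail instead of prompting, which reproduces the «required password» case):

# On the Front-end, as oneadmin: the hop to the node...
sudo -u oneadmin ssh -o BatchMode=yes <host> hostname

# ...and the hop back to the Front-end that some transfer operations perform.
sudo -u oneadmin ssh -o BatchMode=yes <host> "ssh -o BatchMode=yes <Front-end> hostname"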

Progress Status

  • Branch created
  • Code committed to development branch
  • Testing — QA
  • Documentation
  • Release notes — resolved issues, compatibility, known issues
  • Code committed to upstream release/hotfix branches
  • Documentation committed to upstream release/hotfix branches

Sina-Ghaderi

Configuring repositories FAILED
This is the error I get when I run minione.

Andrysky

Description
strange behavior

Steps to Reproduce
Front-end (Ubuntu 20.10):

wget 'https://github.com/OpenNebula/minione/releases/latest/download/minione'
echo yes | sudo bash minione -f

3) Download alpine (DockerHub) from Apps (MarketPlace | DockerHub)
4) Deploy (create instances)

Current behavior

sudo docker ps -a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES


The same thing happens with «alpine_3.13 — LXD» when the Front-end itself is used as the host.

Expected behavior
status: PENDING / PROLOG_FAILURE / FAILURE

Additional context
OpenNebula/one#5367 (comment)
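
When a deployment seems to silently do nothing, a hedged first step is the per-VM log on the Front-end (replace <VM_ID> with the ID shown by onevm list):

# Which state did OpenNebula actually leave the VM in?
onevm list

# The per-VM log records prolog/deploy errors that never reach the UI.
sudo tail -n 50 /var/log/one/<VM_ID>.log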

Andrysky

Add an argument so that minione does not export a MarketApp by default.

wget 'https://github.com/OpenNebula/minione/releases/latest/download/minione'
echo yes | sudo bash minione -f

log

.
.
Exporting [CentOS 7] from Marketplace to local datastore  OK
.
.


3deep5me

Hi Folks,

Installation on CentOS 8 Stream, on a VM with one network interface, fails.
Here is the output from the script:

[root@opennebula ~]# sudo bash minione

### Checks & detection
Checking cpu virtualization capabilities  SKIP QEMU will be used
Checking augeas is installed  SKIP will try to install
Checking network-scripts are installed  SKIP will try to install
Checking SELinux  SKIP will try to disable
Checking for present ssh key  SKIP
Checking docker is installed  SKIP will try to install
Checking python3-pip is installed  SKIP will try to install
Checking ansible  SKIP will try to install
Checking terraform  SKIP will try to install
Checking unzip is installed  SKIP will try to install

### Main deployment steps:
Install OpenNebula frontend version 6.0
Install Terraform
Install Docker
Disable_firewalld
Configure bridge minionebr with IP 172.16.100.1/24
Enable NAT over ens18
Install OpenNebula KVM node
Export appliance and update VM template
Disable SELinux
Install  augeas network-scripts python3-pip unzip
Install pip 'ansible==2.9.9'

Do you agree? [yes/no]:
yes

### Installation
Disabling SELinux  OK
Install  augeas network-scripts python3-pip unzip  OK
Install from PyPI 'ansible==2.9.9'  retry 1 retry 2 retry 3 FAILED
Collecting ansible==2.9.9
  Using cached https://files.pythonhosted.org/packages/00/5d/e10b83e0e6056dbd5b4809b451a191395175a57e3175ce04e35d9c5fc2a0/ansible-2.9.9.tar.gz
Collecting jinja2 (from ansible==2.9.9)
  Using cached https://files.pythonhosted.org/packages/7e/c2/1eece8c95ddbc9b1aeb64f5783a9e07a286de42191b7204d67b7496ddf35/Jinja2-2.11.3-py2.py3-none-any.whl
Collecting PyYAML (from ansible==2.9.9)
  Using cached https://files.pythonhosted.org/packages/7a/5b/bc0b5ab38247bba158504a410112b6c03f153c652734ece1849749e5f518/PyYAML-5.4.1-cp36-cp36m-manylinux1_x86_64.whl
Collecting cryptography (from ansible==2.9.9)
  Using cached https://files.pythonhosted.org/packages/9b/77/461087a514d2e8ece1c975d8216bc03f7048e6090c5166bc34115afdaa53/cryptography-3.4.7.tar.gz
    Complete output from command python setup.py egg_info:

            =============================DEBUG ASSISTANCE==========================
            If you are seeing an error here please try the following to
            successfully install cryptography:

            Upgrade to the latest pip and try again. This will fix errors for most
            users. See: https://pip.pypa.io/en/stable/installing/#upgrading-pip
            =============================DEBUG ASSISTANCE==========================

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-hv1z3m6_/cryptography/setup.py", line 14, in <module>
        from setuptools_rust import RustExtension
    ModuleNotFoundError: No module named 'setuptools_rust'

    ----------------------------------------
--- STDERR ---
WARNING: Running pip install with root privileges is generally not a good idea. Try `pip3 install --user` instead.
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-hv1z3m6_/cryptography/

Any suggestions?
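
The traceback comes from pip building cryptography from source with tooling too old to pick a prebuilt wheel, which is what the DEBUG ASSISTANCE banner hints at. A hedged workaround, assuming the python3/pip shipped with CentOS 8:

# Upgrade the packaging tooling so cryptography can install from a wheel instead of
# compiling (which would need setuptools_rust and a Rust toolchain).
sudo python3 -m pip install --upgrade pip setuptools wheel

# Then retry the step minione failed on.
sudo python3 -m pip install 'ansible==2.9.9'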

megastallman

Running Minione script on Xubuntu 20.04, singlehost configuration.
After running «sudo bash minione --purge», all OpenNebula packages stay installed, in particular the one that creates the «oneadmin» user, so there is no straightforward way to install it again.
Also, the /var/log/one directory could be deleted once OpenNebula is removed.
Please add these final steps to the --purge procedure:

apt -yy purge opennebula-*
rm -rf /var/log/one

ghostry

Even when the -v parameter is added, very little information is shown.
The apt installation output is completely hidden; I waited half an hour at «Install augeas-tools curl apt-transport-https gnupg iptables-persistent netfilter-persistent» with no way to see the progress.
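
Until the installer exposes more of this output, a hedged way to follow what apt is doing is to watch its own logs from a second terminal while minione runs:

# Live view of the package installation minione is driving.
sudo tail -f /var/log/apt/term.log

# Or confirm that apt/dpkg processes are alive and not stuck on a lock.
ps aux | grep -E '[a]pt|[d]pkg'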

TibboddiT

Hello,

Installation fails on Ubuntu 20.04.1 LTS, on a server with only one network interface:

### Checks & detection
Checking augeas is installed  SKIP will try to install
Checking curl is installed  SKIP will try to install
Checking bridge-utils are installed  SKIP will try to install
Checking apt-transport-https is installed  SKIP will try to install
Checking AppArmor  SKIP will try to modify
Checking for present ssh key  SKIP
Checking (iptables|netfilter)-persistent are installed  SKIP will try to install

### Main deployment steps:
Install OpenNebula frontend version 5.12
Configure bridge minionebr with IP 172.16.100.1/24
Enable NAT over eno1
Modify AppArmor
Install OpenNebula KVM node
Export appliance and update VM template
Install  augeas-tools curl bridge-utils apt-transport-https iptables-persistent netfilter-persistent

Do you agree? [yes/no]:
yes

### Installation
Updating APT cache  OK
Install  augeas-tools curl bridge-utils apt-transport-https iptables-persistent netfilter-persistent  OK
Creating bridge interface minionebr  OK
Bring bridge interfaces up  OK
Enabling IPv4 forward  OK
Persisting IPv4 forward  OK
Configuring NAT using iptables  OK
Saving iptables changes  OK
Installing DNSMasq  OK
Starting DNSMasq  FAILED

--- STDERR ---
Job for dnsmasq.service failed because the control process exited with error code.
See "systemctl status dnsmasq.service" and "journalctl -xe" for details.

Output of systemctl status dnsmasq.service:

Nov 10 18:52:07 my-host systemd[1]: Starting dnsmasq - A lightweight DHCP and caching DNS server...
Nov 10 18:52:07 my-host dnsmasq[2314]: dnsmasq: syntax check OK.
Nov 10 18:52:07 my-host dnsmasq[2315]: dnsmasq: failed to create listening socket for 127.0.0.1: Address already in use
Nov 10 18:52:07 my-host dnsmasq[2315]: failed to create listening socket for 127.0.0.1: Address already in use
Nov 10 18:52:07 my-host dnsmasq[2315]: FAILED to start up
Nov 10 18:52:07 my-host systemd[1]: dnsmasq.service: Control process exited, code=exited, status=2/INVALIDARGUMENT
Nov 10 18:52:07 my-host systemd[1]: dnsmasq.service: Failed with result 'exit-code'.
Nov 10 18:52:07 my-host systemd[1]: Failed to start dnsmasq - A lightweight DHCP and caching DNS server.

It seems that dnsmasq can't bind to port 53 because it is already in use by named/bind9.
Do you have a solution that does not require uninstalling bind9?
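
A hedged approach that keeps bind9 installed: confirm what owns port 53, then restrict dnsmasq to the minione bridge address (172.16.100.1 is minione's default) so it no longer competes for 127.0.0.1:53. The drop-in file name below is only an illustrative choice, and if named also binds that bridge address you would additionally need to limit its listen-on:

# What is already listening on port 53?
sudo ss -lntup | grep ':53 '

# Bind dnsmasq to the minione bridge only and restart it.
cat <<'EOF' | sudo tee /etc/dnsmasq.d/minione-bridge.conf
bind-interfaces
listen-address=172.16.100.1
EOF
sudo systemctl restart dnsmasq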

grarich

The page reloads repeatedly when you log in to the dashboard.

I installed it by following these steps

  1. wget 'https://github.com/OpenNebula/minione/releases/latest/download/minione'

  2. sudo bash minione

  3. Access the dashboard.
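
A hedged first check for a login/reload loop is the Sunstone log on the Front-end while reproducing it:

# Watch for authentication or session errors as the loop happens.
sudo tail -f /var/log/one/sunstone.log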

iagalmeida

I'm trying to find out why I can't install minione with LXD; I have tried a few times. Below is the error message generated by the installer, followed by the message shown when manually installing opennebula-node-lxd.

bash minione --lxd

Installing OpenNebula packages OK
Installing OpenNebula lxd node packages retry 1 retry 2 retry 3 FAILED
apt-get -q -y install opennebula-node-lxd
--- STDERR ---
E: Sub-process /usr/bin/dpkg returned an error code (1)

sudo apt-get -q -y install opennebula-node-lxd
Reading package lists…
Building dependency tree…
Reading state information…
opennebula-node-lxd is already the newest version (5.12.0.3-1.ce).
0 upgraded, 0 newly installed, 0 to remove and 82 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Setting up opennebula-node-lxd (5.12.0.3-1.ce) …
Error: Storage pool ‘default’ is of type ‘btrfs’ instead of ‘dir’
dpkg: error processing package opennebula-node-lxd (--configure):
installed opennebula-node-lxd package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
opennebula-node-lxd
E: Sub-process /usr/bin/dpkg returned an error code (1)
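
The postinst of opennebula-node-lxd refuses anything but a dir-backed default storage pool, which is exactly what the error says. A hedged sketch for a node where LXD is not otherwise in use (re-initializing LXD discards its existing configuration):

# Inspect how the current pools are backed.
lxc storage list
lxc storage show default

# Re-initialize LXD with a dir-backed default pool, then let dpkg finish the
# half-configured package. If a 'default' pool already exists and nothing uses it,
# it may need to be deleted first.
sudo lxd init --auto --storage-backend dir
sudo dpkg --configure -a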
