Task error: job errors


Hello,

I have a VM with a qcow2 image, but when I try to back it up I get the error below. I already ran qemu-img check on the image with no errors reported. After that I destroyed the VM and restored it from an older backup that had worked before, but I got the same error. The only relevant thing I can think of is that I updated Proxmox (package versions attached below) about an hour ago; backing up this VM had always worked without a problem with this config.

Thanks for your help.
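
For reference, this is roughly how I ran the check (a minimal sketch; the path assumes the default layout of the 'local' directory storage and that the VM is stopped):

Code:

# consistency check of the qcow2 image referenced by scsi0
qemu-img check /var/lib/vz/images/980/vm-980-disk-0.qcow2
# optionally also dump the image metadata
qemu-img info /var/lib/vz/images/980/vm-980-disk-0.qcow2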

Code:

INFO: starting new backup job: vzdump 980 --remove 0 --storage Nasko --notes-template '{{guestname}}' --mode stop --compress zstd --node Carmen
INFO: Starting Backup of VM 980 (qemu)
INFO: Backup started at 2022-06-09 15:33:49
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: storj
INFO: include disk 'scsi0' 'local:980/vm-980-disk-0.qcow2' 32G
INFO: exclude disk 'scsi2' '/dev/disk/by-id/ata-ST31000333AS_6TE07S11' (backup=no)
INFO: exclude disk 'scsi3' '/dev/disk/by-id/ata-ST1000LM014-1EJ164_W381Y0K1' (backup=no)
INFO: exclude disk 'scsi4' '/dev/disk/by-id/ata-TOSHIBA_MQ01ABD100_X4GKPKXAT' (backup=no)
INFO: exclude disk 'scsi5' '/dev/disk/by-id/ata-WDC_WD1003FZEX-00MK2A0_WD-WMC3F0813056' (backup=no)
INFO: creating vzdump archive '/mnt/pve/Nasko/dump/vzdump-qemu-980-2022_06_09-15_33_49.vma.zst'
INFO: starting kvm to execute backup task
INFO: started backup task '4459dc13-1b5a-4302-a588-36d6467f00cd'
INFO:   2% (913.0 MiB of 32.0 GiB) in 3s, read: 304.3 MiB/s, write: 140.4 MiB/s
INFO:   4% (1.4 GiB of 32.0 GiB) in 6s, read: 159.7 MiB/s, write: 133.8 MiB/s
INFO:   8% (2.8 GiB of 32.0 GiB) in 9s, read: 482.1 MiB/s, write: 116.8 MiB/s
INFO:  11% (3.8 GiB of 32.0 GiB) in 12s, read: 342.0 MiB/s, write: 124.4 MiB/s
INFO:  24% (7.8 GiB of 32.0 GiB) in 15s, read: 1.3 GiB/s, write: 39.4 MiB/s
INFO:  44% (14.1 GiB of 32.0 GiB) in 18s, read: 2.1 GiB/s, write: 5.9 MiB/s
INFO:  56% (18.2 GiB of 32.0 GiB) in 21s, read: 1.4 GiB/s, write: 91.7 MiB/s
INFO:  70% (22.7 GiB of 32.0 GiB) in 24s, read: 1.5 GiB/s, write: 71.7 MiB/s
INFO:  87% (28.1 GiB of 32.0 GiB) in 27s, read: 1.8 GiB/s, write: 41.2 MiB/s
ERROR: VM 980 not running
INFO: aborting backup job
ERROR: VM 980 not running
VM 980 not running
trying to acquire lock...
 OK
ERROR: Backup of VM 980 failed - VM 980 not running
INFO: Failed at 2022-06-09 15:34:20
INFO: Backup job finished with errors
TASK ERROR: job errors

VM config:

Code:

agent: 1
balloon: 0
boot: order=scsi0;ide2;net0
cores: 4
cpu: host,flags=+pcid;+aes
ide2: none,media=cdrom
memory: 24576
meta: creation-qemu=6.2.0,ctime=1650378494
name: storj
net0: virtio=AE:8B:51:96:EE:AA,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local:980/vm-980-disk-0.qcow2,cache=directsync,discard=on,iothread=1,size=32G,ssd=1
scsi2: /dev/disk/by-id/ata-ST31000333AS_6TE07S11,size=976762584K,backup=0,cache=directsync,iothread=1
scsi3: /dev/disk/by-id/ata-ST1000LM014-1EJ164_W381Y0K1,size=976762584K,backup=0,cache=directsync,iothread=1
scsi4: /dev/disk/by-id/ata-TOSHIBA_MQ01ABD100_X4GKPKXAT,size=976762584K,backup=0,cache=directsync,iothread=1
scsi5: /dev/disk/by-id/ata-WDC_WD1003FZEX-00MK2A0_WD-WMC3F0813056,size=976762584K,backup=0,cache=directsync,iothread=1
scsihw: virtio-scsi-single
smbios1: uuid=dee0c5fe-cd30-4b0b-b0d4-0c6a4e5521d0
sockets: 1
vmgenid: 1ae3cf45-3638-49f7-b0d6-0add848d43da

Package versions:

Code:

proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
pve-kernel-5.15: 7.2-4
pve-kernel-helper: 7.2-4
pve-kernel-5.15.35-2-pve: 5.15.35-5
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-4
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-9
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

qmp command 'backup' failed - got timeout

wire2hire

Member

When I want to make a backup of my VM, it gives me an error:

INFO: starting new backup job: vzdump 100 --mode snapshot --remove 0 --node pve01 --storage PBS
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2020-10-20 15:39:43
INFO: status = running
INFO: VM Name: SERVER100
INFO: include disk 'scsi1' 'STORAGE:vm-100-disk-0' 480G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/100/2020-10-20T13:39:43Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: enabling encryption
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 100 qmp command 'backup' failed - got timeout
ERROR: Backup of VM 100 failed - VM 100 qmp command 'backup' failed - got timeout
INFO: Failed at 2020-10-20 15:40:49
INFO: Backup job finished with errors
TASK ERROR: job errors

pve: 6.2-12
pbs: 0.9-1 (on the same host)

Things I already tried:

1. VM running
2. VM off
3. Guest agent enabled and disabled
4. Downgrading PBS to 0.8.16-1

Does anyone have the same failure?

Member


I had the same error a few days ago. A VM with a Windows Server showed a similar problem. In my case the backup managed to finish, but it went from 45 minutes before the error to 8 hours, with read/write transfer rates of 10 MB/s instead of the 180 MB/s seen on previous days.


PVE7 / PBS2 - Backup timeout (qmp command 'cont' failed - got timeout)

iprigger

Active Member

I updated from PVE 6.4 to PVE 7.0 yesterday and seem to have some problems with backups.

We run on NFS storage and have the PBS server as a VM, which did not cause any trouble in the past but does now.

The error above goes away when I delete all old backups, but it comes back after the fourth backup or so.

It seems to be a bit worse with raw images, but on several tries qcow2 failed, too.

Any idea?
Tobias

P.S.: One more piece of info: the "timeout" is instant, with no real delay. It looks a bit like something is amiss.

Moayad

Proxmox Staff Member

Have you tried another storage target?

Please post the VM config and the PVE version

Best regards,
Moayad


iprigger

Active Member

Have you tried another storage target?

Please post the VM config and the PVE version

I only have one PBS volume (4 TB), so I can't really test.

iprigger

Active Member

I have found some sort of mitigation for the problem. It is not a real solution, but it seems to help:

1) I still had some NFS mounts on NFSv3, which seems to be suboptimal with Proxmox 7 (it looks like some changes in the underlying network stack react differently).

2) I throttled the backup a bit to reduce load during the backup (see the sketch below).

Seems to have helped.
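
A minimal sketch of one way to do that throttling, not necessarily the exact config used here (VM ID, storage name, and the limit value are placeholders; the value is in KiB/s):

Code:

# per-run bandwidth limit for a single vzdump invocation
vzdump 100 --storage PBS --mode snapshot --bwlimit 51200
# or set a node-wide default in /etc/vzdump.conf
echo "bwlimit: 51200" >> /etc/vzdump.conf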

It would still be nice if I could revert the config to what I had before.

Moayad

Proxmox Staff Member

Thank you for the output,

It seems to be like this issue [0]. Could you please try installing the debug packages and post the full backup task log when the problem occurs again?

See the links below [1] for installing the debug package of qemu-server.
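
A rough sketch of installing those debug packages (package names may vary per PVE release; qemu-server-dbgsym is the one mentioned later in this thread, pve-qemu-kvm-dbg is an assumption):

Code:

apt update
apt install qemu-server-dbgsym
# QEMU debug symbols as well (package name assumed, check 'apt search pve-qemu' first)
apt install pve-qemu-kvm-dbg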

Best regards,
Moayad


New Member

I've had exactly the same issue since I upgraded to 7. The backup fails every few days on random VMs, and some VMs then remount their root file system read-only afterward. Really annoying.

Just for documentation, here is my output from two different cases:

I'm going to install the debug package and will let you know when I know more.

Moayad

Proxmox Staff Member

Please provide us with an output of pveversion -v

The fix is available in pve-no-subscription for now [0]: pve-qemu-kvm_6.0.0-4_amd64.deb.

Best regards,
Moayad


New Member

It seems like I’m already using this version 🙁

I’ll now try to install the debug package


New Member

Alright, I installed qemu-server-dbgsym, hard restarted all VMs and after a couple of backup iterations, it failed again. Unfortunately, I don’t see much more information.

New Member

Two days ago I updated to the latest version, rebooted the whole host, and after the first backup one VM failed again. Unfortunately, the log still doesn't tell me anything interesting.

While searching for the needle in the haystack, I just changed the NFS version from 3 to 4 (see the sketch below). Let's see whether that changes anything. I'll let you know.
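
A minimal sketch of pinning a PVE NFS storage to NFSv4 via /etc/pve/storage.cfg (storage name, server, and export are placeholders for illustration):

Code:

# /etc/pve/storage.cfg -- hypothetical NFS storage entry forced to NFSv4
nfs: backup-nas
        server 192.168.1.10
        export /export/backups
        path /mnt/pve/backup-nas
        content backup
        options vers=4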


Funar

New Member

Incidentally, I’m having the same issue, but only on one of my Proxmox clusters.

In my home lab, where the problem exists, I'm running 2 PVE 7 hosts and one host with PBS 1.1 (haven't upgraded yet). The PVE hosts' VM and container storage connects via LVM over iSCSI on 10GbE. I have about 9 QEMU VMs running on this and a few LXC containers. During backups, I get the occasional "qmp command 'cont' failed - got timeout." Sometimes this results in the guest remounting its filesystem read-only, but not every time.

In my production system, I have 9 PVE 7 hosts using Ceph RBD storage for VMs and LXC. I have zero issues with "qmp command 'cont' failed - got timeout" there. Not even one.

Possible storage issue?

New Member

Just for the record, the backup on the latest version failed as well.

sztanpet

New Member

Hi!
I am also seeing this issue.
Currently every automatic backup fails on a single VM (800 GB); sometimes a manual backup also fails, but on retry it works (the VM and the Proxmox node are mostly idle), while on a smaller VM (100 GB) it works every time.
The bigger VM has a storage volume from another pool that is on rotating media, while the majority of the data is on NVMe storage.

The error from a failing manual backup run:

nielsnl

New Member

Same issue here: qmp command 'cont' failed - got timeout

Oddly, it happens with only one specific VM. All the other ones back up just fine (to PBS). The only difference is that this particular VM is the "busiest" one, handling more network connections and doing more filesystem writes.

I'm not sure if this is related as well:

  • Without the QEMU guest agent, I simply get this backup error and the VM continues operating as normal.
  • With the QEMU guest agent, the VM goes into some kind of semi-hang where some processes are stuck while others continue to operate normally.

I fired up a snapshot of this VM on another host and it can do the backup just fine. Of course that one lacks actual activity.

Also curious: making snapshots works fine, but making backups fails, both very consistently. (On rare occasions the backup will succeed.)

leen15

New Member

Hello, same issue here.

# pveversion --verbose
proxmox-ve: 7.0-2 (running kernel: 5.11.22-4-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-7
pve-kernel-helper: 7.0-7
pve-kernel-5.11.22-4-pve: 5.11.22-8
ceph-fuse: 15.2.14-pve1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-6
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.3-1
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1


Problems with backups on Proxmox 6.1

Cortesano

Member

Hi, I have performed a new installation of Proxmox 6.1 and I have problems with container backups. The backup storage is a NAS over NFS; the problem also occurs on local and external HDDs. Backups only work when the container is off.

Container off = backup OK
Container on = backup error

Task viewer: backup error

Proxmox Retired Staff

Can you post the container configuration as well? (pct config CTID)

Best regards,
Oguz


Cortesano

Member

Can you post the container configuration as well? (pct config CTID)

# pct config 100
arch: amd64
cores: 1
hostname: dataserver
memory: 2048
net0: name=eth0,bridge=vmbr0,gw=192.168.20.1,hwaddr=2A:9C:BE:E9:B2:F9,ip=192.168.20.11/24,type=veth
ostype: debian
rootfs: vms:vm-100-disk-0,size=600G
swap: 512

Proxmox Retired Staff

The two outputs in your first post are from different containers, which doesn't tell us too much about what could be going on.

In either case, I suspect it could be storage related.

Could you please post:

* the backup output of container 100 when it's on and when it's off
* the same for CT 101
* the contents of /etc/pve/storage.cfg so we can see what kind of storage you use

Best regards,
Oguz



Backup failed

PaulVM

Active Member

Fresh PVE 7.x.
2 LXC containers.
I could do standard backups until yesterday; now it fails every time I try.
It fails on local storage, on a mounted directory, and on PBS.

A couple of failed logs:

Code:

INFO: starting new backup job: vzdump 410722 --node srv2113 --mode snapshot --compress zstd --mailto staff@domain.tld --storage local --all 0 --mailnotification always
INFO: Starting Backup of VM 410722 (lxc)
INFO: Backup started at 2022-02-10 21:37:04
INFO: status = running
INFO: CT Name: d410722.domain.tld
INFO: including mount point rootfs ('/') in backup
INFO: excluding bind mount point mp0 ('/var/www') from backup (not a volume)
INFO: excluding bind mount point mp1 ('/tmp') from backup (not a volume)
INFO: mode failure - some volumes do not support snapshots
INFO: trying 'suspend' mode instead
INFO: backup mode: suspend
INFO: ionice priority: 7
INFO: CT Name: d410722.domain.tld
INFO: including mount point rootfs ('/') in backup
INFO: excluding bind mount point mp0 ('/var/www') from backup (not a volume)
INFO: excluding bind mount point mp1 ('/tmp') from backup (not a volume)
INFO: starting first sync /proc/3442/root/ to /var/lib/vz/dump/vzdump-lxc-410722-2022_02_10-21_37_04.tmp
INFO: first sync finished - transferred 6.42G bytes in 49s
INFO: suspending guest
INFO: starting final sync /proc/3442/root/ to /var/lib/vz/dump/vzdump-lxc-410722-2022_02_10-21_37_04.tmp
INFO: resume vm
INFO: guest is online again after 2 seconds
ERROR: Backup of VM 410722 failed - command 'rsync --stats -h -X -A --numeric-ids -aH --delete --no-whole-file --inplace --one-file-system --relative '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' '--exclude=/var/www' '--exclude=/tmp' /proc/3442/root//./ /var/lib/vz/dump/vzdump-lxc-410722-2022_02_10-21_37_04.tmp' failed: exit code 23
INFO: Failed at 2022-02-10 21:37:58
INFO: Backup job finished with errors
TASK ERROR: job errors

Code:

INFO: starting new backup job: vzdump 410722 --mode snapshot --node srv2113 --storage pbs2110 --all 0 --mailnotification always --mailto staff@domain.tld
INFO: Starting Backup of VM 410722 (lxc)
INFO: Backup started at 2022-02-10 22:08:26
INFO: status = running
INFO: CT Name: d410722.domain.tld
INFO: including mount point rootfs ('/') in backup
INFO: excluding bind mount point mp0 ('/var/www') from backup (not a volume)
INFO: excluding bind mount point mp1 ('/tmp') from backup (not a volume)
INFO: mode failure - some volumes do not support snapshots
INFO: trying 'suspend' mode instead
INFO: backup mode: suspend
INFO: ionice priority: 7
INFO: CT Name: d410722.domain.tld
INFO: including mount point rootfs ('/') in backup
INFO: excluding bind mount point mp0 ('/var/www') from backup (not a volume)
INFO: excluding bind mount point mp1 ('/tmp') from backup (not a volume)
INFO: starting first sync /proc/3442/root/ to /var/tmp/vzdumptmp19034_410722
INFO: first sync finished - transferred 6.42G bytes in 24s
INFO: suspending guest
INFO: starting final sync /proc/3442/root/ to /var/tmp/vzdumptmp19034_410722
INFO: resume vm
INFO: guest is online again after 2 seconds
ERROR: Backup of VM 410722 failed - command 'rsync --stats -h -X -A --numeric-ids -aH --delete --no-whole-file --inplace --one-file-system --relative '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' '--exclude=/var/www' '--exclude=/tmp' /proc/3442/root//./ /var/tmp/vzdumptmp19034_410722' failed: exit code 23
INFO: Failed at 2022-02-10 22:08:58
INFO: Backup job finished with errors
TASK ERROR: job errors

PVE updated (same problem before last updates):

Code:

# pveversion --verbose
proxmox-ve: 7.1-1 (running kernel: 5.13.19-4-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-10
pve-kernel-5.13: 7.1-7
pve-kernel-5.4: 6.4-11
pve-kernel-5.13.19-4-pve: 5.13.19-9
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.157-1-pve: 5.4.157-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-2
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-5
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-1
pve-xtermjs: 4.16.0-1
pve-zsync: 2.2.1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1

Container config (I tried changing RAM and cores, with no difference):

Code:

arch: amd64
cores: 6
features: nesting=1
hostname: d410722.apf.it
memory: 16000
mp0: /var/lib/vz/WWW/410722,mp=/var/www
mp1: /var/lib/vz/WWW/410722TMP,mp=/tmp
net0: name=eth0,bridge=vmbr9,firewall=1,gw=192.168.109.1,hwaddr=4E:E0:43:F2:01:54,ip=192.168.109.154/24,type=veth
ostype: debian
rootfs: local:410722/vm-410722-disk-0.raw,size=15G
swap: 512

In PVE logs I have simply:

Code:

Feb 10 22:50:23 srv2113 pvedaemon[1596]: starting task UPID:srv2113:00008402:0008BDAF:6205889F:vzdump:410722:root@pam:
Feb 10 22:50:23 srv2113 pvedaemon[33794]: INFO: starting new backup job: vzdump 410722 --remove 0 --node srv2113 --compress zstd --mode snapshot --storage local
Feb 10 22:50:23 srv2113 pvedaemon[33794]: INFO: Starting Backup of VM 410722 (lxc)
Feb 10 22:51:05 srv2113 pvedaemon[33794]: ERROR: Backup of VM 410722 failed - command 'rsync --stats -h -X -A --numeric-ids -aH --delete --no-whole-file --inplace --one-file-system --relative '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' '--exclude=/var/www' '--exclude=/tmp' /proc/3442/root//./ /var/lib/vz/dump/vzdump-lxc-410722-2022_02_10-22_50_23.tmp' failed: exit code 23
Feb 10 22:51:05 srv2113 pvedaemon[33794]: INFO: Backup job finished with errors
Feb 10 22:51:05 srv2113 pvedaemon[33794]: job errors
Feb 10 22:51:05 srv2113 pvedaemon[1596]: end task UPID:srv2113:00008402:0008BDAF:6205889F:vzdump:410722:root@pam: job errors
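
rsync exit code 23 means "partial transfer due to error". A sketch of narrowing it down by re-running the same rsync by hand while the container is running (PID 3442 is the one from the task log above; the destination is a scratch directory used only for testing):

Code:

mkdir -p /root/rsync-test
rsync -v --stats -h -X -A --numeric-ids -aH --no-whole-file --inplace \
  --one-file-system --relative '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' \
  '--exclude=/var/run/?*.pid' '--exclude=/var/www' '--exclude=/tmp' \
  /proc/3442/root//./ /root/rsync-test/
echo "rsync exit code: $?"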



Error when creating a virtual machine backup



I have Proxmox 4.1. When creating a backup of a virtual machine, it fails with the following:

Code:

INFO: starting new backup job: vzdump 101 --mode suspend --storage local --mailnotification always --quiet 1 --compress lzo 
INFO: Starting Backup of VM 101 (qemu) 
INFO: status = running 
INFO: update VM 101: -lock backup 
INFO: backup mode: suspend 
INFO: ionice priority: 7 
INFO: suspend vm 
INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-101-2017_07_04-21_45_01.vma.lzo' 
INFO: started backup task '02ea7f92-1f94-4a54-b166-531f24a26d4b' 
INFO: status: 0% (866648064/2211908157440), sparse 0% (107388928), duration 3, 288/253 MB/s 
INFO: status: 1% (22316187648/2211908157440), sparse 0% (3417477120), duration 103, 214/181 MB/s 
INFO: status: 2% (44366299136/2211908157440), sparse 0% (3441803264), duration 221, 186/186 MB/s 
INFO: status: 3% (66442231808/2211908157440), sparse 0% (3447193600), duration 344, 179/179 MB/s 
INFO: status: 4% (88527994880/2211908157440), sparse 0% (3477245952), duration 478, 164/164 MB/s 
INFO: status: 5% (110725693440/2211908157440), sparse 0% (5363281920), duration 604, 176/161 MB/s 
INFO: status: 6% (132794810368/2211908157440), sparse 0% (5363507200), duration 844, 91/91 MB/s 
lzop: No space left on device: <stdout> 
INFO: status: 6% (133114101760/2211908157440), sparse 0% (5363507200), duration 885, 7/7 MB/s 
ERROR: vma_queue_write: write error - Broken pipe 
INFO: aborting backup job 
INFO: resume vm 
INFO: vm is online again after 893 seconds 
ERROR: Backup of VM 101 failed - vma_queue_write: write error - Broken pipe 
INFO: Backup job finished with errors 
TASK ERROR: job errors

Has anyone run into this?
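
The log above shows lzop failing with "No space left on device" on the dump target, so the first thing to check is free space where vzdump writes; a minimal sketch (the alternative paths are placeholders):

Code:

# check free space on the default dump directory
df -h /var/lib/vz/dump
# point vzdump at a larger disk via /etc/vzdump.conf
echo "dumpdir: /mnt/bigdisk/dump" >> /etc/vzdump.conf
echo "tmpdir: /mnt/bigdisk/tmp" >> /etc/vzdump.conf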





Job and task error checking

There are various errors that can occur when adding jobs and tasks. Detecting failures for these operations is straightforward because any failures are returned immediately by the API, CLI, or UI. However, there are also failures that can happen later, when jobs and tasks are scheduled and run.

This article covers the errors that can occur after jobs and tasks are submitted and how to check for and handle them.

Jobs

A job is a grouping of one or more tasks, with the tasks actually specifying the command lines to be run.

When adding a job, the following parameters can be specified which can influence how the job can fail:

  • Job Constraints
    • The maxWallClockTime property can optionally be specified to set the maximum amount of time a job can be active or running. If exceeded, the job will be terminated with the terminateReason property set in the executionInfo for the job.
  • Job Preparation Task
    • If specified, a job preparation task is run the first time a task is run for a job on a node. The job preparation task can fail, which will lead to the task not being run and the job not completing.
  • Job Release Task
    • A job release task can only be specified if a job preparation task is configured. When a job is being terminated, the job release task is run on each of the pool nodes where a job preparation task was run. A job release task can fail, but the job will still move to a completed state.

Job properties

The following job properties should be checked for errors:

  • ‘executionInfo’:
    • The terminateReason property can have values to indicate that the maxWallClockTime, specified in the job constraints, was exceeded and therefore the job was terminated. It can also be set to indicate a task failed if the job onTaskFailure property was set appropriately.
    • The schedulingError property is set if there has been a scheduling error.
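
As a sketch, these fields can be pulled with the Azure CLI (the job ID is a placeholder and the Batch account must already be configured, for example via az batch account login; field names are as they appear in the CLI's JSON output):

Code:

# show why a job was terminated and whether a scheduling error occurred
az batch job show --job-id myjob \
    --query "executionInfo.{terminateReason:terminateReason, schedulingError:schedulingError}"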

Job preparation tasks

If a job preparation task is specified for a job, then an instance of that task will be run the first time a task for the job is run on a node. The job preparation task configured on the job can be thought of as a task template, with multiple job preparation task instances being run, up to the number of nodes in a pool.

The job preparation task instances should be checked to determine if there were errors:

  • When a job preparation task is run, then the task that triggered the job preparation task will move to a state of preparing; if the job preparation task then fails, the triggering task will revert to the active state and will not be run.
  • All the instances of the job preparation task that have been run can be obtained from the job using the List Preparation and Release Task Status API. As with any task, there is execution information available with properties such as failureInfo, exitCode, and result.
  • If job preparation tasks fail, then the triggering job tasks will not be run, the job will not complete and will be stuck. The pool may go unutilized if there are no other jobs with tasks that can be scheduled.
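
A sketch of checking the job preparation task instances with the Azure CLI, assuming your CLI version exposes the prep-release-status group (the job ID is a placeholder; field names follow the List Preparation and Release Task Status response):

Code:

# one entry per node that ran the job preparation task
az batch job prep-release-status list --job-id myjob \
    --query "[].{node:nodeId, result:jobPreparationTaskExecutionInfo.result, exitCode:jobPreparationTaskExecutionInfo.exitCode}"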

Job release tasks

If a job release task is specified for a job, then when a job is being terminated, an instance of the job release task is run on each pool node where a job preparation task was run. The job release task instances should be checked to determine if there were errors:

  • All the instances of the job release task being run can be obtained from the job using the API List Preparation and Release Task Status. As with any task, there is execution information available with properties such as failureInfo, exitCode, and result.
  • If one or more job release tasks fail, then the job will still be terminated and move to a completed state.

Tasks

Job tasks can fail for multiple reasons:

  • The task command line fails, returning with a non-zero exit code.
  • There are resourceFiles specified for a task, but there was a failure that meant one or more files didn’t download.
  • There are outputFiles specified for a task, but there was a failure that meant one or more files didn’t upload.
  • The elapsed time for the task, specified by the maxWallClockTime property in the task constraints, was exceeded.

In all cases the following properties must be checked for errors and information about the errors:

  • The task's executionInfo property contains multiple properties that provide information about an error. result indicates whether the task failed for any reason, with exitCode and failureInfo providing more information about the failure.
  • The task will always move to the completed state, independent of whether it succeeded or failed.
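
For example, a sketch of listing only the failed tasks of a job with the Azure CLI (the job ID is a placeholder; the fields are the ones described above, as rendered in the CLI's JSON output):

Code:

# list failed tasks with their exit code and failure message
az batch task list --job-id myjob \
    --query "[?executionInfo.result=='failure'].{id:id, exitCode:executionInfo.exitCode, error:executionInfo.failureInfo.message}"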

The impact of task failures on the job and any task dependencies must be considered. The exitConditions property can be specified for a task to configure an action for dependencies and for the job.

  • For dependencies, DependencyAction controls whether the tasks dependent on the failed task are blocked or are run.
  • For the job, JobAction controls whether the failed task leads to the job being disabled, terminated, or left unchanged.
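
As a sketch, exitConditions can be supplied when the task is added; the JSON below uses the REST API field names, with the task ID, command line, and job ID as placeholders (dependencyAction only has an effect if the job uses task dependencies):

Code:

# run_step.sh is a placeholder for the real workload
cat > task.json <<'EOF'
{
  "id": "process-part-1",
  "commandLine": "/bin/bash -c './run_step.sh'",
  "exitConditions": {
    "default": {
      "jobAction": "disable",
      "dependencyAction": "block"
    }
  }
}
EOF
az batch task create --job-id myjob --json-file task.json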

Task command line failures

When the task command line is run, output is written to stderr.txt and stdout.txt. In addition, the application may write to application-specific log files.

If the pool node on which a task has run still exists, then the log files can be obtained and viewed. For example, the Azure portal can list and view log files for a task or a pool node. Multiple APIs also allow task files to be listed and obtained, such as Get From Task.

Since pools and pool nodes are frequently ephemeral, with nodes being continuously added and deleted, we recommend saving log files. Task output files are a convenient way to save log files to Azure Storage.

The command lines executed by tasks on compute nodes do not run under a shell, so they can’t natively take advantage of shell features such as environment variable expansion. To take advantage of such features, you must invoke the shell in the command line.
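
For example, a sketch of a task whose command line is wrapped in a shell so that a Batch environment variable is expanded on the node (job and task IDs are placeholders; AZ_BATCH_TASK_WORKING_DIR is a standard Batch variable):

Code:

# the outer single quotes keep the variable from being expanded locally
az batch task create --job-id myjob --task-id show-workdir \
    --command-line '/bin/bash -c "echo task runs in $AZ_BATCH_TASK_WORKING_DIR"'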

Output file failures

On every file upload, Batch writes two log files to the compute node, fileuploadout.txt and fileuploaderr.txt. You can examine these log files to learn more about a specific failure. In cases where the file upload was never attempted, for example because the task itself couldn’t run, then these log files will not exist.
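
A sketch of retrieving those upload logs from a completed task with the Azure CLI (job and task IDs are placeholders; the files exist only if an upload was actually attempted):

Code:

az batch task file list --job-id myjob --task-id mytask --output table
az batch task file download --job-id myjob --task-id mytask \
    --file-path fileuploaderr.txt --destination ./fileuploaderr.txt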

Next steps

  • Check that your application implements comprehensive error checking; it can be critical to promptly detect and diagnose issues.
  • Learn more about jobs and tasks and job preparation and release tasks.
