Blk update request critical target error - Fixing errors and finding optimal solutions to problems

Since I can’t reopen #703 to make it more «visible», I am opening a new thread and copying the last post. I apologize if this isn’t the right way to do this.

I think this issue has arisen again. The only difference I see is that the log message changed from «end_request» to «blk_update_request», but the behaviour seems the same:

Apr 14 13:00:39 raspberrypi kernel: [  561.226469] blk_update_request: critical target error, dev sda, sector 404211137
Apr 14 13:00:39 raspberrypi kernel: [  561.226579] Aborting journal on device sda2-8.
Apr 14 13:00:40 raspberrypi kernel: [  563.043697] EXT4-fs error (device sda2): ext4_journal_check_start:56: Detected aborted journal
Apr 14 13:00:40 raspberrypi kernel: [  563.057031] EXT4-fs (sda2): Remounting filesystem read-only
Apr 14 13:00:51 raspberrypi kernel: [  574.083665] EXT4-fs error (device sda2): ext4_put_super:789: Couldn't clean up the journal

If I try to fsck:

Apr 14 12:45:24 raspberrypi kernel: [ 1603.653290] EXT4-fs error (device sda2): ext4_put_super:789: Couldn't clean up the journal
Apr 14 12:45:30 raspberrypi kernel: [ 1610.138494] blk_update_request: critical target error, dev sda, sector 0
Apr 14 12:45:30 raspberrypi kernel: [ 1610.197294] blk_update_request: critical target error, dev sda, sector 0
Apr 14 12:45:30 raspberrypi kernel: [ 1610.203760] blk_update_request: critical target error, dev sda, sector 0
Apr 14 12:46:41 raspberrypi kernel: [ 1681.278773] blk_update_request: critical target error, dev sda, sector 0
Apr 14 12:46:41 raspberrypi kernel: [ 1681.282161] blk_update_request: critical target error, dev sda, sector 0
Apr 14 12:46:41 raspberrypi kernel: [ 1681.283998] blk_update_request: critical target error, dev sda, sector 0

Based on my tests, the last working commit is Hexxeh/rpi-firmware@f74b921.

It is broken from Hexxeh/rpi-firmware@cad071a onwards.
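
To compare runs like the two excerpts above, the blk_update_request lines can be boiled down to device, sector and error kind. A minimal sketch, assuming the kernel log format shown in this post; the function name blk_errors is made up for illustration:

```shell
# Hypothetical helper: reads kernel log lines on stdin and prints
# "<device> <sector> <error kind>" for every blk_update_request line.
# Lines that do not contain blk_update_request are dropped.
blk_errors() {
    sed -n 's/.*blk_update_request: \(.*\), dev \([a-z0-9]*\), sector \([0-9]*\).*/\2 \3 \1/p'
}
```

Something like `dmesg | blk_errors | sort | uniq -c` should then show how often each sector fails under a given firmware commit.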

Source

First off: it is NOT your fault. It just shows that updating without backups is dangerous on ANY OS, no matter how often it has worked before.

I had exactly the same problem today on Debian 9.

A whole ext3 RAID1 «vanished» after the kernel was updated from:

linux-image-4.9.0-11-amd64                        4.9.189-3+deb9u2

to:

linux-image-4.9.0-12-amd64                        4.9.210-1

list all installed kernels

dpkg --list | grep linux-image
ii  linux-image-4.9.0-11-amd64                        4.9.189-3+deb9u2                            amd64        Linux 4.9 for 64-bit PCs
ii  linux-image-4.9.0-12-amd64                        4.9.210-1                                   amd64        Linux 4.9 for 64-bit PCs
rc  linux-image-4.9.0-6-amd64                         4.9.88-1+deb9u1                             amd64        Linux 4.9 for 64-bit PCs
rc  linux-image-4.9.0-8-amd64                         4.9.144-3.1                                 amd64        Linux 4.9 for 64-bit PCs
ii  linux-image-4.9.0-9-amd64                         4.9.168-1+deb9u3                            amd64        Linux 4.9 for 64-bit PCs
ii  linux-image-amd64                                 4.9+80+deb9u10                              amd64        Linux for 64-bit PCs (meta-package)
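
In the listing above, only the «ii» rows are kernels that can actually be booted; «rc» means the package was removed and only its config files remain. A minimal sketch that filters the list accordingly, assuming the usual `dpkg --list` column layout; the function name is hypothetical:

```shell
# Hypothetical helper: reads `dpkg --list` output on stdin and prints the
# name and version of every fully installed (state "ii") versioned kernel
# image. The linux-image-amd64 meta-package and "rc" leftovers are skipped.
installed_kernels() {
    awk '$1 == "ii" && $2 ~ /^linux-image-[0-9]/ { print $2, $3 }'
}
```

Usage: `dpkg --list | installed_kernels`.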

hostnamectl; # os used
   Static hostname: storagepc
         Icon name: computer-desktop
           Chassis: desktop
  Operating System: Debian GNU/Linux 9 (stretch)
            Kernel: Linux 4.9.0-12-amd64
      Architecture: x86-64

Those are the kind of «heart attack» moments X-D

Let’s try to stay cool!

«solution»: boot previous kernel ( in this case: linux-image-4.9.0-11-amd64 )

vim /etc/default/grub

GRUB_TIMEOUT=3 # make sure a timeout larger than 0 is defined (otherwise there is no time to select an option during boot)

# let grub2 do its stuff
update-grub
# which is the same as:
update-grub2

# reboot the system (if USB keyboard is not reacting during grub boot screen, try PS2 keyboard)
reboot

# when the grub boot screen appears, select the previous kernel (here: linux-image-4.9.0-11-amd64)

After booting the linux-image-4.9.0-11-amd64 kernel, the ext3 RAID1 is accessible AGAIN!

Problem: grub won’t remember that choice.

To make this permanent:

vim /etc/default/grub

# during boot:
## in the first menu, select the second entry (counting 0,1)
#### then in the second menu, select the third entry (counting 0,1,2)
GRUB_DEFAULT="1>2"

# make grub2 realize the changes
update-grub

… yes it is confusing I know X-D
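
Counting the menu positions by hand is error-prone, so here is a minimal sketch that derives the value from grub.cfg. It assumes a Debian-style file in which top-level «menuentry» and «submenu» lines start in column 0, entries inside a submenu are indented, and titles are in single quotes; the function name is made up:

```shell
# Hypothetical helper: prints the GRUB_DEFAULT value ("N" or "N>M") of the
# first menu entry whose title contains the given text.
# $1 = path to grub.cfg, $2 = text to look for in the entry title.
grub_default_for() {
    cfg=$1
    want=$2
    awk -F"'" -v want="$want" '
        /^(menuentry|submenu) / {
            top++                        # position in the first menu (1-based)
            sub_idx = -1
            in_sub = ($0 ~ /^submenu /)
            if (!in_sub && index($2, want)) { print top - 1; exit }
            next
        }
        in_sub && /^[ \t]+menuentry / {
            sub_idx++                    # position in the second menu (0-based)
            if (index($2, want)) { path = (top - 1) ">" sub_idx; print path; exit }
        }
    ' "$cfg"
}
```

Usage: `grub_default_for /boot/grub/grub.cfg 4.9.0-11-amd64`. Check the result against the boot menu before writing it into /etc/default/grub.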

this is what it was supposed to look like

I have two RAID1 arrays defined.

# show status of raid
cat /proc/mdstat 
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md126 : active raid1 sdc1[1] sdb1[0]
      3906886464 blocks super 1.2 [2/2] [UU]
      bitmap: 0/30 pages [0KB], 65536KB chunk

md127 : active raid1 sde1[0] sdd1[2]
      1953381376 blocks super 1.2 [2/2] [UU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

# show what is mounted
mount
/dev/md126 on /media/user/ext4RAID1 type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/md127 on /media/user/ext3RAID1 type ext3 (rw,relatime,data=ordered)

# show block devices
lsblk 
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
fd0         2:0    1     4K  0 disk  
sda         8:0    0 238.5G  0 disk  
├─sda1      8:1    0 230.8G  0 part  /
├─sda2      8:2    0     1K  0 part  
└─sda5      8:5    0   7.7G  0 part  [SWAP]
sdb         8:16   0   3.7T  0 disk  
└─sdb1      8:17   0   3.7T  0 part  
  └─md126   9:126  0   3.7T  0 raid1 /media/user/ext4RAID1
sdc         8:32   0   3.7T  0 disk  
└─sdc1      8:33   0   3.7T  0 part  
  └─md126   9:126  0   3.7T  0 raid1 /media/user/ext4RAID1
sdd         8:48   0   1.8T  0 disk  
└─sdd1      8:49   0   1.8T  0 part  
  └─md127   9:127  0   1.8T  0 raid1 /media/user/ext3RAID1
sde         8:64   0   1.8T  0 disk  
└─sde1      8:65   0   1.8T  0 part  
  └─md127   9:127  0   1.8T  0 raid1 /media/user/ext3RAID1
sr0        11:0    1  1024M  0 rom 


# find defined raids
mdadm --examine --scan
ARRAY /dev/md/2  metadata=1.2 UUID=90642755:fa191325:0fe4ec59:2456c645 name=storagepc:2
ARRAY /dev/md/1  metadata=1.2 UUID=433fb7e1:9d7f3f17:bc5ee18b:0f4eeb52 name=storagepc:1

# show UUIDS
blkid /dev/sdb1
/dev/sdb1: UUID="90642755-fa19-1325-0fe4-ec592456c645" UUID_SUB="bee458e0-509a-c110-b577-8a1ddbe6bbb3" LABEL="storagepc:2" TYPE="linux_raid_member" PARTUUID="1fd02041-9dd2-4918-83a3-c8bafbab3bed"

blkid /dev/sdc1
/dev/sdc1: UUID="90642755-fa19-1325-0fe4-ec592456c645" UUID_SUB="7d5947f8-1ba0-0c7b-18a7-194ab4051a2c" LABEL="storagepc:2" TYPE="linux_raid_member" PARTUUID="5e4ea781-68e5-43f0-accf-26342aeb4daa"

blkid /dev/sdd1
/dev/sdd1: UUID="433fb7e1-9d7f-3f17-bc5e-e18b0f4eeb52" UUID_SUB="bed17780-3817-27c9-6336-44d4aedfb857" LABEL="storagepc:1" TYPE="linux_raid_member" PARTUUID="f6aab6c2-01"

blkid /dev/sde1
/dev/sde1: UUID="433fb7e1-9d7f-3f17-bc5e-e18b0f4eeb52" UUID_SUB="eb90b361-94d6-2f38-7727-d386097dce81" LABEL="storagepc:1" TYPE="linux_raid_member" PARTUUID="d2fd127f-01"
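
In the /proc/mdstat output above, «[UU]» means both mirror halves are present; an underscore in that field marks a missing member. A minimal sketch that turns this into a one-line status per array, assuming arrays are named mdN as shown here; the function name is hypothetical:

```shell
# Hypothetical helper: reads /proc/mdstat content on stdin and prints
# "<array> <member map> <ok|DEGRADED>" for each mdN array.
mdstat_health() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        name != "" && match($0, /\[[U_]+\]/) {
            state = substr($0, RSTART, RLENGTH)
            print name, state, (state ~ /_/ ? "DEGRADED" : "ok")
            name = ""
        }
    '
}
```

Usage: `mdstat_health < /proc/mdstat`. Note that an array that is not assembled at all (the «vanished» case) will simply not be listed.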

regular filesystem checks

This has nothing to do with the problem, but defining the checks via tune2fs has the advantage that they are performed automatically during boot.

tune2fs -C 2 -c 1 /dev/sda1; # check filesystem on every boot (for ext3 takes rather long X-D)
tune2fs -c 10 -i 30 /dev/sda1; # check sda1 every 10 mounts or after 30 days
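
To see what the mount-count settings above amount to, the counters printed by `tune2fs -l` can be compared. A minimal sketch, assuming the «Mount count» and «Maximum mount count» field names from that output and the rule that a check is due once the count reaches the maximum; the function name is made up, and time-based checks (-i) are not considered:

```shell
# Hypothetical helper: reads `tune2fs -l` output on stdin and reports whether
# the mount-count based filesystem check is due.
fsck_due_by_mounts() {
    awk -F': *' '
        $1 == "Mount count"         { count = $2 + 0 }
        $1 == "Maximum mount count" { max = $2 + 0 }
        END {
            if (max <= 0)          print "mount-count check disabled"
            else if (count >= max) print "check due on next mount"
            else { left = max - count; print left " mounts until check" }
        }
    '
}
```

Usage: `tune2fs -l /dev/sda1 | fsck_due_by_mounts`.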

Source

I just ordered a new server with a 1TB Samsung SSD and installed Ubuntu 14.04.5 LTS on it.

After booting into the newly installed system, I see this in my dmesg and /var/log/syslog. Output of grep error /var/log/syslog:

May 12 03:47:34 lf5 kernel: [    0.373789] HEST: Enabling Firmware First mode for corrected errors.
May 12 03:47:34 lf5 kernel: [   10.382147] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [   10.382152]          res 40/00:e0:f8:69:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   10.712517] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [   10.712521]          res 40/00:d0:38:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.119541] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [   11.119545]          res 40/00:40:30:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526336] ata8.00: irq_stat 0x08000008, interface fatal error
May 12 03:47:34 lf5 kernel: [   11.526341]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526345]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526348]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   11.526351]          res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [   21.349950] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 03:51:10 lf5 kernel: [    0.389787] HEST: Enabling Firmware First mode for corrected errors.
May 12 03:51:10 lf5 kernel: [   10.906423] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   10.906429]          res 40/00:80:08:00:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   11.488276] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   11.488281]          res 40/00:c0:28:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   11.960792] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   11.960796]          res 40/00:b8:b0:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   12.366482] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [   12.366486]          res 40/00:60:e0:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [   20.918620] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 17:07:19 lf5 kernel: [    0.390011] HEST: Enabling Firmware First mode for corrected errors.
May 12 17:07:19 lf5 kernel: [   10.349119] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   10.349124]          res 40/00:88:a8:6d:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   10.738449] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   10.738453]          res 40/00:20:60:6b:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   11.072972] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   11.072976]          res 40/00:60:50:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   11.471777] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [   11.471781]          res 40/00:48:c8:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [   20.651217] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 17:18:16 lf5 kernel: [    0.389808] HEST: Enabling Firmware First mode for corrected errors.
May 12 17:18:17 lf5 kernel: [   10.762352] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:18:17 lf5 kernel: [   10.762360]          res 40/00:40:08:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   11.338565]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   11.338569]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   11.338572]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   11.338576]          res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [   20.087229]          res 41/84:08:b8:14:7d/00:00:63:00:00/00 Emask 0x410 (ATA bus error) <F>
May 12 17:18:17 lf5 kernel: [   20.298295] ata8.00: error: { ICRC ABRT }
May 12 17:18:17 lf5 kernel: [   21.176551] sd 7:0:0:0: [sda] tag#0 Add. Sense: Scsi parity error
May 12 17:18:17 lf5 kernel: [   21.316632] blk_update_request: I/O error, dev sda, sector 1669074520
May 12 17:18:17 lf5 kernel: [   21.542013] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:18:17 lf5 kernel: [   21.759477]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.052681]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.347138]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.642363]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   22.938868]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   23.239764]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   23.542336]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   23.840288]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   24.138769]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   24.439063]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   24.740494]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.047057]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.354884]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.662079]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   25.967498]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   26.273208]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   26.579035]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   26.884890]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   27.190868]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   27.496523]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   27.801825]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   28.106876]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   28.412223]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   28.717662]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.022620]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.326675]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.629826]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   29.932271]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   30.234666]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   30.537024]          res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   31.765128] blk_update_request: I/O error, dev sda, sector 1669071496
May 12 17:18:17 lf5 kernel: [   32.143969] blk_update_request: I/O error, dev sda, sector 1669071504
May 12 17:18:17 lf5 kernel: [   32.527171] blk_update_request: I/O error, dev sda, sector 1669071512
May 12 17:18:17 lf5 kernel: [   32.915371] blk_update_request: I/O error, dev sda, sector 1669071544
May 12 17:18:17 lf5 kernel: [   33.308218] blk_update_request: I/O error, dev sda, sector 1669071552
May 12 17:18:17 lf5 kernel: [   33.706503] blk_update_request: I/O error, dev sda, sector 1669071520
May 12 17:18:17 lf5 kernel: [   34.108892] blk_update_request: I/O error, dev sda, sector 1669071528
May 12 17:18:17 lf5 kernel: [   34.516541] blk_update_request: I/O error, dev sda, sector 1669071536
May 12 17:18:17 lf5 kernel: [   34.929267] blk_update_request: I/O error, dev sda, sector 1669071368
May 12 17:18:17 lf5 kernel: [   35.347838] blk_update_request: I/O error, dev sda, sector 1669071376
May 12 17:18:17 lf5 kernel: [   36.004437]          res 41/04:a8:90:d2:89/00:00:5f:00:00/00 Emask 0x401 (device error) <F>
May 12 17:18:17 lf5 kernel: [   36.257143] ata8.00: error: { ABRT }
May 12 17:18:17 lf5 kernel: [   37.681581] ata8.00: irq_stat 0x08000008, interface fatal error
May 12 17:18:17 lf5 kernel: [   37.681586]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681590]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681593]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681596]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681599]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681602]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681605]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681608]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681611]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681615]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681618]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681621]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681624]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681627]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681630]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681633]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681636]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681639]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681642]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681645]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681649]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681652]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   37.681655]          res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [   38.005003] blk_update_request: I/O error, dev sda, sector 1891370112
May 12 17:18:17 lf5 kernel: [   38.005009] blk_update_request: I/O error, dev sda, sector 1891370120
May 12 17:18:17 lf5 kernel: [   38.005013] blk_update_request: I/O error, dev sda, sector 1891370128
May 12 17:18:17 lf5 kernel: [   38.005017] blk_update_request: I/O error, dev sda, sector 1891370136
May 12 17:18:17 lf5 kernel: [   38.005021] blk_update_request: I/O error, dev sda, sector 1891370144
May 12 17:18:17 lf5 kernel: [   38.005025] blk_update_request: I/O error, dev sda, sector 1891370152
May 12 17:18:17 lf5 kernel: [   38.005029] blk_update_request: I/O error, dev sda, sector 1891370160
May 12 17:18:17 lf5 kernel: [   38.005032] blk_update_request: I/O error, dev sda, sector 1891370168
May 12 17:18:17 lf5 kernel: [   38.005036] blk_update_request: I/O error, dev sda, sector 1891370176
May 12 17:18:17 lf5 kernel: [   38.005040] blk_update_request: I/O error, dev sda, sector 1891370184
May 12 17:18:17 lf5 kernel: [   49.093973] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro

I am mostly concerned about these entries: blk_update_request: I/O error, dev sda, sector xxxxxxxxxxx
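
Since the log above is long and repetitive, a summary per error category makes it easier to compare boots. A minimal sketch, assuming the «Emask 0x… (…)» and «blk_update_request:» patterns exactly as they appear in this output; the function name is hypothetical:

```shell
# Hypothetical helper: reads syslog lines on stdin and prints one line per
# distinct "Emask ... (...)" category with its count, followed by the number
# of blk_update_request lines. The order of the categories is not fixed.
summarize_ata_errors() {
    awk '
        match($0, /Emask 0x[0-9a-f]+ \([^)]*\)/) { seen[substr($0, RSTART, RLENGTH)]++ }
        /blk_update_request: /                   { blk++ }
        END {
            for (k in seen) print seen[k], k
            print blk + 0, "blk_update_request lines"
        }
    '
}
```

Usage: `grep error /var/log/syslog | summarize_ata_errors`.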

I ran badblocks -v /dev/sda which returned no errors.

I then ran smartctl --all /dev/sda, which also returned no errors. See the output below, which includes a short self-test.

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-31-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Samsung SSD 850 EVO 1TB
Serial Number:    S3PHNF0JC00710K
LU WWN Device Id: 5 002538 d428254a0
Firmware Version: EMT03B6Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat May 12 19:08:22 2018 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 512) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       8
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       31
177 Wear_Leveling_Count     0x0013   100   100   000    Pre-fail  Always       -       0
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   099   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   069   067   000    Old_age   Always       -       31
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   099   099   000    Old_age   Always       -       20
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       25
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       55078112

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         8         -

SMART Selective self-test log data structure revision number 1
 SPAN    MIN_LBA    MAX_LBA  CURRENT_TEST_STATUS
    1          0          0  Not_testing
    2          0          0  Not_testing
    3          0          0  Not_testing
    4          0          0  Not_testing
    5          0          0  Not_testing
  255  116055040  116120575  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

My question is simple: What do you think might be wrong? The SSD should be brand new. It’s hard for me, in good conscience, to put this server into production with those errors in the logs. And the box is otherwise acting normal.
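
One value in the table above that is not zero is attribute 199, UDMA_CRC_Error_Count, with a raw value of 20. It is commonly described as counting CRC errors on the SATA link rather than in the flash itself, so it may be worth watching whether it keeps growing across reboots; that reading is an assumption, not something confirmed in this thread. A minimal sketch for pulling a raw value out of the smartctl attribute table (the function name is made up):

```shell
# Hypothetical helper: reads `smartctl -A` or `smartctl --all` output on stdin
# and prints the RAW_VALUE column of the attribute with the given ID.
# $1 = attribute ID, for example 199.
smart_raw_value() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10 }'
}
```

Usage: `smartctl -A /dev/sda | smart_raw_value 199`.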

Source

It is odd… this array was working well in Windows on the pre-flashed PERC H710P. I’m not saying you’re wrong, but it would be very coincidental for the cables to go bad just when switching OSs. Plus, the Preclean scripts had no issues erasing the disks or writing zeros across all sectors.

When I tail /var/log/syslog and attempt to add a disk to the array, there is some output, and this block seems to be the most relevant:

Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#3435 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#3435 Sense Key : 0x7 [current] 
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#3435 ASC=0x20 ASCQ=0x2 
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#3435 CDB: opcode=0x28 28 00 00 00 00 00 00 00 20 00
Apr 19 15:24:11 Tower kernel: blk_update_request: critical target error, dev sdh, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#5249 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#5249 Sense Key : 0x7 [current] 
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#5249 ASC=0x20 ASCQ=0x2 
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#5249 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
Apr 19 15:24:11 Tower kernel: blk_update_request: critical target error, dev sdh, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 19 15:24:11 Tower kernel: Buffer I/O error on dev sdh, logical block 0, async page read
Apr 19 15:24:11 Tower emhttpd: error: ckmbr, 2197: Input/output error (5): read: /dev/sdh
Apr 19 15:24:11 Tower emhttpd: ckmbr error: -1

The iDRAC on this box (pre flash) used to show seven out of the eight disks as «secured». I wonder if there isn’t some lockdown that Dell has in place to ensure genuine disks, or something similar.
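
The kernel printed the sense data above only as numbers, but the sense key can be decoded by hand. A minimal sketch with a partial table of SCSI sense keys as defined in the SPC standard, written from memory, so treat it as an assumption; the function name is made up:

```shell
# Hypothetical helper: maps a SCSI sense key (as printed by the kernel, for
# example 0x7) to its name. Only a partial table; other keys print "unknown".
sense_key_name() {
    case "$1" in
        0x0) echo "NO SENSE" ;;
        0x1) echo "RECOVERED ERROR" ;;
        0x2) echo "NOT READY" ;;
        0x3) echo "MEDIUM ERROR" ;;
        0x4) echo "HARDWARE ERROR" ;;
        0x5) echo "ILLEGAL REQUEST" ;;
        0x6) echo "UNIT ATTENTION" ;;
        0x7) echo "DATA PROTECT" ;;
        0xb) echo "ABORTED COMMAND" ;;
        *)   echo "unknown" ;;
    esac
}
```

With this table, the «Sense Key : 0x7» in the log reads as DATA PROTECT, which would at least be consistent with the «secured» disks idea, though that remains a guess.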

Source

A friend of mine gave me his external 2TB Seagate HDD which appeared to be somewhat faulty.
And, it is indeed pretty faulty.

First, some background: I tried a lot of «common» commands, spent a few hours googling, tried both Linux and Windows (for chkdsk), and opened the HDD case to plug the drive directly into SATA. I do not need to recover the data; I just need to format the disk.

lsblk

NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda            8:0    0   1,8T  0 disk

Here, sda is the disk, and its size of 1,8T seems correct.

In GParted, the disk only appears to be ~1.9GB. I can create a partition table but I cannot create a valid partition. And even if I could, it could only be 1.9GB.

dd if=/dev/zero of=/dev/sda

dd: error writing '/dev/sda': No space left on device
3782129+0 records in
3782128+0 records out
1936449536 bytes (1,9 GB, 1,8 GiB) copied, 7,04022 s, 275 MB/s

smartctl -a /dev/sda

Read Device Identity failed: Invalid argument

parted -l

Error: Unable to open /dev/sda - unrecognised disk label.   
Model:  (file)                                                           
Disk /dev/sda : 1936MB
Sector size (logical/physical): 512B/512B
Partition table : unknown

dmesg

[ 7925.612174] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 7925.862625] sd 2:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 7931.193045] sd 2:0:0:0: [sda] 3809353968 512-byte logical blocks: (1.95 TB/1.77 TiB)
[ 7931.193049] sd 2:0:0:0: [sda] 4096-byte physical blocks
[ 7931.193313] sd 2:0:0:0: [sda] Write Protect is off
[ 7931.193316] sd 2:0:0:0: [sda] Mode Sense: 2f 00 00 00
[ 7931.193593] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 7931.193995] sd 2:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[ 7931.390515] sd 2:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 7931.390523] sd 2:0:0:0: [sda] tag#18 Sense Key : Illegal Request [current] 
[ 7931.390529] sd 2:0:0:0: [sda] tag#18 Add. Sense: Invalid command operation code
[ 7931.390536] sd 2:0:0:0: [sda] tag#18 CDB: Read(6) 08 00 00 00 08 00
[ 7931.390545] blk_update_request: critical target error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 7931.390558] Buffer I/O error on dev sda, logical block 0, async page read
[ 7931.500384] sd 2:0:0:0: [sda] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 7931.500451] sd 2:0:0:0: [sda] tag#19 Sense Key : Illegal Request [current] 
[ 7931.500461] sd 2:0:0:0: [sda] tag#19 Add. Sense: Invalid command operation code
[ 7931.500472] sd 2:0:0:0: [sda] tag#19 CDB: Read(6) 08 00 00 00 08 00

Do you have any idea? I guess the HDD may be dead, but I’m not quite sure.
What I find intriguing is the 1.8TB size with lsblk and 1.9GB elsewhere.
And again, I do not need to recover previous data (and since I did write a lot of 0’s, they’re probably gone for good :p). I just want to format the disk to make it usable again.
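
The two sizes can be checked against the numbers in the output above by multiplying block counts by the block size. A minimal sketch using shell arithmetic; the function name is made up:

```shell
# Hypothetical helper: prints block count times block size in bytes.
# $1 = number of blocks, $2 = block size in bytes.
blocks_to_bytes() {
    echo $(( $1 * $2 ))
}
```

`blocks_to_bytes 3809353968 512` (the block count from dmesg) gives 1950389231616 bytes, about 1.95 TB, while `blocks_to_bytes 3782128 512` (the records dd managed to write) gives 1936449536 bytes, about 1.9 GB. So the drive reports the large capacity to the kernel but only accepts writes up to the small one.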

Thanks for your time

Source

I am currently setting up a SAN for diskless boot. My backend consists of a ZFS volume (zVol) shared via iSCSI. So far everything is working just fine except for TRIM/UNMAP. For test purposes I set up two VMs running Ubuntu 20.04 in VirtualBox, networked together via an internal network with static IPv4 addresses. The target (tgt) got a second virtual drive formatted with ZFS. On this zpool I created a zVol and formatted it with GPT and ext4.

/etc/tgt/conf.d/iscsi.conf
<target example.com:lun1>
    <backing-store /dev/zvol/tank/iscsi_share>
        params thin_provisioning=1
    </backing-store>
    initiator-address 192.168.0.2
</target>

On the initiator (open-iscsi) I use these commands to provoke a TRIM operation:

sudo mount /dev/sdb1 /iscsi-share
sudo dd if=/dev/zero of=/iscsi-share/zero bs=1M count=512
sudo rm /iscsi-share/zero
sudo fstrim /iscsi-share

but the shell responds with «fstrim: /iscsi-share: the discard option is not supported». If I issue those commands on the target machine the «REFER» property of the zVol decreases as expected.

Searching the web, I found no hint as to why this is not working, or whether it is possible at all.
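
Whether the initiator's kernel considers the imported disk capable of discard can be read from sysfs: a discard_max_bytes of 0 generally means discard requests are not supported. A minimal sketch, assuming the standard /sys/block/<dev>/queue layout; the function name is hypothetical:

```shell
# Hypothetical helper: succeeds if the block device whose sysfs directory is
# given (for example /sys/block/sdb) advertises discard support.
discard_supported() {
    max=$(cat "$1/queue/discard_max_bytes" 2>/dev/null) || return 1
    [ "${max:-0}" -gt 0 ]
}
```

Usage: `discard_supported /sys/block/sdb && echo "discard available"`. The DISC-GRAN and DISC-MAX columns of `lsblk --discard /dev/sdb` should show the same information.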

Edit:
I got the advice to use the option thin_provisioning.

After I repartitioned the drive and mounted it on the initiator, I got the error message blk_update_request: critical target error, dev sdb, sector 23784 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
for several sectors, and after creating and deleting my test file, fstrim printed the messages

blk_update_request: I/O error, dev sdb, sector 68968 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
fstrim: iscsi-share: FITRIM ioctl failed: Input/output error

Edit:
As there were answers referring to LIO, I have now also tried targetcli. There I set up a target with my zVol under /backstores/block/iscsi and set the attribute emulate_tpu=1. After importing this into my initiator, I repartitioned, formatted and mounted it on the initiator. Then I created my test file, deleted it and issued the fstrim command, and it worked. Thanks for the help.
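
For reference, the backstore part of that setup can be sketched as two targetcli calls. The helper below only prints them so they can be reviewed first; the function name is made up, and creating the iSCSI target, LUN and ACL is deliberately left out because the post does not show those steps:

```shell
# Hypothetical helper: prints (does not run) the targetcli commands for a
# block backstore with UNMAP emulation (emulate_tpu) switched on.
# $1 = backstore name, $2 = backing device path.
lio_unmap_commands() {
    name=$1
    dev=$2
    printf 'targetcli /backstores/block create name=%s dev=%s\n' "$name" "$dev"
    printf 'targetcli /backstores/block/%s set attribute emulate_tpu=1\n' "$name"
}
```

Usage: `lio_unmap_commands iscsi /dev/zvol/tank/iscsi_share`, which matches the names used in this post.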

Source
