As I can’t reopen #703 so it’s more «visible», I open a new thread and copy the last post. I apologize if this isn’t the right way of doing this.
I think this issue have risen again, the only difference I see is that the log message changed from «end_request» to «blk_update_request», but behaviour seems the same:
Apr 14 13:00:39 raspberrypi kernel: [ 561.226469] blk_update_request: critical target error, dev sda, sector 404211137
Apr 14 13:00:39 raspberrypi kernel: [ 561.226579] Aborting journal on device sda2-8.
Apr 14 13:00:40 raspberrypi kernel: [ 563.043697] EXT4-fs error (device sda2): ext4_journal_check_start:56: Detected aborted journal
Apr 14 13:00:40 raspberrypi kernel: [ 563.057031] EXT4-fs (sda2): Remounting filesystem read-only
Apr 14 13:00:51 raspberrypi kernel: [ 574.083665] EXT4-fs error (device sda2): ext4_put_super:789: Couldn't clean up the journal
If I try to fsck:
Apr 14 12:45:24 raspberrypi kernel: [ 1603.653290] EXT4-fs error (device sda2): ext4_put_super:789: Couldn't clean up the journal
Apr 14 12:45:30 raspberrypi kernel: [ 1610.138494] blk_update_request: critical target error, dev sda, sector 0
Apr 14 12:45:30 raspberrypi kernel: [ 1610.197294] blk_update_request: critical target error, dev sda, sector 0
Apr 14 12:45:30 raspberrypi kernel: [ 1610.203760] blk_update_request: critical target error, dev sda, sector 0
Apr 14 12:46:41 raspberrypi kernel: [ 1681.278773] blk_update_request: critical target error, dev sda, sector 0
Apr 14 12:46:41 raspberrypi kernel: [ 1681.282161] blk_update_request: critical target error, dev sda, sector 0
Apr 14 12:46:41 raspberrypi kernel: [ 1681.283998] blk_update_request: critical target error, dev sda, sector 0
Based on my tests, last working commit is Hexxeh/rpi-firmware@f74b921
Broken from Hexxeh/rpi-firmware@cad071a and onwards.
First off: It is NOT your fault. It just shows that updates, without backups, are dangerous on ANY OS and no matter how often it worked before.
I had exactly the same problem today on Debian 9.
A whole ext3 RAID1 «vanished» after kernel was updated from:
linux-image-4.9.0-11-amd64 4.9.189-3+deb9u2
to
linux-image-4.9.0-12-amd64 4.9.210-1
list all installed kernels
dpkg --list | grep linux-image
ii linux-image-4.9.0-11-amd64 4.9.189-3+deb9u2 amd64 Linux 4.9 for 64-bit PCs
ii linux-image-4.9.0-12-amd64 4.9.210-1 amd64 Linux 4.9 for 64-bit PCs
rc linux-image-4.9.0-6-amd64 4.9.88-1+deb9u1 amd64 Linux 4.9 for 64-bit PCs
rc linux-image-4.9.0-8-amd64 4.9.144-3.1 amd64 Linux 4.9 for 64-bit PCs
ii linux-image-4.9.0-9-amd64 4.9.168-1+deb9u3 amd64 Linux 4.9 for 64-bit PCs
ii linux-image-amd64 4.9+80+deb9u10 amd64 Linux for 64-bit PCs (meta-package)
hostnamectl; # os used
Static hostname: storagepc
Icon name: computer-desktop
Chassis: desktop
Operating System: Debian GNU/Linux 9 (stretch)
Kernel: Linux 4.9.0-12-amd64
Architecture: x86-64
Those are the kind of «heart attack» moments X-D
Let’s try to stay cool!
«solution»: boot previous kernel ( in this case: linux-image-4.9.0-11-amd64 )
vim /etc/default/grub
GRUB_TIMEOUT=3 <- make sure a timeout larger than 0 is defined (or no time to select any options during boot)
# let grub2 do its stuff
update-grub
# is the same as:
uupdate-grub2
# reboot the system (if USB keyboard is not reacting during grub boot screen, try PS2 keyboard)
reboot
# when grub boot screen appears
After booting linux-image-4.9.0-11-amd64 kernel, can access ext3 RAID1 AGAIN!
Problem: grub won’t remember that choice.
To make this permanent:
vim /etc/default/grub
# during boot:
## select in the first menu the second (0,1) entry
#### then select in the second menu select the 3rd entry (0,1,2)
GRUB_DEFAULT="1>2"
# make grub2 realize the changes
update-grub
… yes it is confusing I know X-D
this is what it was supposed to look like
Have two RAID1 defined.
# show status of raid
cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md126 : active raid1 sdc1[1] sdb1[0]
3906886464 blocks super 1.2 [2/2] [UU]
bitmap: 0/30 pages [0KB], 65536KB chunk
md127 : active raid1 sde1[0] sdd1[2]
1953381376 blocks super 1.2 [2/2] [UU]
bitmap: 0/15 pages [0KB], 65536KB chunk
# show what is mounted
mount
/dev/md126 on /media/user/ext4RAID1 type ext4 (rw,relatime,errors=remount-ro,data=ordered)
/dev/md127 on /media/user/ext3RAID1 type ext3 (rw,relatime,data=ordered)
# show block devices
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 238.5G 0 disk
├─sda1 8:1 0 230.8G 0 part /
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 7.7G 0 part [SWAP]
sdb 8:16 0 3.7T 0 disk
└─sdb1 8:17 0 3.7T 0 part
└─md126 9:126 0 3.7T 0 raid1 /media/user/ext4RAID1
sdc 8:32 0 3.7T 0 disk
└─sdc1 8:33 0 3.7T 0 part
└─md126 9:126 0 3.7T 0 raid1 /media/user/ext4RAID1
sdd 8:48 0 1.8T 0 disk
└─sdd1 8:49 0 1.8T 0 part
└─md127 9:127 0 1.8T 0 raid1 /media/user/ext3RAID1
sde 8:64 0 1.8T 0 disk
└─sde1 8:65 0 1.8T 0 part
└─md127 9:127 0 1.8T 0 raid1 /media/user/ext3RAID1
sr0 11:0 1 1024M 0 rom
# find defined raids
mdadm --examine --scan
ARRAY /dev/md/2 metadata=1.2 UUID=90642755:fa191325:0fe4ec59:2456c645 name=storagepc:2
ARRAY /dev/md/1 metadata=1.2 UUID=433fb7e1:9d7f3f17:bc5ee18b:0f4eeb52 name=storagepc:1
# show UUIDS
blkid /dev/sdb1
/dev/sdb1: UUID="90642755-fa19-1325-0fe4-ec592456c645" UUID_SUB="bee458e0-509a-c110-b577-8a1ddbe6bbb3" LABEL="storagepc:2" TYPE="linux_raid_member" PARTUUID="1fd02041-9dd2-4918-83a3-c8bafbab3bed"
blkid /dev/sdc1
/dev/sdc1: UUID="90642755-fa19-1325-0fe4-ec592456c645" UUID_SUB="7d5947f8-1ba0-0c7b-18a7-194ab4051a2c" LABEL="storagepc:2" TYPE="linux_raid_member" PARTUUID="5e4ea781-68e5-43f0-accf-26342aeb4daa"
userblkid /dev/sdd1
/dev/sdd1: UUID="433fb7e1-9d7f-3f17-bc5e-e18b0f4eeb52" UUID_SUB="bed17780-3817-27c9-6336-44d4aedfb857" LABEL="storagepc:1" TYPE="linux_raid_member" PARTUUID="f6aab6c2-01"
userblkid /dev/sde1
/dev/sde1: UUID="433fb7e1-9d7f-3f17-bc5e-e18b0f4eeb52" UUID_SUB="eb90b361-94d6-2f38-7727-d386097dce81" LABEL="storagepc:1" TYPE="linux_raid_member" PARTUUID="d2fd127f-01"
regular filesystem checks
Has nothing to do with the problem but defining this via tune2fs
has the advantage, that it will automatically be performed during boot.
tune2fs -C 2 -c 1 /dev/sda1; # check filesystem on every boot (for ext3 takes rather long X-D)
tune2fs -c 10 -i 30 /dev/sda1; # check sda1 every 10 mounts or after 30 days
I just ordered a new server with a 1TB Samsung SSD. Installed Ubuntu 14.04.5 LTS.
After booting into the newly installed system, I see this in my dmesg and /var/lib/syslog. Output of grep error /var/log/syslog
:
May 12 03:47:34 lf5 kernel: [ 0.373789] HEST: Enabling Firmware First mode for corrected errors.
May 12 03:47:34 lf5 kernel: [ 10.382147] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [ 10.382152] res 40/00:e0:f8:69:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 10.712517] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [ 10.712521] res 40/00:d0:38:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 11.119541] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:47:34 lf5 kernel: [ 11.119545] res 40/00:40:30:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 11.526336] ata8.00: irq_stat 0x08000008, interface fatal error
May 12 03:47:34 lf5 kernel: [ 11.526341] res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 11.526345] res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 11.526348] res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 11.526351] res 40/00:60:40:01:7c/00:00:5f:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:47:34 lf5 kernel: [ 21.349950] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 03:51:10 lf5 kernel: [ 0.389787] HEST: Enabling Firmware First mode for corrected errors.
May 12 03:51:10 lf5 kernel: [ 10.906423] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [ 10.906429] res 40/00:80:08:00:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [ 11.488276] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [ 11.488281] res 40/00:c0:28:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [ 11.960792] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [ 11.960796] res 40/00:b8:b0:01:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [ 12.366482] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 03:51:10 lf5 kernel: [ 12.366486] res 40/00:60:e0:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 03:51:10 lf5 kernel: [ 20.918620] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 17:07:19 lf5 kernel: [ 0.390011] HEST: Enabling Firmware First mode for corrected errors.
May 12 17:07:19 lf5 kernel: [ 10.349119] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [ 10.349124] res 40/00:88:a8:6d:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [ 10.738449] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [ 10.738453] res 40/00:20:60:6b:70/00:00:74:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [ 11.072972] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [ 11.072976] res 40/00:60:50:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [ 11.471777] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:07:19 lf5 kernel: [ 11.471781] res 40/00:48:c8:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:07:19 lf5 kernel: [ 20.651217] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
May 12 17:18:16 lf5 kernel: [ 0.389808] HEST: Enabling Firmware First mode for corrected errors.
May 12 17:18:17 lf5 kernel: [ 10.762352] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:18:17 lf5 kernel: [ 10.762360] res 40/00:40:08:03:00/00:00:00:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 11.338565] res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [ 11.338569] res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [ 11.338572] res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [ 11.338576] res 40/00:b8:20:01:7c/00:00:5f:00:00/40 Emask 0x1 (device error)
May 12 17:18:17 lf5 kernel: [ 20.087229] res 41/84:08:b8:14:7d/00:00:63:00:00/00 Emask 0x410 (ATA bus error) <F>
May 12 17:18:17 lf5 kernel: [ 20.298295] ata8.00: error: { ICRC ABRT }
May 12 17:18:17 lf5 kernel: [ 21.176551] sd 7:0:0:0: [sda] tag#0 Add. Sense: Scsi parity error
May 12 17:18:17 lf5 kernel: [ 21.316632] blk_update_request: I/O error, dev sda, sector 1669074520
May 12 17:18:17 lf5 kernel: [ 21.542013] ata8.00: irq_stat 0x08000000, interface fatal error
May 12 17:18:17 lf5 kernel: [ 21.759477] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 22.052681] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 22.347138] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 22.642363] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 22.938868] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 23.239764] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 23.542336] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 23.840288] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 24.138769] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 24.439063] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 24.740494] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 25.047057] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 25.354884] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 25.662079] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 25.967498] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 26.273208] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 26.579035] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 26.884890] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 27.190868] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 27.496523] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 27.801825] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 28.106876] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 28.412223] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 28.717662] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 29.022620] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 29.326675] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 29.629826] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 29.932271] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 30.234666] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 30.537024] res 40/00:e8:78:02:7c/00:00:63:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 31.765128] blk_update_request: I/O error, dev sda, sector 1669071496
May 12 17:18:17 lf5 kernel: [ 32.143969] blk_update_request: I/O error, dev sda, sector 1669071504
May 12 17:18:17 lf5 kernel: [ 32.527171] blk_update_request: I/O error, dev sda, sector 1669071512
May 12 17:18:17 lf5 kernel: [ 32.915371] blk_update_request: I/O error, dev sda, sector 1669071544
May 12 17:18:17 lf5 kernel: [ 33.308218] blk_update_request: I/O error, dev sda, sector 1669071552
May 12 17:18:17 lf5 kernel: [ 33.706503] blk_update_request: I/O error, dev sda, sector 1669071520
May 12 17:18:17 lf5 kernel: [ 34.108892] blk_update_request: I/O error, dev sda, sector 1669071528
May 12 17:18:17 lf5 kernel: [ 34.516541] blk_update_request: I/O error, dev sda, sector 1669071536
May 12 17:18:17 lf5 kernel: [ 34.929267] blk_update_request: I/O error, dev sda, sector 1669071368
May 12 17:18:17 lf5 kernel: [ 35.347838] blk_update_request: I/O error, dev sda, sector 1669071376
May 12 17:18:17 lf5 kernel: [ 36.004437] res 41/04:a8:90:d2:89/00:00:5f:00:00/00 Emask 0x401 (device error) <F>
May 12 17:18:17 lf5 kernel: [ 36.257143] ata8.00: error: { ABRT }
May 12 17:18:17 lf5 kernel: [ 37.681581] ata8.00: irq_stat 0x08000008, interface fatal error
May 12 17:18:17 lf5 kernel: [ 37.681586] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681590] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681593] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681596] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681599] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681602] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681605] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681608] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681611] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681615] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681618] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681621] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681624] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681627] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681630] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681633] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681636] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681639] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681642] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681645] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681649] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681652] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 37.681655] res 40/00:b8:e0:04:bc/00:00:70:00:00/40 Emask 0x52 (ATA bus error)
May 12 17:18:17 lf5 kernel: [ 38.005003] blk_update_request: I/O error, dev sda, sector 1891370112
May 12 17:18:17 lf5 kernel: [ 38.005009] blk_update_request: I/O error, dev sda, sector 1891370120
May 12 17:18:17 lf5 kernel: [ 38.005013] blk_update_request: I/O error, dev sda, sector 1891370128
May 12 17:18:17 lf5 kernel: [ 38.005017] blk_update_request: I/O error, dev sda, sector 1891370136
May 12 17:18:17 lf5 kernel: [ 38.005021] blk_update_request: I/O error, dev sda, sector 1891370144
May 12 17:18:17 lf5 kernel: [ 38.005025] blk_update_request: I/O error, dev sda, sector 1891370152
May 12 17:18:17 lf5 kernel: [ 38.005029] blk_update_request: I/O error, dev sda, sector 1891370160
May 12 17:18:17 lf5 kernel: [ 38.005032] blk_update_request: I/O error, dev sda, sector 1891370168
May 12 17:18:17 lf5 kernel: [ 38.005036] blk_update_request: I/O error, dev sda, sector 1891370176
May 12 17:18:17 lf5 kernel: [ 38.005040] blk_update_request: I/O error, dev sda, sector 1891370184
May 12 17:18:17 lf5 kernel: [ 49.093973] EXT4-fs (sda2): re-mounted. Opts: errors=remount-ro
I am mostly concerned about these entries: blk_update_request: I/O error, dev sda, sector xxxxxxxxxxx
I ran badblocks -v /dev/sda
which returned no errors.
I then ran smartctl --all /dev/sda
, which also returned no errors. See output below. This one includes a short self test
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.4.0-31-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: Samsung SSD 850 EVO 1TB
Serial Number: S3PHNF0JC00710K
LU WWN Device Id: 5 002538 d428254a0
Firmware Version: EMT03B6Q
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sat May 12 19:08:22 2018 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 249) Self-test routine in progress...
90% of test remaining.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x53) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 512) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 8
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 31
177 Wear_Leveling_Count 0x0013 100 100 000 Pre-fail Always - 0
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 099 010 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 069 067 000 Old_age Always - 31
195 Hardware_ECC_Recovered 0x001a 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 099 099 000 Old_age Always - 20
235 Unknown_Attribute 0x0012 099 099 000 Old_age Always - 25
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 55078112
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 8 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
255 116055040 116120575 Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
My question is simple: What do you think might be wrong? The SSD should be brand new. It’s hard for me, in good conscience, to put this server into production with those errors in the logs. And the box is otherwise acting normal.
It is odd… this array was working well in Windows on the pre-flashed Perc h710p, I’m not saying you’re wrong but it would be very coincidental for the cables to go between switching OSs. Plus the Preclean scripts had no issues erasing the disks nor writing zeros across all sectors.
When I tail the /var/log/syslog and attempt to add a disk to the array, there is some output with this block seeming to be the most relevant:
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#3435 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#3435 Sense Key : 0x7 [current]
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#3435 ASC=0x20 ASCQ=0x2
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#3435 CDB: opcode=0x28 28 00 00 00 00 00 00 00 20 00
Apr 19 15:24:11 Tower kernel: blk_update_request: critical target error, dev sdh, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#5249 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#5249 Sense Key : 0x7 [current]
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#5249 ASC=0x20 ASCQ=0x2
Apr 19 15:24:11 Tower kernel: sd 1:0:6:0: [sdh] tag#5249 CDB: opcode=0x28 28 00 00 00 00 00 00 00 08 00
Apr 19 15:24:11 Tower kernel: blk_update_request: critical target error, dev sdh, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Apr 19 15:24:11 Tower kernel: Buffer I/O error on dev sdh, logical block 0, async page read
Apr 19 15:24:11 Tower emhttpd: error: ckmbr, 2197: Input/output error (5): read: /dev/sdh
Apr 19 15:24:11 Tower emhttpd: ckmbr error: -1
The iDRAC on this box (pre flash) used to show seven out of the eight disks as «secured», I wonder if there isn’t some lockdown that Dell has in place that ensures genuine disks or something similar.
A friend of mine gave me his external 2TB Seagate HDD which appeared to be somewhat faulty.
And, it is indeed pretty faulty.
First, I did try a lot of «common» commands, spent a few hours googling stuff, tried Linux and Windows (for chkdsk), opened the HDD case to plug it directly in SATA and I’ll add that I do not need to recover the data, I just need to format it.
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1,8T 0 disk
Here, sda
is the disk, its size, 1,8T seems correct.
In GParted, the disk only appears to be ~1.9GB. I can create a partition table but I cannot create a valid partition. And even if I could, it could only be 1.9GB.
dd if=/dev/zero of=/dev/sda
dd: error writing '/dev/sda': No space left on device
3782129+0 records in
3782128+0 records out
1936449536 bytes (1,9 GB, 1,8 GiB) copied, 7,04022 s, 275 MB/s
smartctl -a /dev/sda
Read Device Identity failed: Invalid argument
parted -l
Error: Unable to open /dev/sda - unrecognised disk label.
Model: (file)
Disk /dev/sda : 1936MB
Sector size (logical/physical): 512B/512B
Partition table : unknown
dmesg
[ 7925.612174] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 7925.862625] sd 2:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
[ 7931.193045] sd 2:0:0:0: [sda] 3809353968 512-byte logical blocks: (1.95 TB/1.77 TiB)
[ 7931.193049] sd 2:0:0:0: [sda] 4096-byte physical blocks
[ 7931.193313] sd 2:0:0:0: [sda] Write Protect is off
[ 7931.193316] sd 2:0:0:0: [sda] Mode Sense: 2f 00 00 00
[ 7931.193593] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 7931.193995] sd 2:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[ 7931.390515] sd 2:0:0:0: [sda] tag#18 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 7931.390523] sd 2:0:0:0: [sda] tag#18 Sense Key : Illegal Request [current]
[ 7931.390529] sd 2:0:0:0: [sda] tag#18 Add. Sense: Invalid command operation code
[ 7931.390536] sd 2:0:0:0: [sda] tag#18 CDB: Read(6) 08 00 00 00 08 00
[ 7931.390545] blk_update_request: critical target error, dev sda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 7931.390558] Buffer I/O error on dev sda, logical block 0, async page read
[ 7931.500384] sd 2:0:0:0: [sda] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 7931.500451] sd 2:0:0:0: [sda] tag#19 Sense Key : Illegal Request [current]
[ 7931.500461] sd 2:0:0:0: [sda] tag#19 Add. Sense: Invalid command operation code
[ 7931.500472] sd 2:0:0:0: [sda] tag#19 CDB: Read(6) 08 00 00 00 08 00
Do you have any idea? I guess the HDD may be dead, but I’m not quite sure.
What I find intriguing is the 1.8TB size with lsblk
and 1.9GB elsewhere.
And again, I do not need to recover previous data (and since I did write a lot of 0’s, they’re probably gone for good :p). I just want to format the disk to make it usable again.
Thanks for your time
I am currently setting up a SAN for diskless boot. My backend consists of ZFS-Vol shared via iSCSI. So far everything is working just fine except for TRIM/UNMAP. For test puposes I setup two VMs running Ubuntu20.04 in VirtualBox networked together via an internal network with static IPv4 addresses. On the target (tgt) got a second virtual drive formatted with ZFS. On this zpool I created a zVol and formatted it with GPT and ext4.
/etc/tgt/conf.d/iscsi.conf
<target example.com:lun1>
<backing-store /dev/zvol/tank/iscsi_share>
params thin_provisioning=1
</backing-store>
initiator-address 192.168.0.2
</target>
On the initiator (open-iscsi) I use this command to provoke a TRIM operation:
sudo mount /dev/sdb1 /iscsi-share
sudo dd if=/dev/zero of=/iscsi-share/zero bs=1M count=512
sudo rm /iscsi-share/zero
sudo fstrim /iscsi-share
but the shell responds with «fstrim: /iscsi-share: the discard option is not supported». If I issue those commands on the target machine the «REFER» property of the zVol decreases as expected.
As I found nothing while searching the web I found no hint as to why this is not working or if this is even possible at all.
Edit:
As I got the advice to use the option thin_provisioning.
After I repartitioned the drive and mounted it on the initiator I got error message blk_update_request: critical target error, dev sdb, sector 23784 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
for several sectors and after creating and deleting my testfile, fstrim
send the message
blk_update_request: I/O error, dev sdb, sector 68968 op 0x3:(DISCARD) flags 0x800 phys_seg 1 prio class 0
fstrim: iscsi-share: FITRIM ioctl failed: Input/output error
Edit:
As there were Answers refering to LIO I now also tried targetcli. There I setup a target with my zVol under /backstores/block/iscsi and set attribute emultate_tpu=1
. After importing this into my initiator I repartitioned, formatted and mounted it on the initiator. Then I created my test file, deletetd it and issued the fstrim
command and it worked. Thanks for the help.