Ext4 fs error count since last fsck

Question

Mihail_Boyanskiy


Hi. I have two questions. First: what do these errors at router boot-up mean? Second: how do I fix them?

EXT4-fs (sda2): error count since last fsck: 64
Jun 17 17:29:43 kernel
EXT4-fs (sda2): initial error at time 1533467161: mb_free_blocks:1303: inode 110226: block 233994
Jun 17 17:29:43 kernel
EXT4-fs (sda2): last error at time 1623671029: ext4_mb_generate_buddy:756
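The numbers after `initial error at time` and `last error at time` are Unix timestamps (seconds since 1970-01-01 UTC). A minimal Python sketch that turns them into readable dates, assuming the device's clock was correct when each error was recorded:

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> str:
    """Format a Unix timestamp (seconds since 1970-01-01 00:00 UTC) as a UTC string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


# Values copied from the "initial error" and "last error" lines above.
for label, ts in [("initial error", 1533467161), ("last error", 1623671029)]:
    print(f"{label}: {epoch_to_utc(ts)}")
# initial error: 2018-08-05 11:06:01 UTC
# last error: 2021-06-14 11:43:49 UTC
```

Read that way, the 64 errors accumulated between August 2018 and a few days before this boot, and the counter has not been reset by fsck in that time.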


Source

Well that disk started as sdo:

Oct 14 20:07:28 Brunnhilde kernel: usb-storage 4-2.2:1.0: USB Mass Storage device detected
Oct 14 20:07:28 Brunnhilde kernel: scsi host10: usb-storage 4-2.2:1.0
Oct 14 20:07:29 Brunnhilde kernel: scsi 10:0:0:0: Direct-Access     Elite    Pro USB          0    PQ: 0 ANSI: 6
Oct 14 20:07:29 Brunnhilde kernel: sd 10:0:0:0: Attached scsi generic sg12 type 0
Oct 14 20:07:29 Brunnhilde kernel: sd 10:0:0:0: [sdo] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Oct 14 20:07:29 Brunnhilde kernel: sd 10:0:0:0: [sdo] Write Protect is off
Oct 14 20:07:29 Brunnhilde kernel: sd 10:0:0:0: [sdo] Mode Sense: 43 00 00 00
Oct 14 20:07:29 Brunnhilde kernel: sd 10:0:0:0: [sdo] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 14 20:07:33 Brunnhilde kernel: sdo: sdo1

Then it got disconnected and reconnected as sdp:

Oct 14 20:44:23 Brunnhilde kernel: usb 4-2.2: USB disconnect, device number 7
Oct 14 20:44:23 Brunnhilde kernel: blk_update_request: I/O error, dev sdo, sector 0
Oct 14 20:44:23 Brunnhilde kernel: sd 10:0:0:0: [sdo] Synchronizing SCSI cache
Oct 14 20:44:23 Brunnhilde kernel: sd 10:0:0:0: [sdo] Synchronize Cache(10) failed: Result: hostbyte=0x01 driverbyte=0x00
Oct 14 20:44:23 Brunnhilde rc.diskinfo[17670]: PHP Warning: Missing argument 2 for force_reload() in /etc/rc.d/rc.diskinfo on line 691
Oct 14 20:44:23 Brunnhilde rc.diskinfo[17670]: SIGHUP received, forcing refresh of disks info.
Oct 14 20:44:23 Brunnhilde kernel: usb 4-2: new SuperSpeed USB device number 8 using xhci_hcd
Oct 14 20:44:23 Brunnhilde kernel: hub 4-2:1.0: USB hub found
Oct 14 20:44:23 Brunnhilde kernel: hub 4-2:1.0: 4 ports detected
Oct 14 20:44:23 Brunnhilde kernel: usb 4-2.2: new SuperSpeed USB device number 9 using xhci_hcd
Oct 14 20:44:23 Brunnhilde kernel: usb-storage 4-2.2:1.0: USB Mass Storage device detected
Oct 14 20:44:23 Brunnhilde kernel: scsi host11: usb-storage 4-2.2:1.0
Oct 14 20:44:24 Brunnhilde rc.diskinfo[17670]: PHP Warning: Missing argument 2 for force_reload() in /etc/rc.d/rc.diskinfo on line 691
Oct 14 20:44:24 Brunnhilde rc.diskinfo[17670]: SIGHUP received, forcing refresh of disks info.
Oct 14 20:44:25 Brunnhilde kernel: scsi 11:0:0:0: Direct-Access     Elite    Pro USB          0    PQ: 0 ANSI: 6
Oct 14 20:44:25 Brunnhilde kernel: sd 11:0:0:0: Attached scsi generic sg12 type 0
Oct 14 20:44:25 Brunnhilde kernel: sd 11:0:0:0: [sdp] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Oct 14 20:44:25 Brunnhilde kernel: sd 11:0:0:0: [sdp] Write Protect is off
Oct 14 20:44:25 Brunnhilde kernel: sd 11:0:0:0: [sdp] Mode Sense: 43 00 00 00
Oct 14 20:44:25 Brunnhilde kernel: sd 11:0:0:0: [sdp] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 14 20:44:25 Brunnhilde kernel: sdp: sdp1

The same thing happened repeatedly, and each time it reconnected with a different letter.

Your log is full of USB disconnects, which is one of the reasons USB devices are not really recommended.

Edited November 5, 2017 by johnnie.black
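Each reconnect shows up as a new SCSI host, and the disk gets the next free `sdX` name, so anything configured with `/dev/sdo` stops matching after the first disconnect. A small Python sketch that pulls the (SCSI address, name) pairs out of a log like the one above; the regex is written against these lines only and is an assumption for other kernels. Mounting by UUID (or via `/dev/disk/by-uuid/`) avoids the name change, though not the disconnects themselves.

```python
import re

# Matches e.g. "sd 10:0:0:0: [sdo] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)".
# The kernel normally prints this capacity line when the disk is attached,
# so here it serves as a marker for each (re)connect.
CAPACITY_RE = re.compile(
    r"sd (?P<addr>\d+:\d+:\d+:\d+): \[(?P<name>sd[a-z]+)\] \d+ 512-byte logical blocks"
)


def attach_history(log_lines):
    """Return (SCSI address, kernel disk name) for every attach, in log order."""
    history = []
    for line in log_lines:
        m = CAPACITY_RE.search(line)
        if m:
            history.append((m["addr"], m["name"]))
    return history


log = [
    "Oct 14 20:07:29 Brunnhilde kernel: sd 10:0:0:0: [sdo] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)",
    "Oct 14 20:44:23 Brunnhilde kernel: usb 4-2.2: USB disconnect, device number 7",
    "Oct 14 20:44:25 Brunnhilde kernel: sd 11:0:0:0: [sdp] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)",
]
print(attach_history(log))  # [('10:0:0:0', 'sdo'), ('11:0:0:0', 'sdp')]
```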

Source

I have an 8 TB disk attached via USB 3 and formatted into three ext3 partitions, which I use as a backup drive (it's plugged into a SATA cradle).

The disk has been attached and mounted for several days without being explicitly written to (I backed up some data a couple of days ago).

I happened to take a look at dmesg and spotted the following (this is filtered to show only entries matching the disk name, sdg):

[393945.628890] EXT4-fs (sdg2): error count since last fsck: 4
[393945.628894] EXT4-fs (sdg2): initial error at time 1589268773: ext4_validate_block_bitmap:406
[393945.628897] EXT4-fs (sdg2): last error at time 1589336019: ext4_validate_block_bitmap:406
[394076.698059] EXT4-fs (sdg1): error count since last fsck: 103
[394076.698063] EXT4-fs (sdg1): initial error at time 1589216157: ext4_validate_block_bitmap:406
[394076.698066] EXT4-fs (sdg1): last error at time 1589372294: ext4_lookup:1590: inode 186081476

I haven't run fsck on this disk since it was partitioned and formatted. Given that fsck has not been run, what is finding these errors, and how concerned should I be?
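Each summary names the device and the error count and then, for the first and the most recent error, gives the Unix time, the kernel function and source line that detected the problem, and the inode/block when one was involved. A hedged Python sketch that splits these lines into fields; the regular expressions are inferred from the lines quoted in this thread, not from any kernel specification:

```python
import re

COUNT_RE = re.compile(r"EXT4-fs \((?P<dev>[^)]+)\): error count since last fsck: (?P<count>\d+)")
DETAIL_RE = re.compile(
    r"EXT4-fs \((?P<dev>[^)]+)\): (?P<which>initial|last) error at time (?P<time>\d+): "
    r"(?P<func>\w+):(?P<line>\d+)(?:: inode (?P<inode>\d+))?(?:: block (?P<block>\d+))?"
)


def parse_summary(lines):
    """Group ext4 error-summary lines by device."""
    summary = {}
    for text in lines:
        m = COUNT_RE.search(text)
        if m:
            summary.setdefault(m["dev"], {})["count"] = int(m["count"])
            continue
        m = DETAIL_RE.search(text)
        if m:
            summary.setdefault(m["dev"], {})[m["which"]] = {
                "time": int(m["time"]),
                "func": m["func"],
                "line": int(m["line"]),
                # inode/block are only present when the error involved one
                "inode": int(m["inode"]) if m["inode"] else None,
                "block": int(m["block"]) if m["block"] else None,
            }
    return summary


dmesg = [
    "[394076.698059] EXT4-fs (sdg1): error count since last fsck: 103",
    "[394076.698063] EXT4-fs (sdg1): initial error at time 1589216157: ext4_validate_block_bitmap:406",
    "[394076.698066] EXT4-fs (sdg1): last error at time 1589372294: ext4_lookup:1590: inode 186081476",
]
for dev, info in parse_summary(dmesg).items():
    print(dev, info["count"], info["initial"]["func"], info["last"]["func"], info["last"]["inode"])
# sdg1 103 ext4_validate_block_bitmap ext4_lookup 186081476
```

The `function:line` pair points into the ext4 source of the running kernel, so the line numbers are only meaningful for that kernel version.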

When I rebooted the system this morning, I checked dmesg again and found the following (again filtered to show only entries matching sdg):

[  261.721822] sd 9:0:0:0: [sdg] Spinning up disk...
[  274.051062] sd 9:0:0:0: [sdg] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
[  274.051065] sd 9:0:0:0: [sdg] 4096-byte physical blocks
[  274.051137] sd 9:0:0:0: [sdg] Write Protect is off
[  274.051140] sd 9:0:0:0: [sdg] Mode Sense: 43 00 00 00
[  274.051297] sd 9:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  274.051498] sd 9:0:0:0: [sdg] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[  274.134309]  sdg: sdg1 sdg2 sdg3
[  274.135296] sd 9:0:0:0: [sdg] Attached SCSI disk
[  274.654835] EXT4-fs (sdg3): mounting ext3 file system using the ext4 subsystem
[  274.696860] EXT4-fs (sdg3): warning: mounting fs with errors, running e2fsck is recommended
[  274.766709] EXT4-fs (sdg1): mounting ext3 file system using the ext4 subsystem
[  274.795109] EXT4-fs (sdg1): warning: mounting fs with errors, running e2fsck is recommended
[  274.825210] EXT4-fs (sdg2): mounting ext3 file system using the ext4 subsystem
[  274.891191] EXT4-fs (sdg2): warning: mounting fs with errors, running e2fsck is recommended
[  275.713323] EXT4-fs (sdg2): mounted filesystem with ordered data mode. Opts: (null)
[  276.460528] EXT4-fs (sdg3): mounted filesystem with ordered data mode. Opts: (null)
[  276.499085] EXT4-fs (sdg1): mounted filesystem with ordered data mode. Opts: (null)
[  578.549827] EXT4-fs (sdg1): error count since last fsck: 103
[  578.549830] EXT4-fs (sdg1): initial error at time 1589216157: ext4_validate_block_bitmap:406
[  578.549832] EXT4-fs (sdg1): last error at time 1589372294: ext4_lookup:1590: inode 186081476
[  578.549836] EXT4-fs (sdg3): error count since last fsck: 47
[  578.549837] EXT4-fs (sdg3): initial error at time 1589268525: htree_dirblock_to_tree:1022: inode 31604737: block 126419458
[  578.549840] EXT4-fs (sdg3): last error at time 1589380312: ext4_lookup:1594: inode 33701921
[  578.549844] EXT4-fs (sdg2): error count since last fsck: 4
[  578.549845] EXT4-fs (sdg2): initial error at time 1589268773: ext4_validate_block_bitmap:406
[  578.549847] EXT4-fs (sdg2): last error at time 1589336019: ext4_validate_block_bitmap:406
[  639.938843] EXT4-fs (sdg1): mounting ext3 file system using the ext4 subsystem
[  640.950738] EXT4-fs (sdg1): mounted filesystem with ordered data mode. Opts: (null)
[  650.900006] EXT4-fs (sdg2): mounting ext3 file system using the ext4 subsystem
[  651.207658] EXT4-fs (sdg2): mounted filesystem with ordered data mode. Opts: (null)
[  658.836040] EXT4-fs (sdg3): mounting ext3 file system using the ext4 subsystem
[  659.084558] EXT4-fs (sdg3): mounted filesystem with ordered data mode. Opts: (null)

So the system knows there are errors and has still mounted the disk without displaying any warnings other than the entries in dmesg.
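As far as I can tell from the ext4 source (`fs/ext4/super.c`), nothing external is finding these: the kernel itself increments an error counter in the on-disk superblock and records the first and last failing function whenever one of its own checks (such as `ext4_validate_block_bitmap`) trips, and fsck is what resets it. If the counter is non-zero at mount time, the kernel arms a timer that prints the summary roughly five minutes later and then about once a day, which is why the same lines keep reappearing. The dmesg above is consistent with that. A quick check of the gap, where the 300-second figure is my reading of the kernel source, not something the log itself shows:

```python
# dmesg uptimes (seconds since boot) copied from the log above.
SDG1_MOUNTED = 276.499085  # "EXT4-fs (sdg1): mounted filesystem with ordered data mode"
SDG1_SUMMARY = 578.549827  # "EXT4-fs (sdg1): error count since last fsck: 103"

# Assumption from reading fs/ext4/super.c: first summary about 5 minutes after mount.
FIRST_SUMMARY_DELAY_S = 5 * 60


def looks_like_first_summary(mounted: float, summary: float, slack_s: float = 15.0) -> bool:
    """True if `summary` came roughly FIRST_SUMMARY_DELAY_S after `mounted`.

    The slack absorbs timer lateness and the short gap between the kernel arming
    the timer during mount and printing the "mounted filesystem" line.
    """
    return abs((summary - mounted) - FIRST_SUMMARY_DELAY_S) <= slack_s


gap = SDG1_SUMMARY - SDG1_MOUNTED
print(f"{gap:.1f} s", looks_like_first_summary(SDG1_MOUNTED, SDG1_SUMMARY))  # 302.1 s True
```

The Proxmox log further down shows the same pattern in wall-clock time: mounted at 09:54:49, summary at 09:59:49.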

Roughly 30 minutes later, now curious, I checked again and found:

[  955.353027] EXT4-fs (sdg2): error count since last fsck: 3248
[  955.353031] EXT4-fs (sdg2): initial error at time 1589268773: ext4_validate_block_bitmap:406
[  955.353033] EXT4-fs (sdg2): last error at time 1589437923: ext4_map_blocks:604: inode 103686210: block 1947002998
[  955.353039] EXT4-fs (sdg1): error count since last fsck: 103
[  955.353040] EXT4-fs (sdg1): initial error at time 1589216157: ext4_validate_block_bitmap:406
[  955.353042] EXT4-fs (sdg1): last error at time 1589372294: ext4_lookup:1590: inode 186081476
[  956.751484] EXT4-fs error (device sdg2): ext4_map_blocks:604: inode #103686210: block 1947002998: comm updatedb.mlocat: lblock 12 mapped to illegal pblock 1947002998 (length 1)
[  956.767496] EXT4-fs error (device sdg2): ext4_map_blocks:604: inode #103686210: block 1947002998: comm updatedb.mlocat: lblock 12 mapped to illegal pblock 1947002998 (length 1)
[  956.782683] EXT4-fs warning (device sdg2): htree_dirblock_to_tree:994: inode #103686210: lblock 12: comm updatedb.mlocat: error -117 reading directory block
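Kernel messages report failures as negative errno values, so `error -117` is errno 117. A tiny lookup sketch; the table holds the generic Linux numbers used on x86 and ARM (a few other architectures number EUCLEAN differently) and is hard-coded so the answer doesn't depend on the machine running the script. As far as I know, ext4 uses EUCLEAN under the name EFSCORRUPTED, i.e. "on-disk structures are corrupted", which fits the `illegal pblock` lines just before it.

```python
# Generic Linux errno numbers (include/uapi/asm-generic/errno-base.h and errno.h).
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    117: ("EUCLEAN", "Structure needs cleaning"),
}


def describe_kernel_error(code: int) -> str:
    """Describe a kernel error such as -117; the sign is dropped for the lookup."""
    name, text = LINUX_ERRNO.get(abs(code), ("unknown", "not in this small table"))
    return f"{code}: {name} ({text})"


print(describe_kernel_error(-117))  # -117: EUCLEAN (Structure needs cleaning)
```

EUCLEAN rather than EIO suggests the read itself succeeded and ext4 rejected what came back; on its own that doesn't rule out a disk, cable, or USB bridge returning wrong data.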

Eeek! The error count has increased for sdg2!

Again, I haven't explicitly written to the disk all this time.

Before partitioning & formatting the drive with gparted I used fsck to run a bad block scan (took several days) and no errors were found. This is also a new disk. For this reason, I’m reasonably confident that the hardware is good.

What is possibly going on here? How worried should I be about the integrity of filesystems on this disk? What should my next steps be?

Source

Hi kind people,

First of all: I do not have physical access to the Pi, just SSH.

The problem:
I have a Pi 3B+ booting from a 1 TB USB HDD with Stretch.
Someone there unplugged the power and the Pi was gone for some days. Now it's back online, but I found some kernel messages in kern.log:

Jul 8 06:25:51 kernel: [56256.560073] EXT4-fs error (device sda2): ext4_lookup:1578: inode #1516286: comm updatedb.mlocat: deleted inode referenced: 1517646
Jul 8 06:25:51 kernel: [56256.575081] EXT4-fs error (device sda2): ext4_lookup:1578: inode #1516286: comm updatedb.mlocat: deleted inode referenced: 1517644
Jul 8 06:25:51 kernel: [56256.586311] EXT4-fs error (device sda2): ext4_lookup:1578: inode #1516286: comm updatedb.mlocat: deleted inode referenced: 1517648
Jul 8 06:25:54 kernel: [56259.776821] EXT4-fs error (device sda2): ext4_lookup:1578: inode #37191: comm updatedb.mlocat: deleted inode referenced: 1517629
Jul 9 06:25:03 kernel: [142608.634548] EXT4-fs error (device sda2): ext4_lookup:1578: inode #1516286: comm updatedb.mlocat: deleted inode referenced: 1517646
Jul 9 06:25:03 kernel: [142608.656301] EXT4-fs error (device sda2): ext4_lookup:1578: inode #1516286: comm updatedb.mlocat: deleted inode referenced: 1517644
Jul 9 06:25:03 kernel: [142608.667414] EXT4-fs error (device sda2): ext4_lookup:1578: inode #1516286: comm updatedb.mlocat: deleted inode referenced: 1517648
Jul 9 06:25:03 kernel: [142608.898870] EXT4-fs error (device sda2): ext4_lookup:1578: inode #37191: comm updatedb.mlocat: deleted inode referenced: 1517629
Jul 9 06:56:14 kernel: [144479.464526] EXT4-fs (sda2): error count since last fsck: 8
Jul 9 06:56:14 kernel: [144479.464535] EXT4-fs (sda2): initial error at time 1562559951: ext4_lookup:1578: inode 1516286
Jul 9 06:56:14 kernel: [144479.464545] EXT4-fs (sda2): last error at time 1562646303: ext4_lookup:1578: inode 37191
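The eight `ext4_lookup` lines are only four distinct problems: two directories (inodes 1516286 and 37191) that still contain entries pointing at deleted inodes, hit once per night when updatedb (`comm updatedb.mlocat`) walks the tree. Four problems times two nightly runs gives the `error count since last fsck: 8` in the summary, and the summary's timestamps (1562559951 and 1562646303) are 04:25:51 UTC on Jul 8 and 04:25:03 UTC on Jul 9, i.e. the first and last of these lines if the Pi's clock is on UTC+2. A Python sketch that groups the lines; the regex is based on the messages above only:

```python
import re
from collections import Counter

LOOKUP_RE = re.compile(
    r"EXT4-fs error \(device (?P<dev>\w+)\): ext4_lookup:\d+: inode #(?P<dir>\d+): "
    r"comm (?P<comm>[^:]+): deleted inode referenced: (?P<ref>\d+)"
)


def deleted_inode_refs(lines):
    """Count (directory inode, deleted inode it still references) pairs."""
    pairs = Counter()
    for line in lines:
        m = LOOKUP_RE.search(line)
        if m:
            pairs[(int(m["dir"]), int(m["ref"]))] += 1
    return pairs


# The four distinct messages from kern.log above; each appeared on Jul 8 and again on Jul 9.
messages = [
    "EXT4-fs error (device sda2): ext4_lookup:1578: inode #1516286: comm updatedb.mlocat: deleted inode referenced: 1517646",
    "EXT4-fs error (device sda2): ext4_lookup:1578: inode #1516286: comm updatedb.mlocat: deleted inode referenced: 1517644",
    "EXT4-fs error (device sda2): ext4_lookup:1578: inode #1516286: comm updatedb.mlocat: deleted inode referenced: 1517648",
    "EXT4-fs error (device sda2): ext4_lookup:1578: inode #37191: comm updatedb.mlocat: deleted inode referenced: 1517629",
]
pairs = deleted_inode_refs(messages * 2)  # two nightly updatedb runs
print(sum(pairs.values()), len(pairs))    # 8 4
```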

I have no idea why it's back online (whether it was doing something while booting or someone reconnected the power).

What's the best practice now?

The internet says running fsck on sda2 is a bad idea because it's mounted as /.

Adding fsck.mode=force to cmdline.txt is suggested, but the official docs don't say anything about this parameter.

I asked in the chat, and someone said that since it's not documented, he would run fsck even with the partition mounted. To be honest, I don't feel good about that, so I'm asking for some help here.

My current cmdline.txt is the following:

Code:

dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=PARTUUID=0862402d-02 rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait ipv6.disable=1
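`fsck.mode=` and `fsck.repair=` are kernel command-line options read by systemd's systemd-fsck (they are documented in the systemd-fsck@.service man page rather than in the Raspberry Pi docs), and Stretch boots with systemd. As far as I know, the root filesystem is checked during boot before it is remounted read-write, which sidesteps the "fsck on a mounted /" problem; remember to remove the option again afterwards, or every boot will run a full check. Because cmdline.txt must stay a single line, here is a small Python sketch for appending an option safely. The `/boot/cmdline.txt` path in the note below is where Raspbian Stretch keeps it and is an assumption for other images.

```python
def add_kernel_param(cmdline: str, param: str) -> str:
    """Append `param` (e.g. "fsck.mode=force") to a one-line kernel command line.

    Whitespace is normalised to single spaces and the result ends in exactly one
    newline, because the Pi firmware expects cmdline.txt to be a single line.
    If an option with the same key is already present, the line is left as is.
    """
    tokens = cmdline.split()
    key = param.split("=", 1)[0]
    if all(tok.split("=", 1)[0] != key for tok in tokens):
        tokens.append(param)
    return " ".join(tokens) + "\n"


current = (
    "dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=PARTUUID=0862402d-02 "
    "rootfstype=ext4 elevator=deadline fsck.repair=yes rootwait ipv6.disable=1\n"
)
print(add_kernel_param(current, "fsck.mode=force"), end="")
```

To apply it, read `/boot/cmdline.txt`, pass the contents through `add_kernel_param`, write the result back (keep a copy of the original first) and reboot; take the option out the same way once the check has run.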

Thank you in advance!

Source

Thanks. I tried that but I’m still getting the same error.

Nov 13 09:54:49 pve kernel: EXT4-fs (dm-9): warning: mounting fs with errors, running e2fsck is recommended
Nov 13 09:54:49 pve kernel: EXT4-fs (dm-9): mounted filesystem with ordered data mode. Opts: (null)
…
Nov 13 09:59:49 pve kernel: EXT4-fs (dm-9): error count since last fsck: 4
Nov 13 09:59:49 pve kernel: EXT4-fs (dm-9): initial error at time 1478905387: ext4_journal_check_start:56
Nov 13 09:59:49 pve kernel: EXT4-fs (dm-9): last error at time 1478908776: ext4_put_super:813

It’s a pretty new SSD drive, not that that means anything…

root@pve:~# fdisk -l

Disk /dev/ram0: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram1: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram2: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram3: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram4: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram5: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram6: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram7: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram8: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram9: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram10: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram11: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram12: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram13: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram14: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/ram15: 64 MiB, 67108864 bytes, 131072 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sda: 119.2 GiB, 128035676160 bytes, 250069680 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F6E39634-6FBF-4BCE-ADDD-F30AB03DA489

Device         Start       End   Sectors   Size Type
/dev/sda1         34      2047      2014  1007K BIOS boot
/dev/sda2       2048    262143    260096   127M EFI System
/dev/sda3     262144 250069646 249807503 119.1G Linux LVM

Disk /dev/mapper/pve-root: 29.8 GiB, 31943819264 bytes, 62390272 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/pve-swap: 7 GiB, 7516192768 bytes, 14680064 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/pve-vm--100--state--Oct_2016: 4.5 GiB, 4819255296 bytes, 9412608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disk /dev/mapper/pve-vm--100--state--Nov_2016: 4.5 GiB, 4819255296 bytes, 9412608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disk /dev/mapper/pve-vm--102--disk--1: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disk /dev/mapper/pve-vm--101--disk--1: 18 GiB, 19327352832 bytes, 37748736 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disk /dev/mapper/pve-vm--103--disk--1: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disk /dev/mapper/pve-vm--100--disk--1: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: dos
Disk identifier: 0x0002290a

Device                             Boot    Start      End  Sectors  Size Id Type
/dev/mapper/pve-vm--100--disk--1p1 *        2048   718848   716801  350M 83 Linux
/dev/mapper/pve-vm--100--disk--1p2        718849  4913152  4194304    2G 82 Linux swap / Solaris
/dev/mapper/pve-vm--100--disk--1p3       4913153  7010304  2097152    1G 83 Linux
/dev/mapper/pve-vm--100--disk--1p4       7010305 67108863 60098559 28.7G  f W95 Ext'd (LBA)
/dev/mapper/pve-vm--100--disk--1p5       7010306 27076608 20066303  9.6G 83 Linux
/dev/mapper/pve-vm--100--disk--1p6      27076610 38340608 11263999  5.4G 83 Linux
/dev/mapper/pve-vm--100--disk--1p7      38340610 64704512 26363903 12.6G 83 Linux
/dev/mapper/pve-vm--100--disk--1p8      64704514 66648064  1943551  949M 83 Linux

Partition 3 does not start on physical sector boundary.

Partition 4 does not start on physical sector boundary.

Partition 5 does not start on physical sector boundary.

Partition 6 does not start on physical sector boundary.

Partition 7 does not start on physical sector boundary.

Partition 8 does not start on physical sector boundary.

Partition 9 does not start on physical sector boundary.

Disk /dev/mapper/pve-vm--100--state--Nov_2016_2: 4.5 GiB, 4819255296 bytes, 9412608 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes

Source

Nicias
Guru

Joined: 06 Dec 2005
Posts: 446

Posted: Sun Dec 17, 2017 6:40 pm Post subject: EXT errors

I've been getting these errors every so often:

Code:

Dec 09 22:07:17 [kernel] [784634.848851] EXT4-fs (sda3): error count since last fsck: 1
Dec 09 22:07:17 [kernel] [784634.848855] EXT4-fs (sda3): initial error at time 1512259322: ext4_mb_generate_buddy:758
Dec 09 22:07:17 [kernel] [784634.848859] EXT4-fs (sda3): last error at time 1512259322: ext4_mb_generate_buddy:758

and earlier:

Code:

Nov 29 11:19:58 [kernel] [1552192.479561] EXT4-fs (sda3): error count since last fsck: 710
Nov 29 11:19:58 [kernel] [1552192.479565] EXT4-fs (sda3): initial error at time 1511004226: ext4_mb_complex_scan_group:1972
Nov 29 11:19:58 [kernel] [1552192.479569] EXT4-fs (sda3): last error at time 1511596639: ext4_mb_generate_buddy:758

Code:

Nov 28 10:51:58 [kernel] [1464112.095562] EXT4-fs (sda3): error count since last fsck: 710
Nov 28 10:51:58 [kernel] [1464112.095566] EXT4-fs (sda3): initial error at time 1511004226: ext4_mb_complex_scan_group:1972
Nov 28 10:51:58 [kernel] [1464112.095570] EXT4-fs (sda3): last error at time 1511596639: ext4_mb_generate_buddy:758

Is this a sign of a bad disk? A bad motherboard? Something else?

This is in a fairly old laptop (it has a Core 2 Duo) with an SSD that is about 6 years old.

Any suggestions about how to proceed would be helpful.

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 51961
Location: 56N 3W

Posted: Sun Dec 17, 2017 6:47 pm Post subject:

Nicias,

Maybe all of these things, maybe none of them.

What other errors are there in dmesg?

Put it all on a pastebin site please.

The output of

Code:

smartctl -a /dev/sda

would be useful.

Don’t run fsck unless you have a known good set of backups.

fsck often makes a bad situation worse.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-

those that do backups

those that have never had a hard drive fail.

Nicias
Guru

Joined: 06 Dec 2005
Posts: 446

Posted: Mon Dec 18, 2017 11:48 am Post subject:

Got a ton of new errors in the last day:

https://pastebin.com/s7KVxKmR

But SMART looks fine.

Code:

# smartctl -a /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.12.12-gentoo] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 840 Series
Serial Number:    S14CNEACA81371V
LU WWN Device Id: 5 002538 55002d356
Add. Product Id:  00000000
Firmware Version: DXT06B0Q
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Dec 18 06:47:20 2017 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 240) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       24961
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       2075
177 Wear_Leveling_Count     0x0013   094   094   000    Pre-fail  Always       -       52
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   067   053   000    Old_age   Always       -       33
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       214
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       2562291420

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1035         -
# 2  Extended offline    Completed without error       00%      1023         -
# 3  Short offline       Completed without error       00%      1000         -
# 4  Short offline       Completed without error       00%       999         -
# 5  Short offline       Completed without error       00%       975         -
# 6  Short offline       Completed without error       00%       951         -
# 7  Short offline       Completed without error       00%       927         -
# 8  Short offline       Completed without error       00%       903         -
# 9  Short offline       Completed without error       00%       869         -
#10  Extended offline    Completed without error       00%       855         -
#11  Short offline       Completed without error       00%       832         -
#12  Short offline       Completed without error       00%       831         -
#13  Short offline       Completed without error       00%       807         -
#14  Short offline       Completed without error       00%       783         -
#15  Short offline       Completed without error       00%       759         -
#16  Short offline       Completed without error       00%       735         -
#17  Short offline       Completed without error       00%       711         -
#18  Extended offline    Completed without error       00%       687         -
#19  Short offline       Completed without error       00%       664         -
#20  Short offline       Completed without error       00%       663         -
#21  Short offline       Completed without error       00%       633         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I have a known good backup.
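For a quick read of the attribute table, the RAW_VALUE column of the error-type attributes says more than the normalized VALUE. A Python sketch over a few of the rows above (with smartctl's `-` column restored). Which attributes count as "error" ones is my own choice, and treating Total_LBAs_Written as 512-byte units is an assumption that, as far as I know, holds for Samsung SATA drives:

```python
# A subset of the attribute rows above (ID, name, flag, value, worst, thresh,
# type, updated, when_failed, raw value).
SMART_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033 100 100 010 Pre-fail Always - 0
  9 Power_On_Hours          0x0032 095 095 000 Old_age  Always - 24961
183 Runtime_Bad_Block       0x0013 100 100 010 Pre-fail Always - 0
187 Uncorrectable_Error_Cnt 0x0032 100 100 000 Old_age  Always - 0
199 CRC_Error_Count         0x003e 100 100 000 Old_age  Always - 0
241 Total_LBAs_Written      0x0032 099 099 000 Old_age  Always - 2562291420
"""

# My own pick of attributes whose raw value counts errors; not a smartctl notion.
ERROR_ATTRS = {"Reallocated_Sector_Ct", "Runtime_Bad_Block", "Uncorrectable_Error_Cnt", "CRC_Error_Count"}


def parse_raw_values(text: str) -> dict:
    """Map attribute name -> integer RAW_VALUE (the 10th whitespace-separated field)."""
    raw = {}
    for row in text.splitlines():
        fields = row.split()
        if len(fields) >= 10 and fields[0].isdigit():
            raw[fields[1]] = int(fields[9])
    return raw


attrs = parse_raw_values(SMART_ROWS)
nonzero_errors = {name: attrs[name] for name in ERROR_ATTRS if attrs.get(name)}
written_tb = attrs["Total_LBAs_Written"] * 512 / 1e12  # assumes 512-byte units
print(nonzero_errors, f"{written_tb:.2f} TB written")  # {} 1.31 TB written
```

All four error counters are zero and roughly 1.3 TB has been written, which is consistent with "SMART looks fine".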

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 51961
Location: 56N 3W

Posted: Mon Dec 18, 2017 12:15 pm Post subject:

Nicias,

There are no underlying drive errors in dmesg or in smartctl.

If you know your backup is good, remake the filesystem and restore the backup. If the backup was made with those filesystem errors, you don't know that it's good, even if it seems to be.

Try fsck, but be warned that all it does is make the filesystem metadata self-consistent. It may trash your user data in the process of fixing the metadata. That's because in the face of missing or conflicting information, it guesses, and it can guess incorrectly. All the bits that fsck doesn't know what to do with end up in /lost+found, which should always be empty.

You cannot fsck a mounted partition.
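To respect "you cannot fsck a mounted partition" from a script, one option is to check `/proc/mounts` first and only ever run a read-only check: `e2fsck -f -n`, where `-f` forces a check even if the filesystem is marked clean and `-n` opens it read-only and answers "no" to every question. A sketch under those assumptions; it compares device paths literally, so aliases such as `/dev/root` or `/dev/disk/by-uuid/` links are not resolved (`findmnt` from util-linux handles that better). The mount table below is hypothetical.

```python
def mounted_sources(mounts_text: str) -> set:
    """The first field of each /proc/mounts line is the mounted device (or source)."""
    return {line.split()[0] for line in mounts_text.splitlines() if line.strip()}


def readonly_fsck_argv(device: str, mounts_text: str) -> list:
    """Return an argv for a read-only e2fsck run, refusing if `device` is mounted."""
    if device in mounted_sources(mounts_text):
        raise RuntimeError(f"{device} is mounted; unmount it or boot from other media first")
    return ["e2fsck", "-f", "-n", device]


# Hypothetical mount table in /proc/mounts format.
sample_mounts = """\
/dev/sda3 / ext4 rw,relatime 0 0
/dev/sdb1 /mnt/backup ext4 rw,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
"""

print(readonly_fsck_argv("/dev/sdc1", sample_mounts))  # ['e2fsck', '-f', '-n', '/dev/sdc1']
```

On a real system you would pass the contents of `/proc/mounts` and hand the list to `subprocess.run()`. e2fsck exits with status 4 when it leaves errors uncorrected, which with `-n` means repairs would be needed.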

Nicias
Guru

Joined: 06 Dec 2005
Posts: 446

Posted: Mon Dec 18, 2017 3:22 pm Post subject:

I ran fsck as recently as a couple of weeks ago (from a SystemRescueCd USB stick).

This is just the system drive; it has no actual data on it, so I'm not worried about data loss. I'd clobber the whole thing and do a reinstall, except for the time that would take. I'll wipe the disk and reinstall from the last backup.

Why would these errors keep popping up?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 51961
Location: 56N 3W

Posted: Mon Dec 18, 2017 3:49 pm Post subject:

Nicias,

Unclean shutdowns, PSU problems of some sort. Maybe even RAM issues. It's worth a few cycles of memtest86.

Be aware that memtest86 uses most of the rest of the system, so not all errors reported by memtest are due to RAM.

Nicias
Guru

Joined: 06 Dec 2005
Posts: 446

Posted: Wed Dec 20, 2017 3:06 am Post subject:

Memtest ran for 24 hours and found no errors. fsck found a ton of errors on sda3; sdb1 had no errors. sda is an internal SATA SSD; sdb1 is externally (USB) attached spinning rust.

Any suggestions? Bad drive? Bad motherboard/controller?

Ant P.
Watchman

Joined: 18 Apr 2009
Posts: 6920

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 51961
Location: 56N 3W

Posted: Wed Dec 20, 2017 11:10 am Post subject:

Nicias,

Bad SSD firmware?

Do you use trim/discard?

There are SSDs with problem firmware where trim can erase the wrong things.

There is one famous example where LBA 0 (the boot sector) would be trimmed, making the system impossible to boot.

--- edit ---

Hmm...

Code:

Device Model: Samsung SSD 840 Series

Let's just add that that device has some history.

Nicias
Guru

Joined: 06 Dec 2005
Posts: 446

Posted: Wed Dec 20, 2017 12:47 pm Post subject:

I don’t have trim or discard set. So it seems like maybe I should get a new drive :/

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 51961
Location: 56N 3W

Posted: Wed Dec 20, 2017 1:26 pm Post subject:

Nicias,

I wouldn’t go that far yet.

You have

Code:

Model Family: Samsung based SSDs

Device Model: Samsung SSD 840 Series

with

Code:

Firmware Version: DXT06B0Q

Is there a newer firmware?

What does it fix?

This tool may help. It's probably Windows only.

In increasing order of risk:

1. It's worth doing nothing and seeing whether the problems recur.
2. It's worth making a new filesystem and restoring from backup. That will issue a trim command to the entire partition at the start of mke2fs. If the backup is not known to be good, it may not help.
3. Reinstall after making a new filesystem.
4. Very last: update the drive firmware, if there is an update.

Nicias
Guru

Joined: 06 Dec 2005
Posts: 446

Posted: Wed Dec 20, 2017 8:31 pm Post subject:

The backup is file-level, not filesystem-level. Doesn't that mean that if it is an accurate copy of a bad filesystem, it will just have some files that are screwed up, not a corrupted filesystem? So if I reformat the drive, recreate the filesystem and restore from backup, then I might just have some bad files, not a bad filesystem. In that case, would doing an emerge -e world (and recompiling the kernel) fix those files?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 51961
Location: 56N 3W

Posted: Wed Dec 20, 2017 9:02 pm Post subject:

Nicias,

That's correct. You run the risk that something important like glibc is broken, so you won't be able to boot, or that something in the toolchain is broken, so you won't be able to build packages. However, challenges like that can be fixed if they arise. You would have noticed both of those particular examples already, but you get the idea. It's possible that the restored backup will not work as expected.

Nicias
Guru

Joined: 06 Dec 2005
Posts: 446

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 51961
Location: 56N 3W

Posted: Wed Dec 20, 2017 9:56 pm Post subject:

Nicias,

Good luck!

Nicias
Guru

Joined: 06 Dec 2005
Posts: 446

Posted: Fri Dec 22, 2017 2:29 pm Post subject:

So far everything is running smoothly. I reformatted and reinstalled from backup, then rebuilt the toolchain, kernel, and world. Now I'm doing the gcc upgrade for PIE (and a world rebuild); no fs errors yet. When this world rebuild is done, I'll reboot to a live USB to check for fs errors.

In terms of trim/discard, it seems like best practice is to do that via a cron job. Is this correct?

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 51961
Location: 56N 3W

Posted: Fri Dec 22, 2017 5:58 pm Post subject:

Nicias,

There are divided opinions on the use of trim/discard. Once you issue a trim command, your data can be removed by the drive at any time. There is generally no possibility of recovering data from trimmed space. If that might matter to you, run fstrim manually when you are sure you won't want anything back.

Beware that some drives take a long time to become ready after an fstrim. I have one that takes over 10 minutes. If they are online, that's fine; if you reboot, you might get a fright, as the drive seems to have failed.

Personally, I have the discard option in /etc/fstab, but only the installed system is on the SSD. /home is on rotating rust, so trim/discard does not apply.

Nicias
Guru

Joined: 06 Dec 2005
Posts: 446

Posted: Fri Dec 22, 2017 8:20 pm Post subject:

NeddySeagoon,

There is no data on the SSD here either, so I set up a daily fstrim cron job. Thanks for all of your help.

After it finished emerging, I restarted from a thumb drive and checked the filesystems: no errors. Hopefully this fixes it.

-Nick

NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 51961
Location: 56N 3W

Posted: Fri Dec 22, 2017 11:31 pm Post subject:

Nicias,

We don't know what happened, so we cannot take any steps to stop it happening again. All you can do is watch for the errors recurring.

Jaglover
Watchman

Joined: 29 May 2005
Posts: 8291
Location: Saint Amant, Acadiana

Nicias
Guru

Joined: 06 Dec 2005
Posts: 446


Source
