Linux mce hardware error

My system was working fine, but after the changes I made in the past two days I started getting this problem.

The changes I made: I added a 2 GB 800 MHz RAM stick and replaced my old HDD with a new SSD two days ago, then installed Linux on it. Yesterday there was also a kernel update in the update manager.

How the problem started: after the update and 3 or 4 reboots, the system froze while I was installing Wine from the terminal, and I had to do a hard reboot. Once Wine was installed, it froze again while I was installing Docky from Ubuntu 18.04’s repository.

The third and fourth times it froze right after booting, when I launched Firefox.

The fifth time it happened while I was editing a post on the Linux Mint forum.

After some googled suggestions I ran smartctl. Can you tell me whether it’s a system error or whether the SSD is broken?

The smartctl results are in the given link.

I also checked the SSD on which the system is installed for bad sectors.

So are the problems due to the SSD, the kernel, or something else?

Here are the results:

$ sudo fdisk -l
[sudo] password for mihir:     
Disk /dev/sda: 223.58 GiB, 240057409536 bytes, 468862128 sectors
Disk model: WDC WDS240G2G0A-
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x662c6f2a

Device     Boot   Start       End   Sectors   Size Id Type
/dev/sda1  *       2048   1050623   1048576   512M  b W95 FAT32
/dev/sda2       1052670 468860927 467808258 223.1G  5 Extended
/dev/sda5       1052672 468860927 467808256 223.1G 83 Linux


Disk /dev/sdb: 465.78 GiB, 500107862016 bytes, 976773168 sectors
Disk model: ST500LT012-9WS14
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xb540dc3c

Device     Boot Start       End   Sectors   Size Id Type
/dev/sdb1        2048 976771071 976769024 465.8G 83 Linux

$ sudo badblocks -v /dev/sda1 > badsectors.txt
Checking blocks 0 to 524287
Checking for bad blocks (read-only test): done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

$ sudo badblocks -v /dev/sda2 > badsectors.txt
Checking blocks 0 to 0
Checking for bad blocks (read-only test): done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

$ sudo badblocks -v /dev/sda5 > badsectors.txt
Checking blocks 0 to 233904127
Checking for bad blocks (read-only test): done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)


Good night, LOR. I was writing code in Android Studio as usual when ABRT popped up with the message “A problem occurred in kernel package”. I opened it in ABRT itself, and at the bottom was this note:

The kernel log indicates that hardware errors were detected.
This is most likely not a software problem.

That alarmed me. I had only just gone to Google when ABRT reported another error of the same kind, and both have the status “Can’t be reported”.
I looked at dmesg, and the last lines there are:

[28201.933132] kvm [12089]: vcpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd
[30362.987032] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
[30722.332510] mce: [Hardware Error]: Machine check events logged
[30782.289185] mce: [Hardware Error]: Machine check events logged

So my questions are: what was that? What is the actual root cause, and where should I dig? I am paranoid because this is a new system, assembled a couple of weeks ago. I tested the RAM and found no errors, and I looked at the SSD’s S.M.A.R.T. data, which shows nothing suspicious either. Help me figure this out, LOR.

UPD: I found the MCE log, but what does it mean?

Apr 20 22:07:01 workstation.localdomain mcelog[750]: Hardware event. This is not a software error.
Apr 20 22:07:01 workstation.localdomain mcelog[750]: MCE 0
Apr 20 22:07:01 workstation.localdomain mcelog[750]: CPU 3 BANK 0
Apr 20 22:07:01 workstation.localdomain mcelog[750]: TIME 1429549621 Mon Apr 20 22:07:01 2015
Apr 20 22:07:01 workstation.localdomain mcelog[750]: MCG status:
Apr 20 22:07:01 workstation.localdomain mcelog[750]: MCi status:
Apr 20 22:07:01 workstation.localdomain mcelog[750]: Corrected error
Apr 20 22:07:01 workstation.localdomain mcelog[750]: Error enabled
Apr 20 22:07:01 workstation.localdomain mcelog[750]: MCA: Internal parity error
Apr 20 22:07:01 workstation.localdomain mcelog[750]: STATUS 90000040000f0005 MCGSTATUS 0
Apr 20 22:07:01 workstation.localdomain mcelog[750]: MCGCAP c09 APICID 6 SOCKETID 0
Apr 20 22:07:01 workstation.localdomain mcelog[750]: CPUID Vendor Intel Family 6 Model 60
Apr 20 22:08:01 workstation.localdomain mcelog[750]: Hardware event. This is not a software error.
Apr 20 22:08:01 workstation.localdomain mcelog[750]: MCE 0
Apr 20 22:08:01 workstation.localdomain mcelog[750]: CPU 0 BANK 0
Apr 20 22:08:01 workstation.localdomain mcelog[750]: TIME 1429549681 Mon Apr 20 22:08:01 2015
Apr 20 22:08:01 workstation.localdomain mcelog[750]: MCG status:
Apr 20 22:08:01 workstation.localdomain mcelog[750]: MCi status:
Apr 20 22:08:01 workstation.localdomain mcelog[750]: Corrected error
Apr 20 22:08:01 workstation.localdomain mcelog[750]: Error enabled
Apr 20 22:08:01 workstation.localdomain mcelog[750]: MCA: Internal parity error
Apr 20 22:08:01 workstation.localdomain mcelog[750]: STATUS 90000040000f0005 MCGSTATUS 0
Apr 20 22:08:01 workstation.localdomain mcelog[750]: MCGCAP c09 APICID 0 SOCKETID 0
Apr 20 22:08:01 workstation.localdomain mcelog[750]: CPUID Vendor Intel Family 6 Model 60

UPD2:
I googled this: http://unix.stackexchange.com/questions/165222/mce-error-mca-internal-parity-…
Near the bottom they mention a connection between KVM, 32-bit, and this error.
The Android emulator happens to be 32-bit, but I am not yet sure whether that is the cause.
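
The STATUS value in the mcelog output above can also be read by hand. Here is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS bit layout described in the Intel SDM. The function name `decode_status` and the returned dictionary keys are made up for illustration and are not part of mcelog.

```python
# Flag bits of IA32_MCi_STATUS (architectural layout, assumed from the Intel SDM).
FLAGS = {
    63: "VAL (register holds a valid error)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its main fields (hypothetical helper)."""
    return {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        # UC clear means the hardware corrected the error by itself.
        "corrected": ((status >> 61) & 1) == 0,
        # Bits 52:38 count corrected errors (meaningful when MCG_CAP reports CMCI support).
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    info = decode_status(0x90000040000F0005)  # STATUS from the log above
    for key, value in info.items():
        print(key, value)
```

For the value from the log, only VAL and EN are set, UC is clear, and the low 16 bits are 0x0005, which agrees with the "Corrected error", "Error enabled" and "Internal parity error" lines that mcelog printed.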


Machine Check Exception Error Linux

I recently reviewed the ODROID-H2 with Ubuntu 19.04, and noticed some error messages in the kernel log of the Intel Celeron J4105 single board computer while running the SBC-Bench benchmark:

[180422.405294] mce: [Hardware Error]: Machine check events logged
[180425.656449] mce: [Hardware Error]: Machine check events logged
[180483.582825] mce_notify_irq: 17 callbacks suppressed
[180483.582827] mce: [Hardware Error]: Machine check events logged
[180484.991484] mce: [Hardware Error]: Machine check events logged
[180594.700684] mce_notify_irq: 13 callbacks suppressed
[180594.700686] mce: [Hardware Error]: Machine check events logged
[180858.202115] mce: [Hardware Error]: Machine check events logged
[181178.047031] mce: [Hardware Error]: Machine check events logged
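
To get a quick feel for how often this happens, the kernel log lines can be tallied with a few lines of Python. This is only a sketch: the function name `summarize_mce_lines` is made up, and it assumes the log has been saved as text (for example the output of dmesg). The "callbacks suppressed" lines come from kernel rate limiting, so the suppressed total hints that more notifications occurred than were printed.

```python
import re

LOGGED = re.compile(r"mce: \[Hardware Error\]: Machine check events logged")
SUPPRESSED = re.compile(r"mce_notify_irq: (\d+) callbacks suppressed")


def summarize_mce_lines(text: str) -> dict:
    """Count printed MCE notifications and rate-limited (suppressed) callbacks."""
    logged = 0
    suppressed = 0
    for line in text.splitlines():
        if LOGGED.search(line):
            logged += 1
        match = SUPPRESSED.search(line)
        if match:
            suppressed += int(match.group(1))
    return {"logged": logged, "suppressed": suppressed}


SAMPLE = """\
[180422.405294] mce: [Hardware Error]: Machine check events logged
[180425.656449] mce: [Hardware Error]: Machine check events logged
[180483.582825] mce_notify_irq: 17 callbacks suppressed
[180483.582827] mce: [Hardware Error]: Machine check events logged
[180484.991484] mce: [Hardware Error]: Machine check events logged
[180594.700684] mce_notify_irq: 13 callbacks suppressed
[180594.700686] mce: [Hardware Error]: Machine check events logged
[180858.202115] mce: [Hardware Error]: Machine check events logged
[181178.047031] mce: [Hardware Error]: Machine check events logged
"""

if __name__ == "__main__":
    print(summarize_mce_lines(SAMPLE))  # {'logged': 7, 'suppressed': 30}
```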

I did not know what to make of those errors, but I was told I would get more details with mcelog, which can be installed as follows:

sudo apt install mcelog

There’s just one little problem: it’s not in the Ubuntu 19.04 repository, and a bug report mentions that mcelog is now deprecated and has been removed from Ubuntu 18.04 Bionic onwards. Instead, we’re told the functionality of the mcelog package has been replaced by rasdaemon.

But before looking into the utilities, let’s find out what Machine Check Exception (MCE) is all about from ArchLinux Wiki:

A machine check exception (MCE) is an error generated by the CPU when the CPU detects that a hardware error or failure has occurred.

Machine check exceptions (MCEs) can occur for a variety of reasons ranging from undesired or out-of-spec voltages from the power supply, from cosmic radiation flipping bits in memory DIMMs or the CPU, or from other miscellaneous faults, including faulty software triggering hardware errors.

Hardware errors should probably be taken seriously. Let’s investigate how to run the tools. First, I tried to install mcelog from Ubuntu 16.04:

wget http://archive.ubuntu.com/ubuntu/pool/universe/m/mcelog/mcelog_128+dfsg-1_amd64.deb
sudo dpkg -i mcelog_128+dfsg-1_amd64.deb

Oh good! It could install… Let’s run some commands:

sudo mcelog
[sudo] password for odroid:
mcelog: Family 6 Model 7a CPU: only decoding architectural errors
mcelog: warning: 32 bytes ignored in each record
mcelog: consider an update
odroid@ODROIDH2:~$ sudo mcelog --client
Memory errors
SOCKET 1 CHANNEL 5 DIMM 0
DMI_NAME "A1_DIMM0" DMI_LOCATION "A1_BANK0"
corrected memory errors:
0 total
0 in 24h
uncorrected memory errors:
0 total
0 in 24h
SOCKET 1 CHANNEL 5 DIMM 1
DMI_NAME "A1_DIMM1" DMI_LOCATION "A1_BANK1"
corrected memory errors:
0 total
0 in 24h
uncorrected memory errors:
0 total
0 in 24h

Nothing interesting shows up here, but the file /var/log/mcelog is now up, and we can see details about the errors:

cat /var/log/mcelog
mcelog: Family 6 Model 7a CPU: only decoding architectural errors
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 1 TSC bd2ee6710
TIME 1563095601 Sun Jul 14 16:13:21 2019
MCG status:
MCi status:
Corrected error
Error enabled
Threshold based error status: green
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level2 Generic Error
STATUS 902000460082110a MCGSTATUS 0
MCGCAP c07 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 122
...
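
The "Generic CACHE Level2 Generic Error" and "green" lines can be cross-checked against the raw STATUS value. The sketch below assumes the compound "cache hierarchy" error-code format from the Intel SDM (bit pattern 000F 0001 RRRR TTLL in the low 16 bits) and the threshold-based status in bits 54:53. The function name and the short labels are illustrative and do not reproduce mcelog's own strings.

```python
# Field encodings of the cache hierarchy error code (assumed from the Intel SDM).
TRANSACTION = {0: "INSTRUCTION", 1: "DATA", 2: "GENERIC"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "GENERIC"}
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
THRESHOLD = {0: "no tracking", 1: "green", 2: "yellow", 3: "reserved"}


def decode_cache_error(status: int):
    """Decode a cache hierarchy MCA code; return None for other error types."""
    code = status & 0xFFFF
    # Ignore the F bit (bit 12) when matching the 000F 0001 xxxx xxxx pattern.
    if (code & 0xEF00) != 0x0100:
        return None
    return {
        "corrected_filtering": bool(code & 0x1000),  # F bit
        "request": REQUEST.get((code >> 4) & 0xF, "unknown"),
        "transaction": TRANSACTION.get((code >> 2) & 0x3, "reserved"),
        "level": LEVEL[code & 0x3],
        # Bits 54:53 (meaningful when MCG_CAP reports threshold-based status support).
        "threshold": THRESHOLD[(status >> 53) & 0x3],
    }


if __name__ == "__main__":
    print(decode_cache_error(0x902000460082110A))  # STATUS from the log above
```

For the STATUS shown above this yields a generic request, a generic transaction type, cache level 2, the corrected-filtering bit set and a green threshold status, which matches what mcelog decoded.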

But let’s also try the recommended rasdaemon to see if we can get similar details.

Installation:

sudo apt install rasdaemon

It looks like the service will not start automatically upon installation, so a reboot may be needed, or simply run the following command:

sudo systemctl start rasdaemon

I ran a few commands and at first, it looked like some driver may be needed:

ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: HARDKERNEL model ODROIDH2
sudo ras-mc-ctl --status
ras-mc-ctl: drivers not loaded.

This should be related to EDAC drivers that are used for ECC memory according to a thread on Grokbase. Gemini Lake processors do not support ECC memory, so I probably don’t need it.

Running one more command to show the summary of errors, and we’re getting somewhere:

sudo ras-mc-ctl --summary
No Memory errors.
No PCIe AER errors.
No Extlog errors.
MCE records summary:
12 corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error errors

That’s 12 corrected errors related to the L2 cache. We can get the full details with the appropriate command:

sudo ras-mc-ctl --errors

No Memory errors.

No PCIe AER errors.

No Extlog errors.

MCE events:

1 20190715 20:41:09 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x942000460082110a, addr=0x243e9f840, tsc=0x8b99a7f84108, walltime=0x5d2c8276, cpuid=0x000706a1, bank=0x00000001

2 20190716 01:34:09 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x942000460082110a, addr=0x24b9df840, tsc=0xa38afb430944, walltime=0x5d2cc722, cpuid=0x000706a1, bank=0x00000001

3 20190716 01:50:08 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x902000420082110a, tsc=0xa4d95741ee28, walltime=0x5d2ccae1, cpuid=0x000706a1, bank=0x00000001

4 20190716 01:50:08 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x902000420082110a, tsc=0xa4d957436320, walltime=0x5d2ccae1, cpuid=0x000706a1, bank=0x00000001

5 20190716 01:50:08 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x902000420082110a, tsc=0xa4d957451d82, walltime=0x5d2ccae1, cpuid=0x000706a1, bank=0x00000001

6 20190716 01:50:08 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x902000420082110a, tsc=0xa4d957456482, walltime=0x5d2ccae1, cpuid=0x000706a1, bank=0x00000001

7 20190716 03:20:09 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x902000400082110a, tsc=0xac3468f91976, walltime=0x5d2cdffa, cpuid=0x000706a1, bank=0x00000001

8 20190716 03:20:09 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x902000400082110a, tsc=0xac3468fb7a3a, walltime=0x5d2cdffa, cpuid=0x000706a1, bank=0x00000001

9 20190716 15:08:09 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x902000460082110a, tsc=0xe60f3181c782, walltime=0x5d2d85ea, cpuid=0x000706a1, bank=0x00000001

10 20190716 15:08:09 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x902000460082110a, tsc=0xe60f31852002, walltime=0x5d2d85ea, cpuid=0x000706a1, bank=0x00000001

11 20190717 02:52:09 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x942000460082110a, addr=0x249c5f840, tsc=0x11f964ae442b2, walltime=0x5d2e2aea, cpuid=0x000706a1, bank=0x00000001

12 20190717 15:24:09 +0700 error: corrected filtering (some unreported errors in same region) Generic CACHE Level2 Generic Error, mcg mcgstatus=0, mci Corrected_error Error_enabled Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c07, status=0x902000440082110a, tsc=0x15d0984e5de54, walltime=0x5d2edb2a, cpuid=0x000706a1, bank=0x00000001
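
Event lines this long are easier to compare once the trailing key=value fields are pulled out. The following is a rough Python sketch. The helper names are hypothetical, the two sample lines are abbreviated copies of entries 1 and 3 above, and it assumes walltime is a Unix timestamp in hexadecimal.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches the hexadecimal key=0x... pairs at the end of each event line.
FIELD = re.compile(r"\b(\w+)=(0x[0-9a-fA-F]+)")


def parse_event(line: str) -> dict:
    """Return the hexadecimal key=value fields of one event line as integers."""
    return {key: int(value, 16) for key, value in FIELD.findall(line)}


def summarize(lines) -> dict:
    """Group events by status value and count those that carry an address."""
    events = [e for e in map(parse_event, lines) if "status" in e]
    return {
        "events": len(events),
        "by_status": Counter(f"{e['status']:#018x}" for e in events),
        "with_addr": sum(1 for e in events if "addr" in e),
    }


def event_time(event: dict) -> datetime:
    """Interpret walltime as seconds since the Unix epoch (assumption), in UTC."""
    return datetime.fromtimestamp(event["walltime"], tz=timezone.utc)


SAMPLE = [
    "1 20190715 20:41:09 +0700 error: ..., mcgcap=0x00000c07, status=0x942000460082110a, "
    "addr=0x243e9f840, tsc=0x8b99a7f84108, walltime=0x5d2c8276, cpuid=0x000706a1, bank=0x00000001",
    "3 20190716 01:50:08 +0700 error: ..., mcgcap=0x00000c07, status=0x902000420082110a, "
    "tsc=0xa4d95741ee28, walltime=0x5d2ccae1, cpuid=0x000706a1, bank=0x00000001",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
    print(event_time(parse_event(SAMPLE[0])).isoformat())
```

In the listing above, only entries 1, 2 and 11 include an addr field, and those are the entries whose status starts with 0x94 rather than 0x90, which is consistent with the ADDRV bit (bit 58) being set for them.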

The status is green, which means everything still works, but the utility reports a “large number of corrected cache errors” and warns that the “system (is) operating, but might lead to uncorrected errors soon” (see source code). It happens only a few times a day. I’m not sure what can be done about the cache, since it is embedded in the processor and cannot be replaced; maybe it’s just an issue with the particular processor I’m running. If somebody has an ODROID-H2 running, it may be useful to check the kernel log with dmesg to see if you’ve got the same errors. If you do, please also indicate whether you have a board from the first batch (November 2018) or one of the new ODROID-H2 Rev B boards.

Jean-Luc Aufranc (CNXSoft)

Jean-Luc started CNX Software in 2010 as a part-time endeavor, before quitting his job as a software engineering manager and starting to write daily news and reviews full time later in 2011.
