Mce hardware error cpu 0 machine check 0 bank - Fixing errors and finding the best solutions to problems

#1 2021-05-09 22:13:41

johnny.honu: Member; Registered: 2021-04-18; Posts: 17

mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ee2000000040110

I just built a new machine this weekend: an Intel i3-10320 CPU on an MSI MAG B560 Torpedo motherboard. I’ve never built a machine before. After building it, I immediately updated the BIOS. So, here is the problem: the only way I can get Arch Linux to boot, whether from a live USB or as it is currently installed on my new machine, is by adding nomodeset to the kernel boot line. Per dmesg | grep Error, I am getting the following errors at boot:

[ 0.107865] mce: [Hardware Error]: Machine check events logged
[ 0.107866] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ee2000000040110a
[ 0.107869] mce: [Hardware Error]: TSC 0 ADDR fef20080 MISC 3880000086
[ 0.107872] mce: [Hardware Error]: PROCESSOR 0:a0653 TIME 1620583348 SOCKET 0 APIC 0 microcode e0
[ 0.794166] ACPI BIOS Error (bug): Could not resolve symbol [_SB.PC00.PGON.PBGE], AE_NOT_FOUND (20210105/psargs-330)
[ 0.794174] ACPI Error: Aborting method _SB.PC00.PGON due to previous error (AE_NOT_FOUND) (20210105/psparse-529)
[ 0.794178] ACPI Error: Aborting method _SB.PC00.PEG1.PG01._ON due to previous error (AE_NOT_FOUND) (20210105/psparse-529)
[ 1.012493] RAS: Correctable Errors collector initialized.

I’m particularly concerned about the hardware errors. From what I’ve been able to gather so far (and I’ve done a fair bit of research over the past few days, though I should disclaim that I am a total amateur), I may actually have two issues, and I’m not sure whether they are related. The hardware errors may be due to a bad CPU or motherboard socket, or they may be a firmware or microcode issue. The ACPI errors may or may not be related, but are probably firmware or microcode driven. I’ve updated the BIOS and also added the most recent intel-ucode to my kernel boot line, so I think I am current. It seems a number of people are having the ACPI issues at the moment. Does anyone have any insights?
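For readers who want to decode the numbers themselves: the hex value after "Bank 6:" is the raw IA32_MCi_STATUS register of that machine-check bank. Below is a minimal POSIX sh sketch. It assumes the architectural bit layout described in the Intel SDM (Vol. 3B) and uses a hypothetical helper name, `decode_mci_status`. It prints which flag bits are set, plus the two error-code fields. Full decoders such as mcelog or rasdaemon do much more, including model-specific interpretation.

```sh
# Hypothetical helper: decode the architectural flag bits of an Intel
# IA32_MCi_STATUS value (the hex number after "Bank N:" in an mce line).
# Bit positions follow the Intel SDM; model-specific fields are only
# printed raw, not interpreted.
decode_mci_status() (
    s=${1#0x}
    case $s in
        '' | *[!0-9a-fA-F]*) echo "not a hex value: $1" >&2; exit 1 ;;
    esac
    if [ ${#s} -gt 16 ]; then
        echo "more than 64 bits: $1" >&2
        exit 1
    fi
    # Left-pad to 16 hex digits, then split into two 32-bit halves so the
    # arithmetic never overflows the shell's signed 64-bit integers.
    while [ ${#s} -lt 16 ]; do s="0$s"; done
    hi_hex=${s%????????}
    lo_hex=${s#????????}
    hi=$((0x$hi_hex))
    lo=$((0x$lo_hex))

    flags=""
    # Bits 63..57 of the full register are bits 31..25 of the high half.
    for pair in 31:VAL 30:OVER 29:UC 28:EN 27:MISCV 26:ADDRV 25:PCC; do
        bit=${pair%%:*}
        if [ $(( ($hi >> $bit) & 1 )) -eq 1 ]; then
            flags="$flags ${pair#*:}"
        fi
    done
    echo "flags:$flags"
    printf 'MCA error code: 0x%04x\n' $(( $lo & 0xffff ))
    printf 'model-specific error code: 0x%04x\n' $(( ($lo >> 16) & 0xffff ))
)
```

For the value in the log above, `decode_mci_status ee2000000040110a` reports the flags VAL, OVER, UC, MISCV, ADDRV and PCC as set, with EN clear. It also reports MCA error code 0x110a and model-specific code 0x0040. In other words, the bank holds an uncorrected error with "processor context corrupt" set. The flags alone cannot say whether the event is a genuine fault or something left over by firmware.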

#2 2021-05-09 22:22:07

graysky: Wiki Maintainer; From: :wq; Registered: 2008-12-01; Posts: 10,472; Website

Are you overclocking or undervolting this machine? It’s been years since I played with overclocking and undervolting, but the error sparks a memory: https://wiki.archlinux.org/title/Stress … ing_Errors

#3 2021-05-09 22:53:53

johnny.honu: Member; Registered: 2021-04-18; Posts: 17

Not overclocking. I don’t think I’m undervolting, but I will double-check cpu power hookups to the psu just to make sure that is as it should be.

#4 2021-05-10 06:12:38

orlfman: Member; Registered: 2007-11-20; Posts: 121

As far as I’m aware, the i3-10320 is locked, so there’s really no overclocking capability, apart from the motherboard messing with power limits, which is common with Intel boards. In the BIOS I would make sure the power limits are truly Intel stock and not messed with: TDP for the 10320 is 65 watts for PL1, PL2 should be 90 watts, and Tau should be 28 seconds.

Googling, I found this with a similar MCE error to yours: https://community.intel.com/t5/Graphics … d-p/711594
but good ol’ Intel doesn’t appear to offer any help, as "Linux isn’t validated by Intel."

There could be something wrong with your CPU. If you can, I would test with Windows first; unfortunately, Windows has the better monitoring, stability-testing and benchmarking tools. I’m curious to see whether Windows picks up any WHEA errors in Event Viewer, in the System log.

Last edited by orlfman (2021-05-10 06:13:51)

#5 2021-05-10 07:59:20

d_fajardo: Member; Registered: 2017-07-28; Posts: 1,435

You could also run MemTest86 to check the memory, the CPU’s memory controller and its caches for errors.

#6 2021-05-10 17:40:37

johnny.honu: Member; Registered: 2021-04-18; Posts: 17

Thanks, everyone, for the suggestions. I have verified the hardware is hooked up as it should be. As graysky suggested, I tried to find the Fedora version of the Intel Processor Diagnostic Tool (IPDT), but all the links to the tool on the Internet appear to be broken; I did a fair bit of searching but only hit dead ends. Per orlfman, I will probably load Windows and then get IPDT for Windows, just to see whether that narrows down any issues. In the meantime, I will verify the power limits in the BIOS tonight. I may also try running stress and MemTest86.

#7 2021-05-10 20:28:24

Ropid: Member; Registered: 2015-03-09; Posts: 1,068

That MCE hardware event in your log happened early at boot. Do you also get MCE log entries later while you are using the computer?

If it only happens at boot, you perhaps shouldn’t worry too much about it. I remember seeing other people with those kinds of mysterious MCE events that only happen at boot and never later; their computers ran fine otherwise.

Is the log you shared the output of ‘dmesg’? Those entries should also be in systemd’s journal, so you can search for old entries from previous boots in the ‘journalctl’ output.

The ‘nomodeset’ issue should be something else. You didn’t mention which graphics card you are using.
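Below is a sketch of the journal search described above. It assumes journald keeps a persistent journal (otherwise only the current boot exists) and uses a hypothetical helper, `mce_per_boot`. It walks the boot list and counts kernel "Machine Check" lines per boot, which shows whether the event appears on every boot or only on some.

```sh
# Hypothetical helper: count kernel "Machine Check" lines in every boot
# that journald remembers. Needs persistent journal storage to see
# anything beyond the current boot.
mce_per_boot() {
    journalctl --list-boots --no-pager |
        # Keep only rows whose first column is a boot offset (0, -1, ...);
        # this skips the header line printed by newer systemd versions.
        awk '$1 ~ /^-?[0-9]+$/ { print $1 }' |
        while read -r idx; do
            n=$(journalctl -k -b "$idx" --no-pager 2>/dev/null |
                grep -c 'mce: .*Machine Check')
            echo "boot $idx: $n"
        done
}
```

If every boot reports the same non-zero count, that matches the "always the same messages at boot" pattern described in this thread.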

#8 2021-05-10 23:55:21

johnny.honu: Member; Registered: 2021-04-18; Posts: 17

Yes, it happens early on each boot, and only then. I’ve run journalctl | grep Error and confirmed it is always the same series of messages early in boot, every boot. I’m not using a graphics card; I’m relying on the CPU’s built-in graphics. FYI, I’ve built this machine to serve as a headless home NAS, and plan to leave it on 24/7.

I was just in the BIOS, and one anomaly sticks out (the CPU IO2 reading below):

VOLTAGE
CPU Core: 0.970V
CPU IO: 0.956V
CPU IO2: 65.535V
CPU SA: 1.054V
System 3.3V: 3.360V
System 12V: 12.120V
DRAM: 1.204V

I don’t even know what CPU IO2 is, but 65 V seems kinda high for something on the CPU. I’m guessing it is an unconnected or disconnected sensor.

#9 2021-05-11 06:19:17

seth: Member; Registered: 2012-09-03; Posts: 35,316

The error is (most likely) from the last boot.
Do you restart cleanly? Does the reboot process hang with errors during shutdown? And is there a difference in MCE messages between a cold and a warm reboot?

Ceterum censeo and since it was mentioned: 3rd link in my signature.

#10 2021-05-15 01:18:43

johnny.honu: Member; Registered: 2021-04-18; Posts: 17

The error is the same (except for the timestamp) regardless of whether it is a cold or warm boot. No errors during shutdown.

The machine seems to mostly run despite the CPU and ACPI errors listed above, but of course I have to boot with nomodeset to use a monitor. I’ve tried disabling fast boot: no effect. I’ve reinstalled Arch from scratch. I’ve replaced Arch with Ubuntu, then replaced Ubuntu with Linux Mint. I get the exact same messages every time, at the same point during boot, and have to use nomodeset in all cases. I’m in the process of loading Windows 10 to see what happens.

#11 2021-05-15 02:51:11

Ropid: Member; Registered: 2015-03-09; Posts: 1,068

I stumbled onto something interesting in the kernel documentation yesterday that fits with the CPU problem here:

First, I saw this here in kernel-parameters.txt:

        mce=option      [X86-64] See Documentation/x86/x86_64/boot-options.rst

Then, looking through the file boot-options.rst, I found that one of the options it describes is this:

   mce=bootlog
                Enable logging of machine checks left over from booting.
                Disabled by default on AMD Fam10h and older because some BIOS
                leave bogus ones.
                If your BIOS doesn't do that it's a good idea to enable though
                to make sure you log even machine check events that result
                in a reboot. On Intel systems it is enabled by default.
   mce=nobootlog
                Disable boot machine check logging.

You could try the "mce=nobootlog" kernel command line parameter and see what happens. If it hides the MCE event messages in dmesg and journalctl, that should mean they were events from before the Linux kernel was loaded, i.e. from early in boot, while the UEFI was still in control of the machine.

If "mce=nobootlog" works, I would not worry about this anymore. The documentation mentions that there are machines that always create those MCE events at boot. I guess your machine is then one of those, and there’s nothing to do about it.
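On a GRUB system, one way to make that change persistent is to append the parameter to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. Below is a minimal sketch. It assumes GNU sed and a single-line, double-quoted value. `add_kernel_param` is a hypothetical helper, not part of GRUB.

```sh
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in a GRUB defaults file, unless it is already present. Assumes GNU sed
# (-i, -E), a single-line double-quoted value, and a parameter without
# "/" or "&" (those would need escaping in the sed replacement).
add_kernel_param() (
    file=$1
    param=$2
    # Already present as a whole space-separated word? Then do nothing.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        exit 0
    fi
    # Insert before the closing quote, then drop the leading space that
    # appears when the value was empty.
    sed -i -E \
        -e "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${param}\"/" \
        -e "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\") /\1/" \
        "$file"
)
```

After running it as root against /etc/default/grub, regenerate the configuration (grub-mkconfig -o /boot/grub/grub.cfg on Arch, update-grub on Debian) and reboot. The parameter should then appear in /proc/cmdline.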

#12 2021-05-15 13:36:14

johnny.honu: Member; Registered: 2021-04-18; Posts: 17

Yes, adding mce=nobootlog as a kernel boot parameter did suppress the MCE hardware errors. The other errors remain, as expected. I have one quick question: why would suppressing these errors via a kernel boot parameter indicate that the events occurred while the UEFI was still in control of the machine, and not the kernel?

#13 2021-05-15 14:34:27

seth: Member; Registered: 2012-09-03; Posts: 35,316

https://en.wikipedia.org/wiki/Machine-check_exception
It’s either that or the error is carried over from the last boot.
If there were no issues with the shutdown (you shut the system down cleanly, rather than it somehow powering off out of nowhere) and the errors are reproducible (always the same), they’re detected at boot.

Whether they’re bogus or a genuine error can’t be told, but Google finds that exact error at the exact address and bank quite a few times…

#14 2021-05-15 15:12:47

johnny.honu: Member; Registered: 2021-04-18; Posts: 17

As far as I can determine, the machine runs fine despite the MCE error. I will proceed as if they are harmless. I have other issues (e.g. graphics problems) that I need to troubleshoot too, but those are topics for another thread.

#15 2021-07-26 12:01:39

zeronullity: Member; Registered: 2021-07-26; Posts: 3

I’m dealing with this same exact issue, and trust me when I say it’s not "fine" or a bug. It has a high risk of leading to more serious issues down the road. This error is typically going to be a bad memory controller/cache on your CPU 75% of the time; otherwise it’s a bad chipset on your motherboard, bent pins in your CPU socket, or a bad trace on your motherboard. On rare occasions, other hardware that shorts out the motherboard, such as a bad power supply, a PCI card or bad caps on the motherboard, can cause it too, but unless it’s directly linked to the CPU, that’s extremely rare. I’ve decided it’s time for an upgrade, to err on the side of caution, and to just replace my motherboard, CPUs, memory, power supply, etc. This is the second identical motherboard and CPU set I’ve had with memory/CPU-related issues within a few years. I see no point in trying to pinch pennies only to cause myself headaches in the future. I can always pinpoint the issue later if I have nothing to do and use the hardware for less important things, but as my main system it simply will not do. Also, bad hardware like this can cause a daisy chain of failures: a bad CPU damages the motherboard, you replace the motherboard, the CPU damages the new motherboard, you replace the CPU, and the motherboard damages the new CPU. It’s extremely rare, but I’ve seen it happen in a controlled engineering environment with other hardware.

At this point, with the same error, I can’t even mount my root device read-write: the kernel forbids it unless I force its hand, and even after a clean fsck, mount won’t work. No other hardware errors show, and there are no issues with my RAID drives. Removing all memory sticks from CPU1 banks 1, 2 and 3 (but not from CPU0) fixes the error. I say "fixes", but it’s not a fix; it just hides the error from the kernel, and it will still cause data corruption. However, trying another memory stick in the CPU0 bank fails every time; I tried 15 "known good" memory sticks with the same results. Also, I can run CPU/memory burn tests, and most basic tests pass without issue.

Also, I wanted to say that, in my opinion, the forum sign-up question on this server, with the date/uname/hash, should be changed for people like me who are having hardware issues without access to another Linux system. I actually had to stop and think, and notice that the security question changed every time: from hash 256 to 512, different date %V %J outputs, and I had to make sure my time was on par with the server’s, in epoch or day-of-the-month format. It really just wasted my time, which is something I really hate. Forums are usually for people who need help, which can include date/time/RTC/hardware/kernel issues with Linux systems. Only elitist n00bs who think they are supreme/clever/elite and better than other people would use that type of captcha, but that’s only my opinion. Hunting down an online SHA tool and trying multiple "near correct date" hashes until I get the right one takes more time than most of the hardest captchas, and that’s from someone who is very experienced with Unix/Linux systems.

It’s kind of like the Arch forum admins are telling me: "Well, if you don’t have current access to a Linux system with the correct date set, you’re not getting into our forums without some work and wasted time finding a proper hash that works." It just leaves a bad taste in my mouth. I’ve been using Unix/Linux systems for over 27 years, and that’s my point of view.

Last edited by zeronullity (2021-07-26 12:25:01)

#16 2021-07-26 12:23:56

seth: Member; Registered: 2012-09-03; Posts: 35,316

I’m dealing with this same exact issue

«Bank 6: ee2000000040110a» and *only* «Bank 6: ee2000000040110a»?

For the rest of your post, please see https://gitlab.archlinux.org/archlinux/ … opicsrants

#17 2021-07-26 12:49:53

zeronullity: Member; Registered: 2021-07-26; Posts: 3

Yes, Seth, the same error, with a different address range. And yes, it will affect each system differently, depending on the hardware in use and the ranges affected. It’s a very serious error even if you don’t have issues currently. It’s true you may never have an issue, but that doesn’t mean the hardware doesn’t have a real physical failure; how it affects your system depends on many factors. It’s like me telling you my shirt is ripped and I’m going to throw it away, and you asking "but is it ripped where it can be seen?" It doesn’t matter: it’s ripped, and I’m throwing it or giving it away, because it’s most likely only going to get worse.

I’m sorry for breaking the forum rules, Seth; I just call it how I see it when things seem blatantly wrong from my point of view. It was more or less meant as a teaching moment, to share common human kindness and hospitality instead of turning away new Linux users because they are not smart enough to answer the question or don’t have easy access to checksum tools at the moment. And I was harsher than I normally am because I saw other forum posts with members/admins boasting about this very thing, and it irked me quite a bit to see them be careless/thoughtless about "new members".

Last edited by zeronullity (2021-07-26 13:35:09)

#18 2021-07-26 14:55:04

seth: Member; Registered: 2012-09-03; Posts: 35,316

I don’t have that error, but it is all over the internet with this *specific* address (which is a bit specific and probably does matter, and it will also likely matter whether the address varies).
It also matters whether there are more and different MCEs alongside it.
There are 10 hits for this forum alone (most of which do not actually concern that error but are just random dmesg posts).
Apparently it was introduced with Linux 4.10: https://bbs.archlinux.org/viewtopic.php … 1#p1698801

I’m not saying it’s harmless for sure, but having an itching toe right before your house burned down doesn’t mean that the house burned down because of your itching toe…

#19 2021-07-26 17:40:35

zeronullity: Member; Registered: 2021-07-26; Posts: 3

I could go into great detail explaining what the addresses mean and how they would affect certain situations, whether on the same hardware in various scenarios or as the same error on different hardware. It’s like having a unique hardware fingerprint.

https://en.wikipedia.org/wiki/Machine-check_exception

seth: "I’m not saying it’s harmless for sure, but having an itching toe right before your house burned down doesn’t mean that the house burned down because of your itching toe…"

First of all, I personally wouldn’t use that comparison. Even in the "rare" instances where it’s a firmware/kernel/driver bug, you can have the really bad luck of having both: a firmware/kernel bug and a hardware issue too. I’m not suggesting that the OP throw his system away. I would suggest using "known good" parts to rule out the hardware: if it’s NOT the hardware, then swapping out the hardware for an IDENTICAL part # with an IDENTICAL firmware version will make no difference to the error. I definitely wouldn’t take the approach of "oh well, a few other people had this problem and they say it’s nothing" and move on. I take the old-school approach, which usually never fails; there aren’t too many ways to reinvent a perfectly round circle. A square or oblong tire probably wouldn’t be very good to drive on, but hey, if you have no other choice, why not, right? What’s the worst that could happen?

#20 2021-07-28 16:22:48

seth: Member; Registered: 2012-09-03; Posts: 35,316

I could go into great detail explaining

Don’t let me stop you…

MCE is good to analyze HW errors after the fact and preserve errors across boots.
But if the very same error with the very same addresses occurs for many people, always during boot when the cores are initialized, and without any perceivable issues, chances are it’s bogus and not a thermal issue or decay (where you’d expect more randomness).
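The "same error, same address, every boot" argument can be checked against your own logs. Below is a rough sketch. It assumes the kernel's usual mce format: a "Machine Check ... Bank N: STATUS" line followed by a "TSC ..." line, which carries "ADDR x" when an address was logged. It uses a hypothetical helper, `mce_signatures`, that counts how often each bank/status/address combination occurs. Feed it every boot, for example with `journalctl _TRANSPORT=kernel --no-pager | mce_signatures`. The output shows whether the events are identical or scattered, but it cannot prove whether the hardware is faulty.

```sh
# Hypothetical helper: count identical machine-check events in kernel log
# text on stdin. An event is keyed by bank, status and address ("-" when
# the kernel printed no ADDR), most frequent first.
mce_signatures() {
    awk '
        /mce: .*Machine Check: .* Bank [0-9]+: / {
            for (i = 1; i < NF; i++)
                if ($i == "Bank") { bank = $(i + 1); status = $(i + 2) }
            sub(/:$/, "", bank)
            pending = 1
            next
        }
        pending && /mce: .*TSC / {
            addr = "-"
            for (i = 1; i < NF; i++)
                if ($i == "ADDR") addr = $(i + 1)
            count[bank " " status " " addr]++
            pending = 0
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}
```

Each output line reads "count bank status address". A single line with a count equal to the number of boots is the deterministic pattern described above.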

Alternatively, the OP is really active across the web.
(Though there are also many Dell systems affected where I don’t know which chipset is used.)

Here’s someone who got them when switching to the exact same chipset, https://forums.unraid.net/topic/103883- … rd-on-691/

#21 2021-08-11 20:10:09

johnny.honu: Member; Registered: 2021-04-18; Posts: 17

Update on my original post… I eventually got a better Mobo and CPU, and that resolved the issue. I did try swapping out the original Mobo for a new, identical Mobo and got the same error. In the end, I discovered the Mobo to be very quirky in a few ways, and I don’t think it was entirely compatible with the CPU (or the GPU built into the CPU). The Mobo was really built for Intel 11th gen, and I was using Intel 10th gen. It should have been backward compatible, but it just wasn’t.

#22 2021-10-26 03:58:33

kaos77: Member; Registered: 2013-04-26; Posts: 9

Hopefully this thread isn’t too old to bump up again, but I ran across this thread for the exact same error. Considering the sheer number of hits around this error and addresses, I’m going to let it ride. Brand new build with quality parts. Burned in the system pretty hard when I first saw the errors. Ran stress tests 24 hours running all 20 vcores pretty solid. Ran into absolutely no issues at all. I’ve run Windows and Linux on this hardware with no issues, other than a few video issues due to lower quality HDMI cables. The errors bother me because they’re there, but knowing so many other people have them also is certainly comforting in a twisted kind of way.

Same CPU gen though: a 10900 (non-K) on a Gigabyte Z590 Aorus Master, with 64G of Gigabyte-listed RAM. The CPU runs at 30°C or below at standard load.

Source

Found my NUC with Proxmox installed in an unresponsive state today (the first time ever, after 2 weeks of use).

On reboot I see these errors:
mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: xxx
mce: [Hardware Error]: TSC 0 ADDR fef1ce80 MISC xxx
mce: [Hardware Error]: PROCESSOR 0:a0660 TIME xxx SOCKET 0 APIC 0 microcode ca
(see attached pic — https://i.imgur.com/LYsQyyN.png)

The box booted and seems normal so far, but I see those errors on boot.
Quick memory test did not show any problems so far.

rasdaemon -f, journalctl -f show no obvious problems.

==========================
root@pve:~# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 0 size: 64036 MB
node 0 free: 42377 MB
node distances:
node 0
0: 10
root@pve:~# ras-mc-ctl --errors
No Memory errors.

No PCIe AER errors.

No Extlog errors.

No MCE errors.

===========================
root@pve:~# ras-mc-ctl --errors
No Memory errors.

No PCIe AER errors.

No Extlog errors.

No MCE errors.

root@pve:~# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 0 size: 64036 MB
node 0 free: 42345 MB
node distances:
node 0
0: 10
================

I run Intel NUC 7 BXNUC10i7FNH
Here is my CPU info https://pastebin.com/MpXedi1h

Has anybody had experience with such errors? Bad RAM, motherboard?
Can it be benign?

Thx in advance!

Last edited: Sep 6, 2020

Other than the fact that I noticed this after PVE was unresponsive (which could be coincidental and unrelated to the h/w errors), I see no issues running PVE.

Has anybody seen this? Maybe it’s a red herring?
@wolfgang

Hi,

I guess NUC gen 10 is too new and has some problems.
But if you’d like to prove that the memory and CPU are OK, run stress-ng for 30 min:

Code:

stress-ng --cpu 6 --vm 6 --verify --vm-bytes 80% --timeout 30m

If this test does not crash, the likelihood is high that the NUC will work without problems.
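Below is a minimal sketch of that check. It wraps the command above so that the result is read from the kernel log rather than by eye. It assumes stress-ng and journald are installed and passes `--verify` as a plain flag, since it takes no value. The hypothetical helper `stress_and_check_mce` returns 1 if any mce line was logged during the run and 2 if stress-ng itself reported a failure. The default duration mirrors the 30 minutes suggested above.

```sh
# Hypothetical helper: run the stress-ng load, then look for mce lines the
# kernel logged while it ran. Exit status: 0 clean, 1 mce lines found,
# 2 stress-ng itself failed (for example a --verify mismatch).
stress_and_check_mce() (
    duration=${1:-30m}
    start=$(date +%s)
    stress-ng --cpu 6 --vm 6 --verify --vm-bytes 80% --timeout "$duration"
    rc=$?
    # "@<seconds>" is a UNIX-timestamp spec accepted by journalctl --since.
    found=$(journalctl -k --since "@$start" --no-pager 2>/dev/null | grep 'mce: ')
    if [ -n "$found" ]; then
        printf 'mce lines logged during the run:\n%s\n' "$found"
        exit 1
    fi
    if [ "$rc" -ne 0 ]; then
        echo "stress-ng exited with status $rc" >&2
        exit 2
    fi
    echo "no mce lines logged during the run"
)
```

A clean run is encouraging but not proof. As noted later in this thread, other devices may log unrelated errors under this kind of synthetic load.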

@wolfgang

Thank you for the good practical advice!

I ran that for ~40 min with 100% CPU
This is what I saw in the log:

Sep 09 08:04:30 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <7f>
TDT <93>
next_to_use <93>
next_to_clean <7e>
buffer_info[next_to_clean]:
time_stamp <102670de1>
next_to_watch <7f>
jiffies <1026716a8>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>

This NUC is still replaceable, would you suggest to replace it or you suspect it’s more a generic issue?

@evg32

Thank you !

What is interesting is that after running stress-ng, I did not see "mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6:" during boot.

Have you seen an error like mine above too? In other words, I want to understand why your solution is good for me. (I wish I were an expert in this area.)

(
I did try it, and it seems like much more output was generated (without GRUB_CMDLINE_LINUX_DEFAULT="quiet"). I did not see my error, but saw
pve kernel: [ 5.659369] Bluetooth: hci0: Failed to load Intel firmware file (-2)
…
/var/log/syslog:Sep 9 09:18:34 pve kernel: [ 6.616670] Bluetooth: hci0: Failed to load Intel firmware file (-2)
/var/log/syslog:Sep 9 09:18:34 pve systemd[1]: apparmor.service: Failed with result ‘exit-code’.

Those may be unrelated to this entirely, just guessing…
)

The main problem I am trying to assess now is whether my NUC hardware is bad and needs to be replaced. Based on your post, it sounds like it is not hardware related, correct?

Thanks again !

I noticed that MCE errors occurred randomly; I couldn’t correlate them with anything.
Yep, I saw the same errors except that my CPU was i9-9900K.
That’s a CPU bug, as described here https://bugzilla.kernel.org/show_bug.cgi?id=109051

You are lucky with the i9-9900K; I got an i9-9880H from Hystou and could not even set it up, so I returned it and then got an i7.

OK I will not replace the NUC then.

Thank you !

@evg32

Have you tried installing "intel-microcode" (apt install intel-microcode)?
I wonder whether that could help as well?

Yep, I tried to upgrade and downgrade cpu firmware. Nothing helped me.

Sep 09 08:04:30 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:

Ignore this.
This stress test is not comparable with a normal load, and it is normal that other, non-tested parts get no resources and run into errors.

Have you tried installing "intel-microcode" (apt install intel-microcode)?

As long as you keep the BIOS of your NUC updated, the microcode package will bring no benefit,
because the microcode from Debian also comes from Intel, and Intel does a good job of keeping the NUC firmware updated.
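To see which microcode revision is actually running, for example before and after a BIOS update or after installing the intel-microcode package, check /proc/cpuinfo. The kernel exposes the revision there for each CPU, and the mce lines quoted in these threads print the same value after "microcode". Below is a small sketch with a hypothetical helper name.

```sh
# Hypothetical helper: print the distinct microcode revisions reported in
# /proc/cpuinfo (or in another file given as the first argument).
microcode_revisions() {
    awk -F: '$1 ~ /^microcode[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2 }' \
        "${1:-/proc/cpuinfo}" | sort -u
}
```

Normally this prints a single value. If two values appear, not all cores were updated to the same revision.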

Yep, I tried to upgrade and downgrade cpu firmware. Nothing helped me.

Do you know what this line is actually doing?

Code:

GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 intel_idle.max_cstate=1"

It disables power saving.
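For reference, the kernel parameter documentation describes the two settings as follows. consoleblank=0 disables the console blank timer. intel_idle.max_cstate caps the deepest C-state the intel_idle driver may use, so =1 keeps the cores in the shallowest idle state. After editing the GRUB line and running update-grub, a quick check against /proc/cmdline confirms what the running kernel actually received. Both helpers below are hypothetical, and the sysfs path is read only where the driver exposes it.

```sh
# Hypothetical helper: succeed if the running kernel was booted with the
# exact parameter given. Reads /proc/cmdline unless another file is passed.
cmdline_has() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Hypothetical helper: print the intel_idle max_cstate limit, if exposed.
intel_idle_max_cstate() {
    p=/sys/module/intel_idle/parameters/max_cstate
    [ -r "$p" ] && cat "$p"
}
```

For example, `cmdline_has intel_idle.max_cstate=1 && intel_idle_max_cstate` should print 1 once the setting is active.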

It disables power saving.

I never set up sound on VMs and was reading about it today. That seems to involve GRUB settings as well.
Have you configured sound as well with that line ?

thx !

Nope, I never touched the sound configs. I just needed stable VMs and a stable host server.

Source

GeNe64: Posts: 10; Joined: 2020-07-24 07:05

mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:

by GeNe64 » 2020-07-24 07:13

Hello,

I have a server with Debian 10 (Proxmox) that restarts automatically. Frankly, I can’t figure out where the issue is.

Code:

ras-mc-ctl --errors
No Memory errors.

No PCIe AER errors.

No Extlog errors.

MCE events:
1 2020-07-19 03:43:06 +0200 error: Instruction CACHE Level-0 Instruction-Fetch Error, mcg mcgstatus=0, mci Corrected_error Error_enabled, mcgcap=0x00000c0e, status=0x9400004000040150, addr=0x1ffff9c8e93c0, tsc=0x199d94e3f312c, walltime=0x5f13a52a, cpu=0x00000001, cpuid=0x000906ec, apicid=0x00000002
2 2020-07-19 03:55:10 +0200 error: Internal parity error, mcg mcgstatus=0, mci Corrected_error Error_enabled, mcgcap=0x00000c0e, status=0x9000004000010005, tsc=0x19c37efad6712, walltime=0x5f13a7fe, cpu=0x00000001, cpuid=0x000906ec, apicid=0x00000002
3 2020-07-23 15:37:51 +0200 error: Instruction CACHE Level-0 Instruction-Fetch Error, mcg mcgstatus=0, mci Corrected_error Error_enabled, mcgcap=0x00000c0e, status=0x9400004000040150, addr=0x974d56e7, tsc=0x91c13254a62a, walltime=0x5f1992af, cpu=0x00000001, cpuid=0x000906ec, apicid=0x00000002
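ras-mc-ctl already turns these status values into text. The low 16 bits of each status value form the architectural MCA error code, and a small sketch can reproduce that naming for a few encodings. This sketch has clear limits. It covers only three simple codes and the cache-hierarchy compound format (000F 0001 RRRR TTLL) from the Intel SDM error-code tables, as recalled here, so check the SDM before relying on it. It takes only the last four hex digits of the status, and `mca_code_name` is a hypothetical helper. Anything else is reported as not decoded.

```sh
# Hypothetical helper: name a 16-bit MCA error code (the last four hex
# digits of MCi_STATUS). Covers only a few Intel SDM encodings.
mca_code_name() (
    hex=${1#0x}
    case $hex in
        '' | *[!0-9a-fA-F]*) echo "not a hex value: $1" >&2; exit 1 ;;
    esac
    if [ ${#hex} -gt 4 ]; then
        echo "pass only the low 16 bits (4 hex digits): $1" >&2
        exit 1
    fi
    code=$((0x$hex))
    case $code in
        1)    echo "unclassified"; exit 0 ;;
        5)    echo "internal parity error"; exit 0 ;;
        1024) echo "internal timer error"; exit 0 ;;   # 0x0400
    esac
    # Cache hierarchy: bits 15:13 zero, bit 12 = F (filter, ignored here),
    # bits 11:8 = 0001, so the masked value must equal 0x0100 (256).
    if [ $(( $code & 0xef00 )) -eq 256 ]; then
        case $(( ($code >> 4) & 15 )) in
            0) req=ERR ;;  1) req=RD ;;   2) req=WR ;;
            3) req=DRD ;;  4) req=DWR ;;  5) req=IRD ;;
            6) req=PREFETCH ;; 7) req=EVICT ;; 8) req=SNOOP ;;
            *) req=reserved ;;
        esac
        case $(( ($code >> 2) & 3 )) in
            0) kind=instruction ;; 1) kind=data ;; 2) kind=generic ;;
            *) kind=reserved ;;
        esac
        case $(( $code & 3 )) in
            0) level=L0 ;; 1) level=L1 ;; 2) level=L2 ;; *) level=generic ;;
        esac
        echo "cache hierarchy: type=$kind level=$level request=$req"
        exit 0
    fi
    echo "not decoded by this sketch"
)
```

`mca_code_name 0150` (from status=0x9400004000040150) yields an instruction-type, level-0 IRD (instruction fetch) cache error, which matches the first ras-mc-ctl line. `mca_code_name 0005` yields the internal parity error shown in the second line.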

Code:

Jul 23 16:30:10 E2S kernel: smpboot: CPU0: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz (family: 0x6, model: 0x9e, stepping: 0xc)
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: Machine check events logged
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: be00000000800400
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: TSC 0 ADDR 63de0dd1 MISC 63de0dd1
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: PROCESSOR 0:906ec TIME 1595514604 SOCKET 0 APIC 0 microcode d6
...
Jul 23 16:30:10 E2S kernel: .... node  #0, CPUs:        #1
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: Machine check events logged
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 3: be00000000800400
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: TSC 0 ADDR 63de0dd1 MISC 63de0dd1
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: PROCESSOR 0:906ec TIME 1595514604 SOCKET 0 APIC 2 microcode d6

Is that a hardware, firmware or software error?
Where should I dig in?

Thanks.

LE_746F6D617A7A69: Posts: 934; Joined: 2020-05-03 14:16; Has thanked: 7 times; Been thanked: 64 times

by LE_746F6D617A7A69 » 2020-07-24 09:41

Machine check exceptions are triggered by hardware faults — caused by physical problems with the hardware (overheating, unstable power, damaged CPU) or by regressions in the firmware.

i9 CPUs are prone to overheating. What temps do You have?
Will it crash if You set a constant CPU clock well below the maximum (using linux-cpupower or cpufrequtils)?
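A quick way to answer the temperature question without extra packages is to read the kernel's thermal zones. Below is a sketch. It assumes /sys/class/thermal exposes at least one zone, with values in millidegrees Celsius, and uses a hypothetical helper name. lm-sensors' `sensors` gives a more detailed per-core view.

```sh
# Hypothetical helper: print the hottest thermal-zone reading in whole
# degrees Celsius. Fails if no readable zone exists.
max_temp_c() (
    root=${1:-/sys/class/thermal}
    max=""
    for f in "$root"/thermal_zone*/temp; do
        [ -r "$f" ] || continue        # also skips an unmatched glob
        t=$(cat "$f")                  # millidegrees Celsius
        if [ -z "$max" ] || [ "$t" -gt "$max" ]; then
            max=$t
        fi
    done
    [ -n "$max" ] || exit 1
    echo $(( max / 1000 ))
)
```

Run it while the machine is under load. For the fixed-clock experiment, cpupower can cap the maximum frequency, e.g. `cpupower frequency-set -u 2GHz`; see cpupower-frequency-set(1) for the accepted units.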

If You have upgraded the firmware/BIOS recently, You may try to use the previous version — to confirm or reject the possibility of firmware regression.

Anyway, I would say that You should contact Proxmox regarding this issue.

GeNe64: Posts: 10; Joined: 2020-07-24 07:05

by GeNe64 » 2020-07-24 13:40

Thanks for the hints.
I haven’t monitored the CPU temperature yet, but I noticed that the server restarts under increased load.

Is it possible to load the CPU firmware version I want?
I have the same CPU and software working perfectly elsewhere, but with an older firmware version. I’d like to test that version as well.

I posted it on the Proxmox forum, but they didn’t respond. Maybe they didn’t notice, so I have to find a solution myself.

Bulkley: Posts: 6332; Joined: 2006-02-11 18:35; Has thanked: 2 times; Been thanked: 19 times

by Bulkley » 2020-07-24 19:48

A weak power supply can drive you crazy. So can a flaky on-off switch.

I’d blow the dust out and then re-seat every electrical connection including boards and memory.

I once had a mother board with a cold-solder joint causing intermittent freezes.

Sometimes you can find the problem while running with the cover off and very carefully tapping components with something that does not conduct electricity. A drinking straw, a plastic pencil or a fine wood dowel work. Be gentle. Set your machine to run something demanding like a movie and poke around among the works. If the machine quits while you are knocking around you know roughly where the problem is. I can’t stress enough to be careful at this and if you are not comfortable with it don’t do it.

CwF: Section Moderator; Posts: 1775; Joined: 2018-06-20 15:16; Location: Colorado; Has thanked: 13 times; Been thanked: 53 times

#10

by CwF » 2020-07-24 20:40

Bulkley wrote: You probably have too

Oh yeah, just saying it's somewhere "in a datacenter" means it may never be touched. Time for the browser plugin tunneling into the public IPMI connection to check the BIOS, and we'll just hope the Roomba is around to clean it.
...just kidding...

GeNe64: Posts: 10; Joined: 2020-07-24 07:05

#11

by GeNe64 » 2020-07-25 07:14

LE_746F6D617A7A69 wrote:

GeNe64 wrote: Is it possible to load whichever CPU firmware version I want?

Of course. There are at least 2 ways to do this:
1. Use apt-cache policy intel-microcode to view the available versions, then install an older version using apt-get install intel-microcode=<version>
2. Download an older version of the intel-microcode package from http://snapshot.debian.org/
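A package downgrade only matters once the CPU is actually running the older microcode, which with Debian's intel-microcode package normally happens at the next boot. A minimal Python sketch for checking that, assuming the x86 Linux `/proc/cpuinfo` layout in which every logical CPU has a line like `microcode : 0xd6` (the function name is made up for illustration):

```python
import os
from collections import Counter


def microcode_revisions(cpuinfo_text: str) -> Counter:
    """Count logical CPUs per microcode revision in /proc/cpuinfo-style text."""
    revisions = Counter()
    for line in cpuinfo_text.splitlines():
        # Lines look like "microcode\t: 0xd6"; split on the first colon only.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            # int(..., 16) accepts the "0x" prefix the kernel prints.
            revisions[int(value.strip(), 16)] += 1
    return revisions


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux-specific; skipped silently elsewhere
    if os.path.exists(path):
        with open(path) as f:
            for rev, count in sorted(microcode_revisions(f.read()).items()):
                print(f"microcode 0x{rev:x}: {count} logical CPU(s)")
```

Comparing the printed revision with the `microcode` field of the kernel's `mce:` lines shows whether the running revision really changed.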

Many thanks, I've downgraded the firmware to 3.20191115.2~deb10u1.
At least I no longer see these errors:

Jul 23 16:30:10 E2S kernel: smpboot: CPU0: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz (family: 0x6, model: 0x9e, stepping: 0xc)
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: Machine check events logged
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: be00000000800400
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: TSC 0 ADDR 63de0dd1 MISC 63de0dd1
Jul 23 16:30:10 E2S kernel: mce: [Hardware Error]: PROCESSOR 0:906ec TIME 1595514604 SOCKET 0 APIC 0 microcode d6

The CPU temperature test (89°C) for 10 hours passed.
Now testing it with the downgraded firmware…

GeNe64: Posts: 10; Joined: 2020-07-24 07:05

#12

by GeNe64 » 2020-07-25 07:33

Bulkley wrote: A weak power supply can drive you crazy. So can a flaky on-off switch.

I’d blow the dust out and then re-seat every electrical connection including boards and memory.


The server was ordered as a dedicated server; I have remote access only.

GeNe64: Posts: 10; Joined: 2020-07-24 07:05

#13

by GeNe64 » 2020-08-01 06:07

Guys, I couldn't resolve the issues with the server above, so I ordered a new one with the same specs (Intel® Core™ i9-9900K, etc.), but I'm still getting errors like these:

Jul 30 03:01:16 E3S kernel: [21083.991177] mce: CPU8: Package temperature above threshold, cpu clock throttled (total events = 1)
Jul 30 03:01:16 E3S kernel: [21083.991178] mce: CPU9: Package temperature above threshold, cpu clock throttled (total events = 1)
Jul 30 03:01:16 E3S kernel: [21083.991179] mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 1)
Jul 30 03:01:16 E3S kernel: [21083.991179] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
Jul 30 03:01:16 E3S kernel: [21083.991180] mce: CPU13: Package temperature above threshold, cpu clock throttled (total events = 1)
Jul 30 03:01:16 E3S kernel: [21083.991181] mce: CPU14: Package temperature above threshold, cpu clock throttled (total events = 1)
Jul 30 03:01:16 E3S kernel: [21083.991182] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
Jul 30 03:01:16 E3S kernel: [21083.991340] mce: CPU10: Package temperature above threshold, cpu clock throttled (total events = 1)
Jul 30 03:01:16 E3S kernel: [21083.992167] mce: CPU7: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992168] mce: CPU4: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992168] mce: CPU11: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992169] mce: CPU15: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992170] mce: CPU12: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992171] mce: CPU3: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992204] mce: CPU2: Core temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992205] mce: CPU2: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992208] mce: CPU5: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992209] mce: CPU13: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992210] mce: CPU0: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992210] mce: CPU8: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992211] mce: CPU1: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992212] mce: CPU9: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992213] mce: CPU6: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992213] mce: CPU14: Package temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.992235] mce: CPU10: Core temperature/speed normal
Jul 30 03:01:16 E3S kernel: [21083.995378] mce: CPU10: Package temperature/speed normal
Jul 30 03:50:03 E3S kernel: [24010.129044] perf: interrupt took too long (2504 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Jul 30 05:40:02 E3S kernel: [30609.900970] mce: [Hardware Error]: Machine check events logged
Jul 31 00:00:01 E3S rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="725" x-info="https://www.rsyslog.com"] rsyslogd was HUPed
Jul 31 03:01:12 E3S kernel: [107480.111168] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
Jul 31 03:01:12 E3S kernel: [107480.111168] mce: CPU8: Core temperature above threshold, cpu clock throttled (total events = 1)
Jul 31 03:01:12 E3S kernel: [107480.111169] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 2)
Jul 31 03:01:12 E3S kernel: [107480.111171] mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 2)
Jul 31 03:01:12 E3S kernel: [107480.111172] mce: CPU10: Package temperature above threshold, cpu clock throttled (total events = 2)
Jul 31 03:01:12 E3S kernel: [107480.111173] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 2)
Jul 31 03:01:12 E3S kernel: [107480.111201] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 2)
Jul 31 03:01:12 E3S kernel: [107480.111203] mce: CPU14: Package temperature above threshold, cpu clock throttled (total events = 2)
Jul 31 03:01:12 E3S kernel: [107480.111204] mce: CPU15: Package temperature above threshold, cpu clock throttled (total events = 2)
Jul 31 03:01:12 E3S kernel: [107480.111205] mce: CPU11: Package temperature above threshold, cpu clock throttled (total events = 2)
Jul 31 03:01:12 E3S kernel: [107480.111210] mce: CPU8: Package temperature above threshold, cpu clock throttled (total events = 2)
Jul 31 03:01:12 E3S kernel: [107480.111210] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 2)
Jul 31 03:01:12 E3S kernel: [107480.111211] mce: CPU9: Package temperature above threshold, cpu clock throttled (total events = 2)
Jul 31 03:01:12 E3S kernel: [107480.111212] mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 2)
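Most of these messages are "Package" events, which concern the whole CPU package, so every logical CPU reports them at nearly the same moment. To quantify them over a whole log file, a rough Python sketch could tally the "above threshold" messages per CPU and per scope (core vs. package). It assumes the message wording shown above; other kernel versions may phrase these lines differently, and the script name and log path are only examples:

```python
import re
import sys
from collections import Counter

# Matches e.g. "mce: CPU8: Package temperature above threshold, cpu clock throttled (total events = 1)"
THROTTLE_RE = re.compile(
    r"mce: CPU(?P<cpu>\d+): (?P<scope>Core|Package) temperature above threshold"
)


def count_throttle_events(lines):
    """Return {(cpu, scope): number of 'above threshold' lines} for the given log lines."""
    counts = Counter()
    for line in lines:
        m = THROTTLE_RE.search(line)
        if m:
            counts[(int(m.group("cpu")), m.group("scope"))] += 1
    return dict(counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 throttle_count.py /var/log/kern.log
    with open(sys.argv[1], errors="replace") as f:
        for (cpu, scope), n in sorted(count_throttle_events(f).items()):
            print(f"CPU{cpu:<3} {scope:<8} {n}")
```

The "temperature/speed normal" lines are deliberately not counted; only the throttling onsets are.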

Then I lose access to my server.
Any tips?

Deb-fan: Posts: 1047; Joined: 2012-08-14 12:27; Been thanked: 4 times

#15

by Deb-fan » 2020-08-01 14:32

Only a random thought: Debian stable users with newer hardware face the same challenges on the desktop, and that surely holds for other uses (servers) too. Perhaps consider installing newer kernel and firmware versions. Going the other way, rather than downgrading, would hopefully provide better support for the chosen hardware. You should install some monitoring software on a server anyway. A dead simple way to rule out hardware and determine whether it's an OS misconfiguration: install a GNU/Linux distro like Ubuntu on it. Does that install show the same quirks and problems?

If it runs smoother/better without similar negative behavior, the hardware's fine and Debian isn't set up right for that system.

Most powerful FREE tech-support tool on the planet * HERE. *

GeNe64: Posts: 10; Joined: 2020-07-24 07:05

#16

by GeNe64 » 2020-08-01 15:30

cuckooflew wrote:

The server was ordered as a dedicated server; I have remote access only.

I think I would be talking to the provider of this server. If it is hardware, and it sounds like it is, then someone with physical access will need to do the "mechanic" part.

They've tested it and said it's OK.

GeNe64: Posts: 10; Joined: 2020-07-24 07:05

#17

by GeNe64 » 2020-08-01 15:37

Deb-fan wrote: [...] A dead simple way to rule out hardware and determine whether it's an OS misconfiguration: install a GNU/Linux distro like Ubuntu on it. Does that install show the same quirks and problems?

If it runs smoother/better without similar negative behavior, the hardware's fine and Debian isn't set up right for that system.

I specifically need Debian so I can install Proxmox on it. When it runs unloaded (OS, Proxmox and 10 idle VMs), it's OK and keeps working for 6+ days. If I start loading the VMs (CPU, SSD, network), it crashes or something else goes wrong within 1/2 days, on both servers in different datacenters.

I can't find anything useful in the standard logs. Any suggestions for monitoring software?

Deb-fan: Posts: 1047; Joined: 2012-08-14 12:27; Been thanked: 4 times

#18

by Deb-fan » 2020-08-01 15:59

Nope, I'm only vaguely aware of what Proxmox even is: some kind of virtualization/container thing. I'm badly lacking in knowledge about virtualization all around, so which tools to use or how to approach troubleshooting Proxmox is beyond me. Sorry. Still, from what those techs apparently told you and what you're describing, it sounds like a software problem, not hardware. If they have help forums, and they likely do, I would spend time asking the people there who use that software for help and pointers on tracking down the issues and possible fixes.

LE_746F6D617A7A69: Posts: 934; Joined: 2020-05-03 14:16; Has thanked: 7 times; Been thanked: 64 times

#19

by LE_746F6D617A7A69 » 2020-08-01 16:02

GeNe64 wrote:

cuckooflew wrote:

The server was ordered as a dedicated server; I have remote access only.

I think I would be talking to the provider of this server. If it is hardware, and it sounds like it is, then someone with physical access will need to do the "mechanic" part.

They’ve tested it and said it’s ok.

The thermal throttling alerts are saying something else: the machine has a cooling problem, which suggests that, for example, it's too hot in the server room.
Drive temperatures from SMART could give a clearer picture of what is happening in that data center.
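SMART covers the drives; for the CPU itself, Linux exposes sensor readings through the hwmon sysfs interface, where each `temp*_input` file holds a value in millidegrees Celsius and an optional `temp*_label` names it. A minimal one-shot reader, assuming a driver such as `coretemp` is loaded and shows up under `/sys/class/hwmon/hwmon*/`; sensor names and the number of hwmon devices will differ from server to server, and the function names are made up for illustration:

```python
import glob
import os


def _read(path):
    """Return the stripped contents of a sysfs file, or None if it can't be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN:chip/label": degrees_celsius} for every temp*_input under root."""
    temps = {}
    for inp in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        hwdir = os.path.dirname(inp)
        chip = _read(os.path.join(hwdir, "name")) or "unknown"
        # temp1_input may have a companion temp1_label (e.g. "Package id 0").
        base = os.path.basename(inp)[: -len("_input")]
        label = _read(os.path.join(hwdir, base + "_label")) or base
        raw = _read(inp)
        if raw and raw.lstrip("-").isdigit():
            # hwmon reports temperatures in millidegrees Celsius.
            temps[f"{os.path.basename(hwdir)}:{chip}/{label}"] = int(raw) / 1000.0
    return temps


if __name__ == "__main__":
    for sensor, celsius in read_hwmon_temps().items():
        print(f"{sensor:<40} {celsius:6.1f} °C")
```

Run periodically (for example from cron) and logged with timestamps, readings like these can be lined up against the throttling messages to see whether the package temperature climbs with VM load.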

Bill Gates: «(…) In my case, I went to the garbage cans at the Computer Science Center and I fished out listings of their operating system.»
The_full_story and Nothing_have_changed

Deb-fan: Posts: 1047; Joined: 2012-08-14 12:27; Been thanked: 4 times

#20

by Deb-fan » 2020-08-01 16:23

Come on now. The server room is too hot in a couple of datacenters? Or is it misconfigured software with processes hammering the hell out of the CPUs, causing heat and crash issues? Which of these is more likely? And again, why ask this in a mostly desktop-oriented Debian GNU/Linux community instead of the Proxmox or KVM ones?

Hello Team,

I’ve purchased an Intel i9-10900K processor and motherboard for a new system.

During the boot process, the following error is generated:

[ 0.158444] x86/cpu: SGX disabled by BIOS
[ 0.158464] mce: CPU0: Thermal monitoring enabled (TM1)
[ 0.158490] process: using mwait in idle threads
[ 0.158492] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[ 0.158493] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
[ 0.158495] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 0.158496] Spectre V2 : Mitigation: Enhanced IBRS
[ 0.158497] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[ 0.158498] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.158499] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[ 0.158691] Freeing SMP alternatives memory: 40K
[ 0.160277] smpboot: Estimated ratio of average max frequency by base frequency (times 1024): 1356
[ 0.160300] smpboot: CPU0: Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz (family: 0x6, model: 0xa5, stepping: 0x5)
[ 0.160352] mce: [Hardware Error]: Machine check events logged
[ 0.160353] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ee0000000040110a
[ 0.160356] mce: [Hardware Error]: TSC 0 ADDR fef20300 MISC 3880000086
[ 0.160359] mce: [Hardware Error]: PROCESSOR 0:a0655 TIME 1631860523 SOCKET 0 APIC 0 microcode ec
[ 0.160403] Performance Events: PEBS fmt3+, Skylake events, 32-deep LBR, full-width counters, Intel PMU driver.
[ 0.160412] ... version: 4
[ 0.160412] ... bit width: 48
[ 0.160413] ... generic registers: 4
[ 0.160413] ... value mask: 0000ffffffffffff
[ 0.160414] ... max period: 00007fffffffffff
[ 0.160414] ... fixed-purpose events: 3
[ 0.160415] ... event mask: 000000070000000f
[ 0.160482] rcu: Hierarchical SRCU implementation.
[ 0.161313] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[ 0.161420] smp: Bringing up secondary CPUs ...
[ 0.161478] x86: Booting SMP configuration:
[ 0.161479] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19
[ 0.183475] smp: Brought up 1 node, 20 CPUs
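The `PROCESSOR`, `TIME` and `microcode` fields in these mce lines are raw values; mcelog turns them into the `CPUID Vendor Intel Family 6 Model 165 Step 5` and date strings shown in its output. A sketch of that decoding, assuming the standard x86 CPUID signature layout (stepping in bits 3:0, model 7:4, family 11:8, extended model 19:16, extended family 27:20), hex `APIC`/`microcode` fields, a Unix-epoch `TIME` field, and the kernel's x86 vendor numbering in which 0 is Intel; the helper names are hypothetical:

```python
import re
from datetime import datetime, timezone

# "PROCESSOR <vendor>:<cpuid> TIME <epoch> SOCKET <n> APIC <hex> microcode <hex>"
PROC_RE = re.compile(
    r"PROCESSOR (?P<vendor>\d+):(?P<cpuid>[0-9a-f]+) TIME (?P<time>\d+) "
    r"SOCKET (?P<socket>\d+) APIC (?P<apic>[0-9a-f]+) microcode (?P<ucode>[0-9a-f]+)"
)


def decode_cpuid_signature(sig: int):
    """Split a CPUID leaf 1 EAX signature into display (family, model, stepping)."""
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF
    # Extended family only counts for base family 0xF; extended model for 0x6 and 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    if base_family in (0x6, 0xF):
        model += ext_model << 4
    return family, model, stepping


def parse_processor_line(line: str):
    """Return the decoded fields of an mce PROCESSOR line, or None if it doesn't match."""
    m = PROC_RE.search(line)
    if m is None:
        return None
    family, model, stepping = decode_cpuid_signature(int(m["cpuid"], 16))
    return {
        "vendor_code": int(m["vendor"]),
        "family": family,
        "model": model,
        "stepping": stepping,
        "time_utc": datetime.fromtimestamp(int(m["time"]), tz=timezone.utc),
        "socket": int(m["socket"]),
        "apic": int(m["apic"], 16),
        "microcode": int(m["ucode"], 16),
    }


if __name__ == "__main__":
    sample = "mce: [Hardware Error]: PROCESSOR 0:a0655 TIME 1631860523 SOCKET 0 APIC 0 microcode ec"
    print(parse_processor_line(sample))
```

For the line above this gives family 6, model 165 (0xa5), stepping 5 and 2021-09-17 06:35:23 UTC. mcelog printed `Fri Sep 17 01:35:23 2021` for the same value, five hours off, which is consistent with it formatting the timestamp in the machine's local time zone.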

Processing the error with ‘mcelog’ returns the following:

# mcelog --ascii < error
Machine check events logged
Hardware event. This is not a software error.
CPU 0 BANK 6
MISC 3880000086 ADDR fef20300
TIME 1631860523 Fri Sep 17 01:35:23 2021
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-3 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
CPUID Vendor Intel Family 6 Model 165 Step 5
SOCKET 0 APIC 0 microcode ec

It appears that the error is related to the processor's L3 cache.
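mcelog's flag list comes straight from the bits of the 64-bit MCi_STATUS value (`ee0000000040110a`). A sketch of that decoding, based on the architectural bit positions described in the machine-check chapter of Intel's Software Developer's Manual (VAL=63, OVER=62, UC=61, EN=60, MISCV=59, ADDRV=58, PCC=57; MCA error code in bits 15:0 with bit 12 as the corrected-filtering flag; compound pattern `000F 0001 RRRR TTLL` for cache hierarchy errors). It deliberately leaves the model-specific bits and the meaning of the bank number uninterpreted, so treat it as a reading aid rather than a replacement for mcelog:

```python
# Architectural MCi_STATUS flag bits, highest first
STATUS_FLAGS = [
    (63, "VAL"),    # register contents valid
    (62, "OVER"),   # error overflow: an earlier error was lost
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MCi_MISC holds extra information
    (58, "ADDRV"),  # MCi_ADDR holds the error address
    (57, "PCC"),    # processor context corrupt
]


def decode_mci_status(status: int) -> dict:
    """Decode the architectural fields of a 64-bit MCi_STATUS value."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    code = status & 0xFFFF              # MCA error code, bits 15:0
    unfiltered = code & ~(1 << 12)      # clear bit 12 (corrected filtering)
    result = {
        "flags": flags,
        "mca_error_code": code,
        "corrected_filtering": bool(code & (1 << 12)),
        "model_specific": (status >> 16) & 0xFFFF,  # bits 31:16, CPU-specific meaning
        "cache_hierarchy": None,
    }
    # Compound code "000F 0001 RRRR TTLL" marks a cache hierarchy error.
    if (unfiltered & 0xFF00) == 0x0100:
        result["cache_hierarchy"] = {
            "RRRR": (code >> 4) & 0xF,  # request type (0 = generic error)
            "TT": (code >> 2) & 0x3,    # transaction type (2 = generic)
            "LL": code & 0x3,           # raw memory-hierarchy level field
        }
    return result


if __name__ == "__main__":
    print(decode_mci_status(0xee0000000040110a))
```

For `ee0000000040110a` this yields VAL, OVER, UC, MISCV, ADDRV and PCC (EN is clear), the same flags mcelog lists, with the filtering bit set, matching its "corrected filtering" line. The `ae0000000040110a` value from the Ubuntu installation thread below differs only in OVER, while the Bank 4 value `be00000000800400` from the i9-9900K thread carries a different error code (0x0400) altogether.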

I contacted the hardware vendor, and they advised that I must contact the manufacturer.

I ran the Intel SSU utility available at the URL below.

Intel® System Support Utility for the Linux

https://www.intel.com/content/www/us/en/download/18895/26735/intel-system-support-utility-for-the-li…

SSU Output

# ./ssu.sh -d=0 -l=0 -m=0 -b=0 -n=0 -os=0 -o=CPU_Only.txt -p=0 -c=1 -s=0

# SSU Scan Information
Scan Info:
Version:"1.0.0.0"
Scan Date:"2021/09/17"
Scan Time:"06:31:40"

## Scanned Hardware
Computer:
- Processor
- "Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz"
Architecture:"x86_64"
Available:"Offline"
Byte Order:"Little Endian"
Cache Size:"20480 KB"
Caption:"Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz"
- Characteristics
64-bit capable
Enhanced Virtualization
Execute Protection
Hardware Thread
Multi-Core
Power/Performance Control
CPU Speed (Minimum):"1000.000"
CPU Speed (Maximum):"5300 MHz"
Current Voltage:"1.0 V"
External Clock:"100 MHz"
Family:"Not Available"
Flags:"Not Available"
ID:"55 06 0A 00 FF FB EB BF"
Level 1 Cache:"32K"
Level 2 Cache:"256K"
Level 3 Cache:"20480K"
Load:"load average: 0.40, 0.43, 0.18"
Manufacturer:"Intel(R) Corporation"
Model:"165"
Name:"Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz"
Number of Cores:"10"
Number of Cores - Enabled:"10"
Part Number:"To Be Filled By O.E.M."
Socket Designation:"U3E1"
Status:"Populated, Enabled"
Version:"Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz"
Voltage:"1.0 V"
Virtualization:"Not Available"

Can you please review and provide guidance regarding the next step?

Thanks in advance, your help is very much appreciated!

Denys: Posts: 3; Joined: 18 Feb 2018, 11:22

Error while installing Ubuntu

18 Feb 2018, 11:35

After I select Install, it throws an error. What is this? And what do I do about it?

Chocobo: Posts: 9954; Joined: 27 Aug 2016, 22:57; Solved: 214; Location: NN; Has thanked: 795 times; Been thanked: 2980 times

18 Feb 2018, 12:10

Denys,
1. Which CPU?
2. Which version of the distribution?
3. Why not Mint, or why isn't this posted on the Ubuntu forum?

Denys: Posts: 3; Joined: 18 Feb 2018, 11:22

18 Feb 2018, 13:10

i7-7700HQ
Version 17.
I installed Mint 18; there was some kind of flickering and the same error.
I was advised to ask here.

AlexZ: Posts: 1395; Joined: 06 Jan 2018, 21:06; Solved: 3; Location: Gorno-Altaysk; Has thanked: 212 times; Been thanked: 177 times

18 Feb 2018, 13:13

Denys wrote:

18 Feb 2018, 11:35

After I select Install, it throws an error. What is this? And what do I do about it?

This appeared for me on newer kernels; here it is on 4.14:
kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ae0000000040110a
kernel: mce: [Hardware Error]: TSC 0 ADDR fef86e00 MISC 78a0000086
kernel: mce: [Hardware Error]: PROCESSOR 0:40651 TIME 1518826617 SOCKET 0 APIC 0 microcode 20
It doesn't affect booting, though.
Try booting with the nomodeset parameter.

Denys: Posts: 3; Joined: 18 Feb 2018, 11:22

18 Feb 2018, 13:33

How do I set these parameters?

x230: Posts: 2094; Joined: 02 Sep 2016, 22:07; Solved: 5; Has thanked: 406 times; Been thanked: 487 times

18 Feb 2018, 13:47

Denys wrote:

18 Feb 2018, 13:33

How do I set these parameters?

See here and here about the parameter.
If nomodeset doesn't help, try disabling UEFI (that is, enabling Legacy boot) and start the installation over.

Chocobo: Posts: 9954; Joined: 27 Aug 2016, 22:57; Solved: 214; Location: NN; Has thanked: 795 times; Been thanked: 2980 times

18 Feb 2018, 13:50

Denys wrote:

18 Feb 2018, 13:10

i7-7700HQ
Version 17

Ubuntu 17.04 or 17.10? Kernel 4.10 from Zesty may not be suitable yet; the CPU is clearly complaining here. Kernel 4.13 from 17.10 should, in principle, already work.

You can also try the latest Fedora, or Mint 18, whose image I rebuilt here with the 4.13 kernel.

x230: Posts: 2094; Joined: 02 Sep 2016, 22:07; Solved: 5; Has thanked: 406 times; Been thanked: 487 times

18 Feb 2018, 15:47

Chocobo wrote:

18 Feb 2018, 13:50

rebuilt

With what, and how? I'm curious.

Chocobo: Posts: 9954; Joined: 27 Aug 2016, 22:57; Solved: 214; Location: NN; Has thanked: 795 times; Been thanked: 2980 times

18 Feb 2018, 15:54

Off-topic

x230, by hand, as usual, because for a long time now the tools that were supposed to help have let me down far too often.
Once you've worked through the process step by step, it doesn't seem scary at all; everything falls into place. If you like, we'll teach you too, if anything is unclear :)

x230: Posts: 2094; Joined: 02 Sep 2016, 22:07; Solved: 5; Has thanked: 406 times; Been thanked: 487 times

#10

18 Feb 2018, 16:22

Chocobo wrote:

18 Feb 2018, 15:54

If you like, we'll teach you too, if anything is unclear :)

Everything is unclear, which is why I use GUI crutches.

# 1 year, 6 months ago
Topics: 33; Posts: 516; Member since: 30 May 2019

Good afternoon. Here is the situation: a new Gigabyte motherboard with the B560 chipset and an i5-10400 CPU.
When booting Windows 10, everything is fine: a short speaker beep (meaning POST passed normally), then a normal boot.
When booting Arch Linux: a short speaker beep, the boot starts, and then hardware errors like these appear:
[Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ee0000000040110a
[Hardware Error]: TSC 0 ADDR fef20000 MISC 43880000086
[Hardware Error]: PROCESSOR 0:a0655 TIME 1626537016 SOCKET 0 APIC 0 microcode e2
However, the boot completes successfully and Arch keeps working normally afterwards. The Intel microcode is installed. Could this be an imperfection in the microcode, so the errors can be ignored? Or is it something else? Please advise if anyone can.

indeviral
# 1 year, 6 months ago
Topics: 39; Posts: 3170; Member since: 10 August 2013

Errors like these are hard to figure out :(
If everything works, you can simply hide them with mce=nobootlog

rutgerg
# 1 year, 6 months ago
Topics: 33; Posts: 516; Member since: 30 May 2019

indeviral
just hide them with mce=nobootlog

Where do I put this? Sorry for my ignorance.

indeviral
# 1 year, 6 months ago (edited 1 year, 6 months ago)
Topics: 39; Posts: 3170; Member since: 10 August 2013

In the kernel boot parameters.
In the GRUB config, or whatever is in fashion these days…
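With GRUB, that usually means appending the parameter to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` and regenerating the config (on Arch, `grub-mkconfig -o /boot/grub/grub.cfg`); other boot loaders differ. Note that mce=nobootlog only affects logging of machine checks found at boot, not the underlying condition. After rebooting, `/proc/cmdline` shows the command line the running kernel actually received. A minimal check in Python; the helper name is hypothetical, and the simple whitespace split assumes no quoted values containing spaces:

```python
import os


def has_kernel_param(cmdline: str, param: str) -> bool:
    """True if `param` appears as a whole token in a kernel command line.

    "mce=nobootlog" must match exactly; a bare name such as "nomodeset" or
    "mce" also matches any "name=value" token.
    """
    tokens = cmdline.split()
    if "=" in param:
        return param in tokens
    return any(t == param or t.startswith(param + "=") for t in tokens)


if __name__ == "__main__":
    path = "/proc/cmdline"  # command line of the currently running kernel
    if os.path.exists(path):
        with open(path) as f:
            cmdline = f.read()
        for p in ("mce=nobootlog", "nomodeset"):
            print(f"{p}: {'present' if has_kernel_param(cmdline, p) else 'absent'}")
```

Matching whole tokens avoids false positives such as a hypothetical `nomodesetx` being taken for `nomodeset`.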

rutgerg
# 1 year, 6 months ago
Topics: 33; Posts: 516; Member since: 30 May 2019

Thanks. I'll try it.

vasek
# 1 year, 6 months ago
Topics: 47; Posts: 11417; Member since: 17 February 2013

rutgerg, a topic has appeared on the BBS with a similar, though not identical, mce error… I recommend reading it, especially the part about the mce=nobootlog parameter: if the error no longer shows up in the logs with this parameter, there is no need to worry.

Errors don't disappear with experience; they just get smarter.

#1 2021-05-09 22:13:41

mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 6: ee2000000040110

#2 2021-05-09 22:22:07
#3 2021-05-09 22:53:53
#4 2021-05-10 06:12:38
#5 2021-05-10 07:59:20
#6 2021-05-10 17:40:37
#7 2021-05-10 20:28:24
#8 2021-05-10 23:55:21
#9 2021-05-11 06:19:17
#10 2021-05-15 01:18:43
#11 2021-05-15 02:51:11
#12 2021-05-15 13:36:14
#13 2021-05-15 14:34:27
#14 2021-05-15 15:12:47
#15 2021-07-26 12:01:39
#16 2021-07-26 12:23:56
#17 2021-07-26 12:49:53
#18 2021-07-26 14:55:04
#19 2021-07-26 17:40:35
#20 2021-07-28 16:22:48
#21 2021-08-11 20:10:09
#22 2021-10-26 03:58:33

mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4:
