Cache hierarchy error bus interconnect error

I have a new system built and running seemingly well, apart from this "fatal hardware error", which has popped up at least twice (maybe more). Admittedly, I have another PSU coming from Corsair that will hopefully fix a startup issue, reproducible 5% to 10% of the time, where the PSU has to be turned off and then o...

I’m just going to share this here since anyone else reading this using the same apps may be able to verify.

Full specs:
Ryzen 5 3600, 4.2 GHz all-core, undervolted to 1.125 V.
DDR4-3600 2x8 GB, ~C16, HCI MemTest 200% stable.
Gigabyte RX 6700 XT, slight undervolt, stable in all big titles (Days Gone, Warzone, Tarkov, etc.).
Resizable BAR enabled according to GPU-Z.
AGESA 1.1.0.0, updated to 1.2.0.2 in mid-May in an attempt to fix the Hierarchy Error; the AGESA seems to be innocent.
Radeon driver 21.5.2 (this is also suspect, as I jumped on 21.5.1 as soon as it was released).

I started getting frequent Cache Hierarchy reboots at the start of May, roughly the same time I started using HWiNFO (launch at startup) alongside MSI Afterburner with an undervolted Ryzen 5 3600. This CPU is ROCK stable in everything I throw at it, including long idle periods (I was concerned about the Vcore being too low at first). I now suspect it may be an issue with Resizable BAR and GPU behaviour plus Compute mode, COMBINED with the CPU trying to ‘park’ while being polled for temps too frequently, which would obviously be aggravated by multiple sensor programs. I’m honestly unsure how the CPU and GPU communicate and whether things like Compute mode and ReBAR make much difference at all; only an AMD tech would be able to answer that properly.

So the long story is, I was doing on-and-off mining, 24hr stable for MONTHS, until I started benchmarking games again and set HWiNFO to ‘launch at startup’ to integrate with the Afterburner OSD. This is when the Hierarchy crashes began (early May). At first I suspected my undervolt, as it’s 4.2 GHz all-core at 1.125 V, but bumping it up to 1.2 V made no difference. I did not suspect HWiNFO at first, but after 3 weeks of very random and intermittent Hierarchy crashes (mostly at idle) I stumbled across a thread linking them to HWiNFO (with other users confirming). From that point I disabled HWiNFO launch at startup and stopped using all monitoring programs except MSI Afterburner.

There were no Hierarchy crashes until 13 days later, when I ran NiceHash for all of 5 minutes but changed my mind and shut the PC down. The next day I was greeted with a Hierarchy crash when loading Cyberpunk shortly after boot. THIS gave me a hint that sensor programs were not the only factor: mining software enables ‘GPU compute’ mode, which persisted after the reboot and contributed to the Hierarchy crash. After a crash the GPU drivers reset, which is why the issue seemed so intermittent. After the crash I loaded up Cyberpunk and tested for hours with no problems.

So the TLDR is: monitoring programs and any mining software that may be enabling Compute mode are key factors in ‘triggering’ the Hierarchy issue, and it may be GPU related. I went from about 10 crashes in the first 2 weeks of May to only one crash after removing HWiNFO from my startup programs and avoiding the ‘compute mode’ set by NiceHash. It’s frustrating because, while Compute mode can be enabled by mining software, it’s not as simple to disable, since the switch back to graphics mode was removed from Radeon Software.

The main reason for sharing this info is so that anyone with ‘similarities’ might have a clue where to start troubleshooting.

I’ve been struggling with this issue for months now. It started almost immediately after buying a new B550 Aorus Pro AC motherboard, 3700X, and Crucial Ballistix RGB RAM. These errors pop up in Event Viewer as corrected hardware errors, and the category is Bus/Interconnect Error. Sometimes they will also cause a restart, and many of these restarts are labeled as Cache Hierarchy errors. I have found that the quickest way to make it crash is to run Prime95 with Large FFTs to stress the memory controller and RAM. I have determined that the RAM itself is not the issue. It is also not the RAM slots on the motherboard. I’m not sure if it’s the CPU, but I want to go through all the other possible causes first before I consider RMAing the CPU. I have also tried a few different BIOS versions, and they didn’t help, but I do still want to test an older version before I rule out the BIOS. I ran the stress test in safe mode to see if any software was to blame. While I don’t think I ran it long enough to be certain it isn’t a hardware or BIOS issue, I did find that the instant replay option in the Radeon Software was writing to the RAM and causing more frequent crashes. I am hoping someone on this forum can help me out, and I’m open to any and all suggestions. Also, please let me know if you need me to provide any other information.


As the title states, my newly built system just reboots whenever the CPU is under load.

I get the following error in the Windows Event Viewer:

Event 18, WHEA-Logger

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Bus/Interconnect Error
Processor APIC ID: 0

The details view of this entry contains further information.

(I attached the detailed information)

During normal use the system works fine. Under load (gaming/cinebench or Ryzen Master stress test) it reboots.

While playing games it reboots at seemingly random times (sometimes in short succession, sometimes not at all). During stress tests with Cinebench it usually rebooted during the second consecutive run and with the Ryzen Master stress test it rebooted during the CPU test (5 minutes). It did not reboot when I launched a RAM only stress test.

During the stress test Ryzen Master reported the following stats:

  • Temperature: 70 C
  • Peak Speed: 4100 MHz
  • PPT: 99% of 88 W
  • CPU Power: 65 W
  • SOC Power: 12 W
  • TDC (CPU): 82% of 60 A
  • EDC (CPU): 100% of 90 A

(Might the PPT and EDC be the problem?)

    During the reboots the CPU, VGA and BOOT (booting device) LED light up on the motherboard indicating that they were not detected or failed. Still boots though.

    My system is not overheating (reboots happen when the CPU temp is in the high 60s / low 70s) and I have not overclocked anything, apart from setting the RAM to 3600MHz in the BIOS.

    All drivers are up to date according to the MSI Live verification and manual comparison. (Not 100% sure about the AMD drivers.)

    I also do not get a BSOD, even though I disabled automatic reboots upon a crash.

    These are my components:

    • Processor AMD Ryzen 7 3700X 8-Core Processor (stock cooler)
    • MSI MEG X570 UNIFY (MS-7C35)
    • G.Skill Trident Z Neo (2x, 16GB, DDR4-3600, DIMM 288)
      • set to DDR4-3600 in the BIOS
    • NVIDIA GeForce GTX 980 (salvaged from my previous PC)
    • be quiet! Straight Power 11 (650W)

    Thank you for reading and helping me solve this issue!

    If any further information is required, let me know.

    PS: I also attached the MSI Dragon system information.

    Question BSOD- Ryzen 5600x during gaming

    DegraNL

    Since installing my new Aorus A520 ELITE (BIOS version F12) and my new Ryzen 5600X CPU, I’ve been experiencing WHEA Uncorrectable errors during gaming. However, not all games trigger them. Some that do are Apex Legends, Valheim and sometimes Escape from Tarkov. Details of the first WHEA error are as follows:

    [ Name] Microsoft-Windows-WHEA-Logger
    [ Guid]

    [ ProcessID] 4160
    [ ThreadID] 5032

    ErrorSource 3
    ApicId 8
    MCABank 0
    MciStat 0xbc00080001010135
    MciAddr 0x257c89f30
    MciMisc 0xd01a0ffe00000000
    ErrorType 9
    TransactionType 1
    Participation 256
    RequestType 3
    MemorIO 256
    MemHierarchyLvl 1
    Timeout 256
    OperationType 256
    Channel 256
    Length 936
    RawData 435045521002FFFFFFFF03000100000002000000A803000023020D000B0715140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131FE6FF5E89C91C54CBA8865ABE14913BB5AF2D6035576D70102000000000000000000000000000000000000000000000058010000C00000000003000001000000ADCC7698B447DB4BB65E16F193C4F3DB0000000000000000000000000000000001000000000000000000000000000000000000000000000018020000800000000003000000000000B0A03EDC44A19747B95B53FA242B6E1D0000000000000000000000000000000001000000000000000000000000000000000000000000000098020000100100000003000000000000011D1E8AF94257459C33565E5CC3F7E8000000000000000000000000000000000100000000000000000000000000000000000000000000007F010000000000000002010100010000100FA2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000800000000000000000000000000000000000000000000000000000000000000000000000000000007000000000000000800000000000000100FA20000080C080B32D87EFFFB8B170000000000000000000000000000000000000000000000000000000000000000F50157A5EFE3DE43AC72249B573FAD2C03000000000000009F004D0400000000309FC857020000000000000000000000000000000000000000000000000000000200000002000000D9C59D045576D70108000000000000000000000000000000000000000000000035010101000800BC309FC8570200000000000000FE0F1AD0000000000800000000000000B00010000000000000000000FD010000270000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003B00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    The summary description of this error is ‘Cache hierarchy error’ on Processor APIC ID 8.

    The second WHEA-Logger entry is:

    [ Name] Microsoft-Windows-WHEA-Logger
    [ Guid]

    [ ProcessID] 4160
    [ ThreadID] 4692

    ErrorSource 3
    ApicId 0
    MCABank 27
    MciStat 0xbaa000000000080b
    MciAddr 0x0
    MciMisc 0xd01a0ffe00000000
    ErrorType 10
    TransactionType 256
    Participation 0
    RequestType 0
    MemorIO 2
    MemHierarchyLvl 3
    Timeout 0
    OperationType 256
    Channel 256
    Length 936
    RawData 435045521002FFFFFFFF03000100000002000000A803000023020D000B0715140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131FE6FF5E89C91C54CBA8865ABE14913BB59F2D6035576D70102000000000000000000000000000000000000000000000058010000C00000000003000001000000ADCC7698B447DB4BB65E16F193C4F3DB0000000000000000000000000000000001000000000000000000000000000000000000000000000018020000800000000003000000000000B0A03EDC44A19747B95B53FA242B6E1D0000000000000000000000000000000001000000000000000000000000000000000000000000000098020000100100000003000000000000011D1E8AF94257459C33565E5CC3F7E8000000000000000000000000000000000100000000000000000000000000000000000000000000007F010000000000000002040000030000100FA2000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007000000000000000000000000000000100FA20000080C000B32D87EFFFB8B170000000000000000000000000000000000000000000000000000000000000000B3F8F31CB1C5A249AA595EEF92FFA63C01000000000000009E07C0060400000000000000000000000000000000000000000000000000000000000000000000000200000002000000D9C59D045576D70100000000000000000000000000000000000000001B0000000B0800000000A0BA000000000000000000000000FE0F1AD00000000000000000000500002E0001000000005D000000007D000000270000000000000000000000000000000000000000000000000010000000000000001000000000000000100000000000000010003B00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
    The summary description of this one is: Bus/interconnect error on Processor APIC ID 0.
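As a side note for anyone comparing their own logs: the RawData field appears to be a raw Common Platform Error Record (CPER), since it starts with the bytes 43 50 45 52 ("CPER"). Below is a minimal Python sketch that decodes only the first 24 bytes, which are identical in both events above. It assumes the header layout from the CPER appendix of the UEFI specification; the function name `parse_cper_header` and the dictionary keys are made up for illustration.

```python
import struct

# First 24 bytes of the RawData field from the events above (hex-encoded).
RAW_PREFIX = "435045521002FFFFFFFF03000100000002000000A8030000"

# Severity values, assumed from the CPER definition in the UEFI specification.
SEVERITY = {0: "Recoverable", 1: "Fatal", 2: "Corrected", 3: "Informational"}


def parse_cper_header(raw_hex: str) -> dict:
    """Decode the leading fixed-size fields of a CPER record header."""
    data = bytes.fromhex(raw_hex)
    # Little-endian, unpadded: signature, revision, signature end,
    # section count, error severity, validation bits, record length.
    sig, rev, sig_end, sections, severity, valid, length = struct.unpack_from(
        "<4sHIHIII", data, 0
    )
    if sig != b"CPER" or sig_end != 0xFFFFFFFF:
        raise ValueError("data does not look like a CPER record")
    return {
        "revision": rev,
        "section_count": sections,
        "severity": SEVERITY.get(severity, f"Unknown ({severity})"),
        "validation_bits": valid,
        "record_length": length,
    }


header = parse_cper_header(RAW_PREFIX)
print(header)
```

If the layout assumption holds, both events report 3 sections, a severity of Fatal, and a record length of 936 bytes, which agrees with the Length field shown in the event details.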

    I have already used OCCT and stressed my CPU to its limits without any bluescreen. I also removed two of the four RAM sticks, twice, to see whether it was a memory malfunction. I haven’t tried Memtest86 yet. Is my CPU broken, or can I fix it? My motherboard also doesn’t support overclocking, so I can’t turn off PBO, which I read online might be the cause. Can anyone help me?

    In the previous two parts, we examined error packets and error records. Now we will discuss the debugging methodology for a Stop 0x124 bugcheck, and how to gather useful debugging information from the dump file using WinDbg. I’ve split this final part into two sections, processor type errors and PCIe type errors, since these are the two most common kinds of error you’ll encounter when debugging a Stop 0x124 bugcheck.

    Processor Type Errors:

    Upon loading the dump file into WinDbg, you will be greeted with the following parameters:

    BugCheck 124, {0, fffffa80080d2028, f6000d80, 40150}
    
    Probably caused by : GenuineIntel

    The first parameter is the type of error source, as discussed in Part 1. As mentioned previously, it is stored within the enumeration called _WHEA_ERROR_SOURCE_TYPE, and from the value of the parameter we know that the error source type was a Machine Check Exception (MCE). An MCE is a mechanism used by the processor to report hardware errors to the operating system. It can be used to report a wide variety of errors, including cache errors, bus errors and memory errors. In my experience, the most common MCE reports are cache errors.

    The second parameter is the address of the error record, which, as explained in Part 2, is represented by _WHEA_ERROR_RECORD. This is the most important parameter for both of the error types covered here. We will be using the !errrec extension to dump this structure and examine the sections which were also discussed in Part 2.

    The third and fourth parameters are the high and low 32 bits of the MCi_STATUS register value. They do not add any significant debugging value beyond curiosity about the CPU architecture. If you wish, you can dump the structure layout in WinDbg using the following:

    0: kd> dt hal!_MCi_STATUS
       +0x000 McaErrorCode     : Uint2B
       +0x002 ModelErrorCode   : Uint2B
       +0x004 OtherInformation : Pos 0, 23 Bits
       +0x004 ActionRequired   : Pos 23, 1 Bit
       +0x004 Signalling       : Pos 24, 1 Bit
       +0x004 ContextCorrupt   : Pos 25, 1 Bit
       +0x004 AddressValid     : Pos 26, 1 Bit
       +0x004 MiscValid        : Pos 27, 1 Bit
       +0x004 ErrorEnabled     : Pos 28, 1 Bit
       +0x004 UncorrectedError : Pos 29, 1 Bit
       +0x004 StatusOverFlow   : Pos 30, 1 Bit
       +0x004 Valid            : Pos 31, 1 Bit
       +0x000 QuadPart         : Uint8B

    You’ll have to dump the parameter values using the .formats command and then compare the resulting bit values against the MCi_STATUS structure.
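The same comparison can be done outside the debugger. The following is a minimal Python sketch, not part of WinDbg: it takes the bugcheck line from this example, joins parameters 3 and 4 into the 64-bit status value, and reads the flags using the bit positions from the dt output above (shifted by 32, because those flags sit in the upper dword). The function name `parse_bugcheck_124` is hypothetical, and the parameter 1 names other than MCE are as I recall them from Microsoft's bug check 0x124 documentation, so treat them as an assumption.

```python
import re

# Parameter 1 of bugcheck 0x124. Only 0x0 (MCE) is confirmed by this tutorial;
# the other names are recalled from Microsoft's documentation (assumption).
ERROR_SOURCES = {
    0x0: "Machine Check Exception",
    0x1: "Corrected Machine Check",
    0x2: "Corrected Platform Error",
    0x3: "Non-Maskable Interrupt",
    0x4: "Uncorrectable PCI Express Error",
}

# Bit positions in the full 64-bit MCi_STATUS value: the "Pos" values from
# the dt hal!_MCi_STATUS output plus 32.
STATUS_FLAGS = {
    "Valid": 63,
    "StatusOverFlow": 62,
    "UncorrectedError": 61,
    "ErrorEnabled": 60,
    "MiscValid": 59,
    "AddressValid": 58,
    "ContextCorrupt": 57,
    "Signalling": 56,
    "ActionRequired": 55,
}

HEX = r"([0-9a-fA-F]+)"
PATTERN = re.compile(rf"BugCheck 124, \{{{HEX}, {HEX}, {HEX}, {HEX}\}}")


def parse_bugcheck_124(line: str) -> dict:
    """Split a 'BugCheck 124, {...}' line into its decoded parts."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a Stop 0x124 bugcheck line")
    source, record, high, low = (int(group, 16) for group in match.groups())
    status = (high << 32) | low  # parameter 3 = high dword, parameter 4 = low dword
    return {
        "error_source": ERROR_SOURCES.get(source, f"Other ({source:#x})"),
        "error_record": record,
        "status": status,
        "mca_error_code": status & 0xFFFF,
        "model_error_code": (status >> 16) & 0xFFFF,
        "flags": {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()},
    }


info = parse_bugcheck_124("BugCheck 124, {0, fffffa80080d2028, f6000d80, 40150}")
print(hex(info["status"]), hex(info["mca_error_code"]), info["flags"])
```

For this dump the combined value is 0xf6000d8000040150, the same value that !errrec prints in the Status field of the x86/x64 MCA section.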

    I’ve highlighted the GenuineIntel string since some users automatically assume that the processor is at fault, and this is simply not true. The string merely identifies the processor vendor, in this case a genuine Intel processor. For informational purposes, the string can be found in the following structure:

    0: kd> dt nt!_KPRCB -y VendorString
       +0x4bb8 VendorString : [13] UChar

    In this particular example, we can find the address of the PRCB by using the !prcb extension and then applying the structure mentioned above to the given address.

    0: kd> !prcb
    PRCB for Processor 0 at fffff780ffff0000:
    Current IRQL -- 15
    Threads--  Current fffffa8007618b50 Next 0000000000000000 Idle fffff8000345fcc0
    Processor Index 0 Number (0, 0) GroupSetMember 1
    Interrupt Count -- 000b8afa
    Times -- Dpc    00000100 Interrupt 00000023 
             Kernel 0002724c User      00000698
    0: kd> dt nt!_KPRCB -y VendorString fffff780ffff0000
       +0x4bb8 VendorString : [13]  "GenuineIntel"

    Okay, we have now established the meaning of the parameters, and have discovered that the error was reported by the processor through an MCE. We will now need to dump the error record.

    0: kd> !errrec fffffa80080d2028
    ===============================================================================
    Common Platform Error Record @ fffffa80080d2028
    -------------------------------------------------------------------------------
    Record Id     : 01d090ac66cc4cb7
    Severity      : Fatal (1)
    Length        : 928
    Creator       : Microsoft
    Notify Type   : Machine Check Exception
    Timestamp     : 5/17/2015 15:10:51 (UTC)
    Flags         : 0x00000000
    
    ===============================================================================
    Section 0     : Processor Generic
    -------------------------------------------------------------------------------
    Descriptor    @ fffffa80080d20a8
    Section       @ fffffa80080d2180
    Offset        : 344
    Length        : 192
    Flags         : 0x00000001 Primary
    Severity      : Fatal
    
    Proc. Type    : x86/x64
    Instr. Set    : x64
    Error Type    : Cache error
    Operation     : Instruction Execute
    Flags         : 0x00
    Level         : 0
    CPU Version   : 0x00000000000106e5
    Processor ID  : 0x0000000000000000
    
    ===============================================================================
    Section 1     : x86/x64 Processor Specific
    -------------------------------------------------------------------------------
    Descriptor    @ fffffa80080d20f0
    Section       @ fffffa80080d2240
    Offset        : 536
    Length        : 128
    Flags         : 0x00000000
    Severity      : Fatal
    
    Local APIC Id : 0x0000000000000000
    CPU Id        : e5 06 01 00 00 08 10 00 - fd e3 98 00 ff fb eb bf
                    00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                    00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
    
    Proc. Info 0  @ fffffa80080d2240
    
    ===============================================================================
    Section 2     : x86/x64 MCA
    -------------------------------------------------------------------------------
    Descriptor    @ fffffa80080d2138
    Section       @ fffffa80080d22c0
    Offset        : 664
    Length        : 264
    Flags         : 0x00000000
    Severity      : Fatal
    
    Error         : ICACHEL0_IRD_ERR (Proc 0 Bank 2)
      Status      : 0xf6000d8000040150
      Address     : 0x00000000004b6990
      Misc.       : 0x0000000000000000

    The most important section is Section 2: x86/x64 MCA, since it contains the data which the MCE reported to WHEA. We can see that the error severity was fatal, which is what led to the bugcheck in the first place. I’ll go back to Section 0 in a moment. In Section 2, the Error field contains a mnemonic for the type of error shown in Section 0. The mnemonic can be deciphered using the Intel processor documentation, starting from page 2352.

    There are four different error classifications if you don’t count the generic processor error type. Together these classifications form what is known as a compound error code. In our example, I’ve highlighted the sections which can vary and take different values depending upon the situation.

    The current compound error classifications are:

    Type                          Interpretation
    Generic Cache Hierarchy       Generic Cache Hierarchy Error
    TLB Errors                    {TT}TLB{LL}_ERR
    Memory Controller Errors      {MMM}_CHANNEL_{CCCC}_ERR
    Cache Hierarchy Errors        {TT}CACHE{LL}_{RRRR}_ERR
    Bus and Interconnect Errors   BUS{LL}_{PP}_{RRRR}_{II}_{T}_ERR

    I’ve taken Table 15-9 (Section 15.9.2) from the Intel documentation and cut it down into two sections; for our example we will be looking at the Cache Hierarchy Errors. Just to clarify: when dumping the error record and looking at the CPU mnemonic, work out which error interpretation applies so that you’re able to investigate the cause of the error in greater depth. The following mnemonics have been copied from the Intel documentation; I’ve added the section numbers if you wish to check for yourself.

    The {TT} variable is known as the Transaction Type (Section 15.9.2.2) and can take the following values:

    • I = Instruction
    • D = Data
    • G = Generic

    In our particular example, the transaction type was an instruction, so we know that the error was based around the execution of some instruction.

    The {LL} variable is known as the Level (Level of the Memory Hierarchy (Section 15.9.2.3)) and points to the type of cache which has experienced the error condition. It can take the following values:

    • L0 = Level 0
    • L1 = Level 1
    • L2 = Level 2
    • LG = Level Generic

    The {RRRR} is known as the Request Type (Section 15.9.2.4) field and indicates the type of instruction or action which was being carried out at the time of the error. The variable can take the following values:

    • ERR = Generic Error
    • RD = Generic Read
    • WR = Generic Write
    • DRD = Data Read
    • DWR = Data Write
    • IRD = Instruction Fetch
    • PREFETCH = Prefetch
    • EVICT = Eviction
    • SNOOP = Snoop

    I will quickly provide the meanings of the other sub-fields to save having to trawl through the Intel documentation. The {MMM} and {CCCC} fields primarily apply to Memory Controller errors. {MMM} is a 3-bit field called the Memory Transaction Type, whereas, {CCCC} is a 4-bit field for Channels. The memory controller error mnemonics can be found in Section 15.9.2.5.

    The {MMM} field has the meanings of:

    • GEN = Generic undefined request
    • RD = Memory Read Error
    • WR = Memory Write Error
    • AC = Address/Command Error
    • MS = Memory Scrubbing Error

    The {CCCC} field has a single meaning: CHN corresponds to the channel number.

    Bus and Interconnect errors have three additional fields called {PP} for Participation; {T} for Timeout and {II} for I/O or Memory. The bus and interconnect error mnemonics can be found in Section 15.9.2.6.

    {PP} defines how the processor participated within the request, and thus:

    • SRC = Local processor originated request
    • RES = Local processor responded to the request
    • OBS = Local processor observed the error as a third party

    {T} defines whether the request timed out or not:

    • TIMEOUT = Request timed out
    • NOTIMEOUT = Request didn’t time out

    {II} defines whether the request on the processor bus was for memory access or I/O access:

    • M = Memory Access
    • IO = I/O Access
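To tie the fields above together, here is a minimal Python sketch that turns the low 16 bits of an MCi_STATUS value into one of the mnemonics from the table. The bit layouts in the comments are my reading of the compound error code table in the Intel documentation and should be treated as an assumption; bit 12 is ignored, and the function name `decode_mca_error_code` is hypothetical.

```python
TT = {0: "I", 1: "D", 2: "G"}
LL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}
RRRR = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
        5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
MMM = {0: "GEN", 1: "RD", 2: "WR", 3: "AC", 4: "MS"}
PP = {0: "SRC", 1: "RES", 2: "OBS"}
II = {0: "M", 2: "IO"}


def decode_mca_error_code(status: int) -> str:
    """Build the error mnemonic from the low 16 bits of MCi_STATUS."""
    code = status & 0xFFFF
    ll = LL[code & 0x3]
    tt = TT.get((code >> 2) & 0x3, "?")
    rrrr = RRRR.get((code >> 4) & 0xF, "?")

    if (code & 0xE800) == 0x0800:      # 000x 1PPT RRRR IILL: bus/interconnect
        pp = PP.get((code >> 9) & 0x3, "GEN")
        timeout = "TIMEOUT" if code & 0x0100 else "NOTIMEOUT"
        ii = II.get((code >> 2) & 0x3, "?")
        return f"BUS{ll}_{pp}_{rrrr}_{ii}_{timeout}_ERR"
    if (code & 0xEF00) == 0x0100:      # 000x 0001 RRRR TTLL: cache hierarchy
        return f"{tt}CACHE{ll}_{rrrr}_ERR"
    if (code & 0xEF80) == 0x0080:      # 000x 0000 1MMM CCCC: memory controller
        mmm = MMM.get((code >> 4) & 0x7, "?")
        return f"{mmm}_CHANNEL_{code & 0xF}_ERR"
    if (code & 0xEFF0) == 0x0010:      # 000x 0000 0001 TTLL: TLB
        return f"{tt}TLB{ll}_ERR"
    if (code & 0xEFFC) == 0x000C:      # 000x 0000 0000 11LL: generic cache hierarchy
        return f"Generic Cache Hierarchy Error ({ll})"
    return f"UNCLASSIFIED_{code:#06x}"


print(decode_mca_error_code(0xF6000D8000040150))
```

For the Status value in the error record above, this yields ICACHEL0_IRD_ERR, matching what !errrec printed. As a cross-check, the two MciStat values quoted in the Ryzen 5600X post earlier on this page (0xbc00080001010135 and 0xbaa000000000080b) come out as a cache hierarchy and a bus/interconnect mnemonic respectively, in line with the Windows descriptions, although AMD documents its own encoding and the agreement may not hold for every code.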

    We have examined the error condition and have a good understanding of what the error pertains to. However, we still need to gather some general hardware information, either to check for patches or to provide more troubleshooting information to the hardware manufacturer. There are several WinDbg extensions which enable us to achieve this.

    2: kd> !sysinfo machineid
    Machine ID Information [From Smbios 2.7, DMIVersion 39, Size=3456]
    BiosMajorRelease = 4
    BiosMinorRelease = 6
    BiosVendor = American Megatrends Inc.
    BiosVersion = 1005
    BiosReleaseDate = 10/11/2012
    SystemManufacturer = System manufacturer
    SystemProductName = System Product Name
    SystemFamily = To be filled by O.E.M.
    SystemVersion = System Version
    SystemSKU = SKU
    BaseBoardManufacturer = ASUSTeK COMPUTER INC.
    BaseBoardProduct = P8H77-M LE
    BaseBoardVersion = Rev X.0x

    The !sysinfo machineid extension gives motherboard and BIOS information. From here I would check the motherboard documentation to ensure that the hardware is compatible, and to see whether any patches have been released to resolve the issues the user may be experiencing.

    2: kd> !sysinfo cpuspeed
    CPUID:        "Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz"
    MaxSpeed:     3200
    CurrentSpeed: 3192

    The !sysinfo cpuspeed extension enables us to investigate the clock speed of the processor and whether the user has been overclocking it. As commonly stated, overclocking can cause system instability and excessive heat, which could affect the normal operation of the system.

    2: kd> !sysinfo cpuinfo
    [CPU Information]
    ~MHz = REG_DWORD 3192
    Component Information = REG_BINARY 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    Configuration Data = REG_FULL_RESOURCE_DESCRIPTOR ff,ff,ff,ff,ff,ff,ff,ff,0,0,0,0,0,0,0,0
    Identifier = REG_SZ Intel64 Family 6 Model 58 Stepping 9
    ProcessorNameString = REG_SZ Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
    Update Signature = REG_BINARY 0,0,0,0,12,0,0,0
    Update Status = REG_DWORD 2
    VendorIdentifier = REG_SZ GenuineIntel
    MSR8B = REG_QWORD 1200000000

    The !sysinfo cpuinfo extension provides greater depth on the processor family and model, which is useful when reporting a bug to Intel or AMD, since they will be able to investigate the issue further and say whether the error is specific to a certain processor type. You may also have noticed that I’m using a different dump file from the one used earlier in this tutorial!

    The !sysinfo cpumicrocode extension provides similar information to the previous extension:

    2: kd> !sysinfo cpumicrocode
    Initial Microcode Version: 00000012:00000000
     Cached Microcode Version: 00000012:00000000
             Processor Family: 06
              Processor Model: 3a
           Processor Stepping: 09
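
The family, model and stepping values shown by these extensions are normally derived from the CPUID signature. As a rough illustration, the Python sketch below decodes the CPU Version value from the first dump's error record (0x106e5), assuming that field holds the CPUID leaf 1 EAX signature and using Intel's convention for the extended model bits. The function name `decode_cpuid_signature` is hypothetical.

```python
def decode_cpuid_signature(signature: int) -> tuple[int, int, int]:
    """Return (family, model, stepping) from a CPUID leaf 1 EAX value."""
    stepping = signature & 0xF
    base_model = (signature >> 4) & 0xF
    base_family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # Intel convention (assumed here): the extended model applies to
    # base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


# CPU Version from the !errrec output of the first dump.
print(decode_cpuid_signature(0x106E5))
```

Under those assumptions the first dump decodes to family 6, model 30 (0x1e), stepping 5. A signature of 0x306a9 (constructed here to match the second dump, not read from it) decodes to family 6, model 58 (0x3a), stepping 9, the same values reported by !sysinfo cpumicrocode.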

    Again, simply as a matter of personal preference, you may wish to dump the model information about all the processors on the system with !cpuinfo:

    2: kd> !cpuinfo
    CP  F/M/S Manufacturer  MHz PRCB Signature    MSR 8B Signature Features
     0  6,58,9 GenuineIntel 3192 0000001200000000                   21193ffe
     1  6,58,9 GenuineIntel 3192 0000001200000000                   21193ffe
     2  6,58,9 GenuineIntel 3192 0000001200000000                   21193ffe
     3  6,58,9 GenuineIntel 3192 0000001200000000                   21193ffe
                          Cached Update Signature 0000001200000000
                         Initial Update Signature 0000001200000000

    Alternatively, you could use !cpuid which provides the exact same information:

    2: kd> !cpuid
    CP  F/M/S  Manufacturer     MHz
     0  6,58,9  GenuineIntel    3192
     1  6,58,9  GenuineIntel    3192
     2  6,58,9  GenuineIntel    3192
     3  6,58,9  GenuineIntel    3192

    You can gather temperature information about the system with the !tz and !tzinfo extensions. I won’t discuss the purpose of thermal zones and how they work in this tutorial, since that would go needlessly out of scope and add another page of writing. You can find more information about thermal zones in the ACPI documentation or through this discussion thread created by myself and Patrick when we first discovered the extensions.

    2: kd> !tz
    0 - ThermalZone @ 0xfffffa8004073310
      State:         Read                Flags:              0x00000002 Initialized
      Mode:          Active              PendingMode:        Active  
      ActivePoint:   0x00000002          PendingActivePoint: 0x00000002
      Throttle:      0x00000064
      SampleRate:    0x00000000          ThrottleReasons:    0
      LastTime:      0x0000000000000000  LastTemp:           0x00000000 (0.0K)
      PassiveTimer:  0xfffffa8004073340
      PassiveDpc:    0xfffffa8004073380
      OverThrottled: 0xfffffa80040733c0
      Irp:           0xfffffa8004680c80
      Device:        0x00000000
      Thermal Info:  0xfffffa80040733e0
    1 - ThermalZone @ 0xfffffa8003679310
      State:         Read                Flags:              0x00000002 Initialized
      Mode:          Active              PendingMode:        Active  
      ActivePoint:   0x00000000          PendingActivePoint: 0x00000000
      Throttle:      0x00000064
      SampleRate:    0x00000000          ThrottleReasons:    0
      LastTime:      0x0000000000000000  LastTemp:           0x00000000 (0.0K)
      PassiveTimer:  0xfffffa8003679340
      PassiveDpc:    0xfffffa8003679380
      OverThrottled: 0xfffffa80036793c0
      Irp:           0xfffffa8004074310
      Device:        0x00000000
      Thermal Info:  0xfffffa80036793e0

    The !tzinfo extension provides information about a specific thermal zone:

    2: kd> !tzinfo 0xfffffa80036793e0
    ThermalInfo @ 0xfffffa80036793e0
      Stamp:         0x00000007  Constant1:  0x00000001  Constant2:   0x00000005
      Period:        0x0000000a  ActiveCnt:  0x00000000  AffinityEx:  0xfffffa80036793f0
      Current Temperature:                   0x00000bd6 (303.0K)
      Passive TripPoint Temperature:         0x00000ed0 (379.2K)
      Hibernate TripPoint Temperature:       0x00000000 (0.0K)
      Critical TripPoint Temperature:        0x00000ed0 (379.2K)
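
    The raw values in the !tzinfo output are consistent with temperatures stored in tenths of a kelvin (0x00000bd6 is 3030, shown as 303.0K). A minimal sketch of that conversion, assuming this encoding; the function names are made up for illustration and are not part of any debugger API:

```python
def tz_raw_to_kelvin(raw: int) -> float:
    """Convert a raw thermal zone reading (assumed tenths of a kelvin) to kelvin."""
    return raw / 10.0


def tz_raw_to_celsius(raw: int) -> float:
    """Convert the same raw reading to degrees Celsius."""
    return tz_raw_to_kelvin(raw) - 273.15


if __name__ == "__main__":
    # Raw values copied from the !tzinfo output above.
    readings = (("Current", 0x0BD6), ("Critical trip point", 0x0ED0))
    for label, raw in readings:
        kelvin = tz_raw_to_kelvin(raw)
        celsius = tz_raw_to_celsius(raw)
        print(f"{label}: {kelvin:.1f} K = {celsius:.2f} C")
```

    Reading the trip points in Celsius makes it easier to compare them against the figures reported by ordinary monitoring tools.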

    PCIe Type Errors:

    As shown before, we are going to dump the error record and then examine the relevant sections. The sections displayed with PCIe crashes are generally longer and more complex to understand, so I would advise making full use of the PCIe documentation if it is available.

    3: kd> !errrec 869348d4
    ===============================================================================
    Common Platform Error Record @ 869348d4
    -------------------------------------------------------------------------------
    Record Id     : 01cd07d8bce4740f
    Severity      : Fatal (1)
    Length        : 672
    Creator       : Microsoft
    Notify Type   : PCI Express Error
    Timestamp     : 3/22/2012 3:06:44 (UTC)
    Flags         : 0x00000000
    
    ===============================================================================
    Section 0     : PCI Express
    -------------------------------------------------------------------------------
    Descriptor    @ 86934954
    Section       @ 869349e4
    Offset        : 272
    Length        : 208
    Flags         : 0x00000001 Primary
    Severity      : Recoverable
    
    Port Type     : Root Port
    Version       : 1.1
    Command/Status: 0x4010/0x0507
    Device Id     :
      VenId:DevId : 8086:340a
      Class code  : 030400
      Function No : 0x00
      Device No   : 0x03
      Segment     : 0x0000
      Primary Bus : 0x00
      Second. Bus : 0x00
      Slot        : 0x0000
    Dev. Serial # : 0000000000000000
    Express Capability Information @ 86934a18
      Device Caps : 00008021 Role-Based Error Reporting: 1
      Device Ctl  : 0107 ur FE NF CE
      Dev Status  : 0003 ur fe NF CE
       Root Ctl   : 0008 fs nfs cs
    
    AER Information @ ffffffff86934a54
      Uncorrectable Error Status    : 00000020 ur ecrc mtlp rof uc ca cto fcp ptlp SD dlp und
      Uncorrectable Error Mask      : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
      Uncorrectable Error Severity  : 00062010 ur ecrc MTLP ROF uc ca cto FCP ptlp sd DLP und
      Correctable Error Status      : 00000000 adv rtto rnro dllp tlp re
      Correctable Error Mask        : 00000000 adv rtto rnro dllp tlp re
      Caps & Control                : 00000005 ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
      Header Log                    : 00000000 00000000 00000000 00000000
      Root Error Command            : 00000000 fen nfen cen
      Root Error Status             : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
      Correctable Error Source ID   : 00,00,00
      Correctable Error Source ID   : 00,00,00
    
    ===============================================================================
    Section 1     : Processor Generic
    -------------------------------------------------------------------------------
    Descriptor    @ 8693499c
    Section       @ 86934ab4
    Offset        : 480
    Length        : 192
    Flags         : 0x00000000
    Severity      : Informational
    
    Proc. Type    : x86/x64
    Instr. Set    : x86
    CPU Version   : 0x00000000000106a5
    Processor ID  : 0x0000000000000006

    Section 0 is the only important section of the error record with this form of bugcheck. I’ll start by identifying the device involved using the Vendor ID and Device ID fields. The error record indicates the error occurred at the Root Port, and from the PCI Database, we know that the device was the Intel I/O Hub PCIe Root Port. Unfortunately, this information is far too generic and doesn’t point to the exact cause of the bugcheck. From here, we would need to begin checking the PCIe devices which are connected to the system and how they interact with the other components.

    To gather more information regarding the error, we need to investigate the meanings of the error codes shown in the AER; it is important to remember that we will be using the _PCI_EXPRESS_ROOTPORT_AER_CAPABILITY structure. The Uncorrectable Error Status is represented by the _PCI_EXPRESS_UNCORRECTABLE_ERROR_STATUS structure, which contains the bitfields for the error codes shown in the register. If we dump this structure, then we can see that the capitalised letters correspond to the bitfields which have been set to true.

    3: kd> dt pci!_PCI_EXPRESS_UNCORRECTABLE_ERROR_STATUS
       +0x000 Undefined        : Pos 0, 1 Bit
       +0x000 Reserved1        : Pos 1, 3 Bits
       +0x000 DataLinkProtocolError : Pos 4, 1 Bit
       +0x000 SurpriseDownError : Pos 5, 1 Bit
       +0x000 Reserved2        : Pos 6, 6 Bits
       +0x000 PoisonedTLP      : Pos 12, 1 Bit
       +0x000 FlowControlProtocolError : Pos 13, 1 Bit
       +0x000 CompletionTimeout : Pos 14, 1 Bit
       +0x000 CompleterAbort   : Pos 15, 1 Bit
       +0x000 UnexpectedCompletion : Pos 16, 1 Bit
       +0x000 ReceiverOverflow : Pos 17, 1 Bit
       +0x000 MalformedTLP     : Pos 18, 1 Bit
       +0x000 ECRCError        : Pos 19, 1 Bit
       +0x000 UnsupportedRequestError : Pos 20, 1 Bit
       +0x000 Reserved3        : Pos 21, 11 Bits
       +0x000 AsULONG          : Uint4B

    Since the Surprise Down (SD) error bitfield is the only one which has been set (the status value 0x00000020 corresponds to bit 5), we can investigate further into what exactly a Surprise Down error is. In short, it indicates a loss of connection between two devices, although I will give a slightly more detailed definition with the use of the PCIe documentation. I’ve added the section numbers for reference.
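
    The mapping from register value to capitalised abbreviations can be reproduced by hand. The sketch below uses only the bit positions listed in the dt output above; the dictionary and function names are hypothetical and exist purely for illustration:

```python
# Bit positions taken from the dt pci!_PCI_EXPRESS_UNCORRECTABLE_ERROR_STATUS
# output above. Reserved ranges are deliberately left out.
UNCORRECTABLE_ERROR_BITS = {
    0: "Undefined",
    4: "DataLinkProtocolError",
    5: "SurpriseDownError",
    12: "PoisonedTLP",
    13: "FlowControlProtocolError",
    14: "CompletionTimeout",
    15: "CompleterAbort",
    16: "UnexpectedCompletion",
    17: "ReceiverOverflow",
    18: "MalformedTLP",
    19: "ECRCError",
    20: "UnsupportedRequestError",
}


def decode_uncorrectable(value: int) -> list[str]:
    """Return the names of all bitfields that are set in the register value."""
    return [
        name
        for bit, name in sorted(UNCORRECTABLE_ERROR_BITS.items())
        if (value >> bit) & 1
    ]


if __name__ == "__main__":
    # Values copied from the AER Information section of the error record.
    print("Status  :", decode_uncorrectable(0x00000020))
    print("Severity:", decode_uncorrectable(0x00062010))
```

    Running the same function on the Uncorrectable Error Severity value is a quick sanity check, since the result should line up with the capitalised entries (MTLP, ROF, FCP, DLP) on that line of the record.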

    A Surprise Down error occurs when a TLP (Transaction Layer Packet) request is sent numerous times to a device across a link, and the device doesn’t respond positively. TLPs are similar to IRPs and are present within the Transaction Layer (Section 2) of the PCIe topology, which is responsible for issuing and responding to TLPs.

    For those experienced with debugging, you can imagine this situation as a Stop 0x9F: an IRP is sent but becomes stuck for some unknown reason. Once a threshold is met, the link is considered to be inactive or malfunctioning, and a bugcheck is raised to alert the operating system to this error. The best methodology for this type of error would be to investigate the connections of the devices on the motherboard; check for any loosely seated cards and dust which may have built up inside the slots.

    Moreover, simply as a matter of interest, the Bus Number, Device Number and Function Number are used to map a device into the PCI Configuration Space. We can use the !pci extension to view such information, but please note that you will require a live debugging session with an x86 computer.

    lkd> !pci
    PCI Segment 0 Bus 0
    00:0  1022:1510.00  Cmd[0006:.mb...]  Sts[0220:.6...]  AMD Host Bridge  SubID:1022:1510
    01:0  1002:9806.00  Cmd[0407:imb...]  Sts[0010:c....]  ATI VGA Compatible Controller  SubID:103c:3387
    01:1  1002:1314.00  Cmd[0006:.mb...]  Sts[0010:c....]  ATI Class:4:3:0  SubID:103c:3387
    04:0  1022:1512.00  Cmd[0004:..b...]  Sts[0010:c....]  AMD PCI-PCI Bridge 0->0x1-0x1
    11:0  1002:4394.00  Cmd[0007:imb...]  Sts[0230:c6...]  ATI Class:1:6:1  SubID:103c:3387
    12:0  1002:4397.00  Cmd[0016:.mb...]  Sts[02a0:.6...]  ATI USB Controller  SubID:103c:3387
    12:2  1002:4396.00  Cmd[0016:.mb...]  Sts[02b0:c6...]  ATI USB2 Controller  SubID:103c:3387
    13:0  1002:4397.00  Cmd[0016:.mb...]  Sts[02a0:.6...]  ATI USB Controller  SubID:103c:3387
    13:2  1002:4396.00  Cmd[0016:.mb...]  Sts[02b0:c6...]  ATI USB2 Controller  SubID:103c:3387
    14:0  1002:4385.42  Cmd[0403:im....]  Sts[0220:.6...]  ATI SMBus Controller  SubID:103c:3387
    14:2  1002:4383.40  Cmd[0006:.mb...]  Sts[0410:c....]  ATI Class:4:3:0  SubID:103c:3387
    14:3  1002:439d.40  Cmd[000f:imb...]  Sts[0220:.6...]  ATI ISA Bridge  SubID:103c:3387
    14:4  1002:4384.40  Cmd[0407:imb...]  Sts[02a0:.6...]  ATI PCI-PCI Bridge 0->0x2-0x2
    15:0  1002:43a0.00  Cmd[0007:imb...]  Sts[0810:c..A.]  ATI PCI-PCI Bridge 0->0x3-0x6
    15:1  1002:43a1.00  Cmd[0007:imb...]  Sts[0010:c....]  ATI PCI-PCI Bridge 0->0x7-0x7
    16:0  1002:4397.00  Cmd[0016:.mb...]  Sts[02a0:.6...]  ATI USB Controller  SubID:103c:3387
    16:2  1002:4396.00  Cmd[0016:.mb...]  Sts[02b0:c6...]  ATI USB2 Controller  SubID:103c:3387
    18:0  1022:1700.43  Cmd[0000:......]  Sts[0010:c....]  AMD Host Bridge
    18:1  1022:1701.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge
    18:2  1022:1702.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge
    18:3  1022:1703.00  Cmd[0000:......]  Sts[0010:c....]  AMD Host Bridge
    18:4  1022:1704.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge
    18:5  1022:1718.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge
    18:6  1022:1716.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge
    18:7  1022:1719.00  Cmd[0000:......]  Sts[0000:.....]  AMD Host Bridge

    The first column indicates the device number and the second column shows the function number of that particular device. We can gather even further information using the !pcitree extension, which shows all the devices which have been enumerated on the bus:

    lkd> !pcitree
    Bus 0x0 (FDO Ext 86688ae0)
      (d=0,  f=0) 10221510 devext 0x8665cc10 devstack 0x8665cb58 0600 Bridge/HOST to PCI
      (d=1,  f=0) 10029806 devext 0x8665c738 devstack 0x8665c680 0300 Display Controller/VGA
      (d=1,  f=1) 10021314 devext 0x866720e8 devstack 0x86672030 0403 Multimedia Device/Unknown Sub Class
      (d=4,  f=0) 10221512 devext 0x86672c10 devstack 0x86672b58 0604 Bridge/PCI to PCI
      Bus 0x1 (FDO Ext 86680d18)
        No devices have been enumerated on this bus.
      (d=11, f=0) 10024394 devext 0x86672738 devstack 0x86672680 0106 Mass Storage Controller/Unknown Sub Class
      (d=12, f=0) 10024397 devext 0x866730e8 devstack 0x86673030 0c03 Serial Bus Controller/USB
      (d=12, f=2) 10024396 devext 0x86673c10 devstack 0x86673b58 0c03 Serial Bus Controller/USB
      (d=13, f=0) 10024397 devext 0x86673738 devstack 0x86673680 0c03 Serial Bus Controller/USB
      (d=13, f=2) 10024396 devext 0x866740e8 devstack 0x86674030 0c03 Serial Bus Controller/USB
      (d=14, f=0) 10024385 devext 0x86674c10 devstack 0x86674b58 0c05 Serial Bus Controller/Unknown Sub Class
      (d=14, f=2) 10024383 devext 0x86674738 devstack 0x86674680 0403 Multimedia Device/Unknown Sub Class
      (d=14, f=3) 1002439d devext 0x866750e8 devstack 0x86675030 0601 Bridge/PCI to ISA
      (d=14, f=4) 10024384 devext 0x86675c10 devstack 0x86675b58 0604 Bridge/PCI to PCI
      Bus 0x2 (FDO Ext 86685888)
        No devices have been enumerated on this bus.
      (d=15, f=0) 100243a0 devext 0x86675738 devstack 0x86675680 0604 Bridge/PCI to PCI
      Bus 0x3 (FDO Ext 866853e0)
        (d=0,  f=0) 14e44727 devext 0x86a787c8 devstack 0x86a78710 0280 Network Controller/'Other'
      (d=15, f=1) 100243a1 devext 0x8667c0e8 devstack 0x8667c030 0604 Bridge/PCI to PCI
      Bus 0x7 (FDO Ext 8668fea8)
        (d=0,  f=0) 10ec8168 devext 0x86a7dc10 devstack 0x86a7db58 0200 Network Controller/Ethernet
      (d=16, f=0) 10024397 devext 0x8667cc10 devstack 0x8667cb58 0c03 Serial Bus Controller/USB
      (d=16, f=2) 10024396 devext 0x8667c738 devstack 0x8667c680 0c03 Serial Bus Controller/USB
      (d=18, f=0) 10221700 devext 0x8667d0e8 devstack 0x8667d030 0600 Bridge/HOST to PCI
      (d=18, f=1) 10221701 devext 0x8667dc10 devstack 0x8667db58 0600 Bridge/HOST to PCI
      (d=18, f=2) 10221702 devext 0x8667d738 devstack 0x8667d680 0600 Bridge/HOST to PCI
      (d=18, f=3) 10221703 devext 0x8667e0e8 devstack 0x8667e030 0600 Bridge/HOST to PCI
      (d=18, f=4) 10221704 devext 0x8667ec10 devstack 0x8667eb58 0600 Bridge/HOST to PCI
      (d=18, f=5) 10221718 devext 0x8667e738 devstack 0x8667e680 0600 Bridge/HOST to PCI
      (d=18, f=6) 10221716 devext 0x8667f0e8 devstack 0x8667f030 0600 Bridge/HOST to PCI
      (d=18, f=7) 10221719 devext 0x8667fc10 devstack 0x8667fb58 0600 Bridge/HOST to PCI
    Total PCI Root busses processed = 1
    Total PCI Segments processed = 1

    The d represents the device number, the f represents the function number, and the eight-digit hexadecimal value that follows combines the two IDs: the first four digits are the Vendor ID and the last four digits are the Device ID. For example, 10221510 is Vendor ID 1022 and Device ID 1510, which matches the !devext output below.
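
    To make the field layout concrete, here is a rough sketch that splits one !pcitree line into its parts. It assumes the line format shown in the output above and that the d and f values are hexadecimal; the regular expression and all names are illustrative only and are not part of WinDbg:

```python
import re

# Pattern for one device line of !pcitree output, e.g.
# (d=18, f=3) 10221703 devext 0x8667e0e8 devstack 0x8667e030 0600 Bridge/HOST to PCI
PCITREE_LINE = re.compile(
    r"\(d=([0-9a-fA-F]+),\s*f=([0-9a-fA-F]+)\)\s+"
    r"([0-9a-fA-F]{4})([0-9a-fA-F]{4})\s+"
    r"devext\s+(0x[0-9a-fA-F]+)\s+devstack\s+(0x[0-9a-fA-F]+)\s+"
    r"([0-9a-fA-F]{4})\s+(.*)"
)


def parse_pcitree_line(line: str) -> dict:
    """Split a !pcitree device line into numeric fields and a description."""
    match = PCITREE_LINE.search(line)
    if match is None:
        raise ValueError(f"not a !pcitree device line: {line!r}")
    dev, fn, vendor, device_id, devext, devstack, cls, desc = match.groups()
    return {
        "device": int(dev, 16),          # assumed to be hexadecimal
        "function": int(fn, 16),
        "vendor_id": int(vendor, 16),    # first four digits
        "device_id": int(device_id, 16), # last four digits
        "devext": int(devext, 16),
        "devstack": int(devstack, 16),
        "class_subclass": int(cls, 16),
        "description": desc.strip(),
    }


if __name__ == "__main__":
    sample = "(d=0,  f=0) 10221510 devext 0x8665cc10 devstack 0x8665cb58 0600 Bridge/HOST to PCI"
    info = parse_pcitree_line(sample)
    print(f"Vendor {info['vendor_id']:04x}, Device {info['device_id']:04x}")
```

    The devext value recovered this way is the address that gets passed to !devext.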

    lkd> !devext 0x8665cc10
    PDO Extension, Bus 0x0, Device 0, Function 0.
      DevObj 0x8665cb58  Parent FDO DevExt 0x86688ae0
      Device State = PciStarted
      Vendor ID 1022 (ADVANCED MICRO DEVICES)  Device ID 1510
      Subsystem Vendor ID 1022 (ADVANCED MICRO DEVICES)  Subsystem ID 1510
      Header Type 0, Class Base/Sub 06/00  (Bridge/HOST to PCI)
      Programming Interface: 00, Revision: 00, IntPin: 00, RawLine 00
      Possible Decodes ((cmd & 7) = 7): BMI   Capabilities: Ptr = <none>
      Logical Device Power State: D0
      Device Wake Level:          Unspecified
      WaitWakeIrp:                <none>
      Device Requirements structure has changed size.  Update extension.
      Device Resources structure has changed size.  Update extension.
      Interrupt Requirement: <none>
      Interrupt Resource: <none>

    The !devext extension can provide some additional information about the device, which is useful for debugging purposes. There is another field within the error record which I had forgotten to mention, and that is the Class Code register.

    The register is very useful for identifying the type of device which could be causing the problem. The register is divided into three parts: Class, Sub-Class and Prog. I/F. From our example (Class code 030400), we can see that the class is 0x3, the sub-class is 0x4 and the Prog. I/F is 0x0. If we were to check the meanings of these values, then we would reach the conclusion that a display controller had reported the error to the operating system.

    0x3 is the class number for display controllers, and the sub-class points to the general category of display controllers. This makes sense in the context of this dump since the issue lay with a TV tuner card (the dump was previously debugged by Vir Gnarus). A complete list of PCI Class codes can be found here – http://wiki.xomb.org/index.php?title=PCI_Class_Codes
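
    Splitting the Class Code field into its three parts is simple byte extraction. The following is a minimal sketch, assuming the value is read exactly as printed in the error record (030400); the short name table covers only a few base classes and is not a complete list:

```python
# A few well-known PCI base class values; illustrative, not exhaustive.
BASE_CLASS_NAMES = {
    0x01: "Mass Storage Controller",
    0x02: "Network Controller",
    0x03: "Display Controller",
    0x04: "Multimedia Device",
    0x06: "Bridge",
    0x0C: "Serial Bus Controller",
}


def split_class_code(class_code: int) -> tuple[int, int, int]:
    """Split a 24-bit class code into (class, sub-class, prog. I/F)."""
    base_class = (class_code >> 16) & 0xFF
    sub_class = (class_code >> 8) & 0xFF
    prog_if = class_code & 0xFF
    return base_class, sub_class, prog_if


if __name__ == "__main__":
    # Value copied from the "Class code" line of the error record.
    base_class, sub_class, prog_if = split_class_code(0x030400)
    name = BASE_CLASS_NAMES.get(base_class, "Unknown")
    print(f"class=0x{base_class:x} ({name}), sub-class=0x{sub_class:x}, prog. I/F=0x{prog_if:x}")
```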

    I hope this tutorial series has given an in-depth insight into the internals of a Stop 0x124 and some of the debugging methodologies we could use to debug such bugchecks. I didn’t wish to delve too deeply into the technical details of PCIe and x86/x64 architectures since it would leave the scope of this tutorial and generate too much ‘fluff’ and ‘filler’.

    I hope you enjoyed this tutorial, and if you wish to suggest any amendments or corrections then please comment/post below. Moreover, please note that I’ve created a list of reference material regarding PCI-e and CPU architecture in this thread – Hardware Architecture Documentation Links

    Registered: 03.03.2022

    Posts: 6

    03.03.2022, 21:48. Views: 8036. Replies: 14


    Good day. Apologies up front for the padding and digressions, but I really do love writing at length))
    In short, for about a year I fought what is apparently (though I’m not certain) a compatibility problem between my CPU and RAM. In search of an answer I declared and carried out two crusades across the internet, but never found the truth. There were dozens of people with the same problem and even more proposed fixes, which were either purely individual to the authors and those who joined the discussion, or simply led to a dead end before the threads were closed. Let me get straight to the point.
    Between September and December 2020 I built a PC with the following configuration:

    CPU: Ryzen 5 3600x (Gammaxx 400 RGB)
    Motherboard: Gigabyte Aorus B450 Elite
    RAM: HyperX Fury Black DDR4 2x8Gb 3200 MHz
    Graphics card: Gigabyte RTX 3060ti
    Drive: Samsung 860 EVO 500 gb
    PSU: Super Flower Leadex Silver 550W (Many will now think, what kind of genius puts a 550 W unit in this build, and they will be partly right. But this PSU has never let me down, including in the build listed above. No problems occurred under any load or in any test).
    Case: Materexx 70.

    After a while, through a lot of trial and error, I overclocked the RAM to 3400, which was all I could squeeze out of these sticks with decent timings. The system ran like clockwork, with no problems at all, and handled everything I threw at it, whether heavy AAA titles or stress tests in Aida64, Cinebench, OCCT and the like.
    After 3-4 months I had an idea: why not drop in better RAM and upgrade to a proper 3600 MHz. So I bought two 8 GB G.Skill Trident Z Neo RGB sticks, and that is when the fun began.
    With the XMP profile enabled, the system flat out refused to run stably. In games the PC would lock up and freeze solid, without going to a BSOD and without leaving any memory dumps. Only a reboot via the reset button helped. After the reboot, the event log contained only errors like this:

    A fatal hardware error has occurred.

    Reported by component: Processor Core
    Error Source: Machine Check Exception
    Error Type: Cache Hierarchy Error
    Processor APIC ID: 2

    Stress tests spewed cache errors, and only with XMP enabled. At the same time there was an identical build in the house (the RAM from the first configuration had moved into it), except for the graphics card, which was a 1050Ti. I decided to put the new RAM into the second build and voilà, everything flew. XMP came up on the first try, with no lockups, freezes or errors in stress tests. From then on it lived in the second build, the HyperX went back into the current one, and the whole thing was more or less forgotten.
    Then some time later, while browsing the internet, I came across a tip that made me feel like a complete idiot. It said that when enabling the XMP profile you also need to pick one of three preset profiles, supposedly for frequencies. I decided to check, plugged everything in and, what do you know, it all worked. Tests passed without problems, no errors, everything flew. Around then I got pulled back into the world of The Witcher 3 and all was well. Hooray, I thought, but no such luck. God of War had just come out on PC and I decided to play it. And then the PC started rebooting. Assuming the game was at fault, I read online that one version of it had some kind of problem that filled up either RAM or virtual memory until the PC rebooted, so I figured this was normal and didn’t dig further. After finishing it I went back to BF1 and then BAM, another reboot with no BSOD and no memory dump, and in the event log once again my favourite:

    Reported by component: Processor Core
    Error Source: Machine Check Exception
    Error Type: Cache Hierarchy Error
    Processor APIC ID: 2

    Tests: all fine. I started searching the internet again and checked everything I could: the Windows power plan, fast boot in the BIOS (by the way, I check and update the BIOS regularly). Many wrote that the CPU was most likely defective, yet it passed every stress test without problems. I also left memtest running overnight: 0 errors.
    So I had tried everything, and then it occurred to me: if this happens in games, let’s test the graphics card. I downloaded Furmark, set the test to FHD and BAM, a reboot at the 20th minute of the test.
    That is the case history so far. Guys, any ideas what to do with this mess? I’m about ready to cry :(



    • #1

    WHEA errors. BSOD in games.

    Computer: MSI MS-7C84
    CPU: AMD Ryzen 9 5950X (Vermeer, VMR-B0)
    3400 MHz (34.00×100.0) @ 3600 MHz (36.00×100.0)
    Motherboard: MSI MAG X570 TOMAHAWK WIFI (MS-7C84)
    BIOS: 1.40, 10/29/2020
    Chipset: AMD X570 (Bixby)
    Memory: 32768 MBytes @ 1600 MHz, 16-19-19-39
    — 16384 MB PC17000 DDR4 SDRAM — G Skill F4-3600C16-16GVKC
    — 16384 MB PC17000 DDR4 SDRAM — G Skill F4-3600C16-16GVKC
    Graphics: GIGABYTE RTX 2070 Super Gaming OC (GV-N207SGAMING OC-8GC)
    NVIDIA GeForce RTX 2070 Super, 8192 MB GDDR6 SDRAM
    Drive: Samsung SSD 970 EVO Plus 500GB, 488.4 GB, NVMe
    Drive: WDC WD20EARX-00PASB0, 1953.5 GB, Serial ATA 6Gb/s @ 6Gb/s
    Drive: OCZ-AGILITY3, 117.2 GB, Serial ATA 6Gb/s @ 6Gb/s
    Drive: Samsung SSD 860 PRO 512GB, 500.1 GB, Serial ATA 6Gb/s @ 6Gb/s
    Sound: NVIDIA TU104 — High Definition Audio Controller
    Sound: AMD Starship/Matisse/Vermeer — HD Audio Controller
    Network: RealTek Semiconductor RTL8125 Gaming 2.5GbE Family Ethernet Controller
    Network: Intel Wi-Fi 6 AX200 160MHz
    OS: Microsoft Windows 10 Enterprise (x64) Build 19041.450 (2004/May 2020 Update)

    Last edited: Nov 17, 2020

    darkhawk


    • #2

    Please use the actual BIOS number. Saying AGESA version doesn’t help a whole lot……

    • #3

    Please use the actual BIOS number. Saying AGESA version doesn’t help a whole lot……

    7C84v14
    Release Date
    2020-11-04
    Description
    — Updated AMD AGESA ComboAm4v2PI 1.1.0.0 Patch C

    Alan J T


    • #4

    Yeah, it’s a bit wonky and doesn’t work correctly with memory speeds over 3200 on the 5000 series CPUs. What are the rest of your system details?

    • #5

    Yeah, it’s a bit wonky and doesn’t work correctly with memory speeds over 3200 on the 5000 series CPUs. What are the rest of your system details?

    edited first post

    J00ni3


    • #6

    I have the same problem as well (5800x + B550 gaming edge wifi): anything above 3200mhz gives a lot of WHEA errors. I used to run a 3700x with 3800 CL16, but now I’m stuck with 3200 CL14 instead, which is fine at the moment.

    • #7

    Same here.
    Fresh ryzen 5900x
    F4-3600C16D-16GTZR 4×8 gb
    new x570 tomahawk bios:
    7C84v14
    Release Date
    2020-11-04
    Description
    — Updated AMD AGESA ComboAm4v2PI 1.1.0.0 Patch C

    WHEA errors
    BTW try running linpack xtreme — you can’t:D Hardware error detected (stress test)
    no matter Ram settings WHEA errors are present….

    • #8

    Ryzen 5000 like 3000 top out at DDR4-3200 speeds.

    • #9

    Ryzen 5000 like 3000 top out at DDR4-3200 speeds.

    At 3200 cinebench r20 score should be 9700? Single core 610? Rly?

    Last edited: Nov 14, 2020

    • #10

    At 3200 cinebench r20 score should be 9700? Single core 610? Rly?

    Not much can be done about that, a BIOS update should surface soon to perk up compatibility and performance

    Alan J T


    • #11

    Same here.
    Fresh ryzen 5900x
    F4-3600C16D-16GTZR 4×8 gb
    new x570 tomahawk bios:
    7C84v14
    Release Date
    2020-11-04
    Description
    — Updated AMD AGESA ComboAm4v2PI 1.1.0.0 Patch C

    WHEA errors
    BTW try running linpack xtreme — you can’t:D Hardware error detected (stress test)
    no matter Ram settings WHEA errors are present….

    Not much can be done about that, a BIOS update should surface soon to perk up compatibility and performance

    A new BIOS was released last night with new memory compatibility. It is a beta BIOS. Fingers crossed and good luck; perhaps the memory timings are now fixed.

    At 3200 cinebench r20 score should be 9700? Single core 610? Rly?

    • #12

    Beta bios? do you know where i can find it? nothing on main site.

    Alan J T


    • #13

    Beta bios? do you know where i can find it? nothing on main site.

    Looks like they only released for a couple of boards for Reviewers to test the New GPU with the new CPU from AMD

    • #14

    Usually the beta BIOS is tested for a while before it is released more widely

    • #15

    Bsod
    —————————-
    Machine Check Exception
    Bus/Interconnect Error
    —————————-

    • #16

    make sure you have the latest chipset drivers

    • #17

    make sure you have the latest chipset drivers

    didn’t help

    • #18

    I have lots of spare M.2 SSD and I have several that are available for testing.

    Back when my MB was new, I had problems galore with it but over time it has stabilized

    I have also installed Windows fresh with every ISO that surfaces, just to try to get ahead of mangled setups

    • #19

    Reinstall Windows 10 to 20H2 19042.572
    cinebench r20 score:
    19041.450 -single core 610
    19042.572 — single core 635

    • #20

    Microsoft (R) Windows Debugger Version 10.0.19041.1 AMD64
    Copyright (c) Microsoft Corporation. All rights reserved.

    Loading Dump File [F:\Windows\Minidump\112720-9828-01.dmp]
    Mini Kernel Dump File: Only registers and stack trace are available
    Symbol search path is: srv*
    Executable search path is:
    Windows 10 Kernel Version 19041 MP (32 procs) Free x64
    Product: WinNt, suite: TerminalServer SingleUserTS
    Built by: 19041.1.amd64fre.vb_release.191206-1406
    Machine Name:
    Kernel base = 0xfffff803`72400000 PsLoadedModuleList = 0xfffff803`7302a3b0
    Debug session time: Fri Nov 27 07:47:56.096 2020 (UTC + 3:00)
    System Uptime: 0 days 11:15:48.909
    Loading Kernel Symbols
    ………………………………………………………
    ……………………………………………………….
    ……………………………………………………….
    ………………..
    Loading User Symbols
    Loading unloaded module list
    …………….
    For analysis of this file, run !analyze -v
    12: kd> !analyze -v
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************
    WHEA_UNCORRECTABLE_ERROR (124)
    A fatal hardware error has occurred. Parameter 1 identifies the type of error
    source that reported the error. Parameter 2 holds the address of the
    WHEA_ERROR_RECORD structure that describes the error conditon.
    Arguments:
    Arg1: 0000000000000000, Machine Check Exception
    Arg2: ffff9887c38c7028, Address of the WHEA_ERROR_RECORD structure.
    Arg3: 00000000bc800800, High order 32-bits of the MCi_STATUS value.
    Arg4: 00000000060c0859, Low order 32-bits of the MCi_STATUS value.
    Debugging Details:
    ——————
    *** WARNING: Unable to verify timestamp for nvlddmkm.sys
    fffff8037300f340: Unable to get Flags value from nt!KdVersionBlock
    fffff8037300f340: Unable to get Flags value from nt!KdVersionBlock
    *************************************************************************
    *** ***
    *** ***
    *** Either you specified an unqualified symbol, or your debugger ***
    *** doesn’t have full symbol information. Unqualified symbol ***
    *** resolution is turned off by default. Please either specify a ***
    *** fully qualified symbol module!symbolname, or enable resolution ***
    *** of unqualified symbols by typing «.symopt- 100». Note that ***
    *** enabling unqualified symbol resolution with network symbol ***
    *** server shares in the symbol path may cause the debugger to ***
    *** appear to hang for long periods of time when an incorrect ***
    *** symbol name is typed or the network symbol server is down. ***
    *** ***
    *** For some commands to work properly, your symbol path ***
    *** must point to .pdb files that have full type information. ***
    *** ***
    *** Certain .pdb files (such as the public OS symbols) do not ***
    *** contain the required information. Contact the group that ***
    *** provided you with these symbols if you need this command to ***
    *** work. ***
    *** ***
    *** Type referenced: hal!_WHEA_PROCESSOR_GENERIC_ERROR_SECTION ***
    *** ***
    *************************************************************************
    fffff8037300f340: Unable to get Flags value from nt!KdVersionBlock
    *************************************************************************
    *** ***
    *** ***
    *** Either you specified an unqualified symbol, or your debugger ***
    *** doesn’t have full symbol information. Unqualified symbol ***
    *** resolution is turned off by default. Please either specify a ***
    *** fully qualified symbol module!symbolname, or enable resolution ***
    *** of unqualified symbols by typing «.symopt- 100». Note that ***
    *** enabling unqualified symbol resolution with network symbol ***
    *** server shares in the symbol path may cause the debugger to ***
    *** appear to hang for long periods of time when an incorrect ***
    *** symbol name is typed or the network symbol server is down. ***
    *** ***
    *** For some commands to work properly, your symbol path ***
    *** must point to .pdb files that have full type information. ***
    *** ***
    *** Certain .pdb files (such as the public OS symbols) do not ***
    *** contain the required information. Contact the group that ***
    *** provided you with these symbols if you need this command to ***
    *** work. ***
    *** ***
    *** Type referenced: hal!_WHEA_PROCESSOR_GENERIC_ERROR_SECTION ***
    *** ***
    *************************************************************************
    KEY_VALUES_STRING: 1
    Key : Analysis.CPU.Sec
    Value: 2
    Key : Analysis.DebugAnalysisProvider.CPP
    Value: Create: 8007007e on USER-PC
    Key : Analysis.DebugData
    Value: CreateObject
    Key : Analysis.DebugModel
    Value: CreateObject
    Key : Analysis.Elapsed.Sec
    Value: 20
    Key : Analysis.Memory.CommitPeak.Mb
    Value: 82
    Key : Analysis.System
    Value: CreateObject

    BUGCHECK_CODE: 124
    BUGCHECK_P1: 0
    BUGCHECK_P2: ffff9887c38c7028
    BUGCHECK_P3: bc800800
    BUGCHECK_P4: 60c0859
    BLACKBOXBSD: 1 (!blackboxbsd)

    BLACKBOXNTFS: 1 (!blackboxntfs)

    BLACKBOXPNP: 1 (!blackboxpnp)

    BLACKBOXWINLOGON: 1
    CUSTOMER_CRASH_COUNT: 1
    PROCESS_NAME: dota2.exe
    STACK_TEXT:
    ffffdf80`895e1938 fffff803`728b31ca : 00000000`00000124 00000000`00000000 ffff9887`c38c7028 00000000`bc800800 : nt!KeBugCheckEx
    ffffdf80`895e1940 fffff803`6e5015b0 : 00000000`00000000 ffff9887`c38c7028 ffff9887`b80f2200 ffff9887`c38c7028 : nt!HalBugCheckSystem+0xca
    ffffdf80`895e1980 fffff803`729b4d3e : 00000000`00000000 ffffdf80`895e1a29 ffff9887`c38c7028 ffff9887`b80f2200 : PSHED!PshedBugCheckSystem+0x10
    ffffdf80`895e19b0 fffff803`728b4af1 : ffff9887`c35f7b50 ffff9887`c35f7b50 ffff9887`b80f2250 ffff9887`b80f2200 : nt!WheaReportHwError+0x46e
    ffffdf80`895e1a90 fffff803`728b4e63 : 00000000`0000000c ffff9887`b80f2250 ffff9887`b80f2200 00000000`0000000c : nt!HalpMcaReportError+0xb1
    ffffdf80`895e1c00 fffff803`728b4d40 : ffff9887`b6ef38a0 00000000`00000000 ffffdf80`895e1e00 00000000`00000000 : nt!HalpMceHandlerCore+0xef
    ffffdf80`895e1c50 fffff803`728b4285 : ffff9887`b6ef38a0 ffffdf80`895e1ef0 00000000`00000000 00000000`00000000 : nt!HalpMceHandler+0xe0
    ffffdf80`895e1c90 fffff803`728b6a45 : ffff9887`b6ef38a0 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalpHandleMachineCheck+0xe9
    ffffdf80`895e1cc0 fffff803`7290c179 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!HalHandleMcheck+0x35
    ffffdf80`895e1cf0 fffff803`728042fa : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiHandleMcheck+0x9
    ffffdf80`895e1d20 fffff803`72803fb7 : 00000000`00000000 fffff803`72803eec ffff9887`ca94a000 00000000`00000000 : nt!KxMcheckAbort+0x7a
    ffffdf80`895e1e60 fffff803`85e3e06c : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiMcheckAbort+0x277
    ffff9586`430ca9f0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nvlddmkm+0x82e06c

    MODULE_NAME: AuthenticAMD
    IMAGE_NAME: AuthenticAMD.sys
    STACK_COMMAND: .thread ; .cxr ; kb
    FAILURE_BUCKET_ID: 0x124_AuthenticAMD_PROCESSOR__UNKNOWN
    OS_VERSION: 10.0.19041.1
    BUILDLAB_STR: vb_release
    OSPLATFORM_TYPE: x64
    OSNAME: Windows 10
    FAILURE_ID_HASH: {92068ce4-dacf-c0d2-ac6c-91e2b9ac5b67}
    Followup: MachineOwner
    ———
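
    Arg3 and Arg4 of the bugcheck are the two halves of the MCi_STATUS register, so the error type can be read without the symbols the debugger is complaining about. The sketch below is a hedged illustration: it assumes the architectural flag bits (VAL, OVER, UC, EN, MISCV, ADDRV, PCC in bits 63 to 57) and the common compound MCA error code patterns in the low 16 bits. Vendor-specific fields are ignored, and all function names are made up:

```python
# Architectural MCi_STATUS flag bits (bit number -> short name).
STATUS_FLAGS = {
    63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
    59: "MISCV", 58: "ADDRV", 57: "PCC",
}


def combine_status(high: int, low: int) -> int:
    """Join bugcheck Arg3 (high 32 bits) and Arg4 (low 32 bits)."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def set_flags(status: int) -> list[str]:
    """List the architectural flags that are set, highest bit first."""
    return [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]


def classify_mca_code(status: int) -> str:
    """Classify the low 16-bit MCA error code by its compound pattern."""
    code = status & 0xFFFF
    if code & 0xF800 == 0x0800:   # 0000 1PPT RRRR IILL
        return "Bus/Interconnect Error"
    if code & 0xFF00 == 0x0100:   # 0000 0001 RRRR TTLL
        return "Cache Hierarchy Error"
    if code & 0xFFF0 == 0x0010:   # 0000 0000 0001 TTLL
        return "TLB Error"
    return "Other/Unclassified"


if __name__ == "__main__":
    # Arg3 and Arg4 copied from the !analyze -v output above.
    status = combine_status(0xBC800800, 0x060C0859)
    print(f"MCi_STATUS = 0x{status:016x}")
    print("Flags:", ", ".join(set_flags(status)))
    print("Type :", classify_mca_code(status))
```

    For this dump the low word is 0x0859, which falls into the bus/interconnect pattern and agrees with the error type quoted in post #15.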
