Likely cause of BSODs associated with Desktop Window Manager warnings

2014-02-09
  • RoadWarrior

    My primary PC experiences a BSOD (bug check 124) roughly once a day and has been doing so for several months. These BSODs appear to be related to warnings 500 and 501 in the Windows event log. Both message types say "The Desktop Window Manager is experiencing heavy resource contention". 500 adds "The DWM responsiveness has degraded". 501 adds "Graphics subsystem resources are over-utilized. A consistent degradation in frame rate for the DW".

    After checking that the graphics driver was up-to-date, I replaced the AMD graphics card with an Nvidia card from another machine. Although replacing the graphics card is expensive, I thought it was the most likely suspect, and it's easier than replacing the motherboard or the power supply. But this has made no difference to the problem: still the same warnings 500/501 and a daily BSOD.

    No hardware events in the event log. No errors or warnings in the device manager. Nothing else unusual that I could find. So I have 3 questions:

    • Any other investigative technique available (short of a voltmeter)?
    • Any alternative to replacing the motherboard and/or the power supply?
    • Any other likely causes for the BSOD?

    EDIT 1: I've run the built-in Windows memory diagnostic twice, and had a clean result both times. But when I ran the Prime95 torture test (blended, lots of RAM testing) twice, it caused the same BSOD both times within 30 seconds. When I ran the Prime95 torture test (small FFTs, RAM not tested much), it ran fine for 10 minutes, although the temperature on a couple of the cores reached a nasty-looking 91C on full boost (33C at idle, ambient temperature 22C). So perhaps a memory hardware or voltage issue.

    EDIT 2: I've changed the memory voltage setting so that it can go as high as 1.6 V (from the default of 1.5 V). The Prime95 blended torture test now runs for 10 minutes without a BSOD, although 3 of the 4 cores reach the terrifying temperature of 98C! I'm going to watch for 500/501 events over the next couple of days.

    EDIT 3: I'm unable to disable the core with the dodgy L2 cache, as the BIOS doesn't allow me to disable specific cores. But changing to a profile with memory voltage raised from 1.5 V to 1.6 V and the overclocking boost reduced from 4.6 to 4.2 GHz appears to have eliminated the BSODs.

    System details

    • Motherboard: Asus P8Z68-V LE
    • Graphics: Nvidia GTX 770 2 GB
    • Power: Corsair 600W
    • CPU: Intel i7 2600K 3.4 GHz (on-demand to 4.6 GHz)
    • Cooling: Noctua NH-D14
    • Memory: 16 GB PC3-10666 1333MHz DDR3
    • OS: Windows 7 Pro with Aero switched-off
    • All device drivers up-to-date. OS fully-patched.
    • Machine is rarely pushed hard - maybe once a month.
  • Answers
  • magicandre1981

    Here's the output of !analyze -v and !errrec for your dump file.

    I'm not that experienced with kernel debugging, but it would seem that GCACHEL2_ERR_ERR (Proc 0 Bank 8) is a problem with the L2 cache on one of the i7's physical cores.

    Why it does that ... who knows :)

    0: kd> !analyze -v
    *******************************************************************************
    *                                                                             *
    *                        Bugcheck Analysis                                    *
    *                                                                             *
    *******************************************************************************
    
    WHEA_UNCORRECTABLE_ERROR (124)
    A fatal hardware error has occurred. Parameter 1 identifies the type of error
    source that reported the error. Parameter 2 holds the address of the
    WHEA_ERROR_RECORD structure that describes the error conditon.
    Arguments:
    Arg1: 0000000000000000, Machine Check Exception
    Arg2: fffffa800de4e028, Address of the WHEA_ERROR_RECORD structure.
    Arg3: 00000000be200000, High order 32-bits of the MCi_STATUS value.
    Arg4: 000000000005110a, Low order 32-bits of the MCi_STATUS value.
    
    Debugging Details:
    ------------------
    
    
    BUGCHECK_STR:  0x124_GenuineIntel
    CUSTOMER_CRASH_COUNT:  1
    DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT
    PROCESS_NAME:  System
    CURRENT_IRQL:  f
    STACK_TEXT:  
    nt!KeBugCheckEx
    
    
    STACK_COMMAND:  kb
    FOLLOWUP_NAME:  MachineOwner
    MODULE_NAME: GenuineIntel
    IMAGE_NAME:  GenuineIntel
    DEBUG_FLR_IMAGE_TIMESTAMP:  0
    FAILURE_BUCKET_ID:  X64_0x124_GenuineIntel_PROCESSOR_CACHE
    BUCKET_ID:  X64_0x124_GenuineIntel_PROCESSOR_CACHE
    Followup: MachineOwner
    
    0: kd> !errrec fffffa800de4e028
    ===============================================================================
    Common Platform Error Record @ fffffa800de4e028
    -------------------------------------------------------------------------------
    Record Id     : 01cf07525f60f483
    Severity      : Fatal (1)
    Length        : 928
    Creator       : Microsoft
    Notify Type   : Machine Check Exception
    Timestamp     : 1/2/2014 20:45:39 (UTC)
    Flags         : 0x00000000
    
    ===============================================================================
    Section 0     : Processor Generic
    -------------------------------------------------------------------------------
    Descriptor    @ fffffa800de4e0a8
    Section       @ fffffa800de4e180
    Offset        : 344
    Length        : 192
    Flags         : 0x00000001 Primary
    Severity      : Fatal
    
    Proc. Type    : x86/x64
    Instr. Set    : x64
    Error Type    : Cache error
    Operation     : Generic
    Flags         : 0x00
    Level         : 2
    CPU Version   : 0x00000000000206a7
    Processor ID  : 0x0000000000000000
    
    ===============================================================================
    Section 1     : x86/x64 Processor Specific
    -------------------------------------------------------------------------------
    Descriptor    @ fffffa800de4e0f0
    Section       @ fffffa800de4e240
    Offset        : 536
    Length        : 128
    Flags         : 0x00000000
    Severity      : Fatal
    
    Local APIC Id : 0x0000000000000000
    CPU Id        : a7 06 02 00 00 08 10 00 - bf e3 9a 1f ff fb eb bf
                    00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                    00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
    
    Proc. Info 0  @ fffffa800de4e240
    
    ===============================================================================
    Section 2     : x86/x64 MCA
    -------------------------------------------------------------------------------
    Descriptor    @ fffffa800de4e138
    Section       @ fffffa800de4e2c0
    Offset        : 664
    Length        : 264
    Flags         : 0x00000000
    Severity      : Fatal
    
    Error         : GCACHEL2_ERR_ERR (Proc 0 Bank 8)
      Status      : 0xbe2000000005110a
      Address     : 0x0000000132de9a40
      Misc.       : 0x000000d080034086
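
The Status value in the dump can be decoded by hand, which shows where the GCACHEL2_ERR_ERR name comes from. The following is a minimal sketch, assuming the IA32_MCi_STATUS bit layout and the cache-hierarchy compound error code format (000F 0001 RRRR TTLL) described in Intel's machine-check architecture documentation. The function names `combine`, `cache_mnemonic`, and `decode_mci_status` are made up for illustration and are not part of WinDbg or any library.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed from the Intel SDM layout).
FLAGS = {
    "valid": 63,            # VAL: the register holds a valid error
    "overflow": 62,         # OVER: an earlier error was overwritten
    "uncorrected": 61,      # UC: hardware could not correct the error
    "enabled": 60,          # EN: reporting was enabled for this error
    "misc_valid": 59,       # MISCV: the Misc. register is meaningful
    "addr_valid": 58,       # ADDRV: the Address register is meaningful
    "context_corrupt": 57,  # PCC: processor context may be corrupted
}

# Sub-fields of a cache-hierarchy error code: 000F 0001 RRRR TTLL
TRANSACTION = {0: "I", 1: "D", 2: "G"}            # instruction, data, generic
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}      # cache level (LG = generic)
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}


def combine(high32, low32):
    """Join bug check Arg3 (high half) and Arg4 (low half) into one status value."""
    return (high32 << 32) | low32


def cache_mnemonic(mca_code):
    """Return a name like GCACHEL2_ERR_ERR, or None if not a cache-hierarchy code."""
    # Ignore bit 12 (F, correction report filtering) when matching the pattern.
    if (mca_code & 0xEF00) != 0x0100:
        return None
    request = REQUEST.get((mca_code >> 4) & 0xF, "?")
    transaction = TRANSACTION.get((mca_code >> 2) & 0x3, "?")
    level = LEVEL[mca_code & 0x3]
    return f"{transaction}CACHE{level}_{request}_ERR"


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error codes."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}
    decoded["mca_code"] = status & 0xFFFF                  # bits 15:0
    decoded["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    decoded["mnemonic"] = cache_mnemonic(decoded["mca_code"])
    return decoded


if __name__ == "__main__":
    # Arg3 and Arg4 from the bug check above.
    status = combine(0xBE200000, 0x0005110A)
    for key, value in decode_mci_status(status).items():
        print(f"{key:20} {value}")
```

With the values from this dump, the uncorrected and context-corrupt flags are both set, which is consistent with a fatal machine check rather than a driver fault, and the low 16 bits decode to a generic-transaction L2 cache error.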
    

  • Related Question

    What can cause a BSOD
  • Kells

    I keep getting a blue screen of death (memory parity check error) when trying to watch the Olympics online at ctv.ca (Silverlight player). I can watch for at most 10 minutes before it goes down.

    To diagnose the problem I ran Memtest86, and the test passed with no errors. Then I ran Prime95 (blend) for an hour and had no problems. I tried using a couple of different RAM modules, and that didn't help.

    What components are most likely causing the BSOD? What else can I do to figure out what the problem is/solve it? If I need to replace parts, what order should I do it in?

    OS: Vista Business 64bit
    MB: Asus p5n32e sli plus
    RAM: Mushkin Silverline Frostbyte PC-6400 (996557)
    CPU: Core 2 Duo E8400
    GPU: ATI HD4800


  • Related Answers
  • geek

    Nowadays it happens mostly because of broken hardware or buggy video drivers.

  • fluxtendu

    You could use Blue Screen Viewer to find the culprit software or driver

  • ryanyama

    What browser version are you using with Silverlight? Did you recently install a new version of the Silverlight player? You can also take a look in the Silverlight forums and submit a bug if necessary: http://forums.silverlight.net/forums/28.aspx. You should try uninstalling Silverlight and reinstalling it to see if that resolves your issue.

  • bastibe

    A BSOD is definitely a crashing driver. Nothing else can actually stall the kernel. However, getting a BSOD is actually a good thing, since it means that the kernel caught the problem and stored a crash dump somewhere on your system that you can use to find the actual cause.

    Note that while the BSOD was caused by a crashing driver, this does not necessarily mean that the driver is faulty. Some other software may have been using the driver incorrectly and thereby made it crash. With some digging, this can usually be found in the crash dump, too.

    Reading the crash dump, however, is rather difficult. Sysinternals has some tools to read crash dumps, but honestly, this is not something a non-programmer would want to use. Perhaps someone else can point you to some usable software to that end.

  • Area 51

    Make sure the correct memory voltage is set in the BIOS. While you are there, update the BIOS to the latest version if necessary.