windows vista - CPU usage shoots to ~50% and stays there until suspended

06
2014-04
  • Daniel R Hicks

    This is the Windows Vista problem, seen, I think, mostly on dual processor laptops, where % CPU suddenly shoots to about 45% and stays there. Once this has happened, % CPU will never return to normal on its own, though, curiously, "sleeping" the box for a few seconds will reset the condition.

    I've seen this happen many times (Sony VGN-CS215J laptop with Intel dual core CPU) when the box is sitting there doing nothing, with only the normal 2-3% background CPU, then suddenly -- BOOM!

    "Process Explorer" shows that the CPU in one of these episodes is being consumed by "Interrupts", rather than any specific program.

    It is definitely the case that this condition is "real", and not just a problem with CPU metering. When it occurs the box slows down, and sometimes particular applications slow to a crawl (e.g., tasks that would take ten seconds take ten minutes). In addition, on my laptop the fan takes off at high speed.

    Google searches show that this is a fairly common problem, and many supposed "causes" have been "identified", though they always turn out to be false leads. The problem tends to come and go (it appears that the likelihood of it varies from boot to boot, with some boots hardly ever doing it and others doing it every ten minutes), so it's easy to get the false impression that the problem has been "cured", only to have it come back.

    Microsoft, of course, denies all knowledge of the problem, even though it occurs on several different brands of system.

    One clue I have is that it doesn't seem to happen when my laptop is running on battery (though of course with the variability of the symptom it's hard to say this with any certainty). But I tried playing with the CPU speed controls (under advanced power options) and that didn't cure it.

    Update 1:

    I've checked several times, and there aren't any new drivers available for my box. (There is a new display driver, but Sony hasn't respun it with their special hooks, so it won't work on this box.)

    I don't see that "walking the stack" would do any good since the "looping" is in interrupts, not any specific process. I suppose I could try to do an interrupt trace, but it would likely take a lot of time that I don't have.

    Update 2:

    Today I experienced the failure while running on battery, the first time that has happened. So I know of no conditions that prevent the failure.

    Re turning off Windows services such as search indexing, I did that a long time ago.

    Update 3: (5/21/11)

    On a whim I unplugged the network cable and have been running wireless at home and at work for the past two days. (I don't generally like to run wireless if I don't need to since I figure there's already too much RF pollution.) No episodes have occurred. Weird.

    Update 4: (5/30/11)

    I've been running for the past 11 or so days, using wireless only. (Not something I normally like to do, since I feel there's too much RF pollution already and no need to add more when a wired connection is available.) And for the past 11 days I've not had an "incident" -- by far the longest incident-free time I've seen. In a day or two I'll start plugging in again and see what happens.

    Update 5: (6/2/11)

    As a result of a wireless router outage at work, I had to use the wired connection there for two days, and the old behavior (40% or so "events" after 30-60 minutes of up-time) returned. Curious thing, though: On both days, when I brought the laptop home and connected to wireless, the problem would recur within a few minutes. But once I did a "sleep" and "reawaken" the problem would be permanently gone.

    To bring the laptop home I'd sleep it, but somehow the "bug" survived through that. Or, quite possibly, the wired interface didn't get reset until after reawakening, and it did something nasty during those few seconds.

    Just for reference, the wired adapter is a "Marvell Yukon 88E8040 PCI-E Fast Ethernet Controller". It would be interesting to know if the same adapter is associated with other cases of this problem.

    Update 6: (6/6/11)

    I'm beginning to suspect that somehow the wireless adapter is the culprit. When it's turned off it can somehow corrupt the system. I say this because the router at work is a little "funky" and I sometimes have to turn the wireless off and back on (via a mechanical switch on the front of the laptop) to get a connection. When I do this, inevitably within a few minutes (not immediately) I get the interrupts back. Sleeping and reawakening the laptop clears the interrupt problem, seemingly permanently (until the next time the wireless is turned off). For the record, the wireless adapter is an "Intel(R) WiFi Link 5100 AGN", though it could be more of a problem with the way the switch is implemented.

    Update 7: (7/5/11)

    I've been running for over a month now on the wireless network adapter (vs hardwired) and the problem has essentially gone away. A few times (due to losing connectivity for some reason) I've turned the adapter off for several seconds and then back on, to reset it. In all but one of these cases, as best I can remember, I got the 50% CPU problem after the off/on cycle, though, curiously, in several cases the problem didn't appear for 30 minutes or more after the off/on.

    Update 8: (7/18/13)

    About 10 months ago I had to completely restore my system from backup, and since then I've not seen the 50% cpu problem. (Haven't tried to deliberately provoke it, but the radio has been accidentally turned off on several occasions.) Of course, no Windows bug ever goes away completely, so now I have a problem with Open Office crashing, but I guess I can live with that.

  • Answers
  • Mark Sowul

    Take a look at the Windows Performance Toolkit: http://blogs.msdn.com/b/pigscanfly/archive/2009/08/06/stack-walking-in-xperf.aspx

    My money's on crappy drivers.

    I had this happen with crappy Broadcom (that's redundant) network drivers.

  • Bacon Bits

    I would suspect incorrect/bad drivers, flawed BIOS, or outright failing hardware, in that order. It's very, very unlikely that this is a problem with Vista itself. The "Interrupts" entry in Process Explorer represents CPU time spent servicing hardware interrupts, so a high value usually points to a misbehaving device or driver rather than an ordinary program. At the top of my list would be Sony's drivers for the switchplate buttons (those above the keyboard) and the keyboard special functions. Having worked with Sony's software before, I can honestly say that it's total crap.

  • harrymc

    I would try to turn off unwanted Windows services, and most notably Windows Search.

    The most authoritative list of services that can be tweaked is on Black Viper's Website.

    You could also have a look at TweakHound's Vista Services Recommendations.

    Create a system restore point before starting. You could also use Autoruns as your tool, since it can save the current configuration on a text file and restore it later.

  • Tom Wijsman

    Can you please run this procedure for about 10 seconds during the 50% CPU?

  • Mehrdad

    I doubt it's the issue, but it kind of sounds like a DMA problem...

    Go to Device Manager, expand IDE ATA/ATAPI controllers, double-click your hard disk controller, and go to the second tab (I think it was called "advanced"). Is DMA turned on?

  • Daniel R Hicks

    Answering my own question so I can close this out. The precise cause of the problem is unclear, but I can prevent it by keeping the "radio" turned on, even when operating off of cable.


  • Related Question

    Cpu usage in the output of top
  • Tim

    I am looking at the output of top.

    top - 16:11:19 up 31 days, 2:37, 10 users, load average: 17.01, 16.99, 17.00

    Tasks: 470 total, 18 running, 452 sleeping, 0 stopped, 0 zombie

    Cpu(s): 76.5%us, 0.0%sy, 0.0%ni, 23.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

    Several questions about CPU usage:

    (1) Is the "us" part in the third line the same as "load average / number of cores", or is it what I have heard people call "CPU utilization"?

    (2) for the us part, "man top" says

    us -- User CPU time

         The time the CPU has spent running users’ processes that are not niced.
    

    What are "processes that are not niced"?

    (3) Some say "CPU utilization" is a better measurement than load average. So how do I get "CPU utilization"? If I sum up the %CPU column for all processes, is that "CPU utilization" or something else different than "CPU utilization * number of cores" and load average?

    Thanks and regards!


  • Related Answers
  • A Dwarf

    First, a couple of sources:
    * Top: Linux Command,
    * nice article on top usage patterns.

    (1) Is the "us" part in the third line the same as "load average / number of cores", or is it what I have heard people call "CPU utilization"?

    The "us" field shows the percentage of CPU time spent in user mode. It is not the load average divided by the number of cores. See CPU Modes.
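
    To make the difference concrete, here is a minimal sketch of the "load average / number of cores" figure the question mentions. The load average counts tasks that are running or waiting to run, while "us" is a share of CPU time, so the two need not agree. The function name `load_per_core` and the core count of 24 are assumptions for illustration; the question does not say how many cores the machine has.

```python
def load_per_core(load_average, cores):
    """Load average normalised by the number of logical CPUs."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


# 17.01 is the 1-minute load average from the top output in the question.
# 24 cores is a made-up value, used only to show the arithmetic.
print(round(load_per_core(17.01, 24), 2))
```

    On a live Unix system the two inputs could come from `os.getloadavg()[0]` and `os.cpu_count()` in the standard library.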

    What are "processes that are not niced"?

    A process's "niceness" is a numeric value that essentially defines how nice the process is being to the others that want the CPU. A low-priority process that takes very little processing power (cycles) when others need it is a "nice" process.
    Niceness can be positive or negative (on Linux it ranges from -20 to 19, with 0 as the default). A negative niceness marks a process that demands higher priority and gets more cycles. Not nice at all. A positive niceness marks a process that accepts lower priority and yields cycles to others. Very nice. See Nice.

    So the "us" field shows you how much CPU time the un-niced user mode processes are taking, that is, those whose niceness is zero or negative. To see the "niced" processes (those with positive niceness), look at the "ni" field.
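
    As a rough sketch of how Linux splits user-mode time between these two fields: time is counted under "ni" only when the process has a nice value above zero, so processes at the default niceness of 0 are counted under "us". The helper name `top_field_for` is hypothetical and only mirrors that rule; it does not read anything from the system.

```python
def top_field_for(nice_value):
    """Return the top summary field that user-mode time of a process
    with the given nice value is counted under on Linux."""
    if not -20 <= nice_value <= 19:
        raise ValueError("nice values range from -20 to 19")
    # Only a raised (positive) nice value counts as "niced".
    return "ni" if nice_value > 0 else "us"


for nice_value in (-5, 0, 10):
    print(nice_value, top_field_for(nice_value))
```

    A process can check its own nice value without changing it by calling `os.nice(0)`.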

    (3) Some say "CPU utilization" is a better measurement than load average. So how do I get "CPU utilization"? If I sum up the %CPU column for all processes, is that "CPU utilization" or something else different than "CPU utilization * number of cores" and load average?

    To see the CPU utilization, look at the "id" field. This is the idle time. CPU utilization is thus 100 - id.
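
    The 100 - id calculation can be sketched against the summary line from the question. This is only an illustration: the function names `parse_cpu_line` and `cpu_utilization` are made up here, and the parsing assumes the older `76.5%us` field format shown above, which newer versions of top print differently.

```python
import re


def parse_cpu_line(line):
    """Split a 'Cpu(s): 76.5%us, ...' line into {field: percent}."""
    pairs = re.findall(r"([\d.]+)%(\w+)", line)
    return {name: float(value) for value, name in pairs}


def cpu_utilization(fields):
    """Everything that is not idle time counts as utilization."""
    return 100.0 - fields["id"]


line = "Cpu(s): 76.5%us, 0.0%sy, 0.0%ni, 23.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st"
fields = parse_cpu_line(line)
print(cpu_utilization(fields))  # 76.5
```

    For this sample the result equals the "us" value only because every other non-idle field is zero.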