linux - System frequently hangs/freezes with all processes in D state

06
2014-04
  • bparker

    When I am browsing the web with chromium (with lots of tabs/images/etc.) or running virtual machines with kvm, occasionally (at least a few times a day) my system will suddenly become unresponsive for several minutes, until the OOM killer is invoked (or I can kill e.g. chromium from the console) and the system returns to normal. However, I'm not convinced that it is simply an out-of-memory problem, because:

    • The OOM killer is always invoked even when there is plenty of free RAM (at least 400-700MB out of 4GB; I have no swap). See the dmesg output below, which shows the memory usage of the current processes: it does not go over the limit
    • the majority of all processes instantly become stuck in the D state out of nowhere when the hanging occurs
    • killing e.g. qemu or chromium from the console always returns the system to normal (again, when there is still even free RAM)
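
    To put numbers on the "free RAM" observation during a hang, the kernel's own accounting in /proc/meminfo can be snapshotted from the console. What follows is a minimal Python sketch, not the poster's method: the helper names parse_meminfo and summarize_mib are hypothetical, and the chosen field list is only an assumption about which counters are interesting here. Fields that a given kernel does not export are simply skipped.

```python
def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into {field: integer value}.

    Most values are reported in kB; a few (e.g. HugePages_Total) are
    plain counts. The unit suffix is dropped either way.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


# Assumed-interesting kB fields; this selection is illustrative only.
DEFAULT_KEYS = ("MemTotal", "MemFree", "Buffers", "Cached",
                "Active(file)", "Inactive(file)", "Shmem")


def summarize_mib(info, keys=DEFAULT_KEYS):
    """Return the selected kB fields converted to whole MiB, skipping absent ones."""
    return {k: info[k] // 1024 for k in keys if k in info}
```

    Running summarize_mib(parse_meminfo(open("/proc/meminfo").read())) a few times while the system is hung would show whether the free figure and the file-backed page counts are changing, which may help judge whether memory pressure is involved.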

    After installing ulatencyd I was able to have just enough system responsiveness to run a few commands from the console during the hang:

    There is only one block device mounted, my SSD, which is at /. There are no network mounts, swap or anything else. When the processes are stuck in D state, the drive is still responsive from the console, e.g. touch somefile completes quickly.

    About the D processes, here is output of ps axl | awk '$10 ~ /D/':

    1 0 36 2 20 0 0 0 conges D ? 0:25 [kswapd0]
    4 0 203 187 20 0 598860 297336 conges Dsl+ tty1 31:44 /usr/sbin/X :0 -auth /run/lightdm/root/:0 -nolisten tcp vt1 -novtswitch
    4 0 14486 1 20 0 146056 21264 conges DLsl ? 11:48 /usr/bin/ulatencyd
    0 1000 16630 566 20 0 976948 171812 conges Dl ? 1:46 /usr/lib/chromium/chromium --incognito --password-store=kwallet
    1 1000 17740 16639 25 5 1054504 70300 conges DNl ? 0:11 /usr/lib/chromium/chromium --type=renderer --disable-databases --lang=en-US --force-fieldtrials=DeferBackgroundExtensionCreation/Deferred/Prefetch/ContentPrefetchPrefetchOn/Prerender/PrerenderControl/PrerenderFromOmnibox/OmniboxPrerenderEnabled/UMA-New-Install-Uniformity-Trial/Control/UMA-Session-Randomized-Uniformity-Trial-5-Percent/group_12/UMA-Uniformity-Trial-1-Percent/group_68/UMA-Uniformity-Trial-10-Percent/group_09/UMA-Uniformity-Trial-100-Percent/group_01/UMA-Uniformity-Trial-20-Percent/group_02/UMA-Uniformity-Trial-5-Percent/group_12/UMA-Uniformity-Trial-50-Percent/group_01/ --enable-deadline-scheduling --disable-client-side-phishing-detection --disable-gl-multisampling --disable-accelerated-2d-canvas --disable-accelerated-video-decode --channel=16630.30.1536067666
    0 1000 25026 1 20 0 179156 840 conges D ? 0:00 journalctl -rn3
    0 1000 25046 25044 20 0 8 4 conges D+ pts/3 0:00 [awk]
    

    "conges" here seems to be the kernel function congestion_wait, I'm assuming it's waiting on the hard drive or it's some driver bug perhaps?

    And a picture of iotop during the hang, showing very high IO of most processes for seemingly no reason: http://i.imgur.com/ZmVFDQ2.jpg
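
    The WCHAN column in the ps output above is cut off at six characters ("conges"). A rough way to see the full wait-channel name is to read /proc/<pid>/wchan for every process whose state in /proc/<pid>/stat is D. The Python sketch below assumes a Linux /proc layout; the function names are hypothetical, and on some systems the wchan file is only meaningful when read as root.

```python
import os


def parse_stat(stat_text):
    """Return (comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' rather than on whitespace.
    """
    left, right = stat_text.rsplit(")", 1)
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def read_text(path):
    # errors="replace" tolerates process names that are not valid UTF-8.
    with open(path, errors="replace") as f:
        return f.read()


def d_state_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for processes in uninterruptible sleep (D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        try:
            comm, state = parse_stat(read_text(os.path.join(base, "stat")))
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or stat was unreadable
        if state != "D":
            continue
        try:
            wchan = read_text(os.path.join(base, "wchan")).strip() or "?"
        except OSError:
            wchan = "?"
        found.append((int(entry), comm, wchan))
    return sorted(found)
```

    Printing the result of d_state_processes() during a hang would list each stuck process next to the untruncated kernel symbol it is sleeping in, which could confirm or rule out the congestion_wait guess.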

    The problem has been occurring at least since the last few versions of Linux, but I cannot remember exactly how long this has been happening. I have not noticed it with any other applications besides qemu and chromium, but I also don't do a lot of intensive computing outside of those programs.

    Another interesting tidbit is that my USB mouse always appears to disconnect/re-connect itself sometime during the hang, but no other USB device exhibits this problem (keyboard/webcam/etc. never disconnect, just the mouse).

    Every time the system hangs I get this:

    [477439.679672] usb 3-2.2: USB disconnect, device number 13
    [477441.146367] usb 3-2.2: new low-speed USB device number 14 using xhci_hcd
    [477441.164535] usb 3-2.2: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
    [477441.239936] input: Logitech USB Optical Mouse as /devices/pci0000:00/0000:00:14.0/usb3/3-2/3-2.2/3-2.2:1.0/input/input23
    [477441.240216] hid-generic 0003:046D:C05A.000C: input,hidraw1: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-0000:00:14.0-2.2/input0
    

    The hardware is an Acer W700 tablet with an i5-3337U CPU, a 256GB mSATA SSD and 4GB of RAM. I have another laptop (Panasonic CF-SX1 with i5-M) with the same software setup that does not have this problem.

    Additional info:

    Kernel version: 3.12.8-1-ARCH

    dmesg: http://dpaste.com/1572965/

    Invoking the OOM killer manually via Alt+SysRq+F always seems to bring the system back to normal immediately; it always ends up killing the chromium process with the most memory usage. For example, the last time I invoked it, the tab it killed was using 1GB of RAM while I still had 600MB free (so I'm not sure why the OOM killer solves anything).
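
    Whether Alt+SysRq+F works from the keyboard depends on the kernel.sysrq setting: 0 disables SysRq, 1 enables every function, and larger values are a bitmask in which 64 covers signalling of processes (term, kill, oom-kill). The Python sketch below only inspects that setting and does not trigger anything; the function names are hypothetical.

```python
SIGNALLING_BIT = 64  # bitmask bit covering term, kill and oom-kill


def sysrq_allows_oom_kill(value):
    """Interpret a kernel.sysrq value: 1 means everything is enabled,
    otherwise the value is a bitmask (0 therefore means nothing is enabled)."""
    if value == 1:
        return True
    return bool(value & SIGNALLING_BIT)


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current kernel.sysrq value as an integer."""
    with open(path) as f:
        return int(f.read().strip())
```

    For instance, sysrq_allows_oom_kill(read_sysrq()) returning False on a machine would explain why the key combination does nothing there.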

  • Answers

    Related Question

    windows 7 - System freezes during boot process
  • slugster

    I have a machine running Win7 Ultimate. It was running fine, then it just froze: everything I was doing was still on the screen, but mouse and keyboard input was ignored and any animation on the screen stopped. The machine literally just froze.

    So I rebooted (power-off button). Since then the machine will reboot, but it ultimately freezes again. The point at which this happens varies: I have made it as far as the Windows login screen, but mostly it will do the POST, then give me the option to press F1 to continue or Del to enter BIOS settings (but of course pressing a key has no effect, because it's frozen!).

    I have disconnected everything not necessary for the boot process; the only peripheral that remains attached is the keyboard (even the network cable is disconnected). Prior to this the machine was operating fine. The install of Win7 is only 2 days old, and it was a fresh reinstall (i.e. not an upgrade or repair).

    Can anyone give me an indication of what may be wrong here?

    I'm not sure if this question should be here or on SuperUser; please migrate it if I have chosen the wrong board.


  • Related Answers
  • Tom Lorentz

    Well, because it even freezes right after the BIOS screen, we know it is definitely a hardware issue.

    The problem is that a freezing issue could be caused by any piece of hardware in the machine.

    Get some spare parts and start swapping them out, easy ones first: RAM, CD-ROM drive, video card (if it is separate from the motherboard), power supply.

    Too many possibilities, sorry.

    Good Luck

  • Area 51

    Like Tom said, it's most likely a hardware problem.

    It might be heat related. Let the machine cool down for 30 minutes, and see if you get further.

    It could also be a memory issue; try running a memtest or other diagnostics.