linux - Kernel panics on numerous identical systems running the same OS image

2014-07
  • Dave

    I am trying to diagnose an interesting problem that affects perhaps a dozen or more identical computers. We have an increased rate of kernel panic errors lately, yet I cannot figure out how to diagnose the cause.

    The situation is that we purchased numerous identical machines, and we're running Debian Wheezy on these machines to play flash files and AVI content; they sit all day just playing a series of fullscreen visuals. We've been buying these machines for a while (it's an LG-made signage computer) but in the last month we've had a huge increase in kernel panic errors.

    I've taken a photo of each error and generally they cite a fairly random process each time. It's been ntpd, or mplayer, or any number of other seemingly-unrelated processes. When the crash dump outputs to the screen I cannot see anything that definitively identifies why these crashes might be occurring.

    So I ran memtest86 on the machines, on perhaps 8 of them (all machines that had previously suffered a panic) and found no errors. fsck returns no issues with the filesystem.

    I am asking very humbly, as a person with not much experience dealing with linux crashes, for advice on how to try and identify the source of this problem.

    • Originally it seemed correlated with HDMI output, but we switched to VGA output and after a few days of stability, we had three kernel panics
    • The chip is an i5-520M processor running Intel HD Graphics, so as far as I know it's supported by the Wheezy intel driver in kernel 3.2, but perhaps I am mistaken.
    • The panics DO appear correlated with machines manufactured around the same time, which suggests possibly a hardware problem, but for the life of me I cannot discover it.

    I did a bunch of reading on kdump but I'm having trouble figuring out how to install it on Debian.

    Is there anything else I can try? Any logs I can try and plumb through after a kernel-panic'd machine has been removed from site and returned to my office? I would love to either rule out software or hardware and get closer to an explanation. If we have to return these computers or totally remove them from our operations I'd like to be as informed as possible as to why.

    Apologies for the vagueness of my question, but thank you very much for any help.

  • Answers
  • Dave

    I eventually discovered the answer to this problem.

    Poring over the dmesg logs, I realized that in some cases the SSD entries showed a different hex string depending on which engine I was connected to.

    Since we were imaging these engines with a standard-sized partition, I hadn't noticed that some of the engines had 64 GB SanDisk U100 SSDs and some had 32 GB drives.

    Only the 64 GB versions were suffering kernel panics. I don't know whether the problem was with our kernel, the SSD firmware, or something else, but the cause is now definitively tied to hardware, and we can swap the drives and make everything happy.
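
    One way to spot this kind of drive mismatch across machines is to pull the drive model and firmware out of each machine's dmesg output and compare them. The following is a minimal sketch, not the exact method used above. It assumes the kernel's libata identification lines have the form "ata1.00: ATA-8: <model>, <firmware>, max UDMA/133", and the function name ata_models is made up for illustration.

```shell
# Print "model<TAB>firmware" for every ATA device identified in dmesg-style
# input read from stdin.
# Assumption: identification lines look like
#   [    1.0] ata1.00: ATA-8: <model>, <firmware>, max UDMA/133
ata_models() {
    awk -F': ' '/ata[0-9]+\.[0-9]+: ATA-[0-9]+: / {
        # The last ": "-separated field holds "model, firmware, max ..."
        n = split($NF, f, ", ")
        if (n >= 2) print f[1] "\t" f[2]
    }'
}
```

    Running "dmesg | ata_models | sort | uniq -c" on each machine (or over a set of saved dmesg logs) would then group machines by drive model and firmware, which makes it easier to check whether the panics track one particular drive.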


  • Related Question

    osx - MacBook Pro kernel panics
  • pingbat

    My laptop has been acting up recently. It has seemed especially slow, video and audio have been very jittery, and it seems unable to do simple things like play back YouTube videos smoothly.

    This under-performance has been accompanied by a series of kernel panics and crashes. I am currently using the MacBook as my only development machine after a move while I wait for my iMac to make its way to me in a few weeks' time.

    In order to try and diagnose the problem I have:

    1. Performed a memtest on 1900MB of my memory by booting into single user mode (I am not sure of a better way to run memtest on a Mac)
    2. Performed a Disk Verification in Disk Utility by booting from the install disk
    3. Reinstalled OS 10.6, updated, etc.

    None of the above has turned up any errors or improved the situation at all.

    I am at a loss as to what I should do. Any advice or insight is welcome.

    I include some of the logs below:

    http://pastebin.com/GmtiaQJz
    http://pastebin.com/PvmDa7i4
    http://pastebin.com/r4h7iRVu

    Update 1:
    It also seems that en0 (the Ethernet interface) goes down after about an hour; ifconfig reports everything as fine except for a self-assigned IP. It might be an unrelated issue.

    Update 2:
    Now I am seeing weird graphical artifacts. Black/multicolored polygonal shapes on windows etc.


  • Related Answers
  • Rafael

    Kernel panics like the spinlock acquisition error cannot be caused by applications. The cause is either a hardware defect or a malfunctioning kernel extension; third-party extensions in particular can have bugs. To list all non-Apple extensions, open the terminal and run the following command: kextstat | grep -v com.apple

    You’ll get a (probably different) list of extensions as a result:

    Index Refs Address            Size       Wired      Name (Version) <Linked Against>
       48    0 0xffffff7f8121b000 0x46000    0x46000    at.obdev.nke.LittleSnitch (3894) <7 5 4 3 1>
       49    0 0xffffff7f81a87000 0x3000     0x3000     com.rogueamoeba.InstantOn (6.0.2) <5 4 3 1>
       90    0 0xffffff7f81a60000 0x22000    0x22000    com.rogueamoeba.InstantOnCore (6.0.2) <89 5 4 3 1>
      100    0 0xffffff7f80fe4000 0x5000     0x5000     com.logmein.driver.LogMeInSoundDriver (1.0.0) <89 5 4 3>
      105    0 0xffffff7f807c2000 0x5000     0x5000     com.Cycling74.driver.Soundflower (1.5.3) <89 5 4 3>
      124    0 0xffffff7f82c01000 0x18000    0x18000    com.github.osxfuse.filesystems.osxfusefs (2.5.4) <7 5 4 3 1>
    

    Try uninstalling these and check whether your system stops crashing. If you still get crashes, it’s probably a hardware defect. You can either run a hardware test or, even better, contact Apple support.
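
    If you want just the bundle identifiers to work through one by one, the listing above can be reduced to its Name column. This is a rough sketch that assumes the kextstat column layout shown above (the name is the sixth whitespace-separated field); the function name third_party_kexts is made up for illustration.

```shell
# Read kextstat-style output on stdin and print the bundle identifier of
# every extension whose name does not start with "com.apple.".
# Assumption: columns are Index, Refs, Address, Size, Wired, Name, ...
third_party_kexts() {
    # Rows start with a numeric index; this also skips the header line.
    awk '$1 ~ /^[0-9]+$/ && $6 !~ /^com\.apple\./ { print $6 }'
}
```

    For example, "kextstat | third_party_kexts" prints one identifier per line, which you can tick off as you uninstall each extension and retest.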

  • Sandeep Bansal

    You are receiving a Spinlock acquisition timed out error.

    Looking at the error, it shows VirtualBox and VMware services. These two may be the culprits, but you can also check whether any installed Safari extensions may be causing the problem. I would start by looking at VirtualBox and VMware.