memory - Hardware error messages from syslogd

06
2014-04
  • Farhat

    I have a 64-core AMD server running CentOS, on which I was running a long job. In the middle of the output I see these lines. It appears to be a memory error. How severe is this, and what exactly does it indicate?

    Message from syslogd@heracles at Nov  7 21:00:02 ...
     kernel:[Hardware Error]: MC4_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdc10410040080a13    
    
    Message from syslogd@heracles at Nov  7 21:00:02 ...
     kernel:[Hardware Error]: Northbridge Error (node 4): DRAM ECC error detected on the NB.
    
    Message from syslogd@heracles at Nov  7 21:00:02 ...
     kernel:[Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)
    
  • Answers
  • Hennes

    on the NB

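    The NB is the northbridge. Old computers used many chips. Eventually these were integrated into about three larger generic chips (in the 386/486 era) and later into two. One of those, the northbridge, dealt with the CPU, the RAM and other high-speed devices. The other (the 'southbridge') dealt with slow peripherals. On AMD server processors like yours the northbridge functions, including the memory controller, sit on the CPU die itself, which is why the message names a node.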

    DRAM ECC error detected

    DRAM (dynamic RAM) is just main memory, as opposed to cache, which is usually made from static RAM. ECC memory is designed to detect and correct single-bit corruption.
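
    To make "detect and correct single-bit corruption" concrete, here is a toy sketch using a Hamming(7,4) code. This is only an illustration of the idea: real ECC DIMMs use wider codes over 64-bit words, and the function names `encode` and `correct` are made up for this example.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d0, d1, d2, d3 = data
    code = [0] * 8  # index 0 is unused so list indices match bit positions
    code[3], code[5], code[6], code[7] = d0, d1, d2, d3
    # Each parity bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def correct(codeword):
    """Return (data_bits, syndrome); syndrome is 0 if no error was seen."""
    syndrome = 0
    for position, bit in enumerate(codeword, start=1):
        if bit:
            syndrome ^= position  # XOR of the positions of all set bits
    fixed = list(codeword)
    if syndrome:
        # For a single flipped bit, the syndrome is that bit's position.
        fixed[syndrome - 1] ^= 1
    data = [fixed[2], fixed[4], fixed[5], fixed[6]]
    return data, syndrome


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a single-bit flip at position 5
    print(correct(word))  # ([1, 0, 1, 1], 5)
```

    The memory controller does the equivalent of `correct` on every read; a non-zero syndrome that it can repair is what gets reported as a corrected error.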

    The message you get is that the NB tried to read some memory, but detected that it was partially corrupt.

    In that case it can shut down the machine (remember the old-fashioned 'Parity error: System halted'), correct the error, or ignore it. In this case it seems to have corrected it and logged a warning: the CE and CECC flags in the first message stand for a corrected ECC error.
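
    If you want to check the flags yourself, the status value in the first message can be decoded by hand. The sketch below is a rough illustration, not the kernel's decoder: the bit positions are my assumption from the x86 machine-check register layout plus AMD's ECC bits, and `decode_mc_status` is a made-up name. Check the documentation for your CPU family before relying on it.

```python
# Assumed bit positions in an MCi_STATUS register (verify for your CPU).
STATUS_BITS = {
    "Val": 63,    # register contains a valid error
    "Over": 62,   # an earlier error was overwritten (overflow)
    "UC": 61,     # uncorrected error; clear means corrected (CE)
    "En": 60,     # error reporting was enabled
    "MiscV": 59,  # MISC register holds extra information
    "AddrV": 58,  # ADDR register holds the failing address
    "PCC": 57,    # processor context corrupt
    "CECC": 46,   # corrected ECC error (AMD)
    "UECC": 45,   # uncorrected ECC error (AMD)
}


def decode_mc_status(status):
    """Map each known flag name to True/False for the given status value."""
    return {name: bool((status >> bit) & 1) for name, bit in STATUS_BITS.items()}


if __name__ == "__main__":
    flags = decode_mc_status(0xDC10410040080A13)  # value from the log above
    print([name for name, is_set in flags.items() if is_set])
```

    For the value in the question this gives Over, MiscV, AddrV and CECC set with UC and PCC clear, which agrees with the `[Over|CE|MiscV|-|AddrV|-|-|CECC]` summary the kernel printed.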


    A single memory error is no reason to panic. These things happen. Rarely, but they do happen. And with ECC you get a proper warning rather than unexplained crashes or corrupt data.

    In extremely fast environments (e.g. on-die cache) they are not even that uncommon. Usually the computer will retry and correct itself. If that fails it will raise an MCE (machine check exception).

    If these errors keep occurring, check whether the DIMMs are seated properly. Have they collected a lot of dust? Do they pass memtest? And so on.
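
    To see whether the errors keep occurring, one option is to watch the kernel's EDAC counters. This is a minimal sketch, assuming an EDAC driver is loaded for your memory controller so that the `ce_count` and `ue_count` files exist under `/sys/devices/system/edac/mc`; `read_edac_counts` is a made-up helper name.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller_name: (corrected, uncorrected)} from EDAC sysfs."""
    base = Path(root)
    counts = {}
    if not base.is_dir():
        return counts  # no EDAC driver loaded, or not a Linux system
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text())
            uncorrected = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = (corrected, uncorrected)
    return counts


if __name__ == "__main__":
    for name, (corrected, uncorrected) in read_edac_counts().items():
        print(f"{name}: corrected={corrected} uncorrected={uncorrected}")
```

    Run it now and again after a few days of load. A corrected count that stays flat fits the "these things happen" case; one that keeps climbing on the same controller points at a DIMM worth reseating or replacing.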


  • Related Question

    Is it possible to purchase hardware to add Memory to a motherboard?
  • Toby Allen

    Is there a piece of hardware that I can buy that will allow me to use more than the two memory slots my machine provides? If so, will memory added that way run as fast?


  • Related Answers
  • Nifle

    No.

    There is no way to do that, period.

    If you have Vista, ReadyBoost with a USB memory stick might give you some performance gains:

    http://en.wikipedia.org/wiki/ReadyBoost

  • pcapademic

    I think not. The memory is connected to the CPU by the "Northbridge" of the chipset that supports the CPU:

    http://en.wikipedia.org/wiki/Northbridge_(computing)

    There are some applications that use the video processor and its associated memory to do complex computations. Also, in the 1990s there were some motherboard designs by Intel that attempted to be "modular". Outside of that, one would need to try and re-engineer the motherboard and the CPU chipset.

  • jerryjvl

    I would imagine that even if it existed in the form you are looking for it'd have a hard time beating the piece of hardware called 'motherboard replacement' for price.

    Replacing the motherboard also has the benefit that there aren't going to be any unexpected performance downsides to using a non-standard connection between the processor and whatever memory you manage to add.

  • DHayes

    Gigabyte makes various RAM disk products that let you add memory modules to your system. The result is treated essentially like a solid state drive, so pointing a swap file at it, for example, would boost performance. It will not be treated as normal RAM, to be sure, but it does let you add more, and it can certainly help.

    http://www.gigabyte.com.tw/Products/Storage/Default.aspx