memory - ECC errors, no Mem Test Errors

06
2014-04
  • Damon

    I recently purchased an HP XW6400 workstation (dual CPU, quad memory channels). Along with the computer I purchased 2 sticks of RAM that are the same brand and look alike but do not match (the numbers and secondary stickers differ, though they were supposed to be a matched pair) and 2 Xeon 5160 CPUs. After putting it all together I had regular ECC corrections noted on startup, so I bought more RAM sticks that were matching; after installing the next set of memory I got the same errors. So I bought a motherboard and I still got the same errors. The memory controller is not integrated into the processor, so I haven't paid too much attention to the CPUs. I ran memtest for a quick 2 hour run on each stick individually and no errors came up on any of the sticks. But I still get ECC corrections on many reboots. Sometimes it notes that it corrected the errors; other times they are fatal uncorrectable errors.

    They are warm little things, so I turned the fan right above them over so the fan blows at them. Northbridge is cooled by a fan as well. Temps via hardware monitor all seem normal.

    Further, if I put all 4 sticks in, it locks up within minutes of starting almost every time, whereas with 2 sticks it almost never locks up (I used it that way for 2 weeks before I bought a new board); it just notes the ECC corrections or errors on reboot.

    All memory is DDR2 5300F Fully buffered ECC memory.

    The first set is HP memory. Judging by the numbers and stickers they are not a matched pair, although at first glance they look the same and most of the numbers are equivalent too. They were manufactured in different parts of the world (Singapore and Puerto Rico).

    The second set is Kingston memory, and it is a matched pair.

    My hypothesis is that the Kingston memory is having compatibility issues in dual channel mode, the HP memory is not a matched set, which also causes issues in dual channel mode, and all four together are a compatibility nightmare for quad channels, so it locks up. But really, I am just stabbing in the dark. Any ideas?

  • Answers
  • Ramhound

    I think there was a bad BIOS and a bad CPU working in conjunction with each other, and I think the memory, while not ideal, is not really the major issue. Hence the stabbing in the dark comment.

    In the past, I intermittently had a CPU Front Side Bus error that I was attributing to memory or motherboard issues. I found an HP document that says the original BIOS has issues and recommends updating, so I updated the BIOS.

    Then things ran a little better in that I could run with all 4 sticks of memory without crashing. So next I tried troubleshooting the CPUs by running a multitask "test" from PassMark on the system, which wrote to memory, ran prime numbers, and ran the Dhrystone and Whetstone tests all at the same time. Before that, during all the fiddling, I had purposely swapped CPU locations just in case the FSB error came up again. The test very quickly BSOD'd the computer, which would not simply restart. Upon restart (after having a hell of a time getting it to restart) it gave me a new CPU error message for a front side bus error, plus an additional sub-error for the FSB, on the same CPU that had the FSB error in the past (now in a different socket). Plus the computer would freeze while poking around in the BIOS and I could not get it to boot into Windows. So I removed the suspected bad CPU, restarted, which worked, and ran the same test again but for longer. No crash, no errors (yet), and everything so far seems stable.

    Sometimes you win with used stuff, sometimes you lose. Given how much time this has all wasted, I think this is officially one of those losing moments. Let's just hope that's it for problems.


  • Related Question

    memory - Is PC2700 ECC backwards compatible with PC2100 ECC?
  • hyperslug

    Because if it's not, it should be!
    I'm trying to get 2 x 1GB PC2700 ECC Registered DDR333 DIMMs (Micron brand) to work in my computer, but it won't POST: the computer starts, fans and hard drive spin up, but the monitors don't power up.

    • Motherboard: Supermicro X5-DAE, E7505 chipset
    • CPU: (2) Xeon 2.4 GHz
    • BIOS: 1.3b, the latest

    Crucial recommends PC2700. Corsair even recommends PC3200. So I figured PC2700 would be ok. Admittedly, the manual (section 2-5) states

    The X5DAE/X5DA8 supports up to 12 GB of ECC registered DDR-266/200 (PC2100/1600) memory.

    but manufacturers are usually conservative.
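    For reference, the PC ratings map to the DDR speeds by simple arithmetic: a DIMM moves 64 data bits (8 bytes) per transfer, so the peak rate in MB/s is roughly the transfer rate times 8, and the marketing name rounds that figure. The sketch below is only an illustration of that naming rule; the function name and the integer transfer rates are my own simplification (the real rates are 266.67 and 333.33 MT/s).

```python
# Illustrative only: how DDR-xxx speed grades relate to PCxxxx module names.
BUS_WIDTH_BYTES = 8  # 64 data bits per transfer (ECC check bits are extra)


def peak_bandwidth_mb_s(mega_transfers_per_s):
    """Approximate peak module bandwidth in MB/s for a given MT/s rating."""
    return mega_transfers_per_s * BUS_WIDTH_BYTES


if __name__ == "__main__":
    for name, rate in [("DDR-200", 200), ("DDR-266", 266),
                       ("DDR-333", 333), ("DDR-400", 400)]:
        print(f"{name}: about {peak_bandwidth_mb_s(rate)} MB/s peak")
```

    So PC2700 is simply the DDR-333 grade, one step above the DDR-266 (PC2100) that the manual lists. That says nothing about whether a given module will run slower in this board, only what the labels mean.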

    So here's the Q: Is it supposed to work? Can I prep the BIOS beforehand, change some timings or something, or is this type of thing just a no-go in ECC land?

    If you can find an example of others using PC2700 with this board let me know.


    Update: I removed all cards, USB devices, cables and swapped the AGP video card out with a known good PCI just to mix things up.

    The computer gives 1 beep on startup, which is normal. No other beep codes. The monitor is still blank. I've tried the PC2700 pair in all 3 banks with no change. My good RAM is 2 x 512MB PC2100 ECC, Crucial brand, and works in all 3 banks. I tried putting the PC2100 in bank 1 and the PC2700 in bank 2, hoping bank 1 would force everything to PC2100. Still nothing. Anything else I missed?


    This has been resolved with the vendor as a high density memory incompatibility issue. Ebay has some literature on it:

    JEDEC standard/guideline specifies that 64Mx8 and 32Mx16 devices are to be used to construct a 1GB Unbuffered module. Any 1GB Unbuffered module constructed by using 128Mx4 device BREAKS all the JEDEC standard/guideline in which is supposed to be designed only for Registered module. Since JEDEC doesn't want the modules to be built that way, so the companies who make them (a lot are generic and 3rd parties), don't put their company label on the Unbuffered modules.
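    To make the chip organizations in that quote concrete, here is a rough sketch of the arithmetic for a 1GB module with a 64-bit data bus. The function and field names are made up for illustration, and ECC chips are ignored. It shows that 64Mx8 and 128Mx4 parts both need 16 chips for 1GB, but the x4 parts form a single, deeper rank while the x8 parts form two ranks, which as I understand it is the "high density" difference a board can choke on.

```python
# Illustrative arithmetic only; names are hypothetical, ECC chips are ignored.
MODULE_DATA_WIDTH_BITS = 64  # data bus width of a standard DIMM


def module_layout(depth_m, width_bits, module_mb=1024):
    """Chip count and rank count for a module built from depth_m x width_bits chips.

    depth_m is the chip depth in millions of locations (the "64M" in 64Mx8).
    """
    chips_per_rank = MODULE_DATA_WIDTH_BITS // width_bits
    # One rank stores depth_m million locations, each 64 bits (8 bytes) wide.
    rank_mb = depth_m * MODULE_DATA_WIDTH_BITS // 8
    ranks = module_mb // rank_mb
    return {
        "chips_per_rank": chips_per_rank,
        "rank_mb": rank_mb,
        "ranks": ranks,
        "total_chips": chips_per_rank * ranks,
    }


if __name__ == "__main__":
    for depth_m, width_bits in [(64, 8), (128, 4)]:
        print(f"{depth_m}Mx{width_bits}:", module_layout(depth_m, width_bits))
```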


  • Related Answers
  • 8088

    According to Super Micro's "Test Memory List" for the motherboard, PC2700 and PC3200 are supported.

    • The tested memory for PC2700 is:
      • ATP Electronics AG64L72T8SQB3C 512MB (Qimonda chips -- Qimonda is out of business)
    • The tested memory for PC3200 is:
      • Smart Modular Technologies SM6472DDR2N1-1 512MB (Qimonda chips as well)

    When you say the system won't POST, I assume you mean a POST code of 00. What POST code are you really getting? If the memory isn't supported, you should get to a POST code in the 20s (28 is a common memory failure POST code). Your motherboard is supposed to try to initialize video and display the POST code on the top line.

    Something to check is to make sure the DIMM modules are fully seated.