troubleshooting - How do I troubleshoot hardware issues related to a computer freeze/crash?

05
2014-04
  • Questioner

    What are some common guidelines and issues related to hardware being the issue of a computer crash?

    What should I look for and how do I troubleshoot these problems?

    What are some tools that are useful in diagnosing these hardware related crashes?

    I am looking to be able to isolate the problematic device with specific tools and guidelines. For example if device X is causing system failure how do I go about diagnosing it?

  • Answers
  • Area 51

    Is hardware the problem?

    Some problems are obviously hardware related. When your computer doesn't get past POST it's usually a hardware issue (but could be the software in the hardware, which for this answer we will continue to call a hardware issue). Intermittent crashes are difficult to diagnose but for a wide array of problems the steps to troubleshooting are the same.

    Crashes, freezes, lockups, graphical/audio artifacts and poor performance are all symptoms of either a hardware or a software fault. General software troubleshooting involves removing new programs (or programs updated/installed around the time the issue started to appear). Updating drivers, installing older drivers, and reinstalling the operating system are all potential ways to determine whether it is in fact a hardware issue. For the purposes of this guide we will assume it is a hardware issue.

    Troubleshooting 101

    The process of elimination. You could guess wildly that it might be a particular component at fault and start there, but that is bound to fail. Something is wrong and you're sure it's hardware related, but where to start? At the bare minimum. See if the least amount you can operate a computer with will re-create the problem.

    This is much easier on desktops than on laptops. Your options for elimination on laptops generally will consist of removing all but one stick of memory and potentially swapping out the hard drive. But laptops and desktops will be covered separately.

    Notice: The following instructions are somewhat technically oriented. If you've never seen the inside of a computer before take precautions. Use an anti-static wrist strap if you are operating in an area with the potential to have electrostatic discharge. You may also wish to use a grounding adapter cable. As a general rule of thumb, grounding yourself on the metal chassis of your computer before touching any of the internal components is a very good idea.

    Getting to bare bones:

    • Shut down your computer and unplug it from the wall (if you wish, set up your grounding safety equipment at this point).
    • Disconnect all of your storage drives and optical drives from your motherboard.
    • Remove all of your add-in cards including your video card.
    • Remove all of your RAM except for one module, and make sure it's in the primary slot (look for DDR1 or a similar label silkscreened on the board).
    • Unplug everything both internal and external from your computer including power, monitor cable, and keyboard, power and reset switches, internal speaker... (really, everything else--mouse, USB things, audio--it makes life much easier).

    At this point you should clean out the dust bunnies. Pick the larger dust piles out by hand, and use compressed air on any heatsinks that look particularly bad. Clean around fans with cotton swabs/cotton buds or a toothpick (not metal, nice soft plastic or wood).

    You should now have a motherboard with a processor and a stick of RAM. That's it. At this point you should also unplug the power connections on the motherboard. Check that your RAM is seated correctly: did the little side tabs lock into place?

    Now plug in the 24-pin power connection (I know you've not yet removed it because it's in there correctly, but just humor me: pull it out and then plug it back in). The locking tab on the male end should match up with the latch on the female end. There should be no empty holes in that connector. Plug in the 4-pin CPU power if applicable and make sure it's the correct one; the locking mechanism for that plug should also engage. Attach any other auxiliary power to your motherboard; some have 4-pin Molex connectors.

    Attach your keyboard to the appropriate port (USB or PS/2), and connect the monitor via the onboard video, if available. If your monitor won't work with the VGA port, just leave it disconnected. Now find your internal speaker and attach it to the appropriate pins (there should be "PC SPKR" or something similar silkscreened by a row of pins).

    At this point there should only be power, monitor, keyboard and PC speaker attached to the motherboard (and the power is unplugged at the power supply).

    Reset the CMOS. On some motherboards this is a jumper, on others it is a button; check your motherboard manual for details. While resetting it, let the jumper sit for a while. Now, remove your battery. Walk away, make some tea (remember, power is unplugged for this). While your tea is brewing, have some cheese toast. You don't need this much time, but it will allow you to relax. Take some deep soothing breaths.

    Now, check all your power connections. Yup, I know you just put those there, but check them anyway.

    Pop the battery back in and plug the power supply into the wall. Make sure the switch on the power supply is pressed down on the side marked with the line (the "on" position).

    Now to turn it on. You can't just tap the power button on your case because you don't have the power switch on the case plugged in. If you're confident the switch is not the problem, go ahead and plug it in. Or you can take a trusty screwdriver, coin, Olympic gold medal, or other conductive object and short the two power switch pins on that chassis header near where you plugged in your PC speaker.

    Does it do anything? Does it beep? If you don't have a monitor connected and it beeps one short beep that's good. If you can see wonderful booting things, hooray!

    If it does nothing, that is bad.

    At this point we have a Choose Your Own Adventure story. If things are good, go to the section labeled "Huzzah!" below. If things are going badly, go to the "Le sad" section below. When adding or removing components, make sure that the power to your computer is off before proceeding (if you don't unplug the computer, at least switch the power supply to "off").
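    The branch above can be written down as a tiny decision function. This is only a sketch in Python: the function name and its arguments are made up for illustration. The answer itself only covers three outcomes (one short beep, a visible boot screen, or nothing at all), so treating every other beep pattern as "bad" is an assumption.

```python
def next_section(short_beeps: int, boot_screen_visible: bool) -> str:
    """Pick which section to read after the first bare-bones power-on."""
    # One short beep, or a visible boot screen, is the "good" outcome.
    if boot_screen_visible or short_beeps == 1:
        return "Huzzah!"
    # No beep and no picture is the "bad" outcome described in the answer.
    # Assumption: any other beep pattern is also treated as bad here.
    return "Le sad"
```

    For example, `next_section(0, False)` (nothing happens at all) sends you to "Le sad".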

    Huzzah!

    You know your base system works at this point. Turn off the computer, plug in the rest of your chassis header things and reboot it, just to make sure none of them is fouling things up.

    It still works, yes?

    Remember, turn off the computer before adding components.

    Good. Let's start with, say, more memory, because if it works at bare bones, memory is the thing most likely to break it (not because your memory is bad, just because sometimes things are wonky). Does it still work with your memory? All of it? If not, go back to one stick, enter the BIOS, and set the memory voltage and timings to the manufacturer-recommended settings.

    Okay, so you're loaded up with your memory and CPU. Now let's get that video card in there. Make sure to put it in the appropriate slot; a PCIe card goes in one of the large PCIe slots.

    You should make sure the card is pushed all the way down. Many people don't get their cards in all of the way the first few times they build a computer. You should probably lock it in place, since fan vibrations can wiggle things loose. Make sure it's screwed down as well.

    Does it still boot? If so, that's good. Now attach the rest of the items one by one, powering off before adding each component. You'll either track down the problem or not. If it works with everything connected, congratulations, you fixed it!

    If it doesn't boot after adding your video card, a few things could be happening.

    1. A bad video card
    2. A bad motherboard
    3. Insufficient power

    Try using the card in a different system to rule out #1. You could/should also borrow someone's card to try it in your board.
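    The add-one-part-at-a-time routine from this section can be sketched as a short Python function. This is an illustration, not a diagnostic tool: `first_failing_component` and `simulated_boots` are hypothetical names, and the `boots` callback stands in for the manual step of powering off, installing the part, powering on and watching what happens.

```python
from typing import Callable, Iterable, List, Optional


def first_failing_component(
    base: List[str],
    extras: Iterable[str],
    boots: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add components one at a time and return the first that stops the boot.

    Returns None when the fully assembled system still boots.
    """
    installed = list(base)
    if not boots(installed):
        # The bare-bones set itself fails: that is the "Le sad" branch.
        raise RuntimeError("bare-bones system does not boot")
    for part in extras:
        installed.append(part)
        if not boots(installed):
            return part
    return None


# Simulated example: pretend the system fails whenever the video card is in.
def simulated_boots(installed: List[str]) -> bool:
    return "video card" not in installed


suspect = first_failing_component(
    base=["motherboard", "CPU", "one RAM stick"],
    extras=["remaining RAM", "video card", "hard drive", "optical drive"],
    boots=simulated_boots,
)
print(suspect)
```

    The part returned is only the first suspect. As the list above says, a failure after adding the video card can also point to the motherboard or to insufficient power, so it is still worth swapping the part into another system before blaming it.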

    Le sad

    • Swap that stick of memory for the other one (check to see if it's still a problem)
    • Now, at this point your fans should be clear, and your heatsinks look like new, right?

    This is where you are left with few culprits. It could be your processor or motherboard. It could be a heat issue or an issue with your power supply, a short in the keyboard (I've seen it), short in the monitor cable (seen it), short in the power cable (this involves fires and melting, you should notice that). In order I would try the following:

    • Swap out the power cable, keyboard, and monitor (one at a time, in that order, because you probably have a bunch of power cables, and probably fewer of the other two).
    • Remove the heatsink from the processor, and clean the processor top and heatsink until the heatsink looks new. Apply thermal paste (remember, a very small amount). Cod liver cream or zinc oxide sunscreen works for a very temporary, risky fix (not recommended). Reseat everything. Be careful here: if you've not done it before, read up and call a buddy. It's easy to snap things off of the heatsink, and forcing the CPU will ruin your day.
    • Swap out the power supply.
    • Find a buddy with a compatible motherboard and either try your chip in his board, or his chip in yours. Try again.
    • Remove the motherboard and the power supply from the case, and set the board up on a hunk of plywood after wiping it down. Shake the empty case out. Run stuff with the motherboard sitting on the plywood.
    • Request additional help on Super User in this question.

  • Related Question

    Troubleshooting computer hardware
  • GiddyUpHorsey

    I have a machine that has been running fine for the last few years as a home server and now it has problems with freezing and restarting continuously.

    I want to figure out what the problem is with the machine.

    The specs are:

    • AMD Sempron CPU
    • 2GB RAM
    • Windows 2000 Server
    • 20GB HDD (C:)
    • 2x 250GB HDDs in Raid 1 configuration (D:)

    It's hard to diagnose the problem from within Windows because the machine could restart or freeze at any time.

    I've run Memtest86 from a bootable CD and that worked fine (no errors after 13 hours and 11 passes), but trying to boot from an Ubuntu live CD didn't work when I tried it. It halted with a text dump screen of some sort displaying the IP and some memory addresses (I can't quite remember exactly what was on there).

    Is there some program I can run from a bootable CD that will diagnose the problem and tell me which piece of hardware is faulty?

    What can I do to quickly diagnose the problem?

    Update

    I finally repaired the machine. Although the PSU had tested ok with the PSU tester, a few days later when swapping out the motherboard it blew up with a pop and a cloud of sparks and smoke.

    I replaced the blown PSU with an Antec EarthWatts 500W PSU and was able to get the system functioning.

    The graphics card was also a bit dodgy so I ordered in a second hand replacement.

    So in summary, I replaced the following parts:

    • PSU
    • Motherboard
    • AGP graphics card

    and was able to get the machine running. The system test software that I used was unable to detect the faulty hardware. Luckily before hooking up the SATA drives I did a backup because setting up RAID on the new motherboard wrecked the data. I had to fiddle around with drivers and Windows to get everything working correctly.


  • Related Answers
  • boot13

    There are plenty of PC diagnostic programs out there, but they all suffer from the same limitation: in order to run the program, most of the important components of a PC have to already be working properly. It's kind of a catch-22. A truly useful diagnostic for a PC would consist of hardware that is separate from and monitors the main PC hardware.

    If, as you say, the computer has been working fine for a while, and nothing on the computer has changed recently, then it sounds like the power supply has gone bad. A bad PSU can cause all kinds of weird, intermittent problems. Normally for random lockups I would suggest display problems, but if this is a server, that means you don't use it directly in normal use - is that correct?

    PC PSUs go bad or flaky all the time. Depending on where you live, the AC coming into the building may not be too reliable and typical PSUs are sensitive to major fluctuations. A lightning strike nearby could certainly do the trick. The protection offered by a typical power bar is not worth much.

    You can buy a power supply tester (about $30), swap the PSU with a spare if you happen to have access to one, or buy another PSU ($100) and swap it.

  • Doug J

    Strip the hardware down to bare essentials: disconnect the RAID drives, take out the NIC, disable/remove the audio card, reduce RAM to the minimum (if there are two sticks, take one out), and unplug all peripheral devices. See if you can boot from a very recent Linux live CD (i.e. Ubuntu 10.04). Use the boot option on the CD to launch in VGA mode. Let it run until you're satisfied it's stable, then start adding back one device at a time. Since it's a server and you said the Linux error mentions the IP address, I'd start with the NIC.