How to tame Linux responsiveness, memory, and paging

24
2013-08
  • user76871

    First question on overflow =)... +100 bounty. Couldn't think of something I really cared about until now:

    I'm really fed up with the state of Linux desktop responsiveness, e.g. http://brainstorm.ubuntu.com/item/85/ -- in situations with low free RAM, or situations with high disk throughput, the system slows to a crawl; this is absolutely terrible for applications which require decent performance. Additionally, the UI is completely unresponsive. Compare this for example with OS X, where if an application is hogging resources, one can always option-click to Force Quit it, whereas in Linux I cannot even alt-tab or switch desktop, or even ctrl-alt-f1 to get a terminal -- well I can, it just takes about 1-2 minutes per operation.

    I use gkrellm so I can see the situation as it unfolds. Typically the memory utilization becomes pretty high, or the disk throughput jumps dramatically.

    It's not bad hardware, with a 2.6GHz quad-core and 4GB of 800MHz DDR2 RAM (would have had 6GB, but due to a hardware incompatibility couldn't mix-and-match with the old set). This problem may go away when I inevitably get more RAM, but I don't feel that's the heart of the problem. I even have two swap partitions on different disks.

    I feel the problem is threefold:

    • runaway programs that hog up massive amounts of memory -- the law must be laid down for these programs, with limits on their
      • (e.g. tabs on Chrome, each of which is 20-50MB, some of which can use hundreds of MB)
      • (e.g. other programs like update-db and indexers which I've had to disable and remove from cron because they were slowing the system to a crawl whenever they ran, etc.)
    • something terrible going on in the kernel or bus contention of some sort, such that high-disk-throughput situations slow the entire system to a crawl (perhaps by paging out important programs)
    • the kernel is not prioritizing UI or important programs in terms of resources, such as memory, paging, even processor utilization

    Upvotes go to:

    I am thus looking for a solution where all such problems go away. In particular, I am looking for a solution such that the processes will slow down proportionally, while the system and other programs remain entirely unaffected and responsive long enough to manually kill something. Also the window manager process (and anything else that might affect UI responsiveness) should be responsive under all circumstances.

    In particular I am intrigued by /etc/security/limits.conf (man limits.conf), but am worried this only gives per-user control, and the commented examples in the file seem rather opaque in terms of description or where to begin. I'm hoping that a limits.conf works, but would not be surprised if it didn't even work, or if it was not an appropriate solution for my problem, or as granular as I'm trying to achieve. A per-process-name limits.conf would be ideal, assuming again that limits.conf works. I'd be happy to try out a limits.conf that people provide, to test if it works, though I'm open to all solutions at this point.
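    For reference, limits.conf sets the same kind of resource limits that the shell's ulimit builtin sets, and ulimit can be applied to a single command rather than to a whole user. The sketch below is only an illustration under stated assumptions: run_limited is a made-up helper name, it assumes a shell such as bash or dash whose ulimit supports -v, and it caps virtual address space rather than resident memory, so a program that hits the cap gets failed allocations instead of slowing down gracefully.

```shell
# run_limited KIB CMD [ARGS...]   (hypothetical helper, not a standard tool)
# Runs CMD in a subshell whose virtual address space is capped at KIB
# kibibytes. The limit is set inside the subshell, so the calling shell
# keeps its own limits.
run_limited() {
    (
        ulimit -v "$1" || exit 1   # refuse to run if the limit cannot be set
        shift
        exec "$@"
    )
}
```

    Something like `run_limited 1048576 some-program` would start the program with roughly a 1 GiB address-space cap; whether that is fine-grained enough for a browser with many tabs is a separate question.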

    It might also be useful to have any insights on how OS X manages to keep up such good UI responsiveness.

    I have already tweaked my /tmp and cache folders to be on tmpfs, and in general disk utilization is near-zero.

    Vaguely-related topics:

    • memory overcommit

    Answers I do not think will work:

    • swapoff (this still lets memory-hogging programs get away with murder, and the system still freezes permanently if memory is really tight -- upvotes to anyone who can suggest a tweak that invokes the OOM-killer earlier, before swapping, and targets specific programs)
    • echo ?? > /sys/.../swappiness (no discernible effect)
    • nice (has never worked)
    • ionice (never noticed a difference)
    • selinux (program incompatibility seems to be a nightmare)
    • realtime linux, i.e. can interrupt kernel (don't want to deal with compiling and updating custom kernel; might be okay if it has migrated into repositories)
  • Answers
  • Turbo J

    Sounds like your system goes into heavy swapping. Using vmstat 1 may reveal some details - just let it run in a terminal window and switch to it when the slowdown kicks in.
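    To make the swapping stand out in that output, the samples can be filtered on the swap-in/swap-out columns. This is a rough sketch, assuming the default procps vmstat layout in which si and so are the 7th and 8th columns; swap_activity is just an illustrative name.

```shell
# swap_activity: read `vmstat N` output on stdin and print only the samples
# in which pages were swapped in (si) or swapped out (so).
# Assumes the default column order: r b swpd free buff cache si so ...
swap_activity() {
    awk '$1 ~ /^[0-9]+$/ && ($7 > 0 || $8 > 0) {
        print "swapping: si=" $7 " so=" $8
    }'
}
```

    Usage would be along the lines of `vmstat 1 | swap_activity`; if nothing is printed during a slowdown, swapping is probably not the cause.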

    Rather than putting /tmp and "cache" into tmpfs, I would use a normal disk filesystem mounted with the noatime option. Frequently used data will stay in the caches anyway, and older data can be written to disk to free some RAM for applications. If /tmp and/or cache grows bigger, this might help a lot.
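    Whether a filesystem is actually mounted with noatime can be checked from the mount table. A small sketch, assuming the usual /proc/mounts format (device, mount point, type, options, ...); has_noatime is a hypothetical helper and the optional second argument exists only so the check can be pointed at a saved copy of the table.

```shell
# has_noatime MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed with the noatime option.
# MOUNTS_FILE defaults to /proc/mounts.
has_noatime() {
    awk -v mp="$1" '
        $2 == mp && $4 ~ /(^|,)noatime(,|$)/ { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```

    For example, `has_noatime / && echo "root is noatime"`.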

  • ultrasawblade

    Putting all your temporary and cache files on a tmpfs is lowering the amount of free RAM you have, so you might be causing the system to go to swap sooner than it would need to without this.

    It sounds like you have some applications that are relying on some sort of kernel facility or driver that is getting overloaded. You don't go into too much detail about what types of applications other than you are using browsers and indexers, and that you've disabled the indexers.

    You might try switching to a desktop environment or window manager that consumes fewer resources, such as LXDE or IceWM. At work I use a Linux system with LXDE installed and ROX-Filer for a very minimal desktop environment. The purpose of this Linux system is to run VMware Player so that I can run Windows XP and Windows 7 simultaneously. Its hardware specs are similar to yours, and I don't have too many responsiveness issues under the heavy load I'm putting the hardware through. I don't have any responsiveness issues with Linux itself (it's usually the VMs that sometimes make me wait a second, and with 1 disk shared between 2 VMs + 1 OS this is expected) and have always been able to suspend or shut down the VMs whenever I want to. This includes having Firefox running on Linux, often in the background.

    So to me it points to some issue with specific applications you are running.

    Is DMA enabled for your disk drives? (use hdparm) If you are using full-disk encryption, that requires all disk traffic to go through the CPU which negates much of the benefit of DMA. The effect of that would be that high disk traffic causes CPU to spike which would then slow the entire system down. (EDIT: to clarify, having DMA disabled OR using dm-crypt will cause high CPU during high disk traffic)

  • Lamnk

    This is a common problem with Linux's scheduler. The system slows down to a crawl whenever IO heavy activities occur. There aren't really many things you could do to improve the situation unless you're into kernel hacking :)

    Maybe these can help:

    http://www.phoronix.com/scan.php?page=article&item=linux_2637_video&num=1

    http://www.osnews.com/story/24223/Alternative_to_the_200_Lines_Kernel_Patch_that_Does_Wonders_


  • Related Question

    linux - Disk operations freeze Debian
  • Grzenio

    I have just installed Debian testing on my new desktop and I am not very happy with performance - when I perform a disk-intensive operation, e.g. upgrade packages in the system, everything seems to freeze, e.g. changing tabs in Iceweasel takes 3 seconds. I also run Debian on my 3-year-old ThinkPad X60 ultra-portable, and I don't have these issues there (even though every single parameter of the laptop is much worse than the desktop's).

    I am using the default packaged kernel and scripts.

    I ran

    hdparm -t /dev/sda1
    

    And I got around 96 MB/s, which is expected. What else can I try to make it work better?

    EDIT:

    grzes:/home/ga# hdparm -i /dev/sda
    
    /dev/sda:
    
     Model=WDC WD15EARS-00Z5B1, FwRev=80.00A80, SerialNo=WD-WMAVU1362357
     Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
     RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
     BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
     CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=2930277168
     IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
     PIO modes:  pio0 pio3 pio4
     DMA modes:  mdma0 mdma1 mdma2
     UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
     AdvancedPM=no WriteCache=enabled
     Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7
    
     * signifies the current active mode
    

    EDIT2: Even my wife said "on this new computer I can't do anything when I copy the photos from the camera and it's much worse than on the old one". So it must be serious.

    EDIT3: Updated to 2.6.32, but still no improvement

    EDIT4: I forgot to mention that the new disk is ext4, the old was ext3.

    EDIT5: Still not solved. I have a P43 ASUS P5QL-E board. Lines from dmesg that seem relevant:

    [    0.370850] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)                              
    [    0.370852] io scheduler noop registered                                                                      
    [    0.370853] io scheduler anticipatory registered                                                              
    [    0.370854] io scheduler deadline registered                                                                  
    [    0.370876] io scheduler cfq registered (default)
    ...
    [    0.908233] ata_piix 0000:00:1f.2: version 2.13                                                               
    [    0.908243] ata_piix 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19                                 
    [    0.908246] ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]                                                        
    [    0.908275] ata_piix 0000:00:1f.2: setting latency timer to 64                                                
    [    0.908316] scsi0 : ata_piix                                                                                  
    [    0.908374] scsi1 : ata_piix                                                                                  
    [    0.909180] ata1: SATA max UDMA/133 cmd 0xa000 ctl 0x9c00 bmdma 0x9480 irq 19                                 
    [    0.909183] ata2: SATA max UDMA/133 cmd 0x9880 ctl 0x9800 bmdma 0x9488 irq 19                                 
    [    0.909199] ata_piix 0000:00:1f.5: PCI INT B -> GSI 19 (level, low) -> IRQ 19                                 
    [    0.909202] ata_piix 0000:00:1f.5: MAP [ P0 -- P1 -- ]                                                        
    [    0.909228] ata_piix 0000:00:1f.5: setting latency timer to 64                                                
    [    0.909279] scsi2 : ata_piix                                                                                  
    [    0.909326] scsi3 : ata_piix                                                                                  
    [    0.910021] ata3: SATA max UDMA/133 cmd 0xb000 ctl 0xac00 bmdma 0xa480 irq 19                       
    

  • Related Answers
  • Rachel

    Check the offset for the partition - it needs to be divisible by 4 KiB for EARS drives, as they use 4096-byte physical sectors (with 512-byte logical sectors, that means a start sector that is a multiple of 8). If it isn't - repartition it to get alignment and the performance issues should go away (misaligned EARS drives will be doing a lot more sector writes per op).
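    The arithmetic is easy to get wrong by hand, so here is a minimal sketch of the check, assuming the start sector is given in 512-byte logical sectors (which is what fdisk -u and /sys/block/<disk>/<partition>/start report). The function name is made up for illustration.

```shell
# is_4k_aligned START_SECTOR
# Succeeds if a partition starting at START_SECTOR (in 512-byte logical
# sectors) begins on a 4096-byte boundary.
is_4k_aligned() {
    [ $(( $1 * 512 % 4096 )) -eq 0 ]
}
```

    The old default start of sector 63 fails this check (63 * 512 = 32256, which is not a multiple of 4096), while 64 or 2048 pass. On a live system something like `is_4k_aligned "$(cat /sys/block/sda/sda1/start)"` should work, with sda and sda1 being assumptions about the device names.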

  • Peter Eisentraut

    It's a shot in the dark, but I've had a problem like this a while ago, and the cause turned out to be that the kernel did not support the chipset completely and DMA was turned off. Check with

    hdparm -i /dev/sda
    

    whether one of the DMA modes is enabled.

    (The solution in that case was to get a newer kernel.)
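    If you want to check this from a script rather than by eye, the active mode is the one hdparm marks with an asterisk. A rough sketch, assuming the `hdparm -i` output format shown in the question; active_dma_mode is an illustrative name, and an empty result means no DMA mode is marked active (for example when the drive has fallen back to PIO).

```shell
# active_dma_mode: read `hdparm -i` output on stdin and print the DMA mode
# marked with "*" (mdmaN or udmaN). Prints nothing if the starred mode is
# not a DMA mode.
active_dma_mode() {
    tr ' ' '\n' | sed -n 's/^\*\(.*dma[0-9]*\)$/\1/p'
}
```

    Usage would be something like `sudo hdparm -i /dev/sda | active_dma_mode`.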

  • pioto

    I've run into problems where operations which perform lots of fsync(2) calls will cause a major system slowdown. In my case, I'm running with my root partition contained in LVM contained in LUKS. Are you using either LVM or LUKS?

    A tool which may help pinpoint what specifically is chewing up your disks (rather than just "installing packages") is called iotop. I'd suggest running it while you do one of these tasks, and it may point out some other background process which may be triggering at the same time and sucking up all of your I/O throughput.
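    If iotop isn't available, a much cruder view can be had from the per-process counters the kernel exposes in /proc/<pid>/io. This sketch only extracts the two storage-level counters from a file in that format; io_bytes is a hypothetical helper, and reading the file for processes you don't own typically needs root.

```shell
# io_bytes FILE
# Print "read_bytes write_bytes" from a file in /proc/<pid>/io format.
# The values are passed through as text, so large counters stay exact.
io_bytes() {
    awk -F': *' '
        $1 == "read_bytes"  { r = $2 }
        $1 == "write_bytes" { w = $2 }
        END { print r, w }
    ' "$1"
}
```

    Calling `io_bytes /proc/1234/io` twice a few seconds apart (1234 being a placeholder PID) and comparing the numbers gives a rough idea of how much that process is hitting the disk.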

  • ciceron

    sudo fdisk -u /dev/sda

    That should give you the starting offset. I 'think' you can create the partition using fdisk -o 64 or something - I would have to google it so I'll let you do the googling on fdisk and manually setting the partition offset (default is 63 so that's no good).

    And yes, the disk will show up with 512-byte sectors, as that is what it pretends to have to the OS - Vista/W7 handle this by setting the correct offset, but XP and I think nearly all Linux distros don't :( Doing it manually is the only way, it seems (mine is just a storage drive and was created in Win7/NTFS, so it's no issue for me).

    Edit: - Found a nice post over at wdc - this should have you up and running in no-time :)

    http://community.wdc.com/t5/Desktop/Problem-with-WD-Advanced-Format-drive-in-LINUX-WD15EARS/m-p/10920#M631

  • liori

    Just a random shot, which seems stupid given that you use Debian... but I found that it helped someone with the same HDD model: have you tried to update your BIOS?

  • Avery Payne

    As a general rule of thumb, if you can use hdparm on the device, it's the "old" ATA interface, vs. the newer SATA/SCSI interface. If that is the case, then the problem is probably that disk ops during interrupts are not enabled by default. This is a common issue on some machines using the older ATA interface and will degrade performance of the disk or the system during heavy I/O operations.

    You really should try this:

    sudo hdparm -t -T /dev/sda
    sudo hdparm -a8 -c3 -u1 /dev/sda
    sudo hdparm -t -T /dev/sda
    

    If you don't see an improvement in performance on the 2nd timing run (the third command) then there's something else going on.
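    To compare the two timing runs without squinting at the output, the throughput figure can be pulled out and compared numerically. This is a sketch that assumes the usual `hdparm -t` line of the form "Timing buffered disk reads: ... = NN.NN MB/sec"; both function names are made up for illustration.

```shell
# buffered_read_rate: read `hdparm -t` output on stdin and print the number
# that follows "=" on the "buffered disk reads" line.
buffered_read_rate() {
    awk '/buffered disk reads/ {
        for (i = 1; i < NF; i++) if ($i == "=") print $(i + 1)
    }'
}

# faster AFTER BEFORE: succeeds if AFTER is numerically greater than BEFORE.
faster() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}
```

    For example, capture `before=$(sudo hdparm -t /dev/sda | buffered_read_rate)`, apply the settings, capture `after` the same way, then run `faster "$after" "$before" && echo improved`. Single runs are noisy, so repeat the measurement a few times before drawing conclusions.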

    Another factor is expecting UDMA6 mode to work over a non-UDMA cable (assuming it's not a SATA interface). If you are using an 80-conductor ATA cable, you're fine; if you're using an older 40-conductor cable, you'll get all kinds of grief. If the cable is the older 40-conductor type, you'll need to step the transfer rate down to something that can be supported "safely". WARNING: adjusting the IDE interface can hang the drive and/or interface, and if the drive is your root filesystem, the entire system will hang with it!

    Should you need to step down the transfer rate to match the hardware, try the following:

    sudo hdparm -t -T /dev/sda
    sudo sync; sleep 3 ; sync    
    sudo hdparm -d 1 -X mdma2 /dev/sda
    sudo hdparm -t -T /dev/sda
    

    Again, the second timing (the last command issued) should show an improvement.

    Lastly, the drive itself might be marginal, but without SMART reporting, you might not notice the issue (until it's too late). I would really recommend installing the smartmontools package to assist you, especially if you have an older drive that will be needing a little TLC now and then.

    sudo apt-get update && sudo apt-get install smartmontools
    

    If all else fails, look in /var/log/messages for disk I/O errors.


    Update:

    It appears you're not alone. There are message boards all over the 'net reporting all kinds of heartache with these units.

    There is also mention of the drive using a 4k sector size vs. the "traditional" 512 byte size. I can only imagine what kind of trouble this must be causing.

    Lastly, looking at your output again, it appears that the journaling thread is pretty much tying up the system. A non-journaled file system might temporarily alleviate the issue, but it's a stopgap at best, and doesn't fix the underlying problem.

  • Grzenio

    This is finally fixed! As @Rachel pointed out, the problem was indeed the alignment to the 4 KiB sectors, but unfortunately the article linked was incorrect :(

    The correct way to align partitions is here: http://www.linuxconfig.org/linux-wd-ears-advanced-format

    And this article gives a pretty good benchmark so that you can check if your partition table is correct: http://article.gmane.org/gmane.linux.utilities.util-linux-ng/2955

    On a side note, if you have this drive and use Linux, you also SHOULD increase one of the idle timers as described here: http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=5357&p_created=1266947046&p_sid=Os7DQL2k&p_accessibility=0&p_redirect=&p_srch=1&p_lva=&p_sp=cF9zcmNoPTEmcF9zb3J0X2J5PSZwX2dyaWRzb3J0PSZwX3Jvd19jbnQ9NTEsNTEmcF9wcm9kcz0yMjcsMjk0JnBfY2F0cz0xMzAmcF9wdj0yLjI5NCZwX2N2PTEuMTMwJnBfcGFnZT0x&p_li=&p_topview=1