linux - How to estimate compute time on EC2?

2013-08-09
  • Patrick McCarthy

    I need to generate data samples for my thesis, using some R/C++ code I've written. It can be embarrassingly parallel, and I've generalized it to run on local multicore machines without much trouble.

    On my Core 2 Duo each sample takes about 8 seconds to generate, with little variation between samples. Ideally I'd like millions or tens of millions of them, so I was thinking about throwing the job on EC2 for a few hours. Assuming one of their cores is comparable in performance to my C2D, 1M samples should take about 2,200 core-hours, give or take, or roughly 70 hours on a 32-core machine.

    I want to figure out how long this will take with reasonable confidence, so I thought I'd jump on a free micro instance, run some tests, and assume that compares to the larger costlier machines. Except the job I submitted (a for loop generating 100 samples 50 different times) should take <12 hours, but now I'm running into hour 28. This suggests that either the cores are much slower than I expected, or else my jobs are low priority and I'm getting uneven performance.

    Say I'm interested in renting one to three 32-core machines for a day or two. How could I estimate how long the job might take?
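
    The back-of-the-envelope arithmetic above can be sketched as a small Python helper. This is only an illustration: the function name `estimate_hours` and the `efficiency` fudge factor are assumptions, not part of my actual R/C++ code, and it presumes an EC2 core really is as fast as one C2D core.

```python
def estimate_hours(n_samples, sec_per_sample, cores, efficiency=1.0):
    """Return (core_hours, wall_hours) for an embarrassingly parallel job.

    `efficiency` is a hypothetical fudge factor in (0, 1] for cores that
    turn out slower than the machine the timing was measured on.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    core_hours = n_samples * sec_per_sample / 3600
    wall_hours = core_hours / (cores * efficiency)
    return core_hours, wall_hours


# 1M samples at 8 s each on one 32-core machine
core_hours, wall_hours = estimate_hours(1_000_000, 8, 32)
print(f"{core_hours:.0f} core-hours, {wall_hours:.1f} wall-clock hours")
# prints "2222 core-hours, 69.4 wall-clock hours"
```

    The same helper gives about 11 hours for the 5,000-sample test job on a single core, which is where the "<12 hours" expectation comes from.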

  • Answers
  • wingedsubmariner

    The micro instances are purposefully crippled. They are intended only for intermittent CPU load, and if you try to load the CPU continuously, the hypervisor Amazon has set up will cut your CPU time to some ridiculously small amount. This is why the job took unusually long on the micro instance.

    A small instance, however, will meet your benchmarking needs, though ideally you should test on the size you plan to use. Unlike with many VPS providers, the amount of CPU time you get is known to be relatively stable across Amazon's instances, so benchmarking on one of them should give a reasonable approximation of how long the job will take on the instances you actually end up using.

    There is no way to know this with any real certainty, however: different machines will have different CPUs and different load levels (Amazon isn't perfect at isolating you from the effects of other users of the same machine), and because of the variable overclocking in modern Xeon processors, they may be running at different clock speeds depending on the temperature they happen to be at.
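
    One way to act on this is to time a few short batches on the instance type you plan to rent and extrapolate from both the typical and the slowest batch. The following is a minimal Python sketch under stated assumptions: `benchmark`, `extrapolate`, and `fake_sample` are hypothetical names, and `fake_sample` is only a stand-in for the real sample generator.

```python
import statistics
import time


def benchmark(generate_sample, batch_size=5, repeats=3):
    """Time `repeats` batches; return seconds per sample for each batch."""
    per_sample = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(batch_size):
            generate_sample()
        per_sample.append((time.perf_counter() - start) / batch_size)
    return per_sample


def extrapolate(per_sample, n_samples, cores):
    """Return (typical, pessimistic) wall-clock hours for the full job."""
    to_hours = n_samples / cores / 3600
    typical = statistics.median(per_sample) * to_hours
    pessimistic = max(per_sample) * to_hours
    return typical, pessimistic


def fake_sample():
    # Stand-in workload (assumption); replace with the real generator.
    sum(i * i for i in range(10_000))


timings = benchmark(fake_sample)
typical_h, worst_h = extrapolate(timings, 1_000_000, 32)
print(f"typical {typical_h:.2f} h, pessimistic {worst_h:.2f} h")
```

    A large gap between the typical and pessimistic figures is a hint that the instance is being throttled or sharing a busy host, in which case the estimate should be treated with more caution.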


  • Related Question

    linux - How to stop adding IP from EC2 to known_hosts for ssh?
  • projectshave

    I start/stop lots of new instances as I'm learning to use Amazon EC2. Every temporary instance is added to the known_hosts file. Is this ever a problem for others who use EC2 a lot?

    I'd like to tell ssh to skip this step anytime I connect to amazonaws.com. Is there a way to do that in the config? I'm using Linux & openssh.


  • Related Answers
  • BillThor

    This is done to prevent man-in-the-middle attacks. Disabling it would disable basic functionality of the ssh tools.

    You may want to keep a copy of your .ssh/known_hosts file without the entries and replace it when you are done.

  • 8088

    Try this:

    ssh -q -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no -i $MYKEY $MYUSERNAME@$MYIP $MYCOMMAND
    

    You can also do this in your config file:

    Host *.amazonaws.com
      User root
      StrictHostKeyChecking no
      UserKnownHostsFile /dev/null
      LogLevel QUIET