linux - How to estimate compute time on EC2?

2013-08-09

Patrick McCarthy

I need to generate data samples for my thesis, using some R/C++ code I've written. It can be embarrassingly parallel, and I've generalized it to run on local multicore machines without much trouble.

On my Core2Duo it takes about 8 seconds to generate one sample, and all samples take roughly the same time. Ideally I'd like millions or tens of millions, so I was thinking about throwing it on EC2 for a few hours. Assuming one of their cores is comparable in performance to one of my C2D's, 1M samples should take about 2,200 core-hours, give or take, or ~70 hours on a 32-core machine.
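The arithmetic behind those numbers, as a minimal sketch: it only restates the question's own figures (8 s per sample, 1M samples, 32 cores) and the same assumption that one EC2 core matches one Core2Duo core. The function names are illustrative, not part of the original code.

```python
# Back-of-the-envelope estimate using the figures from the question.
# Assumption (as in the question): one EC2 core is about as fast as one Core2Duo core.
SECONDS_PER_SAMPLE = 8.0   # measured on the Core2Duo, per sample, per core
N_SAMPLES = 1_000_000      # target number of samples


def core_hours(n_samples: int, sec_per_sample: float) -> float:
    """Total CPU time, in core-hours, for an embarrassingly parallel job."""
    return n_samples * sec_per_sample / 3600.0


def wall_hours(n_samples: int, sec_per_sample: float, cores: int) -> float:
    """Wall-clock hours if the samples are split evenly across `cores`."""
    return core_hours(n_samples, sec_per_sample) / cores


print(f"{core_hours(N_SAMPLES, SECONDS_PER_SAMPLE):.0f} core-hours")         # 2222 core-hours
print(f"{wall_hours(N_SAMPLES, SECONDS_PER_SAMPLE, 32):.1f} h on 32 cores")  # 69.4 h on 32 cores
```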

I want to estimate how long this will take with reasonable confidence, so I thought I'd start a free micro instance, run some tests, and assume the results carry over to the larger, costlier machines. However, the job I submitted (a for loop generating 100 samples, 50 different times) should have taken under 12 hours, and it is now in its 28th hour. This suggests that either the cores are much slower than I expected, or my jobs are low priority and I'm getting uneven performance.
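A quick check of the micro-instance numbers, assuming the loop ran sequentially on the instance's single core. Because the job has not finished after 28 hours, the slowdown computed here is only a lower bound.

```python
# Sanity-check the micro-instance test described in the question.
SAMPLES = 100 * 50        # 50 runs of a 100-sample loop
SEC_PER_SAMPLE = 8.0      # Core2Duo timing from the question
ELAPSED_HOURS = 28.0      # wall-clock time so far; the job is still running

expected_hours = SAMPLES * SEC_PER_SAMPLE / 3600.0

# The job is unfinished, so both figures below are lower bounds on the real slowdown.
min_sec_per_sample = ELAPSED_HOURS * 3600.0 / SAMPLES
min_slowdown = min_sec_per_sample / SEC_PER_SAMPLE

print(f"expected ~{expected_hours:.1f} h")  # expected ~11.1 h
print(f">= {min_sec_per_sample:.1f} s/sample, >= {min_slowdown:.1f}x slower")  # >= 20.2 s/sample, >= 2.5x slower
```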

Say I'm interested in renting one to three 32-core machines for a day or two. How could I estimate how long this might take?

Answers

wingedsubmariner

The micro instances are purposefully crippled. They are intended only for intermittent CPU load, and if you try to load the CPU continuously, the hypervisor Amazon has set up will cut your CPU time to some ridiculously small amount. This is why your job took so long on the micro instance.
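One way to see this throttling for yourself, assuming a Linux guest: time withheld by the hypervisor shows up as the "steal" counter in `/proc/stat` (and in the `st` column of `top`). The sketch below samples that counter twice and reports the stolen share; the function names are hypothetical helpers, not an AWS tool.

```python
import time


def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (steal, total) jiffies from the aggregate 'cpu' line of /proc/stat.

    Field order after the 'cpu' label: user nice system idle iowait irq softirq steal ...
    guest/guest_nice are skipped because the kernel already counts them in user/nice.
    """
    values = [int(v) for v in line.split()[1:9]]
    steal = values[7] if len(values) > 7 else 0  # very old kernels lack the steal field
    return steal, sum(values)


def steal_fraction(interval: float = 5.0, path: str = "/proc/stat") -> float:
    """Share of CPU time taken by the hypervisor over `interval` seconds (Linux only)."""
    def sample() -> tuple[int, int]:
        with open(path) as f:
            return parse_cpu_line(f.readline())

    steal1, total1 = sample()
    time.sleep(interval)
    steal2, total2 = sample()
    elapsed = total2 - total1
    return (steal2 - steal1) / elapsed if elapsed else 0.0
```

Calling `print(f"{steal_fraction():.0%}")` while the sampler is running would be expected to show a large steal share on a throttled micro instance and something near zero on a dedicated-CPU instance.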

A small instance, however, will meet your benchmarking needs, though ideally you should test on the size you plan to use. Unlike at many VPS providers, the amount of CPU time you get on Amazon's instances is known to be relatively stable, so benchmarking on one of them should give a reasonable approximation of how long the job will take on the instances you actually end up using.
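A minimal benchmarking sketch along those lines: time a handful of samples on the candidate instance and keep both the mean and the spread, since the spread is what tells you how much confidence to put in the extrapolation. `generate_sample` is a hypothetical stand-in for whatever produces one sample, for example a small wrapper that runs the R/C++ code once through `subprocess.run`.

```python
import statistics
import time
from typing import Callable, Tuple


def benchmark(generate_sample: Callable[[], object], n_runs: int = 20) -> Tuple[float, float]:
    """Time n_runs calls of generate_sample(); return (mean, stdev) in seconds per sample."""
    if n_runs < 2:
        raise ValueError("need at least two runs to estimate the spread")
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate_sample()  # hypothetical: one call produces one sample
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)
```

Running one copy per vCPU at the same time is probably the more honest test, because that is the load the real job puts on the machine.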

There is no way to know this with any real certainty, however, because different machines will have different CPUs and different load levels (Amazon isn't perfect at isolating you from other users of the same machine), and, because of the variable overclocking (Turbo Boost) in modern Xeon processors, they may run at different clock speeds depending on the temperature they happen to be at.
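Given that uncertainty, one hedged way to answer the "1-3 machines for a day or two" question is to plan around the slowest per-sample time you observe across several benchmark instances and pad it with a safety margin. The 25% margin below is an arbitrary assumption, not an Amazon figure, and the 8 s input is just the question's Core2Duo number used as a placeholder.

```python
import math


def projected_hours(n_samples: int, sec_per_sample: float, machines: int,
                    cores_per_machine: int = 32, margin: float = 1.25) -> float:
    """Wall-clock hours for n_samples spread over machines * cores_per_machine cores.

    `margin` is an assumed safety factor for CPU, load and clock-speed differences
    between instances; choose it from the spread in your own benchmarks.
    """
    total_cores = machines * cores_per_machine
    per_core = math.ceil(n_samples / total_cores)  # the busiest core sets the finish time
    return per_core * sec_per_sample * margin / 3600.0


# Placeholder: the question's 8 s/sample; replace with timings from several test instances.
measured = [8.0]
worst = max(measured)  # plan around the slowest instance seen
for machines in (1, 2, 3):
    hours = projected_hours(1_000_000, worst, machines)
    print(f"{machines} machine(s): ~{hours:.0f} h")
```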
