Need suggestions on the computer server for YADE computation.

Asked by xuelong14 on 2017-09-01

I would like to buy a computer server to run YADE.
Now I have 2 choices: one is to buy a server of more cores but lower base frequency; the other is to buy a server of less cores but higher base frequency.
I want to run many YADE programs on this server at the same time. So which server is the better choice?
Thank you very much.

Christian Jakob (jakob-ifgt) said : #1


I would definitely prefer more single-CPU power over a larger number of cores, because DEM is hard to parallelize (in general, not just in YADE).



Jérôme Duriez (jduriez) said : #2


Out of curiosity regarding Christian's answer: is this really about parallelization (or the lack of it), if the question is about running different YADE "programs" (sessions?) at the same time?


xuelong14 (xuelong03) said : #3

Actually, it is not about parallelization. In my simulations I usually use -j8 to achieve parallelization. But if too many YADE programs with -j8 run at the same time, the calculation efficiency still decreases.

So my question can be described, for example, as follows:
if I run 10 YADE programs at the same time on one server (12 cores at a base frequency of 3.0 GHz), the calculation speed decreases a lot compared with running only 5 programs at the same time. If I instead use the other server (16 cores at a base frequency of 2.6 GHz), will the calculation efficiency improve a lot?

Thank you very much.

Gary Pekmezi (gpekmezi) said : #4

I have an alternative suggestion you may not have thought of.

Instead of using the money to buy a server, you could use it to purchase time on a cloud-based service. I will use Amazon's EC2 as an example, but I'm sure other IaaS (Infrastructure as a Service) providers will work fine. You could, for instance, purchase time on several 36-core Ubuntu-based compute nodes and then run nine 8-thread jobs on each node using hyper-threading.

Jan Stránský (honzik) said : #5


It depends, as usual, on many factors. If you have independent simulations, the most efficient approach from a CPU point of view is to run as many simulations simultaneously as you have cores, each one on a single core (-j 1). But each simulation uses a certain amount of RAM, and that might be a problem with this approach. What RAM capacity do the machines have?

If RAM is the limiting factor, you can use -j 2, -j 3 or -j 4 (usually) without a very significant performance penalty. In that case, I would decide according to the "total power" nCores*frequency.
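The "many independent single-core jobs" approach above can be sketched in a few lines of Python. This is only an illustration, not part of Yade: `run_batch` is a hypothetical helper, and the default `cmd` assumes a `yade` executable on your PATH with the usual `-j` (threads) and `-x` (exit when done) options; adjust it for your install.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_batch(scripts, max_parallel, cmd=("yade", "-j", "1", "-x")):
    """Run each script as its own process, at most `max_parallel` at a time.

    `cmd` is the command prefix each script name is appended to; the
    default assumes a `yade` binary on PATH (an assumption - change it
    to match your installation)."""
    def run_one(script):
        # One OS process per simulation; blocks until it finishes.
        return subprocess.run(list(cmd) + [script]).returncode
    # Threads here only supervise subprocesses, so a ThreadPoolExecutor
    # is enough to cap how many simulations run concurrently.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, scripts))

# e.g. run_batch(["sim1.py", "sim2.py", "sim3.py"], max_parallel=12)
```

With `max_parallel` set to the number of physical cores and `-j 1` per job, each simulation gets a core to itself, which matches the advice above (RAM permitting).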

In this specific case, I don't think 12 vs. 16 cores makes much difference for running Yade (using up to -j 4), and the "total power" values, 12*3.0 = 36 and 16*2.6 = 41.6, are not too different either.

So my summary is that, according to the given data, the two computers are pretty comparable.
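For what it's worth, the "total power" comparison of the two candidate servers is a two-liner:

```python
# Rough "total power" heuristic (nCores * base frequency) for the
# two servers from the question.
servers = {"12-core @ 3.0 GHz": 12 * 3.0, "16-core @ 2.6 GHz": 16 * 2.6}
for name, power in servers.items():
    print(f"{name}: {power:.1f} core*GHz")
# -> 36.0 vs 41.6 core*GHz: about a 15% difference either way.
```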


Robert Caulk (rcaulk) said : #6

This question was scientifically approached here [1] and here [2]:
"Which kind of hardware is right? A lot of cores at low frequency (AMD) or some cores with higher frequency?"

Some key take-aways:
"Results indicate again that there is an optimum number of threads. It might be best to try a couple of configurations first since it will certainly be problem dependent."
"Performance is … depending on what you simulate"
"Serial collider took a lot of time with large amount of particles -> New collider is doing way better. Especially without Hyperthreading."

As always, the optimal computer configuration depends entirely on what types of simulations you need to run, how long you need to run them, the types of analyses you'd like to do, etc. Maybe you want to run thousands of small simulations for your analysis? Or maybe you want to run one massive simulation for a long period of time? As Gary mentions, you might try using Amazon EC2 to test out some configurations [3] (shameless plug for my AEC2 guide). However, Gary also suggests hyper-threading, which is generally not recommended for scientific computing applications.

This brings me to another point: I also noticed you are passing -j8 and hoping to run 10 such jobs on a 12- or 16-core machine? IINM, you should avoid this, since multiple jobs will compete for time on the same cores, leading to unnecessary swapping in and out of the cache.
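To put a number on that oversubscription, here is the arithmetic for the setup described in the question:

```python
# 10 simultaneous jobs, each requesting 8 OpenMP threads,
# on the 12-core candidate server from the question.
jobs, threads_per_job, cores = 10, 8, 12
demanded = jobs * threads_per_job
print(f"{demanded} threads on {cores} cores "
      f"-> {demanded / cores:.1f}x oversubscribed")
# Even the 16-core server would be oversubscribed 5x.
```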

In my experience, it is equally important to consider the processor's cache size in addition to the speed of the cores. A larger cache reduces the number of expensive RAM accesses, especially for very large simulations.

