Running on a server is slower than on a PC

Asked by Mahdeyeh

Hello,

I use a server and have installed Yade on it, but I find that running on the server is slower than on my PC.

(1) The hardware and configuration of the server:
    CPU: Intel(R) Xeon(R) CPU E7-4850 v4 @ 2.10GHz
    Memory: 32GB
    OS: Ubuntu 18.04 LTS 64-bit
    Yade: 2018.02b
    gcc version: 7.5.0
    libcgal: 4.11-2build1

(2) The hardware and configuration of the PC:
    CPU: Intel(R) Pentium(R) CPU G2030 @ 3.00GHz
    Memory: 4GB
    OS: Ubuntu 18.04 64-bit
    Yade: 2018.02b
    gcc version: 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
    libcgal: 4.11-2build1

I operate the server via AnyDesk.

I run Yade like this:
(1) On the server
yade -j8 test.py

(2) On the PC
yade -j2 test.py

The simulation on the server is about 10% slower than on the PC. According to the "top" command, the server uses only 55-60% of CPU, but my PC uses 97% of CPU.

When I use -j2 on both, the simulation on the server is about 30% slower than on the PC.

With -j12 on the server: 64000 iterations in 1 h 16 min (60% of CPU).
With -j8 on the server: 64000 iterations in 44 min (55% of CPU).
With -j2 on the server: 64000 iterations in 55 min (17% of CPU).
With -j2 on my PC: 64000 iterations in 33 min (97% of CPU).

My model has about 46603 spheres (the spheres are clumped into about 2600 clumps).

Is it possible that some Ubuntu updates, or the versions of some libraries on the server, cause this problem?

Please help me.

Thanks a lot.

For the server, the "lscpu" command gives:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 12
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E7-4850 v4 @ 2.10GHz
Stepping: 1
CPU MHz: 2094.952
BogoMIPS: 4189.90
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 40960K
NUMA node0 CPU(s): 0-11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 invpcid rtm rdseed adx smap xsaveopt arat arch_capabilities

Jan Stránský (honzik) said:
#1

Hello,

thanks for the detailed question statement and explanation. A few details:
- what is "the server"? How much can you influence it (running other jobs, settings, ...)?
- please provide the code, as the parallelization performance is strongly influenced by the code used.
- provide the results of yade.timing [1] (for different -j options); a minimal sketch of how to collect them follows below.
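
For reference, collecting the timing table from within a script looks roughly like this (a minimal sketch based on the yade.timing docs [1]; the 2000-iteration count is an arbitrary choice):

###
# Sketch of the yade.timing workflow (see [1]); O is the usual Yade Omega.
from yade import timing

O.timingEnabled = True   # start collecting per-engine statistics
O.run(2000, True)        # run a fixed number of iterations and wait
timing.stats()           # print the Name / Count / Time / Rel. time table
timing.reset()           # clear the counters before testing another -j value
###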

> server ... 2.10GHz
> PC ... 3.00GHz
> When I use -j2 on both, the simulation on the server is about 30% slower than on the PC.

this seems roughly as expected (2.10 GHz / 3.00 GHz ≈ 0.7, i.e. the server core is about 30% slower per clock).

What about -j1 (or just not specifying -j at all)?
- do you get 100% of CPU?
- what is the PC vs. server difference?

> ... server ... yade -j8 test.py
> PC ... yade -j2 test.py
> The simulation on the server is about 10% slower than on the PC. According to the "top" command, the server uses only 55-60% of CPU, but my PC uses 97% of CPU.

This is likely not a PC vs. server issue (depending on what "the server" is); rather, it is the parallelization itself, see below.

> With -j12 on the server: 64000 iterations in 1 h 16 min (60% of CPU).
> With -j8 on the server: 64000 iterations in 44 min (55% of CPU). With -j2 on the server: 64000 iterations in 55 min (17% of CPU).
> With -j2 on my PC: 64000 iterations in 33 min (97% of CPU).

-j2 using 17% CPU is indeed very suspicious. Are you able to reproduce it? Wasn't it just a coincidence?

Anyway, include also -j1 results for this kind of investigation.

Yade uses OpenMP parallelization, which has some limitations.
(Very roughly) when running in parallel, Yade divides its computation among threads in advance. After each thread is finished, the results are merged together. If one thread finishes earlier than the others, it waits until the last thread is finished. This "waiting" causes the drop of CPU usage below 100% (although the CPU is reserved for Yade, it is "idle" for some time). A toy illustration of this fork-join behaviour follows below.
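
A minimal sketch of the effect in plain Python (my illustration, not Yade internals; the chunk sizes are made up):

###
# Fork-join toy model: the work is split into chunks up front, and the step
# time equals the time of the slowest worker; a worker that finishes early
# just waits, which shows up in "top" as less than 100% CPU.
import time
from concurrent.futures import ProcessPoolExecutor

def work(n):
    t0 = time.time()
    sum(i * i for i in range(n))   # stand-in for per-interaction work
    return time.time() - t0

if __name__ == "__main__":
    chunks = [4_000_000, 1_000_000]          # deliberately unbalanced split
    with ProcessPoolExecutor(max_workers=2) as ex:
        times = list(ex.map(work, chunks))
    print("per-worker times:", times)
    print("step time ~ max :", max(times))   # the fast worker waited too
###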

Some other notes on imperfect scaling (using 2 cores, the time is more than half the 1-core time); a back-of-the-envelope model follows this list:
- dividing / merging the simulation takes some extra time
- some code is not / cannot be parallelized
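
The second point is Amdahl's law (my illustration; the 20% serial fraction is an assumed number, not measured from Yade):

###
# Amdahl's law: a serial fraction s caps the speedup at 1/(s + (1-s)/n).
def speedup(s, n):
    return 1.0 / (s + (1.0 - s) / n)

for n in (1, 2, 4, 8, 12):
    print(f"{n:2d} cores -> {speedup(0.2, n):.2f}x")
# with 20% serial work, 12 cores give only ~3.8x, so extra cores
# quickly stop paying off
###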

Have a look at Alexander Eulitz's benchmark ([2], page 462), especially the quote:
"Surprisingly shorter simulation times are only achieved for a rather small number of cores (depending on simulation setup between 4 and 7)."
I think this is what you are experiencing, right?
Even though it is from 2014, I am not sure there has been any significant code change related to this.
In case of no solution in this thread, you can open a new question aimed at parallelization, benchmarks etc.

> Yade: 2018.02b
> Is that possible, some updates in Ubuntu and some lib or version of them in server cause this problem?

It is worth trying a newer version, just to get a basic idea of whether the parallelization works better or not. E.g. yadedaily can be installed with one "sudo apt install" command.

Also, are you aware of the "mpi" project [3]? I have no personal experience with it; others may give suggestions.

Cheers
Jan

[1] https://yade-dem.org/doc/yade.timing.html
[2] https://yade-dem.org/publi/1stWorkshop/booklet.pdf
[3] https://yade-dem.org/doc/mpy.html

Mahdeyeh (mahdiye.sky) said:
#2

Thanks, Jan, for your reply.
> what is "the server"? How much can you influence it (running other jobs, settings, ...)?
I don't understand what you mean; anyway, I can install anything on this system and see the system monitoring.

> please provide the code, as the parallelization performance is strongly influenced by the code used.
My script has 2 files as input data. The link below contains my script and the clump data files:
https://www.filemail.com/d/fleqwfgjmnduyzq

> provide the results of yade.timing [1] (for different -j options)
For yade test.py (without -j):

Name                      Count   Time          Rel. time
---------------------------------------------------------
ForceResetter             136     359251us      3.05%
InsertionSortCollider     8       447212us      3.80%
InteractionLoop           136     8149884us     69.23%
NewtonIntegrator          136     2815098us     23.91%
"ZSpeed"                  0       0us           0.00%
"VTKview"                 0       0us           0.00%
"PosnTrk1"                0       0us           0.00%
"PosnTrk2"                0       0us           0.00%
"PosnTrk3"                0       0us           0.00%
"PosnTrk4"                0       0us           0.00%
"PosnTrk5"                0       0us           0.00%
"PosnTrk6"                0       0us           0.00%
"PosnTrk7"                0       0us           0.00%
"PosnTrk8"                0       0us           0.00%
"PosnTrk9"                0       0us           0.00%
"PosnTrk10"               0       0us           0.00%
"PosnTrk11"               0       0us           0.00%
"PosnTrk12"               0       0us           0.00%
"PosnTrk13"               0       0us           0.00%
"PosnTrk14"               0       0us           0.00%
"PosnTrk15"               0       0us           0.00%
"PosnTrk16"               0       0us           0.00%
"PosnTrk17"               0       0us           0.00%
"PosnTrk18"               0       0us           0.00%
"PosnTrk19"               0       0us           0.00%
"PosnTrk20"               0       0us           0.00%
"PosnTrk21"               0       0us           0.00%
TOTAL                             11771446us    100.00%

For -j1:

Name                      Count   Time          Rel. time
---------------------------------------------------------
ForceResetter             196     468248us      3.36%
InsertionSortCollider     18      740741us      5.31%
InteractionLoop           195     8981191us     64.41%
NewtonIntegrator          195     3753575us     26.92%
"ZSpeed"                  0       0us           0.00%
"VTKview"                 0       0us           0.00%
"PosnTrk1"                0       0us           0.00%
"PosnTrk2"                0       0us           0.00%
"PosnTrk3"                0       0us           0.00%
"PosnTrk4"                0       0us           0.00%
"PosnTrk5"                0       0us           0.00%
"PosnTrk6"                0       0us           0.00%
"PosnTrk7"                0       0us           0.00%
"PosnTrk8"                0       0us           0.00%
"PosnTrk9"                0       0us           0.00%
"PosnTrk10"               0       0us           0.00%
"PosnTrk11"               0       0us           0.00%
"PosnTrk12"               0       0us           0.00%
"PosnTrk13"               0       0us           0.00%
"PosnTrk14"               0       0us           0.00%
"PosnTrk15"               0       0us           0.00%
"PosnTrk16"               0       0us           0.00%
"PosnTrk17"               0       0us           0.00%
"PosnTrk18"               0       0us           0.00%
"PosnTrk19"               0       0us           0.00%
"PosnTrk20"               0       0us           0.00%
"PosnTrk21"               0       0us           0.00%
TOTAL                             13943757us    100.00%

For -j2:

Name                      Count   Time          Rel. time
---------------------------------------------------------
ForceResetter             273     1004309us     6.68%
InsertionSortCollider     39      1035313us     6.89%
InteractionLoop           272     9090613us     60.49%
NewtonIntegrator          272     3898363us     25.94%
"ZSpeed"                  0       0us           0.00%
"VTKview"                 0       0us           0.00%
"PosnTrk1"                0       0us           0.00%
"PosnTrk2"                0       0us           0.00%
"PosnTrk3"                0       0us           0.00%
"PosnTrk4"                0       0us           0.00%
"PosnTrk5"                0       0us           0.00%
"PosnTrk6"                0       0us           0.00%
"PosnTrk7"                0       0us           0.00%
"PosnTrk8"                0       0us           0.00%
"PosnTrk9"                0       0us           0.00%
"PosnTrk10"               0       0us           0.00%
"PosnTrk11"               0       0us           0.00%
"PosnTrk12"               0       0us           0.00%
"PosnTrk13"               0       0us           0.00%
"PosnTrk14"               0       0us           0.00%
"PosnTrk15"               0       0us           0.00%
"PosnTrk16"               0       0us           0.00%
"PosnTrk17"               0       0us           0.00%
"PosnTrk18"               0       0us           0.00%
"PosnTrk19"               0       0us           0.00%
"PosnTrk20"               0       0us           0.00%
"PosnTrk21"               0       0us           0.00%
TOTAL                             15028600us    100.00%

For -j4:

Name                      Count   Time          Rel. time
---------------------------------------------------------
ForceResetter             2079    12158256us    14.30%
InsertionSortCollider     187     2659616us     3.13%
InteractionLoop           2078    48921982us    57.52%
NewtonIntegrator          2078    21306539us    25.05%
"ZSpeed"                  0       0us           0.00%
"VTKview"                 0       0us           0.00%
"PosnTrk1"                0       0us           0.00%
"PosnTrk2"                0       0us           0.00%
"PosnTrk3"                0       0us           0.00%
"PosnTrk4"                0       0us           0.00%
"PosnTrk5"                0       0us           0.00%
"PosnTrk6"                0       0us           0.00%
"PosnTrk7"                0       0us           0.00%
"PosnTrk8"                0       0us           0.00%
"PosnTrk9"                0       0us           0.00%
"PosnTrk10"               0       0us           0.00%
"PosnTrk11"               0       0us           0.00%
"PosnTrk12"               0       0us           0.00%
"PosnTrk13"               0       0us           0.00%
"PosnTrk14"               0       0us           0.00%
"PosnTrk15"               0       0us           0.00%
"PosnTrk16"               0       0us           0.00%
"PosnTrk17"               0       0us           0.00%
"PosnTrk18"               0       0us           0.00%
"PosnTrk19"               0       0us           0.00%
"PosnTrk20"               0       0us           0.00%
"PosnTrk21"               0       0us           0.00%
TOTAL                             85046394us    100.00%

For -j8:

Name                      Count   Time          Rel. time
---------------------------------------------------------
ForceResetter             144     1685401us     26.70%
InsertionSortCollider     24      235964us      3.74%
InteractionLoop           143     2436675us     38.60%
NewtonIntegrator          143     1955223us     30.97%
"ZSpeed"                  0       0us           0.00%
"VTKview"                 0       0us           0.00%
"PosnTrk1"                0       0us           0.00%
"PosnTrk2"                0       0us           0.00%
"PosnTrk3"                0       0us           0.00%
"PosnTrk4"                0       0us           0.00%
"PosnTrk5"                0       0us           0.00%
"PosnTrk6"                0       0us           0.00%
"PosnTrk7"                0       0us           0.00%
"PosnTrk8"                0       0us           0.00%
"PosnTrk9"                0       0us           0.00%
"PosnTrk10"               0       0us           0.00%
"PosnTrk11"               0       0us           0.00%
"PosnTrk12"               0       0us           0.00%
"PosnTrk13"               0       0us           0.00%
"PosnTrk14"               0       0us           0.00%
"PosnTrk15"               0       0us           0.00%
"PosnTrk16"               0       0us           0.00%
"PosnTrk17"               0       0us           0.00%
"PosnTrk18"               0       0us           0.00%
"PosnTrk19"               0       0us           0.00%
"PosnTrk20"               0       0us           0.00%
"PosnTrk21"               0       0us           0.00%
TOTAL                             6313265us     100.00%

For -j12:

Name                      Count   Time          Rel. time
---------------------------------------------------------
ForceResetter             60      1084666us     29.52%
InsertionSortCollider     10      202631us      5.51%
InteractionLoop           60      1086101us     29.56%
NewtonIntegrator          59      1300806us     35.40%
"ZSpeed"                  0       0us           0.00%
"VTKview"                 0       0us           0.00%
"PosnTrk1"                0       0us           0.00%
"PosnTrk2"                0       0us           0.00%
"PosnTrk3"                0       0us           0.00%
"PosnTrk4"                0       0us           0.00%
"PosnTrk5"                0       0us           0.00%
"PosnTrk6"                0       0us           0.00%
"PosnTrk7"                0       0us           0.00%
"PosnTrk8"                0       0us           0.00%
"PosnTrk9"                0       0us           0.00%
"PosnTrk10"               0       0us           0.00%
"PosnTrk11"               0       0us           0.00%
"PosnTrk12"               0       0us           0.00%
"PosnTrk13"               0       0us           0.00%
"PosnTrk14"               0       0us           0.00%
"PosnTrk15"               0       0us           0.00%
"PosnTrk16"               0       0us           0.00%
"PosnTrk17"               0       0us           0.00%
"PosnTrk18"               0       0us           0.00%
"PosnTrk19"               0       0us           0.00%
"PosnTrk20"               0       0us           0.00%
"PosnTrk21"               0       0us           0.00%
TOTAL                             3674207us     100.00%

> What about -j1 (or just not specifying -j at all)?
> do you get 100% of CPU?

Not specifying -j at all: 12 iterations/s and just 15.7% of CPU
-j1: 14/s and about 15-16% of CPU
-j2: about 15/s and about 24-26% of CPU
-j4: about 22/s and about 38-42% of CPU
-j8: about 23/s and about 55-60% of CPU
-j12: about 16/s and about 58-63% of CPU
In all cases the speed is very low.

Again, thank you for the help.

Jérôme Duriez (jduriez) said:
#3

You mentioned 64000 iterations in the initial question, but your timing data above show other numbers of iterations (as different as 136 and 2079, if one looks at the ForceResetter engine, which usually executes once per DEM iteration).

Are you sure your comparison is meaningfully designed (i.e. applies to a constant number of YADE iterations)?

Mahdeyeh (mahdiye.sky) said:
#4

> You mentioned 64000 iterations in the initial question, but your timing data above show other numbers of iterations (as different as 136 and 2079, if one looks at the ForceResetter engine, which usually executes once per DEM iteration).
The timing data above are for about 2000 iterations; increasing the number of iterations doesn't change the speed of the run or anything about the CPU usage.

> Are you sure your comparison is meaningfully designed (i.e. applies to a constant number of YADE iterations)?
Yes, Jérôme. The run speed and CPU percentage are the same for different iteration counts.

Jan Stránský (honzik) said:
#5

> Not specifying -j at all: 12 iterations/s and just 15.7% of CPU
> -j1: 14/s and about 15-16% of CPU

Not specifying -j at all and -j1 should behave the same (and indeed the results are roughly the same).

What does the "% of CPU" mean? How do you get it?

How much % of CPU does this Yade script use?
###
while True: pass
###
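(Save it as e.g. busy.py, a hypothetical name, and run "yade busy.py": a pure busy loop should show close to 100% of one core in top.)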

>> what is "the server"? How much can you influence it (running other jobs, settings, ...)?
> I don't understand what you mean; anyway, I can install anything on this system and see the system monitoring.

What is "the server"? Some more description:
- where is it located? unknown cloud, your office, your institute lab, ...?
- who has access to it? only you? people you know? anybody? ...?
- is it possible that while you are running jobs, the server is already "full", using all cores for other jobs?
- how do you run the jobs? directly using the command line? using some queuing system? ...?
- do you use a GUI or just the command line?
- ...

> speed is very low: 12/s

"low" and "very low" are very relative. The absolute numbers are what matter.

Please consider Jérôme's comment and make your simulations comparable, i.e. at least let them run the same number of iterations (so that the "Count" column of the engines is the same for each run). Even if "increasing the iterations doesn't change the speed of the run or anything about the CPU usage", it is much more logical to compare the same simulations (or at least as similar as possible).

> O.run()

e.g. O.run(2000,True) if you do not have a stop condition; a sketch of such a comparable run follows below.
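
A minimal sketch (my illustration; 2000 iterations is an arbitrary choice):

###
# O.run(n, True) blocks until exactly n iterations have run, so the wall
# time below is directly comparable across machines and -j settings.
import time

t0 = time.time()
O.run(2000, True)
print("2000 iterations took", time.time() - t0, "s")
###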

Cheers
Jan

Mahdeyeh (mahdiye.sky) said:
#6

> What does the "% of CPU" mean? How do you get it?
With the "top" command in the terminal.

> - where is it located? unknown cloud, your office, your institute lab, ...?
I rented it from another university.

> who has access to it? only you? people you know? anybody? ...?
I access it with a password via AnyDesk. Only I use it.

> Is it possible that while you are running jobs, the server is already "full", using all cores for other jobs?
No; before starting a run, I check it with the "top" command.

> how do you run the jobs? directly using the command line? using some queuing system? ...?
Via AnyDesk I connect to their system and type in the terminal: "yade -j8 test.py"

> do you use a GUI or just command line?
GUI

Jan Stránský (honzik) said:
#7

>> do you use a GUI or just command line?
> GUI

more specifically, do you use a GUI for Yade, i.e. the Controller or the 3D view?
If yes, try completely without the GUI.
What are the results then?
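
(If I remember the command-line options correctly, a completely GUI-less run is something like "yade -n -x test.py": -n disables the GUI and -x exits once the script finishes; please check "yade -h" for the exact flags.)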

Cheers
Jan

Mahdeyeh (mahdiye.sky) said:
#8

than