--performance
Hi all,
I'd like to know what this command actually does:
yade-daily --performance
The documentation says it "Starts a test to measure the productivity".
I ran it, and I understand that it runs simulations with an increasing number of particles and reports the time/iter plus an extrapolation of the time needed for a larger number of iterations (1e5). Is this right?
I have 8-core hardware; does this command exploit all the available computational power in the evaluation?
Can I take it as an estimate of Yade's computational performance on my machine?
Thanks in advance
Riccardo
Question information
- Language: English
- Status: Solved
- For: Yade
- Assignee: No assignee
- Solved by: Anton Gladky
#1
Hi Riccardo,
you can start the test with the parameter -j8 to get the
approximate performance of Yade on your machine with 8 cores.
Cheers,
Anton
#2
Thanks Anton Gladky, that solved my question.
#3
Hey guys!
I guess it's good to use this post for the results of the performance tests run on our server.
It has the following configuration:
- 2x Intel Xeon E5-2687W @ 3.1 GHz, each with 8 physical cores
- 128 GB RAM
- 240 GB SSD
- onboard Matrox G200 graphics
After we bought this new system I wanted to see how good its performance really is. Unfortunately it's not as good as it was supposed to be.
http://
The server reaches its maximum performance between 8 and 14 cores. That's kind of strange to me.
Yes, I've heard about the increasing amount of communication needed when using a lot of cores, but such bad performance? Especially in more complex scenarios (i.e. more spheres) the server should profit from its many cores, shouldn't it?
I guess a desktop computer would be as fast as our system ;)
So my questions are:
1) whether the "--performance" check is suitable for multi-core systems
2) how to improve performance?
thanks, Eugen
#4
Thank you very much Eugen, this is useful.
Unfortunately the most interesting curve (the one with the larger number of grains) cannot be seen due to the scaling.
It would be nice to post your numbers.
Also, the efficiency of parallelism is really problem-dependent, so "--performance" should not be taken as the only measurement. You could try with your own simulation scripts, or I can provide other test scripts too.
For your questions:
1) yes, but the number of grains is rather small; you could try problems with 1e6 particles
2) do you plan to work on the code for improvements? Otherwise there is no way from the users' side.
#5
2013/1/8 Eugen Kubowsky <email address hidden>:
>
> So my questions are,
> 1) whether the "--performance" check is suitable for multi core systems
Yes, but initially it was created not to test new systems, but to check
for regressions after some commits.
Bruno is right, you should test the performance on your real tasks.
After that you can determine the optimal number of cores for them.
Yade works fine in batch mode, where you can run
several tasks at the same time, each using this optimal number of cores.
You can also check other DEM codes; maybe they will work
even better in your special case.
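The batch-mode reasoning Anton describes can be sketched in a few lines: instead of one job spread across all cores, run several jobs side by side, each on its per-task optimal core count. All numbers below are hypothetical examples, not measurements from this thread.

```python
def batch_plan(total_cores, cores_per_job):
    """How many jobs can run concurrently, and how many cores stay idle."""
    concurrent = total_cores // cores_per_job
    idle = total_cores - concurrent * cores_per_job
    return concurrent, idle

# Hypothetical: 16 physical cores, and real-task tests showed
# 4 cores per job is optimal -> run 4 jobs at once, 0 cores idle.
jobs, idle = batch_plan(16, 4)
print(jobs, idle)  # -> 4 0
```

In practice this scheduling is what Yade's batch mode is for; the exact `yade-batch` invocation and flags are version-dependent, so check the documentation of your installed version.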
> 2) how to improve performance?
The Yade code is parallelized in most possible places. If you find a more
optimal way to do it, that would be very welcome.
Cheers,
Anton
#6
Hi,
Thanks for these interesting results.
> the server reaches its maximum performance between 8 and 14 cores.
As far as I can see from your picture, the optimal number of cores is 3.
Unfortunately, for the higher numbers of particles the scaling of the y-axis is not well chosen to see the details. Can you provide these results in another picture (for 100k and 200k)?
Thanks in advance.
I also did a calculation speed test comparing Yade and PFC, see here:
https:/
There you will find the script I used. Can you please also do some calculations with this script?
(There you just have to change the first line "num_balls1D = 20" to increase the number of particles ;)
I hope it works with the new Yade version.
Cheers,
Christian.
#7
2013/1/8 Christian Jakob <email address hidden>:
> I hope it works with the new Yade version.
We are trying not to break backward compatibility, so it should work.
Anton
#8
Ok, thanks for all the replies. As I thought, performance is a hot topic here ;)
First of all I'll provide you with detailed graphs. Second, I will give you results for simulations with hyperthreading disabled.
And third, I will do Christian's tests.
So far for now:
detailed graphs
http://
Eugen
#9
Thanks.
You could actually plot everything on the same graph more easily if the y-axis were cycles*Nparticles/time.
A conclusion from these results seems to be that parallelism gives a 3x speedup, obtained with 3-4 cores, and there is no point using more than 4 cores.
This is not really what I concluded from my recent tests, but again: different simulation => different conclusions.
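The suggested y-axis, cycles*Nparticles/time, is just a throughput measure (particle-updates per second). A minimal sketch with made-up numbers shows why it puts runs of very different sizes on one comparable scale:

```python
def throughput(n_particles, n_iters, wall_time_s):
    # cycles * Nparticles / time: particle-updates per second
    return n_particles * n_iters / wall_time_s

# Hypothetical runs: a small and a large packing, equally efficient,
# land on the same point of the y-axis despite very different sizes.
small = throughput(5_000, 12_000, 60.0)
large = throughput(200_000, 300, 60.0)
print(small, large)  # -> 1000000.0 1000000.0
```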
A few things to keep in mind:
- The collider (contact detection) is the main non-parallel task.
- The collider takes a larger part of the total time for larger numbers of particles, and for more dynamic simulations.
- BUT it takes less time if verletDist is increased, at the price of more virtual interactions.
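The verletDist trade-off can be sketched roughly: the collider only has to rerun once some body may have moved by verletDist, so increasing it lets the collider sleep for more steps. The numbers below are hypothetical, and this ignores Yade's exact sweep criterion; it is only the order-of-magnitude reasoning.

```python
def steps_between_collider_runs(verlet_dist, v_max, dt):
    # The fastest body covers at most v_max*dt per step, so the collider
    # can stay idle until it may have travelled verlet_dist.
    return round(verlet_dist / (v_max * dt))

# Doubling verletDist roughly halves how often the collider runs,
# at the price of more "virtual" (not yet touching) interactions.
print(steps_between_collider_runs(0.05, 1.0, 0.001))  # -> 50
print(steps_between_collider_runs(0.10, 1.0, 0.001))  # -> 100
```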
In my recent tests, the collider was taking about 1% of the total time (*), so it did not matter whether the collider was parallel or not. If the collider takes more than that, it can explain why you get the best speed with 3-4 cores while I get it with 8 cores.
In "--performance", the collider's cost goes from 1.8% (5k bodies) to 55% (200k bodies). This is partly because the stats there include the cost of initializing the collider (the cost of the first iteration in any simulation). Including this cost is not really correct: since the number of steps varies as a function of Nparticles, the 1st iteration takes a proportionally larger share with more particles, but only because the total number of iterations is smaller; the result therefore can't be extrapolated in the form of an average time per step.
In the end, there is a clear answer to your question: no, --performance is not good at testing parallelism and/or hardware.
(*) This information is available in the "--performance" output, 2nd line in the table below. If you are currently running tests, it would be good to record such data, as it gives a better understanding of how/why speed is affected by the different factors.
Name                   Count   Time         Rel. time
-----------------------------------------------------
ForceResetter          12000   369078us     0.39%
InsertionSortCollider
InteractionLoop        12000   74036435us   77.56%
NewtonIntegrator       12000   19331902us   20.25%
TOTAL                          95450891us   100.00%
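For reference, the "Rel. time" column is simply each engine's share of TOTAL. Recomputing it from the Time column of the table above (the InsertionSortCollider row is omitted here):

```python
# Per-engine times in microseconds, copied from the table above
times_us = {
    "ForceResetter":    369078,
    "InteractionLoop":  74036435,
    "NewtonIntegrator": 19331902,
}
total_us = 95450891  # TOTAL row

for name, t in times_us.items():
    # reproduces the "Rel. time" column: 0.39%, 77.56%, 20.25%
    print(f"{name:16s} {100.0 * t / total_us:6.2f}%")
```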
#10
I just committed a change in the performance script, removing the 1st iteration from the stats.
#11
Check out the latest version of the performance stats:
http://
It was done before I looked at your suggestion of changing the y-axis.
I simply redirected the console output of --performance into a .log file and copy-pasted it into Excel - so the y-axis is "velocity".
This time I added a comparison of the Yade performance numbers, and I added the results from my desktop system (i5 processor).
I think there was a mistake in one of these charts where I mixed up with and without hyperthreading.
Anyway, I hope it helps.
Eugen
#12
Eugen, when you have a consistent set of results, it would be nice to publish them on Yade's wiki, like Christian did here:
https:/
https:/
https:/
Getting reference data on performance is always interesting, and if your data is only mentioned here in the form of directupload.net links it will be lost very soon.
Even before getting clean results, you can create a draft page and elaborate it progressively with graphs and brief explanatory text. You just need to create a wiki account (send us an email so we don't forget to validate the new account).
Bruno
#13
hi again,
I did a lot of performance tests during the last weeks. I'll post the results in a few days.
@Bruno: you said that InsertionSortCollider (contact detection) is the main non-parallel task.
Is it possible to deactivate this engine? In that case there would be a lot more calculations to be done per timestep, but I think those calculations can be done in parallel. Maybe with enough CPU cores this would get faster than with InsertionSortCollider.
What do you think?
#14
Thank you very much Eugen, this will be interesting.
As for deactivating InsertionSortCollider: that is no real solution, since the contacts would then have to be detected some other, more expensive way.
What would make sense would be to implement another collider that uses parallelism; then it would be possible to choose between different colliders depending on the available cores and the size of the problem.
We have different colliders already, in fact (see ZECollider), but they are all single-threaded, and InsertionSortCollider is the one used by default.
What is the typical cpu time spent in collider in your tests?
#15
Ah ok, I see that deactivating the collider is no solution. But what about this idea:
- increase all bounding boxes so that there will be a lot of pseudo-collisions from InsertionSortCollider, which are then handled by the InteractionLoop, which uses parallelism
Is that possible?
#16
Hi Eugen,
It should be possible, see the InsertionSortCollider documentation and sources:
https:/
namely
https:/
https:/
Jan
#17
> increase all bounding boxes so that there will be a lot of pseudo-collisions
It is not only possible, it is done by default.
If you inspect the timing stats you should see that the collider is not running at each step.
#18
Well, I proudly present the latest results of Christian Jakob's tests with additional Yade timing stats.
[can you please activate my yade-wiki account, so that I can put them in there?]
Here is a first screenshot:
http://
On the x-axis of the stacked bar charts you'll find the "count of collider runs vs. iterations done".
For example:
-> with one core and 1000 particles, the collider runs 17 times out of 1100 iterations
I hope this will help improve Yade and expose why Yade won't benefit from a lot of cores.
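The example figure above (17 collider runs in 1100 iterations) can be turned into two handier numbers, the fraction of steps on which the collider actually runs and the average sleep interval between runs:

```python
collider_runs, iterations = 17, 1100  # numbers from the example above

fraction = collider_runs / iterations        # ~1.5% of steps
steps_per_run = iterations / collider_runs   # ~65 steps between runs
print(f"collider active on {100 * fraction:.1f}% of steps, "
      f"roughly one run every {steps_per_run:.0f} steps")
```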
#19
Thank you for unlocking my wiki account. But unfortunately I can't submit any changes. If I click on 'save page' this error occurs: "connection to server was reset while the site was loading".