InsertionSortCollider SortAxis : does axis choice make a difference to speed?

Asked by Daniel Kiracofe

InsertionSortCollider documentation says: "At the initial step, Bodies’ bounds (along sortAxis) are first std::sort’ed along this (sortAxis) axis, then collided. The initial sort has O(n^2) complexity,". But what is not stated is how to choose the sortAxis (or if it really matters). For example, if I have a box with dimensions 10x100x1000, is it better to pick sortAxis as the biggest dimension, or the smallest, or the one in between? Or are they all about the same? Mostly I'm concerned about the initial sort because of the aforementioned n^2 time, but if one choice is better for the initial sort and a different choice is better for subsequent steps I'd like to know that too.

If it matters, in the above assume particle diameter=1, so 1 million total particles. Large enough that choices like this could have a significant impact on run time.

If nobody has ever tried this, I can go run some experiments and see. But hoping to not reinvent the wheel

Question information

Language:
English Edit question
Status:
Solved
For:
Yade Edit question
Assignee:
No assignee Edit question
Solved by:
Daniel Kiracofe
Solved:
Last query:
Last reply:
Revision history for this message
Bruno Chareyre (bruno-chareyre) said :
#1

Hi, in a scene which is very elongated along Y-axis it could increase performance a little to choose Y as the first sorting axis, indeed.
I don't expect large differences though.
You should check how much cpu time is spent in collider [*], if it's more than 25% come back here and report the timings, there are ways to make it small enough usually.

Bruno

[*]
O.timingEnabled=True
O.run(..., True)
from yade import timing
timing.stats()

Revision history for this message
Daniel Kiracofe (kiracodl) said :
#2

Bruno, thank you for reply. I decided to try a few experiments. I created a highly elongated packing with an aspect ratio of 1:7:49. All particles were spheres with same diameter generated by makeCloud from yade.pack.SpherePack(). There were approximately 60,000 total particles. I used timing.stats() to pull out the time of the initial collider run. I ran both single threaded and with 4 threads to compare, but single threaded actually seemed to be a little faster so I'm reporting that here. Results:

sortAxis = 0 (short axis), time = 11.8 s
sortAxis = 1 (medium axis), time = 1.87 s
sortAxis = 2 (long axis), time = 0.39s

So actually a factor of 30 difference between the different axes. I tried a few variations on this with different aspect ratios and different number of particles, but general conclusion was that the long axis was MUCH faster.

This highly elongated packing is perhaps a little uncommon in the general community, but if anyone else is using something like this, it could be useful information to know.

Revision history for this message
Jérôme Duriez (jduriez) said :
#3

Hi Daniel,

Do you confirm these interesting timing data are for one DEM iteration ? (I'm wondering if these significant differences would be visible on a DEM simulation with many iterations)

Revision history for this message
Daniel Kiracofe (kiracodl) said :
#4

Jérôme, the initial report above was for a single iteration: O.run(1). To answer your question, I did a reset of the timing statistics and then ran for 100 iterations (very small verlet distance so collider was running every iteration) letting the particles settle under gravity. I did not see any statistically significant differences for those next 100 iterations. It is only the first initial sort where there are differences.

Revision history for this message
Janek Kozicki (cosurgi) said :
#5

It probably has something to do with the order in which the spheres were inserted. It is almost sorted for one direction, but not the other.

Revision history for this message
Bruno Chareyre (bruno-chareyre) said :
#6

Hi, thanks for report. It is interesting to see the effect on initSort but it is in fact not the important question.
It occurs only at iteration 1, then never again (also I don't think I parallelized it).
Since you found no difference for the next iterations I guess you confirm that there is little to gain here.
Bruno