Optimize computational time for vibrated granular media
Hi everybody!
I'm trying to simulate a mono-disperse vibro-fluidized granular medium in a cylinder with a cone-shaped base.
I'm using the geom tool to build a facetCylinder.
Regarding the interactions, I'm using the Hertz-Mindlin physics.
I've done exactly the same simulation with LAMMPS and run both on the same machine using 4 cores.
Once a stationary state is reached, simulations with 200 or 300 particles (gas-like behaviour) have almost the same computational time in the two codes (YADE is just 20% slower), while in the cases with more than 2000 grains (liquid-like behaviour) LAMMPS is ten times faster!
This is the code I'm using to create the container and the engines. Is there a specific tool to speed up a YADE simulation of a vibro-fluidized granular system in the liquid phase, namely when the particles are in a vibrating box but constantly in contact with each other?
Thanks in advance.
Andrea
_______
O.materials.
O.materials.
coneId=
cyliId=
contenitore=
O.engines=[
	ForceResetter(),
	InsertionSortCollider([...]),
	InteractionLoop(
		[Ig2_Sphere_Sphere_ScGeom(), Ig2_Facet_Sphere_ScGeom()],
		[Ip2_FrictMat_FrictMat_MindlinPhys(...)],
		[Law2_ScGeom_MindlinPhys_Mindlin(...)],
	),
	NewtonIntegrator(...)
]
O.engines = O.engines + [HarmonicMotionEngine(...)]
Question information
- Language: English
- Status: Expired
- For: Yade
- Assignee: No assignee
#1 (Robert Caulk):
Are you running in parallel [1]? LAMMPS is probably automatically using your 4 cores. Yade requires you to pass an argument containing the number of threads you wish to use.
yade -jX script.py
(X is a place holder for your number of cores, not to be taken literally.)
[1]https:/
#2 (Andrea):
Yes, I'm sure; I run the script in this way:
yade -j 4 myscript.py
Then I've checked with the htop command that the code is actually running on four cores.
#3 (Bruno Chareyre):
Very interesting feedback!
It will be difficult to say something without a working script and without specific timings, though.
I thus have questions, only. :)
May I ask:
- "faster": just to be sure, you are speaking of real wall clock time per numerical time iteration, correct? Are timesteps the same?
- "more than 2000", what does it mean? 2500? one million?
- do you have an approximately linear time vs. Nparticles?
- "verletDist=
- would you share more of the data?
- could you report timing.stats() [1]?
- could you show a working script?
Bruno
#4 (Andrea):
Here you find the working code: the comparison I'm talking about is between the cases Nball=300 grains and Nball=2600.
Run the script as:
yade -j 4 -n script.py Nball
I measure the speed of a simulation as the time average of O.speed, or as the slope of the line O.realtime vs O.iter.
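For reference, that slope can be estimated with an ordinary least-squares fit of O.iter against O.realtime. A minimal pure-Python sketch (not part of the original script; sample data below is illustrative):

```python
def iters_per_second(iters, realtimes):
    """Least-squares slope of iteration count vs. wall-clock time,
    i.e. the mean simulation speed in iterations per real second."""
    n = len(iters)
    mx = sum(realtimes) / n
    my = sum(iters) / n
    num = sum((x - mx) * (y - my) for x, y in zip(realtimes, iters))
    den = sum((x - mx) ** 2 for x in realtimes)
    return num / den

# Three samples of (O.realtime, O.iter) on a perfect 1000 it/s run:
print(iters_per_second([0, 10000, 20000], [0.0, 10.0, 20.0]))  # -> 1000.0
```

Feeding it the (O.realtime, O.iter) pairs printed by the PyRunner gives the mean speed directly, without having to pick a single time window by hand.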
The verletDist=2*rball was chosen because after some attempts I found that it makes the 300-ball case faster... but maybe there is a better choice.
This is the timing.stats() output for 50000 steps in the stationary state of the 2600-ball case:

Name              Count    Time         Rel. time
ForceResetter     50000    2226036us    3.09%
"collider"        340      549592us     0.76%
InteractionLoop   50000    57352064us   79.53%
NewtonIntegrator  50000    11860431us   16.45%
"shaker"          50000    121798us     0.17%
PyRunner          5        1367us       0.00%
PyRunner          5        199us        0.00%
TOTAL                      72111491us   100.00%
Thanks!
Andrea
SCRIPT:
+++++++
from yade import pack,ymport,
import numpy as np
import math
import sys
import random
def adder():
if(
sp.makeCloud(
#Useful dimensions
rball=0.002
hcone=6.37*rball
rcont=22.5*rball
rconelow=4*rball
hcil=17.13*rball
Nball=int(
A=0.00025 #amplitude shaker
fr=200 #freq shaker
#Steel
densSteel=8000
ySteel=21e7 #original: e9
poisSteel=0.293
shearModSteel=
#Plexiglass
densPMMA=1190
yPMMA=33e6
poissPMMA=0.37
frictAngle=
#Add material
#plexiglass as lammps
O.materials.
frictionAngle=
#Steel as lammps
O.materials.
frictionAngle=
#rayleigh time
tRay=math.
#container
coneId=
0, 1),0),wallMask=
cyliId=
0, 1),0),wallMask=
contenitore=
nBodyCont=
track=[]
for k in range(nBodyCont
track.
O.engines=[
	ForceResetter(),
	InsertionSortCollider(...),
	InteractionLoop(
		[Ig2_Sphere_Sphere_ScGeom(), Ig2_Facet_Sphere_ScGeom()],
		[Ip2_FrictMat_FrictMat_MindlinPhys(...)],
		[Law2_ScGeom_MindlinPhys_Mindlin(...)],
	),
	NewtonIntegrator(...)
]
O.engines = O.engines + [HarmonicMotionEngine(A=(0,0,A), f=(0,0,fr), label='shaker')]
O.engines = O.engines + [PyRunner(command="print O.iter, O.time, O.speed, O.realtime, kineticEnergy(...", ...),
                         PyRunner(command="adder()", ...)]
O.dt=0.2*tRay
O.run(1000000)
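The tRay lines in the script are truncated. A common Rayleigh-time estimate in DEM is t_R = pi*r*sqrt(rho/G)/(0.1631*nu + 0.8766); a sketch with the steel parameters from the script follows (whether the original tRay expression matched this form exactly is an assumption):

```python
import math

# Steel parameters taken from the script above
rball = 0.002
densSteel = 8000
ySteel = 21e7
poisSteel = 0.293

# Shear modulus from Young's modulus and Poisson's ratio
G = ySteel / (2 * (1 + poisSteel))

# Rayleigh-wave timestep estimate, a standard critical-timestep bound in DEM
tRay = math.pi * rball * math.sqrt(densSteel / G) / (0.1631 * poisSteel + 0.8766)
print(tRay)        # ~6.7e-5 s
print(0.2 * tRay)  # the O.dt=0.2*tRay used in the script, ~1.3e-5 s
```

The softened Young's modulus (21e7 instead of the original 21e9, per the comment in the script) inflates this timestep by a factor of 10, which is a common trick to speed up both codes equally.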
#5 (Bruno Chareyre):
Are you aware that you may not always get the number of spheres you ask for?
WARN /data/trunk/... tries to insert non-overlapping sphere to packing. Only 1133 spheres were added, although you requested 2000.
#6 (Andrea):
Yes, but I call the function adder() (see the code) every 10000 steps until the number of balls is the one I want.
#7 (Bruno Chareyre):
I see.
Well, as I just reduced the radius a little to make them all fit in, I can give you some results with -j4 (I also reduced verletDist, but the speedup is small; script below):
200 spheres: 0.52 sec for 10k iterations
2000 spheres: 5.5 sec for 10k iterations
That's nearly linear.
Unless the cost in LAMMPS is sub-linear (how is that possible?!) I can't imagine how Yade becomes 100x slower. Do you have approximately similar numbers?
Could it be that sphere insertion makes it *very* different?
Bruno
from yade import pack,ymport,
import numpy as np
import math
import sys
import random
def adder():
if(
sp=pack.SpherePack()
sp.makeCloud(
sp.toSimulation()
#Useful dimensions
rball=0.002
hcone=6.37*rball
rcont=22.5*rball
rconelow=4*rball
hcil=17.13*rball
Nball=int(
A=0.00025 #amplitude shaker
fr=200 #freq shaker
#Steel
densSteel=8000
ySteel=21e7 #original: e9
poisSteel=0.293
shearModSteel=
#Plexiglass
densPMMA=1190
yPMMA=33e6
poissPMMA=0.37
frictAngle=
#Add material
#plexiglass as lammps
O.materials.
frictionAngle=
#Steel as lammps
O.materials.
frictionAngle=
#rayleigh time
tRay=math.
#container
coneId=
cyliId=
contenitore=
nBodyCont=
track=[]
for k in range(nBodyCont
track.
O.engines=[
	ForceResetter(),
	InsertionSortCollider(...),
	InteractionLoop(
		[Ig2_Sphere_Sphere_ScGeom(), Ig2_Facet_Sphere_ScGeom()],
		[Ip2_FrictMat_FrictMat_MindlinPhys(...)],
		[Law2_ScGeom_MindlinPhys_Mindlin(...)],
	),
	NewtonIntegrator(...)
]
O.engines = O.engines + [HarmonicMotionEngine(A=(0,0,A), f=(0,0,fr), label='shaker')]
O.engines = O.engines + [PyRunner(command="print O.iter, O.time, O.speed, O.realtime, ...", ...),
                         PyRunner(command="adder()", ...)]
O.dt=0.2*tRay
O.timingEnabled
O.run(20000,1)
timing.reset()
O.timingEnabled
O.run(10000,1)
timing.stats()
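As a sanity check on the timings quoted above (0.52 s and 5.5 s per 10k iterations), the wall-clock cost per particle per iteration is almost constant between the two runs, which is what "nearly linear" means here. A small arithmetic sketch of those figures:

```python
# Bruno's timings from comment #7: N spheres -> seconds per 10k iterations
runs = {200: 0.52, 2000: 5.5}

# Wall-clock cost per particle per iteration
cost = {n: t / 10_000 / n for n, t in runs.items()}
print(cost[200])   # 2.6e-07 s
print(cost[2000])  # 2.75e-07 s

# A ratio close to 1 means near-linear scaling in N
print(cost[2000] / cost[200])  # ~1.06
```

Any strong deviation of this ratio from 1 would point at a super-linear term, e.g. a growing coordination number or collider cost.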
#9 (Bruno Chareyre):
After rolling back to your version with sphere insertion the perfs are still more or less the same (6.7s instead of 5.5s for 2k spheres).
This is measured between iterations 20k and 30k because I lack patience to run 1e6.
Let us know if you find very different results.
Bruno
#10 (Andrea):
In order to have a good comparison we have to gather the statistics once the system has reached the stationary state, after the grains have been poured into the container (from 20k to 30k is still a transient). I give you the results for the mean speed (Nstep/realseconds) between 50k and 500k steps, namely after the pouring, for both cases.
meanSpeed300LAM
meanSpeed2600LA
The numbers that I gave you in the first question were just to give an idea, hoping that someone else had already done this analysis :)
But the situation is almost the same: increasing the number of grains, the ratio between the mean speeds increases in favor of LAMMPS.
In the same time interval I obtain for YADE:
meanSpeed2600YA
while 300/2600=0.12, thus YADE seems not to be linear in this kind of setup.
Andrea
#11 (Bruno Chareyre):
It is difficult for me to understand the meaning of the various values of meanSpeedLAMMPS.
I understand that something is transient physically, but it doesn't seem to matter really in terms of speed. Why do you think so?
For instance 2000 spheres give (3rd column is speed):
20000 0.269857831466 3415.36292082 3.239 0.00195161180227
30000 0.404786747199 1349.67539933 9.901 0.00198843695584
40000 0.539715662932 1214.09029787 17.085 0.00106745354645
50000 0.674644578665 1199.2390826 24.621 0.00122919908665
...
190000 2.56364939893 1258.1333534 131.847 0.00197268858104
200000 2.69857831467 1269.76190104 139.55 0.00190037774033
210000 2.8335072304 1351.34040383 147.32 0.00169256567465
...
440000 5.93687229221 1357.22627666 326.907 0.000706448896021
450000 6.07180120794 1374.03537234 334.778 0.000549054156559
460000 6.20673012367 1322.66851309 342.658 0.000677048337406
The transient aspect of it is not obvious... I don't think it explains our different conclusions.
For 200 spheres:
10000 0.134928915733 85606.0606061 0.457 0.0
20000 0.269857831466 18590.9408551 0.9 0.00123066523288
30000 0.404786747199 16420.3612479 1.486 0.000652081911623
...
190000 2.56364939893 14819.7952972 11.095 0.00067976598448
200000 2.69857831467 16289.4412128 11.704 0.000573059522713
210000 2.8335072304 16424.248529 12.302 0.000804168697981
Still no clear trend after 20k.
The ratio of final speed is ~11 for a ratio in numbers of 10. Still close to linear.
Unless you have a sudden decrease of speed after 800k iterations I can't reproduce or explain your results.
How many cores do you have in total?
Did you try verletDist=0.5*r for 2k spheres? Your timings suggest that the cost of virtual interactions is excessively large and could be reduced by running the collider a bit more often.
Bruno
#12 (Bruno Chareyre):
It is not directly related to computation time, but do you know that right after adder() there is a burst of kinetic energy, because you insert new spheres through the previous ones (randomly overlapping in the same volume)?
I think I realize why this non-linearity appears: you increase the number of particles without changing their size, hence the coordination number is increasing. 2400 spheres is when the box is maximally filled, leading to a non-linear change between 2000 and 2400.
Maybe the LIGGGHTS implementation of Hertz is more efficient, and then this effect is less visible? I also suspect Ig2_Facet_Sphere, which generates and manipulates matrices every time, even for virtual interactions [1].
This is actually a strange case to test scaling, since the number of spheres cannot exceed ~2500 by definition...
Bruno
[1] https:/
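That ~2500-sphere ceiling can be roughly checked from the container dimensions in the script. A sketch; the packing fractions used (~0.3 for random sequential insertion as in makeCloud-style clouds, ~0.64 for random close packing) are assumed typical values, not figures from this thread:

```python
import math

# Container dimensions from the script, in units of the ball radius rball
R = 22.5        # cylinder radius (rcont)
h_cyl = 17.13   # cylinder height (hcil)
h_cone = 6.37   # cone height (hcone)
r_low = 4.0     # lower cone radius (rconelow)

# Cylinder plus truncated-cone base, volumes in units of rball**3
V_cyl = math.pi * R**2 * h_cyl
V_cone = math.pi * h_cone / 3 * (R**2 + R * r_low + r_low**2)
V = V_cyl + V_cone
V_sphere = 4.0 / 3.0 * math.pi  # unit-radius sphere

# Random insertion saturates near phi ~ 0.3; random close packing near 0.64
print(round(0.30 * V / V_sphere))  # ~2240 spheres: close to the ~2500 ceiling
print(round(0.64 * V / V_sphere))  # ~4790 spheres if densified to RCP
```

So the 2400-2600 range is indeed near the saturation limit of random insertion in this geometry, consistent with the coordination-number argument above.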
#13 (Andrea):
Yes, I know, and what I call transient is the initial time when the particles are still not 2600 in my simulation. If you also print the number of particles, you see that I reach 2600 grains after x iterations. So I want to start the comparison between 300 and 2600 after that time. Also, the burst in the kinetic energy only affects the initial part of the simulation.
It is surely a strange case, but it is what I need for my research. I'm studying what happens in this setup when I increase the packing fraction of the system, i.e. when I increase the number of particles (without changing their size) in the same volume. Maybe there is no reason to obtain linearity for "CPU time vs N" in these conditions, because the number and the complexity of the interactions increase in a non-trivial way. Looking at the source code, I've found that the implementation of Hertz-Mindlin is a little bit different in the two codes, and maybe this would be an element to take into account. I will also study Ig2_Facet_Sphere.
In any case, I'm doing a more systematic analysis and I will share results and raw data with you soon.
Thanks a lot for this interesting discussion!
Andrea
#14 (Bruno Chareyre):
Welcome.
That would be interesting for me to see:
N=.. | YadeTime | LammpsTime
200
400
...
2400
I was checking Ig2_Facet_Sphere and there is certainly a way to avoid some matrix manipulation by escaping sooner when there is no contact. However, I'll not hurry on that one, and it would not change your life by orders of magnitude anyway.
Besides, it is not a surprise if the details of the Hertz models differ, and it would be interesting to know the consequences from a physics point of view too.
Looking forward. :)
Bruno
#15 (Andrea):
Sorry for the late reply, but I was away from the office.
This is my analysis of the total time needed to perform the same simulation (both LAMMPS and YADE write to file and insert particles in almost the same way).
#Nball   TotCPU YADE (s)   TotCPU LAMMPS (s)
200      23.1              15.3
400      42.9              25.1
800      85.4              36.9
1200     144.9             44.8
1600     225.7             60.1
2000     416.3             78.0
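A quick log-log fit of the endpoints of these timings quantifies the difference in scaling (a sketch using just the two extreme points, not a full regression):

```python
import math

# Total CPU time (s) for 200 and 2000 balls, from the tables above
yade = {200: 23.1, 2000: 416.3}
lammps = {200: 15.3, 2000: 78.0}

def scaling_exponent(t, n1=200, n2=2000):
    """Effective exponent b in t ~ N**b between the two endpoints."""
    return math.log(t[n2] / t[n1]) / math.log(n2 / n1)

print(round(scaling_exponent(yade), 2))    # ~1.26: super-linear
print(round(scaling_exponent(lammps), 2))  # ~0.71: sub-linear
```

So in this data YADE scales slightly super-linearly while LAMMPS is distinctly sub-linear, which matches the earlier observation that the gap widens as the container approaches its filling limit.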
I give you the YADE script used. Run it as:
time yade -j 4 -n MyScript.py Nball logfile
I use the time command, which prints the total execution time at the end.
Andrea
-------
from yade import pack,ymport,
import numpy as np
import math
import sys
import random
def printator():
neff=
f1.
'+str(O.realtime)+' '+str(kineticEn
def adder():
if(
sp.makeCloud(
#Useful dimensions
rball=0.002
hcone=6.37*rball
rcont=22.5*rball
rconelow=4*rball
hcil=17.13*rball
Nball=int(
A=0.00025 #amplitude shaker
fr=200 #freq shaker
#Steel
densSteel=8000
ySteel=21e7 #original: e9
poisSteel=0.293
shearModSteel=
#Plexiglass
densPMMA=1190
yPMMA=33e6
poissPMMA=0.37
frictAngle=
#Add material
#plexiglass as lammps
O.materials.
frictionAngle=
#Steel as lammps
O.materials.
frictionAngle=
#rayleigh time
tRay=math.
#container
coneId=
cyliId=
rcont,hcil,
wallMask=
contenitore=
nBodyCont=
track=[]
for k in range(nBodyCont
track.
f1=open(
O.engines=[
	ForceResetter(),
	InsertionSortCollider(...),
	InteractionLoop(
		[Ig2_Sphere_Sphere_ScGeom(), Ig2_Facet_Sphere_ScGeom()],
		[Ip2_FrictMat_FrictMat_MindlinPhys(..., betas=0.4)],
		[Law2_ScGeom_MindlinPhys_Mindlin(...)],
	),
	NewtonIntegrator(...)
]
O.engines = O.engines + [HarmonicMotionEngine(A=(0,0,A), f=(0,0,fr), label='shaker')]
O.engines = O.engines + [PyRunner(command="print O.iter, O.time, O.speed, O.realtime, ...", ...),
                         PyRunner(command="adder()", ...),
                         PyRunner(command="printator()", ...)]
O.timingEnabled
O.dt=0.2*tRay
O.run(500000)
O.wait()
exit()
#16:
This question was expired because it remained in the 'Open' state without activity for the last 15 days.