run with Yadedaily
Hello All,
I am using yadedaily and I want to use parallel computation. I typed "yadedaily -j8 test.py", but Yade used only one CPU (I checked in the System Monitor on Ubuntu 18). My yadedaily version is 20200213-
Could you please help me if you have some ideas?
Thank you very much.
Best Regards,
Alireza
Question information
- Language: English
- Status: Solved
- For: Yade
- Assignee: No assignee
- Solved by: Alireza Sadeghi
#1
Hi, it depends a little on "test.py" (on its content)... therefore, without seeing it, I have no idea.
Bruno
#2
Yes Bruno, I was going to predict that this warning:
"The number of bodies is probably too small for allowing more threads, or the geometry is flat. The contact detection should succeed but not all available threads are used."
is being printed in his terminal. Am I right?
#3
Hello,
Thank you very much for your answers. I want to simulate a PeriTriaxController test with 12485 particles, using clumps instead of spheres. Yes, you are right: when I used 500 particles in the same code, I can run it in parallel.
What should I do now?
Thank you very much.
Best Regards,
Alireza
P.S. my code is:
from yade import utils, plot
from yade import pack, qt
from datetime import datetime
qtr=qt.Renderer()
qtr.bgColor=(1,1,1)
#======
#======
#======
O.periodic=True
O.cell.
#======
#================= define the materials =======
#======
O.materials.
O.materials.
O.materials.
density=1523.6, poisson=0.3, frictionAngle= 0.28, fragile=False, label='wall'))
#======
#======
#======
# 47 clump templates follow, one block per template (radz1/poz1/template1
# ... radz47/poz47/template47): a list of sphere radii, a list of sphere
# positions, and a clumpTemplate appended to the template list.
# (The numeric values were cut off in the original post.)
#======
#======
#======
# Size classes (coke), placement counts/flags (nums), the clump template
# used for each class (temps: template20, template32, template46,
# template17, template1, ...), and material labels (mats: 'aggregate-...').
# (The list contents were cut off in the original post.)
for i in range(len(nums)):
nums[
nums[
O.bodies.
O.bodies.
for x in range(len(
if (O.bodies[x]):
if isinstance(
if O.bodies[
O.bodies[
else:
O.bodies[
else:
O.bodies[
qtr=qt.Renderer()
qtr.bgColor=(1,1,1)
#======
#======
#======
sigmaIso=-1e5
O.engines=[
ForceResetter(),
InsertionSortCollider(
InteractionLoop(
PeriTriaxController(
goal=
dynCell=
maxUnbalanced
doneHook=
),
NewtonIntegrator(
PyRunner(
]
O.dt=5e-7
def history():
plot.addData(
sxx=-
exx=-
print (-triax.
def compactionFinished():
O.cell.
triax.
triax.stressMask=3
triax.
triax.
triax.
dataOld = plot.data
plot.saveDataTxt(
plot.data = {}
plot.plot()
#plot.plot()
def triaxFinished():
dataOld = plot.data
plot.saveDataTxt(
print ('Finished')
O.pause()
#4
>> Yes, you are right, when I used 500 particles in the same code, I can run it in parallel.
What should I do now?
I am sorry, I do not understand what you are asking me. Please rephrase the question.
#5
Hello Robert,
Thank you very much. How can I fix my problem? I mean, how can I use parallel computation for this simulation? Is it not possible to run this code in parallel?
Thanks a lot.
Best Regards,
Alireza
#6
If this is the warning(?):
"The number of bodies is probably too small for allowing more threads, or the geometry is flat. The contact detection should succeed but not all available threads are used."
Then you cannot run it in parallel without revising the source code.
#7
No, I get no warning. In addition, I can run in parallel when I use 500 particles. Now I have more than 12,000 particles, so the number of bodies is not too small for parallel calculation.
Thank you.
Alireza
#8
Please run your larger simulation on one core, use the yade.timing module, and post the output of yade.timing.stats().
Maybe most of the time is spent in something that is not parallelized, which would make the simulation appear not to run in parallel.
cheers
Jan
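(For reference, Jan's suggestion can be sketched as follows; `O.timingEnabled`, `O.run` and `yade.timing` are Yade's stock timing interface, but the snippet is only a minimal example and must be run inside a yadedaily session, not plain Python.)

```python
# Run inside yadedaily (e.g. `yadedaily -j1 test.py`), not in plain Python.
O.timingEnabled = True    # collect per-engine timings from now on
O.run(1000, True)         # run the large simulation for a while on one core
from yade import timing
timing.stats()            # print the per-engine timing table
# timing.reset() clears the counters if a fresh measurement is needed
```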
#9
Dear Jan,
Thank you very much for your response. I applied your comments and I have this in my terminal:
ForceResetter           49       54037.627us     0.02%
InsertionSortCollider
InteractionLoop         49     5101713.404us     1.82%
"triax"                 49      647611.067us     0.23%
NewtonIntegrator        49     1206346.451us     0.43%
  forces sync           49      126156.097us    10.46%
  motion integration    49     1080081.651us    89.53%
  sync max vel          49          25.183us     0.00%
  terminate             49          17.333us     0.00%
  TOTAL                196     1206280.264us    99.99%
"recorder"              47     3182985.26us      1.14%
TOTAL                        280157829.486us   100.00%
In addition, when I used 8 CPUs for the larger simulation, after two hours the program used one CPU at 100% and the other seven CPUs at almost 50%. So although the CPUs did not participate in the simulation equally, they do have an effect on the speed.
Do you know how I can make the calculation faster with my code?
Thank you again.
Best Regards,
Alireza
#10
> InsertionSortCollider
seems a bit suspicious, but currently I am not able to dig deeper into the problem.
cheers
Jan
#11
Bruno will know for sure, but it seems possible that the sorting is unable to use parallelization because a bound moves outside of a single thread's domain (if I understand correctly) [1]. Maybe this has something to do with the use of clumps+periodic.
Could you tell us if you observe the same performance issue when you use:
-unclumped particles + periodic triax
-clumped particles + nonperiodic triax
-unclumped particles + non periodic triax
?
Cheers,
Robert
https:/
#12
Dear Robert,
You are absolutely right. Sure, I will do those simulations within the next three days. Right now I am running a similar code (1000 particles + clumps + periodic boundary) in which all the particles are the same size (with different templates, but the same overall size). This code runs in parallel very well.
The problem arises when I add small particles to the aforementioned mixture.
Best Regards,
Alireza
#13
Hi Alireza,
I had a similar problem with my code. When I defined my sample with my desired dimensions (for the sample and the grains), I got a long run time and, similarly, about 90% of the run time and CPU usage was dedicated to "InsertionSortCollider".
Bests,
Ehsan
#14
Hi Ehsan,
Thank you for your reply. My problem is: when I run a simulation with 12,000 particles on 8 CPUs, the software uses just one CPU and the other seven do not participate in the simulation, so it takes a long time. But when I use, for example, 1,000 particles, all of my CPUs work at 100%. Do you have a similar problem?
Best Regards,
Alireza
#15
For the record, OpenMP parallel execution of some parts of InsertionSortCollider is disabled in periodic simulations.
See this block [*], which is unable to call insertionSortParallel (because the latter is not designed for periodic simulations [**]).
Adding to the confusion of the discussion, OpenMP parallel execution of the same part of InsertionSortCollider is also currently disabled.
But this should be temporary.
Note that the number of particles relative to 1000 also plays a role [*], and that YADE now offers MPI-based (instead of OpenMP) parallel computing.
[*] https:/
[**] https:/
#17
Hi,
I checked your script.
The problem is that the automatic verletDist is based on the minimum particle size.
As it turns out, that does not lead to a wise choice in your highly heterogeneous case (the extra length is much too small), and consequently collision detection occurs much too often.
You need to increase verletDist so that collision detection happens about once every 100 iterations or less. [1]
In addition, with 140k spheres in periodic conditions the initial sort (iteration 1) is slow. It could probably be improved, but that cost remains small compared to an entire simulation (it does suggest running 1 iteration before enabling timing, though, in order to get consistent measurements).
I hope it helps.
Bruno
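(To illustrate Bruno's point with a back-of-the-envelope sketch: the collider must re-run roughly once the fastest particle has travelled verletDist, so the interval between collision detections is about verletDist/(v_max*dt), and reaching ~1/100 iterations needs verletDist >= 100*v_max*dt. The model and the velocity value below are my assumptions, not Yade's exact criterion.)

```python
# Back-of-the-envelope sketch (assumed model, not Yade's exact criterion):
# the collider re-runs roughly once the fastest particle has travelled
# verletDist, i.e. about every verletDist / (v_max * dt) iterations.

def collider_interval(verlet_dist, v_max, dt):
    """Approximate iterations between two collision detections."""
    return verlet_dist / (v_max * dt)

def verlet_dist_for_interval(target_steps, v_max, dt):
    """Smallest verletDist giving ~target_steps iterations between runs."""
    return target_steps * v_max * dt

dt = 5e-7      # time step from the script above
v_max = 0.1    # assumed peak particle speed in m/s (hypothetical value)

# A verletDist tied to the smallest grains re-runs the collider almost
# every iteration:
print(collider_interval(5e-8, v_max, dt))

# Targeting ~1 collision detection per 100 iterations, as Bruno suggests:
print(verlet_dist_for_interval(100, v_max, dt))
```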