# Why is my iteration speed so low?

Hi, I am trying to simulate particle packing behavior in a rotating drum. However, the iteration speed is very low, about 30 iterations per second. At the beginning, when the particles are far from each other, the iteration speed is about 500 iter/s, regardless of whether I run with -j2 or without any -j parameter.

## Below is my code,

import math
import numpy as np
### the units in the simulation are SI: kg, m, s, Pa
### the particles have diameters below 100 microns

muS = 0.57
muF = 0.9
FricAngleS = math.atan(muS)
FricAngleF = math.atan(muF)

density = 8150
yongmodu = 195e5 ## scaled by 1e-4, normal value is 195 GPa
poisn = 0.3
Dsph = 80e-6 ## particle max diameter
angvelsw = -25 ## rotation speed rad/s
nrot = 6 ## number of rotations

matSph = CohFrictMat(density=density, young=yongmodu, poisson=poisn, frictionAngle=FricAngleS, momentRotationLaw=True)
SMat = O.materials.append(matSph)
matFacet = CohFrictMat(density=density, young=yongmodu, poisson=poisn, frictionAngle=FricAngleF, momentRotationLaw=True)
FMat = O.materials.append(matFacet)

### geometry ###
lengthSweeper = 12*Dsph

## spheres ##
O.bodies.clear()
sp = pack.SpherePack()
x1, y1, z1 = -radiusSweeper, 0, 0
sp.makeCloud((x1, y1, z1), (x2, y2, z2), psdSizes=[0.010e-3,0.024e-3,0.035e-3,0.051e-3,0.085e-3], psdCumm=[0,0.1,0.5,0.9,1.0])
sp.toSimulation(material=SMat)

## drum ##
Sweeper=[]
for i in np.linspace(0, 2*pi, num=numSweeperParts, endpoint=True):

SweeperP=[Sweeper, [p+Vector3(0.0,lengthSweeper,0.0) for p in Sweeper]]
SweeperPoly = pack.sweptPolylines2gtsSurface(SweeperP, threshold=1e-7)
sweeperid = O.bodies.append(pack.gtsSurface2Facets(SweeperPoly, wire=False, material=FMat))

## remove the spheres out of the drum ##
for eb in O.bodies:
if isinstance(eb.shape, Sphere):
xi,yi,zi = eb.state.pos
O.bodies.erase(eb.id)
## in total there are 6691 spheres and 2020 facets

### define engines functions ###
t1 = 0.1
t2 = t1+nrot*(2*np.pi/abs(angvelsw))
t3 = t2+0.1
def change_motion():
    if O.time > t1:
        rotaEngineSw.angularVelocity = angvelsw
    if O.time > t2:
        rotaEngineSw.angularVelocity = 0.0

O.engines=[
    ForceResetter(),
    InsertionSortCollider([Bo1_Sphere_Aabb(),Bo1_Facet_Aabb()]),
    InteractionLoop(
        [Ig2_Sphere_Sphere_ScGeom6D(),Ig2_Facet_Sphere_ScGeom6D()],
        [Ip2_CohFrictMat_CohFrictMat_CohFrictPhys()],
        [Law2_ScGeom6D_CohFrictPhys_CohesionMoment()]
    ),
    PyRunner(iterPeriod=int(1.0e6),command='change_motion()'),
    NewtonIntegrator(damping=0.75, exactAsphericalRot=True, gravity=(0,0,-9.81)),
]

O.dt = 0.85*utils.PWaveTimeStep()
O.run()

## the hardware ##
Ubuntu 18.04 on VMware 14.0 with 8 processors and 32 GB memory
Host system: Win10-64
Processor: AMD Ryzen Threadripper 1950X 16-Core Processor, 3.40 GHz
Yade version: 2018.02b

## 4k 20k 40k spherical particles with yade -j 1 *.py
## 4k 20k 40k spherical particles with yade -j 2 *.py

Any help is much appreciated.
Xuesong

## Question information

Language: English
Last query: 2019-08-21
Last reply: 2019-08-23
 Jérôme Duriez (jduriez) said on 2019-08-21: #1

Hi,

The change of speed that you observe (from 500 iter/s in the beginning to 30 iter/s) happens because there are no interactions in the beginning, hence no interactions to process, hence fewer computations for YADE to do.
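
To make this concrete, here is a minimal sketch in plain Python (it does not use YADE). It counts overlapping sphere pairs by brute force, first for spheres placed far apart and then for the same number of spheres placed close together. The helper names `grid_centers` and `count_contacts`, the regular grid and the spacings are illustrative assumptions, not part of the script in the question; only the radius is taken from `Dsph`.

```python
import itertools

def grid_centers(n_per_side, spacing):
    """Sphere centers on a regular n x n x n grid with the given spacing."""
    steps = range(n_per_side)
    return [(i * spacing, j * spacing, k * spacing)
            for i in steps for j in steps for k in steps]

def count_contacts(centers, radius):
    """Count sphere pairs whose centers are closer than two radii (brute force)."""
    limit_sq = (2.0 * radius) ** 2
    contacts = 0
    for (x1, y1, z1), (x2, y2, z2) in itertools.combinations(centers, 2):
        dist_sq = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        if dist_sq < limit_sq:
            contacts += 1
    return contacts

radius = 40e-6  # half of Dsph = 80e-6 from the script in the question

# Hypothetical layouts: 125 spheres far apart, then 125 spheres slightly overlapping.
loose = count_contacts(grid_centers(5, 4.0 * radius), radius)
dense = count_contacts(grid_centers(5, 1.9 * radius), radius)

print("loose cloud contacts:", loose)    # 0
print("dense packing contacts:", dense)  # 300
```

With zero contacts there is nothing for the interaction loop to process; once the spheres settle against each other and the drum, every step has to handle all of those contacts.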

As for the rest, I do not think there can be any other answer to your question than
"because the world is finite, and so are the capabilities of your hardware. That's too bad..." ;-)

 Robert Caulk (rcaulk) said on 2019-08-21: #2

What is the total number of particles in this simulation?

 gaoxuesong (260582472-9) said on 2019-08-21: #3

Hi Duriez, you are right, I know the capability is limited. However, I posted the performance test results of Yade on my computer because I want to make sure the slow speed is not caused by my hardware or by the way I run Yade. Could someone try my case on their own computer to see what is going on?
Thanks.

 gaoxuesong (260582472-9) said on 2019-08-21: #4

Hi Robert, I have spheres and facets in my model: 6691 spheres and 2020 facets in total. With this number of bodies, is it necessary to use multiple cores, like -j 2 or more?
Thanks.

 Robert Caulk (rcaulk) said on 2019-08-22: #5

>is it necessary to use multiple cores, like -j 2 or more?

The more cores you use, the faster your simulation will run. Why not run -j8 if you have 8?

I ran your simulation with -j8 and I get 550 iter/sec at iteration 350000.

Some of your performance results don't make sense: your links are mixed (some j2s in the j1 and j1s in the j2), your image labeled 40_ j1 is 60 seconds while your image labeled 20k_j1 is 63 seconds, and I can't find a 20k_j2.

I ran yade --performance and got:

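-j1

| Particles | Me | You | Difference |
|---|---|---|---|
| 4k | 43 | 61 | 18 |
| 20k | 55 | 63 | 8 |
| 40k | 72 | ? | ? |

-j2

| Particles | Me | You |
|---|---|---|
| 4k | 24 | 31 |
| 20k | 31 | 36 |
| 40k | 35 | 37 |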
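
One rough way to read these numbers is the relative overhead, (You - Me) / Me. The sketch below computes it in plain Python from the values above, taking them as times in seconds (going by the "63 seconds" mentioned earlier in this reply). The dictionary layout and the function name `relative_overhead` are illustrative, and the 40k -j1 row is left out because the "You" value is unknown.

```python
# Benchmark times copied from the -j1 and -j2 tables above: (Me, You).
times = {
    ("j1", "4k"): (43, 61),
    ("j1", "20k"): (55, 63),
    ("j2", "4k"): (24, 31),
    ("j2", "20k"): (31, 36),
    ("j2", "40k"): (35, 37),
}

def relative_overhead(me_s, you_s):
    """Extra time of the 'You' run relative to the 'Me' run."""
    return (you_s - me_s) / me_s

overhead = {key: relative_overhead(*pair) for key, pair in times.items()}

for (threads, size), value in overhead.items():
    print(f"-{threads} {size:>3}: {value:6.1%}")
```

On these figures the relative gap is largest for the 4k single-thread case (about 42%) and smallest for the 40k -j2 case (about 6%), provided the image labels are accurate.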

If your image labels are accurate, your VM costs significant overhead in single-threaded mode but is not actually costing you much overhead in multithreaded mode. That said, I am running at 2.8 GHz in multithreaded mode and I think you are running at 3.4 GHz in multithreaded mode. The final comparison will come when you answer this question: what is your simulation speed at 350000 iterations, with -j8, with the MWE you provided?

For future users: the MWE is an example of a rotating drum.

 gaoxuesong (260582472-9) said on 2019-08-23: #6

Hi Robert, thanks for your test. When I try to use -j8, I get a warning like this:
###
WARN /build/yade-fDuCoe/yade-2018.02b/pkg/common/InsertionSortCollider.cpp:76 insertionSortParallel: Parallel insertion: only 3 thread(s) used. The number of bodies is probably too small for allowing more threads, or the geometry is flat. The contact detection should succeed but not all available threads are used.
###
So only three threads are used. At iteration 350000, the iteration speed is 274 iter/s.
##
1. Don't you get that kind of warning?
##
2. So you don't use a virtual machine but install a linux system directly?

Thank you again.

 Robert Caulk (rcaulk) said on 2019-08-23: #7

>Don't you get that kind of warning?

Ah yes, I do have that warning. So we are comparing 3 threads to 3 threads then. It seems your virtual machine is costing you quite a bit of performance, 550 vs 274.
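
To put 550 vs 274 iter/s in perspective, here is a rough wall-clock estimate in plain Python, using only values from the script in the question. It rests on three assumptions that are not confirmed in this thread: that `utils.PWaveTimeStep()` is close to the smallest sphere radius times sqrt(density / young), that a sphere with the smallest `psdSizes` diameter is actually present, and that the run is meant to last until `t3`. The helper name `wall_clock_hours` is illustrative.

```python
import math

# Values copied from the script in the question.
density = 8150.0          # kg/m^3
young = 195e5             # Pa (already scaled by 1e-4 in the script)
min_diameter = 0.010e-3   # m, smallest entry in psdSizes
angvelsw = -25.0          # rad/s
nrot = 6
t1 = 0.1
t2 = t1 + nrot * (2 * math.pi / abs(angvelsw))
t3 = t2 + 0.1

# Assumption: P-wave time step ~ r_min * sqrt(density / young),
# multiplied by the 0.85 safety factor used in the script.
dt = 0.85 * (min_diameter / 2) * math.sqrt(density / young)

# Assumption: the simulation is meant to run until t3.
total_iterations = t3 / dt

def wall_clock_hours(iterations, iter_per_s):
    """Hours needed for a number of iterations at a constant speed."""
    return iterations / iter_per_s / 3600.0

# Speeds quoted in this thread: 30 (original report), 274 (VM), 550 (native).
hours = {speed: wall_clock_hours(total_iterations, speed)
         for speed in (30, 274, 550)}

print(f"dt = {dt:.3e} s, iterations to t3 = {total_iterations:.3e}")
for speed, h in hours.items():
    print(f"{speed:>4} iter/s -> {h:7.1f} h")
```

Under these assumptions the run needs on the order of 2e7 iterations, so a constant 274 iter/s means roughly 20 hours and 550 iter/s roughly 10 hours. Real speeds vary over the run, so this is only an order-of-magnitude guide.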

>So you don't use a virtual machine but install a linux system directly?

Of course.