Why is my iteration speed so low?

Asked by gaoxuesong on 2019-08-21

Hi, I am trying to simulate particle packing behavior in a rotating drum. However, the iteration speed is very low, about 30 iterations per second. At the beginning, when the particles are far from each other, the iteration speed is 500 iter/s, whether I run with -j2 or without any -j parameter.

## Below is my code,

from yade import pack,geom,utils
from yade import qt
import math
import numpy as np
### the units in the simulation are SI (kg, m, s, Pa)
### the particles have diameters below 100 microns

muS = 0.57
muF = 0.9
FricAngleS = math.atan(muS)
FricAngleF = math.atan(muF)

density = 8150
yongmodu = 195e5 ## scaled by 1e-4, real value is 195 GPa
poisn = 0.3
Dsph = 80e-6 ## particle max diameter
angvelsw = -25 ## rotation speed rad/s
nrot = 6 ## rotation times

matSph = CohFrictMat(density=density, young=yongmodu, poisson=poisn, frictionAngle=FricAngleS, momentRotationLaw=True)
SMat = O.materials.append(matSph)
matFacet = CohFrictMat(density=density, young=yongmodu, poisson=poisn, frictionAngle=FricAngleF, momentRotationLaw=True)
FMat = O.materials.append(matFacet)

### geometry ###
radiusSweeper = 10*Dsph
radiusSweeper0 = 9*Dsph
lengthSweeper = 12*Dsph
numSweeperParts = int(round(2*np.pi*radiusSweeper/5.0e-6))

## spheres ##
O.bodies.clear()
sp = pack.SpherePack()
x1, y1, z1 = -radiusSweeper, 0, 0
x2, y2, z2 = radiusSweeper, lengthSweeper, 2*radiusSweeper
sp.makeCloud((x1, y1, z1), (x2, y2, z2), psdSizes=[0.010e-3,0.024e-3,0.035e-3,0.051e-3,0.085e-3], psdCumm=[0,0.1,0.5,0.9,1.0])
sp.toSimulation(material=SMat)

## drum ##
Sweeper=[]
for i in np.linspace(0, 2*pi, num=numSweeperParts, endpoint=True):
    Sweeper.append(Vector3(radiusSweeper*cos(-i), 0.0, radiusSweeper*sin(-i)+radiusSweeper))

SweeperP=[Sweeper, [p+Vector3(0.0,lengthSweeper,0.0) for p in Sweeper]]
SweeperPoly = pack.sweptPolylines2gtsSurface(SweeperP, threshold=1e-7)
sweeperid = O.bodies.append(pack.gtsSurface2Facets(SweeperPoly, wire=False, material=FMat))
boxid = O.bodies.append(geom.facetBox((0,0.5*lengthSweeper,radiusSweeper),(radiusSweeper,0.5*lengthSweeper,radiusSweeper), \
                              wallMask=63, material=FMat, wire=True))

## remove the spheres outside the drum ##
for eb in O.bodies:
    if isinstance(eb.shape, Sphere):
        xi,yi,zi = eb.state.pos
        lengi = np.sqrt(xi**2 + (zi-radiusSweeper)**2)
        if lengi > 0.96*radiusSweeper:
            O.bodies.erase(eb.id)
## total: 6691 spheres, 2020 facets

### define engines functions ###
t1 = 0.1
t2 = t1+nrot*(2*np.pi/abs(angvelsw))
t3 = t2+0.1
def change_motion():
    if O.time > t1:
        rotaEngineSw.dead = False
        rotaEngineSw.angularVelocity = angvelsw
    if O.time > t2:
        rotaEngineSw.angularVelocity = 0.0

O.engines=[
    ForceResetter(),
    InsertionSortCollider([Bo1_Sphere_Aabb(),Bo1_Facet_Aabb()]),
    InteractionLoop(
        [Ig2_Sphere_Sphere_ScGeom6D(),Ig2_Facet_Sphere_ScGeom6D()],
        [Ip2_CohFrictMat_CohFrictMat_CohFrictPhys()],
        [Law2_ScGeom6D_CohFrictPhys_CohesionMoment()]
    ),
    PyRunner(iterPeriod=int(1.0e6),command='change_motion()'),
    RotationEngine(dead=True, label='rotaEngineSw', rotateAroundZero=True, zeroPoint=(0,0,radiusSweeper), angularVelocity=angvelsw, rotationAxis=[0,1,0], ids=sweeperid),
    NewtonIntegrator(damping=0.75, exactAsphericalRot=True, gravity=(0,0,-9.81)),
]

O.dt = 0.85*utils.PWaveTimeStep()
O.run()

## the hardware ##
Ubuntu 18.04 on VMware 14.0 with 8 processors and 32 GB memory
Host system: Windows 10, 64-bit
Processor: AMD Ryzen Threadripper 1950X 16-Core Processor, 3.40 GHz
## Yade version
2018.02b

## some yade --performance tests
## 4k, 20k, 40k spherical particles with yade -j 1 *.py
https://drive.google.com/open?id=1nqXm75HpqhG-n0XPU7MF1G0AiBfuTcSj
https://drive.google.com/open?id=1_2TWGVauOQvFznHqdiV5R35yhqzitHvo
https://drive.google.com/open?id=1rPvseozgYY9qEW41dQXKm3iz4sORTGtY
## 4k, 20k, 40k spherical particles with yade -j 2 *.py
https://drive.google.com/open?id=1_z7C7jkjYy_wzXy9iNaq_N4SaI_wghSV
https://drive.google.com/open?id=1hrOEshAC4YQAEmSpCrk_IHmIwvcvjz7z
https://drive.google.com/open?id=1pcR6WAmLWeCWo0-yiRGR-2g0BzJ7rMCl

Much appreciated.
Xuesong

Question information

Language: English
Status: Answered
For: Yade
Assignee: No assignee
Last query: 2019-08-21
Last reply: 2019-08-23

Jérôme Duriez (jduriez) said : #1

Hi,

As for the change of speed that you observe (from 500 iter/s in the beginning to 30 iter/s), this is because there are no interactions in the beginning, hence no interaction treatment, hence fewer computations to be done by YADE.
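
One way to check this explanation on a given run is to log the iteration speed next to the number of real interactions. The sketch below is only illustrative: the helper names `speed_report` and `log_speed` are made up for this example, and it assumes the object passed in exposes `iter`, `speed` and `interactions.countReal()` the way Yade's `O` does.

```python
def speed_report(omega):
    """Return (iteration, number of real interactions, iter/s).

    `omega` is assumed to behave like Yade's O object, i.e. to provide
    `iter`, `speed` and `interactions.countReal()`.
    """
    return omega.iter, omega.interactions.countReal(), omega.speed


def log_speed(omega, out=print):
    """Format one report line and pass it to `out` (print by default)."""
    it, n_real, speed = speed_report(omega)
    out("iter=%d  real interactions=%d  speed=%.0f iter/s" % (it, n_real, speed))

# Inside a Yade script this could be called periodically, for instance with
# PyRunner(iterPeriod=10000, command='log_speed(O)') added to O.engines
# (the period of 10000 iterations is an arbitrary choice for this sketch).
```

If the explanation holds, the logged speed should fall as the interaction count rises while the particles settle.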

As for the rest, I do not think there can be any answer to your question other than
"because the world is finite, and the capabilities of your hardware are as well. That's too bad..." ;-)

Robert Caulk (rcaulk) said : #2

What is the total number of particles in this simulation?

gaoxuesong (260582472-9) said : #3

Hi Duriez, you are right, I know the capability is limited. However, I posted the performance tests of Yade on my computer because I want to make sure that the slow speed is not caused by my hardware or by the way I run Yade. So could someone try my case on their own computer to see what is going on?
Thanks.

gaoxuesong (260582472-9) said : #4

Hi Robert, I have sphere and facet bodies in my model: 6691 spheres and 2020 facets. So with this number of bodies, is it necessary to use multiple cores, like -j 2 or more?
Thanks.

Robert Caulk (rcaulk) said : #5

>is it necessary to use multiple cores, like -j 2 or more?

The more cores you use, the faster your simulation will run. Why not run -j8 if you have 8?

I ran your simulation with -j8 and I get 550 iter/sec at iteration 350000.

Some of your performance results don't make sense: your links are mixed up (some j2 results are in the j1 list and some j1 results are in the j2 list), your image labeled 40_j1 shows 60 seconds while your image labeled 20k_j1 shows 63 seconds, and I can't find a 20k_j2.

I ran yade --performance and got (times in seconds):

-j1
Particles   Me   You   Difference
4k          43   61    18
20k         55   63    8
40k         72   ?     ?

-j2
Particles   Me   You
4k          24   31
20k         31   36
40k         35   37

If your image labels are accurate, your VM adds significant overhead in single-threaded mode but not much in multithreaded mode. That said, I am running at 2.8 GHz in multithreaded mode and I think you are running at 3.4 GHz. The final comparison will come when you answer this question: what is your simulation speed at 350000 iterations, with -j8, for the MWE you provided?

For future users: the MWE is an example of a rotating drum.
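
The comparison in reply #5 can be made explicit with a few ratios. The sketch below takes the `yade --performance` numbers quoted in that reply and assumes they are run times, so lower is better; the dictionary layout and the helper `ratio_table` are made up for this example. The 40k value for the VM at -j1 is unknown in the thread and is left out.

```python
# Values quoted in reply #5; "native" is Robert's machine, "vm" is the VMware guest.
native = {
    "j1": {"4k": 43, "20k": 55, "40k": 72},
    "j2": {"4k": 24, "20k": 31, "40k": 35},
}
vm = {
    "j1": {"4k": 61, "20k": 63},  # 40k not available in the thread
    "j2": {"4k": 31, "20k": 36, "40k": 37},
}


def ratio_table(slow, fast):
    """Ratio slow/fast for every case present in both dictionaries."""
    return {case: slow[case] / fast[case] for case in slow if case in fast}


# VM overhead: VM value divided by native value, per thread count.
vm_overhead = {mode: ratio_table(vm[mode], native[mode]) for mode in vm}

# Gain from the second thread: -j1 value divided by -j2 value, per machine.
gain_native = ratio_table(native["j1"], native["j2"])
gain_vm = ratio_table(vm["j1"], vm["j2"])

for mode in ("j1", "j2"):
    for case, ratio in vm_overhead[mode].items():
        print("VM / native, -%s, %s: %.2f" % (mode, case, ratio))
for case, ratio in gain_native.items():
    print("native, -j1 / -j2, %s: %.2f" % (case, ratio))
for case, ratio in gain_vm.items():
    print("VM, -j1 / -j2, %s: %.2f" % (case, ratio))
```

A VM / native ratio close to 1 means the virtual machine costs little for that case. The ratios ignore the clock-speed difference between the two machines mentioned in the reply.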

gaoxuesong (260582472-9) said : #6

Hi Robert, thanks for your test. When I try to use -j8, I get a warning like this:
###
WARN /build/yade-fDuCoe/yade-2018.02b/pkg/common/InsertionSortCollider.cpp:76 insertionSortParallel: Parallel insertion: only 3 thread(s) used. The number of bodies is probably too small for allowing more threads, or the geometry is flat. The contact detection should succeed but not all available threads are used.
###
So only three threads are used. At iteration 350000, the iteration speed is 274 iter/s.
##
1. Don't you have that kind of warn?
##
2. So you don't use a virtual machine but install a linux system directly?

Thank you again.

Robert Caulk (rcaulk) said : #7

>Don't you have that kind of warn?

Ah yes, I do have that warning. So we are comparing 3 threads to 3 threads then. It seems your virtual machine is costing you quite a bit of performance: 550 iter/s vs 274 iter/s.

>So you don't use a virtual machine but install a linux system directly?

Of course.
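
To put the iteration speeds quoted in this thread (30, 274 and 550 iter/s) in perspective, here is a rough estimate of the wall-clock time needed to reach the end of the script. It is a sketch under stated assumptions, not a measurement: the P-wave time step is approximated as r_min*sqrt(density/young), the smallest radius is taken as half the smallest `psdSizes` entry, and the simulated time is taken as `t3` from the script. The helper `wall_clock_hours` is made up for this example.

```python
import math

# Values copied from the script in the question.
density = 8150.0         # kg/m^3
young = 195e5            # Pa (already scaled by 1e-4 in the script)
r_min = 0.5 * 0.010e-3   # m, ASSUMPTION: half the smallest psdSizes entry
angvel = 25.0            # rad/s, magnitude of angvelsw
nrot = 6                 # number of drum rotations

# ASSUMPTION: PWaveTimeStep() behaves like r_min * sqrt(density / young).
dt = 0.85 * r_min * math.sqrt(density / young)

# Simulated time up to t3 = t1 + nrot rotations + 0.1 s, as in the script.
t_end = 0.1 + nrot * 2 * math.pi / angvel + 0.1
n_iter = t_end / dt


def wall_clock_hours(iterations, iter_per_second):
    """Wall-clock hours needed for `iterations` at a constant speed."""
    return iterations / iter_per_second / 3600.0


print("dt ~ %.2e s, iterations needed ~ %.2e" % (dt, n_iter))
for speed in (30.0, 274.0, 550.0):
    print("%4.0f iter/s -> about %.0f hours" % (speed, wall_clock_hours(n_iter, speed)))
```

Under these assumptions the run needs on the order of 2e7 iterations, so the speed differences discussed above translate into a gap between several days and less than a day of computing.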
