How to accelerate Yade's poromechanical coupling

Asked by Huang peilun

Hi,

I learned from this paper [1] that Yade's poromechanically coupled DEM+PFV scheme can be accelerated by up to 170x by combining four techniques: matrix factorization reuse, multithreaded factorization, GPU-accelerated factorization, and parallel task management.

I'm wondering whether these four techniques, especially matrix factorization reuse, are currently implemented in Yade. How can I apply these four techniques to the coupling problem in practice?

Thanks
Peilun

[1] Caulk R.A., Catalano E., Chareyre B. Accelerating Yade's poromechanical coupling with matrix factorization reuse, parallel task management, and GPU computing. Computer Physics Communications, 2020, 248: 106991. https://doi.org/10.1016/j.cpc.2019.106991

Robert Caulk (rcaulk) said (#1):

Yes, they are all available in Yade. However, GPU-accelerated factorization requires you to compile Yade from source.

> How can I apply these four techniques to the coupling problem in practice?

Matrix factorization reuse [1]
Multithreaded factorization [2]
Parallel task management [3]
GPU-accelerated factorization [4]

[1]https://yade-dem.org/doc/yade.wrapper.html#yade.wrapper.FlowEngine.meshUpdateInterval
[2]https://yade-dem.org/doc/yade.wrapper.html#yade.wrapper.FlowEngineT.multithread
[3]https://yade-dem.org/doc/yade.wrapper.html#yade.wrapper.ParallelEngine
[4]https://yade-dem.org/doc/GPUacceleration.html
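
A minimal sketch of how these might be combined in a script (the attribute names are from the FlowEngine documentation linked above; the values are illustrative placeholders, not recommendations):

#######################################################################
flow = FlowEngine(
 label="flow",
 useSolver=3,            # direct CHOLMOD-based solver
 meshUpdateInterval=200, # 1) matrix factorization reuse: keep the factor
                         #    this many iterations before retriangulating
 multithread=True,       # 2) factorize on a background thread [2]
 numFactorizeThreads=4,  #    OpenMP threads used for the factorization
 numSolveThreads=1       #    OpenMP threads used for back-substitution
)
# 3) GPU-accelerated factorization: compile Yade from source against a
#    CUDA-enabled SuiteSparse/CHOLMOD (see [4]); it uses the same
#    useSolver=3 path.
# 4) Parallel task management: wrap flow and the solid-side engines in a
#    ParallelEngine (see [3] and the discussion below).
#######################################################################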

Robert Caulk (rcaulk) said (#2):

Additionally, the paper you cite [1] includes supplementary Python scripts for reproducing the results.

[1] Caulk R.A., Catalano E., Chareyre B. Accelerating Yade's poromechanical coupling with matrix factorization reuse, parallel task management, and GPU computing. Computer Physics Communications, 2020, 248: 106991. https://doi.org/10.1016/j.cpc.2019.106991

Huang peilun (hpl16) said (#3):

Thanks Robert, your comments helped me a lot.

I now understand the matrix factorization reuse, multithreaded factorization, and GPU-accelerated techniques. I still have some questions about the parallel task management technique.

In the Python scripts that you mentioned, I think the parallel task management technique is implemented through the following code:

#######################################################################
O.engines = [
 ForceResetter(),
 InsertionSortCollider([Bo1_Sphere_Aabb(), Bo1_Box_Aabb()]),
 InteractionLoop(
  [Ig2_Sphere_Sphere_ScGeom(), Ig2_Box_Sphere_ScGeom()],
  [Ip2_FrictMat_FrictMat_FrictPhys()],
  [Law2_ScGeom_FrictPhys_CundallStrack()], label="iloop"
 ),
 FlowEngine(multithread=1, dead=1, label="flow", ompThreads=10),
 GlobalStiffnessTimeStepper(active=1, timeStepUpdateInterval=100, timestepSafetyCoefficient=0.8),
 triax,
 #VTKRecorder(Key=identifier, dead=1, label='vtkRec', iterPeriod=100, initRun=True, fileName=(outputDir+'/vtkFiles/out-'), recorders=['spheres','facets','boxes']),
 newton
]

# Some of the original code is omitted here

if setEnginesParallel:

 # run flow concurrently with the solid-side engines (force resetter,
 # collider, interaction loop, timestepper); triax and newton stay sequential
 O.engines = [
  ParallelEngine([flow, [O.engines[0], O.engines[1], O.engines[2], O.engines[4]]]),
  O.engines[5],
  O.engines[6]
 ]

 triax = O.engines[1]
 newton = O.engines[2]

 # 5 OpenMP threads for each engine in the solid-side group...
 O.engines[0].slaves[1][0].ompThreads = O.engines[0].slaves[1][1].ompThreads = O.engines[0].slaves[1][2].ompThreads = O.engines[0].slaves[1][3].ompThreads = 5

 # ...and 4 for the flow solver
 flow.ompThreads = 4
#######################################################################

Based on the above code, I think FlowEngine runs before ForceResetter(), which is now O.engines[0].slaves[1][0]. However, in that case, the fluid force cannot be applied to the sphere particles. Did I get this right?

Besides, I noticed that parallelism in Yade has three levels. As I understand it, the -j/--threads option gives parallelism inside engines, plus the separation between computation, interaction (Python, GUI), and rendering. Parallelism between multiple engine groups can only be achieved with ParallelEngine. Did I get that right?

I'm not sure if I should open a new question.

Thanks
Peilun

Huang peilun (hpl16) said (#4):

Thanks Robert, that solved my question.

Robert Caulk (rcaulk) said (#5):

> FlowEngine runs before ForceResetter(), which is now O.engines[0].slaves[1][0]. However, in that case, the fluid force cannot be applied to the sphere particles. Did I get this right?

FlowEngine runs in parallel with ForceResetter, not before it. The fluid force is applied to the spherical particles inside FlowEngine itself. Integration of the particles, based on the total forces, happens in newton.
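
To make that concrete, here is a minimal sketch of the ParallelEngine semantics (engine names are placeholders): slaves run concurrently, a nested list of slaves runs sequentially, and the ParallelEngine returns only after all slaves have finished.

#######################################################################
O.engines = [
 ParallelEngine([
  flow,                              # branch A: fluid solve; fluid forces
                                     # are applied inside FlowEngine
  [reset, collider, iloop, stepper]  # branch B: solid-side engines, in order
 ]),
 triax,
 newton  # runs after both branches have joined, so it integrates
         # the total (contact + fluid) forces
]
#######################################################################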

> Besides, I noticed that parallelism in Yade has three levels. Did I get that right?

I'm sorry, I don't fully understand the question. -j means that any loops that can be multithreaded (via OpenMP) will be multithreaded. ParallelEngine allows you to run two engines (or engine groups) at the same time on separate threads. Hope it helps.
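
As a concrete illustration of the two mechanisms, assuming the engine layout from your script (thread counts are arbitrary examples):

#######################################################################
# Level 1: OpenMP loop parallelism, set at launch: yade -j10 script.py
# Level 2: task parallelism between engine groups, via ParallelEngine.
# The OpenMP threads can then be rebalanced between the concurrent slaves:
flow.ompThreads = 4               # the fluid branch gets 4 threads
for e in O.engines[0].slaves[1]:  # the solid-side group
 e.ompThreads = 6                 # each of its engines gets 6
#######################################################################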

Cheers,

Robert