# How to accelerate Yade's poromechanical coupling

Hi,

I learned from this paper [1] that Yade's poromechanically coupled DEM+PFV scheme can be accelerated by 170x by combining four techniques: matrix factor reuse, multithreaded factorization, GPU-accelerated factorization, and parallel task management.

Are these four techniques, especially matrix factor reuse, currently implemented in Yade? How can I apply them to the coupling problem in practice?

Thanks

Peilun

[1] Caulk R A, Catalano E, Chareyre B. Accelerating Yade’s poromechanical coupling with matrix factorization reuse, parallel task management, and GPU computing[J]. Computer Physics Communications, 2020, 248: 106991.

## Question information

- Language: English
- Status: Solved
- For: Yade
- Assignee: No assignee
- Solved by: Huang peilun
- Solved: 2020-09-25
- Last query: 2020-09-25
- Last reply: 2020-09-21

Robert Caulk (rcaulk) said : | #1 |

Yes, they are all available in Yade. However, GPU-accelerated factorization requires you to compile Yade from source.

>How can I apply these four techniques to the coupling problem in practice?

Matrix factorization reuse [1]

Multithreaded factorization [2]

Parallel task management [3]

GPU accelerated [4]

[1]https:/

[2]https:/

[3]https:/

[4]https:/
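The factor-reuse idea can be illustrated outside Yade with a small, self-contained sketch. It assumes that the flow system matrix stays unchanged between remeshing events, so that only the right-hand side changes from step to step. The matrix `A`, the helper names `cholesky` and `solve_with_factor`, and all numbers are hypothetical and are not part of Yade's API. This dense, pure-Python version is not how Yade implements the technique; it only shows the pattern of factoring once and solving many times.

```python
import math


def cholesky(a):
    """Return lower-triangular L such that L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_with_factor(low, b):
    """Solve (L * L^T) x = b using an existing factor L, without refactorizing."""
    n = len(b)
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    # Backward substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


# Hypothetical symmetric positive definite system matrix (assumed fixed between remeshings).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]

# Expensive step, done once.
factor = cholesky(A)

# Cheap step, repeated with a new right-hand side at every time step.
right_hand_sides = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0], [3.0, 0.0, -1.0]]
solutions = [solve_with_factor(factor, b) for b in right_hand_sides]
```

The factorization cost is paid once, and each subsequent step only pays for two triangular solves. A new factorization would be needed only when the matrix itself changes.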

Robert Caulk (rcaulk) said : | #2 |

Additionally, the paper you cite [1] includes supplementary Python scripts for reproducing the results.

[1] Caulk R A, Catalano E, Chareyre B. Accelerating Yade’s poromechanical coupling with matrix factorization reuse, parallel task management, and GPU computing[J]. Computer Physics Communications, 2020, 248: 106991.

Huang peilun (hpl16) said : | #3 |

Thanks Robert, your comments help me a lot.

I now understand the matrix factorization reuse, multithreaded factorization, and GPU-accelerated techniques. I still have some questions about the parallel task management technique.

In the Python scripts that you mentioned, I think the parallel task management technique is implemented through the following code:

#######

    O.engines=[
    ForceResetter(),
    InsertionSortC
    InteractionLoop(
    [Ig2_
    [Ip2_
    [Law2_
    ),
    FlowEngine(
    GlobalStiffnes
    triax,
    #VTKRecorder(
    newton
    ]
    #Some of the original code is omitted here
    if setEnginesParallel:
    O.engines=
    O.engines[5],
    O.engines[6]
    ]
    triax = O.engines[1]
    newton = O.engines[2]
    O.engines[
    flow.ompThreads=4

#######

I think based on the above code, FlowEngine runs before the ForceResetter() which is now the O.engines[

Besides, I noticed that parallelism in Yade has three levels. As I understand it, the -j/--threads option enables parallelism inside engines, as well as parallelism between computation, interaction (Python, GUI), and rendering. Parallelism between multiple engine groups can only be achieved with ParallelEngine. Did I get it right?

I'm not sure if I should open a new question.

Thanks

Peilun

Huang peilun (hpl16) said : | #4 |

Thanks Robert, that solved my question.

Robert Caulk (rcaulk) said : | #5 |

> FlowEngine runs before the ForceResetter() which is now the O.engines[

FlowEngine runs in parallel with ForceResetter. The fluid forces are applied to the spherical particles inside FlowEngine. The particles are then integrated in newton, based on the total forces.

>Besides, I noticed that the parallelism in Yade has 3 levels. Did I get it right?

I'm sorry, I don't understand the question. -j means that any loop that can be multithreaded will be multithreaded. ParallelEngine lets you run two engines at the same time on separate threads. Hope it helps.

Cheers,

Robert
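The task structure described in reply #5 (two tasks running at the same time on separate threads, followed by a single integration of the total forces) can be sketched in plain Python. This is only an illustration under stated assumptions: `contact_forces`, `fluid_forces`, and `parallel_step` are hypothetical stand-ins, not Yade engines, and the force models and numbers are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def contact_forces(positions, stiffness):
    """Hypothetical stand-in for the mechanical task: linear springs toward the origin."""
    return [-stiffness * x for x in positions]


def fluid_forces(pressures, area):
    """Hypothetical stand-in for the flow task: pressure times an assumed area."""
    return [p * area for p in pressures]


def parallel_step(positions, velocities, pressures, dt,
                  stiffness=2.0, area=0.5, mass=1.0):
    """Run both force tasks on separate threads, then integrate the summed forces once."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        contact_job = pool.submit(contact_forces, positions, stiffness)
        fluid_job = pool.submit(fluid_forces, pressures, area)
        # result() blocks until each task is done, so integration sees total forces.
        total = [c + f for c, f in zip(contact_job.result(), fluid_job.result())]
    # Integration happens only after both tasks have finished.
    new_velocities = [v + f / mass * dt for v, f in zip(velocities, total)]
    new_positions = [x + v * dt for x, v in zip(positions, new_velocities)]
    return new_positions, new_velocities


positions, velocities = parallel_step([0.0, 1.0], [0.0, 0.0], [1.0, 3.0], dt=0.1)
```

CPython threads share the global interpreter lock, so this sketch shows the ordering of the tasks rather than a real speedup. Loop-level multithreading of the kind enabled by -j would correspond to parallelizing the work inside each of the two functions.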