About -j XX and acceleration in yadedaily

Asked by Ziyu Wang

Hello,

As far as I know, using the -j option to enable multithreading when running a script can increase the number of iterations per second (i.e. acceleration). I also use -j XX (e.g. 12) on my own computer, which increases the iteration speed by two to three times.
However, due to performance problems, I wanted to use yadedaily on a workstation with multithreading, and I found that no matter which -j XX I use (from 8 to 96), the iteration speed does not increase, or even decreases.
The performance of the workstation should be much higher than that of my own computer. I have also learned that a larger -j value is not always better, but the workstation should at least be much faster than my own computer.
What should I do?

Best regards.

Question information

Language: English
Status: Solved
For: Yade
Assignee: No assignee
Solved by: Ziyu Wang
Revision history for this message
Ziyu Wang (ziyuwang1) said :
#1

Here is some workstation hardware information that may be related to the problem:
cpu: Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1008 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 3741 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1237 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1011 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1008 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1006 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 2821 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1510 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1025 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
                       Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz
graphics card: Matrox VGA compatible controller
storage: Intel Lewisburg SATA Controller [AHCI mode]
                       Intel Lewisburg SSATA Controller [AHCI mode]
                       Dell PERC H730P Adapter

Is there anything else I can provide to help solve the problem?
Thanks.

Jan Stránský (honzik) said :
#2

Hello,

This is a known issue that was also discussed recently. Have a look at [1,2] and the links therein.
Currently the acceleration using OpenMP is limited.
There is also an MPI parallelization [3]. (I personally have no idea about the current status of this sub-project.)

Cheers
Jan

[1] https://answers.launchpad.net/yade/+question/698963
[2] https://answers.launchpad.net/yade/+question/698672
[3] https://yade-dem.org/doc/mpy.html

Robert Caulk (rcaulk) said :
#3

Hey,

Don't forget that this depends on the number of particles in your system, i.e. it is hard to parallelize a 10-particle system onto 96 cores.
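To make the point concrete, here is a minimal sketch of how little work each thread gets when a handful of particles is spread over many threads. The helper `split_work` and the even static split are illustrative assumptions; they are not a description of how Yade or OpenMP actually schedule work.

```python
def split_work(n_particles, n_threads):
    """Evenly split n_particles work items over n_threads (illustrative static split)."""
    base, extra = divmod(n_particles, n_threads)
    # The first `extra` threads get one additional item.
    return [base + 1 if i < extra else base for i in range(n_threads)]


for n_particles, n_threads in [(10, 96), (10, 4)]:
    chunks = split_work(n_particles, n_threads)
    idle = sum(1 for c in chunks if c == 0)
    print(f"{n_particles} particles on {n_threads} threads: "
          f"max {max(chunks)} per thread, {idle} threads idle")
```

With 10 particles and 96 threads, most threads in this sketch receive nothing to do, yet every thread still has to be started and synchronized.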

Cheers,

Robert

Ziyu Wang (ziyuwang1) said :
#4

Hello, Jan and Robert!

I have read the questions [1,2] you gave and the links therein. Actually, I have also read the Performance Test [3], but I can't fully understand the content. I can only understand that more threads don't mean faster simulation; there is a suitable number of cores for simulation.

I'm sorry for my lack of understanding of OpenMP. Can I understand that this problem is difficult to solve at present?
There are about 2000 particles in my simulation. I can understand what you mean: so few particles cannot use so many threads.
What puzzles me is that even with a single thread, the workstation should run faster than my own computer (the fact is that the workstation is about 10% slower).
And if I use multithreading and run the same script, my own computer can achieve two to three times the acceleration, while the workstation does not.
By the way, is it possible that this problem has something to do with the graphics card?

Following is the hardware of my own computer:
CPU: AMD® Ryzen 7 4800H with Radeon Graphics × 16
Memory: 15.6 GB
Graphics card: NVIDIA GeForce RTX 2060/PCIe/SSE2
OS: Ubuntu 18.04.6 LTS (the same as the workstation)
And I use yadedaily 20211105-6086~6f71ebd~bionic1 on both.

Thanks for help.

[3] https://yade-dem.org/wiki/Performance_Test

Jan Stránský (honzik) said :
#5

> I can only understand that more threads don't mean faster simulation; there is a suitable number of cores for simulation.
> Can I understand that this problem is difficult to solve at present?

yes

> What puzzles me is that even with a single thread, the workstation should run faster than my own computer (the fact is that the workstation is about 10% slower).
> And if I use multithreading and run the same script, my own computer can achieve two to three times the acceleration, while the workstation does not.

This might be down to some hardware settings; it is difficult to tell just from the type.
Why should the workstation run faster with 1 thread? From a quick search, I got:
workstation: Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz, 1000 MHz -- 2.2 - 3.9 GHz
own computer: CPU: AMD® Ryzen 7 4800H -- 2.9 - 4.2 GHz
So I assume that for "basic" settings, the own computer is just simply faster than the workstation.
Maybe not only the CPU itself but also the RAM plays some role? I don't know.
Maybe others with more knowledge about hardware can answer.

> By the way, is it possible that this problem has something to do with the graphics card?

I **think** not (I see no reason for that).
Do you use some graphics (e.g. the 3D view) for the measurements?
You should not use graphics for "sharp" simulations anyway.
If you use graphics (like the 3D view), then the results may be influenced by the graphics.
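One way to take such measurements without any graphics is to launch the same script headless for several thread counts and compare wall-clock times. The sketch below assumes the launcher's `-n` (no GUI) and `-x` (exit when the script finishes) options, which are worth confirming with `yadedaily --help`; `benchmark.py` is a placeholder name for a script that runs a fixed number of iterations, and the helper names are made up for this example.

```python
import subprocess
import time


def yade_command(script, threads, executable="yadedaily"):
    """Build a headless command line: -j sets threads, -n disables the GUI, -x exits after the script."""
    return [executable, "-j", str(threads), "-n", "-x", script]


def time_run(command):
    """Run a command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # "benchmark.py" is a placeholder for a script that runs a fixed number of iterations.
    for threads in (1, 2, 4, 8, 12):
        cmd = yade_command("benchmark.py", threads)
        print(" ".join(cmd))
        # Uncomment on a machine where yadedaily is installed:
        # print(f"-j {threads}: {time_run(cmd):.1f} s")
```

The measured time includes start-up, so the benchmark script should run enough iterations for the start-up cost to be negligible.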

Cheers
Jan

Ziyu Wang (ziyuwang1) said :
#6

Hello, thanks Jan for your patient explanation.

I have understood most of the previous confusion. Let me express my understanding (I look forward to your guidance):
The current -j option implements multithreaded parallel computing through OpenMP, but this method is limited at present. And OpenMPI [1] is another parallel computing method, and I can adjust the multithreaded openMPI computing for Yade using.
Am I right?

Best regards.
[1] https://yade-dem.org/doc/mpy.html

Jan Stránský (honzik) said :
#7

Your understanding is correct.
Note that I only mentioned MPI [1] because I just know it exists and possibly could do a better job. However, I am not involved in the project and don't know if / how it works, its limitations etc.
You can ask a separate question concerning MPI if you are interested.
Cheers
Jan

Robert Caulk (rcaulk) said :
#11

Hello Ziyu,

I appreciate the curiosity for the code, but I must admit, it is a bit frustrating to re-answer this same question 5 times per year in this forum. Please consider searching through the forum a bit more deeply before requesting our assistance. The reason I add information here is because you've made incorrect statements about Yade OpenMP and MPI, and thus I need to correct the record.

>>The current -j option implements multithreaded parallel computing through OpenMP, but this method is limited at present.

You may be a bit mistaken here. Yade's OpenMP implementation is, in no way, "limited." It is quite robust, effective, debugged, and working in many different aspects of the code. What you believe to be "limited" is that you cannot speed up a 2000 particle simulation with 96 cores. This is the same case for all OpenMP applications; it is not *unique* to Yade. So let's be clear: there is nothing wrong or "limited" with Yade's implementation of OpenMP.
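The general ceiling on parallel speedup can be illustrated with Amdahl's law. This is a minimal sketch; the parallel fraction of 0.7 is an assumed value chosen for illustration, not a measurement of Yade or of any particular simulation.

```python
def amdahl_speedup(parallel_fraction, n_threads):
    """Upper bound on speedup when only parallel_fraction of the runtime scales with threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


# parallel_fraction = 0.7 is an assumed value for illustration, not a measurement of Yade.
for n_threads in (1, 8, 12, 96):
    print(f"{n_threads:3d} threads: at most {amdahl_speedup(0.7, n_threads):.2f}x")
```

Amdahl's law only caps the possible gain. It does not include thread start-up and synchronization overhead, which is why a small problem can even run slower with more threads.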

>> So I assume that for "basic" settings, the own computer is just simply faster than the workstation.

I confirm Jan's answer here. "Workstations" (high core count computers) are not designed to be the best at single-threaded applications. They are designed for large parallelizable problems. Home desktops are designed to load Facebook quickly, hence the focus on higher clock speed rather than core count. I don't think anyone, in any computational field, would consider 2000 particles to be a large parallelizable problem.

>>MPI[1] is another parallel computing method, and I can adjust the multithreaded openMPI computing for Yade using.
>>However, I am not involved in the project and don't know if / how it works, its limitations etc.

MPI* is in fact another parallel computing method. However, I do not know what "adjust multithreaded openMPI" means. To summarize, MPI allows separate computers to work on the same problem. In comparison, OpenMP (shared memory) allows separate cores of a single computer to work on the same problem. As you can imagine, the MPI approach is much more difficult to implement, and the communication between computers takes time. So no, MPI will *NOT* speed up your 2000 particle problem. It will certainly slow it down. Also keep in mind that MPI is relatively new to Yade, so there is a higher risk of encountering bugs, relative to OpenMP. Additionally, you should consider that using MPI in Yade requires at least some fundamental knowledge of how MPI works, so that you set up the problem correctly. In comparison, OpenMP requires zero knowledge from a coding perspective.
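A toy cost model can show how communication time changes the picture for small and large problems. Everything here is assumed for illustration: the function `step_time`, the linear communication term, and both constants are made up and are not measurements of Yade's MPI module.

```python
def step_time(n_particles, n_workers, cost_per_particle, comm_cost_per_worker):
    """Toy model of one time step: compute shrinks with workers, communication grows with them."""
    compute = n_particles * cost_per_particle / n_workers
    communication = comm_cost_per_worker * (n_workers - 1)
    return compute + communication


# All constants are made up for illustration (arbitrary time units); they are not Yade measurements.
COST_PER_PARTICLE = 1.0
COMM_COST = 2000.0

for n_particles in (2_000, 2_000_000):
    times = {w: step_time(n_particles, w, COST_PER_PARTICLE, COMM_COST) for w in (1, 4, 16, 64)}
    best = min(times, key=times.get)
    print(f"{n_particles} particles: fastest with {best} worker(s) in this toy model")
```

In this model the small case is fastest with a single worker, because any communication costs more than the computation it saves, while the large case benefits from several workers.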

-rc

Ziyu Wang (ziyuwang1) said :
#12

Hello Robert,

Thanks for your patient explanation. I have a clearer understanding of what I should do in the future.
I'm sorry that I did not search in depth before asking. I'll be better prepared before asking questions in the future.

Best regards.