running on a server is slower than on a PC

Asked by yang yi on 2020-05-26

Dear friend

I meet a very strange question. Our Lab. take a new server and I install the yade on the server. But I find that running on the server is slower than on the PC.

(1) The hardware and configuration of the server is
     CPU: Intel XEON Gold 6146 * 4, 3.2GH 96 core
     Memory : 256GB
     OS: Ubuntu 18.04 64bits
     Yade: https://gitlab.com/yade-dev/trunk

(2) The hardware and configuration of the PC
    CPU : I7-9700k 3.6GH 8 core
    Memory :32GB
    OS Ubunut 18.04 64bit
    Yade: https://gitlab.com/yade-dev/trunk

I operate the server by remote desktop -- VNC viewer

I cmake and install the yade like this:

   cd ~/myYade/build
   cmake -DCMAKE_INSTALL_PREFIX=../install ../trunk -DNOSUFFIX=ON -DENABLE_OPENMP=ON
   make -j96
   make install

I run the yade like this:
(1) On the server
~/PycharmProjects/myname/yade/Yade20200519_for_Journal of Coal$ ~/myYade/install/bin/yade -j96 20200522.py

(2) On the PC
~/PycharmProjects/myname/yade/Yade20200519_for_Journal of Coal$ ~/myYade/install/bin/yade -j8 20200522.py

I open the yade simulation, show 3D

The speed of simulation process on server is slower than on the PC, and it is very very obviously.

When I make the yade : make -j96. The speed is very quickly. But when I running the yade is slower than on the PC.

The server supplier can insure that the hardware of the server is right.

Please kindly help me. If the OS on server is right? If process of installation yade on server right? If the running the yade is right? It makes me very confusion.

Thank you very much.

Yang Yi

Question information

Language:
English Edit question
Status:
Solved
For:
Yade Edit question
Assignee:
No assignee Edit question
Solved by:
yang yi
Solved:
2020-05-29
Last query:
2020-05-29
Last reply:
2020-05-28
Jan Stránský (honzik) said : #1

Hello,

TLTR: do not compare -j8 and -j96, compare -j8 for both systems. Difference of -j8 and -j96 is another problem not much related to different hardware.

Long answer:

to answer / give you some hints, we need more information, e.g. the script you are running.

> I open the yade simulation, show 3D

how it behaves without any GUI?

> The speed ... is slower ... it is very very obviously.

please be more specific what "very obviously" means. 10% slower ? 3x slower? 1000x slower?

> On the server ... -j96
> On the PC ... -j8

what are the results if you use e.g. -j 4 (or the same -j) for both systems?

In general, it is not very good idea to compare -j 8 and -j 96, it is then very different situation, very much depending on the simulation itself..

cheers
Jan

Hi,

i have also used remote desktop PC to run yade. You should be very clear on which one you think that it is slow: (1). YADE simulation (running in the command window, i.e. the terminal) or (2). 3D visualization in YADE GUI ? I guess it is (2) since (2) depends on the connection between you and the remote controlled computer, not the hardware of your PC or the remote controlled computer.

Hope that help,

Hi,
What you observe is actually expected:
XEON 4, 3.2GH
I7 3.6GH
The I7 is faster. HPC rarely outperform personal computers in per-core performance.

Arguably, you use "-jSomething", but it makes no sense to discuss -jAnything on the basis of just an obscure script name "20200522.py".

You did not mention how -j2 compares to -j1 on any of the procs, so that would be just blind guessing to try and answer this part.
Compare -j1 on both first.

Bruno

Janek Kozicki (cosurgi) said : #4

If you have a recent yade version you can do yade --stdperformance -jN benchmark, with same N on both systems. If the speed is not stable, because some different programs are running it will take a long time, so you can use --quickperformance instead.

yang yi (yangyizaixian) said : #5

1. Jan Stránský (honzik)
Than you very much for you quickly response. I explain as the follows

(1) The script
The script must import torch. The code is use to simulated particles falling down with the gravity.

#!/usr/bin/env python
#  encoding: utf-8
## this programmig is DQN with HMRF-2-NH.

from __future__ import print_function
# import sys
# sys.path.append('~/PycharmProjects/yangyi/yade/Yade20200419_for_Journal of Coal/')
# from yadeImport import *

import random
import math
import csv
import torch
import torch.nn as nn
import numpy as np
import os
import torch.nn.functional as F
from torch.distributions import Categorical
from yade.gridpfacet import *

pi = math.pi

Testing = False
AddCoalRock = False

loadPoint = '00088'
o = Omega()
o.dt = 1e-12
totalLoss = 0
LastState = []
LastAction = np.zeros((5, 1), dtype=int)
saveLoss = []
outputDir = 'Output'
checkCounter = 0

numWinds = 5
WinAction = np.zeros((5,1), dtype=int)

RewardCoal = 1
RewardRock = -3

nParameters = 2
mu_in = np.zeros((5, nParameters))
sigma_in = np.ones((5, nParameters))
MAP_iter_number = 5
EM_iter = 5

## =========== Environment ============##
widthSpace = 6.8 ## width is the distance from the front coal wall to the back
widthHydr = 1.5
lengthSpace = widthHydr * numWinds ## length is the width of 5 windows
highCoal = 2
highRock = 2
if AddCoalRock:
    highCoalRock1 = 1
    highCoalRock2 = 1
    highCoalRock3 = 1
else:
    highCoalRock1 = 0
    highCoalRock2 = 0
    highCoalRock3 = 0

highHydr = 3.8
highUnderground = 10
highBottom = 0.5
highSpace = highUnderground + highHydr + highCoal + highCoalRock1 + highCoalRock2 + highCoalRock3 + highRock
radiusCoal = 0.15
radiusRock = 0.15
CheckThick = 1
highDummy = 3

colorCoal = (0, 1, 0)
colorRock = (1, 0, 0)

colorState = (0, 0, 1)
colorReceive = (238/255, 233/255, 191/255)

colorShield = (205/255, 150/255, 205/255)
colorWind = (0, 1, 1)
colorGround = (54/255,54/255, 54/255)

angleShield = 50 * 3.1415926 / 180
angleSwingPositive = 15 * 3.1415926 / 180
angleSwingNegtive = 40 * 3.1415926 / 180
lengthShield = 3
lengthTail = 2
windUpperBoundary = 0.9
windLowerBoundary = 0.5019
stateUpperBoundary = highHydr
stateLowerBoundary = windLowerBoundary
##———————————————————————————————————————————————##
positionShield = []
positionWind = []
positionTopBeam = []
positionDummy = []
windPosition = np.zeros(5)
windPositionPositive = 1
windPositionNegative = 2

shield_y_0 = lengthTail * math.cos(angleShield - angleSwingPositive)
shield_y_1 = shield_y_0 + lengthShield * math.cos(angleShield)
shield_z_0 = highHydr - lengthShield * math.sin(angleShield)
shield_z_1 = highHydr #
wind_y_0 = lengthTail * (math.cos(angleShield - angleSwingPositive) - math.cos(angleShield))
wind_y_1 = shield_y_0
wind_z_0 = highHydr - (lengthShield + lengthTail) * math.sin(angleShield)
wind_z_1 = highHydr - lengthShield * math.sin(angleShield)
topBeam_y_0 = lengthTail * math.cos(angleShield - angleSwingPositive) + lengthShield * math.cos(angleShield)
topBean_y_1 = widthSpace

# matRock = O.materials.append(JCFpmMat(young=13.5e9, cohesion=2.06e6,density=2487, frictionAngle=radians(10),
# tensileStrength=1.13e6, poisson=0.123, jointNormalStiffness=1e8, jointShearStiffness=1e7, jointCohesion=1e6))

HyMat = O.materials.append(FrictMat(frictionAngle=0.1, density=3000, young=2e9))
#
matRock = O.materials.append(JCFpmMat(young=13.5e9, cohesion=2.06e6,density=2487, frictionAngle=radians(30),
      tensileStrength=1.13e1, poisson=0.123, jointNormalStiffness=1e1, jointShearStiffness=1e1, jointCohesion=1e1))

# matCoal = O.materials.append(JCFpmMat(young=4.2e9, cohesion=2.11e6, density=1420, frictionAngle=radians(19.5),
# tensileStrength=2.6e6, poisson=0.22, jointNormalStiffness=1e1, jointShearStiffness=1e1, jointCohesion=1e1))

# matRock = O.materials.append(FrictMat(frictionAngle=radians(30), density=2487, young=13.5e9))
matCoal = O.materials.append(FrictMat(frictionAngle=radians(29.5), density=1420, young=4.2e9))

myGraviaty = -1200
nIterControl = 1000
nCheckEnd = 3000
nIterReload =10000

VelocityWindPositive = 40
VelocityWindNegative = 60
percentCoalStopEpisode = 20

# matGround = O.materials.append(CohFrictMat(young=E, poisson=0.3, density=2650, frictionAngle=radians(30), normalCohesion=3e100,
# shearCohesion=3e100, momentRotationLaw=True))

def Boundary():
    boundary = O.bodies.append(geom.facetBox((lengthSpace / 2, widthSpace / 2, highSpace / 2 - highUnderground),
                                             (lengthSpace / 2, widthSpace / 2, highSpace / 2), wallMask=63, material =HyMat ))

def Ground():
    O.bodies.append(utils.wall(position=-highUnderground, sense=0, axis=2, color=colorGround, material=-1))

def Dummy():
    for i in range (1, numWinds):
        temp=[
            Vector3(widthHydr*i, 0 , -highUnderground),
            Vector3(widthHydr*i, widthSpace, -highUnderground),
            Vector3(widthHydr*i, widthSpace, shield_z_0),
            Vector3(widthHydr*i, 0, shield_z_0)]
        positionDummy.append(temp)

    Dummy1 = pack.sweptPolylines2gtsSurface([positionDummy[0]], capStart=True, capEnd=True)
    Dummy2 = pack.sweptPolylines2gtsSurface([positionDummy[1]], capStart=True, capEnd=True)
    Dummy3 = pack.sweptPolylines2gtsSurface([positionDummy[2]], capStart=True, capEnd=True)
    Dummy4 = pack.sweptPolylines2gtsSurface([positionDummy[3]], capStart=True, capEnd=True)

    O.bodies.append(pack.gtsSurface2Facets(Dummy1, color=colorShield,material = HyMat))
    O.bodies.append(pack.gtsSurface2Facets(Dummy2, color=colorShield,material = HyMat))
    O.bodies.append(pack.gtsSurface2Facets(Dummy3, color=colorShield,material = HyMat))
    O.bodies.append(pack.gtsSurface2Facets(Dummy4, color=colorShield,material = HyMat))
##----------------------------------------------------------------------------------------##
def HydraulicSupport():
    for i in range(0, numWinds):
        temp = [Vector3(widthHydr * i, shield_y_0, shield_z_0),
                Vector3(widthHydr * (i + 1), shield_y_0, shield_z_0),
                Vector3(widthHydr * (i + 1), shield_y_1, shield_z_1),
                Vector3(widthHydr * i, shield_y_1, shield_z_1)
                ]
        positionShield.append(temp)
        temp = [
            Vector3(widthHydr * i, wind_y_0, wind_z_0),
            Vector3(widthHydr * (i + 1), wind_y_0, wind_z_0),
            Vector3(widthHydr * (i + 1), wind_y_1, wind_z_1),
            Vector3(widthHydr * i, wind_y_1, wind_z_1)
        ]
        positionWind.append(temp)
        temp = [
            Vector3(widthHydr * i, topBeam_y_0, highHydr),
            Vector3(widthHydr * (i + 1), topBeam_y_0, highHydr),
            Vector3(widthHydr * (i + 1), topBean_y_1, highHydr),
            Vector3(widthHydr * i, topBean_y_1, highHydr)
        ]
        positionTopBeam.append(temp)
    # kwBoxes={'color':[1,0,0],'wire':False,'dynamic':False,'material':0}
    # vibrationRotationPlate = O.bodies.append(geom.facetBox((-15,5,-5),(2,2,2),wallMask=16,**kwBoxes))
    Shield1 = pack.sweptPolylines2gtsSurface([positionShield[0]], capStart=True, capEnd=True, )
    Shield2 = pack.sweptPolylines2gtsSurface([positionShield[1]], capStart=True, capEnd=True)
    Shield3 = pack.sweptPolylines2gtsSurface([positionShield[2]], capStart=True, capEnd=True)
    Shield4 = pack.sweptPolylines2gtsSurface([positionShield[3]], capStart=True, capEnd=True)
    Shield5 = pack.sweptPolylines2gtsSurface([positionShield[4]], capStart=True, capEnd=True)

    IDShield1 = O.bodies.append(pack.gtsSurface2Facets(Shield1, color=colorShield,material = HyMat))
    IDShield2 = O.bodies.append(pack.gtsSurface2Facets(Shield2, color=colorShield))
    IDShield3 = O.bodies.append(pack.gtsSurface2Facets(Shield3, color=colorShield))
    IDShield4 = O.bodies.append(pack.gtsSurface2Facets(Shield4, color=colorShield))
    IDShield5 = O.bodies.append(pack.gtsSurface2Facets(Shield5, color=colorShield))

    Wind1 = pack.sweptPolylines2gtsSurface([positionWind[0]], capStart=True, capEnd=True)
    Wind2 = pack.sweptPolylines2gtsSurface([positionWind[1]], capStart=True, capEnd=True)
    Wind3 = pack.sweptPolylines2gtsSurface([positionWind[2]], capStart=True, capEnd=True)
    Wind4 = pack.sweptPolylines2gtsSurface([positionWind[3]], capStart=True, capEnd=True)
    Wind5 = pack.sweptPolylines2gtsSurface([positionWind[4]], capStart=True, capEnd=True)

    IDWind1 = O.bodies.append(pack.gtsSurface2Facets(Wind1, color=colorWind))
    IDWind2 = O.bodies.append(pack.gtsSurface2Facets(Wind2, color=colorWind))
    IDWind3 = O.bodies.append(pack.gtsSurface2Facets(Wind3, color=colorWind))
    IDWind4 = O.bodies.append(pack.gtsSurface2Facets(Wind4, color=colorWind))
    IDWind5 = O.bodies.append(pack.gtsSurface2Facets(Wind5, color=colorWind))

    TopBeam1 = pack.sweptPolylines2gtsSurface([positionTopBeam[0]], capStart=True, capEnd=True)
    TopBeam2 = pack.sweptPolylines2gtsSurface([positionTopBeam[1]], capStart=True, capEnd=True)
    TopBeam3 = pack.sweptPolylines2gtsSurface([positionTopBeam[2]], capStart=True, capEnd=True)
    TopBeam4 = pack.sweptPolylines2gtsSurface([positionTopBeam[3]], capStart=True, capEnd=True)
    TopBeam5 = pack.sweptPolylines2gtsSurface([positionTopBeam[4]], capStart=True, capEnd=True)

    IDTopBeam1 = O.bodies.append(pack.gtsSurface2Facets(TopBeam1, color=colorShield))
    IDTopBeam2 = O.bodies.append(pack.gtsSurface2Facets(TopBeam2, color=colorShield))
    IDTopBeam3 = O.bodies.append(pack.gtsSurface2Facets(TopBeam3, color=colorShield))
    IDTopBeam4 = O.bodies.append(pack.gtsSurface2Facets(TopBeam4, color=colorShield))
    IDTopBeam5 = O.bodies.append(pack.gtsSurface2Facets(TopBeam5, color=colorShield))
    IDWind = [IDWind1, IDWind2, IDWind3, IDWind4, IDWind5]
    return IDWind

##----------------------------------------------------------------------------------------##
def CoalLayer():
    ## establish coal layer
    IDCoal = O.bodies.append(
        pack.regularHexa(inAlignedBox((0, 0, highHydr), (lengthSpace, widthSpace, highCoal + highHydr)),
                         radius=radiusCoal, gap=0, color=colorCoal, material=matCoal))
    return IDCoal

##----------------------------------------------------------------------------------------##
def CoalRockLayer():
    ## estatblish the coal_rock layer_1
    spheresCoalRock1 = O.bodies.append(
        pack.regularHexa(inAlignedBox((0, 0, highHydr + highCoal - radiusCoal * 2),
                                      (lengthSpace, widthSpace, highCoal + highHydr + highCoalRock1)),
                         radius=radiusCoal, gap=0, color=colorCoal, material=matCoal))
    CoalList1 = spheresCoalRock1
    RockList1 = random.sample(range(spheresCoalRock1[0], spheresCoalRock1[-1]), int(len(spheresCoalRock1) / 8))
    for mm in RockList1:
        CoalList1.remove(mm)
        O.bodies[mm].shape.color = colorRock

    ## estatblish the coal_rock layer_2
    spheresCoalRock2 = O.bodies.append(
        pack.regularHexa(inAlignedBox((0, 0, highHydr + highCoal + highCoalRock1 - radiusCoal * 2),
                                      (lengthSpace, widthSpace, highCoal + highHydr + highCoalRock1 + highCoalRock2)),
                         radius=radiusCoal, gap=0, color=colorCoal, material=matCoal))
    CoalList2 = spheresCoalRock2
    RockList2 = random.sample(range(spheresCoalRock2[0], spheresCoalRock2[-1]), int(len(spheresCoalRock2) / 4))
    for mm in RockList2:
        CoalList2.remove(mm)
        O.bodies[mm].shape.color = colorRock

    ## estatblish the coal_rock layer_3
    spheresCoalRock3 = O.bodies.append(
        pack.regularHexa(inAlignedBox((0, 0, highHydr + highCoal + highCoalRock1 + highCoalRock2 - radiusCoal * 2),
                                      (lengthSpace, widthSpace,
                                       highCoal + highHydr + highCoalRock1 + highCoalRock2 + highCoalRock3)),
                         radius=radiusCoal, gap=0, color=colorCoal, material=matCoal))
    CoalList3 = spheresCoalRock3
    RockList3 = random.sample(range(spheresCoalRock3[0], spheresCoalRock3[-1]), int(len(spheresCoalRock3) / 2))
    for mm in RockList3:
        CoalList3.remove(mm)
        O.bodies[mm].shape.color = colorRock
    List_CR_Coal = CoalList1 + CoalList2 + CoalList3
    List_CR_Rock = RockList1 + RockList2 + RockList3
    return List_CR_Coal, List_CR_Rock

##----------------------------------------------------------------------------------------##
def RockLayer():
    ## estatblish the rock layer
    IDRock = O.bodies.append(
        pack.regularHexa(
            inAlignedBox((0, 0, highHydr + highCoal + highCoalRock1 + highCoalRock2 + highCoalRock3 - radiusCoal * 2),
                         (lengthSpace, widthSpace,
                          highCoal + highHydr + highCoalRock1 + highCoalRock2 + highCoalRock3 + highRock)),
            radius=radiusRock, gap=0, color=colorRock, material=matRock))
    return IDRock

def funHMRF(px, py_x, py_y, nAgent, nAction,mu,sigma,MAP_iter_number):
    sum_U_MAP = []
    U = 0
    d = 0.125
    Out = np.zeros((nAgent))
    for it in range(0, MAP_iter_number):
        U1 = np.zeros((nAgent, nAction))

        for i in range(0, nAction):
            yi_x = py_x - mu[i, 0]
            tempx = np.multiply(yi_x, yi_x) / (2*np.square(sigma[i, 0]))
            tempx = tempx + np.log(sigma[i, 0])
            yi_1 = py_y - mu[i, 1]
            tempy = np.multiply(yi_1, yi_1) / (2*np.square(sigma[i, 1]))
            tempy = tempy + np.log(sigma[i, 1])
            temp = tempx + tempy
            U1[:, i] = U1[:, i] + temp

        U2 = -1*np.log(px)
        U = U1 + U2
        Out = np.argmin(U, axis=1)
        tempSumU = np.min(U, axis=1)
        sum_U_MAP.append(np.sum(tempSumU))
        chechArray=np.array(sum_U_MAP)
        if it >= 2 and np.std(chechArray[it - 2:it])/sum_U_MAP[it] < 0.0001:
            break

    sum_U = 0
    for i in range(0, nAgent):
        sum_U = sum_U + U[i, Out[i]]
    return Out, sum_U
#
def funRL_HMRF_EM(state, actionProbability, nAction, mu, sigma, MAP_iter_number, EM_iter):

    # state is the neighbor coal ratio
    nAgent = int(len(state))
    x_x = state[:, 0].reshape(5)
    x_y = state[:, 1].reshape(5)
    x_left = np.array(x_x)
    x_right= np.array(x_y)

    P_lyi = np.zeros((nAction, nAgent))
    sum_U = []
    outAction = actionProbability

    for it in range(0, EM_iter):
        outAction, temp_sum_U = funHMRF(actionProbability,x_left, x_right, nAgent, nAction, mu, sigma, MAP_iter_number)
        for i in range(0, nAction):
            temp1 = np.exp(-np.square(x_left - mu[i, 0]) -np.square(x_right - mu[i,1]))/(2*pi*sigma[i, 0]*sigma[i, 1])
            temp2 = actionProbability[:, i]
            P_lyi[i, :] = np.multiply(temp1, temp2)

        temp3 = np.sum(P_lyi, axis=0)
        P_lyi = np.divide(P_lyi, temp3)

        for i in range(0, nAction):
            if np.sum(P_lyi[i,:]) == 0:
                P_lyi[i, :] = 0.00001

            mu[i, 0] = np.dot(P_lyi[i,:], x_left)
            mu[i, 0] = mu[i, 0] / np.sum(P_lyi[i,:])
            mu[i, 1] = np.dot(P_lyi[i, :], x_right)
            mu[i, 1] = mu[i, 1] / np.sum(P_lyi[i, :])

        for i in range(0,nAction):
            if np.sum(P_lyi[i,:]) == 0:
                P_lyi[i, :] = 0.00001
            sigma[i, 0] = np.dot(P_lyi[i,:], np.square(x_left-mu[i,0]))
            sigma[i, 0] = sigma[i,0] / np.sum(P_lyi[i,:])
            sigma[i, 0] = np.sqrt(sigma[i, 0])

            sigma[i, 1] = np.dot(P_lyi[i, :], np.square(x_right - mu[i, 1]))
            sigma[i, 1] = sigma[i, 1] / np.sum(P_lyi[i, :])
            sigma[i, 1] = np.sqrt(sigma[i, 1])

        sum_U.append(np.sum(temp_sum_U))
        chechArray = np.array(sum_U)

        if it >= 2 and np.std(chechArray[it-2: it]) / sum_U[it] < 0.0001 :
            break
    return outAction, mu, sigma, sum_U

##---------------------------------------##
def GetRewardState():
    global nCoal, windPosition
    global nRock, percentCoalStopEpisode
    checkArea_y_0 = 0
    checkArea_y_1 = shield_y_0
    checkArea_z_1 = shield_z_0
    checkArea_z_0 = shield_z_0 - lengthTail + radiusCoal
    State = np.zeros((5, 4), dtype=int)
    Reward = np.zeros((5, 3), dtype=int)
    ##----------get reward-------------##

    temp_coal_remove=[]
    for i in nCoal:
        temp = O.bodies[i].state.pos
        if temp[2] <= checkArea_z_0: ## the particle high lower than the boundary, thought the window
            Reward[int(temp[0] / widthHydr), 0] += 1
            temp_coal_remove.append(i)

    for i in temp_coal_remove:
        nCoal.remove(i)

    temp_rock_remove=[]
    for i in nRock:
        temp = O.bodies[i].state.pos
        if temp[2] <= checkArea_z_0: ## the particle high lower than the boundary, thought the window
            Reward[int(temp[0] / widthHydr), 1] += 1
            temp_rock_remove.append(i)

    for i in temp_rock_remove:
        nRock.remove(i)

    for i in range(0, numWinds):
        Reward[i, 2] = Reward[i, 0] * RewardCoal + Reward[i, 1] * RewardRock

    ##--------get state--------------##
    ## area for checking the state with wind closed
    # state[:,0]---nCoal
    # state[:,1]---nRock
    # state[:,2]---nCoal + nRock
    # state[:,3]---coal_rate
    for b in nCoal:
        temp = O.bodies[b].state.pos
        if (checkArea_y_0 < temp[1] < checkArea_y_1)&(checkArea_z_0 <= temp[2] <= checkArea_z_1):
            for i in range(numWinds):
                if i * widthHydr < temp[0] < (i + 1) * widthHydr:
                    # O.bodies[b].shape.color = colorState
                    State[i, 0] = State[i, 0] + 1

    for b in nRock:
        temp = O.bodies[b].state.pos
        if (checkArea_y_0 < temp[1] < checkArea_y_1) & (checkArea_z_0 <= temp[2] <= checkArea_z_1):
            for i in range(numWinds):
                if i * widthHydr < temp[0] < (i + 1) * widthHydr:
                    State[i, 1] = State[i, 1] + 1 ## State[i,0] numCoal
                    # O.bodies[b].shape.color = colorState

    for i in range(numWinds):
        State[i, 2] = State[i, 0] + State[i, 1] # total number
        if State[i, 2] == 0:
            State[i, 3] = 0 # coal rate
        else:
            State[i, 3] = State[i, 0] * 100 / State[i, 2] # coal rate

    Done = True
    for i in range(numWinds):
        if (State[i, 3] <= percentCoalStopEpisode) & (State[i, 2] != 0):
            Done = (Done & True)
        else:
            Done = False
    return Reward, State, Done

## define the agent
class Net(nn.Module):
    """docstring for Net"""
    def __init__(self, numState, numAction):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(numState, 56)
        nn.init.constant_(self.fc1.weight, 0.5)
        self.bn1 = nn.BatchNorm1d(56)
        self.fc2 = nn.Linear(56, 128)
        nn.init.constant_(self.fc2.weight, 0.5)
        self.bn2 = nn.BatchNorm1d(128)
        self.out = nn.Linear(128, numAction)
        nn.init.constant_(self.out.weight, 0.1)
        self.active = torch.nn.LeakyReLU()

    def forward(self, x):
        x = self.fc1(x)
        x = self.active(x)
        x = self.fc2(x)
        x = self.active(x)
        x = self.out(x)
        return x

class AgentDQN():
    """docstring for DQN"""
    def __init__(self):
        super(AgentDQN, self).__init__()
        self.numState = 6
        self.numActions = 2
        self.memory_capacity = 100000
        self.batch_size = 2
        self.t_net_update = 300

        self.memory_full = False
        self.start_learnig = 100
        self.lr = 0.001
        self.gamma = 0.999
        self.epsilon = 0.999 ## for random
        self.epsilon_decay = (1 - 0.000001)
        self.epsilon_min = 0.001
        self.eval_net = Net(self.numState, self.numActions)
        self.target_net = Net(self.numState, self.numActions)
        self.learn_step_counter = 0
        self.memory = np.zeros((self.memory_capacity, self.numState * 2 + 2))
        self.memory_counter = 0
        self.pathExp = outputDir + '/experience/'
        self.pathHMRFpara = outputDir + '/HMRFpara/'

        self.optimizer = torch.optim.Adam(self.eval_net.parameters(), lr=self.lr)
        self.loss_func = nn.MSELoss()

        self.eval_params = list(self.eval_net.named_parameters())
        self.target_params = list(self.target_net.named_parameters())

        self.HMRF_mu = []
        self.HMRF_sigma = []

    def choose_action(self, state, Testing):
        if Testing:
            self.epsilon = self.epsilon_min
        state = torch.tensor(state, dtype=torch.float)
        state = state.reshape((1, 6))

        if torch.rand(1) <= self.epsilon: # greedy policy for random
            action = np.random.randint(0, self.numActions)
            action = action
            actionTpye = 1
            action_probs = [0.5, 0.5]
        else: # policy
            action_value = self.eval_net(state)
            action = torch.max(action_value, 1)[1].data.numpy()
            action = action[0]
            actionTpye = 0
            action_probs = F.softmax(action_value).data
            action_probs = action_probs.reshape((1, 2))
        return action, action_probs, actionTpye

    def learn(self):
        if self.memory_counter >= self.start_learnig:
            if self.learn_step_counter % self.t_net_update == 0:
                self.target_net.load_state_dict(self.eval_net.state_dict())
                print('TargetNet Change')
            self.learn_step_counter += 1
            # sample batch from memory
            if self.memory_full:
                sample_index = np.random.choice(self.memory_capacity, self.batch_size)
            else:
                sample_index = np.random.choice(self.memory_counter, self.batch_size)

            batch_memory = self.memory[sample_index, :]
            batch_state = torch.FloatTensor(batch_memory[:, :self.numState]) / 100
            batch_action = torch.LongTensor(batch_memory[:, self.numState:self.numState + 1].astype(int))
            batch_reward = torch.FloatTensor(batch_memory[:, self.numState + 1:self.numState + 2]) / 10
            batch_next_state = torch.FloatTensor(batch_memory[:, -self.numState:])
            batch_next_state[:, 0:self.numState] = batch_next_state[:, 0:self.numState] / 100

            self.optimizer.zero_grad()
            q_eval = self.eval_net(batch_state).gather(1, batch_action)
            q_next = self.target_net(batch_next_state).detach()
            q_target = batch_reward + self.gamma * q_next.max(1)[0].view(self.batch_size, 1)
            loss = self.loss_func(q_eval, q_target)
            loss.backward()
            self.optimizer.step()
            if self.epsilon > self.epsilon_min:
                self.epsilon *= self.epsilon_decay
            startLearning = 1
        else:
            loss = 0
            startLearning = 0
        return loss, startLearning, self.epsilon

    def store_transition(self, state, action, reward, next_state):
        if (state[0] != 0) & (state[1] != 0) & (state[2] != 0) & (state[3] != 0) & (state[4] != 0) & (state[5] != 0):
            transition = np.hstack((state, [action, reward], next_state))
            index = self.memory_counter % self.memory_capacity
            self.memory[index:] = transition
            self.memory_counter += 1

            if self.memory_counter >= self.memory_capacity:
                self.memory_full = True

    def saveExperience(self, nAgent):
        np.save(self.pathExp + "winAgents{}/".format(nAgent) + 'experience.npy', self.memory)
        np.save(self.pathExp + "winAgents{}/".format(nAgent) + 'counter.npy', self.memory_counter)

    def saveHMRFparameters(self, nAgent):
        np.save(self.pathHMRFpara + "winAgents{}/".format(nAgent) + 'HMRF_mu.npy', self.HMRF_mu)
        np.save(self.pathHMRFpara + "winAgents{}/".format(nAgent) + 'HMRF_sgma.npy', self.HMRF_sigma)

    def loadExperience(self, nAgent):
        if os.path.exists(self.pathExp + "winAgents{}/".format(nAgent) + "experience.npy"):
            self.memory = np.load(self.pathExp + "winAgents{}/".format(nAgent) + 'experience.npy')
            self.memory_counter = np.load(self.pathExp + "winAgents{}/".format(nAgent) +'counter.npy')
        print(len(self.memory), self.memory_counter)

    def saveWeight(self, path):
        dict = {'eval_net_dict': self.eval_net.state_dict(),
                'target_net_dict': self.target_net.state_dict(),
                }
        torch.save(dict, path)

    def loadWeight(self, path):
        checkPoint = torch.load(path)
        self.eval_net.load_state_dict(checkPoint['eval_net_dict'])
        self.target_net.load_state_dict(checkPoint['target_net_dict'])

###------------------end agent define------------------
def CheckEpisodeEnd():
    global checkCounter, saveLoss, totalLoss
    for i in range(0, numWinds):
        path = outputDir + "/weights/winAgents{}".format(i) + '/C{:05d}'.format(checkCounter)
        windAgents[i].saveWeight(path=path)
        windAgents[i].saveExperience(i)
        windAgents[i].saveHMRFparameters(i)

    if not Testing:
        with open(outputDir + '/processResult.csv', 'a') as csvFile:
            writer = csv.writer(csvFile)
            writer.writerow(saveLoss)
        csvFile.close()
    checkCounter += 1

def ResteLocation():
    global nCoal, nRock, nEpisode, saveCounter
    for i in range(0, len(savePositionCoal)):
        n = savePositionCoal[i][0]
        o.bodies[n].state.pos = savePositionCoal[i][1]
    for i in range(0, len(savePositonRock)):
        n = savePositonRock[i][0]
        o.bodies[n].state.pos = savePositonRock[i][1]
    nEpisode += 1
    nCoal = IDCoal + List_CR_Coal
    nRock = IDRock + List_CR_Rock

    for i in nCoal:
        O.bodies[i].shape.color = colorCoal
    for i in nRock:
        O.bodies[i].shape.color = colorRock

##---------------------------------------##
def WindowsAction(IDWind):
    global WinAction, windPosition
    RotationW = [RotationW1, RotationW2, RotationW3, RotationW4, RotationW5]
    for nW in range(0, numWinds):
        ## action
        Pos_z = sum(O.bodies[i].state.pos[2] for i in IDWind[nW]) / len(IDWind[nW])
        if WinAction[nW] == 0:
            RotationW[nW].angularVelocity = -VelocityWindNegative
        else:
            RotationW[nW].angularVelocity = VelocityWindPositive

        NegtiveStop = (RotationW[nW].angularVelocity > 0) & (Pos_z <= windLowerBoundary)
        PostiveStop = (RotationW[nW].angularVelocity < 0) & (Pos_z >= windUpperBoundary)

        if NegtiveStop: windPosition[nW] = windPositionNegative
        elif PostiveStop: windPosition[nW] = windPositionPositive
        else: windPosition[nW] = 0

        if NegtiveStop | PostiveStop:
            RotationW[nW].angularVelocity = 0

def JustForLearning():
    loss = np.zeros(6, dtype=float)
    for i in range(0, numWinds):
        loss[i], epslong, startLearning = windAgents[i].learn()
        loss[numWinds] = loss[numWinds] + loss[i]

    if (o.iter % 20 == 0):
        global checkCounter, saveLoss, totalLoss
        for i in range(0, numWinds):
            path = outputDir + "/weights/winAgents{}".format(i) + '/C{:05d}'.format(checkCounter)
            windAgents[i].saveWeight(path=path)
            if not Testing:
                with open(outputDir + '/processResult.csv', 'a') as csvFile:
                    writer = csv.writer(csvFile)
                    writer.writerow(saveLoss)
                csvFile.close()
        checkCounter += 1
        print("iter:", o.iter, "loss:%.6f, %.6f, %.6f, %.6f, %.6f total=%.6f" % (loss[0], loss[1], loss[2], loss[3], loss[4], loss[5]))
        print('---------------------------------------------------------------\n')

##---------------------------------------##
def WindowsInitialLocation(IDWind):
    global WinAction
    WinAction = np.zeros((5, 1), dtype=int)
    WindowsAction(IDWind)
    if os.path.exists(outputDir + "/weights/winAgents0/" + loadPoint):
        for i in range(0, numWinds):
            path = outputDir + "/weights/winAgents{}/".format(i) + 'C' + loadPoint
            windAgents[i].loadWeight(path)

    for i in range(0, numWinds):
        windAgents[i].loadExperience(i)

##---------------------------------------##
def WindowsControl():
    global saveCounter, LastState, LastAction, saveLoss, TotalReward, listAction
    global WinAction, nRock, nCoal
    global mu_in, sigma_in, MAP_iter_number, EM_iter

    Reward, state, done = GetRewardState()
    TotalReward = TotalReward + Reward
    LastReward = Reward[:, 2]
    CurrentAction = np.zeros((5, 1))
    Action_probs = np.ones((5, 2))*0.5

    # CurrentState = np.concatenate((CurrentState, LastAction), axis=1)
    ## area for checking the state with wind closed
    # state[:,0]---nCoal
    # state[:,1]---nRock
    # state[:,2]---nCoal + nRock
    # state[:,3]---coal_rate

    # CurrentState include the particle number and the coal ratio of the current agent and neigborhood.
    CurrentState = np.zeros((5, 6))

    for i in range(0, numWinds):
        if i == 0:
            CurrentState[i, 0:2] = state[i, 2:4]
            CurrentState[i, 2:4] = state[i, 2:4]
            CurrentState[i, 4:6] = state[i + 1, 2:4]
        if i == numWinds - 1:
            CurrentState[i, 0:2] = state[i - 1, 2:4]
            CurrentState[i, 2:4] = state[i, 2:4]
            CurrentState[i, 4:6] = state[i, 2:4]
        if 0 < i < numWinds - 1:
            CurrentState[i, 0:2] = state[i - 1, 2:4]
            CurrentState[i, 2:4] = state[i, 2:4]
            CurrentState[i, 4:6] = state[i + 1, 2:4]

    actType = np.zeros(5, dtype=int)
    for i in range(0, numWinds):
        CurrentAction[i], Action_probs[i], actType[i] = windAgents[i].choose_action(CurrentState[i, :], Testing)

    if Testing:
        s1 = torch.from_numpy(CurrentState[:, 1].reshape(5, 1)) # coal ratio of left neighbor
        s2 = torch.from_numpy(CurrentState[:, 5].reshape(5, 1)) # coal ratio of right neighbor
        state_HMRF = torch.cat((s1, s2), 1)
        action_from_HMRF, mu_in, sigma_in, sum_U = funRL_HMRF_EM(state_HMRF, Action_probs, 2, mu_in,
                                                                    sigma_in, MAP_iter_number, EM_iter)
        CurrentAction = action_from_HMRF

    for i in range(0, numWinds):
        windAgents[i].HMRF_mu.append(mu_in[i,:])
        windAgents[i].HMRF_sigma.append(sigma_in[i,:],)

    if done :
        print('=====This episode is over ======')
        CurrentAction = np.zeros((5, 1))
        ResteLocation()

    loss = np.zeros(6, dtype=float)
    epslong = 0
    if (saveCounter >0) & (not Testing):
        startLearning = 0
        for i in range(0, numWinds):
            windAgents[i].store_transition(LastState[i, :], LastAction[i], LastReward[i], CurrentState[i,:])
            loss[i], epslong, startLearning = windAgents[i].learn()
            loss[numWinds] = loss[numWinds] + loss[i]
        if startLearning == 1:
            saveLoss.append(loss)

    totalCoal = 0
    totalRock = 0
    CoalRate = 0
    RockRate = 0
    totalRe = 0

    for i in range(0, numWinds):
        totalCoal += TotalReward[i, 0]
        totalRock += TotalReward[i, 1]
        if (totalRock + totalCoal > 0):
            CoalRate = totalCoal / (totalRock + totalCoal)
            RockRate = totalRock / (totalRock + totalCoal)
        totalRe += TotalReward[i, 2]

    AT = ['Max', 'Max', 'Max', 'Max', 'Max']
    for i in range(0, numWinds):
        AT[i] = 'Policy' if actType[i] == 0 else "Ran"

    print('nEpisode', nEpisode, 'iter:', o.iter, 'nSave:', saveCounter, 'Epslong:%.4f' % (epslong))
    print("numCurrent:", state[:, 2], 'CoalRate:', state[:, 3], " LastReward:", LastReward, 'LastAct:', LastAction.T)
    print('CurrenAct:', WinAction.T, 'Type:', AT,
          "loss:%.2f, %.2f, %.2f, %.2f, %.2f total=%.2f" % (loss[0], loss[1], loss[2], loss[3], loss[4], loss[5]))
    print('---------------------------------------------------------------\n')
    WinAction = CurrentAction
    LastState = CurrentState
    LastAction = CurrentAction
    listAction.append(CurrentAction)
    np.save(outputDir + '/SaveAction_AC.np', np.array(listAction))
    saveCounter += 1

##——————————————————————Creat Agents and ——————————————————————————————————————##
windAgents = [AgentDQN() for i in range(numWinds)]

for i in range(0, numWinds):
    if not os.path.exists(outputDir + "/weights/winAgents{}".format(i)):
        os.makedirs(outputDir + "/weights/winAgents{}".format(i))

    if not os.path.exists(outputDir + "/experience/winAgents{}".format(i)):
        os.makedirs(outputDir + "/experience/winAgents{}".format(i))

    if not os.path.exists(outputDir + "/HMRFpara/winAgents{}".format(i)):
        os.makedirs(outputDir + "/HMRFpara/winAgents{}".format(i))

##-----------------------------------------------------------------##
saveCounter = 0
nEpisode = 0
Ground()
Boundary()
# Dummy()
IDWind = HydraulicSupport()
IDCoal = CoalLayer()
TotalReward = np.zeros((5,3))
List_CR_Coal = []
List_CR_Rock = []
if AddCoalRock:
    List_CR_Coal, List_CR_Rock = CoalRockLayer()

IDRock = RockLayer()
nCoal = IDCoal + List_CR_Coal
nRock = IDRock + List_CR_Rock

savePositionCoal = []
savePositonRock = []
listAction = []

for i in nCoal:
    temp = [i, o.bodies[i].state.pos]
    savePositionCoal.append(temp)

for i in nRock:
    temp = [i, o.bodies[i].state.pos]
    savePositonRock.append(temp)
##———————————————————————engines—————————————————————————————##
O.engines = [
    ForceResetter(),
    InsertionSortCollider([Bo1_Sphere_Aabb(),Bo1_Facet_Aabb(),Bo1_Wall_Aabb()]),
    InteractionLoop(
        [Ig2_Sphere_Sphere_ScGeom(), Ig2_Facet_Sphere_ScGeom(), Ig2_Wall_Sphere_ScGeom()],
        [Ip2_FrictMat_FrictMat_FrictPhys(),
         Ip2_JCFpmMat_JCFpmMat_JCFpmPhys(xSectionWeibullScaleParameter=0.5,
        xSectionWeibullShapeParameter=0.5,
        weibullCutOffMin=0,
        weibullCutOffMax=10)],
        [Law2_ScGeom_JCFpmPhys_JointedCohesiveFrictionalPM(recordCracks=False, recordMoments=False,label='interactionLaw'),
        Law2_ScGeom_FrictPhys_CundallStrack()]
        ),
    GlobalStiffnessTimeStepper(),
    # VTKRecorder(recorders=['jcfpm','cracks','facets','moments']),
    NewtonIntegrator(gravity=(0, 0, myGraviaty), damping=0.5, label='down'),
    RotationEngine(rotationAxis=(1, 0, 0), rotateAroundZero=True, ids=IDWind[0], zeroPoint=positionWind[0][2],
                   label='RotationW1'),
    RotationEngine(rotationAxis=(1, 0, 0), rotateAroundZero=True, ids=IDWind[1], zeroPoint=positionWind[1][2],
                   label='RotationW2'),
    RotationEngine(rotationAxis=(1, 0, 0), rotateAroundZero=True, ids=IDWind[2], zeroPoint=positionWind[2][2],
                   label='RotationW3'),
    RotationEngine(rotationAxis=(1, 0, 0), rotateAroundZero=True, ids=IDWind[3], zeroPoint=positionWind[3][2],
                   label='RotationW4'),
    RotationEngine(rotationAxis=(1, 0, 0), rotateAroundZero=True, ids=IDWind[4], zeroPoint=positionWind[4][2],
                   label='RotationW5'),
    PyRunner(command="WindowsInitialLocation(IDWind)", firstIterRun=1),

    # PyRunner(command="JustForLearning()", iterPeriod=2),
    PyRunner(command="WindowsAction(IDWind)", iterPeriod=1),
    ##
    PyRunner(command="WindowsControl()", iterPeriod=nIterControl),
    PyRunner(command="CheckEpisodeEnd()", iterPeriod=nCheckEnd),
    PyRunner(command="ResteLocation()", iterPeriod=nIterReload),

]
# o.run(1200000000000, True)

(2) I have not checked that without GUI. Becuase if there is no GUI the process speed of the simulation maybe diffrent. I will check that tommorrow

(3) "very obviously" means the speed of server is show more than 10 times.

(4) If the PC and server use the same job, such as -j8, even -j2. the speed is the same. That likes the just a core of server working.

2 Son Pham Thai (pham-thai-son-987)

Thank you very much. I load the GUI in the command window. The remote server and controll PC in a local area network. I will check the speed of the GUI on the server directly. But I worry the speed maybe is the same.

3. Bruno Chareyre (bruno-chareyre)
Thank you very much. You are right, each core of PC is faster then the server. But I need a faster speed for the simulation by parallel computation. So I use 96 core. If the performance is just depend on a core, that is a very terrible message to me.

4. Janek Kozicki (cosurgi) :
Thank you very much. But I am sorry that I don't know your mean, can you tell me where could I the example or the material?

Janek Kozicki (cosurgi) said : #6

run this command:

yade --stdperformance -j8

or this command:

yade --quickperformance -j8

Jan Stránský (honzik) said : #7

> But I need a faster speed for the simulation by parallel computation. So I use 96 core.

Would another usage of the power be meaningful, e.g. running in parallel 12 simulation, each using -j8?

> So I use 96 core.

Are you sure there are no other processes running on the computer? How e.g. -j 48 behaves?

> If the performance is just depend on a core, that is a very terrible message to me.

I did not understand this, but the speed-up depends on the combination of the number of cores used AND the script. There are many factors..

cheers
Jan

Janek Kozicki (cosurgi) said : #8

Sorry, that was written in a hurry. I mean that you can use these two commands to measure the computer performance using a standardized test.

> So I use 96 core.

Maybe it is a bad idea. Or maybe it is ok but it doesn't give much more compared to, say, 48.
OpenMP may have an optimum somewhere and maybe it is 24 cores, or 48 (could well be depending on hardware), or whatever it is. You'll have to find out.
I didn't hear about previous tests on 96, that would be interesting to know your results.
Cheers
Bruno

Ah: and it is absolutely critical that you keep GUI turned *off*.
Else the comparisons are meaningless, you could be testing the GPU. You don't use GUI for long simulations anyway.
B

> If the performance is just depend on a core, that is a very terrible message to me

Not really. The message was: 1/ of course you don't automatically gain speedup by using more cores, 2/ comparing just a pair of (very different) N gives no real insight on where the optimum is (and there is an optimum, always). So the only solid data at the moment is per-core speed, if it is shocking to hear it could be a way to put the problem a bit differently. :)
B.

yang yi (yangyizaixian) said : #12

 Janek Kozicki (cosurgi) , Jan Stránský (honzik) , Bruno Chareyre (bruno-chareyre)

 Thank you very much for answer my question. According to your suggestion, I checked the server like this

(1) Run yade by the command:
      ~/myYade/install/bin/yade -j90 20200522.py
(2) Start the script by the 'start' button on the Controller(), without "3D" show

(3) I did 10 iteration test and the time result as follows

     PC: -j8 39.45s \ 47.79s
     Server: -j32 26.1s
                   -j48 24.51s
                   -j50 25.32s
                   -j52 24.55s
                   -j60 26.53s
                   -j88 23.29s \ 23.12s
    -j90 23.27s \ 23.72s
                   -j92 23.05s \ 23.76s \ 24.24s \ 24.09s
                   -j94 23.48s
                   -j96 23.91s
     Because I get the time by hand, so there is errors. The results seems that the server is faster than PC without 3D show and the -j92 or -j90 maybe the optimal jobs.

(4) I used the command : yade --stdperformance -j8 to test the performance
     But I just to guess the result, I wirte as below
     -j8 1465.98 iter/s
     -j48 1386.40 iter/s
     -j88 1427 iter/s
     -j90 1279 iter/s 1467 iter/s
     -j92 1385 iter/s
     -j96 1313 iter/s

     I find the number is unstable. It seems that -j90 is the best.

(5) For the above test, I guess that:
      The before test, I take 3D show, the server is slower than PC very slowly, that is becuase the GUI is depend on just one core of the CPU. For one core of I7-9700k 3.6GH is better than 6146 , 3.2GH. IF my guess is right or not?

(6) I have two questions:
     1) It looks that the falling speed of particles and the roation speed of the plank in the 3D show are different, if the I modify the number of the particles. So in the Server, I hope to check if the rotation speed matches the falling speed of particles or not. The direct way is the 3D show. However, if 3D show is used, the speed is very very slow. my question is :
    if I close the 3D show, the speed of the plank and particles are the same with 3D show open or not?

    2) If the speed of plank and particles are not depend on the 3D show. I will storage the bodies' location of the simulation process, and replay the locations after the simulation is over. my question : is there a better way to do that?

Thank you very much

Jan Stránský (honzik) said : #13

> -j32 26.1s
> -j48 24.51s
> -j90 23.27s \ 23.72s
> -j92 23.05s \ 23.76s \ 24.24s \ 24.09s
> the -j92 or -j90 maybe the optimal jobs

pretty much depending on the definition of "optimal". Compared to -j32, -j90 is just 10% faster, but using almost 3x more CPU power.

> -j48 1386.40 iter/s
> -j90 1279 iter/s
> It seems that -j90 is the best.

same as above

> It looks that ... in the 3D show are different,
> I hope to check if the rotation speed matches the falling speed of particles or not. The direct way is the 3D show.

3D view is just for rough checks. Use "hard numbers" for serious comparisons (like O.bodies[...].state.vel)

> if I close the 3D show, the speed of the plank and particles are the same with 3D show open or not?

yes (if you mean the simulation-world speed)

> If the speed of plank and particles are not depend on the 3D show.

simulation state definitely is not influenced by the 3D view (if it was, it would be a bug).

> I will storage the bodies' location of the simulation process, and replay the locations after the simulation is over.

I would say this is a standard approach

> is there a better way to do that?

to do what? (storage the bodies' location, replay after the simulation, after the simulation is over, ...)

cheers
Jan

> Start the script by the 'start' button on the Controller(), without "3D" show [...] did 10 iteration test [...] Because I get the time by hand, so there is errors

How do you get time "by hand" for 10 iterations after clicking "start"?
You manage to click "stop" right in time?

                   -j48 24.51s
                   -j52 24.55s
                   -j60 26.53s
                   -j88 23.29s \ 23.12s
                   -j90 23.27s \ 23.72s
                   -j96 23.91s

Mmmmh... sorry but, why do you refuse to measure -j1? Is it against some sort of religion?
I have seen this "-jNmax must be better" approach before, with people doing it always - even at the price of slowing down their daily workstation. It is in fact a very bad practice. It could as well be slower! If you don't try you'll never know and you will waste 96 cores 24/24 for no good.
It is obvious in your numbers than you gain *nothing* in this range of -j. We will not discuss 5%, especially considering time measurements by hand.

If I may ask, are you the author of the script you posted? Because I think that's what needs inspection first. Is torch module parallel?
Also there are many pyRunner's inside, even one with iterPeriod=1. How much do they cost, and do they exploit OpenMP?

Bruno

yang yi (yangyizaixian) said : #15

To Jan Stránský (honzik) and Bruno Chareyre (bruno-chareyre) :

Thank you very much for you professional response.
My the simulation process likes this:
     The two kinds of particles falling down and the plank will open or close according to the rate of one kind of particle. My job is to design the control algorithm of the plank.

Because if the martial parameter of particles different, the falling speed of particles and action speed of plank is different. So I must to check if the two speeds are matching or not. And, because I just want to verify the control algorithm, so I hope the falling process could be as fast as possible since training the control algorithm needs a great number of episodes, maybe need two days or three days. So we buy the server. And the server is just for this simulation now. That is why I hope to use the maximum core of this server. I understand you suggestion to get the optimal price of cores. But now for me, to get the fastest speed of the simulation is the aim.

I am the author of the posted script. Actually, I just know the running model of pytorch and just to call the neural network package, I do not deeply to learning what the script structure is the best for matching YADE. Bruno Chareyre is right, the pyrunner decide the runing speed. But my question is I use the same OS, the same script, and same configuration in both of PC and server, if I get 3D show, the speed in PC is faster than in the server. However, if I closed the 3D show, the calculation speed of server is faster than PC.

Jan Stránský told me the simulation-world speed is the same regardless of the 3D show open or close and the simulation state definitely is not influenced by the 3D view. So it makes me very confusion.

without 3D show the calculation speed of server is faster than PC, that mean the iteration number is more than PC. so the 3D view checks the locations roughly, we could find the falling speed is faster than PC. But actually, I find the speed in the server is slower than in PC.

So, I will try like this way, to storage the location and check what will happen.

Thank you very much

> I understand you suggestion to get the optimal price of cores. But now for me, to get the fastest speed of the simulation is the aim.

Yes. That's exactly my point. It could be that the fastest is -j1, or that it is just equal.

Here is some bad news: if you write an ordinary python program and you assign more than one openMP thread to it, the change of execution time will be null. You can try on your server by running a minimal program, just try this in the command line:

:~$ export OMP_NUM_THREADS=1
:~$ time python3 -c "for k in range(int(1e7)): a=k**0.124"
real 0m2.240s
user 0m2.190s
sys 0m0.009s

Then:
:~$ export OMP_NUM_THREADS=96
:~$ time python3 -c "for k in range(int(1e7)): a=k**0.124"
real 0m2.190s
user 0m2.181s
sys 0m0.008s

That's how programming works in many cases. The variable OMP_NUM_THREADS is even not used when python runs. It is the same for your PyRunners. The fact that yade is parallel doesn't make *your* program parallel.

Here is the good news:
If you have to train your algorithm on many realizations then you can train 96 cases simultaneously (assuming RAM is enough). Which will be *much* faster.
You can, either, hire colleagues to have more hands clicking the open/close for 96 planks (will need many keyboards/mices with long cables for social distancing), or (safer) do it with hard numbers as Jan suggested.

If really you have no choice but running one single script at a time, then you need to re-think what your simulation script is doing - with a good idea of where/how time is spent.

B.

yang yi (yangyizaixian) said : #17

To Bruno Chareyre (bruno-chareyre):

Thank you very much. I test -j1 and -j8 on my PC, yes. the speed is the same. And I test the command ~$ export OMP_NUM_THREADS=1 and the result is the same with you.

I explain my understanding about your mean: the parallel programming, means there are 98 programming copies, and each copy for a cores. If my understanding right or not?

If my understanding is right, I have the following question. I know that my programming cannot parallelly operated. I just want the calculation of particles in a programming is parallel, such as calculation the pressure between the particles. Because there are huge number of particles, so the calculation is huge. If I increase the core, this kind of calculation can be improved?

Thank you for you second suggestion. Actually, the algorithm can not be trained parallelly. The aim of the training is to make the parameters to close the optimal value. The next episode must based on the current training result.

Thank you very much.

> the parallel programming, means there are 98 programming copies, and each copy for a cores. If my understanding right or not?

More or less.[1]

> If I increase the core, this kind of calculation can be improved?

In general yes.

> The next episode must based on the current training result.

Then you will have to improve your program. It is AFAIK the only way to gain time, based on your description.

Bruno

[1] https://en.wikipedia.org/wiki/Parallel_programming_model#Shared_memory

yang yi (yangyizaixian) said : #19

Bruno Chareyre (bruno-chareyre)

Thank you very much. It is not a good news for me. OK. I will try. Thank youo