Running on a server is slower than on a PC
Dear friends,
I have met a very strange problem. Our lab has acquired a new server, and I installed Yade on it. But I find that running on the server is slower than on the PC.
(1) The hardware and configuration of the server:
CPU: Intel Xeon Gold 6146 × 4, 3.2 GHz, 96 cores
Memory: 256 GB
OS: Ubuntu 18.04, 64-bit
Yade: https:/
(2) The hardware and configuration of the PC:
CPU: i7-9700K, 3.6 GHz, 8 cores
Memory: 32 GB
OS: Ubuntu 18.04, 64-bit
Yade: https:/
I operate the server via a remote desktop (VNC Viewer).
I configure and install Yade like this:
cd ~/myYade/build
cmake -DCMAKE_
make -j96
make install
I run Yade like this:
(1) On the server
~/PycharmProjec
(2) On the PC
~/PycharmProjec
I open the Yade simulation and show the 3D view.
The simulation runs much slower on the server than on the PC; the difference is very obvious.
When I build Yade with make -j96, the build is very fast. But running Yade is slower than on the PC.
The server supplier assures us that the server hardware is fine.
Please kindly help me. Is the OS on the server right? Is the Yade installation process right? Is the way I run Yade right? It confuses me very much.
Thank you very much.
Yang Yi
Question information
- Language: English
- Status: Solved
- For: Yade
- Assignee: No assignee
- Solved by: yang yi
#1
Hello,
TL;DR: do not compare -j8 and -j96; compare -j8 on both systems. The difference between -j8 and -j96 is a separate problem, not much related to the different hardware.
Long answer:
to answer / give you some hints, we need more information, e.g. the script you are running.
> I open the yade simulation, show 3D
how it behaves without any GUI?
> The speed ... is slower ... it is very very obviously.
please be more specific what "very obviously" means. 10% slower ? 3x slower? 1000x slower?
> On the server ... -j96
> On the PC ... -j8
what are the results if you use e.g. -j 4 (or the same -j) for both systems?
In general, it is not a very good idea to compare -j8 and -j96; the situations are very different, and depend very much on the simulation itself.
cheers
Jan
#2
Hi,
I have also used a remote desktop to run Yade. You should be very clear about which one you think is slow: (1) the Yade simulation itself (running in the command window, i.e. the terminal), or (2) the 3D visualization in the Yade GUI? I guess it is (2), since (2) depends on the connection between you and the remote-controlled computer, not on the hardware of your PC or of the remote-controlled computer.
Hope that helps,
#3
Hi,
What you observe is actually expected:
Xeon Gold 6146 × 4: 3.2 GHz
i7-9700K: 3.6 GHz
The i7 is faster. HPC hardware rarely outperforms personal computers in per-core performance.
Arguably, you use "-jSomething", but it makes no sense to discuss -jAnything on the basis of just an obscure script name "20200522.py".
You did not mention how -j2 compares to -j1 on any of the procs, so that would be just blind guessing to try and answer this part.
Compare -j1 on both first.
Bruno
#4
If you have a recent Yade version you can run the benchmark yade --stdperformance -jN, with the same N on both systems. If the speed is not stable because other programs are running, it will take a long time; in that case you can use --quickperformance instead.
#5
1. Jan Stránský (honzik)
Thank you very much for your quick response. I explain as follows:
(1) The script
The script must import torch. The code is used to simulate particles falling down under gravity.
#!/usr/bin/env python
# encoding: utf-8
## this program is DQN with HMRF-2-NH.
from __future__ import print_function
# import sys
# sys.path.
# from yadeImport import *
import random
import math
import csv
import torch
import torch.nn as nn
import numpy as np
import os
import torch.nn.functional as F
from torch.distributions import Categorical
from yade.gridpfacet import *
pi = math.pi
Testing = False
AddCoalRock = False
loadPoint = '00088'
o = Omega()
o.dt = 1e-12
totalLoss = 0
LastState = []
LastAction = np.zeros((5, 1), dtype=int)
saveLoss = []
outputDir = 'Output'
checkCounter = 0
numWinds = 5
WinAction = np.zeros((5,1), dtype=int)
RewardCoal = 1
RewardRock = -3
nParameters = 2
mu_in = np.zeros((5, nParameters))
sigma_in = np.ones((5, nParameters))
MAP_iter_number = 5
EM_iter = 5
## =========== Environment ============##
widthSpace = 6.8 ## width is the distance from the front coal wall to the back
widthHydr = 1.5
lengthSpace = widthHydr * numWinds ## length is the width of 5 windows
highCoal = 2
highRock = 2
if AddCoalRock:
highCoalRock1 = 1
highCoalRock2 = 1
highCoalRock3 = 1
else:
highCoalRock1 = 0
highCoalRock2 = 0
highCoalRock3 = 0
highHydr = 3.8
highUnderground = 10
highBottom = 0.5
highSpace = highUnderground + highHydr + highCoal + highCoalRock1 + highCoalRock2 + highCoalRock3 + highRock
radiusCoal = 0.15
radiusRock = 0.15
CheckThick = 1
highDummy = 3
colorCoal = (0, 1, 0)
colorRock = (1, 0, 0)
colorState = (0, 0, 1)
colorReceive = (238/255, 233/255, 191/255)
colorShield = (205/255, 150/255, 205/255)
colorWind = (0, 1, 1)
colorGround = (54/255,54/255, 54/255)
angleShield = 50 * 3.1415926 / 180
angleSwingPositive = 15 * 3.1415926 / 180
angleSwingNegtive = 40 * 3.1415926 / 180
lengthShield = 3
lengthTail = 2
windUpperBoundary = 0.9
windLowerBoundary = 0.5019
stateUpperBoundary = highHydr
stateLowerBoundary = windLowerBoundary
##—————
positionShield = []
positionWind = []
positionTopBeam = []
positionDummy = []
windPosition = np.zeros(5)
windPositionPos
windPositionNeg
shield_y_0 = lengthTail * math.cos(
shield_y_1 = shield_y_0 + lengthShield * math.cos(
shield_z_0 = highHydr - lengthShield * math.sin(
shield_z_1 = highHydr #
wind_y_0 = lengthTail * (math.cos(
wind_y_1 = shield_y_0
wind_z_0 = highHydr - (lengthShield + lengthTail) * math.sin(
wind_z_1 = highHydr - lengthShield * math.sin(
topBeam_y_0 = lengthTail * math.cos(
topBean_y_1 = widthSpace
# matRock = O.materials.
# tensileStrength
HyMat = O.materials.
#
matRock = O.materials.
tensileSt
# matCoal = O.materials.
# tensileStrength
# matRock = O.materials.
matCoal = O.materials.
myGraviaty = -1200
nIterControl = 1000
nCheckEnd = 3000
nIterReload =10000
VelocityWindPos
VelocityWindNeg
percentCoalStop
# matGround = O.materials.
# shearCohesion=
def Boundary():
boundary = O.bodies.
def Ground():
O.bodies.
def Dummy():
for i in range (1, numWinds):
temp=[
Dummy1 = pack.sweptPolyl
Dummy2 = pack.sweptPolyl
Dummy3 = pack.sweptPolyl
Dummy4 = pack.sweptPolyl
O.bodies.
O.bodies.
O.bodies.
O.bodies.
##-----
def HydraulicSupport():
for i in range(0, numWinds):
temp = [Vector3(widthHydr * i, shield_y_0, shield_z_0),
]
temp = [
]
temp = [
]
# kwBoxes=
# vibrationRotati
Shield1 = pack.sweptPolyl
Shield2 = pack.sweptPolyl
Shield3 = pack.sweptPolyl
Shield4 = pack.sweptPolyl
Shield5 = pack.sweptPolyl
IDShield1 = O.bodies.
IDShield2 = O.bodies.
IDShield3 = O.bodies.
IDShield4 = O.bodies.
IDShield5 = O.bodies.
Wind1 = pack.sweptPolyl
Wind2 = pack.sweptPolyl
Wind3 = pack.sweptPolyl
Wind4 = pack.sweptPolyl
Wind5 = pack.sweptPolyl
IDWind1 = O.bodies.
IDWind2 = O.bodies.
IDWind3 = O.bodies.
IDWind4 = O.bodies.
IDWind5 = O.bodies.
TopBeam1 = pack.sweptPolyl
TopBeam2 = pack.sweptPolyl
TopBeam3 = pack.sweptPolyl
TopBeam4 = pack.sweptPolyl
TopBeam5 = pack.sweptPolyl
IDTopBeam1 = O.bodies.
IDTopBeam2 = O.bodies.
IDTopBeam3 = O.bodies.
IDTopBeam4 = O.bodies.
IDTopBeam5 = O.bodies.
IDWind = [IDWind1, IDWind2, IDWind3, IDWind4, IDWind5]
return IDWind
##-----
def CoalLayer():
## establish coal layer
IDCoal = O.bodies.append(
return IDCoal
##-----
def CoalRockLayer():
## establish the coal_rock layer_1
spheresCoal
CoalList1 = spheresCoalRock1
RockList1 = random.
for mm in RockList1:
## establish the coal_rock layer_2
spheresCoal
CoalList2 = spheresCoalRock2
RockList2 = random.
for mm in RockList2:
## establish the coal_rock layer_3
spheresCoal
CoalList3 = spheresCoalRock3
RockList3 = random.
for mm in RockList3:
List_CR_Coal = CoalList1 + CoalList2 + CoalList3
List_CR_Rock = RockList1 + RockList2 + RockList3
return List_CR_Coal, List_CR_Rock
##-----
def RockLayer():
## establish the rock layer
IDRock = O.bodies.append(
return IDRock
def funHMRF(px, py_x, py_y, nAgent, nAction,
sum_U_MAP = []
U = 0
d = 0.125
Out = np.zeros((nAgent))
for it in range(0, MAP_iter_number):
U1 = np.zeros((nAgent, nAction))
for i in range(0, nAction):
yi_x = py_x - mu[i, 0]
tempx = np.multiply(yi_x, yi_x) / (2*np.square(
tempx = tempx + np.log(sigma[i, 0])
yi_1 = py_y - mu[i, 1]
tempy = np.multiply(yi_1, yi_1) / (2*np.square(
tempy = tempy + np.log(sigma[i, 1])
temp = tempx + tempy
U1[:, i] = U1[:, i] + temp
U2 = -1*np.log(px)
U = U1 + U2
Out = np.argmin(U, axis=1)
tempSumU = np.min(U, axis=1)
if it >= 2 and np.std(
break
sum_U = 0
for i in range(0, nAgent):
sum_U = sum_U + U[i, Out[i]]
return Out, sum_U
#
def funRL_HMRF_
# state is the neighbor coal ratio
nAgent = int(len(state))
x_x = state[:, 0].reshape(5)
x_y = state[:, 1].reshape(5)
x_left = np.array(x_x)
x_right= np.array(x_y)
P_lyi = np.zeros((nAction, nAgent))
sum_U = []
outAction = actionProbability
for it in range(0, EM_iter):
outAction, temp_sum_U = funHMRF(
for i in range(0, nAction):
temp1 = np.exp(
temp2 = actionProbabili
temp3 = np.sum(P_lyi, axis=0)
P_lyi = np.divide(P_lyi, temp3)
for i in range(0, nAction):
if np.sum(P_lyi[i,:]) == 0:
mu[i, 0] = np.dot(P_lyi[i,:], x_left)
mu[i, 0] = mu[i, 0] / np.sum(P_lyi[i,:])
mu[i, 1] = np.dot(P_lyi[i, :], x_right)
mu[i, 1] = mu[i, 1] / np.sum(P_lyi[i, :])
for i in range(0,nAction):
if np.sum(P_lyi[i,:]) == 0:
chechArray = np.array(sum_U)
if it >= 2 and np.std(
break
return outAction, mu, sigma, sum_U
##-----
def GetRewardState():
global nCoal, windPosition
global nRock, percentCoalStop
checkArea_y_0 = 0
checkArea_y_1 = shield_y_0
checkArea_z_1 = shield_z_0
checkArea_z_0 = shield_z_0 - lengthTail + radiusCoal
State = np.zeros((5, 4), dtype=int)
Reward = np.zeros((5, 3), dtype=int)
##----------get reward-
temp_
for i in nCoal:
temp = O.bodies[
if temp[2] <= checkArea_z_0: ## the particle's height is below the boundary; it went through the window
for i in temp_coal_remove:
temp_
for i in nRock:
temp = O.bodies[
if temp[2] <= checkArea_z_0: ## the particle's height is below the boundary; it went through the window
for i in temp_rock_remove:
for i in range(0, numWinds):
Reward[i, 2] = Reward[i, 0] * RewardCoal + Reward[i, 1] * RewardRock
##--------get state--
## area for checking the state with wind closed
# state[:,0]---nCoal
# state[:,1]---nRock
# state[:,2]---nCoal + nRock
# state[:
for b in nCoal:
temp = O.bodies[
if (checkArea_y_0 < temp[1] < checkArea_
for i in range(numWinds):
if i * widthHydr < temp[0] < (i + 1) * widthHydr:
for b in nRock:
temp = O.bodies[
if (checkArea_y_0 < temp[1] < checkArea_y_1) & (checkArea_z_0 <= temp[2] <= checkArea_z_1):
for i in range(numWinds):
if i * widthHydr < temp[0] < (i + 1) * widthHydr:
for i in range(numWinds):
State[i, 2] = State[i, 0] + State[i, 1] # total number
if State[i, 2] == 0:
else:
Done = True
for i in range(numWinds):
if (State[i, 3] <= percentCoalStop
Done = (Done & True)
else:
Done = False
return Reward, State, Done
## define the agent
class Net(nn.Module):
"""docstring for Net"""
def __init__(self, numState, numAction):
super(Net, self).__init__()
self.fc1 = nn.Linear(numState, 56)
self.bn1 = nn.BatchNorm1d(56)
self.fc2 = nn.Linear(56, 128)
self.bn2 = nn.BatchNorm1d(128)
self.out = nn.Linear(128, numAction)
self.active = torch.nn.
def forward(self, x):
x = self.fc1(x)
x = self.active(x)
x = self.fc2(x)
x = self.active(x)
x = self.out(x)
return x
class AgentDQN():
"""docstring for DQN"""
def __init__(self):
self.lr = 0.001
self.gamma = 0.999
self.memory = np.zeros(
def choose_action(self, state, Testing):
if Testing:
state = torch.tensor(state, dtype=torch.float)
state = state.reshape((1, 6))
if torch.rand(1) <= self.epsilon: # greedy policy for random
action = np.random.
action = action
else: # policy
action = torch.max(
action = action[0]
return action, action_probs, actionTpye
def learn(self):
if self.memory_counter >= self.start_learnig:
if self.learn_
# sample batch from memory
if self.memory_full:
else:
q_eval = self.eval_
q_next = self.target_
loss = self.loss_
if self.epsilon > self.epsilon_min:
else:
loss = 0
return loss, startLearning, self.epsilon
def store_transitio
if (state[0] != 0) & (state[1] != 0) & (state[2] != 0) & (state[3] != 0) & (state[4] != 0) & (state[5] != 0):
index = self.memory_counter % self.memory_
if self.memory_counter >= self.memory_
def saveExperience(
def saveHMRFparamet
def loadExperience(
if os.path.
def saveWeight(self, path):
dict = {'eval_net_dict': self.eval_
}
def loadWeight(self, path):
checkPoint = torch.load(path)
###----
def CheckEpisodeEnd():
global checkCounter, saveLoss, totalLoss
for i in range(0, numWinds):
path = outputDir + "/weights/
if not Testing:
with open(outputDir + '/processResult
writer = csv.writer(csvFile)
checkCounter += 1
def ResteLocation():
global nCoal, nRock, nEpisode, saveCounter
for i in range(0, len(savePositio
n = savePositionCoa
for i in range(0, len(savePositon
n = savePositonRock
nEpisode += 1
nCoal = IDCoal + List_CR_Coal
nRock = IDRock + List_CR_Rock
for i in nCoal:
for i in nRock:
##-----
def WindowsAction(
global WinAction, windPosition
RotationW = [RotationW1, RotationW2, RotationW3, RotationW4, RotationW5]
for nW in range(0, numWinds):
## action
Pos_z = sum(O.bodies[
if WinAction[nW] == 0:
else:
NegtiveStop = (RotationW[
PostiveStop = (RotationW[
if NegtiveStop: windPosition[nW] = windPositionNeg
elif PostiveStop: windPosition[nW] = windPositionPos
else: windPosition[nW] = 0
if NegtiveStop | PostiveStop:
def JustForLearning():
loss = np.zeros(6, dtype=float)
for i in range(0, numWinds):
loss[i], epslong, startLearning = windAgents[
if (o.iter % 20 == 0):
global checkCounter, saveLoss, totalLoss
for i in range(0, numWinds):
path = outputDir + "/weights/
if not Testing:
##-----
def WindowsInitialL
global WinAction
WinAction = np.zeros((5, 1), dtype=int)
WindowsActi
if os.path.
for i in range(0, numWinds):
path = outputDir + "/weights/
for i in range(0, numWinds):
##-----
def WindowsControl():
global saveCounter, LastState, LastAction, saveLoss, TotalReward, listAction
global WinAction, nRock, nCoal
global mu_in, sigma_in, MAP_iter_number, EM_iter
Reward, state, done = GetRewardState()
TotalReward = TotalReward + Reward
LastReward = Reward[:, 2]
CurrentAction = np.zeros((5, 1))
Action_probs = np.ones((5, 2))*0.5
# CurrentState = np.concatenate(
## area for checking the state with wind closed
# state[:,0]---nCoal
# state[:,1]---nRock
# state[:,2]---nCoal + nRock
# state[:
# CurrentState includes the particle number and the coal ratio of the current agent and its neighborhood.
CurrentState = np.zeros((5, 6))
for i in range(0, numWinds):
if i == 0:
if i == numWinds - 1:
if 0 < i < numWinds - 1:
actType = np.zeros(5, dtype=int)
for i in range(0, numWinds):
if Testing:
s1 = torch.from_
s2 = torch.from_
state_HMRF = torch.cat((s1, s2), 1)
for i in range(0, numWinds):
if done :
loss = np.zeros(6, dtype=float)
epslong = 0
if (saveCounter >0) & (not Testing):
for i in range(0, numWinds):
if startLearning == 1:
totalCoal = 0
totalRock = 0
CoalRate = 0
RockRate = 0
totalRe = 0
for i in range(0, numWinds):
totalCoal += TotalReward[i, 0]
totalRock += TotalReward[i, 1]
if (totalRock + totalCoal > 0):
totalRe += TotalReward[i, 2]
AT = ['Max', 'Max', 'Max', 'Max', 'Max']
for i in range(0, numWinds):
AT[i] = 'Policy' if actType[i] == 0 else "Ran"
print(
print(
print(
print(
WinAction = CurrentAction
LastState = CurrentState
LastAction = CurrentAction
listAction.
np.
saveCounter += 1
##—————
windAgents = [AgentDQN() for i in range(numWinds)]
for i in range(0, numWinds):
if not os.path.
if not os.path.
if not os.path.
##-----
saveCounter = 0
nEpisode = 0
Ground()
Boundary()
# Dummy()
IDWind = HydraulicSupport()
IDCoal = CoalLayer()
TotalReward = np.zeros((5,3))
List_CR_Coal = []
List_CR_Rock = []
if AddCoalRock:
List_CR_Coal, List_CR_Rock = CoalRockLayer()
IDRock = RockLayer()
nCoal = IDCoal + List_CR_Coal
nRock = IDRock + List_CR_Rock
savePositionCoal = []
savePositonRock = []
listAction = []
for i in nCoal:
temp = [i, o.bodies[
savePositio
for i in nRock:
temp = [i, o.bodies[
savePositon
##—————
O.engines = [
ForceResett
InsertionSo
Interaction
),
GlobalStiff
# VTKRecorder(
NewtonInteg
RotationEng
RotationEng
RotationEng
RotationEng
RotationEng
PyRunner(
# PyRunner(
PyRunner(
##
PyRunner(
PyRunner(
PyRunner(
]
# o.run(120000000
(2) I have not checked it without the GUI, because without the GUI the speed of the simulation may be different. I will check that tomorrow.
(3) "Very obviously" means the server is more than 10 times slower.
(4) If the PC and the server use the same job count, such as -j8 or even -j2, the speed is the same. It looks as if only one core of the server is working.
2. Son Pham Thai (pham-thai-son-987)
Thank you very much. I load the GUI in the command window. The remote server and the controlling PC are on a local area network. I will check the speed of the GUI on the server directly. But I worry the speed may be the same.
3. Bruno Chareyre (bruno-chareyre)
Thank you very much. You are right, each core of the PC is faster than the server's. But I need faster simulation speed through parallel computation, so I use 96 cores. If the performance depends on just one core, that is very terrible news for me.
4. Janek Kozicki (cosurgi):
Thank you very much. But I am sorry, I do not understand what you mean. Can you tell me where I could find the example or the material?
#6
run this command:
yade --stdperformance -j8
or this command:
yade --quickperformance -j8
#7
> But I need a faster speed for the simulation by parallel computation. So I use 96 core.
Would another usage of the power be meaningful, e.g. running 12 simulations in parallel, each using -j8?
> So I use 96 core.
Are you sure there are no other processes running on the computer? How does e.g. -j48 behave?
> If the performance is just depend on a core, that is a very terrible message to me.
I did not understand this, but the speed-up depends on the combination of the number of cores used AND the script. There are many factors.
cheers
Jan
#8
Sorry, that was written in a hurry. I mean that you can use these two commands to measure the computer performance using a standardized test.
#9
> So I use 96 core.
Maybe it is a bad idea. Or maybe it is OK, but it doesn't give much more compared to, say, 48.
OpenMP may have an optimum somewhere; maybe it is 24 cores, or 48 (it could well depend on the hardware), or whatever. You'll have to find out.
I have not heard of previous tests on 96 cores, so it would be interesting to know your results.
Cheers
Bruno
#10
Ah: and it is absolutely critical that you keep the GUI turned *off*.
Otherwise the comparisons are meaningless; you could be testing the GPU. You don't use the GUI for long simulations anyway.
B
#11
> If the performance is just depend on a core, that is a very terrible message to me
Not really. The message was: 1/ of course you don't automatically gain speedup by using more cores; 2/ comparing just a pair of (very different) N gives no real insight into where the optimum is (and there is always an optimum). So the only solid data at the moment is per-core speed; if that is shocking to hear, it could be a hint to put the problem a bit differently. :)
B.
#12
Janek Kozicki (cosurgi), Jan Stránský (honzik), Bruno Chareyre (bruno-chareyre)
Thank you very much for answering my question. Following your suggestions, I checked the server like this:
(1) Run yade by the command:
~
(2) Start the script with the 'start' button on the Controller(), without the 3D view
(3) I did a 10-iteration test, and the times are as follows:
PC: -j8 39.45 s \ 47.79 s
Server: -j32 26.1 s
-j48 24.51 s
-j90 23.27 s \ 23.72 s
-j92 23.05 s \ 23.76 s \ 24.24 s \ 24.09 s
Because I took the times by hand, there are errors. The results seem to show that the server is faster than the PC without the 3D view, and that -j92 or -j90 may be the optimal job count.
(4) I used the command yade --stdperformance -j8 to test the performance.
I could only read the results roughly, so I write them below:
-j8 1465.98 iter/s
-j48 1386.40 iter/s
-j88 1427 iter/s
-j90 1279 iter/s \ 1467 iter/s
-j92 1385 iter/s
-j96 1313 iter/s
I find the numbers unstable. It seems that -j90 is the best.
(5) From the above tests, I guess the following:
In the earlier test I used the 3D view, and the server was much slower than the PC; that is because the GUI depends on just one CPU core, and one core of the i7-9700K at 3.6 GHz is better than one core of the 6146 at 3.2 GHz. Is my guess right or not?
(6) I have two questions:
1) It looks as if the falling speed of the particles and the rotation speed of the plank in the 3D view change when I modify the number of particles. So on the server I want to check whether the rotation speed matches the falling speed of the particles. The direct way is the 3D view. However, with the 3D view the speed is very, very slow. My question is: if I close the 3D view, are the speeds of the plank and the particles the same as with the 3D view open?
2) If the speeds of the plank and the particles do not depend on the 3D view, I will store the bodies' locations during the simulation and replay the locations after the simulation is over. My question: is there a better way to do that?
Thank you very much
#13
> -j32 26.1s
> -j48 24.51s
> -j90 23.27s \ 23.72s
> -j92 23.05s \ 23.76s \ 24.24s \ 24.09s
> the -j92 or -j90 maybe the optimal jobs
it pretty much depends on the definition of "optimal". Compared to -j32, -j90 is just 10% faster, but uses almost 3x more CPU power.
> -j48 1386.40 iter/s
> -j90 1279 iter/s
> It seems that -j90 is the best.
same as above
> It looks that ... in the 3D show are different,
> I hope to check if the rotation speed matches the falling speed of particles or not. The direct way is the 3D show.
3D view is just for rough checks. Use "hard numbers" for serious comparisons (like O.bodies[
> if I close the 3D show, the speed of the plank and particles are the same with 3D show open or not?
yes (if you mean the simulation-world speed)
> If the speed of plank and particles are not depend on the 3D show.
simulation state definitely is not influenced by the 3D view (if it was, it would be a bug).
> I will store the bodies' locations during the simulation, and replay the locations after the simulation is over.
I would say this is a standard approach
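For the storing itself, a minimal sketch (plain Python/NumPy, not Yade-specific; the position arrays below are fabricated stand-ins for body positions read from the simulation) could look like this:

```python
import io
import numpy as np

frames = []  # one (nBodies, 3) array per saved step

def record(positions):
    """Store a snapshot of body positions; positions is an (nBodies, 3) array."""
    frames.append(np.asarray(positions, dtype=float).copy())

# sketch: pretend 3 saved steps of 4 falling bodies (fabricated data)
for step in range(3):
    fake_positions = np.random.rand(4, 3) - np.array([0.0, 0.0, 0.1 * step])
    record(fake_positions)

# write once, after the run (a file path works the same as this in-memory buffer)
buf = io.BytesIO()
np.savez(buf, *frames)

# replay: load the frames back and step through them
buf.seek(0)
replay = np.load(buf)
print(len(replay.files))  # 3 frames were saved
```

Yade's VTKRecorder engine together with ParaView is a more standard route for recording and replaying a simulation, since it handles all body attributes for you.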
> is there a better way to do that?
to do what? (storing the bodies' locations, replaying after the simulation, ...)
cheers
Jan
#14
> Start the script by the 'start' button on the Controller(), without "3D" show [...] did 10 iteration test [...] Because I get the time by hand, so there is errors
How do you get time "by hand" for 10 iterations after clicking "start"?
You manage to click "stop" right in time?
Mmmmh... sorry but, why do you refuse to measure -j1? Is it against some sort of religion?
I have seen this "-jNmax must be better" approach before, with people doing it always - even at the price of slowing down their daily workstation. It is in fact a very bad practice. It could as well be slower! If you don't try you'll never know and you will waste 96 cores 24/24 for no good.
It is obvious in your numbers that you gain *nothing* in this range of -j. We will not discuss 5%, especially considering time measurements made by hand.
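For what it's worth, timing does not have to be done by hand. Here is a sketch of the idea in plain Python (the class name and iteration counts are invented; inside Yade the same logic could sit in a PyRunner that reads the current iteration number):

```python
import time

class IterTimer:
    """Measure iterations per second between calls -- a sketch of what a
    periodic callback could do instead of timing runs by hand."""
    def __init__(self):
        self.t0 = time.perf_counter()
        self.iter0 = 0

    def rate(self, current_iter):
        """Return iterations/second since the previous call."""
        t1 = time.perf_counter()
        d_iter = current_iter - self.iter0
        d_time = t1 - self.t0
        self.t0, self.iter0 = t1, current_iter
        return d_iter / d_time if d_time > 0 else float('inf')

# usage sketch: pretend the solver advanced 1000 iterations in ~10 ms
timer = IterTimer()
time.sleep(0.01)            # stand-in for running the simulation
print(timer.rate(1000))     # iterations per second, measured, not guessed
```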
If I may ask, are you the author of the script you posted? Because I think that's what needs inspection first. Is torch module parallel?
Also there are many PyRunners inside, even one with iterPeriod=1. How much do they cost, and do they exploit OpenMP?
Bruno
#15
To Jan Stránský (honzik) and Bruno Chareyre (bruno-chareyre) :
Thank you very much for your professional responses.
My simulation process is like this:
Two kinds of particles fall down, and the plank opens or closes according to the rate of one kind of particle. My job is to design the control algorithm for the plank.
Because the falling speed of the particles and the action speed of the plank differ when the material parameters of the particles differ, I must check whether the two speeds match. And because I just want to verify the control algorithm, I hope the falling process can be as fast as possible, since training the control algorithm needs a great number of episodes, maybe two or three days. So we bought the server, and the server is just for this simulation now. That is why I hope to use the maximum number of cores of this server. I understand your suggestion to find the optimal number of cores, but for me right now, the aim is to get the fastest simulation speed.
I am the author of the posted script. Actually, I only know the running model of PyTorch and just call the neural-network package; I have not studied deeply which script structure best matches Yade. Bruno Chareyre is right that the PyRunners decide the running speed. But my question is: I use the same OS, the same script, and the same configuration on both the PC and the server; if I open the 3D view, the PC is faster than the server, but if I close the 3D view, the calculation speed of the server is faster than the PC.
Jan Stránský told me that the simulation-world speed is the same whether the 3D view is open or closed, and that the simulation state is definitely not influenced by the 3D view. So this confuses me very much.
Without the 3D view the calculation speed of the server is faster than the PC, which means more iterations than on the PC; so even if the 3D view only checks the locations roughly, we should see the falling speed being faster than on the PC. But actually, I find the speed on the server slower than on the PC.
So I will try it this way: store the locations and check what happens.
Thank you very much
#16
> I understand you suggestion to get the optimal price of cores. But now for me, to get the fastest speed of the simulation is the aim.
Yes. That's exactly my point. It could be that the fastest is -j1, or that it is just equal.
Here is some bad news: if you write an ordinary Python program and you assign more than one OpenMP thread to it, the change in execution time will be nil. You can try it on your server by running a minimal program; just try this on the command line:
:~$ export OMP_NUM_THREADS=1
:~$ time python3 -c "for k in range(int(1e7)): a=k**0.124"
real 0m2.240s
user 0m2.190s
sys 0m0.009s
Then:
:~$ export OMP_NUM_THREADS=96
:~$ time python3 -c "for k in range(int(1e7)): a=k**0.124"
real 0m2.190s
user 0m2.181s
sys 0m0.008s
That's how programming works in many cases. The variable OMP_NUM_THREADS is not even used when plain Python runs. It is the same for your PyRunners. The fact that Yade is parallel doesn't make *your* program parallel.
Here is the good news:
If you have to train your algorithm on many realizations, then you can train 96 cases simultaneously (assuming there is enough RAM), which will be *much* faster.
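A sketch of that idea with Python's multiprocessing (the worker here is a dummy; a real version would launch one simulation per process, e.g. via subprocess, which is an assumption about your setup):

```python
from multiprocessing import Pool

def run_case(seed):
    """Stand-in for one independent realization. A real worker might instead
    run a yade subprocess with its own script arguments and random seed."""
    total = sum(k * seed for k in range(1000))  # dummy work
    return seed, total

if __name__ == '__main__':
    # e.g. on a 96-core machine: 12 workers, each simulation using -j8
    with Pool(processes=4) as pool:
        results = pool.map(run_case, range(8))
    print(len(results))  # 8 completed cases
```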
You can either hire colleagues to have more hands clicking the open/close for 96 planks (you will need many keyboards/mice with long cables for social distancing), or (safer) do it with hard numbers as Jan suggested.
If really you have no choice but running one single script at a time, then you need to re-think what your simulation script is doing - with a good idea of where/how time is spent.
B.
#17
To Bruno Chareyre (bruno-chareyre):
Thank you very much. I tested -j1 and -j8 on my PC; yes, the speed is the same. And I tested the command export OMP_NUM_THREADS=1 and got the same result as you.
Let me check my understanding of your meaning: parallel programming means there are 96 copies of the program, each copy on one core. Is my understanding right or not?
If my understanding is right, I have the following question. I know that my program cannot be run in parallel. I just want the particle calculations within one program to be parallel, such as calculating the pressure between the particles. Because there is a huge number of particles, the amount of calculation is huge. If I increase the number of cores, can this kind of calculation be improved?
Thank you for your second suggestion. Actually, the algorithm cannot be trained in parallel. The aim of the training is to bring the parameters close to their optimal values; the next episode must be based on the current training result.
Thank you very much.
#18
> parallel programming means there are 96 copies of the program, each copy on one core. Is my understanding right or not?
More or less.[1]
> If I increase the core, this kind of calculation can be improved?
In general yes.
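How much it can improve is bounded by Amdahl's law: if a fraction p of each step runs in the parallel part (the contact/force loop) and the rest (e.g. the Python callbacks) is serial, the speedup on n cores is 1/((1-p) + p/n). The fractions below are invented, just to show the shape of the curve:

```python
def amdahl(p, n):
    """Amdahl's law: overall speedup with parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

# invented split: 80% of step time in the parallel loop, 20% serial Python
for n in (1, 8, 48, 96):
    print(n, round(amdahl(0.8, n), 2))
# prints: 1 1.0 / 8 3.33 / 48 4.62 / 96 4.8 -- the gains saturate quickly,
# which matches the flat -jN timings reported earlier in this thread
```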
> The next episode must based on the current training result.
Then you will have to improve your program. AFAIK that is the only way to gain time, based on your description.
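A first step in improving it is finding where the time actually goes; Python's built-in cProfile can show that. A sketch (slow_part and fast_part are made-up placeholders for e.g. expensive PyRunner callbacks):

```python
import cProfile
import io
import pstats

def slow_part():
    """Placeholder for an expensive per-iteration callback."""
    return sum(k ** 0.124 for k in range(100000))

def fast_part():
    """Placeholder for a cheap per-iteration callback."""
    return 42

def one_step():
    slow_part()
    fast_part()

pr = cProfile.Profile()
pr.enable()
for _ in range(10):   # pretend 10 simulation steps
    one_step()
pr.disable()

# report the 5 most expensive calls; slow_part should dominate
out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats('cumulative').print_stats(5)
print(out.getvalue())
```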
Bruno
[1] https:/
#19
Bruno Chareyre (bruno-chareyre)
Thank you very much. It is not good news for me, but OK, I will try. Thank you.