Resuming long interrupted calculations

Asked by Olivier Gillia

Hello everybody,

I am running some long-lasting simulations, and quite often the calculation is interrupted for some reason (maintenance on the computer, power outage, …). I would like to be able to resume the calculation.

For this, the O.save() function sounds good. I regularly save the configuration to disk (overwriting the previous one), so I can use it to recover the Omega and restart the simulation. But there are two problems:
1- Starting in a new .py file, I lose the other functions I have defined in the main .py file.
2- Some variables are not included in the O.save file (for example, the variables that track how far the loading, which is cyclic, has progressed).

For problem 1, I can program a “resume” mode in the same main launch .py file (handled with “if iResume: … else: …”).

For problem 2, I have tried to attach my useful variables (t_phase, t_deb_phase, t_deb_cycle, n_phase, n_cycle, n_cycle_todo,…) to the Omega (O.t_phase, O.t_deb_phase…), but they are not saved in the file on disk (as seen in the XML format).

Does anyone have an idea how I can recover my cycle variables before relaunching the calculation?

Thank you

Olivier Gillia

Question information

Language:
English
Status:
Solved
For:
Yade
Assignee:
No assignee
Solved by:
Olivier Gillia

Olivier Gillia (otg1) said :
#1

>>> Solution proposed by Jan Stránský <<<

Hello Olivier,

next time, please ask questions via the Launchpad interface [1].

By default, O.save only saves attributes of the C++ core. It does not save Python-defined stuff (functions, variables), nor custom O.anythingNotCPlusPlus attributes.
However, see the documentation of utils.saveVars [2] / utils.loadVars [3]. They allow you to save custom data in the .xml file and use it after O.load. Something like:

###
resume = False # change to True for the "resume" run

someVar = 1 # default value, overwritten below

def incrementSomeVar():
    global someVar
    someVar += 1

O.engines += [
    PyRunner(iterPeriod=1,command="incrementSomeVar()"),
    PyRunner(iterPeriod=10,command="save()"),
]

def save():
    saveVars("mySavedVars",someVar=someVar)
    O.save("test.xml")

if resume:
    O.load("test.xml")
    loadVars("mySavedVars")
else:
    saveVars("mySavedVars",someVar=someVar)
from yade.params.mySavedVars import *

print("TEST 1:",O.iter,someVar)
O.run(25,True) # simulating interruption
print("TEST 2:",O.iter,someVar)
###

Cheers
Jan

[1] https://launchpad.net/yade, "Ask a question" button
[2] https://yade-dem.org/doc/yade.utils.html#yade.utils.saveVars
[3] https://yade-dem.org/doc/yade.utils.html#yade.utils.loadVars
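
The `resume` flag in the script above has to be edited by hand before each restart. As a hedged variation (not part of the original answer), the flag could instead be derived from whether a checkpoint file already exists. The sketch below is plain Python with no Yade calls; `checkpoint_exists` is a hypothetical helper name, and `test.xml` simply mirrors the file name used above.

```python
import os

def checkpoint_exists(path):
    """Return True if a non-empty checkpoint file is present at `path`."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

# Hypothetical usage in the launch script, replacing the hand-edited flag.
resume = checkpoint_exists("test.xml")
```

With this, the first run starts from scratch and any later run picks up the last saved state. Deleting the checkpoint file forces a fresh start.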

Olivier Gillia (otg1) said :
#2

I found another solution using pickle. I have defined an OmegaParallel class, in which I put all the variables I want to retrieve when resuming. I do:

import pickle

if iResume:

    path_name1 = './RESULTATS/'

    test_case = table.test_case
    filename_save = path_name1 + test_case + '-sauv.bin'
    filename_savePara = path_name1 + test_case + '-sauvPara.bin'

    print("Retrieving saved configuration in file :", filename_save)
    print(" and in file :", filename_savePara)

    # Restore the Yade scene first, then the user-defined variables
    O.load(filename_save)

    with open(filename_savePara, 'rb') as OParaFile:
        OPara = pickle.load(OParaFile)

    print("Resuming calculation at cycle no :", OPara.n_cycle)
    print(" at phase no :", OPara.n_phase)
    print(" at time :", '%6.4f' % O.time)
    print(" at iter :", O.iter)

else:
    # … calculations take place here …
          #====== Saving the configuration for later possible resuming
          O.save(OPara.filename_save)
          with open(OPara.filename_savePara, 'wb') as OParaFile:
              pickle.dump(OPara, OParaFile)
          print("Saving at :")
          affIter()

The only thing you need to do is declare an OmegaParallel class in an external module and import it at the beginning:
import omegaParallel

where the omegaParallel.py file contains:
class OmegaParallel:
    def __init__(self, s):
        self.s = s
    r1 = 0.0
    myVariables=…
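
For reference, here is a minimal Yade-independent sketch of the same pickle round trip. It is an illustration under assumptions, not the code used above: `CycleState`, `save_state` and `load_state` are hypothetical names, and the fields are borrowed from the variables listed in the question. It differs from the approach above in two ways. It pickles a plain dict of the instance attributes rather than the instance itself, so the file does not depend on the class being importable from the same module at load time. It also writes to a temporary file and then calls `os.replace`, so an interruption in the middle of a save should not leave a truncated checkpoint in place of the previous one.

```python
import os
import pickle
import tempfile

class CycleState:
    """Bookkeeping for the cyclic loading (hypothetical stand-in for OmegaParallel)."""
    def __init__(self, n_cycle=0, n_phase=0, t_deb_cycle=0.0):
        # Everything is set on the instance: class-level defaults are not
        # part of an instance's pickled state.
        self.n_cycle = n_cycle
        self.n_phase = n_phase
        self.t_deb_cycle = t_deb_cycle

def save_state(state, path):
    """Pickle the attributes of `state` to `path`, replacing any older file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            pickle.dump(vars(state), tmp_file)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # swap the new file in as a single step
    except BaseException:
        # Clean up the temporary file if anything went wrong, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def load_state(path):
    """Rebuild a CycleState from the dict stored at `path`."""
    with open(path, "rb") as state_file:
        return CycleState(**pickle.load(state_file))
```

In a Yade script, `save_state(OPara, filename_savePara)` would be called next to `O.save(...)`, and `load_state(filename_savePara)` next to `O.load(...)`, assuming `OPara` is built from such a class.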