IOError : [Errno 2] No such file or directory: '.../GX.X/results.dat'

Asked by Gauthier

Dear developers,

I'm performing subsequent runs (on CP3's cluster) for the same processes (decays) with different parameters and, once in a while (a few times out of ten runs), I get something like
---
Idle: 0 Running: 6 Finish: 106
INFO: All jobs finished
Command "calculate_decay_widths squarkLSP_200_750_2 --cluster" interrupted with error:
IOError : [Errno 2] No such file or directory: '/nfs/home/fynu/gdurieux/SSDL_8tev/GEN_RPV_SIMP_widths_squarkLSP/SubProcesses/P0_ul_udxsxtx/G5.8/results.dat'
---
I cannot check whether the file is really present because, by the time I look at it, I think it has been replaced by the one from the next run.

The debug.log file indicates this occurs while filling the HTML files:
---
  File "madgraph/interface/madevent_interface.py", line 2505, in do_survey
    cross, error = sum_html.make_all_html_results(self)
  File "madgraph/various/sum_html.py", line 384, in make_all_html_results
    P_comb.add_results(name, pjoin(P_path,name,'results.dat'), mfactor)
  File "madgraph5/madgraph/various/sum_html.py", line 124, in add_results
    oneresult.read_results(filepath)
  File "madgraph5/madgraph/various/misc.py", line 155, in deco_f_retry
    return f(*args, **opt)
  File "madgraph5/madgraph/various/sum_html.py", line 56, in read_results
    for line in open(filepath):
---

In the GX.X folder, <my-run>_log.txt and <my-run>_results.dat are absent.
In the Events folder, no <my-run> directory is found.

I guess this is not really a MG issue (other runs seem fine) but, since it occurs quite frequently, I was wondering if there could be a workaround.
I've sometimes seen (for the first run on which the error occurs?) the additional lines:
---
Start waiting for update on filesystem. (more info in debug mode)
fail to do <function read_results at 0x14a8ce60> function with <madgraph.various.sum_html.OneResult object at 0x14d02990>, GX.X/results.dat args. 1 try on a max of 5 (20 waiting time)
---
in the error message. Should I just try to increase the hard-coded "nb_try" parameter of "multiple_try" near line 51 in sum_html.py? Or is there something more subtle that could be tried?
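
(For clarity, my rough understanding of the retry pattern involved is sketched below; this is only an illustration, not the actual code of misc.py/sum_html.py, and read_results_file is a made-up placeholder standing in for OneResult.read_results.)
---
import functools
import time

def multiple_try(nb_try=5, sleep=20):
    """Retry decorator: call the wrapped function up to nb_try times,
    sleeping sleep, 2*sleep, 3*sleep, ... seconds after each failure."""
    def deco_retry(f):
        @functools.wraps(f)
        def deco_f_retry(*args, **opt):
            last_error = None
            for attempt in range(1, nb_try + 1):
                try:
                    return f(*args, **opt)
                except IOError as error:
                    last_error = error
                    # wait longer after each failed attempt
                    time.sleep(sleep * attempt)
            # all attempts failed: re-raise the last error
            raise last_error
        return deco_f_retry
    return deco_retry

@multiple_try(nb_try=5, sleep=20)
def read_results_file(filepath):
    # made-up placeholder standing in for OneResult.read_results
    return open(filepath).read()
---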

Thanks a lot,
Gauthier

Olivier Mattelaer (olivier-mattelaer) said (#1):

Hi Gauthier,

How are you?

Yes, you can play with this parameter and see if it helps.
For information, there are two values you can play with:
nb_try and sleep.

At each try the waiting time increases by the value of sleep,
so with nb_try=3 and sleep=10
the total waiting time will be 60s (10+20+30).

So be careful not to set nb_try to too high a value.
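
In code, the arithmetic looks like this (total_wait is just a throw-away helper to illustrate it, not something that exists in MadGraph):
---
def total_wait(nb_try, sleep):
    # cumulative waiting time if every attempt fails:
    # sleep + 2*sleep + ... + nb_try*sleep
    return sleep * nb_try * (nb_try + 1) // 2

print(total_wait(3, 10))   # 60 seconds, as in the example above
print(total_wait(5, 20))   # 300 seconds (5 min) with the values from your log
---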

Thanks a lot for your help,

Olivier

Gauthier (gauthier.d) said (#2):

Hi,
After altering the parameters I got time to check for the presence of the file and indeed, even after 5 min of cumulated sleep, "results.dat" wasn't appearing in the directory.
So I guess the real issue occurs while writing this file, not while reading it (and therefore, increasing 'nb_try' or 'sleep' where I did it was useless).
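(For reference, a minimal standalone sketch of the kind of check I mean, just polling for the file; the path below is a placeholder, not the real one.)
---
import os
import time

# placeholder path: point this at the failing GX.X directory
filepath = '/path/to/SubProcesses/P0_.../GX.X/results.dat'

# poll for up to 5 minutes, checking every 10 seconds
for _ in range(30):
    if os.path.exists(filepath):
        print('results.dat appeared')
        break
    time.sleep(10)
else:
    print('results.dat never appeared')
---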
Cheers,
Gauthier

Olivier Mattelaer (olivier-mattelaer) said (#3):

Hi Gauthier,

Could you send me by email the process, param_card, run_card, as well as the model (if needed)?
I will run the same script on my laptop and see if I can reproduce the problem locally.

In my experience the CP3 cluster (especially the disk) can from time to time be VERY VERY slow, so 5 min might sometimes not be enough.

Cheers,

Olivier
