MadGraph gridpack internal.common_run_interface.AlreadyRunning

Asked by Stephen Roche on 2020-06-29

Hi,

I've been following the directions here https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/GridDevelopment#no1 to get started making some events with Condor.

Following the directions, it works perfectly, and writes the events exactly as I expect. However, if I try to execute ./run.sh a second time, I get the following error:

Now generating 10000 events with random seed 13 and granularity 1
Traceback (most recent call last):
  File "./madevent/bin/gridrun", line 86, in <module>
    cmd_line = cmd_interface.GridPackCmd(me_dir=root_path, nb_event=args[0], seed=args[1], gran=args[2])
  File "/export/home/stroche/pheno-snowmass/ttbar/madevent/bin/internal/madevent_interface.py", line 6404, in __init__
    MadEventCmd.__init__(self, me_dir, *completekey, **stdin)
  File "/export/home/stroche/pheno-snowmass/ttbar/madevent/bin/internal/madevent_interface.py", line 2107, in __init__
    CmdExtended.__init__(self, me_dir, options, *completekey, **stdin)
  File "/export/home/stroche/pheno-snowmass/ttbar/madevent/bin/internal/madevent_interface.py", line 203, in __init__
    super(CmdExtended, self).__init__(me_dir, options, *arg, **opt)
  File "/export/home/stroche/pheno-snowmass/ttbar/madevent/bin/internal/common_run_interface.py", line 657, in __init__
    raise AlreadyRunning, message
internal.common_run_interface.AlreadyRunning: Another instance of the program is currently running.
                (for this exact same directory) Please wait that this is instance is
                closed. If no instance is running, you can delete the file
                /export/home/stroche/pheno-snowmass/ttbar/madevent/RunWeb and try again.
mv: cannot stat `./Events/GridRun_13/unweighted_events.lhe.gz': No such file or directory
write ./events.lhe.gz

Running one at a time, I can just follow the directions and delete madevent/RunWeb, and run.sh works again with no issues.
However, this doesn't work when I'm running 100 jobs in parallel in condor; it just throws a bunch of the error messages above.
Is there any way to fix this?

-Steve

Question information

Language: English
Status: Solved
For: MadGraph5_aMC@NLO
Assignee: No assignee
Solved by: Olivier Mattelaer
Solved: 2020-06-29
Last query: 2020-06-29
Last reply: 2020-06-29

Hi Stephen,

This file is supposed to be removed at the end of the run (and that is what actually happens on the two machines, one Mac and one Linux, on which I have just tested).
Did you get any warning during the previous run that would explain why the file was not removed?
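
To illustrate why a leftover RunWeb file blocks the next run, here is a toy model of a lock file written in Python. It is only a sketch and not MadGraph's actual code: the function names acquire_lock, release_lock and demo are hypothetical, and the only detail taken from the error message above is that the lock lives in a file called RunWeb inside the run directory.

```python
import os
import tempfile


class AlreadyRunning(Exception):
    """Raised when the lock file for a directory already exists."""


def acquire_lock(run_dir, lock_name="RunWeb"):
    """Create the lock file atomically, or fail if it is already there."""
    lock_path = os.path.join(run_dir, lock_name)
    try:
        # O_EXCL makes the creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning("lock file present: " + lock_path)
    os.close(fd)
    return lock_path


def release_lock(lock_path):
    """Remove the lock file; a run that dies before this leaves it behind."""
    if os.path.exists(lock_path):
        os.remove(lock_path)


def demo():
    """Return what happens to a second run before and after the release."""
    outcomes = []
    with tempfile.TemporaryDirectory() as run_dir:
        first = acquire_lock(run_dir)
        try:
            acquire_lock(run_dir)
            outcomes.append("second run started")
        except AlreadyRunning:
            outcomes.append("second run refused")
        release_lock(first)
        second = acquire_lock(run_dir)
        outcomes.append("run after release started")
        release_lock(second)
    return outcomes
```

In this model, any run that shares the directory is refused for as long as the file exists, whether the first run is still alive or simply never cleaned up. That matches the symptom of 100 parallel jobs pointed at the same gridpack directory.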

Finding out why the file is not removed might take a while.
Now that read-only gridpacks have been successfully validated in CMS (with the 2.7.3 release),
I will update this link: https://cp3.irmp.ucl.ac.be/projects/madgraph/wiki/GridDevelopment#no1
with instructions for that mode.

That mode allows you to keep the gridpack directory read-only and run the gridpack from an empty directory. In that mode you can execute as many instances of the gridpack as you want simultaneously, since they all write in their own directories.
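
The idea of one private working directory per job can be sketched in Python as follows. This is an illustration under assumptions, not the documented procedure: the helper run_in_private_dir and the job_NNNN directory naming are hypothetical, and the command run here is a harmless stand-in rather than the gridpack script. The exact invocation for the read-only mode is the one given on the wiki page.

```python
import os
import subprocess
import sys
import tempfile


def run_in_private_dir(command, job_id, scratch_root):
    """Run `command` from a fresh directory reserved for one job."""
    work_dir = os.path.join(scratch_root, "job_{:04d}".format(job_id))
    # makedirs raises if the directory exists, so two jobs never share one.
    os.makedirs(work_dir)
    result = subprocess.run(
        command,
        cwd=work_dir,  # everything the command writes lands in work_dir
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return work_dir, result.returncode, result.stdout


def demo():
    """Run three stand-in jobs and report where each wrote its output."""
    # Stand-in for the real job: writes one file into its current directory.
    command = [sys.executable, "-c", "open('events.txt', 'w').write('ok')"]
    with tempfile.TemporaryDirectory() as scratch_root:
        runs = [run_in_private_dir(command, job_id, scratch_root)
                for job_id in range(3)]
        return [
            (os.path.basename(work_dir), code,
             os.path.exists(os.path.join(work_dir, "events.txt")))
            for work_dir, code, _ in runs
        ]
```

In a Condor job, the stand-in command would presumably be replaced by the absolute path to the gridpack's run.sh together with its arguments, with a different seed for each job. Because every job writes only inside its own directory, no lock file is shared between jobs.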

Would that be a reasonable workaround for you?

Cheers,

Olivier

> On 29 Jun 2020, at 18:20, Stephen Roche <email address hidden> wrote:
>
> New question #691598 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/691598

Stephen Roche (str55) said : #2

Hi Olivier,

That should presumably be fine. Thank you for the prompt reply!

-Steve

Stephen Roche (str55) said : #3

Thanks Olivier Mattelaer, that solved my question.

The documentation is now updated.

Cheers,

Olivier