Scanning many parameter points for the same process: how to parallelize

Asked by Anna Woodard

Dear experts,

I'm trying to use MadGraph to scan a large number of parameter values for the same process (I don't necessarily need a huge number of events for each point). There are too many points for me to run them serially (even if subprocesses are sent to condor).

My main questions:
My understanding is that parameters can't be changed once you've produced a gridpack: is that correct? If so, is my only option for parallelization to change the parameter on the condor worker node and compile on the condor worker node, or is there some other way of factorizing the compilation step out beforehand?

What I've tried so far:
I'm currently packing up my madgraph directory and sending it to the worker, then sourcing my usual start-up scripts so that the same gfortran, etc. are in the path (they are accessible from the workers), then trying to call generate_events in single-machine mode. If I condor_ssh_to_job and call generate_events by hand, it works fine, but otherwise it fails with the error below. Mysteriously, this used to work for a while, and I don't know why it's failing now.

Thanks for your help,
Anna

***************************************
Error detected in "generate_events -f"
write debug file /var/condor/execute/dir_26970/worker-174873-26978/t.9661/HEL_UFO_ttZ_v1/run_01_tag_1_debug.log
If you need help with this issue please contact us on https://answers.launchpad.net/madgraph5
MadGraph5Error : A compilation Error occurs when trying to compile /var/condor/execute/dir_26970/worker-174873-26978/t.9661/HEL_UFO_ttZ_v1/Source.
    The compilation fails with the following output message:
        gfortran -O -w -fbounds-check -ffixed-line-length-132 -c -o setrun.o setrun.f
        cd PDF; make
        cd MODEL; make
        make[1]: Entering directory `/var/condor/execute/dir_26970/worker-174873-26978/t.9661/HEL_UFO_ttZ_v1/Source/PDF'
        make[1]: warning: jobserver unavailable: using -j1. Add `+' to parent make rule.
        make[1]: Entering directory `/var/condor/execute/dir_26970/worker-174873-26978/t.9661/HEL_UFO_ttZ_v1/Source/MODEL'
        make[1]: warning: jobserver unavailable: using -j1. Add `+' to parent make rule.
        ar cru ../../lib/libpdf.a Ctq4Fn.o Ctq5Par.o Ctq5Pdf.o Partonx5.o Ctq6Pdf.o cteq3.o mrs98.o mrs98lo.o mrs98ht.o mrs99.o mrst2001.o mrst2002.o jeppe02.o pdfwrap.o opendata.o pdf.o PhotonFlux.o pdg2pdf.o
        gfortran -O -w -fbounds-check -ffixed-line-length-132 -c -o rw_para.o rw_para.f
        ranlib ../../lib/libpdf.a
        ../param_card.inc:29.13:
            Included at param_read.inc:5:
            Included at rw_para.f:25:

              CLW = -
                     1
        Error: Syntax error in expression at (1)
        make[1]: Leaving directory `/var/condor/execute/dir_26970/worker-174873-26978/t.9661/HEL_UFO_ttZ_v1/Source/PDF'
        make[1]: *** [rw_para.o] Error 1
        make[1]: Leaving directory `/var/condor/execute/dir_26970/worker-174873-26978/t.9661/HEL_UFO_ttZ_v1/Source/MODEL'
        make: *** [../lib/libmodel.a] Error 2
        make: *** Waiting for unfinished jobs....

    Please try to fix this compilations issue and retry.

Question information

Language:
English
Status:
Solved
For:
MadGraph5_aMC@NLO
Assignee:
No assignee
Solved by:
Olivier Mattelaer
Best Olivier Mattelaer (olivier-mattelaer) said :
#1

Hi,

The error means that the file param_read.inc is not created correctly.

This file is created by the “treatcards” command (./bin/madevent treatcards), which is called automatically by the generate_events code. This command reads the param_card and creates that file, which is then compiled with the code.
It looks like it fails to write that file correctly. It might be a filesystem problem.
So one solution might be to run that command first, wait until the file is actually written to the filesystem, and then run the usual code.
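
A minimal sketch of that "treatcards, wait, then run" sequence, written as a Python wrapper for the worker node. The helper names (wait_for_stable_file, run_point), the timeout values, and the choice of Source/param_card.inc as the file to wait for (it is the file the compiler error points to) are assumptions for illustration, not part of MadGraph itself; "size unchanged between two polls" is only a heuristic for "fully written".

```python
import os
import subprocess
import time


def wait_for_stable_file(path, timeout=60.0, poll=0.5):
    """Return True once `path` exists, is non-empty, and has the same size
    on two consecutive polls; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not there (yet)
        if size > 0 and size == last_size:
            return True
        last_size = size
        time.sleep(poll)
    return False


def run_point(process_dir):
    """Hypothetical driver: regenerate the include file, wait for it, run."""
    process_dir = os.path.abspath(process_dir)
    madevent = os.path.join(process_dir, "bin", "madevent")
    generate = os.path.join(process_dir, "bin", "generate_events")

    # Step 1: the command named above, which rewrites the include file.
    subprocess.check_call([madevent, "treatcards"], cwd=process_dir)

    # Step 2: assumed location, taken from "../param_card.inc" in the error
    # (reported while compiling in Source/MODEL).
    inc = os.path.join(process_dir, "Source", "param_card.inc")
    if not wait_for_stable_file(inc):
        raise RuntimeError("include file not written in time: " + inc)

    # Step 3: the usual run, as in the failing log ("generate_events -f").
    subprocess.check_call([generate, "-f"], cwd=process_dir)
```

On the worker this would be called as run_point("HEL_UFO_ttZ_v1") after unpacking the directory. If the wait times out, the exception at least separates a filesystem delay from a genuinely malformed include file.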

Another solution might be to run on a local disk of the node rather than on a shared disk; this should minimize the disk access problem.
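
As a rough illustration of the local-disk option, the sketch below copies the process directory into node-local scratch space before anything is compiled. The function name stage_to_local_disk is hypothetical, and using the _CONDOR_SCRATCH_DIR environment variable as the scratch location (with the system temp directory as a fallback) is an assumption about the batch setup.

```python
import os
import shutil
import tempfile


def stage_to_local_disk(shared_dir, scratch_root=None):
    """Copy `shared_dir` into a fresh directory on local scratch space and
    return the path of the copy. `scratch_root` defaults to the condor
    scratch directory if that variable is set, else the system temp dir."""
    if scratch_root is None:
        scratch_root = (os.environ.get("_CONDOR_SCRATCH_DIR")
                        or tempfile.gettempdir())
    # A unique parent directory avoids clashes between jobs on the same node.
    workdir = tempfile.mkdtemp(prefix="mg5_", dir=scratch_root)
    name = os.path.basename(os.path.normpath(shared_dir))
    local_copy = os.path.join(workdir, name)
    # symlinks=True keeps symbolic links as links instead of following them.
    shutil.copytree(shared_dir, local_copy, symlinks=True)
    return local_copy
```

The run would then be started from the returned path, and whatever output is needed copied back to shared storage at the end of the job.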

Cheers,

Olivier

On Jun 19, 2014, at 6:07 AM, Anna Woodard <email address hidden> wrote:

> New question #250433 on MadGraph5_aMC@NLO:
> https://answers.launchpad.net/mg5amcnlo/+question/250433

Anna Woodard (awoodard) said :
#2

Thanks Olivier Mattelaer, that solved my question.

Anna Woodard (awoodard) said :
#3

(In the end, it was a stupid parsing mistake on my part-- but looking at param_card.inc helped me see that.)
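
For anyone hitting the same "CLW = -" style error: a small check along these lines can flag broken entries before the compile step. It is only a sketch, and it assumes that each assignment in param_card.inc has the form "NAME = value" with a plain Fortran real literal on the right-hand side; the function name find_bad_assignments is made up for this example.

```python
import re

# "NAME = value" with arbitrary leading whitespace (assumed file layout).
_ASSIGN = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$")
# Fortran-style real literal, e.g. 1.73D+02, -0.5, 3, .25E-1
_NUMBER = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([dDeE][+-]?\d+)?$")


def find_bad_assignments(text):
    """Return (line_number, name, value) for every assignment whose
    right-hand side is not a complete numeric literal."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = _ASSIGN.match(line)
        if match and not _NUMBER.match(match.group(2)):
            bad.append((lineno, match.group(1), match.group(2)))
    return bad


sample = "      MT = 1.730000D+02\n      CLW = -\n"
print(find_bad_assignments(sample))  # [(2, 'CLW', '-')]
```

An empty list does not prove the card is physically sensible; it only means every value parsed as a number.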