MadGraph5_aMC@NLO

long run on cluster

Asked by lorenzo marafatto on 2023-01-16

Hi all,
I have a problem running a long process.
I use MadGraph and the simulation takes a day or more to run.
I set ServerAliveInterval 30 in .ssh\config,
but that is not enough: I sometimes get a broken pipe and then have to restart from the beginning.
What can I do?
Many thanks

Question information

Language: English

Status: Solved

For: MadGraph5_aMC@NLO

Assignee: No assignee

Solved by: Olivier Mattelaer

Solved: 2023-01-16

Last query: 2023-01-16

Last reply: 2023-01-16

Olivier Mattelaer (olivier-mattelaer) said on 2023-01-16:

Hi,

I typically use one of the following solutions:
1) use "screen" to avoid this issue
2) submit the main job on the cluster itself (so that one core is dedicated to controlling the rest of the application)
(in that case you have to request a single core for your main executable and make sure your cluster allows compute nodes to submit jobs; typically this is allowed)
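
As a rough sketch of option 2, the main MadGraph process can be wrapped in a single-core batch job. Everything specific here is an assumption rather than something stated in this thread: the SLURM scheduler, the 48-hour time limit, the file names mg5_driver.sh and run_commands.txt, and the install path $HOME/MG5_aMC. Adapt them to your own cluster.

```shell
# Write a single-core job script that acts as the MadGraph "driver".
# ASSUMPTIONS: SLURM scheduler; mg5_driver.sh, run_commands.txt and the
# install path below are hypothetical names chosen for illustration.
cat > mg5_driver.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=mg5_driver
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=48:00:00
#SBATCH --output=mg5_driver_%j.log

# Hypothetical install location; adjust to where MG5_aMC lives.
cd "$HOME/MG5_aMC" || exit 1
# run_commands.txt is a hypothetical file holding the MG5 commands to run.
./bin/mg5_aMC run_commands.txt
EOF

# Submit from a login node (commented out so nothing is submitted by accident):
# sbatch mg5_driver.sh
```

Because the scheduler owns the job, it keeps running after the SSH session drops; progress can be followed in the log file named by the --output line.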

Other solutions:
1) use the nohup method.
2) use tmux
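
A minimal sketch of the nohup approach, with the tmux equivalent noted in comments. The helper name run_detached, the log file name, and the MadGraph command line in the usage comment are illustrative assumptions, not something given in this thread.

```shell
# Start a command so that it keeps running after the SSH connection drops.
# run_detached is a hypothetical helper name, not part of MadGraph.
run_detached() {
    log_file=$1
    shift
    # nohup makes the command ignore SIGHUP; stdin, stdout and stderr are
    # all detached from the terminal so nothing depends on the SSH session.
    nohup "$@" > "$log_file" 2>&1 < /dev/null &
    echo $!   # print the PID of the background process
}

# Hypothetical usage, run from the MG5_aMC directory:
#   run_detached mg5_run.log ./bin/mg5_aMC run_commands.txt
#   tail -f mg5_run.log      # follow progress; safe to interrupt
#
# tmux alternative, typed interactively:
#   tmux new -s mg5run       # start a named session, launch MadGraph inside
#   (detach with Ctrl-b d, then log out)
#   tmux attach -t mg5run    # reattach after logging in again
```

Note that nohup, screen and tmux all run the process on the machine you logged in to, so they only help if that machine does not kill long-running user processes on its own.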

Cheers,

Olivier

lorenzo marafatto (lmaraf) said on 2023-01-16:

Thanks Olivier Mattelaer, that solved my question.

lorenzo marafatto (lmaraf) said on 2023-01-17 (last edit on 2023-01-17):

lorenzo marafatto (lmaraf) said on 2023-01-17:

I tried all of these approaches, but the last process was also stopped and screen was disconnected, so I have no result...
Any suggestions on how I could run a long process via ssh?
Thanks so much!
Lorenzo

On Tuesday 17 January 2023 07:15:50 CET, lorenzo marafatto <email address hidden> wrote:

Your question #704421 on MadGraph5_aMC@NLO changed:
https://answers.launchpad.net/mg5amcnlo/+question/704421

lorenzo marafatto posted a new comment:
Hi
I tried all of these approaches but the program still closes after some time.
Maybe I could somehow run it in batch mode?
Many thanks again
Lorenzo Marafatto

On Monday 16 January 2023 15:50:37 CET, Olivier Mattelaer <email address hidden> wrote:

Your question #704421 on MadGraph5_aMC@NLO changed:
https://answers.launchpad.net/mg5amcnlo/+question/704421

Status: Open => Answered

