Running pydolfin on a cluster with no compilers on the backend nodes

Asked by Patrick Farrell on 2013-03-15

I am trying to run dolfin on a cluster with no compilers on the backend nodes. Does pydolfin have a clean way of dealing with this?

One answer to this problem would be to run a dry run on a very small mesh on the login nodes, so that all expressions and forms are compiled. However, this has several flaws that are fatal (for me). For simulations on meshes that are generated externally, it can be a major headache to make a relevant coarse mesh. Even if you manage to make a coarse mesh, your problem might diverge on it. And even if the dry run succeeds, the system administrators get mad at you for running computational jobs on shared login nodes.

One possible remedy would be to have a parameter["dummy_run"] that, when set, neuters all solves (they would just return immediately on entry into the function). Assuming most of your time is spent in the solver, this would let the forward model whiz through the entire computation, JITing as it goes. This would involve changing the LUSolver and KrylovSolver classes, but not much else.
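The proposed switch could be sketched like this. The parameter name and the wrapper class below are purely illustrative, not existing DOLFIN API; the point is only that a solve which returns immediately still leaves all JIT compilation (done when forms are built) in the cache:

```python
# Illustrative sketch only: a solver wrapper honouring a hypothetical
# "dummy_run" parameter. When the flag is set, solve() returns
# immediately, so a forward run still triggers all JIT compilation
# without paying for the solves themselves.
parameters = {"dummy_run": False}

class NeuterableKrylovSolver:
    def __init__(self, method="gmres"):
        self.method = method
        self.real_solves = 0

    def solve(self, A, x, b):
        if parameters["dummy_run"]:
            return 0  # neutered: report zero iterations and return
        self.real_solves += 1
        # ... the actual Krylov solve would happen here ...
        return 1

parameters["dummy_run"] = True
solver = NeuterableKrylovSolver()
iterations = solver.solve(None, None, None)
print(iterations)  # 0
```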

Are there any other cleaner/easier/currently available ways to handle this situation?

Question information

Language: English
Status: Answered
For: DOLFIN
Assignee: No assignee
Last query: 2013-03-15
Last reply: 2013-03-21
Johan Hake (johan-hake) said : #1

On 03/15/2013 04:56 PM, Patrick Farrell wrote:
> I am trying to run dolfin on a cluster with no compilers on the
> backend nodes. Does pydolfin have a clean way of dealing with this?
>
> One answer to this problem would be to run a dry-run on a very small
> mesh on the login nodes, so that all expressions and forms are
> compiled. However, this has several flaws that are fatal (for me).
> For simulations on meshes that are generated externally, it can be a
> major headache to make a relevant coarse mesh. Even if you manage to
> make a coarse mesh, your problem might diverge. Even if you manage to
> coarsen the mesh, the system administrators get mad at you for
> running computational jobs on shared login nodes.

I guess installing a compiler on the nodes isn't always possible? I
have not had issues with this on the semi-large to large clusters I
have encountered. Even though the clusters I have used have had
compilers on the nodes, I always do a dry run of the first time step
on the front node, which so far has covered my needs.

> One possible remedy would be to have a parameter["dummy_run"], that
> when set neuters all solves (they would just return immediately on
> entry into the function). Assuming most of your time is spent in the
> solver, this would let the forward model whiz through the entire
> computation, JITing as it goes. This would involve changing the
> LUSolver and KrylovSolver classes, but not much else.

Would this work in practice? A lot of scripts/programs include
postprocessing of data, which would not mean anything if no solve was
performed. If this is adopted, I guess one could also add something
similar for assemble.

> Are there any other cleaner/easier/currently available ways to handle
> this situation?

Not that I am aware of. This has been an open issue as far as I know.

Johan

Garth Wells (garth-wells) said : #2

I think this issue requires some careful thought (and resisting the temptation of a quick hack). A related case that needs to be considered is cross compilation, i.e. when the login node is different from the compute nodes.

Jan Blechta (blechta) said : #3

I assume that somewhere in DOLFIN there is a setting for the JIT
compiler call. So it seems logical that you need to set it to
    ssh user@machine_with_compiler c++ ...
instead of
    c++ ...
Of course you would need to prevent ssh from asking for a password.
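A hedged sketch of how such a wrapped command could be constructed. The host name, compile command, and cache path below are placeholders; key-based ssh authentication is assumed so no password prompt appears, and the JIT cache directory is assumed to be on a shared filesystem:

```python
import shlex

# Build an ssh-wrapped compile command, as suggested above.
# All names here (host, command, working directory) are placeholders.
def remote_compile_cmd(host, compile_cmd, workdir):
    # Run the compile in the same directory on the remote side; the
    # cache directory is assumed to live on a shared filesystem.
    remote = "cd {}; {}".format(shlex.quote(workdir), compile_cmd)
    return "ssh {} {}".format(host, shlex.quote(remote))

cmd = remote_compile_cmd("user@machine_with_compiler",
                         "c++ -O2 -shared module.cpp -o module.so",
                         "/scratch/jit-cache/form_1234")
print(cmd)
```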

Andy R Terrel (andy-terrel) said : #4

Patrick,

Let me give a few of my thoughts on the matter:

1) Large jobs cache the code and only need one node to compile. As far as I can tell, the current state would have every node compile the program in pydolfin. This does not scale and will break our filesystem.

2) Commercial compilers require license checkouts, and TACC specifically forbids compiling on the backend for this reason. The backend can get to gcc, but then you lose quite a bit of speed. When I get back to Texas we can explore building a gcc stack, but the system admins are avoiding it.

One solution would be to have something that runs through your scripts, finds the bits necessary to compile, and does so on the front end, but I think just running with a small domain to cache everything works well. Of course, if we want to make the tool easy to use on these systems, we should think about making it so the user doesn't need to know these details.

Please don't do an ssh; if this gets called from every node you can take down the login node.

-- Andy

Jan Blechta (blechta) said : #5

> Please don't do an ssh, if this gets called from every node you can
> take down the login node.

I thought the JIT compiling mechanism in DOLFIN calls the compiler only
from the node that reaches a compilation request first (by checking dirs
and files in the cache dir), doesn't it? Then this wouldn't be an issue.

Jan

Andy R Terrel (andy-terrel) said : #6

On Sat, Mar 16, 2013 at 10:35 AM, Jan Blechta
<email address hidden> wrote:
>> Please don't do an ssh, if this gets called from every node you can
>> take down the login node.
>
> I thought the JIT compiling mechanism in DOLFIN calls the compiler only
> from the node that reaches a compilation request first (by checking dirs
> and files in the cache dir), doesn't it? Then this wouldn't be an issue.

If I have 1000 nodes hit roughly the same point, they all do a file
stat. Between the time your ssh connection opens and the compiler is
called, all the other nodes will hit this part of the code. Now you
have 1000 logins, all calling the compiler. The filesystem will be
strained as well, with 1000 processes trying to overwrite a file on a
parallel filesystem (which is already stressed). At best you will get
the output of the last compiler called, but you can often end up with
a corrupted file. Now the system admin will shut you down, because
1000 processes calling any compiler will stress the shared system.

My understanding is that there are no calls to force the jitting onto
only one node. Even if there were, this is also a bad solution. Take
the example where 1000 nodes all hit the jit line and rank 0 compiles.
You will have the other 999 nodes polling the filesystem, which
stresses the parallel filesystem's metadata server, once again hitting
the tragedy of the commons, and the system administrators will contact
you.

If you do require a compile for sure, we should use a free compiler
and save the cache on a local disk on each node.


Johan Hake (johan-hake) said : #7

On Mar 16, 2013 4:51 PM, Andy R Terrel wrote:
> 1) Large jobs cache the code and only need one node to compile. As far
> as I can tell, the current state would have every node compile the
> program in pydolfin. This does not scale and will break our filesystem

It is only the rank 0 process that performs the compile. The other
processes wait. Once the rank 0 process is finished, the other
processes read from the cache. No file locking is done, only a simple
if statement to check which process is executing, together with a
barrier.
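The pattern described above can be sketched as follows. The communicator here is a single-process stand-in (with mpi4py one would use a real `MPI.COMM_WORLD`); none of the names below are actual dolfin code:

```python
# Sketch of the rank-0-compiles pattern: only process 0 runs the JIT
# compile; the others wait at a barrier, then read the finished module
# from the cache. SerialComm is a single-process stand-in for an MPI
# communicator; this is not the real dolfin implementation.
class SerialComm:
    rank = 0
    size = 1
    def barrier(self):
        pass  # no-op in serial; MPI_Barrier in a real parallel run

def mpi_jit(comm, compile_fn, load_from_cache_fn):
    if comm.rank == 0:
        module = compile_fn()          # only rank 0 touches the compiler
        comm.barrier()                 # signal: cache is now populated
    else:
        comm.barrier()                 # wait for rank 0 to finish
        module = load_from_cache_fn()  # read the compiled result
    return module

module = mpi_jit(SerialComm(),
                 compile_fn=lambda: "compiled module",
                 load_from_cache_fn=lambda: "cached module")
print(module)  # "compiled module" on rank 0
```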

> 2) Commercial compilers require license checkouts and TACC specifically
> forbids compiling on the backend for this reason. The backend can get
> to gcc but then you loose quite a bit of speed. When I get back to
> Texas we can explore building a gcc stack, but the system admins are
> avoiding it.

Good point. I think this is the reason we shifted to cmake for the
jitting, so that we use the same compiler for the main dolfin lib as
for the jit-compiled modules on the nodes. But with no compiler on the
nodes this won't work.

> One solution would be to have something that runs through your scripts
> and finds the bits necessary to compile to do so on the front end, but I
> think just running with a small domain to cache everything works well.
> Of course if we want to make the tool easy to use on these systems we
> should thing about making it so the user doesn't need to know these
> details

True, but HPC is difficult as it is. A set of standard instructions
for running FEniCS on clusters might not be that bad.

> Please don'd do an ssh, if this gets called from every node you can take
> down the login node.

As it is only process 0 that actually performs the jit, it might not be
that bad. One could go so far as letting the compute nodes generate all
code while letting the front node do the actual compile. But that
sounds horrible to debug and prone to error.

Johan


Jan Blechta (blechta) said : #8

On Sat, 16 Mar 2013 23:06:01 -0000, Johan Hake wrote:
> As it is only the process 0 that actually performs the jit it might
> not be that bad. One could go so far as letting the compute node
> generate all code while letting the front node do the actually
> compile. But it sounds horrible to debug and prone for error.

It could be done this way (not tested): change these lines in
instant/instant/build.py
  cmd = "cmake -DDEBUG=TRUE . "
  cmd = "make VERBOSE=1 "
to
  cmd = 'ssh user@node "cd $PWD; init-dolfin; cmake -DDEBUG=TRUE . "'
  cmd = 'ssh user@node "cd $PWD; init-dolfin; make VERBOSE=1"'
where init-dolfin is the command you use to initialize the environment
variables for DOLFIN. Note that anybody can make this tiny change to
Instant by hand, according to their needs.

Moving FFC work to another node would be more complicated.

Jan


Kent-Andre Mardal (kent-and) said : #9

On 17 March 2013 02:26, Jan Blechta wrote:

We could add user-defined compilation commands to Instant if this
solves the problem. It seems a very simple solution compared to
implementing a dummy run in dolfin.

Kent
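A hypothetical sketch of what such a hook might look like. The function and parameter names below are assumptions for illustration, not existing Instant API; the only idea taken from the thread is replacing the hard-coded cmake/make commands with a user-supplied override:

```python
# Hypothetical sketch only: instead of hard-coding the build commands
# (as instant/instant/build.py does today), accept user overrides.
DEFAULT_BUILD_COMMANDS = ["cmake -DDEBUG=TRUE .", "make VERBOSE=1"]

def build_commands(user_commands=None):
    """Return the commands the JIT build would run, allowing a
    user-supplied override, e.g. to route compilation elsewhere."""
    if user_commands is not None:
        return list(user_commands)
    return list(DEFAULT_BUILD_COMMANDS)

# Default behaviour:
print(build_commands())
# A user on a compiler-less backend could substitute their own commands:
custom = build_commands(['ssh user@node "cd $PWD; make VERBOSE=1"'])
print(custom[0])
```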

Garth Wells (garth-wells) said : #10

On 17 March 2013 13:41, Kent-Andre Mardal wrote:
I think that this is the right way. Dummy runs, ssh, etc. are messy
and in some cases very system dependent. We basically want a way to
insert something into the Instant cache.

Garth

Jan Blechta (blechta) said : #11

On Sun, 17 Mar 2013 13:41:10 -0000, Kent-Andre Mardal wrote:
> We could add user-defined compilation commands to instant

A user can do this customization by hand if needed. Probably most
users don't need this.

> if this
> solves the problems.

This would need testing.


Jan Blechta (blechta) said : #12

On Sun, 17 Mar 2013 14:06:07 -0000, Garth Wells wrote:
> I think that this is the right way. Dummy runs, ssh, etc are messy and

This proposed solution uses ssh...

> in cases very system dependent. We basically want a way to insert
> something into the Instant cache.

Andy R Terrel (andy-terrel) said : #13

Johan,

Can you point me to the code that limits the compile to MPI rank 0?
All I see is a lock from instant that is called from ffc. FFC doesn't
even have a dependency on MPI. Since instant is using a wait lock,
ffc/jitcompiler.py:177 will enter the try block, wait for the lock to
free, and then call the compiler.

I'm not sure I'm reading the logic completely right, but when you
call mpirun on demo...py, every node will call the jit compiler.

-- Andy

Jan Blechta (blechta) said : #14

On Mon, 18 Mar 2013 14:41:19 -0000, Andy R Terrel wrote:
> Johan,
>
> Can you point me to the code the limits the compile to mpirank 0? All
> I see is a lock from instant that is called from ffc. FFC doesn't
> even have a dependency on mpi. Since instant is using a wait lock,
> the ffc/jitcompiler.py:177 will enter the try block, wait for the lock
> to free, and then call the compiler.

site-packages/dolfin/compilemodules/jit.py: mpi_jit_decorator
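In decorator form, the mechanism that file implements could be sketched like this. Again, the communicator is a single-process stand-in and the code is an illustration of the idea, not the actual dolfin source:

```python
import functools

# Sketch in the spirit of dolfin's mpi_jit_decorator: the wrapped jit
# function runs on rank 0 first; after a barrier, the remaining ranks
# call it too, now hitting the warm cache. SerialComm is a
# single-process stand-in; this is not the real dolfin code.
class SerialComm:
    rank = 0
    def barrier(self):
        pass  # no-op in serial; MPI_Barrier in a real parallel run

def mpi_jit_decorator(comm):
    def wrap(jit_fn):
        @functools.wraps(jit_fn)
        def wrapper(*args, **kwargs):
            if comm.rank == 0:
                result = jit_fn(*args, **kwargs)  # rank 0 compiles
                comm.barrier()
                return result
            comm.barrier()                        # others wait...
            return jit_fn(*args, **kwargs)        # ...then read cache
        return wrapper
    return wrap

@mpi_jit_decorator(SerialComm())
def jit(form):
    return "compiled " + form

print(jit("bilinear form"))  # compiled bilinear form
```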

>
> I'm not sure if I'm reading the logic right completely but when you
> call mpirun demo...py every node will call the jitcompiler.

Is it some specific demo? Or can you supply code which reproduces it?


Andy R Terrel (andy-terrel) said : #15

On Mon, Mar 18, 2013 at 7:51 AM, Jan Blechta wrote:
> site-packages/dolfin/compilemodules/jit.py: mpi_jit_decorator

Thanks.

This code will hit the python import problem on larger scale runs
(this is a problem throughout dolfin and not specific to the jitting).
But I see now how it is intercepting the ffc jit.

So the ssh will not flood our login node, but I would still recommend
doing a dummy run.

>
> Is it some specific demo? Or can you supply code which reproduces it?
>

No, I was thinking out loud.

-- Andy


Jan Blechta (blechta) said : #16

On Mon, 18 Mar 2013 16:45:58 -0000
Andy R Terrel <email address hidden> wrote:
> This code will hit the python import problem on larger scale runs
> (this is a problem throughout dolfin and not specific to the jitting).

Could you explain, Andy, what you are talking about? I'm not sure...

Jan


Andy R Terrel (andy-terrel) said : #17

>> This code will hit the python import problem on larger scale runs
>> (this is a problem throughout dolfin and not specific to the jitting).
>
> Could you explain, Andy, what you are talking about? I'm not sure...
>

On parallel filesystems, when a dynamic code "imports" a module
(shared object or otherwise), the filesystem can easily be overloaded.
Think of it this way: 1000 processors all ask for the same file, which
requires a metadata stat first (usually handled by a single machine)
and then a stream from the appropriate object server. As the number of
processors increases, this resource gets completely saturated and you
have an anti-scaling property. With compiled code, supercomputing
centers usually have scripts that read the binary and use an ssh tree
to push it out to the compute nodes.

Python has been well documented to have this problem in spades.

* See Addressing the Catastrophic Loading Problem with Walla in
http://jarrodmillman.com/scipy2011/pdfs/mandli_etal.pdf
* SciPy Talk: http://www.youtube.com/watch?v=BpuykTOy4a0

This jitting approach makes it difficult to apply even some of the
often-practiced remedies for this problem (such as scanning modules
for import statements, using import hooks, or "freezing" Python into
a static compile).

-- Andy
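
[Editor's note: the mitigation tools like Walla aim for can be sketched as a read-once-and-broadcast pattern: a single rank touches the parallel filesystem and ships the bytes to everyone else over the interconnect. A minimal illustration follows; `FakeComm` is a stand-in so it runs without mpi4py, and a real version would use an MPI communicator's `bcast`.]

```python
class FakeComm:
    """Stand-in for an MPI communicator; with one rank, bcast is a no-op."""
    rank = 0
    def bcast(self, obj, root=0):
        return obj  # a real communicator would ship obj from root to all ranks

def read_once_and_broadcast(comm, path):
    """Only rank 0 stats and reads the file; every other rank receives
    the bytes over the network instead of hammering the metadata server."""
    data = None
    if comm.rank == 0:
        with open(path, "rb") as f:
            data = f.read()
    return comm.bcast(data, root=0)
```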

Jan Blechta (blechta) said : #18


Thanks, Andy. Good explanation.

Johan Hake (johan-hake) said : #19

Andy!

Thanks for pointing out all these potential bottlenecks. People are now
starting to try out PyDolfin on HPC machines, so we are most probably
going to run into the catastrophic loading problem. Instant, which we
use for the JIT compilation, was not at all designed with this in mind.
Walla seems like an interesting library to look into, but it appears to
be Blue Gene specific? It also looks like it is no longer developed.

  https://bitbucket.org/wscullin/walla/wiki/Home

Johan

Andy R Terrel (andy-terrel) said : #20

I think walla is dead. For TACC's Stampede machine we might be able to
move everything to local disk via some ssh tree. I'll have to check
what is going on with the other solutions out there.

-- Andy

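
[Editor's note: a sketch of that staging idea as it might run in a job prologue: copy the JIT cache to node-local scratch once, then point instant at the copy so module imports hit local disk rather than the parallel filesystem. Whether instant honours an `INSTANT_CACHE_DIR` environment variable, and the paths used, are assumptions here, not verified behaviour.]

```python
import os
import shutil
import tempfile

def stage_instant_cache(shared_cache, local_root=None):
    """Copy the instant JIT cache from the shared filesystem to
    node-local scratch and point instant at the copy.

    INSTANT_CACHE_DIR is an *assumed* configuration hook; check your
    instant version before relying on it."""
    local_root = local_root or tempfile.mkdtemp(prefix="instant-stage-")
    local_cache = os.path.join(local_root, "instant-cache")
    if os.path.isdir(shared_cache):
        shutil.copytree(shared_cache, local_cache)  # one copy per node, not per rank
    else:
        os.makedirs(local_cache)  # nothing cached yet; start empty locally
    os.environ["INSTANT_CACHE_DIR"] = local_cache  # assumed env var, see note above
    return local_cache
```

On a real machine this would run once per node (e.g. from the batch prologue or via the ssh tree Andy mentions), not once per MPI rank.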

Johan Hake (johan-hake) said : #21

Ok, cool!

Johan

