instant JIT compilation problems

Asked by Damiaan on 2013-03-08

This is my test code:

#!/usr/bin/env /home/s/steinman/dhabets/Root/bin/python
from dolfin import *
import os

mesh = UnitCubeMesh(10,10,10)
print 'mesh ok'
parameters.swig_binary = "swig"
parameters.swig_path = "/scratch/s/steinman/dhabets/Root/bin"
V = VectorFunctionSpace(mesh, "CG", 1)
v = Function(V)

If I run it from the command line, it works fine; if I then call it through MPI using mpirun, it's fine too.

But if I run instant-clean first and then run it through MPI, it fails (after printing 'mesh ok'):

Process 0: Calling FFC just-in-time (JIT) compiler, this may take some time.
In instant.recompile: The module did not compile, see '/scratch/s/steinman/dhabets/.instant/error/instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2/compile.log'
Traceback (most recent call last):
  File "test.py", line 35, in <module>
    V = VectorFunctionSpace(mesh, "CG", 1)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/dolfin/functions/functionspace.py", line 533, in __init__
    FunctionSpaceBase.__init__(self, mesh, element)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/dolfin/functions/functionspace.py", line 78, in __init__
    ufc_element, ufc_dofmap = jit(self._ufl_element)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/dolfin/compilemodules/jit.py", line 70, in mpi_jit
    output = local_jit(*args, **kwargs)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/dolfin/compilemodules/jit.py", line 154, in jit
    return jit_compile(form, parameters=p, common_cell=common_cell)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/ffc/jitcompiler.py", line 71, in jit
    return jit_element(ufl_object, parameters)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/ffc/jitcompiler.py", line 177, in jit_element
    compiled_form, module, form_data, prefix = jit_form(form, parameters)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/ffc/jitcompiler.py", line 145, in jit_form
    cache_dir = cache_dir)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/ufc_utils/build.py", line 72, in build_ufc_module
    **kwargs)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/instant/build.py", line 482, in build_module
    recompile(modulename, module_path, setup_name, new_compilation_checksum)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/instant/build.py", line 105, in recompile
    "compile, see '%s'" % compile_log_filename_dest)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/instant/output.py", line 49, in instant_error
    raise RuntimeError(text)
RuntimeError: In instant.recompile: The module did not compile, see '/scratch/s/steinman/dhabets/.instant/error/instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2/compile.log'

/scratch/s/steinman/dhabets/.instant/error/instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2/compile.log shows:

running build_ext
building '_instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2' extension
creating build
creating build/temp.linux-x86_64-2.7
/scinet/gpc/mpi/openmpi/1.6.0-gcc-v4.7.0/bin/mpicc -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/s/steinman/dhabets/Root/include -I/scratch/s/steinman/dhabets/Root/include -I/scratch/s/steinman/dhabets/Root/include/python2.7 -c instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2_wrap.cxx -o build/temp.linux-x86_64-2.7/instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2_wrap.o -O0
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++ [enabled by default]
creating build/lib.linux-x86_64-2.7
/scinet/gpc/mpi/openmpi/1.6.0-gcc-v4.7.0/bin/mpicxx -pthread -shared build/temp.linux-x86_64-2.7/instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2_wrap.o -L/scratch/s/steinman/dhabets/Root/lib -L/home/s/steinman/dhabets/Root/lib -lboost_math_tr1 -lpython2.7 -o build/lib.linux-x86_64-2.7/_instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2.so
g++: error: build/temp.linux-x86_64-2.7/instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2_wrap.o: No such file or directory
error: command '/scinet/gpc/mpi/openmpi/1.6.0-gcc-v4.7.0/bin/mpicxx' failed with exit status 1

BUT instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2_wrap.o exists in

/scratch/s/steinman/dhabets/.instant/error/instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2

not in

build/temp.linux-x86_64-2.7 or build/lib.linux-x86_64-2.7

What am I doing wrong here?

It's expecting it here:
build/temp.linux-x86_64-2.7/instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2_wrap.o

but it's here:

/instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2_wrap.o

It works fine from the command line and the object is in the right directory.

gpc-f104n084-$ ls build/temp.linux-x86_64-2.7/
instant_module_3dad9986bbd44a40c279eee89a703b4c1f024ae2_wrap.o

Why the difference? Is this related to my previous ticket, where the problem turned out to be the shell call that was being done?
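
To see where the object file actually ended up, a small helper along the following lines may be useful. It is only a sketch: the function names are made up for illustration and are not part of instant or DOLFIN, and the `build/temp.*` layout is assumed from the compile log above.

```python
import os


def find_object_files(root, suffix="_wrap.o"):
    """Return sorted paths (relative to root) of files ending in suffix."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                full = os.path.join(dirpath, name)
                matches.append(os.path.relpath(full, root))
    return sorted(matches)


def in_build_temp(relpath):
    """True if relpath lies inside a distutils-style build/temp.* directory."""
    parts = os.path.normpath(relpath).split(os.sep)
    # Look for a "build" component directly followed by "temp.<platform>".
    return any(a == "build" and b.startswith("temp.")
               for a, b in zip(parts, parts[1:]))
```

Calling `find_object_files` on the module directory named in the error message and passing each result to `in_build_temp` shows whether the compiler wrote the object where the link step expects it.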

Question information

Language: English
Status: Solved
For: DOLFIN
Assignee: No assignee
Solved by: Damiaan
Solved: 2013-03-14
Last query: 2013-03-14
Last reply: 2013-03-13

Damiaan (dhabets) said : #1

Possibly related to question 219270 (popen() calls)?

Kent-Andre Mardal (kent-and) said : #2

instant first compiles the module under a temp dir and moves it to the cache. Since you don't include the absolute paths above, it is hard for me to determine whether you are inside the instant cache or not.

Also, instant has recently been updated such that it employs CMake instead of distutils. It appears from the file names here that you still use the version with distutils. Right?

Kent
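
The "compile under a temp dir, then move to the cache" pattern mentioned here can be sketched generically as follows. This is not instant's actual code: `build_then_cache` and the `build` callback are hypothetical names, and the sketch only illustrates why relative paths such as `build/temp...` depend on which directory the build is running in.

```python
import os
import shutil
import tempfile


def build_then_cache(module_name, cache_dir, build):
    """Build a module in a scratch directory, then move it into cache_dir.

    `build` is a caller-supplied function that writes its output files
    into the directory it is given (hypothetical interface).
    """
    target = os.path.join(cache_dir, module_name)
    if os.path.exists(target):
        return target  # already cached, nothing to build

    tmp_dir = tempfile.mkdtemp(prefix="build_")
    try:
        module_dir = os.path.join(tmp_dir, module_name)
        os.makedirs(module_dir)
        build(module_dir)  # all relative build paths live under module_dir
        if not os.path.isdir(cache_dir):
            os.makedirs(cache_dir)
        shutil.move(module_dir, target)
        return target
    finally:
        # Remove whatever is left of the scratch directory.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```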


Damiaan (dhabets) said : #3

Kent, every relative path is under /scratch/s/steinman/dhabets/.instant.

I'll check on the instant version and will switch if needed.

Damiaan (dhabets) said : #4

I'm using 1.1.0; I assume you were referring to 1.2.0?

Damiaan (dhabets) said : #5

1.2.x makes things worse:

OSError: [Errno 30] Read-only file system: '/home/s/steinman/dhabets/.instant'

It no longer seems to recognize that this is set:

INSTANT_CACHE_DIR=/scratch/s/steinman/dhabets/.instant

Johannes Ring (johannr) said : #6

Setting INSTANT_CACHE_DIR works fine for me with 1.2.x.

Damiaan (dhabets) said : #7

Apparently INSTANT_ERROR_DIR needs to be set now too. 1.2.x did fix the initially reported problem though.

Johan Hake (johan-hake) said : #8

Is this problem solved, and can you summarize what the actual problem was? Was it setting INSTANT_CACHE_DIR? Also, did you set a cache dir that is shared by all MPI ranks (over some networked file system), or a dir on the local compute node?

Recently we have seen problems with popen hanging on clusters and wonder if it could be related to your problem.

Damiaan (dhabets) said : #9

Hi Johan, I had the popen problem before, I referenced the ticket in my 2nd reply.

I installed 1.2.x, set INSTANT_CACHE_DIR and INSTANT_ERROR_DIR, and set CC/CXX to gcc/g++. Only then does it compile properly.

The cache dir is on a shared fs.
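
The settings described in this reply could be collected in one place roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the `error` subdirectory simply mirrors the path that appears in the original error message, and setting the variables from inside Python (rather than exporting them in the job script) only helps if it happens before `from dolfin import *` on every MPI rank.

```python
import os


def instant_env(scratch_root, cc="gcc", cxx="g++"):
    """Return the environment settings reported to work in this thread."""
    cache_dir = os.path.join(scratch_root, ".instant")
    return {
        "INSTANT_CACHE_DIR": cache_dir,
        # Assumed location: same layout as the path in the error message.
        "INSTANT_ERROR_DIR": os.path.join(cache_dir, "error"),
        # Plain compilers rather than the MPI wrappers.
        "CC": cc,
        "CXX": cxx,
    }


def apply_env(settings, environ=os.environ):
    """Copy the settings into the given environment mapping."""
    environ.update(settings)
```

In a script this would be called as `apply_env(instant_env("/scratch/s/steinman/dhabets"))` at the very top, before DOLFIN is imported.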

Johan Hake (johan-hake) said : #10

> Hi Johan, I had the popen problem before, I referenced the ticket in my
> 2nd reply.

Yes, but what does that reference refer to? Is it a Launchpad ticket?
It doesn't seem so. Could you post the full URL?

> I installed 1.2.x, set INSTANT_CACHE_DIR and INSTANT_ERROR_DIR, and set
> CC/CXX to gcc/g++. Only then does it compile properly.

Ok, so the error issued from popen was most probably related to
insufficient write permissions?

> The cache dir is on a shared fs.

Ok,

Johan

Damiaan (dhabets) said : #11

Johan, it's the question #:

https://answers.launchpad.net/dolfin/+question/219270

No, the popen() error wasn't related to write permissions; it was a combination of things: the Open MPI version used, the InfiniBand infrastructure, etc.

The error above doesn't have anything to do with popen as far as I can tell.

I can make the error recur by setting CC=mpicc and CXX to its MPI wrapper as well. It basically ends up compiling in the parent directory, not in the build folder.
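
Since the failure reappears when CC and CXX point at the MPI wrappers, a quick check before the JIT step can flag that situation. The thread does not establish why the wrappers misbehave here, so this sketch only detects the configuration; `mpi_wrapper_vars` is a hypothetical helper and the list of wrapper names is the usual Open MPI set, taken as an assumption.

```python
import os

# Common MPI compiler wrapper names (assumed list, extend as needed).
MPI_WRAPPERS = ("mpicc", "mpicxx", "mpic++", "mpiCC")


def mpi_wrapper_vars(environ):
    """Return the CC/CXX entries in environ that name an MPI wrapper."""
    flagged = {}
    for var in ("CC", "CXX"):
        value = environ.get(var, "")
        words = value.split()
        # Compare only the executable name, ignoring its directory and flags.
        if words and os.path.basename(words[0]) in MPI_WRAPPERS:
            flagged[var] = value
    return flagged
```

Calling `mpi_wrapper_vars(os.environ)` at the top of the test script returns an empty dict when plain compilers such as gcc/g++ are in use.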

Johan Hake (johan-hake) said : #12

On 03/20/2013 01:16 PM, Damiaan wrote:
> Johan, it's the question #:
>
> https://answers.launchpad.net/dolfin/+question/219270

Ahh, thanks!

> No, the popen() error wasn't related to write permissions; it was a
> combination of things: the Open MPI version used, the InfiniBand
> infrastructure, etc.

Ok.

> The error above doesn't have anything to do with popen as far as I can
> tell.
>
> I can make the error recur by setting CC=mpicc and CXX to its MPI
> wrapper as well. It basically ends up compiling in the parent directory,
> not in the build folder.

strange...

Johan