python loop over cells for DG0 elements hangs in parallel

Asked by Marco Morandini

I'm using dolfin 1.0, built using dorsal.
The following code works fine in parallel (4 processes, mpirun -np 4) for nx = 13, but hangs on my PC as soon as I increase it to nx = 14:

#-------------
from dolfin import *

nx = 13
mesh = UnitCube(nx,10,10)

DG = FunctionSpace(mesh, "DG", 0)
E = Function(DG)
#print cells(mesh)
i = 0
global_idx = mesh.parallel_data().global_entity_indices(3)
#print mesh
#print global_idx
print "Number of cells", mesh.num_cells()
for c in cells(mesh):
    i = i + 1
    #E.vector()[c.index()] = 1./c.volume()
    if c.index() >= mesh.num_cells():  # sanity check: local cell indices are 0-based
        print "XXXXX", c.index()
        break
    E.vector()[global_idx.array()[c.index()]] = 1./c.volume()
print i
V = VectorFunctionSpace(mesh, "CG", 1)
print "Done VF"
#-----------

I understand I should use a compiled Expression here; I'll ask a different question for the problems I'm having with the expression.

Output and backtraces below.

Thanks,

Marco

Output for nx = 13:

marco@pao:~/Projects/Cta/Alioli/BoundaryConditions/elas/3D> mpirun -np 4 python xxx.py
Process 0: Number of global vertices: 1694
Process 0: Number of global cells: 7800
Small graph
Process 0: Partitioned mesh, edge cut is 775.
Process 1: Partitioned mesh, edge cut is 775.
Process 2: Partitioned mesh, edge cut is 775.
Process 3: Partitioned mesh, edge cut is 775.
Number of cells 1950
Number of cells 1950
Number of cells 1950
Number of cells 1950
1950
1950
1950
1950
Done VF
Done VF
Done VF
Done VF
marco@pao:~/Projects/Cta/Alioli/BoundaryConditions/elas/3D>

Output for nx = 14:

marco@pao:~/Projects/Cta/Alioli/BoundaryConditions/elas/3D> mpirun -np 4 python xxx.py
Process 0: Number of global vertices: 1815
Process 0: Number of global cells: 8400
Process 1: Partitioned mesh, edge cut is 790.
Process 2: Partitioned mesh, edge cut is 790.
Process 3: Partitioned mesh, edge cut is 790.
Small graph
Process 0: Partitioned mesh, edge cut is 790.
Number of cells 2098
Number of cells 2100
Number of cells 2102
Number of cells 2100
2098

and all four processes stay there, three with this backtrace:

#0 0x00007fda4a7fb4e8 in poll () from /lib64/libc.so.6
#1 0x00007fda4935e130 in poll_dispatch () from /home/marco/local/openmpi-1.4.3/lib/libopen-pal.so.0
#2 0x00007fda4935ce52 in opal_event_base_loop () from /home/marco/local/openmpi-1.4.3/lib/libopen-pal.so.0
#3 0x00007fda49351d21 in opal_progress () from /home/marco/local/openmpi-1.4.3/lib/libopen-pal.so.0
#4 0x00007fda4981f3d5 in ompi_request_default_wait_all () from /home/marco/local/openmpi-1.4.3/lib/libmpi.so
#5 0x00007fda24811cb6 in ompi_coll_tuned_allreduce_intra_recursivedoubling ()
   from /home/marco/local/openmpi-1.4.3/lib/openmpi/mca_coll_tuned.so
#6 0x00007fda4983393b in PMPI_Allreduce () from /home/marco/local/openmpi-1.4.3/lib/libmpi.so
#7 0x00007fda2cb14f7c in VecAssemblyBegin_MPI(_p_Vec*) () from /home/marco/local/Fenics/lib/libpetsc.so
#8 0x00007fda2cafcc69 in VecAssemblyBegin(_p_Vec*) () from /home/marco/local/Fenics/lib/libpetsc.so
#9 0x00007fda33073dd5 in dolfin::PETScVector::apply (this=0x1ca9b10, mode=<optimized out>)
    at /home/marco/Programmi/Dolphin/src_from_dorsal/dolfin-1.0.0/dolfin/la/PETScVector.cpp:350
#10 0x00007fda3361af65 in _set_vector_items_value (self=0x1ca9b10, op=<optimized out>, value=8400.0000000005512)
    at /home/marco/Programmi/Dolphin/src_from_dorsal/dolfin-1.0.0/dolfin/swig/la_get_set_items.i:377
#11 0x00007fda3361b224 in _wrap__set_vector_items_value (args=<optimized out>)
    at /home/marco/Programmi/Dolphin/src_from_dorsal/dolfin-1.0.0/dorsal_build_dir/dolfin/swig/dolfinPYTHON_wrap.cxx:47973
#12 ...... python stuff from here below

the last one (I think the one that printed "2098") with this:

#0 0x00007f078fa934e8 in poll () from /lib64/libc.so.6
#1 0x00007f078e5f6130 in poll_dispatch () from /home/marco/local/openmpi-1.4.3/lib/libopen-pal.so.0
#2 0x00007f078e5f4e52 in opal_event_base_loop () from /home/marco/local/openmpi-1.4.3/lib/libopen-pal.so.0
#3 0x00007f078e5e9d21 in opal_progress () from /home/marco/local/openmpi-1.4.3/lib/libopen-pal.so.0
#4 0x00007f078eab73d5 in ompi_request_default_wait_all () from /home/marco/local/openmpi-1.4.3/lib/libmpi.so
#5 0x00007f0769aa9cb6 in ompi_coll_tuned_allreduce_intra_recursivedoubling ()
   from /home/marco/local/openmpi-1.4.3/lib/openmpi/mca_coll_tuned.so
#6 0x00007f078eaa54cb in ompi_comm_nextcid () from /home/marco/local/openmpi-1.4.3/lib/libmpi.so
#7 0x00007f078eaa117e in ompi_comm_dup () from /home/marco/local/openmpi-1.4.3/lib/libmpi.so
#8 0x00007f078eacf876 in PMPI_Comm_dup () from /home/marco/local/openmpi-1.4.3/lib/libmpi.so
#9 0x00007f077819840e in dolfin::MPI::barrier ()
    at /home/marco/Programmi/Dolphin/src_from_dorsal/dolfin-1.0.0/dolfin/common/MPI.cpp:77
#10 0x00007f07787d13e3 in _wrap_MPI_barrier (args=<optimized out>)
    at /home/marco/Programmi/Dolphin/src_from_dorsal/dolfin-1.0.0/dorsal_build_dir/dolfin/swig/dolfinPYTHON_wrap.cxx:7169
#11 ...... python stuff from here below

Question information

Language: English
Status: Solved
For: DOLFIN
Assignee: No assignee
Solved by: Marco Morandini
Revision history for this message
Johan Hake (johan-hake) said :
#1

I actually do not know exactly why it does not work with nx = 14, but I see
that it works when the cells are equally distributed across the processors.
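
A hedged reading of the backtraces (my addition, not confirmed in the thread): in dolfin 1.0 each `E.vector()[i] = v` from Python ends in `PETScVector::apply` and PETSc's collective `VecAssemblyBegin` (visible in the first backtrace), so every rank must make the same number of assignments. With nx = 14 the partitioner gives the ranks 2098/2100/2102/2100 cells, the loop lengths differ, and the collective calls no longer match up, a classic deadlock pattern. A toy illustration:

```python
# Sketch (assumption, not from the thread): why unequal cell counts can deadlock.
# Each "vec[i] = v" on a distributed PETSc vector triggers a collective
# VecAssemblyBegin/End. If ranks loop a different number of times, their
# collective call counts differ and the ranks wait for each other forever.

def collective_calls(local_cell_counts):
    """One collective apply() per local cell on each rank."""
    return list(local_cell_counts)

balanced = collective_calls([1950, 1950, 1950, 1950])    # nx = 13
unbalanced = collective_calls([2098, 2100, 2102, 2100])  # nx = 14

# All ranks agree -> collectives match up, no deadlock.
assert len(set(balanced)) == 1
# Ranks disagree -> the rank with 2098 cells finishes early while the
# others block inside VecAssemblyBegin, as seen in the backtraces.
assert len(set(unbalanced)) > 1
```

This also matches the observation above: the nx = 13 case, with exactly 1950 cells on every rank, completes.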

If you just want the inverse volume of the cell you can just use:

  1./tetrahedron.volume

You can also make an expression with the same result, which should just work
in parallel.

###########################
from dolfin import *

nx = 14
mesh = UnitCube(nx,10,10)

cpp_code = """
class MyFunc : public Expression
{
public:

  boost::shared_ptr<Mesh> mesh;

  MyFunc() : Expression()
  {
  }

  void eval(Array<double>& values, const Array<double>& x,
            const ufc::cell& c) const
  {
    dolfin_assert(mesh);
    const Cell cell(*mesh, c.index);
    values[0] = 1.0/cell.volume();
  }
};
"""

e = Expression(cpp_code)
e.mesh = mesh

# Use e as you would have used E

###########################

Johan


Marco Morandini (morandini) said :
#2

>Johan Hake proposed the following answer:
>I actually do not know exactly why it does not work with nx = 14, but I see
>that when the number of cells are equally distributed on each processor it
>works.

Interesting. If I don't find the reason I'll file a bug report next week,
just so as not to lose track of this.

>If you just want the inverted volume of the cell you can just use:
>
> 1./tetrahedron.volume

Thanks! I completely missed this.

Thanks again,

Marco

Garth Wells (garth-wells) said :
#3

This is an example of where direct indexing into vectors should be avoided. It will break in parallel.
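
To make this point concrete, here is a minimal sketch (illustrative helper, not dolfin API) of how a distributed PETSc vector is laid out: each process owns one contiguous slice of global indices, and this slice generally does not coincide with the set of cells the mesh partitioner assigned to that process.

```python
# Hedged sketch: PETSc-style contiguous ownership of a distributed vector.
# (The helper name is illustrative, not part of dolfin or PETSc's Python API.)

def ownership_range(global_size, rank, nprocs):
    """Contiguous [start, end) slice of global indices owned by `rank`,
    matching PETSc's default layout (remainder spread over the low ranks)."""
    base, rem = divmod(global_size, nprocs)
    start = rank * base + min(rank, rem)
    end = start + base + (1 if rank < rem else 0)
    return start, end

# 8400 cells over 4 processes: each process owns a 2100-wide global slice,
# which need not match the 2098/2100/2102/2100 cells the mesh partitioner
# placed on each process. Writing to a global index outside the owned slice
# requires communication during assembly.
assert [ownership_range(8400, r, 4) for r in range(4)] == \
    [(0, 2100), (2100, 4200), (4200, 6300), (6300, 8400)]
```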

Johan Hake (johan-hake) said :
#4

On Friday January 20 2012 12:55:47 Garth Wells wrote:
> This is an example of where direct indexing into vectors should be
> avoided. It will break in parallel.

Did you read the code? He did actually use the global_entity_indices
MeshFunction to get the correct global index from the local index. :)

Johan

Garth Wells (garth-wells) said :
#5

On 20 January 2012 12:35, Johan Hake
<email address hidden> wrote:
> On Friday January 20 2012 12:55:47 Garth Wells wrote:
>> This is an example of where direct indexing into vectors should be
>> avoided. It will break in parallel.
>
> Did you read the code?

Yes.

> He did actually use the global_entity_indices
> MeshFunction to get the correct global index from the local index. :)
>

And where in DOLFIN is it defined whether to use local or global
numbering when indexing directly into a vector? When indexing into an
array (I know a Vector is not an array, but permitting indexing treats it
like one) it is natural to work with local indices, since one can
operate directly on the raw array.

Garth


Marco Morandini (morandini) said :
#6

On 01/23/2012 10:15 AM, Garth Wells wrote:
> And where in DOLFIN is it defined whether to use local or global
> numbering when indexing directly into a vector?

If I understand the code correctly

1)
la_post.i::__getitem__(self, indices):

calls directly, when "indices" is an integer,

la_get_set_items.i::_get_vector_single_item(self, indices)

2)
_get_vector_single_item( dolfin::GenericVector* self, int index )

sets

uint i = index;

and then calls

self->get_local(&value, 1, &i)

3)
Finally, PETScVector::get_local is based (again, if I understand the code
correctly) on global indices (VecGetValues indices are "in global 1d
numbering" according to PETSc's documentation).

I'm surely missing/misunderstanding something, but what?

Thanks again,

Marco

Johan Hake (johan-hake) said :
#7

On Monday January 23 2012 13:35:44 Marco Morandini wrote:
> On 01/23/2012 10:15 AM, Garth Wells wrote:
> > And where in DOLFIN is it defined whether to use local or global
> > numbering when indexing directly into a vector?

Global. Then it is up to the user to make sure the index is owned by the
process. There is a bug report related to this, which states that we should
provide a check for accessing indices owned by another process, at least in
the Python interface.

> If I understand correctly the code
>
> 1)
> la_post.i::__getitem__(self, indices):
>
> calls directly, when "indices" is an integer,
>
> la_get_set_items.i::_get_vector_single_item(self, indices)
>
> 2)
> _get_vector_single_item( dolfin::GenericVector* self, int index )
>
> sets
>
> uint i = index;
>
> and then calls
>
> self->get_local(&value, 1, &i)
>
> 3)
> Finally, PETScVector::get_local is based (again, if I understand
> correctly the code), on global indices (VecGetValues indices are "in
> global 1d numbering" according to PETSC's documentation).
>
> I'm surely missing/misunderstanding something, but what?

That should be correct. The version of PETScVector::get_local that gets
called by _get_vector_single_item operates with global indices, but it can
only return values whose indices reside on the processor. In your case that
should be safe.

The name of that function is really misleading, as it looks like it assumes
one should use local indexing.
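
A small sketch of that contract (the helper is hypothetical, not the real PETScVector::get_local signature): the index is global, but access is only valid inside the calling process's owned range.

```python
# Hedged sketch of the get_local contract described above: a GLOBAL index,
# restricted to this process's owned slice of the distributed vector.
# (Function name and signature are illustrative, not dolfin API.)

def get_local_value(values, owned_start, owned_end, global_index):
    """Return the entry for `global_index` from this process's owned
    slice `values`, which covers global indices [owned_start, owned_end)."""
    if not (owned_start <= global_index < owned_end):
        raise IndexError("global index %d not owned by this process"
                         % global_index)
    return values[global_index - owned_start]

# A process owning global slice [2100, 4200):
local_values = list(range(2100, 4200))
assert get_local_value(local_values, 2100, 4200, 3000) == 3000
```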

Johan


Marco Morandini (morandini) said :
#8