MPI_Allreduce: MPI_ERR_BUFFER: invalid buffer pointer

Asked by Damiaan

I can run several test problems without issue on 1 node, for example:

python ns channel piso dt_division=10 supg_bool=True u_order=1 refinement_level=0

works just fine. It requires 512 time steps.

But as soon as I try to run it using MPI I get:

Time step 511 finished in 0.04 seconds, 99.8% done (t=0.499, T=0.5; 00:00:00 remaining).
----------------------------------------------------------------------------------------
Process 0: Solving nonlinear variational problem.
  Process 0: Solving linear system of size 128 x 128 (PETSc Krylov solver).
  Process 0: Newton iteration 1: r (abs) = 6.519e-04 (tol = 1.000e-05) r (rel) = 1.000e-02 (tol = 1.000e-05)
  Process 0: Solving linear system of size 128 x 128 (PETSc Krylov solver).
  Process 0: Newton iteration 2: r (abs) = 6.519e-06 (tol = 1.000e-05) r (rel) = 1.000e-04 (tol = 1.000e-05)
  Process 0: Newton solver finished in 2 iterations and 18 linear solver iterations.
Process 0: Solving linear system of size 64 x 64 (PETSc Krylov solver).
Process 0: Solving linear system oProcess 1: Solving nonlinear variational problem.
Process 1: Solving nonlinear variational problem.
Process 1: Solving nonlinear variational problem.
Process 1: Solving nonlinear variational problem.
Process 1: Solving nonlinear variational problem.
Process 1: Solving nonlinear variational problem.
Process 1: Solving nonlinear variational problem.
Process 1: Evaluating at x = <Point x = 1 y = 0.5 z = 0>
Process 3: Solving nonlinear variational problem.
Process 3: Solving nonlinear variational problem.
Process 3: Solving nonlinear variational problem.
Process 3: Solving nonlinear variational problem.
Process 3: Solving nonlinear variational problem.
Process 3: Solving nonlinear variational problem.
Process 3: Solving nonlinear variational problem.
Process 3: Evaluating at x = <Point x = 1 y = 0.5 z = 0>
Process 7: Solving nonlinear variational problem.
Process 7: Solving nonlinear variational problem.
Process 7: Solving nonlinear variational problem.
Process 7: Solving nonlinear variational problem.
Process 7: Solving nonlinear variational problem.
Process 7: Solving nonlinear variational problem.
Process 7: Solving nonlinear variational problem.
Process 7: Evaluating at x = <Point x = 1 y = 0.5 z = 0>
Process 4: Solving nonlinear variational problem.
Process 4: Solving nonlinear variational problem.
Process 4: Solving nonlinear variational problem.
Process 4: Solving nonlinear variational problem.
Process 4: Solving nonlinear variational problem.
Process 4: Solving nonlinear variational problem.
Process 4: Solving nonlinear variational problem.
Process 4: Evaluating at x = <Point x = 1 y = 0.5 z = 0>
f size 128 x 128 (PETSc Krylov solver).
Process 0: Solving linear system of size 64 x 64 (PETSc Krylov solver).
Process 0: Solving nonlinear variational problem.
  Process 0: Solving linear system of size 128 x 128 (PETSc Krylov solver).
  Process 0: Newton iteration 1: r (abs) = 6.530e-08 (tol = 1.000e-05) r (rel) = 1.000e-02 (tol = 1.000e-05)
  Process 0: Newton solver finished in 1 iterations and 10 linear solver iterations.
Process 0: Evaluating at x = <Point x = 1 y = 0.5 z = 0>
Traceback (most recent call last):
  File "ns", line 217, in <module>
    result = main(args)
  File "ns", line 192, in main
    u, p = solver.solve(problem)
  File "/scratch/s/steinman/dhabets/Testing/wip/solvers/piso.py", line 194, in solve
    self.update(problem, t, unc, p1)
  File "/scratch/s/steinman/dhabets/Testing/wip/solvers/solverbase.py", line 127, in update
    M = problem.functional(t, u, p)
  File "/scratch/s/steinman/dhabets/Testing/wip/problems/channel.py", line 87, in functional
    return self.uEval(u, 0, (1.0, 0.5))
  File "/scratch/s/steinman/dhabets/Testing/wip/problems/problembase.py", line 108, in uEval
    return self.eval(func, point)[component]
  File "/scratch/s/steinman/dhabets/Testing/wip/problems/problembase.py", line 99, in eval
    M[i] = MPI.sum(M[i])/N
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/dolfin/cpp/common.py", line 588, in sum
    return _common.MPI_sum(*args)
Exception: MPI_Allreduce: MPI_ERR_BUFFER: invalid buffer pointer
[The same traceback is printed by two more MPI processes.]

Any ideas? A simple test script runs fine with MPI.

Question information

Language: English
Status: Solved
For: DOLFIN
Assignee: No assignee
Solved by: Damiaan

Damiaan (dhabets) said :
#1

 Any suggestions?

Kent-Andre Mardal (kent-and) said :
#2

Mikael has made some tools for point evaluation in parallel: see
http://bazaar.launchpad.net/~mikael-mortensen/cbcpdesys/trunk/view/head:/cbc/cfd/tools/Probe.py

Maybe these are better.

I am not able to test the wip code on a cluster now since our local cluster is shut down. Hopefully, we will have access to a new cluster on April 1.

Kent
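
For context, the MPI_ERR_BUFFER failure above comes from MPI.sum inside problembase.eval, which wraps a collective MPI_Allreduce, so every rank has to take part with a valid scalar. Below is a minimal sketch of a point evaluation that stays collective in parallel. It is not the wip code; it assumes the DOLFIN 1.x Python API, where Function.eval raises a RuntimeError on ranks whose mesh partition does not contain the point and MPI.sum reduces a scalar across all ranks:

from dolfin import *
import numpy

def eval_in_parallel(u, point):
    # Evaluate u at a point that may live on another process's mesh partition.
    values = numpy.zeros(u.value_size())
    found = 0.0
    try:
        u.eval(values, numpy.array(point, dtype='d'))  # raises on ranks that do not own the point
        found = 1.0
    except RuntimeError:
        pass
    found = MPI.sum(found)  # every rank takes part in the reduction
    # found == 0 would mean the point lies outside the global mesh
    return [MPI.sum(float(v)) / found for v in values]

The wip code's M[i] = MPI.sum(M[i])/N could be guarded in the same way, so that every rank passes a plain float to the reduction.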

Damiaan (dhabets) said :
#3

Kent, that Probe.py script does this for 1 processor:

500 of 500 probes found on processor 0

for two processors:

Process 0: Number of global vertices: 1331
Process 0: Number of global cells: 6000
Traceback (most recent call last):
  File "Probe.py", line 234, in <module>
Traceback (most recent call last):
  File "Probe.py", line 234, in <module>
    sl = StructuredGrid(N, origin, tangents, dL, V)
    sl = StructuredGrid(N, origin, tangents, dL, V)
  File "Probe.py", line 140, in __init__
  File "Probe.py", line 140, in __init__
    Probes.__init__(self, self.x, V)
    Probes.__init__(self, self.x, V)
  File "Probe.py", line 106, in __init__
  File "Probe.py", line 106, in __init__
    self.append((i, Probe(array(p), V, max_probes=max_probes)))
    self.append((i, Probe(array(p), V, max_probes=max_probes)))
  File "Probe.py", line 26, in __init__
  File "Probe.py", line 26, in __init__
    raise RuntimeError('Probe not found on processor')
    raise RuntimeError('Probe not found on processor')
RuntimeError: RuntimeError: Probe not found on processor
Probe not found on processor

What does this mean?

Mikael Mortensen (mikael-mortensen) said :
#4

Hi,

I think you're probably not using the code correctly. Could you please send a bit more of the script you're running?

Best regards

Mikael

Damiaan (dhabets) said :
#5

Hi Mikael,

Are you referring to the wip code or Probe.py? If you have a suggestion for test code, please let me know.

thanks,
Damiaan

Mikael Mortensen (mikael-mortensen) said :
#6

I was referring to the Probe code; I don't know what the other code is. If you want to run in parallel and evaluate a single point in the mesh many times during a simulation, you can do this:

from dolfin import *
from numpy import array
from cbc.cfd.tools.Probe import Probes

# Probe three locations in V in a 3D mesh
mesh = UnitCubeMesh(10, 10, 10)   # any mesh will do
V = VectorFunctionSpace(mesh, 'CG', 2)
x = array([[0.5, 0.5, 0.5], [0.2, 0.3, 0.4], [0.8, 0.9, 1.0]])
p = Probes(x, V)
u = interpolate(Expression(("x[0]", "x[1]", "x[2]")), V)  # Some Function to probe
p(u)  # This makes the evaluation and typically goes inside the time loop
p(u)  # once more
p.dump("testing")   # Finished with simulations. Dump all results
print p.tonumpy(0)  # Alternative

p.dump creates three files called testing_0.probe, testing_1.probe, testing_2.probe. Look at them using from numpy import load and then p0 = load("testing_0.probe") etc.

p.tonumpy(0) returns all three values of u for the first probe evaluation on process 0.

To install, do bzr branch lp:cbcpdesys. Otherwise you can use the code you find in Probe.py as is.

There are some tests at the bottom of Probe.py. You can, for example, probe an entire 2D plane of a 3D mesh and then dump results for the plane to vtk.

Best regards

Mikael

Damiaan (dhabets) said :
#7

Thanks Mikael, I installed it and tried this:

-----
from dolfin import *
from cbc.cfd.tools.Probe import *

# Print log messages only from the root process in parallel
parameters["std_out_all_processes"] = False;
parameters["allow_extrapolation"] = True

mesh = UnitCubeMesh(10, 10, 10)
V = FunctionSpace(mesh, 'CG', 1)
Vv = VectorFunctionSpace(mesh, 'CG', 1)
W = V * Vv

# Just create some random data to be used for probing
u0 = interpolate(Expression('x[0]'), V)
y0 = interpolate(Expression('x[1]'), V)
z0 = interpolate(Expression('x[2]'), V)
u1 = interpolate(Expression('x[0]*x[0]'), V)
v0 = interpolate(Expression(('x[0]', 'x[1]', 'x[2]')), Vv)
w0 = interpolate(Expression(('x[0]', 'x[1]', 'x[2]', 'x[1]*x[2]')), W)

# Test StructuredGrid
origin = [0.4, 0.4, 0.5] # origin of slice
tangents = [[1, 0, 1], [0, 1, 1]] # directional tangent directions (scaled in StructuredGrid)
dL = [0.2, 0.3] # extent of slice in both directions
N = [25, 20] # number of points in each direction

#### Create a range of probes for a UnitSquare
N = 5
xx = linspace(0.25, 0.75, N)
xx = xx.repeat(N).reshape((N, N)).transpose()
yy = linspace(0.25, 0.75, N)
yy = yy.repeat(N).reshape((N, N))
x = zeros((N*N, 3))
for i in range(N):
  for j in range(N):
    x[i*N + j, 0 ] = xx[i, j] # x-value
    x[i*N + j, 1 ] = yy[i, j] # y-value

probesV = Probes(x, V, 1000, use_python=True)
-----

This runs fine with the number of processors set to 1, 2, or 4, but when using 3, 5, 6, etc. I had to comment out line 142 in Probe.py due to an IndexError: list index out of range:

# self.value_size = self[0][1].value_size()

No C++ Probe
No C++ Probe
No C++ Probe
Process 0: Number of global vertices: 1331
Process 0: Number of global cells: 6000
Traceback (most recent call last):
  File "testme.py", line 41, in <module>
0 of 25 probes found on processor 1 <---- why is this 0?
14 of 25 probes found on processor 2
12 of 25 probes found on processor 0
    probesV = Probes(x, V, 1000, use_python=True)
  File "/scratch/s/steinman/dhabets/Root/lib/python2.7/site-packages/cbc/cfd/tools/Probe.py", line 142, in __init__
    self.value_size = self[0][1].value_size()
IndexError: list index out of range

I'm probably missing something very obvious here, but why would it assign 0 probes to processor 1? Also, why does it find 14 + 12 = 26 probes when only 25 were requested?

thanks,
Damiaan

Mikael Mortensen (mikael-mortensen) said :
#8

Hi Damiaan

The line you commented out, self.value_size = self[0][1].value_size(), is the number of spaces in the (mixed) function space you're probing. Change it to

self.value_size = V.num_sub_spaces() if V.num_sub_spaces() > 0 else 1

and it should work.

You are creating 25 probes on a 5 * 5 structured grid. There are 26 probes because the same point is found on two processes. It is not an error.

The probe code is not very stable yet and much of what you see in Probe.py is being moved to C++. Let me know if it still doesn't work for your problem.

Best regards

Mikael

Damiaan (dhabets) said :
#9

Thanks Mikael,

OK, that works fine, but I still don't understand why there are processors without any probes assigned:

No C++ Probe
No C++ Probe
No C++ Probe
No C++ Probe
No C++ Probe
Process 0: Number of global vertices: 1331
Process 0: Number of global cells: 6000
0 of 25 probes found on processor 4
0 of 25 probes found on processor 1
13 of 25 probes found on processor 3
12 of 25 probes found on processor 0
0 of 25 probes found on processor 2

I would expect it to place 5 on each. Or is the above the expected behavior?

thanks,
Damiaan

Mikael Mortensen (mikael-mortensen) said :
#10

This is the expected behavior. There are 25 probes, all with different coordinates. You don't know in advance which process they live on; that depends on how the mesh is distributed. The probes are not assigned to a process, they are located.
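
As a quick illustration (a hypothetical check, not part of Probe.py, assuming the Probes object is list-like and using DOLFIN's MPI.sum and MPI.process_number helpers):

n_local = len(probesV)                  # probes located on this process
n_total = int(MPI.sum(float(n_local)))  # collective sum over all processes
if MPI.process_number() == 0:
    print "%d probe instances located for 25 requested points" % n_total

The total can exceed 25 because a point on a partition boundary is found on more than one process.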

Mikael

Damiaan (dhabets) said :
#11

Thanks Mikael, that makes sense; the mesh is split up, got it. I think this may solve my initial problem (it makes me suspect the original author only used OpenMP and never tested the MPI part).

Thanks a lot!