# MPI_Allreduce: MPI_ERR_BUFFER: invalid buffer pointer

I can run several test problems without issue on 1 node, for example:

python ns channel piso dt_division=10 supg_bool=True u_order=1 refinement_level=0

works just fine. It requires 512 time steps.

But as soon as I try to run it using MPI I get:

Time step 511 finished in 0.04 seconds, 99.8% done (t=0.499, T=0.5; 00:00:00 remaining).

-------

Process 0: Solving nonlinear variational problem.

Process 0: Solving linear system of size 128 x 128 (PETSc Krylov solver).

Process 0: Newton iteration 1: r (abs) = 6.519e-04 (tol = 1.000e-05) r (rel) = 1.000e-02 (tol = 1.000e-05)

Process 0: Solving linear system of size 128 x 128 (PETSc Krylov solver).

Process 0: Newton iteration 2: r (abs) = 6.519e-06 (tol = 1.000e-05) r (rel) = 1.000e-04 (tol = 1.000e-05)

Process 0: Newton solver finished in 2 iterations and 18 linear solver iterations.

Process 0: Solving linear system of size 64 x 64 (PETSc Krylov solver).

Process 0: Solving linear system oProcess 1: Solving nonlinear variational problem.

Process 1: Solving nonlinear variational problem.

Process 1: Solving nonlinear variational problem.

Process 1: Solving nonlinear variational problem.

Process 1: Solving nonlinear variational problem.

Process 1: Solving nonlinear variational problem.

Process 1: Solving nonlinear variational problem.

Process 1: Evaluating at x = <Point x = 1 y = 0.5 z = 0>

Process 3: Solving nonlinear variational problem.

Process 3: Solving nonlinear variational problem.

Process 3: Solving nonlinear variational problem.

Process 3: Solving nonlinear variational problem.

Process 3: Solving nonlinear variational problem.

Process 3: Solving nonlinear variational problem.

Process 3: Solving nonlinear variational problem.

Process 3: Evaluating at x = <Point x = 1 y = 0.5 z = 0>

Process 7: Solving nonlinear variational problem.

Process 7: Solving nonlinear variational problem.

Process 7: Solving nonlinear variational problem.

Process 7: Solving nonlinear variational problem.

Process 7: Solving nonlinear variational problem.

Process 7: Solving nonlinear variational problem.

Process 7: Solving nonlinear variational problem.

Process 7: Evaluating at x = <Point x = 1 y = 0.5 z = 0>

Process 4: Solving nonlinear variational problem.

Process 4: Solving nonlinear variational problem.

Process 4: Solving nonlinear variational problem.

Process 4: Solving nonlinear variational problem.

Process 4: Solving nonlinear variational problem.

Process 4: Solving nonlinear variational problem.

Process 4: Solving nonlinear variational problem.

Process 4: Evaluating at x = <Point x = 1 y = 0.5 z = 0>

f size 128 x 128 (PETSc Krylov solver).

Process 0: Solving linear system of size 64 x 64 (PETSc Krylov solver).

Process 0: Solving nonlinear variational problem.

Process 0: Solving linear system of size 128 x 128 (PETSc Krylov solver).

Process 0: Newton iteration 1: r (abs) = 6.530e-08 (tol = 1.000e-05) r (rel) = 1.000e-02 (tol = 1.000e-05)

Process 0: Newton solver finished in 1 iterations and 10 linear solver iterations.

Process 0: Evaluating at x = <Point x = 1 y = 0.5 z = 0>

Traceback (most recent call last):

File "ns", line 217, in <module>

result = main(args)

File "ns", line 192, in main

u, p = solver.

File "/scratch/

self.

File "/scratch/

M = problem.

File "/scratch/

return self.uEval(u, 0, (1.0, 0.5))

File "/scratch/

return self.eval(func, point)[component]

File "/scratch/

M[i] = MPI.sum(M[i])/N

File "/scratch/

return _common.

Exception: MPI_Allreduce: MPI_ERR_BUFFER: invalid buffer pointer

Traceback (most recent call last):

File "ns", line 217, in <module>

result = main(args)

File "ns", line 192, in main

u, p = solver.

File "/scratch/

self.

File "/scratch/

M = problem.

File "/scratch/

return self.uEval(u, 0, (1.0, 0.5))

File "/scratch/

return self.eval(func, point)[component]

File "/scratch/

M[i] = MPI.sum(M[i])/N

File "/scratch/

return _common.

Exception: MPI_Allreduce: MPI_ERR_BUFFER: invalid buffer pointer

Traceback (most recent call last):

File "ns", line 217, in <module>

result = main(args)

File "ns", line 192, in main

u, p = solver.

File "/scratch/

self.

File "/scratch/

M = problem.

File "/scratch/

return self.uEval(u, 0, (1.0, 0.5))

File "/scratch/

return self.eval(func, point)[component]

File "/scratch/

M[i] = MPI.sum(M[i])/N

File "/scratch/

return _common.

Exception: MPI_Allreduce: MPI_ERR_BUFFER: invalid buffer pointer

Any ideas? A simple test script runs fine with MPI.

## Question information

- Language:
- English

- Status:
- Solved

- For:
- DOLFIN

- Assignee:
- No assignee

- Solved by:
- Damiaan

- Solved:
- 2013-03-18

- Last query:
- 2013-03-18

- Last reply:
- 2013-03-18

## This question was reopened

Damiaan (dhabets) said : | #1 |

Any suggestions?

Kent-Andre Mardal (kent-and) said : | #2 |

Mikael has made some tools for point evaluation in parallel: see

http://

Maybe these are better.

I am not able to test the wip code on a cluster right now since our local cluster is shut down.

Hopefully, we will have access to a new cluster on April 1.

Kent

On 14 March 2013 18:21, Damiaan <email address hidden> wrote:

> Question #223746 on DOLFIN changed:

> https:/

>

> Damiaan posted a new comment:

> Any suggestions?

>

> --

> You received this question notification because you are a member of

> DOLFIN Team, which is an answer contact for DOLFIN.

>

Damiaan (dhabets) said : | #3 |

Kent, that Probe.py script does this for 1 processor:

500 of 500 probes found on processor 0

for two processors:

Process 0: Number of global vertices: 1331

Process 0: Number of global cells: 6000

Traceback (most recent call last):

File "Probe.py", line 234, in <module>

Traceback (most recent call last):

File "Probe.py", line 234, in <module>

sl = StructuredGrid(N, origin, tangents, dL, V)

sl = StructuredGrid(N, origin, tangents, dL, V)

File "Probe.py", line 140, in __init__

File "Probe.py", line 140, in __init__

Probes.

Probes.

File "Probe.py", line 106, in __init__

File "Probe.py", line 106, in __init__

self.append((i, Probe(array(p), V, max_probes=

self.append((i, Probe(array(p), V, max_probes=

File "Probe.py", line 26, in __init__

File "Probe.py", line 26, in __init__

raise RuntimeError('Probe not found on processor')

raise RuntimeError('Probe not found on processor')

RuntimeError: RuntimeError: Probe not found on processor

Probe not found on processor

Which means?

Hi,

I think you're probably not using the code correctly. Could you please send a bit more of the script you're running?

Best regards

Mikael

On Mar 14, 2013, at 7:46 PM, Damiaan wrote:

> Question #223746 on DOLFIN changed:

> https:/

>

> Status: Answered => Open

>


Damiaan (dhabets) said : | #5 |

Hi Mikael,

are you referring to the wip code or Probe.py? If you have a suggestion for test code, then please let me know.

thanks,

Damiaan

I was referring to the Probe code; I don't know what the other code is. If you want to run in parallel and evaluate a single point in the mesh many times during a simulation, you can do this:

from cbc.cfd.tools.Probe import Probes

# Probe three locations in V in a 3D mesh

V = VectorFunctionS

x = array([[0.5, 0.5, 0.5], [0.2, 0.3, 0.4], [0.8, 0.9, 1.0]])

p = Probes(x, V)

u = interpolate(

probe

p(v0) # This makes the evaluation and typically goes inside the time loop

p(v0) # once more

p.dump("testing") # Finished with simulations. Dump all results

print p.tonumpy(0) # Alternative

p.dump creates three files called testing_0.probe, testing_1.probe,

testing_2.probe. Look at them using from numpy import load and then p0 =

load("testing_

p.tonumpy(0) returns all three values of u for the first probe evaluation

on process 0.

To install, run bzr branch lp:cbcpdesys. Otherwise you can use the code you find in Probe.py as is.

There are some tests at the bottom of Probe.py. You can for example probe

an entire 2D plane of a 3D mesh and then dump results for the plane to vtk.

Best regards

Mikael

On 14 March 2013 20:26, Damiaan <email address hidden> wrote:

> Question #223746 on DOLFIN changed:

> https:/

>

> Status: Answered => Open

>


Damiaan (dhabets) said : | #7 |

Thanks Mikael, I installed it and tried this:

-----

from dolfin import *

from cbc.cfd.tools.Probe import *

# Print log messages only from the root process in parallel

parameters[

parameters[

mesh = UnitCubeMesh(10, 10, 10)

V = FunctionSpace(mesh, 'CG', 1)

Vv = VectorFunctionS

W = V * Vv

# Just create some random data to be used for probing

u0 = interpolate(

y0 = interpolate(

z0 = interpolate(

u1 = interpolate(

v0 = interpolate(

w0 = interpolate(

# Test StructuredGrid

origin = [0.4, 0.4, 0.5] # origin of slice

tangents = [[1, 0, 1], [0, 1, 1]] # directional tangent directions (scaled in StructuredGrid)

dL = [0.2, 0.3] # extent of slice in both directions

N = [25, 20] # number of points in each direction

#### Create a range of probes for a UnitSquare

N = 5

xx = linspace(0.25, 0.75, N)

xx = xx.repeat(

yy = linspace(0.25, 0.75, N)

yy = yy.repeat(

x = zeros((N*N, 3))

for i in range(N):

for j in range(N):

x[i*N + j, 0 ] = xx[i, j] # x-value

x[i*N + j, 1 ] = yy[i, j] # y-value

probesV = Probes(x, V, 1000, use_python=True)

-----

This runs fine with the number of processors set to 1, 2, or 4, but fails with 3, 5, 6, etc. I had to comment out line 142 in Probe.py due to IndexError: list index out of range:

# self.value_size = self[0]

No C++ Probe

No C++ Probe

No C++ Probe

Process 0: Number of global vertices: 1331

Process 0: Number of global cells: 6000

Traceback (most recent call last):

File "testme.py", line 41, in <module>

0 of 25 probes found on processor 1 <---- why is this 0?

14 of 25 probes found on processor 2

12 of 25 probes found on processor 0

probesV = Probes(x, V, 1000, use_python=True)

File "/scratch/

self.value_size = self[0]

IndexError: list index out of range

I'm probably missing something very obvious and trivial here, but why would it assign 0 probes to processor 1? Also, why is it doing 14+12 = 26 probes?

thanks,

Damiaan
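
The nested loop in the script above can be written more compactly with NumPy. This is a sketch only, assuming the truncated `repeat(` calls build `xx[i, j]` as the i-th and `yy[i, j]` as the j-th grid value; under that assumption it fills the same `(N*N, 3)` coordinate array:

```python
import numpy as np

N = 5
lin = np.linspace(0.25, 0.75, N)

# indexing="ij" makes gx vary along the first axis and gy along the second,
# so row i*N + j of x holds (lin[i], lin[j], 0.0)
gx, gy = np.meshgrid(lin, lin, indexing="ij")
x = np.zeros((N * N, 3))
x[:, 0] = gx.ravel()  # x-value
x[:, 1] = gy.ravel()  # y-value
```

The third column stays zero, matching the original script, which only sets the x and y values.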

Hi Damiaan

On Mar 18, 2013, at 8:51 PM, Damiaan wrote:

> Question #223746 on DOLFIN changed:

> https:/

>

> Status: Answered => Open

>


> This runs fine with the number of processors set to 1, 2, or 4, but fails

> with 3, 5, 6, etc. I had to comment out line 142 in Probe.py due to

> IndexError: list index out of range:

>

> # self.value_size = self[0]

>

This is the number of spaces in the (mixed) function space you're probing. Change to

self.value_size = V.num_sub_spaces() if V.num_sub_spaces() > 0 else 1

and it should work.
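
To see why the original line fails, note that the traceback suggests the probe list holds only the probes located on the local process, so on a process that finds none the list is empty and indexing `[0]` raises `IndexError`. The following is a minimal sketch of the failure and the fix. `FakeSpace`, the dictionary layout, and both helper functions are hypothetical stand-ins so that it runs without DOLFIN; only `num_sub_spaces()` comes from the suggestion above.

```python
class FakeSpace:
    """Hypothetical stand-in for a DOLFIN function space (illustration only)."""

    def __init__(self, n_sub):
        self._n_sub = n_sub

    def num_sub_spaces(self):
        # 0 for a scalar space, > 0 for vector or mixed spaces
        return self._n_sub


def value_size_from_first_probe(local_probes):
    # Mimics the failing approach: it needs at least one local probe.
    return local_probes[0]["value_size"]


def value_size_from_space(V):
    # Mimics the suggested fix: depends only on the space, not on local probes.
    return V.num_sub_spaces() if V.num_sub_spaces() > 0 else 1


local_probes = []  # a process that located no probes
scalar_space = FakeSpace(0)
vector_space = FakeSpace(3)

try:
    value_size_from_first_probe(local_probes)
    failed = False
except IndexError:
    failed = True
```

Asking the function space rather than the first probe gives every process the same answer, whether or not it located any probes.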

> No C++ Probe

> No C++ Probe

> No C++ Probe

> Process 0: Number of global vertices: 1331

> Process 0: Number of global cells: 6000

> Traceback (most recent call last):

> File "testme.py", line 41, in <module>

> 0 of 25 probes found on processor 1 <---- why is this 0?

> 14 of 25 probes found on processor 2

> 12 of 25 probes found on processor 0

> probesV = Probes(x, V, 1000, use_python=True)

> File "/scratch/

> self.value_size = self[0]

> IndexError: list index out of range

You are creating 25 probes on a 5 * 5 structured grid. There are 26 probes because the same point is found on two processes. It is not an error.

The probe code is not very stable yet and much of what you see in Probe.py is being moved to C++. Let me know if it still doesn't work for your problem.

Best regards

Mikael
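
Because a point on a partition boundary can be located on more than one process, a single global value is usually formed by summing the local contributions and dividing by the number of processes that found the point. That appears to be what the `MPI.sum(M[i])/N` line in the original traceback attempts. Below is a sketch of that reduction, simulated with a plain Python list instead of real MPI calls; the function name and the `None` convention are assumptions made for illustration.

```python
def combine_point_value(per_rank_values):
    """Average a point value over the ranks that located the point.

    per_rank_values holds one entry per rank: a float if the point was
    found on that rank, or None if it was not. In a real MPI program each
    of the two sums below would be a sum reduction across all ranks.
    """
    total = sum(v for v in per_rank_values if v is not None)
    owners = sum(1 for v in per_rank_values if v is not None)
    if owners == 0:
        raise ValueError("point was not found on any rank")
    return total / owners


# Point found on ranks 0 and 3 (shared boundary), missing on ranks 1 and 2
value = combine_point_value([0.5, None, None, 0.5])
```

Every rank has to take part in the reduction, including the ranks that did not find the point; they contribute zero to both sums.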


Damiaan (dhabets) said : | #9 |

Thanks Mikael,

OK, that works fine, but I still don't understand why there are processors without any probes assigned:

No C++ Probe

No C++ Probe

No C++ Probe

No C++ Probe

No C++ Probe

Process 0: Number of global vertices: 1331

Process 0: Number of global cells: 6000

0 of 25 probes found on processor 4

0 of 25 probes found on processor 1

13 of 25 probes found on processor 3

12 of 25 probes found on processor 0

0 of 25 probes found on processor 2

I would expect it to place 5 on each? Or is the above expected behavior?

thanks,

Damiaan

On Monday, 18 March 2013, Damiaan wrote:

> Question #223746 on DOLFIN changed:

> https:/

>

> Status: Answered => Open

>


This is the expected behavior. There are 25 probes, all with different coordinates. You don't know in advance which process they live on. That depends on how the mesh is distributed. The probes are not assigned to a process, they are located.

Mikael
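
The difference between locating and assigning probes can be illustrated without MPI. This sketch assumes a deliberately simple partition, the unit interval in x cut into equal slabs with one slab per process; a real mesh partitioner produces irregular subdomains, and `count_located` is a hypothetical helper, not part of Probe.py.

```python
import numpy as np


def count_located(probe_x, n_procs):
    """Count how many probes fall inside each process's slab of [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_procs + 1)
    counts = []
    for rank in range(n_procs):
        lo, hi = edges[rank], edges[rank + 1]
        # Closed interval: a probe exactly on a slab boundary is found twice
        inside = (probe_x >= lo) & (probe_x <= hi)
        counts.append(int(inside.sum()))
    return counts


# x-coordinates of a 5 x 5 probe grid spanning [0.25, 0.75]
probe_x = np.repeat(np.linspace(0.25, 0.75, 5), 5)
counts = count_located(probe_x, n_procs=8)
```

In this setup the outermost slabs contain no probes at all, while probes sitting on a shared boundary are counted by both neighbours, so the per-process counts need not be equal or add up to 25.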


Damiaan (dhabets) said : | #11 |

Thanks Mikael, that makes sense: the mesh is split up, got it. I think this may solve my initial problem (it makes me suspect the original author only used OpenMP and never tested the MPI part).

Thanks a lot!