Parallel reproducibility

Asked by James Maddison

I am investigating the development of discrete adjoints using DOLFIN (in collaboration with the developers of dolfin-adjoint). When rerunning the forward model (as part of a checkpointing scheme, used when integrating the adjoint) it is highly desirable that the forward model is bit-wise reproducible. I have had no issues when running models in serial, but occasional and unpredictable reproducibility issues are encountered in parallel, e.g. when running the following code on 5 processes:

from dolfin import *
import numpy.random as random
random.seed(0)

mesh = UnitSquare(10, 10)
space = FunctionSpace(mesh, "CG", 2)
test, trial = TestFunction(space), TrialFunction(space)

# Fill F with seeded pseudo-random data on each process
F = Function(space)
F.vector().set_local(random.random(F.vector().array().shape[0]))
F.vector().apply("insert")

# Assemble once as a reference, then reassemble repeatedly and
# compare against the reference bit-wise
ref = assemble(inner(test, F) * dx).array()

for i in range(100):
  comp = assemble(inner(test, F) * dx).array()
  err = abs(ref - comp).max()
  info("%i: %.17e" % (i + 1, err))
  assert err == 0.0

Is there any way to guarantee exact parallel reproducibility?

Garth Wells (garth-wells) said :
#1

My first suggestion would be to avoid using any numpy-related functions in your test. DOLFIN provides everything you need to subtract vectors, compute norms, etc. DOLFIN's numpy support in parallel is flaky and introduces an extra layer in which things can go wrong.
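
For example, the comparison loop in the test above could be expressed purely with DOLFIN vector operations, something along these lines (a minimal sketch, assuming the legacy DOLFIN Python interface, where GenericVector provides copy(), axpy() and norm()):

ref = assemble(inner(test, F) * dx)

for i in range(100):
  comp = assemble(inner(test, F) * dx)
  diff = comp.copy()
  diff.axpy(-1.0, ref)      # diff = comp - ref, computed inside DOLFIN
  err = diff.norm("linf")   # max-norm, reduced across all processes
  info("%i: %.17e" % (i + 1, err))
  assert err == 0.0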

I have seen some very small variations in output (in the last few digits) for different linear algebra backends in parallel. I suspect that this is due to things like the order in which off-process entries are added. Try testing both the PETSc and Trilinos backends.
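
For reference, in the legacy DOLFIN Python interface the backend is selected through the global parameters system before any linear algebra objects are created (a sketch; the "Epetra" value assumes DOLFIN was built with Trilinos support):

from dolfin import *

# Must be set before any vectors or matrices are created
parameters["linear_algebra_backend"] = "Epetra"  # or "PETSc"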

James Maddison (jamesmadd) said :
#2

Thanks. I'm not able to test with Trilinos at the moment, but in any case removing the numpy usage doesn't change the behaviour.

Lawrence Mitchell (wence) said :
#3

PETSc is not guaranteed to give bit-wise identical results for addition into vectors in parallel. Remember, addition of floating-point numbers isn't associative. When more than one non-local process assembles a dof that contributes to a process-local entry in a vector, effectively what happens is:

MPI_Irecv(...., MPI_ANY_SOURCE, ....);  /* post a receive for each expected contribution */
MPI_Waitany(...);                       /* complete whichever receive finishes first */
local_val += just_received_value;       /* accumulate in arrival order */

Imagine that two processes contribute to local_val, so there are two Irecvs posted for the two contributions. The local process cannot guarantee (with this scheme) the order in which it processes the incoming messages: MPI's non-overtaking rule only applies to messages between a single pair of processes, so messages from different senders may arrive in any order. If process 2's contribution happens to arrive before process 3's, the additions will be:

local_val += proc_2_val;
local_val += proc_3_val;

and vice versa if process 3's message arrives first. Floating-point addition isn't associative, so the two orderings can produce different answers.
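
To make the non-associativity concrete, here is a minimal Python illustration (independent of DOLFIN or MPI):

a, b, c = 1.0, 1e-16, -1.0
print((a + b) + c)  # 0.0: b is lost when rounded into a
print(a + (b + c))  # 1.1102230246251565e-16: a different answer

The same effect, spread over many dofs, produces the last-digit variations you see when the reduction order changes between runs.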

tl;dr: you have to work really hard to get bit-wise reproducibility, and PETSc doesn't do the necessary work.
