Parallel reproducibility
I am investigating the development of discrete adjoints using DOLFIN (in collaboration with the developers of dolfin-adjoint). When the forward model is rerun as part of a checkpointing scheme (used when integrating the adjoint), it is highly desirable that the rerun be bitwise reproducible. I have had no issues when running models in serial, but in parallel I encounter occasional and unpredictable reproducibility failures, e.g. when running the following code on 5 processes:
from dolfin import *
import numpy.random as random
random.seed(0)
mesh = UnitSquare(10, 10)
space = FunctionSpace(mesh, "CG", 2)
test, trial = TestFunction(
F = Function(space)
F.vector(
F.vector(
ref = assemble(
for i in range(100):
    comp = assemble(
    err = abs(ref - comp).max()
    dolfin.info("%i: %.17e" % (i + 1, err))
    assert(err == 0.0)
Is there any way to guarantee exact parallel reproducibility?
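For context, one plausible contributor to this kind of run-to-run difference is that floating-point addition is not associative, so a sum whose terms arrive in a different order can differ in its last bits. Whether that is what happens inside the parallel assembly above is an assumption and is not confirmed here. The sketch below is plain Python, independent of DOLFIN, and the names `naive_sum`, `values` and `shuffled` are made up for illustration. It shows the effect and one order-independent alternative, `math.fsum`.

```python
import math
import random


def naive_sum(data):
    """Plain left-to-right accumulation, so the result depends on the order."""
    total = 0.0
    for x in data:
        total += x
    return total


# Floating-point addition is not associative: regrouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("regrouped sums equal:", left == right)

# Hypothetical data: the same terms visited in two different orders,
# standing in for contributions that arrive in a varying order.
random.seed(0)
values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
          for _ in range(1000)]
shuffled = values[:]
random.shuffle(shuffled)

# The naive sums may differ in the last bits, depending on the data.
print("naive difference: %.17e" % abs(naive_sum(values) - naive_sum(shuffled)))

# math.fsum tracks exact partial sums and rounds once at the end,
# so its result does not depend on the order of the terms.
print("fsum difference:  %.17e" % abs(math.fsum(values) - math.fsum(shuffled)))
```

The general idea is that a reduction becomes reproducible if either the order of operations is fixed or the summation itself is made order-independent. How either approach would map onto DOLFIN and its linear algebra backend is not addressed by this sketch.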
Question information
- Language: English
- Status: Answered
- For: DOLFIN
- Assignee: No assignee
- Asked by: James Maddison