Missing variable value when checkpointing enabled

Asked by Martin Sandve Alnæs

When I add the line

    adj_checkpointing(strategy='multistage',
                      steps=10,
                      snaps_on_disk=4,
                      snaps_in_ram=2,
                      verbose=True)

to my time-dependent optimization code, which works fine without checkpointing, I get

...
====== Revolve: Replay from equation 64 (first equation of timestep 9) to equation 70 (last equation of timestep 9). ======
Revolve: No need to replay equation 64.
Revolve: Checkpoint equation 64 in memory.
Revolve: No need to replay equation 65.
Revolve: No need to replay equation 66.
Revolve: No need to replay equation 67.
Revolve: No need to replay equation 68.
Revolve: No need to replay equation 69.
Revolve: No need to replay equation 70.
Revolve: Solving adjoint equation 70.
Warning: got zero RHS for the solve associated with variable u0:9:0:Adjoint[4dd46098ff23038846c6121aff3a45d0]
Revolve: Solving adjoint equation 69.
Solving linear system of size 1111 x 1111 (PETSc Krylov solver).
Revolve: Solving adjoint equation 68.
Solving linear system of size 8442 x 8442 (PETSc Krylov solver).
Revolve: Solving adjoint equation 67.
Revolve: Solving adjoint equation 66.
Revolve: Solving adjoint equation 65.
Revolve: Solving adjoint equation 64.
Revolve: Delete checkpoint equation 64.
====== Revolve: Replay from equation 50 (first equation of timestep 7) to equation 56 (last equation of timestep 7) =======
Revolve: Replaying equation 50.
Revolve: Replaying equation 51.
Revolve: Replaying equation 52.
Revolve: Replaying equation 53.
Revolve: Replaying equation 54.
Traceback (most recent call last):
...
    libadjoint.exceptions.LibadjointErrorNeedValue: Need a value for variable u0:6:0:Forward, but don't have one.

After this I see checkpoints for u0:0 through u0:5 but no u0:6 on disk:

martinal@...$ ls u0\:*
u0:0:1:Forward.xml u0:1:0:Forward.xml u0:2:0:Forward.xml u0:3:0:Forward.xml u0:4:0:Forward.xml u0:5:0:Forward.xml
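
For anyone hitting the same error, a quick way to see which timesteps actually have a forward checkpoint on disk is to parse the file names. This is only a minimal sketch: it assumes the files are named `<variable>:<timestep>:<iteration>:Forward.xml`, as in the listing above, and the helper names `checkpointed_timesteps` and `missing_timesteps` are made up for illustration; they are not part of dolfin-adjoint or libadjoint.

```python
import re

# Assumed file name layout (inferred from the listing above, not from the
# libadjoint documentation): <variable>:<timestep>:<iteration>:Forward.xml
CHECKPOINT_RE = re.compile(
    r"^(?P<name>[^:]+):(?P<timestep>\d+):(?P<iteration>\d+):Forward\.xml$"
)


def checkpointed_timesteps(filenames, name="u0"):
    """Return the sorted timesteps that have a forward checkpoint file."""
    steps = set()
    for filename in filenames:
        match = CHECKPOINT_RE.match(filename)
        if match and match.group("name") == name:
            steps.add(int(match.group("timestep")))
    return sorted(steps)


def missing_timesteps(filenames, needed, name="u0"):
    """Return the timesteps in `needed` that have no checkpoint file."""
    have = set(checkpointed_timesteps(filenames, name))
    return [t for t in needed if t not in have]


# File names copied from the directory listing in the question.
files = [
    "u0:0:1:Forward.xml",
    "u0:1:0:Forward.xml",
    "u0:2:0:Forward.xml",
    "u0:3:0:Forward.xml",
    "u0:4:0:Forward.xml",
    "u0:5:0:Forward.xml",
]

print(checkpointed_timesteps(files))       # timesteps found on disk
print(missing_timesteps(files, range(7)))  # timesteps 0..6 with no file
```

In a real run the list could come from `os.listdir(".")` instead of being typed in. With the files above, timestep 6 is reported as missing, which matches the variable named in the `LibadjointErrorNeedValue` message.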

Do I need to set snaps_on_disk and snaps_in_ram differently?
I'm sort of expecting dolfin-adjoint to pick up invalid parameters
as it has been pretty good at that so far :)

Question information

Language: English
Status: Solved
For: dolfin-adjoint
Assignee: No assignee
Solved by: Martin Sandve Alnæs

Martin Sandve Alnæs (martinal) said :
#1

I used the latest versions of fenics and dolfin-adjoint.

Patrick Farrell (pefarrell) said :
#2

Hi Martin,

a) Can I get the code to reproduce the problem?

b) Simon is the master of all checkpointing, and I think he's on holidays in Germany until next week.

Martin Sandve Alnæs (martinal) said :
#3

Emailed it.

Simon Funke (simon-funke) said :
#4

I'll be back on Monday and will have a look at it.

Simon Funke (simon-funke) said :
#5

I can now reproduce the error. Interestingly, the checkpointing works
in a non-optimisation run (i.e. a single forward and adjoint run).
I'll look into it in more detail now.

2013/1/2 Martin Sandve Alnæs <email address hidden>:
> New question #218144 on dolfin-adjoint:
> https://answers.launchpad.net/dolfin-adjoint/+question/218144

--
Simon Wolfgang Funke

Postdoctoral Research Associate
Imperial College London
Applied Modelling and Computation Group
<email address hidden>

Simon Funke (simon-funke) said :
#6

Why did Launchpad decide to mark this question as "answered"?

Simon Funke (simon-funke) said :
#7

This issue is now fixed in the most recent versions of libadjoint and dolfin-adjoint.

Martin Sandve Alnæs (martinal) said :
#8

Great, thanks!