0.3.8 init state machine broken?

Asked by Alex Nekrasov on 2009-05-27

I've found 3 problems and fixed (patched, more like) 2 of them. My question is: have these already been found and fixed in later versions?


1) The test below makes init dump core and restart, triggered by an assert in job.c::job_child_reaper(). In the PROCESS_POST_START case, the assert expects JOB_POST_START but gets JOB_RUNNING.


  exit 0
end script

post-start script
 sleep 4
end script

As far as I could see, JOB_RUNNING was legitimate, so I updated the assert condition to allow it, and respawning then worked.

2) After the change above, stopping this job produced another core dump. This time, PROCESS_POST_START got JOB_WAITING.

I patched this as well by allowing JOB_WAITING and setting state and stop to FALSE in this case.

I'm not sure the second fix is a good one.

I tried to compare against 0.5, but it's been re-written in C++, so there's no simple diff.

Question information

English
upstart
No assignee
Solved by: Alex Nekrasov
Launchpad Janitor (janitor) said : #1

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

This part of the code has been extensively rewritten in later releases, so it's entirely possible the bug is fixed and there are all-new bugs waiting to be found ;)

0.5 is not written in C++; it's still plain old C.

I suspect the bug is that the post-start script ends *after* the main process does. Technically, the job should remain in the post-start state until the post-start script finishes, but it transitions out too early instead.

Alex Nekrasov (ennnot) said : #4

Something's wrong. I downloaded what I thought was the 0.5 code and looked in job.c for the new child reaper and related functions, but I saw classes. OK, so maybe I got a different version. I'll need to check.

As to the issue, the same thing happens with other sections that can run in parallel with the main process. I added my attempt at fixing this to the bug report you opened.