instances ignoring or not getting dhcp-relay offer

Asked by Eric Hankins

Running a cluster of 4 compute hosts on version 2013.1.4

Config is linuxbridge w/ provider networks

We recently suffered a power failure, and while everything seemed to come back properly, we now have a very puzzling problem getting IP assignments to instances: DHCP always times out on the instance.

I boot an instance and verify that the linuxbridge and DHCP agents are plumbing everything they're supposed to.

I track down the instance's tap interface:

# virsh domiflist 16
Interface Type Source Model MAC
-------------------------------------------------------
tapccac32a4-da bridge brq2f7a66d9-15 virtio fa:16:3e:dc:71:32

Verify that it is indeed attached to the bridge via 'brctl show':

brq2f7a66d9-15 8000.06a99ca1d46f no bond0.3016
       tap7994e5e9-df
       tapccac32a4-da

I then sniff the tap interface to see what's going on:

# tshark -i tapccac32a4-da
Capturing on tapccac32a4-da
*snip*

  7.815685 0.0.0.0 -> 255.255.255.255 DHCP 342 DHCP Discover - Transaction ID 0xfa2eb43a
  7.815697 0.0.0.0 -> 255.255.255.255 DHCP 342 DHCP Discover - Transaction ID 0xfa2eb43a
  7.815951 10.127.16.30 -> 10.127.16.51 DHCP 357 DHCP Offer - Transaction ID 0xfa2eb43a
*snip*
 21.419885 0.0.0.0 -> 255.255.255.255 DHCP 342 DHCP Discover - Transaction ID 0xfa2eb43a
 21.419897 0.0.0.0 -> 255.255.255.255 DHCP 342 DHCP Discover - Transaction ID 0xfa2eb43a
 21.420102 10.127.16.30 -> 10.127.16.51 DHCP 357 DHCP Offer - Transaction ID 0xfa2eb43a

Everything looks correct! The instance simply ignores the offer! I mean, the instance has to be getting it if I see it on the tap interface, right?
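Editor's note: a capture like the one above shows that the offer reaches the tap interface, but not whether its UDP checksum is valid, and some DHCP clients silently drop replies with a bad checksum. A minimal sketch for checking this, assuming tcpdump is installed on the compute host and run with enough verbosity to print checksum status (the helper name `has_bad_udp_cksum` is hypothetical, not part of any tool):

```shell
#!/bin/sh
# Hypothetical helper: reads verbose tcpdump output on stdin and succeeds
# (exit status 0) if any packet is reported with a bad UDP checksum.
has_bad_udp_cksum() {
    grep -q 'bad udp cksum'
}

# Usage (as root on the compute host; interface name taken from the capture above):
#   tcpdump -n -vvv -c 20 -i tapccac32a4-da 'udp and (port 67 or port 68)' 2>/dev/null \
#       | has_bad_udp_cksum && echo "DHCP traffic with a bad UDP checksum seen"
```

Checksum offload can make a capture report bad checksums on the host that originates a packet, so the result is most meaningful on the interface where the packet is delivered, such as the instance's tap.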

cloud-init start-local running: Wed, 26 Mar 2014 22:35:34 +0000. up 1.92 seconds
no instance data found in start-local
cloud-init-nonet waiting 120 seconds for a network device.
cloud-init-nonet gave up waiting for a network device.
ci-info: lo : 1 127.0.0.1 255.0.0.0 .
ci-info: eth0 : 1 . . fa:16:3e:dc:71:32
route_info failed

Any suggestions greatly appreciated...

Question information

Language: English
Status: Solved
For: neutron
Assignee: No assignee
Solved by: Eric Hankins

Eric Hankins (ehankins) said :
#1

So I ended up chasing my tail because this iptables rule was missing:

iptables -A POSTROUTING -t mangle -p udp --dport 68 -j CHECKSUM --checksum-fill

I think the rule may have gone missing in the transition from 2013.1.2 to 2013.1.4, but I need to check the release notes or commit log to confirm.
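Editor's note: the rule above can be applied so that running it twice does not create a duplicate entry. A minimal sketch, assuming an iptables build that supports `-C` (`--check`) and the CHECKSUM target; the function name `ensure_dhcp_checksum_fill` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: add the DHCP checksum-fill rule only if it is missing.
# Assumes iptables supports -C (--check) and the CHECKSUM target.
ensure_dhcp_checksum_fill() {
    if iptables -t mangle -C POSTROUTING -p udp --dport 68 \
            -j CHECKSUM --checksum-fill 2>/dev/null; then
        # -C exits 0 when an identical rule already exists in the chain.
        echo "rule already present"
    else
        iptables -t mangle -A POSTROUTING -p udp --dport 68 \
            -j CHECKSUM --checksum-fill && echo "rule added"
    fi
}

# Usage (as root on the host that sends the DHCP replies toward instances):
#   ensure_dhcp_checksum_fill
#   iptables -t mangle -S POSTROUTING   # list the chain to confirm
```

A rule added this way lives only in the running kernel, so it is lost on reboot unless it is saved or re-applied by whatever manages the host's firewall rules.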

I'm back to my original issue of quantum-dhcp-agent not starting (or not being told to start by quantum-server) on my compute nodes. I will post about that separately.