Instances don't get an IP from DHCP (Quantum, OVS, multi-node computes)

Asked by Emilien Macchi

Hi,

For two weeks, I've been looking for a solution to a Quantum + OVS issue.

The situation:

Two servers:

Essex-1 - eth0: 10.68.1.40 - eth1: connected to the br-int OVS bridge
-> Glance, Nova-*, Keystone, Horizon, Quantum-Server, KVM, OVS, Quantum-Agent
-> nova.conf : https://github.com/EmilienM/doc-openstack/blob/master/Configuration%20Files/Essex-1/nova.conf

Essex-2 - eth0: 10.68.1.45 - eth1: connected to the br-int OVS bridge
-> nova-compute, KVM, Quantum-Agent
-> nova.conf : https://github.com/EmilienM/doc-openstack/blob/master/Configuration%20Files/Essex-1/nova.conf

I've followed http://openvswitch.org/openstack/documentation/ and http://docs.openstack.org/trunk/openstack-network/admin/content/

I created the network with:
nova-manage network create --label=mysql --fixed_range_v4=192.168.113.0/24 --project_id=d2f0dc48a8944c6e96cb88c772376f06 --bridge=br-int --bridge_interface=eth1

What's not working:
-> When I create an instance from the dashboard, the VM does not get an IP from the DHCP server (hosted on Essex-1).
You can see the logs here : http://paste.openstack.org/show/17997/

What I did to investigate:
-> dhcpdump -i br-int: I can see DHCPDISCOVER messages on both servers (with no answers)
-> ps -ef | grep dnsmasq:
nobody 6564 1 0 14:12 ? 00:00:00 /usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= --domain=novalocal --pid-file=/var/lib/nova/networks/nova-gw-0f427a46-3f.pid --listen-address=192.168.113.1 --except-interface=lo --dhcp-range=192.168.113.2,static,120s --dhcp-lease-max=256 --dhcp-hostsfile=/var/lib/nova/networks/nova-gw-0f427a46-3f.conf --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro
root 6565 6564 0 14:12 ? 00:00:00 /usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= --domain=novalocal --pid-file=/var/lib/nova/networks/nova-gw-0f427a46-3f.pid --listen-address=192.168.113.1 --except-interface=lo --dhcp-range=192.168.113.2,static,120s --dhcp-lease-max=256 --dhcp-hostsfile=/var/lib/nova/networks/nova-gw-0f427a46-3f.conf --dhcp-script=/usr/bin/nova-dhcpbridge --leasefile-ro
root 16536 6192 0 14:40 pts/14 00:00:00 grep --color=auto dnsm
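The dnsmasq command line above already says where to look next. The Python sketch below (the helper name `dnsmasq_options` is hypothetical, not part of nova or dnsmasq) pulls the options out of such a line. Two of them matter here: `--dhcp-hostsfile` names the file nova is supposed to fill with one entry per VM, and `--dhcp-range=192.168.113.2,static,120s` declares a static-only range, in which dnsmasq only serves clients that have a host entry. An empty hosts file would therefore produce exactly the unanswered DHCPDISCOVERs described above.

```python
import shlex


def dnsmasq_options(cmdline):
    """Map each --option=value on a dnsmasq command line to its value.

    Flags without '=' (e.g. --strict-order) map to True.
    """
    opts = {}
    for token in shlex.split(cmdline):
        if not token.startswith("--"):
            continue  # skip the binary path itself
        name, sep, value = token[2:].partition("=")
        opts[name] = value if sep else True
    return opts


# Shortened copy of the command line from the ps output above.
cmd = ("/usr/sbin/dnsmasq --strict-order --bind-interfaces --conf-file= "
       "--listen-address=192.168.113.1 --dhcp-range=192.168.113.2,static,120s "
       "--dhcp-hostsfile=/var/lib/nova/networks/nova-gw-0f427a46-3f.conf --leasefile-ro")
opts = dnsmasq_options(cmd)
print(opts["dhcp-hostsfile"])  # /var/lib/nova/networks/nova-gw-0f427a46-3f.conf
```

Running `cat` on the printed path is then the quickest way to see whether the VM's MAC address is listed.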

Is my nova.conf correct?
What's wrong with my configuration?
Is there a problem with dnsmasq?

I would appreciate any ideas!

Regards

Question information

Language: English
Status: Solved
For: neutron
Assignee: No assignee
Solved by: Emilien Macchi

dan wendlandt (danwent) said :
#1

Hi Emilien,

Thanks for the detailed report. Two possible causes of the issue jump out at me:

First, in your nova-compute configuration, I don't see the required VIF driver flags:

libvirt_ovs_bridge=br-int
libvirt_vif_type=ethernet
libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtOpenVswitchDriver

(see "Nova Compute Node Configuration" at http://openvswitch.org/openstack/documentation/)
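A quick way to check a flag file against the three settings above is sketched below in Python. It assumes the Essex-style flag file used in this thread (one `--name=value` per line) and also accepts the dash-less form written above; `parse_flags` and `missing_or_wrong` are hypothetical helpers, not nova code.

```python
# Settings listed above for an OVS-based nova-compute node.
REQUIRED = {
    "libvirt_ovs_bridge": "br-int",
    "libvirt_vif_type": "ethernet",
    "libvirt_vif_driver": "nova.virt.libvirt.vif.LibvirtOpenVswitchDriver",
}


def parse_flags(text):
    """Parse '--name=value' (or 'name=value') lines into a dict; skip blanks and comments."""
    flags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.lstrip("-").partition("=")
        if sep:
            flags[name.strip()] = value.strip()
    return flags


def missing_or_wrong(text, required=REQUIRED):
    """Return {flag: actual value or None} for each required flag that is absent or different."""
    flags = parse_flags(text)
    return {name: flags.get(name)
            for name, expected in required.items()
            if flags.get(name) != expected}


compute_conf = """\
--libvirt_type=kvm
--libvirt_use_virtio_for_bridges=true
--libvirt_ovs_bridge=br-int
--libvirt_vif_type=ethernet
--libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtOpenVswitchDriver
"""
print(missing_or_wrong(compute_conf))  # {}
```

An empty dict means all three flags are present with the expected values; any flag that is missing shows up with the value `None`.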

Second, Quantum ignores the --bridge and --bridge_interface options to "nova-manage network create". In theory this is harmless, but perhaps you're expecting these options to link br-int to the physical network, which they do not. You will have to add the NIC to br-int as described in the same section of the OVS documentation. For example:

ovs-vsctl add-port br-int eth1
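Because those nova-manage options do nothing under Quantum, it is worth confirming that the NIC really is a port on br-int. `ovs-vsctl list-ports br-int` prints one port name per line; the Python sketch below (hypothetical helper names) wraps that command and checks the result. It needs Open vSwitch installed and enough privileges to run `ovs-vsctl`.

```python
import subprocess


def list_ports(bridge="br-int"):
    """Return the port names attached to an OVS bridge (runs `ovs-vsctl list-ports`)."""
    result = subprocess.run(["ovs-vsctl", "list-ports", bridge],
                            capture_output=True, text=True, check=True)
    return result.stdout.split()


def uplink_status(ports, nic="eth1"):
    """Report whether the physical NIC is on the bridge and which VM tap ports are present."""
    taps = sorted(p for p in ports if p.startswith("tap"))
    return {"uplink_attached": nic in ports, "tap_ports": taps}


# Example with a captured port list instead of a live call to ovs-vsctl:
print(uplink_status(["eth1", "tap926d429e-53"]))
```

On a live node the equivalent call would be `uplink_status(list_ports("br-int"))`, run on each server.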

Other than that, a good trick to debug connectivity issues is to run tcpdump at different locations along the path. This includes the tap devices for VMs, the physical NICs carrying the traffic to the physical network, and the gw-* devices that dnsmasq binds to.

Btw, it looks like you're using Essex, so VLANs are your best option. Using OVS in tunnel mode would actually be much easier, as you don't need to deal with VLANs on the physical network, but there are a couple of bugs in the Essex release for this feature, so until we backport the fixes, we have not been publicly documenting tunnel mode.

Emilien Macchi (emilienm) said :
#2

Hi Dan,

Thanks for the quick answer.

Here is my nova-compute.conf for both servers:

--libvirt_type=kvm
--libvirt_use_virtio_for_bridges=true
--libvirt_ovs_bridge=br-int
--libvirt_vif_type=ethernet
--libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtOpenVswitchDriver

And of course, I did "ovs-vsctl add-port br-int eth1" on both servers.

I use Essex on an up-to-date Ubuntu 12.04.

If I understand correctly, I have to investigate with tcpdump on all interfaces.

So, do you mean that it's not yet possible to use a multi-node compute architecture with Quantum + OVS and VLANs?

Thanks for your help :-)

Regards

dan wendlandt (danwent) said :
#3

Oh, sorry, I was unclear. It's definitely possible to use OVS in a multi-node setup with VLANs. All I was saying is that with Essex there are some bugs when using OVS on multi-node with tunneling (an alternative to VLANs), so VLANs are your best option.

In the above setup, I would try running tcpdump on interface "gw-0f427a46-3f" to check whether you see DHCP requests and responses. If you do, then I would run tcpdump on the tap device associated with the VM to see if it is getting the reply. If it is not, I would then run tcpdump on both of the physical NICs to see whether the packet leaves the first physical server and arrives at the second.

This will help you figure out whether this is a guest issue, physical network issue, or OVS issue.
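The bisection above can be written down as a small decision function. This is only a sketch of the reasoning in Python: the capture-point names (`gw`, `tap`, `controller_nic`, `compute_nic`) are hypothetical labels for the gw-* device, the VM's tap device, and the physical NICs of the dnsmasq host and the compute host, and it assumes the VM runs on a different host from dnsmasq.

```python
def locate_dhcp_failure(seen):
    """Walk the capture points in order and name the first place the DHCP exchange breaks.

    seen maps a capture point to the DHCP message kinds observed there,
    e.g. {"gw": {"request"}, "tap": {"request"}}. Assumes the VM runs on a
    different host from dnsmasq.
    """
    def saw(point, kind):
        return kind in seen.get(point, set())

    if not saw("gw", "request"):
        return "request never reached dnsmasq's gw-* device: check OVS and the physical path"
    if not saw("gw", "reply"):
        return "dnsmasq receives the request but does not answer: check its hosts file"
    if saw("tap", "reply"):
        return "reply reaches the VM's tap device: likely a guest issue"
    if not saw("controller_nic", "reply"):
        return "reply never leaves the dnsmasq host: likely an OVS issue"
    if not saw("compute_nic", "reply"):
        return "reply leaves the dnsmasq host but never arrives: likely a physical network issue"
    return "reply reaches the compute host but not the tap device: likely an OVS issue"


# Requests seen on gw-*, but no replies:
print(locate_dhcp_failure({"gw": {"request"}}))
```

The order of the checks mirrors the order of the tcpdump runs suggested above, so each answer tells you which capture to take next.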

Emilien Macchi (emilienm) said :
#4

tcpdump of gw-0f427a46-3f (the gateway of my private network):

http://paste.openstack.org/show/18007/

Note that fa:16:3e:42:e8:d0 is the MAC address of my instance.

tcpdump of the tap device associated with the instance:

http://paste.openstack.org/show/18008/

askstack (askstack) said :
#5

I have the same problem as Emilien. tcpdump shows the client requesting an IP but not getting an answer. dnsmasq is running on the controller.
I did not run "ovs-vsctl add-port br-int eth1".

/var/log/message

May 18 15:49:25 core01 ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl -- --may-exist add-port br-int tap926d429e-53 -- set Interface tap926d429e-53 external-ids:iface-id=926d429e-5313-4d3b-b93a-afe0d99a98b1 -- set Interface tap926d429e-53 external-ids:iface-status=active -- set Interface tap926d429e-53 external-ids:attached-mac=fa:16:3e:2d:bb:a8 -- set Interface tap926d429e-53 external-ids:vm-uuid=b4f94e0f-77bb-4aff-8131-e1f97caf595f
May 18 15:49:25 core01 kernel: [92915.553134] device tap926d429e-53 entered promiscuous mode
May 18 15:49:25 core01 nova-compute[27734]: 2012-05-18 15:49:25 INFO nova.virt.libvirt.connection [req-167a5e21-2e24-4cb4-8395-13deb6f66155 aa46ceb969f9411494aa2fba527c19e7 76c41e7de2d0408489e94f8adb5b28ee] [instance: b4f94e0f-77bb-4aff-8131-e1f97caf595f] Creating image
May 18 15:49:26 core01 kernel: [92916.276732] ADDRCONF(NETDEV_CHANGE): tap926d429e-53: link becomes ready
May 18 15:49:27 core01 nova-compute[27734]: 2012-05-18 15:49:27 INFO nova.virt.libvirt.connection [-] [instance: b4f94e0f-77bb-4aff-8131-e1f97caf595f] Instance spawned successfully.
May 18 15:49:27 core01 ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Port tap926d429e-53 tag=4
May 18 15:49:30 core01 ntpd[1054]: Listen normally on 35 tap926d429e-53 fe80::e092:c5ff:fe99:20f UDP 123
May 18 15:49:30 core01 ntpd[1054]: peers refreshed
May 18 15:49:44 core01 dnsmasq-dhcp[32577]: DHCPDISCOVER(gw-4b4f2f7e-7c) fa:16:3e:2d:bb:a8 no address available
May 18 15:49:49 core01 dnsmasq-dhcp[32577]: DHCPDISCOVER(gw-4b4f2f7e-7c) fa:16:3e:2d:bb:a8 no address available
May 18 15:49:53 core01 dnsmasq-dhcp[32577]: DHCPDISCOVER(gw-4b4f2f7e-7c) fa:16:3e:2d:bb:a8 no address available
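The "no address available" lines mean dnsmasq saw the DISCOVER but had no address it was allowed to hand out to that MAC; with a static-only range, that is consistent with a missing hosts-file entry rather than an OVS problem. A Python sketch (hypothetical helper) that collects the affected MACs per gw-* interface from syslog lines:

```python
import re

# Matches e.g. "DHCPDISCOVER(gw-4b4f2f7e-7c) fa:16:3e:2d:bb:a8 no address available"
NO_ADDRESS = re.compile(
    r"DHCPDISCOVER\((?P<iface>[^)]+)\)\s+"
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+no address available"
)


def unanswered_macs(log_lines):
    """Group MACs that dnsmasq refused ('no address available') by interface."""
    found = {}
    for line in log_lines:
        match = NO_ADDRESS.search(line)
        if match:
            found.setdefault(match.group("iface"), set()).add(match.group("mac"))
    return found


log = [
    "May 18 15:49:44 core01 dnsmasq-dhcp[32577]: DHCPDISCOVER(gw-4b4f2f7e-7c) fa:16:3e:2d:bb:a8 no address available",
    "May 18 15:49:49 core01 dnsmasq-dhcp[32577]: DHCPDISCOVER(gw-4b4f2f7e-7c) fa:16:3e:2d:bb:a8 no address available",
]
print(unanswered_macs(log))  # {'gw-4b4f2f7e-7c': {'fa:16:3e:2d:bb:a8'}}
```

Each MAC reported here can then be looked up in the hosts file for the same gw-* interface.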

dan wendlandt (danwent) said :
#6

These two issues do seem similar. Both suggest that OVS is delivering the DHCP request to dnsmasq, but dnsmasq does not respond. Thus, it doesn't seem like a problem with OVS forwarding. Instead, it appears that QuantumManager in nova may not be correctly populating the hosts file for dnsmasq.

Emilien, can you post the content of the dnsmasq hosts file? Based on the output above it would be: /var/lib/nova/networks/nova-gw-0f427a46-3f.conf

Also, can you post the content of the fixed IPs table? The following should work if you insert your DB username and password:

mysql -u<user> -p<password> nova -e "Select address,instance_id,allocated from fixed_ips"
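The comparison requested here can be automated: every allocated row in fixed_ips that belongs to an instance should have a matching line in the dnsmasq hosts file. A Python sketch, assuming the `mac,hostname,ip` line format nova writes (as seen later in this thread) and rows shaped like the columns of the query above; the helper names and the example rows are hypothetical, not the pasted data.

```python
def parse_hostsfile(text):
    """Map IP -> MAC from hosts-file lines such as
    'fa:16:3e:5e:e2:f3,host-192.168.113.2.novalocal,192.168.113.2'."""
    entries = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) >= 3:
            mac, ip = fields[0], fields[2]  # first field is the MAC, third is the IP
            entries[ip] = mac
    return entries


def missing_from_hostsfile(fixed_ips, hostsfile_text):
    """Return allocated addresses (with an instance) that have no hosts-file entry.

    fixed_ips: iterable of (address, instance_id, allocated) tuples, i.e. the
    columns selected by the mysql query above.
    """
    known = parse_hostsfile(hostsfile_text)
    return [address for address, instance_id, allocated in fixed_ips
            if allocated and instance_id is not None and address not in known]


# Illustrative rows: one address allocated to an instance, one free.
rows = [("192.168.113.5", 1, 1), ("192.168.113.6", None, 0)]
print(missing_from_hostsfile(rows, ""))  # ['192.168.113.5']
```

With an empty hosts file, every allocated address is reported, which matches the symptom of dnsmasq ignoring the requests.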

Emilien Macchi (emilienm) said :
#7

Dan,

That's strange:

/var/lib/nova/networks/nova-gw-0f427a46-3f.conf is empty.

And the MySQL result:

http://pastebin.com/QF7cgCcg

askstack (askstack) said :
#8

My /var/lib/nova/networks/nova-gw-4b4f2f7e-7c.conf is empty.

http://paste.openstack.org/show/18026/

Emilien Macchi (emilienm) said :
#9

OK, so we have the same problem.

Can someone confirm that my nova.conf (here: https://github.com/EmilienM/doc-openstack/blob/master/Configuration%20Files/Essex-1/nova.conf) is correct with:

--flat_network_bridge=br-int

or do I need to change it to something else?

It's really strange; my configuration is basic: dual-node with Quantum + OVS + KVM.

Hyunsun Moon (hyunsun-moon) said :
#10

Hello,

Have you configured trunk mode on your physical switch ports? If not, try it.
I also had to change the VLAN_MIN / VLAN_MAX values in quantum/plugins/openvswitch/ovs_quantum_plugin.py to match the VLANs defined on my switch.
They default to VLAN_MIN=1 / VLAN_MAX=4094. I guess these values should be settable through a flag.
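The point here is that every VLAN ID the plugin may allocate (VLAN_MIN through VLAN_MAX, inclusive) has to be carried on the switch trunk. A Python sketch of that check; the set of trunked VLANs is something you would copy from your own switch configuration, and the function name is hypothetical.

```python
def vlans_not_trunked(vlan_min, vlan_max, trunked):
    """Return the VLAN IDs in [vlan_min, vlan_max] that the switch trunk does not carry."""
    if not 1 <= vlan_min <= vlan_max <= 4094:
        raise ValueError("expected 1 <= vlan_min <= vlan_max <= 4094")
    return [vlan for vlan in range(vlan_min, vlan_max + 1) if vlan not in trunked]


# Default plugin range against a trunk that only carries VLAN 777:
gaps = vlans_not_trunked(1, 4094, {777})
print(len(gaps), gaps[:3])  # 4093 [1, 2, 3]
```

An empty result means any tag the plugin hands out can actually cross the physical network.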

Hope this helps.

dan wendlandt (danwent) said :
#11

There's no need to set --flat_network_bridge to "br-int", but it shouldn't hurt anything.

Based on the database tables you posted, it seems like there is a single running VM with the following IP: 192.168.113.5 . Is that correct, or are there other VMs running?

Emilien Macchi (emilienm) said :
#12

I have one VM in my infrastructure. ( It's not a big cloud yet :-) )

askstack (askstack) said :
#13

Hyunsun Moon,
I am using a trunk port, and my VLAN tag is 777, which is in the range 1-4094. How did you change the "VLAN_MIN=1 / VLAN_MAX=4094" settings?

[root@core01 nova]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         172.20.0.250    0.0.0.0         UG        0 0          0 em1.777
10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 gw-4b4f2f7e-7c
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 em2
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 em1.777
172.20.0.0      0.0.0.0         255.255.255.0   U         0 0          0 em1.777

Hyunsun Moon (hyunsun-moon) said :
#14

askstack,

In your '/var/log/message' log above, I think the VLAN ID actually assigned to your tap device is 4:
"May 18 15:49:27 core01 ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Port tap926d429e-53 tag=4"
As far as I know, if VLAN 4 is not defined on your physical switch, packets from the tap device may be discarded.
Check which VLAN ID is actually assigned to your tap device using 'ovs-vsctl show'; the tag number is the VLAN ID.

The following page might help:
http://openvswitch.org/support/config-cookbooks/vlan-configuration-cookbook/
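To see which tags were assigned without reading the whole log, the `set Port ... tag=N` calls can be pulled out of syslog. A Python sketch (hypothetical helper); the result should normally agree with the `tag:` lines that `ovs-vsctl show` prints for the same ports.

```python
import re

# Matches the ovs-vsctl call that tags a VM port, e.g. "set Port tap926d429e-53 tag=4"
SET_TAG = re.compile(r"set Port (?P<port>\S+) tag=(?P<tag>\d+)")


def port_tags(log_lines):
    """Return {port name: VLAN tag} from ovs-vsctl log lines; later lines win."""
    tags = {}
    for line in log_lines:
        match = SET_TAG.search(line)
        if match:
            tags[match.group("port")] = int(match.group("tag"))
    return tags


log = [
    "May 18 15:49:27 core01 ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Port tap926d429e-53 tag=4",
    "May 22 18:47:40 core01 ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Port tapb8dc6e11-26 tag=10",
]
print(port_tags(log))  # {'tap926d429e-53': 4, 'tapb8dc6e11-26': 10}
```

Any tag in the result that the switch trunk does not carry is a candidate for the dropped packets.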

VLAN_MIN / VLAN_MAX are hard-coded in quantum/plugins/openvswitch/ovs_quantum_plugin.py, so I modified the source code.
When I changed the values to arbitrary numbers that I hadn't defined on my physical switch, VM networking failed.
Don't forget to restart the quantum service after you change the code.

Have you tried VLAN mode first?
I guess VLAN_MIN plays the same role as the --vlan_start flag in your nova.conf.

Hope this helps.

askstack (askstack) said :
#15

Hyunsun Moon,
Thanks for your help. I do want to use VLAN mode; I just didn't find a good tutorial for it.

Looking at /var/log/message, it appears nova+quantum are doing the same thing as the cookbook:
May 22 18:47:39 core01 ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl -- --may-exist add-port br-int tapb8dc6e11-26 -- set Interface tapb8dc6e11-26 external-ids:iface-id=b8dc6e11-2640-4950-b383-1df2645dd73c -- set Interface tapb8dc6e11-26 external-ids:iface-status=active -- set Interface tapb8dc6e11-26 external-ids:attached-mac=fa:16:3e:14:26:c7 -- set Interface tapb8dc6e11-26 external-ids:vm-uuid=b0e42af1-8376-4d36-979d-aafbfc47fb57
May 22 18:47:40 core01 ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Port tapb8dc6e11-26 tag=10

I am probably still doing something wrong with the trunk port, or I need to follow this note from the admin guide: "Networks created using other Nova network managers are not compatible with Quantum. When using Quantum with nova, you must start with a fresh Nova database to make sure that no previous networks created using other network managers exist."

askstack (askstack) said :
#16

I dropped the nova DB, created a new one, and ran "nova-manage db sync". After that I was able to launch VMs that got their IP addresses from DHCP.
Thanks Hyunsun Moon and dan wendlandt.

Emilien Macchi (emilienm) said :
#17

@askstack:

Can you give more details?

Is this a multi-node compute architecture? With OVS?

askstack (askstack) said :
#18

Emilien,
This is only a single-node setup with Quantum + OVS. I will be working on a second node, though.

Emilien Macchi (emilienm) said :
#19

Hi Askstack,

Can you share your configuration files (nova.conf, nova-compute.conf) and the OVS commands you ran?

Thanks

askstack (askstack) said :
#20

Emilien,
I have a second node working now. I don't have a nova-compute.conf; both hosts use nova.conf.
http://paste.openstack.org/show/18157/

Emilien Macchi (emilienm) said :
#21

Thanks askstack, I'll have a look next week and report here whether my issue is solved.

Emilien Macchi (emilienm) said :
#22

Askstack, can you please show me your network configuration?

Thank you in advance.

Emilien Macchi (emilienm) said :
#23

To all, here is the situation today.

-> Node 1 : nova.conf : https://github.com/EmilienM/doc-openstack/blob/master/Configuration%20Files/Essex-1/nova.conf
-> Node 2 : nova.conf : https://github.com/EmilienM/doc-openstack/blob/master/Configuration%20Files/Essex-2/nova.conf

Here is my nova-compute.conf for both servers:

--libvirt_type=kvm
--libvirt_use_virtio_for_bridges=true
--libvirt_ovs_bridge=br-int
--libvirt_vif_type=ethernet
--libvirt_vif_driver=nova.virt.libvirt.vif.LibvirtOpenVswitchDriver

On both servers:
ovs-vsctl add-port br-int eth1

My /var/lib/nova/networks/nova-gw-bcc5fbd4-a1.conf is NOT empty:
fa:16:3e:3b:c6:3b,host-192.168.113.9.novalocal,192.168.113.9
fa:16:3e:59:ed:5a,host-192.168.113.10.novalocal,192.168.113.10

But no VM gets an IP from DHCP (neither the VMs hosted on Essex-1 nor those hosted on Essex-2).

When I run tcpdump -i br-int | grep DHCP on both servers, I can see a lot of requests:
10:13:00.942132 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:90:4b:25:21 (oui Unknown), length 300
10:13:01.230608 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:90:4b:24:ab (oui Unknown), length 300

But it seems dnsmasq does not answer with a network configuration.
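One detail in this capture may be worth checking: the two requesting MACs (00:25:90:...) match neither hosts-file entry, while every VM MAC elsewhere in this thread starts with fa:16:3e. If this dnsmasq uses the same static-only range as before, it has nothing to offer those clients. A Python sketch (hypothetical helper, assuming VM MACs share the fa:16:3e prefix) that splits DHCP requesters in tcpdump output into VM and non-VM MACs:

```python
import re

REQUEST_FROM = re.compile(r"Request from (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})")


def split_requesters(tcpdump_lines, vm_prefix="fa:16:3e"):
    """Split DHCP requester MACs into (VM MACs, other MACs) by prefix."""
    vm_macs, other_macs = set(), set()
    for line in tcpdump_lines:
        match = REQUEST_FROM.search(line)
        if match:
            mac = match.group("mac")
            (vm_macs if mac.startswith(vm_prefix) else other_macs).add(mac)
    return vm_macs, other_macs


capture = [
    "10:13:00.942132 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:90:4b:25:21 (oui Unknown), length 300",
    "10:13:01.230608 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:90:4b:24:ab (oui Unknown), length 300",
]
vm_macs, other_macs = split_requesters(capture)
print(sorted(vm_macs), sorted(other_macs))  # [] ['00:25:90:4b:24:ab', '00:25:90:4b:25:21']
```

Filtering the capture for the MACs listed in the hosts file (or adding `ether host <mac>` to the tcpdump filter) shows whether the VMs' own requests reach br-int at all.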

What do you think it is:

- an issue with dnsmasq?
- a problem with the bridge?

If anyone has a working multi-node architecture with Quantum + OVS, please share what you did.

Thank you

Emilien Macchi (emilienm) said :
#24

An update on today's work:

Now the VMs hosted on Essex-1 have network connectivity, but the VMs hosted on Essex-2 do not.

root@essex-1:/var/lib/nova/networks# more nova-gw-fbbd1249-c5.conf
fa:16:3e:5e:e2:f3,host-192.168.113.2.novalocal,192.168.113.2
fa:16:3e:34:bf:36,host-192.168.113.3.novalocal,192.168.113.3

root@essex-2:~# tcpdump -i eth1 | grep DHCP
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
16:29:50.653177 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:34:bf:36 (oui Unknown), length 300
16:29:53.315672 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:34:bf:36 (oui Unknown), length 300

For information: fa:16:3e:34:bf:36 is the MAC address of the NIC of the VM hosted on Essex-2.

So, to summarize:

- The OVS bridge is working
- dnsmasq is working (locally for sure, but not across br-int)
- VMs hosted on Essex-1 have network connectivity
- VMs hosted on Essex-2 don't have network connectivity

Any ideas?

askstack (askstack) said :
#25

Emilien,
Sorry I did not get back to you sooner; I was on vacation for a week.
Have you tried connecting the two eth1 ports directly with an Ethernet cable? That way you bypass the switch, so it cannot drop any packets.

My network settings:
http://paste.openstack.org/show/18351/

Emilien Macchi (emilienm) said :
#26

I'm closing this question; the follow-up is here:

https://answers.launchpad.net/quantum/+question/199823

Thanks to everybody :-)