VM instance is not able to get IP address

Asked by Anil Vishnoi

Hi All,

I know I might be repeating a question that has been asked before, but I could not find a solution to this problem.

I set up 9 physical servers: one acts as both the controller node and the networking node, and the other 8 are compute nodes. All of them run Ubuntu 12.04 and have network namespace support. When I installed the OpenStack services, I enabled namespaces for the l3_agent and the DHCP agent. The following is my controller node configuration:

ifconfig output:

root@management:~/openstack# ifconfig
br-eth1 Link encap:Ethernet HWaddr 00:11:25:8e:a8:2d
          inet addr:10.10.3.1 Bcast:10.10.255.255 Mask:255.255.0.0
          inet6 addr: fe80::211:25ff:fe8e:a82d/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
          RX packets:420146 errors:0 dropped:0 overruns:0 frame:0
          TX packets:339580 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:174225852 (174.2 MB) TX bytes:238554794 (238.5 MB)

br-ex Link encap:Ethernet HWaddr 00:11:25:8e:a8:2c
          inet addr:9.126.108.143 Bcast:9.126.108.255 Mask:255.255.255.0
          inet6 addr: fe80::211:25ff:fe8e:a82c/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
          RX packets:435547 errors:0 dropped:16395 overruns:0 frame:0
          TX packets:156208 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:46251450 (46.2 MB) TX bytes:135765850 (135.7 MB)

br-int Link encap:Ethernet HWaddr 2e:69:40:c7:d1:4b
          inet6 addr: fe80::2c69:40ff:fec7:d14b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:5282 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:365517 (365.5 KB) TX bytes:468 (468.0 B)

eth0 Link encap:Ethernet HWaddr 00:11:25:8e:a8:2c
          inet6 addr: fe80::211:25ff:fe8e:a82c/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
          RX packets:436071 errors:0 dropped:18 overruns:0 frame:0
          TX packets:160434 errors:8 dropped:0 overruns:0 carrier:0
          collisions:13583 txqueuelen:1000
          RX bytes:48038908 (48.0 MB) TX bytes:122612206 (122.6 MB)
          Interrupt:16

eth1 Link encap:Ethernet HWaddr 00:11:25:8e:a8:2d
          inet6 addr: fe80::211:25ff:fe8e:a82d/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
          RX packets:529317 errors:0 dropped:8 overruns:0 frame:0
          TX packets:413050 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:183756993 (183.7 MB) TX bytes:245056300 (245.0 MB)
          Interrupt:16

int-br-eth1 Link encap:Ethernet HWaddr f2:cb:af:74:21:d0
          inet6 addr: fe80::f0cb:afff:fe74:21d0/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
          RX packets:18845 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1322857 (1.3 MB) TX bytes:936 (936.0 B)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:11428927 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11428927 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2378537968 (2.3 GB) TX bytes:2378537968 (2.3 GB)

phy-br-eth1 Link encap:Ethernet HWaddr 0e:ca:4d:6d:8a:89
          inet6 addr: fe80::cca:4dff:fe6d:8a89/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
          RX packets:12 errors:0 dropped:0 overruns:0 frame:0
          TX packets:18845 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:936 (936.0 B) TX bytes:1322857 (1.3 MB)

ifconfig -a: Same output as above

ovs-vsctl output:

root@management:~/openstack# ovs-vsctl show
64ee78f8-1624-49bb-a261-e40862672c91
    Bridge br-ex
        Port "eth0"
            Interface "eth0"
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
        Port "tap9fdb5c15-26"
            tag: 1
            Interface "tap9fdb5c15-26"
                type: internal
        Port "int-br-eth1"
            Interface "int-br-eth1"
    Bridge "br-eth1"
        Port "eth1"
            Interface "eth1"
        Port "phy-br-eth1"
            Interface "phy-br-eth1"
        Port "br-eth1"
            Interface "br-eth1"
                type: internal
    ovs_version: "1.4.0+build0"

ovs-dpctl output:
root@management:~/openstack# ovs-dpctl show
system@br-eth1:
 lookups: hit:683982 missed:128186 lost:0
 flows: 3
 port 0: br-eth1 (internal)
 port 1: eth1
 port 2: phy-br-eth1
system@br-int:
 lookups: hit:940 missed:67832 lost:0
 flows: 0
 port 0: br-int (internal)
Feb 07 17:32:40|00001|netdev_linux|WARN|/sys/class/net/tap9fdb5c15-26/carrier: open failed: No such file or directory
 port 1: tap9fdb5c15-26 (internal)
 port 2: int-br-eth1
system@br-ex:
 lookups: hit:386505 missed:206232 lost:3
 flows: 14
 port 0: br-ex (internal)
 port 1: eth0

root@management:~/openstack# brctl show
bridge name     bridge id               STP enabled     interfaces
br-eth1         0000.0011258ea82d       no              eth1
                                                        phy-br-eth1
br-ex           0000.0011258ea82c       no              eth0
br-int          0000.2e6940c7d14b       no              int-br-eth1
                                                        tap9fdb5c15-26
NOTE-1: I manually brought up the br-int bridge interface.
NOTE-2: The Linux bridge module is not loaded.

root@management:~/openstack# nova-manage service list
2013-02-07 17:54:37 DEBUG nova.utils [req-1108df8f-b7c9-4190-b872-dd2ba0a9cf74 None None] backend <module 'nova.db.sqlalchemy.api' from '/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.pyc'> __get_backend /usr/lib/python2.7/dist-packages/nova/utils.py:502
Binary            Host        Zone  Status   State  Updated_At
nova-cert         management  nova  enabled  :-)    2013-02-07 12:24:28
nova-consoleauth  management  nova  enabled  :-)    2013-02-07 12:24:33
nova-scheduler    management  nova  enabled  :-)    2013-02-07 12:24:33
nova-compute      irldxph010  nova  enabled  :-)    2013-02-07 12:24:31

root@management:~/openstack# service quantum-server status
quantum-server start/running, process 8459
root@management:~/openstack# service quantum-dhcp-agent status
quantum-dhcp-agent start/running, process 8493
root@management:~/openstack# service quantum-l3-agent status
quantum-l3-agent start/running, process 10903
root@management:~/openstack# service quantum-plugin-openvswitch-agent status
quantum-plugin-openvswitch-agent start/running, process 6101

I created one private network with one subnet, 192.168.0.0/24. I am able to spawn a VM instance and access the CirrOS console, but ifconfig shows that no IP address is assigned to eth0 in the VM.

I searched the web and the OpenStack archives to debug this further. This mailing list really helped me a lot in debugging the issue. Thanks to all the members for your efforts!

In my case I can see that the DHCP request from the VM reaches the controller node, but there is no response from the DHCP agent. The DHCP agent looks fine to me:

root@management:~/openstack# ps -elf |grep dhcp
4 S quantum 8493 1 0 80 0 - 20040 ep_pol 13:12 ? 00:00:00 python /usr/bin/quantum-dhcp-agent --config-file=/etc/quantum/quantum.conf --config-file=/etc/quantum/dhcp_agent.ini --log-file=/var/log/quantum/dhcp-agent.log
5 S nobody 8751 1 0 80 0 - 6886 poll_s 13:12 ? 00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap9fdb5c15-26 --except-interface=lo --domain=openstacklocal --pid-file=/var/lib/quantum/dhcp/ed76c5be-c839-4f4c-aa78-c100db9b8d82/pid --dhcp-hostsfile=/var/lib/quantum/dhcp/ed76c5be-c839-4f4c-aa78-c100db9b8d82/host --dhcp-optsfile=/var/lib/quantum/dhcp/ed76c5be-c839-4f4c-aa78-c100db9b8d82/opts --dhcp-script=/usr/bin/quantum-dhcp-agent-dnsmasq-lease-update --leasefile-ro --dhcp-range=set:tag0,192.168.0.0,static,120s
1 S root 8752 8751 0 80 0 - 6879 pipe_w 13:12 ? 00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap9fdb5c15-26 --except-interface=lo --domain=openstacklocal --pid-file=/var/lib/quantum/dhcp/ed76c5be-c839-4f4c-aa78-c100db9b8d82/pid --dhcp-hostsfile=/var/lib/quantum/dhcp/ed76c5be-c839-4f4c-aa78-c100db9b8d82/host --dhcp-optsfile=/var/lib/quantum/dhcp/ed76c5be-c839-4f4c-aa78-c100db9b8d82/opts --dhcp-script=/usr/bin/quantum-dhcp-agent-dnsmasq-lease-update --leasefile-ro --dhcp-range=set:tag0,192.168.0.0,static,120s
0 S root 14276 24653 0 80 0 - 2028 pipe_w 17:59 pts/1 00:00:00 grep --color=auto dhcp

Host file:

root@management:~/openstack# cat /var/lib/quantum/dhcp/ed76c5be-c839-4f4c-aa78-c100db9b8d82/host
fa:16:3e:16:c5:e3,192-168-0-2.openstacklocal,192.168.0.2
fa:16:3e:93:74:73,192-168-0-3.openstacklocal,192.168.0.3
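
A quick way to confirm that a given VM port has a static entry in this file is to parse it. The following is only a minimal sketch: it assumes every line has the three-field MAC,hostname,IP form shown above, and the function name parse_dhcp_hosts is made up for illustration.

```python
def parse_dhcp_hosts(text):
    """Map each MAC address to its (hostname, ip) pair.

    Assumes every non-empty line has the three-field form
    MAC,hostname,IP seen in the host file above.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        mac, hostname, ip = line.split(",")
        entries[mac.lower()] = (hostname, ip)
    return entries


# The two entries from the host file in this post.
HOSTS = """\
fa:16:3e:16:c5:e3,192-168-0-2.openstacklocal,192.168.0.2
fa:16:3e:93:74:73,192-168-0-3.openstacklocal,192.168.0.3
"""

entries = parse_dhcp_hosts(HOSTS)
print(entries.get("fa:16:3e:93:74:73"))
```

If the MAC of the VM's port is missing from the result, dnsmasq has no static entry to offer for it, which would be a different problem from the one investigated in the rest of this post.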

192.168.0.3 is the address assigned to my VM, and I can see it in the OpenStack dashboard. To debug further, I restarted the VM and dumped the flows on br-int on the controller node. I can see the following rules in the switch:

ovs-ofctl output:
root@management:~/openstack# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=11422.615s, table=0, n_packets=16711, n_bytes=1178562, priority=2,in_port=2 actions=drop
 cookie=0x0, duration=11421.593s, table=0, n_packets=0, n_bytes=0, priority=3,in_port=2,dl_vlan=1 actions=mod_vlan_vid:1,NORMAL
 cookie=0x0, duration=11424.054s, table=0, n_packets=6, n_bytes=468, priority=1 actions=NORMAL

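All the incoming DHCP packets match the first rule (priority=2,in_port=2), whose action is drop. The priority=3 rule for dl_vlan=1 on the same port shows n_packets=0, so nothing arriving on port 2 has matched it.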
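
Reading the counters by eye works for three rules but gets tedious on a busier bridge. As a rough sketch (the helper names parse_flow and hit_drop_rules are hypothetical, and the parsing only assumes the line layout seen in the dump above), the drop rules that have actually matched traffic can be picked out like this:

```python
import re


def parse_flow(line):
    """Split one `ovs-ofctl dump-flows` line into its fields plus its actions."""
    head, _, actions = line.strip().partition(" actions=")
    fields = {}
    for token in re.split(r"[,\s]+", head):
        if not token:
            continue
        key, sep, value = token.partition("=")
        fields[key] = value if sep else True
    fields["actions"] = actions
    return fields


def hit_drop_rules(dump):
    """Return the flows whose only action is drop and that have matched packets."""
    rules = [parse_flow(l) for l in dump.splitlines() if "actions=" in l]
    return [r for r in rules
            if r["actions"] == "drop" and int(r["n_packets"]) > 0]


# The br-int dump from the controller node, copied from this post.
DUMP = """\
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=11422.615s, table=0, n_packets=16711, n_bytes=1178562, priority=2,in_port=2 actions=drop
 cookie=0x0, duration=11421.593s, table=0, n_packets=0, n_bytes=0, priority=3,in_port=2,dl_vlan=1 actions=mod_vlan_vid:1,NORMAL
 cookie=0x0, duration=11424.054s, table=0, n_packets=6, n_bytes=468, priority=1 actions=NORMAL
"""

for rule in hit_drop_rules(DUMP):
    print(rule["priority"], rule["in_port"], rule["n_packets"])
```

On this dump the only rule reported is the priority=2 rule on in_port=2, which is consistent with the observation above.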

ovs-dpctl shows the following datapath flow:

in_port(2),eth(src=fa:16:3e:93:74:73,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=0.0.0.0,dst=255.255.255.255,proto=17,tos=0,ttl=64,frag=no),udp(src=68,dst=67), packets:2, bytes:644, used:5.444s, actions:drop

Judging by the UDP source and destination ports (68 and 67), I believe this flow was installed for the DHCP request, and here too you can see that the packet is dropped. This seems to be why the DHCP agent never receives the DHCP request and therefore cannot offer an IP address.
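
The same check can be written down as code. This is only a sketch that pattern-matches the text layout of the datapath flow shown above; the function names are invented for illustration, and the only protocol fact it relies on is that DHCP clients send from UDP port 68 to port 67.

```python
import re


def is_dropped_dhcp_request(flow):
    """True if a datapath flow line is a DHCP client packet (UDP 68 -> 67) that is dropped."""
    udp = re.search(r"udp\(src=(\d+),dst=(\d+)\)", flow)
    if udp is None:
        return False
    client_to_server = (udp.group(1), udp.group(2)) == ("68", "67")
    dropped = flow.rstrip().endswith("actions:drop")
    return client_to_server and dropped


def source_mac(flow):
    """Extract the Ethernet source address, or None if the line has none."""
    match = re.search(r"eth\(src=([0-9a-f:]{17})", flow)
    return match.group(1) if match else None


# The datapath flow quoted in this post.
FLOW = ("in_port(2),eth(src=fa:16:3e:93:74:73,dst=ff:ff:ff:ff:ff:ff),"
        "eth_type(0x0800),ipv4(src=0.0.0.0,dst=255.255.255.255,proto=17,"
        "tos=0,ttl=64,frag=no),udp(src=68,dst=67), packets:2, bytes:644, "
        "used:5.444s, actions:drop")

print(is_dropped_dhcp_request(FLOW), source_mac(FLOW))
```

The source MAC it extracts is the one listed for 192.168.0.3 in the DHCP host file, so the dropped packets do come from the VM in question.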

Can anybody point out why br-int might be dropping these packets? Or, in the first place, does this look like the root cause of the problem to you? :)

Please let me know if you need any other details to understand my setup or configuration. I will keep debugging and will post if anything interesting comes up. Meanwhile, if you have any suggestions for debugging this further, please share them; I am stuck and it is very annoying :).

Thanks!
Anil

Question information

Language: English
Status: Expired
For: neutron
Assignee: No assignee

Silvan Kaiser (silvan-kaiser) said :
#1

Hi Anil!
I seem to be stuck in a similar place. I'm using a slightly different setup with a dedicated network node, but the issues are the same. I've been trying to find out why DHCP requests from the VMs are not successful and have just verified the same issue you describe here.
You're not alone. :-)
Best
Silvan

Anil Vishnoi (vishnoianil) said :
#2

Hi Team,

I debugged this issue further and figured out a workaround. I am reluctant even to call it a workaround; it is really a hack.

As I mentioned in the description above, the DHCP packets were being dropped because of the actions=drop rule and never reached the DHCP agent, so it could not respond with a DHCPOFFER.

First I resolved this error [Feb 07 17:32:40|00001|netdev_linux|WARN|/sys/class/net/tap9fdb5c15-26/carrier: open failed: No such file or directory] with the following steps:
1. Disable network namespaces for the l3_agent and the DHCP agent by setting use_namespaces = False in their respective configuration files.
2. Delete the port (tap9fdb5c15-26) from the br-int bridge.
[Quick instructions:
root@management:~# ovs-vsctl del-port tap9fdb5c15-26
root@management:~# ovs-vsctl add-port br-int tap9fdb5c15-26
root@management:~# ovs-vsctl set port tap9fdb5c15-26 tag=1
root@management:~# ovs-vsctl set Interface tap9fdb5c15-26 type=internal
]
3. Restart both services; they will then create their tap devices outside any network namespace.

If network namespaces are enabled, ifconfig will not show this tap device in its output, but running 'ip netns exec dhcpnsXXXX ip -d link' will show it.

In my setup I followed the steps above, but if you do not want to disable namespaces, you can stop the DHCP agent, delete the port from br-int, and restart the service. That will possibly resolve this error (it did work in my setup).

So in my setup, namespaces are disabled, and the following is the output of ovs-dpctl:

root@management:~# ovs-dpctl show
system@br-eth1:
 lookups: hit:151651 missed:37759 lost:0
 flows: 3
 port 0: br-eth1 (internal)
 port 1: eth1
 port 3: phy-br-eth1
system@br-int:
 lookups: hit:1183 missed:23283 lost:0
 flows: 1
 port 0: br-int (internal)
 port 6: tap9fdb5c15-26 (internal)
 port 7: int-br-eth1
system@br-ex:
 lookups: hit:96895 missed:67156 lost:0
 flows: 16
 port 0: br-ex (internal)
 port 1: eth0

The DHCP request is a broadcast packet, and it takes the following path to reach br-int: port 1: eth1 (br-eth1) --> port 7: int-br-eth1 (br-int). It is dropped there because of the following rule installed on the br-int bridge:

 cookie=0x0, duration=11422.615s, table=0, n_packets=16711, n_bytes=1178562, priority=2,in_port=7 actions=drop

Ideally it should be forwarded to port 6: tap9fdb5c15-26 (internal) on br-int, so that it can reach the DHCP agent. So I changed the flow above to the following:

cookie=0x0, duration=3169.501s, table=0, n_packets=2562, n_bytes=228241, priority=2,in_port=7 actions=output:6

I also installed the following rule to send the DHCPOFFER packet back:

cookie=0x0, duration=4536.551s, table=0, n_packets=233, n_bytes=28896, priority=2,in_port=6 actions=output:7

After I installed these two flow rules, the DHCP agent received the request and responded:

root@management:~# tail -f /var/log/syslog
Feb 8 03:26:16 management dnsmasq-dhcp[25811]: DHCPREQUEST(tap9fdb5c15-26) 192.168.0.3 fa:16:3e:93:74:73
Feb 8 03:26:16 management dnsmasq-dhcp[25811]: DHCPACK(tap9fdb5c15-26) 192.168.0.3 fa:16:3e:93:74:73 192-168-0-3
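
The post does not say which command was used to change the flows. One plausible way, shown here purely as an assumption, is `ovs-ofctl add-flow`: adding a flow whose match and priority are identical to an existing one replaces that flow's actions. The helper below (forward_flow_cmd is a made-up name) only builds the command lines for the two br-int rules described above; it does not run them.

```python
import shlex


def forward_flow_cmd(bridge, in_port, out_port, priority=2):
    """Build an `ovs-ofctl add-flow` command that sends in_port traffic to out_port.

    Assumption: add-flow is used to overwrite the existing drop rule that has
    the same match (in_port) and the same priority.
    """
    flow = "priority={},in_port={},actions=output:{}".format(
        priority, in_port, out_port)
    return ["ovs-ofctl", "add-flow", bridge, flow]


commands = [
    forward_flow_cmd("br-int", 7, 6),  # DHCP request: int-br-eth1 -> tap device
    forward_flow_cmd("br-int", 6, 7),  # DHCP reply: tap device -> int-br-eth1
]

for cmd in commands:
    # Print a copy-and-paste form; pass `cmd` to subprocess.check_call to run it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The port numbers are the ones ovs-dpctl reported on this controller and will differ on other machines. Flows installed by hand like this may also be overwritten when the Open vSwitch agent reprograms the bridge, which is one more reason to treat this as a hack.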

The DHCP response packet takes the following path: port 6: tap9fdb5c15-26 (internal) (br-int) ---> port 7: int-br-eth1 (br-int) ---> port 3: phy-br-eth1 (br-eth1) ---> port 1: eth1 (br-eth1), and that is how it leaves the controller node. But on the br-eth1 bridge another rule was installed that was dropping the response:

cookie=0x0, duration=2669.22s, table=0, n_packets=173, n_bytes=18144, priority=2,in_port=3 actions=drop

I changed this flow to:

cookie=0x0, duration=2669.22s, table=0, n_packets=173, n_bytes=18144, priority=2,in_port=3 actions=output:1

Now the packet can leave the controller machine. Next comes the compute node side.

The following is the ovs-dpctl output of my compute node:

system@br-eth1:
 lookups: hit:404442 missed:110048 lost:0
 flows: 1
 port 0: br-eth1 (internal)
 port 1: eth1
 port 3: phy-br-eth1
system@br-int:
 lookups: hit:1884 missed:71022 lost:0
 flows: 0
 port 0: br-int (internal)
 port 3: int-br-eth1
 port 4: qvo819abf08-ca
 port 6: tap718d359b-d1 <<VM Connected to this tap device

The response packet should take the following path: port 1: eth1 (br-eth1) ---> port 3: int-br-eth1 (br-int) ---> port 6: tap718d359b-d1 (br-int), but on the br-int bridge the following flow rule was installed, and it was dropping the response packet:

cookie=0x0, duration=1671.356s, table=0, n_packets=1068, n_bytes=99127, priority=2,in_port=3 actions=drop

So I changed this flow to:

cookie=0x0, duration=1671.356s, table=0, n_packets=1068, n_bytes=99127, priority=2,in_port=3 actions=output:6

With that change the packet was forwarded to my VM, and I can now see that the IP address 192.168.0.3 is assigned to it. Ideally this is the job of the Quantum plug-in, but I am not sure why it is dropping all the packets in both directions.
The exercise above establishes that the DHCP agent is working fine; the problem lies in how packets are forwarded, and as far as I understand, specifically in the Open vSwitch plug-in.

I am seeking suggestions from the networking experts on the list: what could cause this issue? Does the Open vSwitch plug-in depend on the Linux bridge or brcompat module to work properly? On the controller node neither the bridge module nor the brcompat module is loaded. Obviously this hack won't work in all other cases, so the issue needs to be resolved at the plug-in level. Please suggest!

Thanks
Anil

Launchpad Janitor (janitor) said :
#3

This question was expired because it remained in the 'Open' state without activity for the last 15 days.