VMs get no network

Asked by Graham Hemingway

I had to reboot my cloud controller (Folsom + Quantum). I am running the Provider Router network scenario with tunneling and non-overlapping tenant networks. Now when I launch a VM, it boots but has no network connectivity. Here is what I see in the VM console log:

...
[ 0.986623] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
Begin: Running /scripts/local-bottom ... [ 1.168151] Refined TSC clocksource calibration: 2393.999 MHz.
[ 1.177455] vda: vda1
GROWROOT: CHANGED: partition=1 start=16065 old: size=4176900 end=4192965 new: size=41913585,end=41929650
[ 1.310365] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
done.
done.
Begin: Running /scripts/init-bottom ... done.
cloud-init start-local running: Wed, 31 Oct 2012 15:24:47 +0000. up 3.70 seconds
no instance data found in start-local
cloud-init-nonet waiting 120 seconds for a network device.
cloud-init-nonet gave up waiting for a network device.
ci-info: lo : 1 127.0.0.1 255.0.0.0 .
ci-info: eth0 : 1 . . fa:16:3e:29:81:de
route_info failed
 * Stopping Handle applying cloud-config [ OK ]
Waiting for network configuration...
Waiting up to 60 more seconds for network configuration...
Booting system without full network configuration...
...

From nova-compute.log I know that this VM's network device is named qbr2e6b03f2-40, and I can see that device using "ip a":

...
53: qbr2e6b03f2-40: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 56:12:49:d6:c1:da brd ff:ff:ff:ff:ff:ff
    inet6 fe80::d058:2ff:fe1e:ea13/64 scope link
       valid_lft forever preferred_lft forever
54: qvo2e6b03f2-40: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 12:33:63:43:35:17 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1033:63ff:fe43:3517/64 scope link
       valid_lft forever preferred_lft forever
55: qvb2e6b03f2-40: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr2e6b03f2-40 state UP qlen 1000
    link/ether 56:12:49:d6:c1:da brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5412:49ff:fed6:c1da/64 scope link
       valid_lft forever preferred_lft forever
...
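(As a sanity check, qvb/qvo are the two ends of a veth pair; ethtool -S on each end reports the peer's ifindex, assuming ethtool is installed:)

ethtool -S qvb2e6b03f2-40   # peer_ifindex should be 54 (qvo2e6b03f2-40)
ethtool -S qvo2e6b03f2-40   # peer_ifindex should be 55 (qvb2e6b03f2-40)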

Here is what ovs-vsctl show gives:

41c7ac3f-238a-441e-a213-b47aecfe9bd8
    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "gre-1"
            Interface "gre-1"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="10.0.0.3"}
        Port "gre-3"
            Interface "gre-3"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="10.0.0.27"}
    Bridge br-int
        Port "qvo6706f109-3d"
            tag: 1
            Interface "qvo6706f109-3d"
        Port "qvo2e6b03f2-40"
            tag: 1
            Interface "qvo2e6b03f2-40"
        Port br-int
            Interface br-int
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
    ovs_version: "1.4.0+build0"

I don't know what qvo6706f109-3d is; no other VMs are on this machine.

I have looked in the logs for all of the quantum and nova services, but I don't see any obvious errors; everything looks the way I think it should. Which OpenStack service is most involved in providing a NIC to the VM? Where should I start looking?
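For context, my understanding of the data path (assuming the standard Folsom OVS hybrid VIF driver) is vnet0 -> qbrXXX (Linux bridge) -> qvb/qvo (veth pair) -> br-int -> br-tun -> GRE, which can be walked roughly like this (<instance> is the domain name or ID):

virsh domiflist <instance>        # tap device (vnet0) and the qbr bridge it joins
brctl show                        # vnet0 and qvbXXX attached to the qbr bridge
sudo ovs-vsctl list-ports br-int  # qvoXXX plugged into br-int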

Thank you,
   Graham

Question information

Language: English
Status: Expired
For: neutron
Assignee: No assignee
Graham Hemingway (graham-hemingway) said (#1):

Some additional information: looking at the syslog on the cloud controller, I see dnsmasq responding to the VM's DHCP requests:

Oct 31 10:24:50 cloudfront2 dnsmasq-dhcp[633]: DHCPDISCOVER(tapdbd89f9a-05) fa:16:3e:29:81:de
Oct 31 10:24:50 cloudfront2 dnsmasq-dhcp[633]: DHCPOFFER(tapdbd89f9a-05) 10.5.5.3 fa:16:3e:29:81:de
Oct 31 10:24:54 cloudfront2 dnsmasq-dhcp[633]: DHCPDISCOVER(tapdbd89f9a-05) fa:16:3e:29:81:de
Oct 31 10:24:54 cloudfront2 dnsmasq-dhcp[633]: DHCPOFFER(tapdbd89f9a-05) 10.5.5.3 fa:16:3e:29:81:de
Oct 31 10:25:00 cloudfront2 dnsmasq-dhcp[633]: DHCPDISCOVER(tapdbd89f9a-05) fa:16:3e:29:81:de
Oct 31 10:25:00 cloudfront2 dnsmasq-dhcp[633]: DHCPOFFER(tapdbd89f9a-05) 10.5.5.3 fa:16:3e:29:81:de
Oct 31 10:25:15 cloudfront2 dnsmasq-dhcp[633]: DHCPDISCOVER(tapdbd89f9a-05) fa:16:3e:29:81:de
Oct 31 10:25:15 cloudfront2 dnsmasq-dhcp[633]: DHCPOFFER(tapdbd89f9a-05) 10.5.5.3 fa:16:3e:29:81:de
Oct 31 10:25:24 cloudfront2 dnsmasq-dhcp[633]: DHCPDISCOVER(tapdbd89f9a-05) fa:16:3e:29:81:de
Oct 31 10:25:24 cloudfront2 dnsmasq-dhcp[633]: DHCPOFFER(tapdbd89f9a-05) 10.5.5.3 fa:16:3e:29:81:de
Oct 31 10:25:34 cloudfront2 dnsmasq-dhcp[633]: DHCPDISCOVER(tapdbd89f9a-05) fa:16:3e:29:81:de
Oct 31 10:25:34 cloudfront2 dnsmasq-dhcp[633]: DHCPOFFER(tapdbd89f9a-05) 10.5.5.3 fa:16:3e:29:81:de
....

This continues for almost 5 minutes.
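Since the offers apparently never reach the VM, I suppose the next step is to check whether they actually leave the controller over the tunnel (a sketch; I am assuming eth4 is the interface carrying the 10.0.0.0/24 GRE traffic):

sudo tcpdump -n -i eth4 'ip proto gre'   # equivalently 'ip proto 47'; do the OFFERs go out?
ps -f -p 633                             # full dnsmasq command line (pid 633 from the syslog above)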

Graham Hemingway (graham-hemingway) said (#2):

I continue to find signs of things happening, but I still can't figure out where it is broken. Here is what I see in the syslog of the compute node:

Oct 31 10:24:21 openstack26 kernel: [83504.382006] ADDRCONF(NETDEV_UP): qvb2e6b03f2-40: link is not ready
Oct 31 10:24:21 openstack26 kernel: [83504.443509] device qvb2e6b03f2-40 entered promiscuous mode
Oct 31 10:24:21 openstack26 kernel: [83504.507032] ADDRCONF(NETDEV_CHANGE): qvb2e6b03f2-40: link becomes ready
Oct 31 10:24:21 openstack26 kernel: [83504.568050] device qvo2e6b03f2-40 entered promiscuous mode
Oct 31 10:24:21 openstack26 kernel: [83504.690703] qbr2e6b03f2-40: port 1(qvb2e6b03f2-40) entering forwarding state
Oct 31 10:24:21 openstack26 kernel: [83504.690714] qbr2e6b03f2-40: port 1(qvb2e6b03f2-40) entering forwarding state
Oct 31 10:24:21 openstack26 ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl -- --may-exist add-port br-int qvo2e6b03f2-40 -- set Interface qvo2e6b03f2-40 external-ids:iface-id=2e6b03f2-40b0-4744-be42-c937f9c8464c external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:29:81:de external-ids:vm-uuid=c44c7f50-923a-4095-b44a-5a3cc101b388
Oct 31 10:24:22 openstack26 kernel: [83504.984384] nbd15: p1
Oct 31 10:24:22 openstack26 kernel: [83505.341795] EXT4-fs (nbd15p1): mounted filesystem with ordered data mode. Opts: (null)
Oct 31 10:24:23 openstack26 ovs-vsctl: 00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl --timeout=2 set Port qvo2e6b03f2-40 tag=1
Oct 31 10:24:24 openstack26 kernel: [83507.325691] block nbd15: NBD_DISCONNECT
Oct 31 10:24:24 openstack26 kernel: [83507.325931] block nbd15: Receive control failed (result -32)
Oct 31 10:24:24 openstack26 kernel: [83507.326454] block nbd15: queue cleared
Oct 31 10:24:25 openstack26 ntpd[2778]: Listen normally on 50 qvb2e6b03f2-40 fe80::5412:49ff:fed6:c1da UDP 123
Oct 31 10:24:25 openstack26 ntpd[2778]: Listen normally on 51 qvo2e6b03f2-40 fe80::1033:63ff:fe43:3517 UDP 123
Oct 31 10:24:25 openstack26 ntpd[2778]: Listen normally on 52 qbr2e6b03f2-40 fe80::d058:2ff:fe1e:ea13 UDP 123
Oct 31 10:24:25 openstack26 ntpd[2778]: peers refreshed
Oct 31 10:24:25 openstack26 ntpd[2778]: new interface(s) found: waking up resolver
Oct 31 10:24:32 openstack26 kernel: [83514.929396] type=1400 audit(1351697072.039:56): apparmor="STATUS" operation="profile_load" name="libvirt-c44c7f50-923a-4095-b44a-5a3cc101b388" pid=18551 comm="apparmor_parser"
Oct 31 10:24:32 openstack26 kernel: [83515.103507] qbr2e6b03f2-40: no IPv6 routers present
Oct 31 10:24:32 openstack26 kernel: [83515.199354] qvo2e6b03f2-40: no IPv6 routers present
Oct 31 10:24:32 openstack26 kernel: [83515.438865] qvb2e6b03f2-40: no IPv6 routers present
Oct 31 10:24:32 openstack26 kernel: [83515.626646] device vnet0 entered promiscuous mode
Oct 31 10:24:32 openstack26 kernel: [83515.724289] qbr2e6b03f2-40: port 2(vnet0) entering forwarding state
Oct 31 10:24:32 openstack26 kernel: [83515.724349] qbr2e6b03f2-40: port 2(vnet0) entering forwarding state
Oct 31 10:24:36 openstack26 ntpd[2778]: Listen normally on 53 vnet0 fe80::fc16:3eff:fe29:81de UDP 123
Oct 31 10:24:36 openstack26 ntpd[2778]: peers refreshed
Oct 31 10:24:36 openstack26 ntpd[2778]: new interface(s) found: waking up resolver
Oct 31 10:24:36 openstack26 kernel: [83519.671576] qbr2e6b03f2-40: port 1(qvb2e6b03f2-40) entering forwarding state
Oct 31 10:24:43 openstack26 kernel: [83526.395983] vnet0: no IPv6 routers present
Oct 31 10:24:47 openstack26 kernel: [83530.724582] qbr2e6b03f2-40: port 2(vnet0) entering forwarding state

It looks like the network interface is getting set up correctly, but traffic just doesn't seem to be reaching it. I will start digging through the libvirt/KVM/QEMU logs next.
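In the meantime, the OpenFlow side on the compute node can also be inspected (a sketch; the exact flow tables vary by agent version):

sudo ovs-ofctl dump-flows br-int   # look for flows tagging/untagging VLAN 1
sudo ovs-ofctl dump-flows br-tun   # look for flows mapping VLAN 1 onto the GRE tunnel keys
sudo ovs-dpctl show                # datapath-level view of the same ports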

Weiwen Chen (wei-wen-chen) said (#3):

Can you list the interfaces and all bridge ports on the L3 agent host? I hit the same problem today. In my case the issue seems to be that I configured multiple routers without namespaces. There is a limitation of one router per agent when namespaces are disabled, and the agent just quietly skips syncing all of the routers.
I think this is a bug; at the very least the agent should warn or stop, since this is a fundamental configuration issue.

I will try configuring the router ID explicitly for the L3 agent and see if that helps.
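To be concrete, something like this on the L3 agent host should show which router ports were actually created (br-ex being the default external bridge name, if you kept it):

ip addr | grep -E 'qr-|qg-'       # router internal (qr-) and gateway (qg-) ports
sudo ovs-vsctl list-ports br-int
sudo ovs-vsctl list-ports br-ex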

yong sheng gong (gongysh) said (#4):

How did you start your DHCP agent? Can you post the DHCP agent log and its configuration (.ini) file?
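For comparison, a minimal Folsom dhcp_agent.ini for a non-namespace setup would look roughly like this (a sketch of the defaults, not necessarily your file):

# /etc/quantum/dhcp_agent.ini (sketch; Folsom, namespaces disabled)
[DEFAULT]
interface_driver = quantum.agent.linux.interface.OVSInterfaceDriver
dhcp_driver = quantum.agent.linux.dhcp.Dnsmasq
use_namespaces = False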

Graham Hemingway (graham-hemingway) said (#5):

Weiwen and Yong, thanks for your ideas. I am not using namespaces (non-overlapping IPs), but I double-checked and it doesn't look like any are in use. I tried looking into namespaces further, but I am not certain how to check this.
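(For what it's worth, standard iproute2 can at least show whether any namespaces exist at all; <namespace> below is a placeholder:)

ip netns list                      # empty output = no namespaces in use
ip netns exec <namespace> ip addr  # inspect one, if any are listed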

Also, I fired up another VM and captured tcpdump output for that interface. Here it is:

root@openstack26:~# tcpdump -i qbr1a174020-40
tcpdump: WARNING: qbr1a174020-40: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qbr1a174020-40, link-type EN10MB (Ethernet), capture size 65535 bytes
09:45:29.081518 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
09:45:29.100641 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:45:29.101809 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:45:29.357593 IP6 :: > ff02::1:ff1a:1777: ICMP6, neighbor solicitation, who has fe80::f816:3eff:fe1a:1777, length 24
09:45:30.101369 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:45:31.101428 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:45:31.833149 IP 0.0.0.0 > all-systems.mcast.net: igmp query v2
09:45:31.833156 IP6 fe80::98cc:27ff:fe74:96e0 > ip6-allnodes: HBH ICMP6, multicast listener querymax resp delay: 1000 addr: ::, length 24
09:45:32.119475 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:45:32.119869 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:45:32.225168 IP6 fe80::98cc:27ff:fe74:96e0 > ff02::1:ff74:96e0: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ff74:96e0, length 24
09:45:32.397791 IP6 fe80::5c2a:aaff:fe95:918b > ff02::1:ff95:918b: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ff95:918b, length 24
09:45:32.485773 IP6 fe80::d8e3:5aff:fe92:b74b > ff02::1:ff92:b74b: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ff92:b74b, length 24
09:45:33.117420 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:45:34.117412 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:45:34.736597 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:45:34.905563 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
09:45:38.852385 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:45:38.852824 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:45:39.849399 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:45:40.849415 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:45:42.725650 IP6 :: > ff02::1:ff1a:1777: HBH ICMP6, multicast listener reportmax resp delay: 0 addr: ff02::1:ff1a:1777, length 24
09:45:48.681233 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:45:48.682317 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:45:49.681398 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:45:50.681425 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:46:08.179926 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:46:08.180989 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:46:09.177419 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:46:10.177447 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:46:25.973599 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:46:25.974809 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:46:26.973420 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:46:27.973451 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:46:38.237306 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:46:38.238438 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:46:39.237391 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:46:40.237356 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:46:58.002958 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:46:58.004052 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:46:59.001409 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:47:00.001450 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:47:06.753525 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:47:06.754649 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:47:07.753378 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:47:08.753368 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:47:16.614573 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:47:16.615719 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:47:17.613396 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:47:18.613425 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:47:29.964148 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from fa:16:3e:1a:17:77 (oui Unknown), length 300
09:47:29.965252 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:47:30.961421 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28
09:47:31.961415 ARP, Request who-has 10.5.5.3 tell 10.5.5.1, length 28

Clearly the DHCP responses are getting back to the interface, but for some reason they are not getting into the VM.
Looking in virsh, I can see that the domain has an interface and is receiving packets:

virsh # dominfo 14
Id: 14
Name: instance-00000014
UUID: cb3c15cf-697f-4bdc-bbba-ed692437ca05
OS Type: hvm
State: running
CPU(s): 1
CPU time: 16.7s
Max memory: 2097152 KiB
Used memory: 2097152 KiB
Persistent: yes
Autostart: disable
Managed save: no
Security model: apparmor
Security DOI: 0
Security label: libvirt-cb3c15cf-697f-4bdc-bbba-ed692437ca05 (enforcing)

virsh # domiflist 14
Interface  Type    Source          Model   MAC
-------------------------------------------------------
vnet0      bridge  qbr1a174020-40  virtio  fa:16:3e:1a:17:77

virsh # domifstat 14 vnet0
vnet0 rx_bytes 13560
vnet0 rx_packets 122
vnet0 rx_errs 0
vnet0 rx_drop 0
vnet0 tx_bytes 9580
vnet0 tx_packets 34
vnet0 tx_errs 0
vnet0 tx_drop 0

Why are the packets not getting back into the actual VM?

Graham Hemingway (graham-hemingway) said (#6):

I now see that this could be the same problem as https://answers.launchpad.net/quantum/+question/211021. Looking at my routes, I see:

default via 99.59.104.1 dev qg-21f69e18-c0
10.0.0.0/24 dev eth4 proto kernel scope link src 10.0.0.3
10.5.5.0/24 dev qr-c89b0922-f7 proto kernel scope link src 10.5.5.1
10.5.5.0/24 dev tapdbd89f9a-05 proto kernel scope link src 10.5.5.2
99.59.104.0/23 dev qg-21f69e18-c0 proto kernel scope link src 99.59.105.185
192.168.49.0/24 dev eth3 proto kernel scope link src 192.168.49.244
192.168.50.0/24 dev eth2 proto kernel scope link src 192.168.50.244

There are two routes for the VM subnet (10.5.5.0/24): one goes to the router's interface on the tenant network (qr-c89b0922-f7), the other to the DHCP service port (tapdbd89f9a-05). A tcpdump on the cloud controller, where the dhcp-agent runs, shows that nothing ever comes out on the tapdbd89f9a-05 interface.
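For reference, the capture on the controller was along these lines (nothing ever appeared):

sudo tcpdump -n -i tapdbd89f9a-05 'port 67 or port 68'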

Is this the problem? If so, how do I fix it?

Kyle Brandt (kyle-kbrandt) said (#7):

It seems I am seeing similar behavior. In my case, I am attempting to use nova.network.manager.FlatDHCPManager with nova-network.

I see the DHCP offers being made, and the counters increase in virsh as you describe. If I manually go into the instance and set the IP, network traffic flows (using the cirros image). But the DHCP offers seem to be ignored.
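For the record, the manual workaround inside the cirros guest is roughly this (the address and gateway are whatever was allocated to the instance; the values below match Graham's subnet, not mine):

sudo ifconfig eth0 10.5.5.3 netmask 255.255.255.0 up
sudo route add default gw 10.5.5.1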

Launchpad Janitor (janitor) said (#9):

This question was expired because it remained in the 'Open' state without activity for the last 15 days.

Thiago Martins (martinx) said (#10):

Hi!

I followed this guide more than 10 times: http://docs.openstack.org/folsom/basic-install/content/basic-install_intro.html

I always hit this problem:

----------
cloud-init start-local running: Thu, 03 Jan 2013 04:20:52 +0000. up 8.53 seconds
no instance data found in start-local
cloud-init-nonet waiting 120 seconds for a network device.
cloud-init-nonet gave up waiting for a network device.
ci-info: lo : 1 127.0.0.1 255.0.0.0 .
ci-info: eth0 : 1 . . fa:16:3e:83:06:4f
route_info failed
Waiting for network configuration...
Waiting up to 60 more seconds for network configuration...
Booting system without full network configuration...
----------

So there is no network connectivity within my instance.

How can I debug it?
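Based on the comments above, I would probably start with something like this (a sketch; <dhcp-tap> is whatever tap interface "ip a" shows for the DHCP port):

ps aux | grep dnsmasq                               # is the DHCP agent's dnsmasq running?
sudo tcpdump -n -i <dhcp-tap> 'port 67 or port 68'  # do DISCOVER/OFFER appear?
sudo ovs-vsctl show                                 # bridges, patch ports, GRE tunnels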

Thanks