Ubuntu 12.10 + Folsom + Quantum + OVS + GRE Problems

Asked by Joshua Dotson

Hi,

I am having OVS issues with my 3-node (control, network, compute) deployment (Ubuntu 12.10 + Folsom + Quantum + OVS + GRE). From what I can tell, my VMs are given a vnet# interface on the compute node by OVS, but they are never able to reach the network node for DHCP, etc.

I ran a tcpdump last night, which showed a new VM repeatedly trying to get an answer from dnsmasq, which I confirmed is listening on UDP port 67 on the network node.

I'm most concerned by the fact that 'ip netns list' returns nothing on any of the three nodes (running the command as 'root').
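(Whether an empty 'ip netns list' is actually a problem depends on configuration: on Folsom, the L3 and DHCP agents only create network namespaces when use_namespaces is enabled in their .ini files and the kernel/iproute2 build supports namespaces. A quick check, as a sketch, with config paths assuming the stock Ubuntu packages:)

```shell
# Probe kernel/iproute2 namespace support (run as root).
ip netns add netns-probe && echo "netns support OK"
ip netns delete netns-probe

# Check whether the agents are even configured to create namespaces;
# if use_namespaces is false, an empty 'ip netns list' is expected.
grep -i use_namespaces /etc/quantum/l3_agent.ini /etc/quantum/dhcp_agent.ini
```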

It's beginning to look like I should just start over from scratch. I dropped the Quantum database last night and recreated it, hoping to pull everything back together, but nothing I do seems to fix this issue.

Here is the guide I'm debugging/following. It's based on skible's stable/GRE guide.

https://github.com/josh-wrale/OpenStack-Folsom-Install-guide/blob/master/OpenStack_Folsom_Install_Guide_WebVersion.rst

Here's a paste dump of the situation:

http://pastebin.com/raw.php?i=NF34hqMX

Thank you very much for your assistance.

-Joshua

Question information

Language: English
Status: Solved
For: neutron
Assignee: No assignee
Solved by: yong sheng gong
Revision history for this message
Joshua Dotson (tns9) said :
#1

Here is a tcpdump of vnet0 on the compute node, when its only VM is rebooted.

http://pastebin.com/3K1fe5Dd

Joshua Dotson (tns9) said :
#2

Below, I'm including the full log of the VM booting. Please note that after seeing the "eth0: IPv6 duplicate address" error in this log, I disabled IPv6 in sysctl.conf on all three machines and rebooted. The log below is from a hard reboot of the VM, taken after rebooting the compute node (hypervisor), control node, and network node to disable IPv6.

http://pastebin.com/raw.php?i=5PVfUiFw

yong sheng gong (gongysh) said :
#3

It seems you are not running quantum-openvswitch-agent correctly on your network node.
Are you sure you are using the same ovs_quantum_plugin.ini file on the network node as on the compute node?

The network node should have br-tun too, and there should be a GRE port on br-tun on both the compute node and the network node.
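A quick way to verify that on each node (a sketch; bridge names as used in the guide):

```shell
# Run on both the network node and the compute node.
for br in br-int br-tun; do
    ovs-vsctl br-exists "$br" && echo "$br present" || echo "$br MISSING"
done

# br-tun should carry a patch-int port plus one gre-<n> port per GRE peer.
ovs-vsctl list-ports br-tun
```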

yong sheng gong (gongysh) said :
#4

Can you post the output of 'quantum net-show' for example-net and ext_net?

Joshua Dotson (tns9) said :
#5

Here is the output of the net-show commands:

http://pastebin.com/raw.php?i=fZw68Rfx

It seems that br-tun should be added automatically, because I configured it in the .ini files. Is br-tun a bridge that needs to be manually added using ovs-vsctl?

Here is the /etc/quantum/plugins/openvswitch/ovs_quantum_plugin.ini from the network node:

http://pastebin.com/K41uHUyv

And the compute node:

http://pastebin.com/DJvEhtj4

Thanks,
Joshua

yong sheng gong (gongysh) said :
#6

Can you try running

quantum-openvswitch-agent --config-file /etc/quantum/quantum.conf --config-file /etc/quantum/plugins/openvswitch/ovs_quantum_plugin.ini --debug true

on your network node? Stop the currently running agent first, and then paste the log.

Joshua Dotson (tns9) said :
#7

Here is the output of quantum-openvswitch-agent:

http://pastebin.com/D22zhiEY

That command appears to have updated the state of OVS:

root@knet-hj29:/etc/init.d# ovs-vsctl show
a4de515d-6d78-45b8-870f-abcd97509190
    Bridge br-tun
        Port br-tun
            Interface br-tun
                type: internal
        Port "gre-1"
            Interface "gre-1"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="172.20.10.53"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    Bridge br-ex
        Port "eth2"
            Interface "eth2"
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-int
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
    ovs_version: "1.4.3"
root@knet-hj29:/etc/init.d#

Here is the agent run on the compute node:

http://pastebin.com/aRhgQeHR

And ovs-vsctl show on the compute node, from after the agent:

root@khyp-c49x:/etc/init.d# service quantum-plugin-openvswitch-agent start
quantum-plugin-openvswitch-agent start/running, process 9967
root@khyp-c49x:/etc/init.d# service quantum-plugin-openvswitch-agent stop
stop: Unknown instance:
root@khyp-c49x:/etc/init.d# service quantum-plugin-openvswitch-agent start
quantum-plugin-openvswitch-agent start/running, process 10096
root@khyp-c49x:/etc/init.d# ovs-vsctl show
801aa35e-5de6-483d-a692-6555ea348f87
    Bridge br-int
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port br-int
            Interface br-int
                type: internal
        Port "qvoe4bc93cc-d5"
            tag: 1
            Interface "qvoe4bc93cc-d5"
    Bridge br-tun
        Port "gre-2"
            Interface "gre-2"
                type: gre
                options: {in_key=flow, out_key=flow, remote_ip="172.20.10.52"}
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    ovs_version: "1.4.3"
root@khyp-c49x:/etc/init.d#

I just tested another VM. It is still unable to reach DHCP. It seems like something is making the agents crash. Is the following behavior normal? It seems strange to see the agents on both the compute and network nodes exiting with status 1 in dmesg.

(tail of dmesg on compute node)
[36275.758262] block nbd15: queue cleared
[36277.235971] type=1400 audit(1355061400.942:23): apparmor="STATUS" operation="profile_load" name="libvirt-3c597e88-1a5a-4147-8217-43f447c240d2" pid=10822 comm="apparmor_parser"
[36277.349062] device vnet0 entered promiscuous mode
[36277.355463] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
[36277.355488] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
[36278.958207] kvm: 10992: cpu0 unhandled rdmsr: 0xc0010112
[36289.833629] qbrfe616ec2-11: port 1(qvbfe616ec2-11) entered forwarding state
[36292.390485] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
root@khyp-c49x:/etc/init.d# service quantum-plugin-openvswitch-agent stop
stop: Unknown instance:
root@khyp-c49x:/etc/init.d# service quantum-plugin-openvswitch-agent start
quantum-plugin-openvswitch-agent start/running, process 11341
root@khyp-c49x:/etc/init.d# dmesg|tail
[36275.757391] block nbd15: Unexpected reply (ffff883fd06c5c48)
[36275.758262] block nbd15: queue cleared
[36277.235971] type=1400 audit(1355061400.942:23): apparmor="STATUS" operation="profile_load" name="libvirt-3c597e88-1a5a-4147-8217-43f447c240d2" pid=10822 comm="apparmor_parser"
[36277.349062] device vnet0 entered promiscuous mode
[36277.355463] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
[36277.355488] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
[36278.958207] kvm: 10992: cpu0 unhandled rdmsr: 0xc0010112
[36289.833629] qbrfe616ec2-11: port 1(qvbfe616ec2-11) entered forwarding state
[36292.390485] qbrfe616ec2-11: port 2(vnet0) entered forwarding state
[36407.439814] init: quantum-plugin-openvswitch-agent main process (11341) terminated with status 1
root@khyp-c49x:/etc/init.d#

And also this (on network node):

root@knet-hj29:/etc/init.d# dmesg|tail
[ 129.254818] type=1400 audit(1355022858.851:26): apparmor="STATUS" operation="profile_load" name="/usr/sbin/tcpdump" pid=1485 comm="apparmor_parser"
[ 132.682595] openvswitch: Open vSwitch switching datapath 1.4.3, built Dec 8 2012 22:05:31
[ 132.733944] init: quantum-dhcp-agent main process (1532) terminated with status 1
[ 132.812775] init: quantum-plugin-openvswitch-agent main process (1533) terminated with status 1
[ 133.539604] device br-int entered promiscuous mode
[ 133.540034] device br-ex entered promiscuous mode
[ 134.120299] init: quantum-l3-agent main process (1534) terminated with status 1
[ 2225.210200] init: quantum-plugin-openvswitch-agent main process (2224) terminated with status 1
[37180.858158] device br-tun entered promiscuous mode
[37412.770128] init: quantum-plugin-openvswitch-agent main process (4103) terminated with status 1
root@knet-hj29:/etc/init.d# service quantum-plugin-openvswitch-agent stop
stop: Unknown instance:
root@knet-hj29:/etc/init.d# service quantum-plugin-openvswitch-agent start
quantum-plugin-openvswitch-agent start/running, process 4358
root@knet-hj29:/etc/init.d# dmesg|tail
[ 132.682595] openvswitch: Open vSwitch switching datapath 1.4.3, built Dec 8 2012 22:05:31
[ 132.733944] init: quantum-dhcp-agent main process (1532) terminated with status 1
[ 132.812775] init: quantum-plugin-openvswitch-agent main process (1533) terminated with status 1
[ 133.539604] device br-int entered promiscuous mode
[ 133.540034] device br-ex entered promiscuous mode
[ 134.120299] init: quantum-l3-agent main process (1534) terminated with status 1
[ 2225.210200] init: quantum-plugin-openvswitch-agent main process (2224) terminated with status 1
[37180.858158] device br-tun entered promiscuous mode
[37412.770128] init: quantum-plugin-openvswitch-agent main process (4103) terminated with status 1
[39008.879120] init: quantum-plugin-openvswitch-agent main process (4358) terminated with status 1
root@knet-hj29:/etc/init.d# service quantum-plugin-openvswitch-agent stop
stop: Unknown instance:
root@knet-hj29:/etc/init.d#

Thanks,
Joshua

Best yong sheng gong (gongysh) said :
#8

Try running the agents on both the network and compute nodes directly from the command line.
It seems your DHCP agent is not running either. You should start it on your network node too:

sudo quantum-dhcp-agent --config-file /etc/quantum/quantum.conf --config-file /etc/quantum/dhcp_agent.ini --debug true

By the way, I don't know why starting the agents via the service scripts does not work.
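(One place the service failure usually shows up on Ubuntu is the per-job Upstart log; the agent's own log file is a second place to look. A sketch; exact file names may vary with packaging:)

```shell
# Upstart captures each job's stdout/stderr under /var/log/upstart,
# so the traceback behind "terminated with status 1" should be here:
tail -n 50 /var/log/upstart/quantum-plugin-openvswitch-agent.log

# The agent's own log, if file logging is configured:
tail -n 50 /var/log/quantum/*.log
```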

Joshua Dotson (tns9) said :
#9

Your hint led me to resolve a major issue. While trying to debug the network and compute nodes, I had purged OVS/Quantum, backed up /var/log, and run 'rm -rf /var/log/*'. That was a bad idea: the /var/log/quantum and /var/log/upstart directories are not recreated automatically if they are missing, and their absence was keeping the quantum services from starting. I recreated them, including permissions, following the control node's example. All of the agents appear to be started now.
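For anyone who hits the same thing, recreating the directories looks roughly like this (a sketch; the quantum:quantum ownership mirrors what the packages set up, and the guard skips the chown if no quantum user exists on the machine):

```shell
#!/bin/sh
# Recreate the log directories the Quantum upstart jobs expect (run as root).
mkdir -p /var/log/quantum /var/log/upstart

# The quantum user must be able to write its log directory.
if id quantum >/dev/null 2>&1; then
    chown quantum:quantum /var/log/quantum
fi

ls -ld /var/log/quantum /var/log/upstart
```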

However, my VM is still not able to get a DHCP address.

The following tcpdump comes from the network node, where the l3-agent and dhcp-agent live. It confirms that the DHCP request is arriving on the 'tap' of the network node.

http://pastebin.com/xbUVtVjR

I'm now working on the next layer of the onion. :-)

Thanks,
Joshua

yong sheng gong (gongysh) said :
#10

Don't forget that your network node also needs to run the L2 agent, quantum-openvswitch-agent.
Please run 'ovs-vsctl show' on both the network and compute nodes once you are certain all the agents are running well.

Joshua Dotson (tns9) said :
#11

All agents appear to be running well at this time, though my problem persists.

Here are the outputs from 'ovs-vsctl show':

http://pastebin.com/NU2kcX1K

Here is something interesting: per /var/log/syslog on the network node, dnsmasq is answering each of the three DHCPDISCOVERs with a DHCPOFFER. It seems the return path to the compute node is the issue.

root@knet-hj29:/var/log# grep dhcp syslog
Dec 10 10:27:08 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/quantum/dhcp/cea4c3cf-a226-43d9-b7c3-a6fc04a636af/host
Dec 10 10:27:08 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/quantum/dhcp/cea4c3cf-a226-43d9-b7c3-a6fc04a636af/opts
Dec 10 10:27:42 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/quantum/dhcp/cea4c3cf-a226-43d9-b7c3-a6fc04a636af/host
Dec 10 10:27:42 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/quantum/dhcp/cea4c3cf-a226-43d9-b7c3-a6fc04a636af/opts
Dec 10 10:27:42 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/quantum/dhcp/cea4c3cf-a226-43d9-b7c3-a6fc04a636af/host
Dec 10 10:27:42 knet-hj29 dnsmasq-dhcp[6186]: read /var/lib/quantum/dhcp/cea4c3cf-a226-43d9-b7c3-a6fc04a636af/opts
Dec 10 10:27:50 knet-hj29 dnsmasq-dhcp[6186]: DHCPDISCOVER(tap3e2fb05e-53) fa:16:3e:20:de:56
Dec 10 10:27:50 knet-hj29 dnsmasq-dhcp[6186]: DHCPOFFER(tap3e2fb05e-53) 10.5.5.4 fa:16:3e:20:de:56
Dec 10 10:27:53 knet-hj29 dnsmasq-dhcp[6186]: DHCPDISCOVER(tap3e2fb05e-53) fa:16:3e:20:de:56
Dec 10 10:27:53 knet-hj29 dnsmasq-dhcp[6186]: DHCPOFFER(tap3e2fb05e-53) 10.5.5.4 fa:16:3e:20:de:56
Dec 10 10:27:56 knet-hj29 dnsmasq-dhcp[6186]: DHCPDISCOVER(tap3e2fb05e-53) fa:16:3e:20:de:56
Dec 10 10:27:56 knet-hj29 dnsmasq-dhcp[6186]: DHCPOFFER(tap3e2fb05e-53) 10.5.5.4 fa:16:3e:20:de:56
root@knet-hj29:/var/log#
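If the OFFERs are handed to OVS but never make it back through the tunnel, the flow tables on br-tun would be the next thing to inspect (a sketch; bridge names as in the ovs-vsctl output above, with zero packet counters on the tunnel-bound flows pointing at where traffic dies):

```shell
# On both nodes: per-flow packet counters on the tunnel bridge.
ovs-ofctl dump-flows br-tun

# Kernel datapath view of the same bridges and ports.
ovs-dpctl show
```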

Thanks again,
Joshua

Joshua Dotson (tns9) said :
#12

This may be of help, too. It's another tcpdump from the network node, this time showing only GRE traffic on eth1:

http://pastebin.com/raw.php?i=pqmXuJdE

Is it possible that this is a routing issue?

NETWORK:

root@knet-hj29:~# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.5.1     0.0.0.0         UG        0 0          0 br-ex
10.20.10.0      0.0.0.0         255.255.255.0   U         0 0          0 eth0
192.168.5.0     0.0.0.0         255.255.255.0   U         0 0          0 br-ex
172.20.10.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
root@knet-hj29:~#

COMPUTE:

root@khyp-c49x:/var/log# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         172.20.10.52    0.0.0.0         UG        0 0          0 eth1
10.20.10.0      0.0.0.0         255.255.255.0   U         0 0          0 eth0
172.20.10.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
root@khyp-c49x:/var/log#

Here is iptables output for both hosts:

http://pastebin.com/raw.php?i=qcym4jzf
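Since GRE rides directly on IP as protocol 47, two quick sanity checks on the 172.20.10.0/24 tunnel network would be plain reachability and the filter rules (a sketch; addresses taken from the routing tables above):

```shell
# From the compute node, reach the network node over the tunnel NIC:
ping -c 3 -I eth1 172.20.10.52

# Look for any filter rule matching or dropping IP protocol 47 (gre):
iptables -S | grep -i gre
```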

Thanks,
Joshua

Joshua Dotson (tns9) said :
#13

So, strangely, when I do a tcpdump on the network node's tap and q* interfaces, this happens:

root@knet-hj29:/home/boss# ifconfig -a
br-ex Link encap:Ethernet HWaddr 00:10:18:c8:b0:08
          inet addr:192.168.5.108 Bcast:192.168.5.255 Mask:255.255.255.0
          inet6 addr: fe80::210:18ff:fec8:b008/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:308 errors:0 dropped:0 overruns:0 frame:0
          TX packets:220 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:40832 (40.8 KB) TX bytes:24567 (24.5 KB)

br-int Link encap:Ethernet HWaddr 82:ae:7d:d6:e6:4e
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

br-tun Link encap:Ethernet HWaddr f2:fb:a3:b5:30:41
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

eth0 Link encap:Ethernet HWaddr 00:24:e8:2e:80:d3
          inet addr:10.20.10.52 Bcast:10.20.10.255 Mask:255.255.255.0
          inet6 addr: fe80::224:e8ff:fe2e:80d3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:273 errors:0 dropped:0 overruns:0 frame:0
          TX packets:265 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:62270 (62.2 KB) TX bytes:45001 (45.0 KB)
          Interrupt:33

eth1 Link encap:Ethernet HWaddr 00:10:18:c8:b0:0a
          inet addr:172.20.10.52 Bcast:172.20.10.255 Mask:255.255.255.0
          inet6 addr: fe80::210:18ff:fec8:b00a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:492 (492.0 B)

eth2 Link encap:Ethernet HWaddr 00:10:18:c8:b0:08
          inet6 addr: fe80::210:18ff:fec8:b008/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
          RX packets:462 errors:0 dropped:71 overruns:0 frame:0
          TX packets:222 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:56920 (56.9 KB) TX bytes:25521 (25.5 KB)

eth3 Link encap:Ethernet HWaddr 00:24:e8:2e:80:d4
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
          Interrupt:37

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:348 errors:0 dropped:0 overruns:0 frame:0
          TX packets:348 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:30488 (30.4 KB) TX bytes:30488 (30.4 KB)

qg-a0f57edd-6c Link encap:Ethernet HWaddr ae:a3:6f:5f:fc:b9
          inet addr:192.168.5.110 Bcast:192.168.5.255 Mask:255.255.255.0
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:34 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4162 (4.1 KB) TX bytes:0 (0.0 B)

qr-443b6d3d-71 Link encap:Ethernet HWaddr 5a:be:bf:27:05:f5
          inet addr:10.5.5.1 Bcast:10.5.5.255 Mask:255.255.255.0
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

tap3e2fb05e-53 Link encap:Ethernet HWaddr 26:0a:0b:32:2e:ef
          inet addr:10.5.5.3 Bcast:10.5.5.255 Mask:255.255.255.0
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

root@knet-hj29:/home/boss# tcpdump -i tap3e2fb05e-53
tcpdump: tap3e2fb05e-53: That device is not up
root@knet-hj29:/home/boss# tcpdump -i qr-443b6d3d-71
tcpdump: qr-443b6d3d-71: That device is not up
root@knet-hj29:/home/boss# tcpdump -i qg-a0f57edd-6c
tcpdump: qg-a0f57edd-6c: That device is not up
root@knet-hj29:/home/boss#

It seems to me that Quantum should be bringing these interfaces up. Do you know why it isn't?
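If nothing else, bringing them up by hand should at least let tcpdump attach, which would narrow things down (a sketch; the interface names are the ones from the ifconfig output above):

```shell
# Bring up the router and DHCP ports manually (run as root).
for dev in tap3e2fb05e-53 qr-443b6d3d-71 qg-a0f57edd-6c; do
    ip link set "$dev" up
done

# Confirm the state change.
ip -o link show | grep -E 'tap3|qr-|qg-'
```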

Thanks,
Joshua

Joshua Dotson (tns9) said :
#14

I'm rebuilding from scratch, so this can be closed. Thanks for your help! -Joshua

Joshua Dotson (tns9) said :
#15

Thanks yong sheng gong, that solved my question.

Adel (adelgacem) said :
#16

Hi,

Did you rebuild?
Does it work?
If yes, could you provide the following from both the compute and network nodes:

# route -n
# ifconfig
# dpkg -l | grep openv

Thanks for your help.