Essex - Quantum - OVS - Multi-Node Architecture -> Working Partially !

Asked by Emilien Macchi

Hi Stackers,

I will be as precise as possible.

I'm working in a multi-node architecture with Ubuntu 12.04 / Essex up to date.

My architecture is clean and everything was working in VLAN-Manager mode. I have now switched to Quantum Manager.

My reference docs:

http://docs.openstack.org/trunk/openstack-network/admin/content/index.html
http://openvswitch.org/openstack/documentation/

- Node 1 : Controller

MySQL, RabbitMQ, nova-volume, nova-api, nova-network, nova-scheduler, quantum-server with OVS plugin

nova.conf : http://paste.openstack.org/show/18401/

ovs-vsctl add-br br-int
ovs-vsctl add-port br-int eth1

ovs-vsctl br-set-external-id br-int bridge br-int (useful?)
[Edit: I've rebuilt my bridge without this command]

I use Quantum's default mode (without tunneling).

nova-manage network create --label=public --fixed_range_v4=192.168.15.0/24

dnsmasq is running fine on the controller node (which is also the nova-network host).

/etc/network/interfaces with eth1 :

[..]

iface eth1 inet manual
        up ifconfig $IFACE 0.0.0.0 up
        up ip link set $IFACE promisc on
        down ip link set $IFACE promisc off
        down ifconfig $IFACE down

- Node 2 : Compute1 and Node 3 Compute2 :

nova.conf -> same as controller

nova-compute.conf -> http://paste.openstack.org/show/18403/

----------------------------------------------------------------------------------------------------

Here is what I've observed:

- When I create an instance, it does not get an IP address from dnsmasq. After many hours of looking for the cause, I can see I'm not alone in this situation. I did not find anyone in the OpenStack community with Essex + Quantum + OVS working in a multi-node architecture!
That's why I'm investigating as best I can, and I think I have localized the issue.

- On the compute node :

root@compute1:~# ovs-vsctl show
    Bridge br-int
        Port "eth1"
            Interface "eth1"
        Port br-int
            Interface br-int
                type: internal
        Port "tap771bf804-eb"
            tag: 4095
            Interface "tap771bf804-eb"
    ovs_version: "1.4.0+build0"

My first question :

Why does the TAP interface (the VM's vNIC) have tag 4095?

What I found :

If I delete the TAP port after VM creation and re-add it, my VM gets an IP!

ovs-vsctl del-port tap771bf804-eb
ovs-vsctl add-port br-int tap771bf804-eb

After that, when my VM requests an IP, it gets one.

I know that's not clean, but I'm trying to find what's wrong with the OVS plugin in https://github.com/openstack/quantum/blob/master/quantum/plugins/openvswitch/agent/ovs_quantum_agent.py

Maybe an issue with:

self.int_br.add_flow(priority=2,
                     in_port=p.ofport,
                     actions="drop")

?

- Another problem: with this workaround I can connect to the VM, but only from my controller (which is also the nova-network host), not from any other host. My VM also has no Internet access.

Second question :

What's wrong with iptables? My security groups allow SSH + ICMP.

I think I have isolated the issue, but now we have to debug it and understand what's wrong with OVS + Quantum in a multi-node architecture.

Thanks for your help. Please let me know if something is wrong in my configuration, if you have any ideas, or if you need log files or anything else.

I'm continuing my investigation.

Question information

Language: English
Status: Solved
For: neutron
Assignee: No assignee
Solved by: Emilien Macchi
Revision history for this message
Emilien Macchi (emilienm) said :
#1

Here are the agent logs from when I create a VM:

DEBUG:root:## running command: sudo ovs-vsctl --timeout=2 list-ports br-int
DEBUG:root:## running command: sudo ovs-vsctl --timeout=2 get Interface eth1 external_ids
DEBUG:root:## running command: sudo ovs-vsctl --timeout=2 get Interface eth1 ofport
DEBUG:root:## running command: sudo ovs-vsctl --timeout=2 get Interface tap2d31bc15-08 external_ids
DEBUG:root:## running command: sudo ovs-vsctl --timeout=2 get Interface tap2d31bc15-08 ofport
DEBUG:root:## running command: sudo ovs-vsctl --timeout=2 set Port tap2d31bc15-08 tag=4095
DEBUG:root:## running command: sudo ovs-ofctl add-flow br-int priority=2,in_port=2,actions=drop

Does anyone know whether actions=drop is normal?

Emilien Macchi (emilienm) said :
#2

After more investigation with Pedro Navarro Pérez, we found why we get "actions=drop": my compute nodes were not able to access the database (my fault).

But my VMs still do not have network access.

In the ovs_quantum database, I can see the ports (my VM's TAP and the gateway).
The port of my gateway has state "ACTIVE" but op_status "DOWN". Is that normal?

What can I do?

I tried several times to delete and recreate the networks, but the situation is always the same.

I'm continuing to investigate.

Emilien Macchi (emilienm) said :
#3

The port of my gateway had state "ACTIVE" but op_status "DOWN" because I was not running the OVS agent on the controller (which is also the nova-network host).

So now I run the agent on the controller, and my gateway is ACTIVE and UP.

If I read http://openvswitch.org/openstack/documentation correctly, it says that the agent must be run on the compute nodes, or maybe I misunderstood :-).

My VMs still do not have network access. Coming soon, I hope!

dan wendlandt (danwent) said :
#4

Hi Emilien,

You are correct, the OVS doc page was missing the comment that you need to create an integration bridge and run ovs_quantum_agent.py on the nova-network node. Sorry about that. I've updated the page. Thanks for letting us know!

If you run into issues in the future and want to see a working setup, you should be able to use the instructions for multi-node devstack, as those are used pretty regularly and should be up-to-date: http://wiki.openstack.org/QuantumDevstack

Here are a couple of suggestions for debugging:

- Look at the tables in the ovs_quantum database. There should be tables for networks, ports, and vlan_bindings. The vlan_bindings table records which VLAN ID has been allocated to each network. When a device appears on br-int, ovs_quantum_agent.py reads the external-ids:iface-id attribute of that device from the ovsdb Interface table (this is the per-host database that comes with OVS, not the OVS plugin database), then queries the centralized OVS plugin database to find the associated port, network, and VLAN. It then sets the VLAN tag of the port to the VLAN from vlan_bindings. If you see a port with a VLAN of 4095, it means we were unable to find a Quantum port associated with that external-ids:iface-id value.
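
The lookup described above can be sketched roughly in Python. This is a simplified illustration, not the actual ovs_quantum_agent.py code: the two dictionaries stand in for the ports and vlan_bindings tables, and the names resolve_vlan_tag and DEAD_VLAN_TAG are made up for this example. Falling back to 4095 when a network has no binding is also an assumption of the sketch.

```python
# Simplified model of how the agent picks a VLAN tag for a device on br-int.
# Illustrative only: names and data structures are not taken from the agent.

DEAD_VLAN_TAG = 4095  # tag seen in this thread when no Quantum port matched


def resolve_vlan_tag(iface_id, ports, vlan_bindings):
    """Return the VLAN tag for an interface, or 4095 if it is unknown.

    iface_id      -- value of external-ids:iface-id read from the local ovsdb
    ports         -- dict: port id -> network id (stand-in for the ports table)
    vlan_bindings -- dict: network id -> VLAN id (stand-in for vlan_bindings)
    """
    network_id = ports.get(iface_id)
    if network_id is None:
        # No Quantum port matches this interface.
        return DEAD_VLAN_TAG
    # Assumption of this sketch: a network without a binding is also "dead".
    return vlan_bindings.get(network_id, DEAD_VLAN_TAG)
```

With ports = {"port-a": "net-1"} and vlan_bindings = {"net-1": 7}, a lookup for "port-a" yields 7, while an interface id that is missing from the ports table yields 4095. That second case is what happens when the agent cannot reach or match the plugin database.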

Emilien Macchi (emilienm) said :
#5

@Dan

As usual, thanks for the support.

- For the doc, don't worry, I'm glad my work can be useful to everybody.

- I like DevStack, but I want to build my architecture from scratch, with the goal of becoming a Jedi in Quantum :-) (and in OpenStack too). I always use DevStack as a reference when I need to confirm a configuration detail or something else. In my case, everything looks the same as in DevStack, except that I don't use Melange (should I use it?) and I did not run "ovs-vsctl --no-wait br-set-external-id $OVS_BRIDGE bridge-id br-int" (is it useful? Can you explain why DevStack uses it?)

- For debugging, of course I use the databases, and tomorrow I will continue to post news about my work here.

Thanks again, Dan!

dan wendlandt (danwent) said :
#6

Hi Emilien,

"ovs-vsctl --no-wait br-set-external-id $OVS_BRIDGE bridge-id br-int" is required for use with the NVP plugin, but not currently required for correct operation with the OVS plugin. For simplicity we try to keep a single set of instructions for setting up br-int whether it's being used with the OVS plugin or the NVP plugin, which is why it's included there.

Emilien Macchi (emilienm) said :
#7

Dan,

- Today I set up DevStack and compared everything: I have exactly the same configuration.

- When I dump the traffic I can see something strange:

  * tape3f5c3b2-c5 sends a DHCP DISCOVER

  * Every eth1 (connected to br-int) sees this DHCP packet

  * gw-385e6c3d-18 does not see the DHCP packet.

On my controller / nova-network / Quantum-Server :

    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
        Port "eth1"
            Interface "eth1"
        Port "gw-385e6c3d-18"
            tag: 1
            Interface "gw-385e6c3d-18"
                type: internal

On my compute node :

    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
        Port "eth1"
            Interface "eth1"
        Port "tape3f5c3b2-c5"
            tag: 1
            Interface "tape3f5c3b2-c5"

DNSMASQ is running on controller -> http://paste.openstack.org/show/18447/

And on the controller I can see my VM IP Configuration in /var/lib/nova/networks/nova-gw-385e6c3d-18.conf :
fa:16:3e:72:48:2d,host-192.168.22.3.novalocal,192.168.22.3

In my ovs_quantum database, all the ports are ACTIVE / UP (gateway + TAP).

So... what's wrong?

Is the nova.conf of my controller correct? -> http://paste.openstack.org/show/18448/

nova-compute.conf of my compute nodes: http://paste.openstack.org/show/18449/

Thanks

dan wendlandt (danwent) said :
#8

Can you post the output of running the following two commands *while DHCP requests are being made*?

ovs-ofctl dump-flows br-int
ovs-dpctl dump-flows br-int

Emilien Macchi (emilienm) said :
#9
Emilien Macchi (emilienm) said :
#10

Two more clarifications:

- In my last message, please ignore 10.15.10.0/24 and the public IP.

- I connect to the VM by VNC from the dashboard and configure the network manually. Then I ping my gateway.

From the compute node that hosts the VM:

root@compute1:~# tcpdump -i tapc1599451-b2
16:16:00.786189 ARP, Request who-has 192.168.22.1 tell 192.168.22.6, length 28
...

From my controller :
root@controller:~# tcpdump -i gw-385e6c3d-18
nothing...

Strange, because my gateway is connected to br-int:

    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
        Port "eth1"
            Interface "eth1"
        Port "gw-385e6c3d-18"
            tag: 1
            Interface "gw-385e6c3d-18"
                type: internal

Emilien Macchi (emilienm) said :
#11

I can also see the ARP request on br-int!

ovs-ofctl dump-flows br-int :

in_port(1),eth(src=fa:16:3e:11:44:d3,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.22.6,tip=192.168.22.1,op=1,sha=fa:16:3e:11:44:d3,tha=00:00:00:00:00:00), packets:409, bytes:24540, used:0.732s, actions:0

So the problem is between gw-* and br-int. What more can I do?

dan wendlandt (danwent) said :
#12

Can you run "ovs-dpctl show" as well as "ovs-ofctl dump-flows br-int"?

Emilien Macchi (emilienm) said :
#13

root@controller:~# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=954.119s, table=0, n_packets=5498, n_bytes=575432, priority=0 actions=NORMAL
root@controller:~# ovs-dpctl show
system@br-int:
        lookups: hit:42822 missed:2962 lost:0
        flows: 11
        port 0: br-int (internal)
        port 1: eth1
        port 2: gw-385e6c3d-18 (internal)

dan wendlandt (danwent) said :
#14

I assume the output in response #11 is from "ovs-dpctl", not "ovs-ofctl", is that correct?

If so, the odd thing here is that the packet arriving on br-int does not seem to have a VLAN tag. The output above shows that "gw-385e6c3d-18" has tag=1, meaning that it will only receive packets that arrive on br-int with a VLAN tag = 1. When you see the packets arriving on eth1 (which is connected to br-int) are you seeing them tagged with VLAN 1?

If you don't see traffic arriving at the controller host NIC with VLAN tag 1, can you run "ovs-dpctl show", "ovs-dpctl dump-flows br-int" and "ovs-ofctl dump-flows br-int" on the compute host as well?

Emilien Macchi (emilienm) said :
#15

-> I assume the output in response #11 is from "ovs-dpctl", not "ovs-ofctl", is that correct?

Yes, my bad.

-> When you see the packets arriving on eth1 (which is connected to br-int) are you seeing them tagged with VLAN 1 ?

On the compute node:

root@compute1:~# tcpdump -n -e -vv -ttt -i eth1 | grep 192.168.22.6
00:00:00.000000 fa:16:3e:11:44:d3 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 46: vlan 1, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.22.1 (ff:ff:ff:ff:ff:ff) tell 192.168.22.6, length 28

On the controller :

root@controller:~# tcpdump -n -e -vv -ttt -i eth1 | grep fa:16:3e:11:44:d3
00:00:00.000000 fa:16:3e:11:44:d3 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.22.1 (ff:ff:ff:ff:ff:ff) tell 192.168.22.6, length 46

(I can't see anything more)

-> If you don't see traffic arriving at the controller host NIC with VLAN tag 1, can you run "ovs-dpctl show", "ovs-dpctl dump-flows br-int" and "ovs-ofctl dump-flows br-int" on the compute host as well?

Here: http://paste.openstack.org/show/18454/ (you will also see other packets from my network; please ignore them)

dan wendlandt (danwent) said :
#16

From those tcpdumps, it seems like packets are leaving eth1 on compute1 tagged with VLAN 1 (correct), but arrive at eth1 of the controller without a VLAN tag (incorrect).

Are you seeing any VLAN-tagged traffic arriving at eth1 on the controller node? This could be an issue with the NIC being unable to receive VLAN traffic, or it could be that VLAN 1 specifically is not being carried by the physical network between the compute and the controller hosts.
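
To make the comparison between the two captures less error-prone, a small Python helper can pull the VLAN id out of a `tcpdump -e` line. This is only a sketch: the function name vlan_of is made up, and the regular expression assumes the "ethertype 802.1Q (0x8100) ... vlan N" layout that appears in the captures in this thread.

```python
import re

# Matches the "vlan N" field that tcpdump -e prints for 802.1Q frames,
# based on the output format shown in this thread.
_VLAN_RE = re.compile(r"ethertype 802\.1Q \(0x8100\).*?\bvlan (\d+)\b")


def vlan_of(tcpdump_line):
    """Return the VLAN id found in a `tcpdump -e` line, or None if untagged."""
    match = _VLAN_RE.search(tcpdump_line)
    return int(match.group(1)) if match else None
```

Applied to the compute1 capture from #15 it returns 1, and applied to the controller capture of the same ARP request it returns None, which is the mismatch discussed here.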

Emilien Macchi (emilienm) said :
#17

I can see a lot of VLAN-tagged traffic on eth1 of the controller node:

root@controller:~# tcpdump -n -e -vv -ttt -i eth1
00:00:00.000000 00:10:db:ff:10:01 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 4019, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has X.X.X.X tell X.X.X.X, length 46

(...)

But not the traffic coming from my VM :-(

Emilien Macchi (emilienm) said :
#18

I have Cisco hardware. Maybe IOS blocks VLAN 1 for security reasons?

We will check the switch tomorrow, and I will let you know here if there is anything new.

If VLAN 1 is the problem, maybe I should hack https://github.com/openstack/quantum/blob/master/quantum/plugins/openvswitch/agent/ovs_quantum_agent.py and change the default VLAN?

dan wendlandt (danwent) said :
#19

You could actually just change the field in the vlan_bindings table of the ovs_quantum database on the controller node to be something else. My guess is that VLAN 1 is not being trunked by your physical network.

askstack (askstack) said :
#20

Emilien

Have you tried connecting the two eth1 ports directly with an Ethernet cable? That would bypass the switch, so it could not drop any packets.

Emilien Macchi (emilienm) said :
#21

@Dan :

I changed to VLAN 7 but got the same result:

root@compute1:~# tcpdump -n -e -vv -ttt -i eth1 | grep "vlan 7"
00:00:00.254417 fa:16:3e:38:08:c8 > 33:33:00:00:00:02, ethertype 802.1Q (0x8100), length 74: vlan 7, p 0, ethertype IPv6, (hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::f816:3eff:fe38:8c8 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 16

root@controller:~# tcpdump -n -e -vv -ttt -i eth1 | grep "vlan 7"

... nothing

@askstack:

No, not yet. I use 2 compute nodes and 1 controller. I want to keep this configuration and solve my issues on real hardware.

Emilien Macchi (emilienm) said :
#22

Following up on my last message, you can see I'm using VLAN 7 now:

root@controller:~# ovs-vsctl show
    Bridge br-int
        Port "eth1"
            Interface "eth1"
        Port br-int
            Interface br-int
                type: internal
        Port "gw-48f95c51-8d"
            tag: 7
            Interface "gw-48f95c51-8d"
                type: internal

root@compute1:~# ovs-vsctl show
    Bridge br-int
        Port "eth1"
            Interface "eth1"
        Port br-int
            Interface br-int
                type: internal
        Port "tap6fb71cab-af"
            tag: 7
            Interface "tap6fb71cab-af"
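
When comparing several nodes, it can help to reduce each `ovs-vsctl show` dump to a port-to-tag mapping. The Python sketch below does that for output shaped like the dumps in this thread. The function name port_tags is made up, and the parsing is an assumption based only on the layout shown here (a "Port" line optionally followed by a "tag:" line).

```python
import re


def port_tags(ovs_vsctl_show_output):
    """Map each port name to its VLAN tag (None when the port has no tag)."""
    tags = {}
    current = None
    for line in ovs_vsctl_show_output.splitlines():
        stripped = line.strip()
        # Port names may or may not be quoted in the output.
        port = re.match(r'Port "?([^"]+)"?$', stripped)
        if port:
            current = port.group(1)
            tags[current] = None
            continue
        tag = re.match(r"tag: (\d+)$", stripped)
        if tag and current is not None:
            tags[current] = int(tag.group(1))
    return tags
```

Fed the controller dump above, this gives tag 7 for gw-48f95c51-8d and None for eth1 and br-int. The compute1 dump gives tag 7 for tap6fb71cab-af, so both ends agree on the VLAN.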

Emilien Macchi (emilienm) said :
#23

To summarize the situation:

root@controller:~# tcpdump -nnei eth1 | grep fa:16:3e:38:08:c8
22:15:00.567171 fa:16:3e:38:08:c8 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 342: 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:38:08:c8, length 300

root@controller:~# tcpdump -nnei gw-48f95c51-8d
Nothing...

root@compute1:~# tcpdump -nnei eth1 | grep fa:16:3e:38:08:c8
22:14:24.537103 fa:16:3e:38:08:c8 > 33:33:00:00:00:16, ethertype IPv6 (0x86dd), length 90: :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28

Notes :
- gw-48f95c51-8d is my gateway, hosted on the controller (which also runs nova-network and quantum-server)
- fa:16:3e:38:08:c8 is the MAC of my VM, which is requesting an IP address.

OVS configuration: see #22

So I think we can say the issue is either on the controller, between gw-* and eth1 on br-int, or in the physical switch (investigation in progress).

Emilien Macchi (emilienm) said :
#24

Problem fixed.

I changed the VLAN ID in the database, restarted the agent, and it's working now.

Dan, do you think it's a good idea to change VLAN_MIN = 1 to VLAN_MIN = 2?

Thanks for your help, I've learnt a lot.

Emilien Macchi (emilienm) said :
#25

Also, what do you think about adding a new flag in ovs_quantum_plugin.ini to specify a value for the native VLAN?

dan wendlandt (danwent) said :
#26

Yeah, assuming all VLANs are available is a bad idea in general. For a long time we've been meaning to add a configuration option to the ovs plugin config to let the user specify VLAN_MIN and VLAN_MAX.
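
As a rough illustration of what a configurable VLAN_MIN / VLAN_MAX could mean for allocation, here is a hypothetical Python sketch. It is not the plugin's code: the function allocate_vlan, its defaults, and the "lowest free id" policy are assumptions made for the example.

```python
def allocate_vlan(used, vlan_min=2, vlan_max=4094):
    """Return the lowest VLAN id in [vlan_min, vlan_max] that is not in `used`.

    Hypothetical helper: the bounds play the role of the VLAN_MIN and
    VLAN_MAX options discussed in this thread.
    """
    if not 1 <= vlan_min <= vlan_max <= 4094:
        raise ValueError("VLAN range must satisfy 1 <= min <= max <= 4094")
    for vlan_id in range(vlan_min, vlan_max + 1):
        if vlan_id not in used:
            return vlan_id
    raise RuntimeError("no free VLAN id in the configured range")
```

With the default lower bound of 2, the first network would get VLAN 2 instead of VLAN 1, which avoids the case that caused trouble in this thread.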

dan wendlandt (danwent) said :
#27

That would be great. I created a bug here: https://bugs.launchpad.net/quantum/+bug/1012223

One thing to be aware of is that there is a review almost complete for quantum to switch over to the new openstack.common config, so it might be best to wait until those config-related changes are in: https://review.openstack.org/#/c/8101/

tiadobatima (gbaratto-3) said :
#28

Hi Dan, Emilien...

Check out this Cisco global setting:

vlan dot1q tag native

If your native vlan is "1" and "vlan dot1q tag native" = false the packet going out of the port is gonna have its tag removed.
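
The native VLAN rule stated above can be written as a tiny Python model. This is a sketch of the general 802.1Q trunk behaviour only, not of any particular switch's implementation. The function name egress_tag and its parameters are made up for the example.

```python
def egress_tag(vlan_id, native_vlan=1, tag_native=False):
    """Return the 802.1Q tag a frame carries when leaving a trunk port.

    None means the frame is sent untagged. `tag_native` models the
    "vlan dot1q tag native" setting mentioned in this thread.
    """
    if vlan_id == native_vlan and not tag_native:
        return None
    return vlan_id
```

In this model a frame on VLAN 1 leaves the trunk untagged when the native VLAN is 1 and native tagging is off, which matches the untagged ARP request seen on the controller's eth1 in #15.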

Dan, it would be great if we could specify several ranges of VLANs instead of just a min and a max, so that we can make better use of a switch with existing VLANs. Something like:

Allowed_vlans = 3-5,8,10,500-2000

Cheers,
g.
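
A range list like the one proposed above is straightforward to parse. The Python sketch below shows one possible way. The option name, the function parse_vlan_ranges, and the validation limits (1 to 4094) are assumptions for illustration, not an existing Quantum setting.

```python
def parse_vlan_ranges(spec):
    """Expand a spec such as "3-5,8,10,500-2000" into a sorted list of ids."""
    vlan_ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        # "3-5" is a range; "8" is a single id.
        low, sep, high = part.partition("-")
        start = int(low)
        end = int(high) if sep else start
        if not 1 <= start <= end <= 4094:
            raise ValueError("invalid VLAN range: %r" % part)
        vlan_ids.update(range(start, end + 1))
    return sorted(vlan_ids)
```

For the example value "3-5,8,10,500-2000", the result starts with 3, 4, 5, 8, 10 and never contains VLAN 1.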

Emilien Macchi (emilienm) said :
#29

Hi,

The discussion is here : https://bugs.launchpad.net/quantum/+bug/1012223

It will be handled as part of a blueprint (https://blueprints.launchpad.net/quantum/+spec/provider-networks)

Regards